SEARCH KEYWORD -- DATA ENGINEERING



  The hidden risk of passing slice as function parameter

In Go's source code and in other open source libraries, there are many cases where a pointer to a slice is passed to a function instead of the slice itself. This raises a question: why not pass the slice directly, given that it is internally backed by a pointer to the underlying array? For example, in the log package, the formatHeader function takes a parameter buf of type *[]byte instead of []byte. func (l *Logger) formatHeader(buf *[]byte, t time.Time, file string, line int) {} Let's understand the r...

   GOLANG,SLICE,SLICE POINTER     2020-12-13 06:11:14
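
A minimal sketch of the difference this teaser alludes to, assuming the risk in question is that append inside a function does not change the caller's slice length. The function names appendByValue and appendByPointer are hypothetical and do not come from the log package.

```go
package main

import "fmt"

// appendByValue receives a copy of the slice header (pointer, length,
// capacity). The append only updates the local copy, so the caller's
// length is unchanged.
func appendByValue(buf []byte) {
	buf = append(buf, 'x')
}

// appendByPointer writes the new slice header back through the pointer,
// so the caller sees the appended element.
func appendByPointer(buf *[]byte) {
	*buf = append(*buf, 'x')
}

func main() {
	a := []byte("ab")
	appendByValue(a)
	fmt.Println(len(a)) // 2

	b := []byte("ab")
	appendByPointer(&b)
	fmt.Println(len(b)) // 3
}
```

Returning the new slice from the function is the other common way to get the same effect without a pointer.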

  Format JSON data on Ubuntu

JSON has become a very popular data format because it is simple and lightweight. Many RESTful APIs now offer the option of exchanging JSON data between server and client. Sometimes the data is unformatted and hard for humans to read, so it often needs to be formatted before it is read. Today we will show a few ways to format JSON data on Ubuntu. Assume we have a JSON file test.json with the content below. { "title": "Test"...

   RUBY,PYTHON,NODEJS,JSON,JQ,PERL,LINUX,UBUNTU,YAJL     2016-08-17 11:05:09
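
The tags above point to command-line tools; as an alternative illustration, the same kind of formatting can be sketched in Go with the standard encoding/json package. The helper prettyJSON and the sample input are hypothetical and are not taken from the article or from test.json.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// prettyJSON re-indents compact JSON with two spaces per level.
// It returns an error if the input is not valid JSON.
func prettyJSON(raw []byte) (string, error) {
	var out bytes.Buffer
	if err := json.Indent(&out, raw, "", "  "); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Hypothetical sample input, used only for illustration.
	formatted, err := prettyJSON([]byte(`{"title":"Test","id":1}`))
	if err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}
	fmt.Println(formatted)
}
```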

  What is cache penetration, cache breakdown and cache avalanche?

When designing and developing a highly available system, caching is a very important consideration. It is useful to cache frequently accessed data so that it can be accessed quickly, and a cache can also protect downstream systems such as the DB from being hit too often. To provide a better cache design in large systems, some problems need to be considered first. In this post, we will talk about some frequently discussed cache problems and mitigation plans. Cache penetration Cache penetrati...

   SYSTEM DESIGN,CACHE PENETRATION,CACHE BREAKDOWN,CACHE AVALANCHE     2020-04-10 08:43:00

  JSON in JavaScript

When sending an AJAX request to the server, the response can be read in two formats: XMLHttpRequest.responseXML gives access to the data as XML, and XMLHttpRequest.responseText gives access to it as a string. XML is the standard data transfer format, but one weakness is that it is troublesome to parse and to retrieve data from. JSON (JavaScript Object Notation) is a lightweight data interchange format, also called the JavaScript object representation. The advantage of using JSON as the data format is itself is...

   JSON,JavaScript     2013-05-04 23:25:57

  A plugin to update last_error in Delayed Job

delayed_job is a process-based asynchronous task processing gem which can be run in the background. It forks the specified number of processes to execute tasks asynchronously. The task status is usually stored in the database, so it can be easily integrated into a Rails application where asynchronous job execution is desired. Normally, when a job fails to execute or an error occurs, the error is saved to the database in the last_error column. Ideally all these will be handled b...

   RUBY,RUBY ON RAILS,DELAYED JOB,LAST_ERROR     2017-11-18 13:05:49

  Video website in big data era

Big data initially referred to data sets too large to be analyzed, but the term later came to mean the methods used to analyze huge amounts of data in order to extract value from them. This approach is gradually gaining attention. Such data is difficult both to analyze and to store, and handling it requires new approaches. In China, many companies now use open source Hadoop distributed clusters to meet their data statistics needs. Since we can get segmented d...

   Netflix,Big data,Data mining     2013-04-11 04:20:40

  Signature sign/verification demo in Java

Digital signatures are commonly used in areas where data authentication and integrity are required. It is extremely important to sign sensitive data that is transferred from one peer to others over a network, since malicious applications or man-in-the-middle attacks may alter the data along the way. Java provides some APIs to generate and verify digital signatures. One important class is Signature. When generating the signature, a private key needs to be pa...

   SECURITY,JAVA,SIGNATURE     2015-11-21 09:48:12

  Why Most of Us Get Confused by Data Quality Solutions and Bad Data?

In this post, Big Data professionals explain how to fix this misunderstanding. C-level executives use the data collected by their BI and analytics initiatives to make strategic decisions that give the company a competitive advantage. The situation gets worse if that data is inaccurate or incorrect, because big data drives the company's big bets and shapes both its direction and its future. Bad data can yield inappropriate results and losses. Some interesting ...

   BIGDATA     2018-02-21 06:01:35

  A guide on installing and running ClickHouse on macOS

ClickHouse is a high-performance open-source columnar database management system developed by Yandex. Here are some of the key features of ClickHouse: Columnar storage: ClickHouse uses a columnar storage format, which allows it to efficiently store and retrieve data by column, rather than by row. This results in much faster query performance, especially for analytical and aggregate queries. Real-time data processing: ClickHouse is designed to handle real-time data processing and can handle bill...

   CLICKHOUSE,MACOS     2023-02-15 06:04:55

  Empty slice vs nil slice in GoLang

In Go, there is a type called slice which is built on top of arrays. It is a very convenient type for handling a group of data. This post explains a subtle but tricky difference between an empty slice and a nil slice. A nil slice is a slice that has a length and capacity of zero and no underlying array. The zero value of a slice is nil. If a slice is declared as below, it is a nil slice. package main import "fmt" func main() { var a []string fmt.Println(a == nil) } The output will be t...

   GOLANG,JSON,EMPTY SLICE,NIL SLICE     2018-10-18 09:25:21
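
A short sketch of where the two kinds of slice visibly differ, assuming the JSON tag above refers to how encoding/json marshals them. The helper describe is hypothetical and is used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describe reports whether s is nil, its length, and its JSON encoding.
// Marshaling a []string cannot fail, so the error is ignored here.
func describe(s []string) (isNil bool, length int, encoded string) {
	b, _ := json.Marshal(s)
	return s == nil, len(s), string(b)
}

func main() {
	var a []string  // nil slice: no underlying array
	b := []string{} // empty slice: non-nil, length zero

	fmt.Println(describe(a)) // true 0 null
	fmt.Println(describe(b)) // false 0 []
}
```

Both slices have length zero and can be appended to or ranged over in the same way; the difference shows up in the nil comparison and in the marshaled output.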