SEARCH KEYWORD -- HADOOP DEVELOPMENT



  Why to opt for Hadoop?

Hadoop is an open source framework that stores and processes big data. It is written in Java and provides distributed storage and distributed processing of very large data sets. Hadoop is scalable: it stores and distributes large data sets across hundreds or thousands of servers that operate in parallel. Traditional database systems struggle to process such large amounts of data, but Hadoop enables businesses to run applications involving thousands of terabytes of data. Hadoop is ...

       2015-09-22 10:17:43
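The "Why to opt for Hadoop?" excerpt says Hadoop spreads large data sets across many servers. A small arithmetic sketch of how HDFS would split and replicate one file makes that concrete. It assumes the Hadoop 2.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); your cluster may be configured differently, and the helper name hdfs_footprint is hypothetical.

```shell
#!/bin/sh
# Sketch only: estimate how HDFS would split and replicate a single file.
# Assumes Hadoop 2.x defaults: 128 MB blocks, replication factor 3.
# hdfs_footprint is a hypothetical helper, not part of Hadoop.

BLOCK_MB=128
REPLICATION=3

hdfs_footprint() {
  file_mb="$1"
  # Ceiling division: a partial final block still counts as one block.
  blocks=$(( (file_mb + BLOCK_MB - 1) / BLOCK_MB ))
  # Each block is stored REPLICATION times on different DataNodes.
  replicas=$(( blocks * REPLICATION ))
  # The final block only occupies the bytes it holds, so raw disk usage
  # is simply the file size times the replication factor.
  raw_mb=$(( file_mb * REPLICATION ))
  echo "$blocks $replicas $raw_mb"
}

# A 1 TB (1048576 MB) file: 8192 blocks, 24576 block replicas, 3 TB raw.
hdfs_footprint 1048576
```

With the default input format each block usually becomes one input split, so a file of that size could be processed by roughly 8192 map tasks running in parallel, limited by the capacity of the cluster.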

  Make Big Data Collection Efficient with Hadoop Architecture and Design Tools

Hadoop architecture and design is popular for distributing small pieces of code to a large number of computers. That is why big data collection can be made more efficient with Hadoop architecture and design. Hadoop is an open source system, so you are free to modify it and design new tools according to your business requirements. Here we will discuss the most popular tools in the Hadoop development category and how they help with big projects. Ambari and Hive: When you are designing...

   HADOOP ARCHITECTURE,HADOOP HIVE ARCHITECTURE,HADOOP ARCHITECTURE AND DESIGN     2015-09-17 05:24:44

  Data governance Challenges and solutions in Apache Hadoop

Do you understand the meaning of data governance? It is considered the most critical concern for an organization that handles sensitive enterprise data. If an organization wants to know who is accessing its sensitive data and what actions those viewers have taken, data governance is a wonderful solution to consider. In this article, we will discuss data governance solutions and the challenges organizations face when implementing data governance. We will also dis...

   HADOOP DEVELOPMENT,HADOOP INTEGRATION     2015-10-26 08:06:29

  Build Hadoop environment in Linux

Hadoop standalone installation:
1. Install JDK
Install the JDK with the following command:
    sudo apt-get install sun-java6-jdk
Configure the Java environment: open /etc/profile and add the following lines:
    export JAVA_HOME=<Java installation directory>
    export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
    export PATH="$JAVA_HOME/bin:$PATH"
Verify the Java installation: type java -version. If it prints Java version information, Java is installed successfully.
2. Install SSH
Install SSH with the following command: sudo ...

   Hadoop,Linux,Configuration     2013-07-31 23:22:27
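The Linux setup excerpt configures JAVA_HOME, CLASSPATH and PATH in /etc/profile. The sketch below applies the same three variables to the current shell only and refuses to do so if no java launcher exists under the given directory, which catches a wrong JAVA_HOME early. The function configure_java_env is hypothetical and does not edit /etc/profile.

```shell
#!/bin/sh
# Sketch: set the Java environment variables for the current shell and
# check that an executable java launcher exists under the given directory.
# configure_java_env is a hypothetical helper; it does not edit /etc/profile.

configure_java_env() {
  java_home="$1"
  if [ ! -x "$java_home/bin/java" ]; then
    echo "no executable java under $java_home/bin" >&2
    return 1
  fi
  export JAVA_HOME="$java_home"
  # Prepend the current directory and $JAVA_HOME/lib; keep any existing CLASSPATH.
  export CLASSPATH=".:$JAVA_HOME/lib${CLASSPATH:+:$CLASSPATH}"
  # The java executable lives in $JAVA_HOME/bin, not in $JAVA_HOME itself.
  export PATH="$JAVA_HOME/bin:$PATH"
}
```

For example, `configure_java_env /usr/lib/jvm/java-6-sun && java -version`, where the directory is only an illustration; use the path of your own JDK.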

  Hadoop or Spark: Which One is Better?

What is Hadoop? Hadoop is one of the most widely used Apache frameworks for big data analysis. It allows distributed processing of large data sets across clusters of computers. It scales from a single machine to thousands of systems, each contributing computing and storage. A complete Hadoop framework comprises various modules, such as Hadoop YARN (Yet Another Resource Negotiator), MapReduce (the distributed processing engine), the Hadoop Distributed File System (HDFS) and Hadoop Common. Thes...

   COMPARISON,HADOOP,SPARK     2018-11-22 07:08:57
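The "Hadoop or Spark" excerpt lists MapReduce as Hadoop's distributed processing engine. Its three phases can be illustrated on one machine with a classic shell pipeline: a map step that emits one word per line, a sort that plays the role of the shuffle by bringing identical keys together, and a reduce step that counts each group. This is only a single-process analogy, not Hadoop's Java API, and the function name word_count is hypothetical.

```shell
#!/bin/sh
# Single-machine analogy of a MapReduce word count (not Hadoop code).
# word_count is a hypothetical helper; it reads text on stdin.

word_count() {
  # Map: lowercase the text and emit one word per line.
  tr 'A-Z' 'a-z' | tr -cs 'a-z' '\n' |
    # Drop the empty line tr emits when the input starts with a non-letter.
    grep -v '^$' |
    # Shuffle: sort so identical keys end up adjacent.
    sort |
    # Reduce: count each group, then print "word count".
    uniq -c | awk '{ print $2, $1 }'
}

printf 'Hadoop stores data. Spark and Hadoop process data.\n' | word_count
```

In real MapReduce the map and reduce steps run as many tasks on different nodes and the shuffle moves data over the network; the pipeline only mirrors the data flow.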

  Embrace open source

In the past few days, there has been a lot of tech news related to open source. For example, Microsoft enabled Linux on its Windows Azure cloud, Facebook open sourced its C++ library Folly, and Samsung joined the Linux Foundation. More and more big companies now realize the power of open source and are willing to contribute to the open source community. This benefits not only developers but also the big companies themselves. By providing some open source libraries or projects, developers may reduce their...

   Open source,Microsoft,Samsung,Facebook,Linux     2012-06-06 05:37:59

  Data Scientists and Their Harder Skills than Big Data

The field of data science is often confused with that of big data. Data science supports a company's decision makers by applying a logical approach to data. Who is a Data Scientist? A Data Scientist reviews a huge collection of data (which may extend to a couple of terabytes of disk space or thousands of Excel sheets). Such a huge volume of data cannot feasibly be handled, sorted and analyzed by a single person. Here we require the help of data science, and most recently, the field of A...

   BIG DATA     2017-12-13 04:22:55

  The Giant Mafia

There is an old Chinese saying: "Things of a kind come together. People of a mind fall into the same group." In the wave of Web 2.0, many new IT giants have emerged around the world, and many of them were founded by groups of people who previously worked together at the same company, such as PayPal and Facebook. This is called the "giant mafia". Let's see what people from the big IT giants have done after leaving their original companies. The PayPal mafia: Peter Thiel, co-founder and CEO of PayPal bef...

   Facebook mafia,PayPal mafia,Twitter mafia     2015-04-04 10:32:00

  Twitter to sponsor Apache Software Foundation

Twitter recently committed to becoming an official sponsor of the Apache Software Foundation. The Apache Software Foundation is a nonprofit organization that provides organizational, management, legal and financial support for open source projects. As we all know, Twitter loves open source, and its engineers are often engaged in the open source community, providing technical support. The Twitter team is also responsible for the related construction of the o...

   Apache,ASF,Twitter,Sponsor     2012-04-20 12:08:06

  Cleansing data with Pig and storing JSON format to HBase with Pig UDF

Introduction: This post explains how to clean data and store it in JSON format in HBase. Hadoop architect experts also explain Apache Pig and its advantages in Hadoop in this post. Read on to find out how they do it. This post contains the steps for some basic cleaning of duplicate data and for converting the data to JSON format before storing it in HBase. Pig does have some built-in libraries for parsing JSON, but it is important to manipulate the JSON data in Java code before storing it in HBase. Apache Pig...

   JSON,HADOOP ARCHITECT,APACHE HBASE,PIG UDF     2016-06-10 01:13:41
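The Pig/HBase excerpt describes removing duplicate records and converting each record to JSON before it is written to HBase. The sketch below shows that cleaning idea outside Pig, as a plain shell stand-in for the Pig and Java UDF steps rather than the post's own code (in Pig itself, the DISTINCT operator removes duplicate tuples). The input layout, comma-separated id,name,city with no header, embedded commas, quotes or backslashes, and the function name to_json_lines are assumptions.

```shell
#!/bin/sh
# Sketch, not Pig: deduplicate CSV records and emit one JSON object per line.
# Assumes hypothetical input "id,name,city" with no header, no embedded
# commas, and no quotes or backslashes that would need JSON escaping.

to_json_lines() {
  # Deduplicate whole records: sort -u keeps one copy of each identical line.
  sort -u |
    # Convert each remaining record into a JSON object keyed by field name.
    awk -F',' '{ printf "{\"id\":\"%s\",\"name\":\"%s\",\"city\":\"%s\"}\n", $1, $2, $3 }'
}

printf '1,Alice,Paris\n2,Bob,Oslo\n1,Alice,Paris\n' | to_json_lines
```

Each output line is the kind of value that could then be stored in an HBase cell, for example under a row key taken from id; that storage step is not shown here.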