Build a Hadoop environment in Linux

  sonic0002        2013-07-31 23:22:27

Hadoop standalone installation:

1. Install JDK

Install the JDK with the following command:

sudo apt-get install sun-java6-jdk

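To configure the Java environment, open /etc/profile and add the following lines (there must be no spaces around =):

export JAVA_HOME=(Java installation directory)
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"

Then run . /etc/profile (or log out and log in again) so the changes take effect.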

Verify the Java installation:

Run java -version. If it prints Java version information, Java is installed successfully.
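If java -version fails, a common cause is a JAVA_HOME that does not point at a JDK directory. The sketch below checks for that. check_java_home is a hypothetical helper name (not part of the JDK or Hadoop), and it assumes the standard JDK layout with the launcher at $JAVA_HOME/bin/java.

```shell
# check_java_home: hypothetical helper, not part of the JDK or Hadoop.
# Succeeds only if JAVA_HOME is set and $JAVA_HOME/bin/java is executable
# (this assumes the standard JDK directory layout).
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java at $JAVA_HOME/bin/java" >&2
        return 1
    fi
    echo "JAVA_HOME looks valid: $JAVA_HOME"
}
```

Typical use: . /etc/profile && check_java_home && java -version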

2. Install SSH

Install SSH with the following command:

sudo apt-get install ssh

Configure SSH so that you can log in to the local machine without a password:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

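If ~/.ssh/id_rsa does not exist yet, the command runs without prompting, because -P '' sets an empty passphrase and -f sets the output file. It creates two files in ~/.ssh/: id_rsa (the private key) and id_rsa.pub (the public key). The two files form a pair, like a key and its lock.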

Then append id_rsa.pub to the authorized keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
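The cat command above appends the key again each time it is run, and sshd (with its default StrictModes check) may ignore the key if ~/.ssh or authorized_keys is writable by other users. A sketch that makes this step repeatable, assuming the default key name id_rsa.pub; setup_authorized_key is a hypothetical helper name, not an OpenSSH command.

```shell
# setup_authorized_key: hypothetical helper, not an OpenSSH command.
# Usage: setup_authorized_key [ssh-dir]   (defaults to $HOME/.ssh)
# Appends ssh-dir/id_rsa.pub to ssh-dir/authorized_keys unless it is already
# listed, then tightens permissions so sshd does not reject the files.
setup_authorized_key() {
    ssh_dir=${1:-$HOME/.ssh}
    pub=$ssh_dir/id_rsa.pub
    auth=$ssh_dir/authorized_keys
    if [ ! -f "$pub" ]; then
        echo "missing public key: $pub" >&2
        return 1
    fi
    touch "$auth" || return 1
    # -q quiet, -x whole-line match, -F fixed strings, -f read patterns from $pub
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$auth"
}
```

Called with no argument it works on ~/.ssh. After accepting the host key once with a plain ssh localhost, ssh -o BatchMode=yes localhost true serves as a non-interactive check: BatchMode disables password prompts, so the command fails instead of asking for a password when key login is not working.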

Verify the SSH setup:

Run ssh localhost. If you are logged in without being asked for a password, SSH is set up correctly.

3. Switch off the firewall

sudo ufw disable

Note: This step is very important. If the firewall is left on, you may run into errors where the DataNode cannot be found.

4. Install Hadoop (version 0.20.2 is used as an example)

Download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core/

Install and configure Hadoop

Standalone Mode

No configuration is needed for this mode. Hadoop runs as a single Java process.

Pseudo-Distributed Mode

Pseudo-distributed mode is a cluster with only one node, where each Hadoop daemon runs in its own Java process. The local machine is both the master and the slave: it is the NameNode as well as the DataNode, and the JobTracker as well as the TaskTracker.

Configuration:

Modify the following files in the conf directory:

In hadoop-env.sh:

Add export JAVA_HOME=(Java installation directory)
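Shell assignments must not have spaces around =, so the line has to read export JAVA_HOME=... exactly. Editing hadoop-env.sh by hand works; as a sketch, the edit can also be scripted so that re-running it leaves exactly one active JAVA_HOME line. set_hadoop_java_home is a hypothetical helper name, not a Hadoop script.

```shell
# set_hadoop_java_home: hypothetical helper, not a Hadoop script.
# Usage: set_hadoop_java_home <conf-dir> <java-home>
# Removes any active "export JAVA_HOME=..." line from <conf-dir>/hadoop-env.sh
# (commented-out lines are kept) and appends a single correct one.
set_hadoop_java_home() {
    if [ $# -ne 2 ]; then
        echo "usage: set_hadoop_java_home <conf-dir> <java-home>" >&2
        return 1
    fi
    env_file=$1/hadoop-env.sh
    if [ ! -f "$env_file" ]; then
        echo "missing $env_file" >&2
        return 1
    fi
    tmp_file=$(mktemp) || return 1
    # Delete uncommented JAVA_HOME exports; lines starting with "#" do not match.
    sed '/^[[:space:]]*export[[:space:]][[:space:]]*JAVA_HOME=/d' "$env_file" > "$tmp_file"
    # No spaces around "=": "export JAVA_HOME = ..." is rejected by the shell.
    printf 'export JAVA_HOME="%s"\n' "$2" >> "$tmp_file"
    # Copy back with cat so the original file's permissions are preserved.
    cat "$tmp_file" > "$env_file"
    rm -f "$tmp_file"
}
```

From the Hadoop directory, set_hadoop_java_home conf "$JAVA_HOME" reuses the value already exported from /etc/profile.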

In core-site.xml, set the following contents:

<configuration>
	<!-- global properties -->
	<property>
	    <name>hadoop.tmp.dir</name>
	    <value>/home/zhongping/tmp</value>
	</property>

	<!-- file system properties -->
	<property>
	   <name>fs.default.name</name>
	   <value>hdfs://localhost:9000</value>
	</property>
</configuration>

In hdfs-site.xml, set the replication factor to 1, since there is only one DataNode to hold each block:

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>

In mapred-site.xml, set the following contents:

<configuration>
	<property>
		<name>mapred.job.tracker</name>
		<value>localhost:9001</value>
	</property>
</configuration>
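For repeatable setups, the three files can also be written from a script. The sketch below is a hypothetical helper, write_pseudo_conf, that writes the values used above (with dfs.replication, the HDFS property that sets the replication factor); the hadoop.tmp.dir path is passed as an argument instead of being hard-coded. It overwrites the existing files, so copy them first if they hold other settings.

```shell
# write_pseudo_conf: hypothetical helper, not a Hadoop script.
# Usage: write_pseudo_conf <conf-dir> <hadoop-tmp-dir>
# Overwrites core-site.xml, hdfs-site.xml and mapred-site.xml in <conf-dir>.
write_pseudo_conf() {
    if [ $# -ne 2 ]; then
        echo "usage: write_pseudo_conf <conf-dir> <hadoop-tmp-dir>" >&2
        return 1
    fi
    conf_dir=$1
    tmp_dir=$2
    if [ ! -d "$conf_dir" ]; then
        echo "no such directory: $conf_dir" >&2
        return 1
    fi

    # Unquoted EOF: $tmp_dir is expanded inside this here-document.
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmp_dir</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # Quoted 'EOF': written literally. One DataNode, so one replica per block.
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}
```

Run from the Hadoop directory, write_pseudo_conf conf /home/zhongping/tmp reproduces the configuration above.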

Format the Hadoop file system:

bin/hadoop namenode -format

Start Hadoop:

bin/start-all.sh

Verify the Hadoop installation by opening the following URLs in a browser. If both pages load, Hadoop is installed successfully.

http://localhost:50030 (MapReduce JobTracker web UI)

http://localhost:50070 (HDFS NameNode web UI)
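A check that does not need a browser is the JDK's jps tool, which lists running Java processes by main class name. After start-all.sh in pseudo-distributed mode, five Hadoop daemons should appear: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. check_daemons below is a hypothetical helper that reads jps output on stdin and reports any daemon that is missing.

```shell
# check_daemons: hypothetical helper. Reads `jps` output on stdin
# (lines of the form "<pid> <MainClassName>") and checks for the five
# daemons started by start-all.sh in Hadoop 0.20.x pseudo-distributed mode.
check_daemons() {
    running=$(awk '{ print $2 }')
    missing=0
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        if printf '%s\n' "$running" | grep -qx "$d"; then
            echo "running: $d"
        else
            echo "MISSING: $d" >&2
            missing=1
        fi
    done
    return $missing
}
```

Usage: jps | check_daemons. The exit status is non-zero if any daemon is missing, so it can also be used in scripts.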

5. Run an example

Create two files locally:

echo "Hello World Bye World" > file01
echo "Hello Hadoop Goodbye Hadoop" > file02

Create an input directory in HDFS:

hadoop fs -mkdir input

Copy file01 and file02 to HDFS:

hadoop fs -copyFromLocal /home/zhongping/file0* input

Run the wordcount example:

hadoop jar hadoop-0.20.2-examples.jar wordcount input output

Check the result:

hadoop fs -cat output/part-r-00000
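For a small input like this one, the job's output can be cross-checked locally with standard Unix tools. local_wordcount is a hypothetical helper; it splits lines on whitespace, much like the wordcount example, and sorts in byte order (LC_ALL=C), which matches the order in which Hadoop writes Text keys. Each output line is the word, a tab, and its count.

```shell
# local_wordcount: hypothetical helper for cross-checking small inputs locally.
# Prints "<word>\t<count>" lines sorted in byte order (LC_ALL=C),
# the same key/value layout the wordcount example writes to part-r-00000.
local_wordcount() {
    # Print one whitespace-separated token per line.
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$@" |
        LC_ALL=C sort |
        uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For the two files created above, local_wordcount file01 file02 prints Bye 1, Goodbye 1, Hadoop 2, Hello 2 and World 2 (one per line, tab-separated), the same lines the Hadoop job should produce.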

Source : http://jingshengsun888.blog.51cto.com/1767811/1261385
