重新使用Hadoop YARN 2.4.0

因为写论文需要，增加一章case study。把之前做过的Hadoop job Scheduler 实验数据加进来。一上午都在弄之前搭建好的hadoop cluster，突然发觉自己没有整理笔记的习惯。还好2014年5月有个零星的记录，能够让我回忆起来具体的细节。通过自己在evernote上的收藏。找到了一些有用的信息和命令。参考人家的经验结合自己的实践，是最快速的学习和回顾方法。
（参考网上笔记（已经保留在evernote笔记）－ Hadoop 2.2.0 (YARN)搭建笔记）

记录一下有用的命令：格式化HDFS：
$HADOOP_HOME/bin/hdfs namenode -format
注意，如果不是第一次格式化，那么需要先清理tmp目录中的数据（默认的tmp目录在/tmp/hadoop-下面），以及hdfs-site.xml 中配置的dfs.datanode.data.dir目录，否则可能出错。

$HADOOP_HOME/sbin/start-all.sh
该命令会依次启动hdfs和yarn，分别调用start-hdfs.sh和start-yarn.sh;
start-hdfs.sh依次启动namenode、secondarynamenode、datanode，也可以单独启动

在master上和 slave上执行jps，输出进程名

查看当前信息的命令：
hadoop dfsadmin -report
https://hadoop.apache.org/docs/r2.4.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html

查看hdfs上文件
hadoop fs -ls ./

在HDFS 创建目录
hadoop fs -mkdir Input

拷贝loal本机文件到 HDFS：
hadoop fs -copyFromLocal input.txt Input

Run:
hadoop jar ./hadoop-mapreduce-examples-2.4.0.jar wordcount Input Output or yarn jar hadoop-mapreduce-examples-2.4.0.jar wordcount Input Output

merge and check the resutl:
hadoop fs -getmerge Output/* local_output cat local_output

自己写的一段运行脚本

 
#!/bin/bash
rm -rf local_output
hadoop fs -rm -r Input
hadoop fs -rm -r Output
hadoop fs -mkdir Input
hadoop fs -copyFromLocal input.txt Input
hadoop jar ./hadoop-mapreduce-examples-2.4.0.jar  wordcount Input  Output
hadoop fs -getmerge Output/* local_output
cat local_output

Reference
Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.

文章目录