Setting Up Hive for Hadoop 1.2


#BigData#


2014-6-14

Apache Hive is a data warehouse built on top of Hadoop that gives developers and analysts an SQL-like command-line interface. Hive is essentially an abstraction over HDFS and MapReduce: its SQL statements are compiled into MapReduce jobs, and the data they process normally lives in HDFS.

To set up Hadoop in pseudo-distributed mode on a single machine, see the earlier post on configuring Hadoop 1.2 in pseudo-distributed mode.

The Hive homepage is at hive.apache.org. Download apache-hive-0.13.1-bin.tar.gz from the Hive download page, extract it, rename the resulting directory to hive-0.13.1, and place it under ~/.

Setting HIVE_HOME

Add the following line to /etc/profile:

export HIVE_HOME=/home/letian/hive-0.13.1

Creating the required directories on HDFS

Start Hadoop 1.2:

$ start-all.sh 

Create the directory /tmp in HDFS and grant group users write permission:

$ hadoop fs -mkdir /tmp
$ hadoop fs -chmod g+w /tmp

Create the directory /user/hive/warehouse in HDFS and grant group users write permission:

$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -chmod g+w /user/hive/warehouse

Where does /user/hive/warehouse come from? The default configuration template hive-default.xml.template, located in hive-0.13.1/conf/, contains the following:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

The /tmp-related setting in the template is shown below (note that hive.querylog.location refers to the local filesystem; the /tmp directory created in HDFS above serves as Hive's scratch space, governed by hive.exec.scratchdir, whose default is also under /tmp):

<property>
  <name>hive.querylog.location</name>
  <value>/tmp/${user.name}</value>
  <description>
    Location of Hive run time structured log file
  </description>
</property>
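These defaults can be overridden by creating a hive-site.xml in hive-0.13.1/conf/. A minimal sketch (the value shown is just the default repeated for illustration, not something this post requires you to change):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Override the warehouse location; any settings not listed here
       fall back to the defaults from hive-default.xml.template. -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```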

Trying out Hive

For convenience, first add hive-0.13.1/bin to the $PATH variable.
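One way to make this permanent is to extend the /etc/profile entry; a sketch, assuming the install path used earlier in this post:

```shell
# In /etc/profile (or ~/.bashrc): point HIVE_HOME at the Hive install
# and put its bin/ directory on the PATH.
export HIVE_HOME=/home/letian/hive-0.13.1
export PATH="$HIVE_HOME/bin:$PATH"
```

Run `source /etc/profile` (or open a new shell) for the change to take effect.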

Type the hive command to enter its command-line interface:

$ hive

Now we can use the SQL we are familiar with, although HiveQL is of course not identical to standard SQL.

Create a user table:

hive> CREATE TABLE user (name STRING, age INT, email STRING);
OK
Time taken: 0.495 seconds

List the existing tables:

hive> SHOW TABLES;
OK
user
Time taken: 0.025 seconds, Fetched: 1 row(s)

Use the dfs command to inspect HDFS from within the Hive shell:

hive> dfs -ls /user/hive/warehouse/;    
Found 1 items
drwxr-xr-x   - letian supergroup          0 2014-06-14 10:16 /user/hive/warehouse/user

Examine the structure of the user table:

hive> DESCRIBE user;
OK
name                	string              	                    
age                 	int                 	                    
email               	string              	                    
Time taken: 0.374 seconds, Fetched: 3 row(s)

Inserting data:

Hive 0.13 does not support row-level inserts, so data is loaded from files instead.

Create a file user.dat with the following content:

letian 22 letian@123.com

Load the data:

hive> LOAD DATA LOCAL INPATH '/home/letian/user.dat' OVERWRITE INTO TABLE user;
Copying data from file:/home/letian/user.dat
Copying file: file:/home/letian/user.dat
Loading data to table default.user
Deleted hdfs://localhost:9000/user/hive/warehouse/user
Table default.user stats: [numFiles=1, numRows=0, totalSize=25, rawDataSize=0]
OK
Time taken: 0.674 seconds
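The "Deleted hdfs://..." line appears because OVERWRITE first removes any existing files in the table's directory. You can confirm that the newly loaded file landed in the warehouse from within the Hive shell (LOAD DATA preserves the source filename, so user.dat is assumed here):

```
hive> dfs -cat /user/hive/warehouse/user/user.dat;
```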

Query the contents of the user table:

hive> SELECT name FROM user;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201406140958_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201406140958_0001
Kill Command = /home/letian/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201406140958_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-06-14 10:40:49,060 Stage-1 map = 0%,  reduce = 0%
2014-06-14 10:40:51,091 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.06 sec
2014-06-14 10:40:53,107 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.06 sec
MapReduce Total cumulative CPU time: 1 seconds 60 msec
Ended Job = job_201406140958_0001
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 1.06 sec   HDFS Read: 234 HDFS Write: 25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 60 msec
OK
letian 22 letian@123.com
Time taken: 11.38 seconds, Fetched: 1 row(s)
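Notice that SELECT name returned the entire line, not just "letian". Hive's default field delimiter is Ctrl-A (\001), not a space, so the whole space-separated line was parsed into the name column, leaving age and email NULL. To split on spaces, the table would need an explicit delimiter; a sketch (user2 is a hypothetical table name, not from the steps above):

```
hive> CREATE TABLE user2 (name STRING, age INT, email STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
```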

Exit Hive:

hive> exit;

( The End )