bit1129

[Spark 17]: Spark SQL Part 3, Working with Hive

Blog category: Spark

Hive On Spark

Does the Spark distribution ship with Hive built in? In other words, can Hive be used without installing it separately?

Spark SQL supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, it is not included in the default Spark assembly. In order to use Hive you must first run “sbt/sbt -Phive assembly/assembly” (or use -Phive for maven). This command builds a new assembly jar that includes Hive. Note that this Hive assembly jar must also be present on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries (SerDes) in order to access data stored in Hive. Configuration of Hive is done by placing your hive-site.xml file in conf/.

When working with Hive one must construct a HiveContext, which inherits from SQLContext, and adds support for finding tables in the MetaStore and writing queries using HiveQL. Users who do not have an existing Hive deployment can still create a HiveContext. When not configured by the hive-site.xml, the context automatically creates metastore_db and warehouse in the current directory.

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val dbs = hiveContext.sql("show databases")

/// Before any operations, only the default database exists
scala> dbs.collect

/// List all tables
scala> hiveContext.sql("show tables").collect

You can also run statements through hiveContext's hql method:

scala> import hiveContext._

/// Create a table
scala> hql("CREATE TABLE IF NOT EXISTS person(name STRING, age INT)")

scala> hql("select * from person");

scala> hql("show tables");

/// Load data. When loading, what are the default line terminator and the default column delimiter?
/// Syntax for specifying the column delimiter: row format delimited fields terminated by '\t'

scala> hql("LOAD DATA LOCAL INPATH '/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/data/person.txt' INTO TABLE person")
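
A minimal sketch of the delimiter syntax mentioned above, assuming a spark-shell session on Spark 1.2 where `sc` is already defined. As far as I know, a Hive text table created without a ROW FORMAT clause splits fields on Ctrl-A (\001) and rows on newline, so a tab-separated file needs its delimiter declared when the table is created. The table name `person_tsv` is hypothetical, and the file is assumed to hold one tab-separated name and age per line.

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes spark-shell, where `sc` (the SparkContext) already exists.
val hiveContext = new HiveContext(sc)

// `person_tsv` is a hypothetical table name used only for this sketch.
// Inside a triple-quoted string, '\t' and '\n' reach Hive as written,
// and Hive interprets them as tab and newline.
hiveContext.sql(
  """CREATE TABLE IF NOT EXISTS person_tsv(name STRING, age INT)
    |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    |LINES TERMINATED BY '\n'""".stripMargin)

// Same file path as in the post; no trailing semicolon inside the statement.
hiveContext.sql(
  "LOAD DATA LOCAL INPATH '/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/data/person.txt' " +
  "INTO TABLE person_tsv")

// Print the loaded rows to check that the columns were split as expected.
hiveContext.sql("SELECT name, age FROM person_tsv").collect().foreach(println)
```

If the age column comes back as null for every row, the declared delimiter most likely does not match the one used in the file.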

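Questions:

1. In the operations above, which database is Hive associated with?

2. If Hive is already installed separately, can Spark be made to work with that existing Hive installation?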
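
On question 1, the Spark documentation quoted earlier says that without a hive-site.xml the context creates metastore_db and warehouse in the current directory. The sketch below, in plain Scala with no Spark dependency, checks whether those directories exist where the shell was started. It only inspects the file system and is an illustration, not an official way to query the metastore location. On question 2, the same quoted passage points to placing hive-site.xml in conf/.

```scala
import java.io.File

// Directory names taken from the documentation quoted earlier in the post.
val localNames = Seq("metastore_db", "warehouse")

// The JVM working directory, i.e. where spark-shell was launched.
val cwd = new File(".").getCanonicalFile

// Pair each name with whether a directory of that name exists under cwd.
val found: Seq[(String, Boolean)] =
  localNames.map(name => name -> new File(cwd, name).isDirectory)

found.foreach { case (name, exists) =>
  println(s"$name under $cwd: exists = $exists")
}
```

Running this in the same shell before and after creating the HiveContext and issuing a first query should show whether the local directories were created.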

To be continued.


2015-01-10 13:51