
[Spark #67] Spark Standalone Fully Distributed Installation

 

 

1. Download and extract Spark 1.2.1 (prebuilt for Hadoop 2.4)

http://mirror.bit.edu.cn/apache/spark/spark-1.2.1/spark-1.2.1-bin-hadoop2.4.tgz

 

2. Download and extract Scala 2.10.4

http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
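For example, both archives can be fetched and unpacked into /home/hadoop/software, the installation directory used throughout this post (the commands below assume wget and GNU tar are available):

wget http://mirror.bit.edu.cn/apache/spark/spark-1.2.1/spark-1.2.1-bin-hadoop2.4.tgz
wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
tar -zxf spark-1.2.1-bin-hadoop2.4.tgz -C /home/hadoop/software/
tar -zxf scala-2.10.4.tgz -C /home/hadoop/software/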

 

3. Configure the Scala environment variables

3.1 Set SCALA_HOME to the Scala installation directory:

 

export SCALA_HOME=/home/hadoop/software/scala-2.10.4
 

 

3.2 Add $SCALA_HOME/bin to the system PATH variable.
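For example, appended to /etc/profile next to the other variables (any login-shell profile works):

export PATH=$PATH:$SCALA_HOME/bin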

 

3.3 Run scala -version in the console to check the installed Scala version:

 

Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67).
 

 

 

4. Configure the Spark environment variables (vim /etc/profile)

 

export JAVA_HOME=/home/hadoop/software/jdk1.7.0_67
export HADOOP_HOME=/home/hadoop/software/hadoop-2.5.2
export SPARK_HOME=/home/hadoop/software/spark-1.2.1-bin-hadoop2.4
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
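
Reload the profile so the new variables take effect in the current shell:

source /etc/profile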

 

5. Edit the conf/spark-env.sh file that ships with Spark (set this on all nodes)

 

export JAVA_HOME=/home/hadoop/software/jdk1.7.0_67
export HADOOP_HOME=/home/hadoop/software/hadoop-2.5.2
export SPARK_HOME=/home/hadoop/software/spark-1.2.1-bin-hadoop2.4
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/home/hadoop/software/scala-2.10.4
export SPARK_MASTER_IP=192.168.26.131   # address the master binds to
export SPARK_WORKER_INSTANCES=1         # worker processes per node
export SPARK_MASTER_PORT=7077           # master RPC port
export SPARK_MASTER_WEBUI_PORT=7087     # master web UI port
export SPARK_WORKER_PORT=8077           # worker RPC port
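
Every node needs the same spark-env.sh. One way to distribute it (assuming SSH access as the hadoop user, with the paths used above) is:

scp conf/spark-env.sh hadoop@hadoop.slave1:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/conf/
scp conf/spark-env.sh hadoop@hadoop.slave2:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/conf/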

 

6. Edit the slaves file on the master node

 

On the master node, edit conf/slaves to list the slave nodes. Do not make this change on the slave nodes:

 

192.168.26.133
192.168.26.134
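
Note that sbin/start-all.sh starts the workers over SSH, so the master must be able to reach each slave without a password. A standard OpenSSH setup (an assumption, not part of the original post) looks like:

ssh-keygen -t rsa
ssh-copy-id hadoop@192.168.26.133
ssh-copy-id hadoop@192.168.26.134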

 

7. Start the master and the two slaves

7.1 On the master node, start the master and both slaves with the following command:

 

sbin/start-all.sh
 

 

 

7.2 On the first attempt, the two slaves started but the master did not. The master's log shows this exception:

 

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Spark Command: /home/hadoop/software/jdk1.7.0_67/bin/java -cp ::/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/sbin/../conf:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/lib/spark-assembly-1.2.1-hadoop2.4.0.jar:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/software/hadoop-2.5.2/etc/hadoop:/home/hadoop/software/hadoop-2.5.2/etc/hadoop -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip 192.168.26.131 --port 7077 --webui-port 7087
========================================

15/02/18 01:47:16 INFO master.Master: Registered signal handlers for [TERM, HUP, INT]
15/02/18 01:47:17 INFO spark.SecurityManager: Changing view acls to: hadoop
15/02/18 01:47:17 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/02/18 01:47:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/02/18 01:47:21 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/18 01:47:24 INFO Remoting: Starting remoting
Exception in thread "main" java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at akka.remote.Remoting.start(Remoting.scala:180)
        at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
        at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
        at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
        at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
        at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
        at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
        at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
        at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
        at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
        at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
        at org.apache.spark.deploy.master.Master$.startSystemAndActor(Master.scala:849)
        at org.apache.spark.deploy.master.Master$.main(Master.scala:829)
        at org.apache.spark.deploy.master.Master.main(Master.scala)
15/02/18 01:47:33 ERROR Remoting: Remoting error: [Startup timed out] [
akka.remote.RemoteTransportException: Startup timed out
        at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:136)
        at akka.remote.Remoting.start(Remoting.scala:198)
        at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
        at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
        at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
        at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
        at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
        at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
        at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
        at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
        at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
        at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
        at org.apache.spark.deploy.master.Master$.startSystemAndActor(Master.scala:849)
        at org.apache.spark.deploy.master.Master$.main(Master.scala:829)
        at org.apache.spark.deploy.master.Master.main(Master.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at akka.remote.Remoting.start(Remoting.scala:180)
        ... 17 more
]
15/02/18 01:47:37 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/02/18 01:47:37 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

 

 

7.3 Run sbin/stop-all.sh to shut down all the master and slave processes, then run sbin/start-all.sh again. This time the master and both slaves start successfully; the earlier startup timeout appears to have been transient.

7.3.1 The master's startup log:

 

15/02/18 01:54:50 INFO master.Master: Registered signal handlers for [TERM, HUP, INT]
15/02/18 01:54:50 INFO spark.SecurityManager: Changing view acls to: hadoop
15/02/18 01:54:50 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/02/18 01:54:50 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/02/18 01:54:51 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/18 01:54:51 INFO Remoting: Starting remoting
15/02/18 01:54:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@192.168.26.131:7077]
15/02/18 01:54:52 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkMaster@192.168.26.131:7077]
15/02/18 01:54:52 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077.
15/02/18 01:54:52 INFO master.Master: Starting Spark master at spark://192.168.26.131:7077
15/02/18 01:54:52 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/18 01:54:52 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:7087
15/02/18 01:54:52 INFO util.Utils: Successfully started service 'MasterUI' on port 7087.
15/02/18 01:54:52 INFO ui.MasterWebUI: Started MasterWebUI at http://hadoop.master:7087
15/02/18 01:54:53 INFO master.Master: I have been elected leader! New state: ALIVE
15/02/18 01:54:57 INFO master.Master: Registering worker hadoop.slave2:8077 with 1 cores, 971.0 MB RAM
15/02/18 01:54:57 INFO master.Master: Registering worker hadoop.slave1:8077 with 1 cores, 971.0 MB RAM

From this log we can see:

7.3.1.1 The master started and is listening on port 7077.

7.3.1.2 The master's web UI started and is reachable at http://hadoop.master:7087.

7.3.1.3 hadoop.slave1 and hadoop.slave2 both listen on port 8077 and have registered with the master.

 

 

7.3.2 The log on the slave side:

 

15/02/18 01:54:54 INFO worker.Worker: Registered signal handlers for [TERM, HUP, INT]
15/02/18 01:54:54 INFO spark.SecurityManager: Changing view acls to: hadoop
15/02/18 01:54:54 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/02/18 01:54:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/02/18 01:54:55 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/18 01:54:56 INFO Remoting: Starting remoting
15/02/18 01:54:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@hadoop.slave1:8077]
15/02/18 01:54:56 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkWorker@hadoop.slave1:8077]
15/02/18 01:54:56 INFO util.Utils: Successfully started service 'sparkWorker' on port 8077.
15/02/18 01:54:56 INFO worker.Worker: Starting Spark worker hadoop.slave1:8077 with 1 cores, 971.0 MB RAM
15/02/18 01:54:56 INFO worker.Worker: Spark home: /home/hadoop/software/spark-1.2.1-bin-hadoop2.4
15/02/18 01:54:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/18 01:54:56 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8081
15/02/18 01:54:56 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081.
15/02/18 01:54:56 INFO ui.WorkerWebUI: Started WorkerWebUI at http://hadoop.slave1:8081
15/02/18 01:54:56 INFO worker.Worker: Connecting to master spark://192.168.26.131:7077...
15/02/18 01:54:57 INFO worker.Worker: Successfully registered with master spark://192.168.26.131:7077

 

8. Web UI

 

8.1 The master's UI (port 7087):

 

 

 

8.2 The slave's UI (port 8081):

 

 

 

9. Testing the Spark cluster

 

 

9.1 Start a Spark shell on the master and run the following. All of the results end up on the master and nothing executes on the workers. Why? The suspicion is that bin/spark-shell defaults to a local master, i.e. it behaves like spark-shell --master local.

 

bin> ./spark-shell
scala> val rdd = sc.parallelize(List(1,3,7,7,8,9,11,2,11,33,44,99,111,2432,4311,111,111), 7)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize at <console>:12

scala> rdd.saveAsTextFile("file:///home/hadoop/output")
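
A quick way to test this suspicion is to ask the SparkContext for its master URL; in a default shell it reports a local master rather than the cluster URL (the output below is illustrative and the exact string may vary by version):

scala> sc.master
res1: String = local[*]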

 
9.2 Exit the shell above and restart it against the cluster:

./spark-shell --master spark://192.168.26.131:7077

If the IP above is given as a hostname such as hadoop.master instead, the shell reports that it cannot connect. The likely cause is that a standalone master only accepts connections whose master URL exactly matches the address it bound to, and spark-env.sh sets SPARK_MASTER_IP to the IP 192.168.26.131.

This time the log shows that Spark has handed the tasks to the workers:

scala> rdd.saveAsTextFile("file:///home/hadoop/output2")
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/02/18 02:37:55 INFO spark.SparkContext: Starting job: saveAsTextFile at <console>:15
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Got job 0 (saveAsTextFile at <console>:15) with 7 output partitions (allowLocal=false)
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(saveAsTextFile at <console>:15)
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Missing parents: List()
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at saveAsTextFile at <console>:15), which has no missing parents
15/02/18 02:37:55 INFO storage.MemoryStore: ensureFreeSpace(112056) called with curMem=0, maxMem=280248975
15/02/18 02:37:55 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 109.4 KB, free 267.2 MB)
15/02/18 02:37:55 INFO storage.MemoryStore: ensureFreeSpace(67552) called with curMem=112056, maxMem=280248975
15/02/18 02:37:55 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 66.0 KB, free 267.1 MB)
15/02/18 02:37:55 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.master:44435 (size: 66.0 KB, free: 267.2 MB)
15/02/18 02:37:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/18 02:37:55 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Submitting 7 missing tasks from Stage 0 (MappedRDD[1] at saveAsTextFile at <console>:15)
15/02/18 02:37:55 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 7 tasks
15/02/18 02:37:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop.slave1, PROCESS_LOCAL, 1208 bytes)
15/02/18 02:37:55 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hadoop.slave2, PROCESS_LOCAL, 1208 bytes)
15/02/18 02:38:06 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.slave1:41802 (size: 66.0 KB, free: 267.2 MB)
15/02/18 02:38:06 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.slave2:34337 (size: 66.0 KB, free: 267.2 MB)
15/02/18 02:38:24 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, hadoop.slave1, PROCESS_LOCAL, 1212 bytes)
15/02/18 02:38:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 28274 ms on hadoop.slave1 (1/7)
15/02/18 02:38:24 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, hadoop.slave2, PROCESS_LOCAL, 1208 bytes)
15/02/18 02:38:24 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 28719 ms on hadoop.slave2 (2/7)
15/02/18 02:38:26 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, hadoop.slave2, PROCESS_LOCAL, 1212 bytes)
15/02/18 02:38:26 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 1908 ms on hadoop.slave2 (3/7)
15/02/18 02:38:26 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, hadoop.slave1, PROCESS_LOCAL, 1208 bytes)
15/02/18 02:38:26 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2595 ms on hadoop.slave1 (4/7)
15/02/18 02:38:27 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, hadoop.slave2, PROCESS_LOCAL, 1212 bytes)
15/02/18 02:38:27 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 440 ms on hadoop.slave2 (5/7)
15/02/18 02:38:27 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 336 ms on hadoop.slave1 (6/7)
15/02/18 02:38:27 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 193 ms on hadoop.slave2 (7/7)
15/02/18 02:38:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/02/18 02:38:27 INFO scheduler.DAGScheduler: Stage 0 (saveAsTextFile at <console>:15) finished in 31.266 s
15/02/18 02:38:27 INFO scheduler.DAGScheduler: Job 0 finished: saveAsTextFile at <console>:15, took 31.932050 s

 

It turns out that although slave1 and slave2 now have the output directory, there is no data under it. The likely cause is that with a file:// path each executor writes only to its own local filesystem, so the partitions are scattered across nodes (and may be left under temporary attempt directories) instead of being committed to one place; writing to a shared filesystem such as HDFS, as in 9.5 below, avoids this.

 

9.3 Running a wordcount gives the same result:

bin/spark-shell --master spark://192.168.26.131:7077
scala> var rdd = sc.textFile("file:///home/hadoop/history.txt.used.byspark", 7)
scala> rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _, 5).map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1)).saveAsTextFile("file:///home/hadoop/output")
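
The same pipeline is easier to follow with one step per line (identical logic; in the REPL, enter it via :paste so it parses as a single expression):

val counts = rdd
  .flatMap(_.split(" "))   // split each line into words
  .map((_, 1))             // pair every word with a count of 1
  .reduceByKey(_ + _, 5)   // sum the counts per word, into 5 partitions
  .map(x => (x._2, x._1))  // swap to (count, word) so we can sort by count
  .sortByKey(false)        // sort descending by count
  .map(x => (x._2, x._1))  // swap back to (word, count)
counts.saveAsTextFile("file:///home/hadoop/output")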

 

9.4 Run the SparkPi example that ships with Spark:

 

./run-example SparkPi 1000 --master spark://192.168.26.131:7077

The result is printed: Pi is roughly 3.14173708

(Note: in Spark 1.x, run-example reads its master from the MASTER environment variable and passes the trailing arguments to the example itself, so MASTER=spark://192.168.26.131:7077 ./run-example SparkPi 1000 is the more reliable way to run it against the cluster.)

 

9.5 Run the wordcount again, this time reading from and writing to HDFS:

bin/spark-shell --master spark://192.168.26.131:7077
scala> var rdd = sc.textFile("/user/hadoop/history.txt.used.byspark", 7)
scala> rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _, 5).map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1)).saveAsTextFile("/user/hadoop/output")

This produces the correct results in part-00000 through part-00004.
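
The output can be inspected with the standard HDFS shell:

hdfs dfs -ls /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-00000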

 

 

From the UI we can see that slave1 executed 3 tasks and slave2 executed 2, reading 661 B and 1248 B respectively through the shuffle.

 

 

 


