1. Download and extract Spark 1.2.1 (prebuilt for Hadoop 2.4)
http://mirror.bit.edu.cn/apache/spark/spark-1.2.1/spark-1.2.1-bin-hadoop2.4.tgz
2. Download and extract Scala 2.10.4
http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
3. Configure the Scala environment variables:
3.1 Set SCALA_HOME to
/home/hadoop/software/scala-2.10.4
3.2 Add $SCALA_HOME/bin to the system PATH variable
3.3 Run scala -version in a console to verify the installed Scala version:
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67).
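Steps 3.1 and 3.2 amount to something like the following in /etc/profile (a sketch using the paths from this walkthrough); after running source /etc/profile, scala -version should report 2.10.4:

```shell
# /etc/profile -- Scala environment (paths as used in this guide)
export SCALA_HOME=/home/hadoop/software/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin
```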
4. Configure the Spark environment variables (vim /etc/profile):
export JAVA_HOME=/home/hadoop/software/jdk1.7.0_67
export HADOOP_HOME=/home/hadoop/software/hadoop-2.5.2
export SPARK_HOME=/home/hadoop/software/spark-1.2.1-bin-hadoop2.4
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
5. Edit Spark's spark-env.sh file (do this on every node):
export JAVA_HOME=/home/hadoop/software/jdk1.7.0_67
export HADOOP_HOME=/home/hadoop/software/hadoop-2.5.2
export SPARK_HOME=/home/hadoop/software/spark-1.2.1-bin-hadoop2.4
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/home/hadoop/software/scala-2.10.4
export SPARK_MASTER_IP=192.168.26.131
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=7087
export SPARK_WORKER_PORT=8077
6. Edit the slaves file on the Master node
On the Master node, edit conf/slaves to list the worker hosts. Do not make this change on the Slave nodes:
192.168.26.133
192.168.26.134
7. Start the Master and the two Slaves
7.1 On the Master node, start the Master and both Slaves with:
sbin/start-all.sh
7.2 On the first attempt, the two Slaves came up but the Master did not. The Master's exception was:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Spark Command: /home/hadoop/software/jdk1.7.0_67/bin/java -cp ::/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/sbin/../conf:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/lib/spark-assembly-1.2.1-hadoop2.4.0.jar:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/software/spark-1.2.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/software/hadoop-2.5.2/etc/hadoop:/home/hadoop/software/hadoop-2.5.2/etc/hadoop -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip 192.168.26.131 --port 7077 --webui-port 7087
========================================
15/02/18 01:47:16 INFO master.Master: Registered signal handlers for [TERM, HUP, INT]
15/02/18 01:47:17 INFO spark.SecurityManager: Changing view acls to: hadoop
15/02/18 01:47:17 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/02/18 01:47:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/02/18 01:47:21 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/18 01:47:24 INFO Remoting: Starting remoting
Exception in thread "main" java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at akka.remote.Remoting.start(Remoting.scala:180)
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
    at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
    at org.apache.spark.deploy.master.Master$.startSystemAndActor(Master.scala:849)
    at org.apache.spark.deploy.master.Master$.main(Master.scala:829)
    at org.apache.spark.deploy.master.Master.main(Master.scala)
15/02/18 01:47:33 ERROR Remoting: Remoting error: [Startup timed out] [
akka.remote.RemoteTransportException: Startup timed out
    at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:136)
    at akka.remote.Remoting.start(Remoting.scala:198)
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
    at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
    at org.apache.spark.deploy.master.Master$.startSystemAndActor(Master.scala:849)
    at org.apache.spark.deploy.master.Master$.main(Master.scala:829)
    at org.apache.spark.deploy.master.Master.main(Master.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at akka.remote.Remoting.start(Remoting.scala:180)
    ... 17 more
]
15/02/18 01:47:37 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/02/18 01:47:37 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
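The 10-second timeout above occurs while Akka remoting starts up, which in practice often traces back to slow hostname or DNS resolution. One common precaution (an assumption here; the log does not prove this was the cause) is to map every cluster hostname in /etc/hosts on all nodes. The slave-to-IP pairing below is inferred from the slaves file and is hypothetical:

```shell
# /etc/hosts on every node -- hostnames/IPs from this walkthrough
# (slave1/slave2-to-IP pairing is an assumption)
192.168.26.131 hadoop.master
192.168.26.133 hadoop.slave1
192.168.26.134 hadoop.slave2
```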
7.3 Running sbin/stop-all.sh to shut down all Master and Slave processes, then running sbin/start-all.sh again, brought up the Master and both Slaves successfully.
7.3.1 The Master's startup log:
15/02/18 01:54:50 INFO master.Master: Registered signal handlers for [TERM, HUP, INT]
15/02/18 01:54:50 INFO spark.SecurityManager: Changing view acls to: hadoop
15/02/18 01:54:50 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/02/18 01:54:50 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/02/18 01:54:51 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/18 01:54:51 INFO Remoting: Starting remoting
15/02/18 01:54:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@192.168.26.131:7077]
15/02/18 01:54:52 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkMaster@192.168.26.131:7077]
15/02/18 01:54:52 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077.
15/02/18 01:54:52 INFO master.Master: Starting Spark master at spark://192.168.26.131:7077
15/02/18 01:54:52 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/18 01:54:52 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:7087
15/02/18 01:54:52 INFO util.Utils: Successfully started service 'MasterUI' on port 7087.
15/02/18 01:54:52 INFO ui.MasterWebUI: Started MasterWebUI at http://hadoop.master:7087
15/02/18 01:54:53 INFO master.Master: I have been elected leader! New state: ALIVE
15/02/18 01:54:57 INFO master.Master: Registering worker hadoop.slave2:8077 with 1 cores, 971.0 MB RAM
15/02/18 01:54:57 INFO master.Master: Registering worker hadoop.slave1:8077 with 1 cores, 971.0 MB RAM
From this log we can see:
7.3.1.1 The Master started and is listening on port 7077.
7.3.1.2 The Master WebUI has started; per the log its address is http://hadoop.master:7087.
7.3.1.3 hadoop.slave1 and hadoop.slave2 are both listening on port 8077 and have registered with the Master.
7.3.2 The Slave-side log:
15/02/18 01:54:54 INFO worker.Worker: Registered signal handlers for [TERM, HUP, INT]
15/02/18 01:54:54 INFO spark.SecurityManager: Changing view acls to: hadoop
15/02/18 01:54:54 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/02/18 01:54:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/02/18 01:54:55 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/18 01:54:56 INFO Remoting: Starting remoting
15/02/18 01:54:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@hadoop.slave1:8077]
15/02/18 01:54:56 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkWorker@hadoop.slave1:8077]
15/02/18 01:54:56 INFO util.Utils: Successfully started service 'sparkWorker' on port 8077.
15/02/18 01:54:56 INFO worker.Worker: Starting Spark worker hadoop.slave1:8077 with 1 cores, 971.0 MB RAM
15/02/18 01:54:56 INFO worker.Worker: Spark home: /home/hadoop/software/spark-1.2.1-bin-hadoop2.4
15/02/18 01:54:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/18 01:54:56 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8081
15/02/18 01:54:56 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081.
15/02/18 01:54:56 INFO ui.WorkerWebUI: Started WorkerWebUI at http://hadoop.slave1:8081
15/02/18 01:54:56 INFO worker.Worker: Connecting to master spark://192.168.26.131:7077...
15/02/18 01:54:57 INFO worker.Worker: Successfully registered with master spark://192.168.26.131:7077
8. Web UI
8.1 Master UI (port 7087):
8.2 Slave UI (port 8081):
9. Spark cluster tests:
9.1 Start a Spark shell on the Master and run the following. All results end up on the Master; nothing executes on the Workers. Why? The likely reason is that bin/spark-shell defaults to a local master, i.e. it is equivalent to spark-shell --master local.
bin> ./spark-shell
scala> val rdd = sc.parallelize(List(1,3,7,7,8,9,11,2,11,33,44,99,111,2432,4311,111,111), 7)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize at <console>:12
scala> rdd.saveAsTextFile("file:///home/hadoop/output")
9.2 Exit the shell above and run:
./spark-shell --master spark://192.168.26.131:7077
If the IP above is replaced with a hostname such as hadoop.master, the shell fails to connect. This is likely because the worker or driver must use the exact master URL the master registered under, and the master here was started with SPARK_MASTER_IP=192.168.26.131, so only the IP form matches.
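If connecting by hostname is preferred, one option (a sketch, assuming hadoop.master resolves to 192.168.26.131 on every node) is to start the master under its hostname so the registered Akka URL matches:

```shell
# spark-env.sh sketch -- register the master under its hostname
# (assumption: hadoop.master resolves consistently on all nodes)
export SPARK_MASTER_IP=hadoop.master
# then restart the cluster:
#   sbin/stop-all.sh && sbin/start-all.sh
```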
The log shows that the job was submitted to the Workers for execution:
scala> rdd.saveAsTextFile("file:///home/hadoop/output2")
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/02/18 02:37:55 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/02/18 02:37:55 INFO spark.SparkContext: Starting job: saveAsTextFile at <console>:15
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Got job 0 (saveAsTextFile at <console>:15) with 7 output partitions (allowLocal=false)
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(saveAsTextFile at <console>:15)
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Missing parents: List()
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at saveAsTextFile at <console>:15), which has no missing parents
15/02/18 02:37:55 INFO storage.MemoryStore: ensureFreeSpace(112056) called with curMem=0, maxMem=280248975
15/02/18 02:37:55 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 109.4 KB, free 267.2 MB)
15/02/18 02:37:55 INFO storage.MemoryStore: ensureFreeSpace(67552) called with curMem=112056, maxMem=280248975
15/02/18 02:37:55 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 66.0 KB, free 267.1 MB)
15/02/18 02:37:55 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.master:44435 (size: 66.0 KB, free: 267.2 MB)
15/02/18 02:37:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/18 02:37:55 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/02/18 02:37:55 INFO scheduler.DAGScheduler: Submitting 7 missing tasks from Stage 0 (MappedRDD[1] at saveAsTextFile at <console>:15)
15/02/18 02:37:55 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 7 tasks
15/02/18 02:37:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop.slave1, PROCESS_LOCAL, 1208 bytes)
15/02/18 02:37:55 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hadoop.slave2, PROCESS_LOCAL, 1208 bytes)
15/02/18 02:38:06 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.slave1:41802 (size: 66.0 KB, free: 267.2 MB)
15/02/18 02:38:06 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop.slave2:34337 (size: 66.0 KB, free: 267.2 MB)
15/02/18 02:38:24 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, hadoop.slave1, PROCESS_LOCAL, 1212 bytes)
15/02/18 02:38:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 28274 ms on hadoop.slave1 (1/7)
15/02/18 02:38:24 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, hadoop.slave2, PROCESS_LOCAL, 1208 bytes)
15/02/18 02:38:24 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 28719 ms on hadoop.slave2 (2/7)
15/02/18 02:38:26 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, hadoop.slave2, PROCESS_LOCAL, 1212 bytes)
15/02/18 02:38:26 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 1908 ms on hadoop.slave2 (3/7)
15/02/18 02:38:26 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, hadoop.slave1, PROCESS_LOCAL, 1208 bytes)
15/02/18 02:38:26 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2595 ms on hadoop.slave1 (4/7)
15/02/18 02:38:27 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, hadoop.slave2, PROCESS_LOCAL, 1212 bytes)
15/02/18 02:38:27 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 440 ms on hadoop.slave2 (5/7)
15/02/18 02:38:27 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 336 ms on hadoop.slave1 (6/7)
15/02/18 02:38:27 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 193 ms on hadoop.slave2 (7/7)
15/02/18 02:38:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/02/18 02:38:27 INFO scheduler.DAGScheduler: Stage 0 (saveAsTextFile at <console>:15) finished in 31.266 s
15/02/18 02:38:27 INFO scheduler.DAGScheduler: Job 0 finished: saveAsTextFile at <console>:15, took 31.932050 s
Checking afterwards, slave1 and slave2 did have the output directory, but it contained no data files. A likely explanation: when writing to a file:// path from a cluster, each task commits its part files on the local filesystem of whichever worker ran it, so no single node holds the complete result and a directory can look empty depending on where (and when) it is inspected.
9.3 Running wordcount gives the same result:
bin/spark-shell --master spark://192.168.26.131:7077
scala> var rdd = sc.textFile("file:///home/hadoop/history.txt.used.byspark", 7)
scala> rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _, 5).map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1)).saveAsTextFile("file:///home/hadoop/output")
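The RDD chain above is dense. A plain-Scala sketch of the same word-count logic on a local collection (no cluster needed, sample input is made up) shows what it computes: split lines into words, count each word, and sort by descending count.

```scala
// Local word count mirroring the RDD pipeline above (hypothetical input)
val lines = List("spark hadoop spark", "scala spark hadoop")
val counts = lines
  .flatMap(_.split(" "))                    // tokenize, like rdd.flatMap
  .groupBy(identity)                        // group occurrences of each word
  .map { case (w, ws) => (w, ws.size) }     // like map((_, 1)).reduceByKey(_ + _)
  .toList
  .sortBy(-_._2)                            // like the sortByKey(false) swap-sort-swap
println(counts)  // → List((spark,3), (hadoop,2), (scala,1))
```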
9.4 Run Spark's bundled SparkPi example:
./run-example SparkPi 1000 --master spark://192.168.26.131:7077
This prints the result: Pi is roughly 3.14173708
(Note: in Spark 1.2, bin/run-example passes everything after the example name to the application itself, so the --master flag here is most likely ignored and the example may have run with the default master.)
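For intuition, the estimate can be reproduced in plain Scala without a cluster: SparkPi samples random points in the unit square and counts how many fall inside the unit circle. A minimal local sketch of that Monte Carlo idea (assumed logic, not Spark's exact code):

```scala
// Monte Carlo Pi estimate, same idea as SparkPi but single-machine
val n = 100000
val rnd = new scala.util.Random(42)  // fixed seed for reproducibility
val inside = (1 to n).count { _ =>
  val x = rnd.nextDouble() * 2 - 1   // point in [-1, 1] x [-1, 1]
  val y = rnd.nextDouble() * 2 - 1
  x * x + y * y <= 1                 // inside the unit circle?
}
val pi = 4.0 * inside / n            // area ratio: circle / square = pi / 4
println(f"Pi is roughly $pi%.5f")
```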
9.5 Run wordcount again, this time reading from and writing to HDFS:
bin/spark-shell --master spark://192.168.26.131:7077
scala> var rdd = sc.textFile("/user/hadoop/history.txt.used.byspark", 7)
scala> rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _, 5).map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1)).saveAsTextFile("/user/hadoop/output")
This produces the correct results in part-00000 through part-00004.
From the UI we can see that slave1 executed three tasks and slave2 executed two; through the shuffle they read 661 B and 1248 B respectively.