[SPARK] Installing a Spark 1.6.1 Test Environment
In the introduction on the Spark site,
we can see the project's requirements:
Spark runs on Java 7+, Python 2.6+ and R 3.1+. For the Scala API, Spark 1.6.1 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).
Among these, the components that must be installed beforehand are:
Java (1.7 or later)
Scala (2.10.x)
In this article, we will set up a Spark test environment on Ubuntu 16.04.
First, install Java 1.7:
$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
After the installation, we can check the installed Java version:
$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
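If several Java versions coexist on the machine, the same PPA also offers a package that makes Oracle Java 7 the default and sets JAVA_HOME system-wide; a minimal sketch, assuming the webupd8team PPA is configured as above:

```shell
# Optional: make Oracle Java 7 the system default and set JAVA_HOME
# (package provided by the webupd8team PPA)
sudo apt-get install oracle-java7-set-default

# Open a new shell (or re-login), then verify
echo $JAVA_HOME
```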
Next, install Scala 2.10.4:
$ wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
$ sudo mkdir /usr/local/src/scala
$ sudo tar xvf scala-2.10.4.tgz -C /usr/local/src/scala/
After extracting Scala, we need to add the Scala path to .bashrc.
The .bashrc file is a per-user configuration file that sets
environment variables for each interactive shell session.
$ vi .bashrc
Append the following lines to the end of .bashrc:
# SCALA
export SCALA_HOME=/usr/local/src/scala/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
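These variables only take effect in newly started shells; to apply them to the current session, reload .bashrc and check where scala resolves (a quick sanity check, assuming the archive was extracted to the directory above):

```shell
# Reload the profile in the current shell
source ~/.bashrc

# scala should now resolve to /usr/local/src/scala/scala-2.10.4/bin/scala
which scala
```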
We can confirm the Scala version with:
$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
Next, install Git:
$ sudo apt-get install git
Then download and extract Spark 1.6.1:
$ wget http://ftp.twaren.net/Unix/Web/apache/spark/spark-1.6.1/spark-1.6.1.tgz
$ tar xvf spark-1.6.1.tgz
We then build Spark with SBT (Simple Build Tool):
$ cd spark-1.6.1
$ sbt/sbt assembly
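By default the assembly is built against a bundled Hadoop version (the run log further down shows hadoop2.2.0). The Spark 1.x build also accepts Maven-style profiles to target a different Hadoop version or enable YARN; a sketch, assuming the profile names from the Spark "Building Spark" documentation:

```shell
# Build against Hadoop 2.4 with YARN support
# (profile names vary across Spark releases; check the build docs first)
sbt/sbt -Pyarn -Phadoop-2.4 assembly
```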
This step takes quite a while...
Judging from the log and info messages, the build process downloads
related components such as Hadoop and Maven dependencies.
There are also many warning and error messages; it would be worth
tracing exactly which artifacts SBT fetches during the build.
If I learn more, I will cover it in a later post...
Once the build finishes after the long wait,
we can test Spark with the following example:
./bin/run-example SparkPi 10
Here is the output of the run:
ubuntu@spark:~/spark-1.6.1$ ./bin/run-example SparkPi 10
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/06/01 11:44:06 INFO SparkContext: Running Spark version 1.6.1
16/06/01 11:44:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/06/01 11:44:09 INFO SecurityManager: Changing view acls to: ubuntu
16/06/01 11:44:09 INFO SecurityManager: Changing modify acls to: ubuntu
16/06/01 11:44:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/06/01 11:44:11 INFO Utils: Successfully started service 'sparkDriver' on port 40120.
16/06/01 11:44:13 INFO Slf4jLogger: Slf4jLogger started
16/06/01 11:44:13 INFO Remoting: Starting remoting
16/06/01 11:44:13 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.144.26:48032]
16/06/01 11:44:13 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 48032.
16/06/01 11:44:13 INFO SparkEnv: Registering MapOutputTracker
16/06/01 11:44:13 INFO SparkEnv: Registering BlockManagerMaster
16/06/01 11:44:13 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5e1eaf89-3adc-4ebc-915b-3f9683568b34
16/06/01 11:44:14 INFO MemoryStore: MemoryStore started with capacity 511.5 MB
16/06/01 11:44:14 INFO SparkEnv: Registering OutputCommitCoordinator
16/06/01 11:44:14 INFO Server: jetty-8.y.z-SNAPSHOT
16/06/01 11:44:14 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/06/01 11:44:14 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/06/01 11:44:14 INFO SparkUI: Started SparkUI at http://192.168.144.26:4040
16/06/01 11:44:14 INFO HttpFileServer: HTTP File server directory is /tmp/spark-e250b06e-9683-45d4-bab1-0f74970e5a4f/httpd-820fd134-dc09-4f6f-9dc5-4fa476a2c3e1
16/06/01 11:44:14 INFO HttpServer: Starting HTTP Server
16/06/01 11:44:15 INFO Server: jetty-8.y.z-SNAPSHOT
16/06/01 11:44:15 INFO AbstractConnector: Started SocketConnector@0.0.0.0:57708
16/06/01 11:44:15 INFO Utils: Successfully started service 'HTTP file server' on port 57708.
16/06/01 11:44:20 INFO SparkContext: Added JAR file:/home/ubuntu/spark-1.6.1/examples/target/scala-2.10/spark-examples-1.6.1-hadoop2.2.0.jar at http://192.168.144.26:57708/jars/spark-examples-1.6.1-hadoop2.2.0.jar with timestamp 1464781460113
16/06/01 11:44:20 INFO Executor: Starting executor ID driver on host localhost
16/06/01 11:44:20 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51943.
16/06/01 11:44:20 INFO NettyBlockTransferService: Server created on 51943
16/06/01 11:44:20 INFO BlockManagerMaster: Trying to register BlockManager
16/06/01 11:44:20 INFO BlockManagerMasterEndpoint: Registering block manager localhost:51943 with 511.5 MB RAM, BlockManagerId(driver, localhost, 51943)
16/06/01 11:44:20 INFO BlockManagerMaster: Registered BlockManager
16/06/01 11:44:25 INFO SparkContext: Starting job: reduce at SparkPi.scala:36
16/06/01 11:44:27 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 10 output partitions
16/06/01 11:44:27 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:36)
16/06/01 11:44:27 INFO DAGScheduler: Parents of final stage: List()
16/06/01 11:44:27 INFO DAGScheduler: Missing parents: List()
16/06/01 11:44:27 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
16/06/01 11:44:28 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1904.0 B, free 1904.0 B)
16/06/01 11:44:28 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1218.0 B, free 3.0 KB)
16/06/01 11:44:28 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:51943 (size: 1218.0 B, free: 511.5 MB)
16/06/01 11:44:28 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/06/01 11:44:28 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
16/06/01 11:44:28 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
16/06/01 11:44:28 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:28 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:28 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:28 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, partition 3,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:28 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
16/06/01 11:44:28 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
16/06/01 11:44:28 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/06/01 11:44:28 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/06/01 11:44:28 INFO Executor: Fetching http://192.168.144.26:57708/jars/spark-examples-1.6.1-hadoop2.2.0.jar with timestamp 1464781460113
16/06/01 11:44:28 INFO Utils: Fetching http://192.168.144.26:57708/jars/spark-examples-1.6.1-hadoop2.2.0.jar to /tmp/spark-e250b06e-9683-45d4-bab1-0f74970e5a4f/userFiles-23014585-18b5-4cde-8bd3-170edab487e9/fetchFileTemp7405909665703120277.tmp
16/06/01 11:44:37 INFO Executor: Adding file:/tmp/spark-e250b06e-9683-45d4-bab1-0f74970e5a4f/userFiles-23014585-18b5-4cde-8bd3-170edab487e9/spark-examples-1.6.1-hadoop2.2.0.jar to class loader
16/06/01 11:44:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, partition 4,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:37 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
16/06/01 11:44:37 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, partition 5,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:37 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
16/06/01 11:44:37 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 9013 ms on localhost (1/10)
16/06/01 11:44:37 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, partition 6,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:37 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 8930 ms on localhost (2/10)
16/06/01 11:44:37 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
16/06/01 11:44:37 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, partition 7,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:37 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
16/06/01 11:44:37 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 8952 ms on localhost (3/10)
16/06/01 11:44:37 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 8958 ms on localhost (4/10)
16/06/01 11:44:37 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, partition 8,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:37 INFO Executor: Running task 8.0 in stage 0.0 (TID 8)
16/06/01 11:44:37 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 162 ms on localhost (5/10)
16/06/01 11:44:37 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, partition 9,PROCESS_LOCAL, 2157 bytes)
16/06/01 11:44:37 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 277 ms on localhost (6/10)
16/06/01 11:44:37 INFO Executor: Running task 9.0 in stage 0.0 (TID 9)
16/06/01 11:44:37 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 314 ms on localhost (7/10)
16/06/01 11:44:37 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 394 ms on localhost (8/10)
16/06/01 11:44:37 INFO Executor: Finished task 8.0 in stage 0.0 (TID 8). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 284 ms on localhost (9/10)
16/06/01 11:44:37 INFO Executor: Finished task 9.0 in stage 0.0 (TID 9). 1031 bytes result sent to driver
16/06/01 11:44:37 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 139 ms on localhost (10/10)
16/06/01 11:44:37 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) finished in 9.477 s
16/06/01 11:44:37 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/06/01 11:44:37 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 12.426088 s
Pi is roughly 3.142784
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/api,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null}
16/06/01 11:44:37 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null}
16/06/01 11:44:38 INFO SparkUI: Stopped Spark web UI at http://192.168.144.26:4040
16/06/01 11:44:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/06/01 11:44:38 INFO MemoryStore: MemoryStore cleared
16/06/01 11:44:38 INFO BlockManager: BlockManager stopped
16/06/01 11:44:38 INFO BlockManagerMaster: BlockManagerMaster stopped
16/06/01 11:44:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/06/01 11:44:38 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/06/01 11:44:38 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/06/01 11:44:38 INFO SparkContext: Successfully stopped SparkContext
16/06/01 11:44:38 INFO ShutdownHookManager: Shutdown hook called
16/06/01 11:44:38 INFO ShutdownHookManager: Deleting directory /tmp/spark-e250b06e-9683-45d4-bab1-0f74970e5a4f
16/06/01 11:44:38 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/06/01 11:44:38 INFO ShutdownHookManager: Deleting directory /tmp/spark-e250b06e-9683-45d4-bab1-0f74970e5a4f/httpd-820fd134-dc09-4f6f-9dc5-4fa476a2c3e1
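Besides run-example, the build can be sanity-checked from the interactive shell. A minimal sketch (the parallelize/count job is my own example, not part of the original post); spark-shell reads from stdin and exits at end of input:

```shell
# Run a one-line job through the Scala REPL, then exit
echo 'sc.parallelize(1 to 1000).count()' | ./bin/spark-shell
```

If the environment is set up correctly, the REPL should report a Long result of 1000 among the log output.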
References:
http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/
https://spark.apache.org/downloads.html
http://www.apache.org/dyn/closer.lua/spark/spark-1.6.1/spark-1.6.1.tgz