Using Apache Spark: Technical Preview with HDP 2.2 - Hortonworks
The Spark Technical Preview lets you evaluate Apache Spark 1.2.0 on YARN with HDP 2.2. With YARN, Hadoop can now support many types of workloads; Spark on YARN becomes yet another workload running against the same set of hardware resources. This technical preview describes how to:
- Run Spark on YARN and run the canonical Spark examples, SparkPi and WordCount.
- Run Spark 1.2 on HDP 2.2.
- Work with a built-in UDF, collect_list, a key feature of Hive 13. This technical preview provides support for Hive 0.13.1 and instructions on how to call this UDF from the Spark shell (a minimal spark-shell sketch follows this entry).
- Use the Spark SQL Thrift JDBC/ODBC server.
- View the history of finished jobs with the Spark Job History server.
- Use ORC files with Spark, with examples.
- Run SparkPi with Tez as the execution engine.
When you are ready to go beyond these tasks, try the machine learning examples at Apache Spark. HDP Sandbox requirements include the hosts entry 127.0.0.1 localhost sandbox.hortonworks.com. Install the Technical Preview: the Spark 1.2.0 Technical Preview is provided as a single tarball.
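For the collect_list step, a minimal spark-shell sketch along these lines should work, assuming Hive support is configured in the preview and that a Hive table named orders with columns customer_id and item exists (the table and its columns are hypothetical placeholders, not part of the preview):

    // Inside spark-shell (Spark 1.2), where sc is the preconfigured SparkContext.
    import org.apache.spark.sql.hive.HiveContext

    val hiveCtx = new HiveContext(sc)

    // collect_list is a Hive 0.13 built-in UDAF; it gathers each group's values into an array.
    val grouped = hiveCtx.sql(
      "SELECT customer_id, collect_list(item) AS items FROM orders GROUP BY customer_id")

    grouped.collect().foreach(println)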
Announcing Spark Packages | Databricks Blog
Today, we are happy to announce Spark Packages (http://spark-packages.org), a community package index to track the growing number of open source packages and libraries that work with Apache Spark. Spark Packages makes it easy for users to find, discuss, rate, and install packages for any version of Spark, and makes it easy for developers to contribute packages. Spark Packages will feature integrations with various data sources, management tools, higher-level domain-specific libraries, machine learning algorithms, code samples, and other Spark content. Please give Spark Packages a try and let us know if you have any questions when working with the site!
Spark Workshop - Typesafe Activator | Typesafe
Apache Spark Workshop. Dean Wampler, Ph.D., Typesafe, dean.wampler@typesafe.com, @deanwampler. This workshop demonstrates how to write and run Apache Spark Big Data applications. If you are most interested in using Spark with Hadoop, the Hadoop vendors have preconfigured virtual machine "sandboxes" with Spark included. For more advanced Spark training and services from Typesafe, please visit typesafe.com/reactive-big-data.
Setup Instructions: You can work through the examples and exercises on a local workstation, in so-called local mode. Let's discuss setup for local mode first.
Setup for Local Mode: Working in local mode makes it easy to edit, test, run, and debug applications quickly (a minimal local-mode sketch follows this entry). We will build and run the examples and exercises using Typesafe Activator, which includes web-based and command-line interfaces. Activator is part of the Typesafe Reactive Platform. Activator also includes SBT, which the UI uses under the hood. You'll need either Activator or SBT installed.
Setup for Hadoop Mode:
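As a sketch of the kind of local-mode application the workshop builds and runs through Activator/SBT, assuming Spark 1.2.0 is on the classpath; the object name LocalWordCount and the input path data/sample.txt are placeholders, not workshop files:

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalWordCount {
      def main(args: Array[String]): Unit = {
        // local[*] runs Spark in-process using all available cores, so the job
        // can be edited, run, and debugged without a cluster.
        val conf = new SparkConf().setAppName("LocalWordCount").setMaster("local[*]")
        val sc = new SparkContext(conf)

        val counts = sc.textFile("data/sample.txt")   // placeholder input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)
        sc.stop()
      }
    }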
Related repositories:
- mikeaddison93/sbt-spark-package · GitHub: sbt plugin for Spark packages
- deanwampler/spark-workshop · GitHub
- mikeaddison93/spark-package-cmd-tool · GitHub: a command line tool for Spark packages
- snowplow/spark-example-project · GitHub
SparkR by amplab-extras. SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. SparkR exposes the Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster. NOTE: As of April 2015, SparkR has been officially merged into Apache Spark and is shipping in an upcoming release (1.4) due early summer 2015. You can contribute to and follow SparkR development on the Apache Spark mailing lists and issue tracker. NOTE: The upcoming Spark release (1.4) will not have the same API as described here. Initial support for Spark in R will be focused on high-level operations instead of low-level ETL.
Features: SparkR exposes the RDD API of Spark as distributed lists in R. For example:
    sc <- sparkR.init("local")
    lines <- textFile(sc, "data.txt")   # input path (placeholder)
    wordsPerLine <- lapply(lines, function(line) { length(unlist(strsplit(line, " "))) })
In addition to lapply, SparkR also allows closures to be applied on every partition using lapplyWithPartition (see the sketch after this entry).
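For readers coming from Spark's Scala API, mapPartitions is the rough analogue of lapplyWithPartition: the closure receives an iterator over one whole partition rather than a single element. A minimal sketch, assuming a spark-shell session where sc is the SparkContext and data.txt is a placeholder path:

    // Count words per partition: each closure invocation sees one entire partition.
    val wordsPerPartition = sc.textFile("data.txt")
      .mapPartitions(iter => Iterator(iter.map(_.split(" ").length).sum))

    wordsPerPartition.collect().foreach(println)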
freeman-lab/thunder · GitHub
Spark Summit 2014 Training Archive | Spark Summit
Databricks Spark training was offered as part of a 3-day pass to the Spark Summit and contained an introductory and an advanced track. Both tracks began at 9am on July 2, 2014 and finished by 5pm. Lunch was included. You can download the course materials HERE.
Course Prerequisites:
- Laptop with WiFi capabilities
- Java 6 or 7
TRACK A: Introduction to Apache Spark Workshop. The Introduction to Apache Spark workshop is for users to learn the core Spark APIs. The integrated lecture and lab format covers the following topics:
- Overview of Big Data and Spark
- Installing Spark Locally
- Using Spark's Core APIs in Scala, Java, & Python
- Building Spark Applications
- Deploying on a Big Data Cluster
- Building Applications for Multiple Platforms
TRACK B: Advanced Apache Spark Workshop. The Advanced Apache Spark Workshop will cover advanced topics on architecture, tuning, and each of Spark's high-level libraries (including the latest features). Topics covered include: