
Only the Scala version is covered here.


If you also need the Java, Python, or R versions, see the reference link at the bottom of this post.


SparkConf (Spark 1.6 / Spark 2.x)

You will continue to use these classes (via the sparkContext accessor of SparkSession) to perform operations that require the Spark Core API, such as working with accumulators, broadcast variables, or low-level RDDs. However, you will not need to create them manually.

// Spark 1.6
import org.apache.spark.SparkConf

val sparkConf = new SparkConf().setMaster("local[*]")
sparkConf.set("spark.files", "file.txt")
 
// Spark 2.x
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
spark.conf.set("spark.files", "file.txt")
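
To make the point about the sparkContext accessor concrete, here is a minimal sketch (the names and data are made up) that reaches the underlying SparkContext from a SparkSession and uses it for Spark Core API work such as a broadcast variable and a low-level RDD.

import org.apache.spark.sql.SparkSession

// Assumed local session, purely for illustration
val spark = SparkSession.builder.master("local[*]").getOrCreate()

// The SparkContext is created for you; reach it through the accessor
val sc = spark.sparkContext

// Spark Core API operations still go through SparkContext
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))        // broadcast variable
val rdd = sc.parallelize(Seq("a", "b", "b"))              // low-level RDD
val counts = rdd.map(k => (k, lookup.value.getOrElse(k, 0))).collect()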


SQLContext (Spark 1.6 / Spark 2.x)

The SQLContext is completely superseded by SparkSession. Most Dataset and DataFrame operations are directly available in SparkSession. Operations related to table and database metadata are now encapsulated in a Catalog (via the catalog accessor).

// Spark 1.6
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.json("data.json")
val tables = sqlContext.tables()
 
// Spark 2.x
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
val df = spark.read.json("data.json")
val tables = spark.catalog.listTables()
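
For a fuller view of the Catalog accessor mentioned above, the following sketch (the file and view names are hypothetical) registers a temporary view and then inspects it through spark.catalog.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Hypothetical input file; any DataFrame can be registered the same way
val df = spark.read.json("data.json")
df.createOrReplaceTempView("people")

// Table and database metadata now live on the Catalog
spark.catalog.listTables().show()             // includes the "people" temp view
spark.catalog.listColumns("people").show()    // column-level metadata
println(spark.catalog.currentDatabase)        // current database name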


HiveContext (Spark 1.6 / Spark 2.x)

The HiveContext is completely superseded by SparkSession. You will need to enable Hive support when you create your SparkSession and include the necessary Hive library dependencies in your classpath.

// Spark 1.6
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)
val df = hiveContext.sql("SELECT * FROM hiveTable")
 
// Spark 2.x
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.enableHiveSupport().getOrCreate()
val df = spark.sql("SELECT * FROM hiveTable")
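
As a hedged follow-up, the sketch below (the table name and warehouse path are assumptions) shows that a Hive-enabled SparkSession can also take Hive-related configuration through the builder and persist a DataFrame as a Hive table.

import org.apache.spark.sql.SparkSession

// The warehouse directory is an assumed value for this example
val spark = SparkSession.builder
  .enableHiveSupport()
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
  .getOrCreate()

val df = spark.sql("SELECT * FROM hiveTable")             // query via HiveQL
df.write.mode("overwrite").saveAsTable("hiveTableCopy")   // persist as a Hive table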


ref : https://sparkour.urizone.net/recipes/understanding-sparksession/
