Published: 2019-07-26 16:45:44
Introduction to JDBC
Spark SQL can create a DataFrame by reading data from a relational database over JDBC. After a series of computations on the DataFrame, the results can be written back to the relational database.
Loading data from MySQL (Spark Shell)
1. Start the Spark Shell. The MySQL connector driver jar must be specified.
/home/hadoop/apps/spark/bin/spark-shell \
--master spark://hdp08:7077 \
--jars /home/hadoop/mysql-connector-java-5.1.45.jar \
--driver-class-path /home/hadoop/mysql-connector-java-5.1.45.jar \
--executor-memory 1g \
--total-executor-cores 2
2. Load the data from MySQL
scala> case class Emp(empno: Int, ename: String, job: String, mgr: Int, hiredate: java.util.Date, sal: Float, comm: Float, deptno: Int)
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val jdbcDF = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://hdp08:3306/sqoopdb", "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "emp", "user" -> "root", "password" -> "root")).load()
3. Run a query
scala> jdbcDF.show()
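Beyond show(), the loaded DataFrame can also be queried with SQL by registering it as a temporary table. The following is a minimal sketch against the Spark 1.6 API. It assumes the emp table has the columns listed in the Emp case class (empno, ename, sal, deptno); the object name EmpQueries, the method highEarners, and the salary threshold are hypothetical and only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object EmpQueries {
  // Registers df as the temporary table "emp" and returns the employees
  // whose salary exceeds minSal, highest salary first.
  // Assumption: df has the columns empno, ename, sal and deptno.
  def highEarners(sqlContext: SQLContext, df: DataFrame, minSal: Double): DataFrame = {
    df.registerTempTable("emp")
    sqlContext.sql(
      s"SELECT empno, ename, sal, deptno FROM emp WHERE sal > $minSal ORDER BY sal DESC")
  }
}

// In the Spark Shell session above:
// EmpQueries.highEarners(sqlContext, jdbcDF, 2000).show()
```

The temporary table exists only inside the current SQLContext; nothing is created in MySQL.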
Writing data to MySQL (packaged jar)
This section uses IntelliJ IDEA and a Maven project to develop a Spark program that connects to MySQL.
Dependencies in the Maven pom.xml file
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
Writing the Spark SQL program
package net.togogo.sql

import java.util.Properties

import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.{SparkConf, SparkContext}

object JdbcRDD {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MySQL-Demo")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Create an RDD by parallelizing a local collection
    val personRDD = sc.parallelize(Array("1 tom 5", "2 jerry 3", "3 kitty 6")).map(_.split(" "))

    // Specify the schema of each field directly with StructType
    val schema = StructType(
      List(
        StructField("id", IntegerType, true),
        StructField("name", StringType, true),
        StructField("age", IntegerType, true)
      )
    )

    // Map the RDD to an RDD of Row
    val rowRDD = personRDD.map(p => Row(p(0).toInt, p(1).trim, p(2).toInt))

    // Apply the schema to the row RDD
    val personDataFrame = sqlContext.createDataFrame(rowRDD, schema)

    // Create a Properties object holding the database connection settings
    val prop = new Properties()
    prop.put("user", "root")
    prop.put("password", "root")

    // Append the data to the database table
    personDataFrame.write.mode("append").jdbc("jdbc:mysql://hdp08:3306/sqoopdb", "sqoopdb.person", prop)

    // Stop the SparkContext
    sc.stop()
  }
}
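The program above assumes that every input line has exactly three space-separated fields and that id and age are numeric; a malformed line would make toInt throw and fail the job. A sketch of a more defensive parsing step could look like the following. The names Person and PersonParser are hypothetical and do not appear in the original program.

```scala
import scala.util.Try

// Hypothetical record type for one input line of the form "id name age".
case class Person(id: Int, name: String, age: Int)

object PersonParser {
  // Returns Some(Person) for a well-formed line and None for anything else
  // (wrong number of fields, or a non-numeric id or age).
  def parse(line: String): Option[Person] =
    line.trim.split("\\s+") match {
      case Array(id, name, age) => Try(Person(id.toInt, name, age.toInt)).toOption
      case _                    => None
    }
}

// Possible use inside the job, keeping only the lines that parse:
// val rowRDD = sc.parallelize(Array("1 tom 5", "2 jerry 3", "3 kitty 6"))
//   .flatMap(line => PersonParser.parse(line))
//   .map(p => Row(p.id, p.name, p.age))
```

Dropping bad lines silently is only one possible policy; counting or logging them may be preferable for real data.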
Packaging and running
1. Package the program with Maven
2. Submit the jar to the Spark cluster
/home/hadoop/apps/spark/bin/spark-submit \
--class net.togogo.sql.JdbcRDD \
--master spark://hdp08:7077 \
--jars /home/hadoop/mysql-connector-java-5.1.45.jar \
--driver-class-path /home/hadoop/mysql-connector-java-5.1.45.jar \
/home/hadoop/schema.jar
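Once the job has finished, one way to check that the rows were appended is to read the table back over JDBC from a Spark Shell started with the MySQL driver jar, as in the first part. This is a sketch only: the object name VerifyPerson and its methods are hypothetical helpers, and the URL, table name, and credentials in the usage comment are the ones used by the program above.

```scala
import java.util.Properties

import org.apache.spark.sql.{DataFrame, SQLContext}

object VerifyPerson {
  // Builds the connection properties expected by the JDBC reader.
  def connectionProps(user: String, password: String): Properties = {
    val props = new Properties()
    props.put("user", user)
    props.put("password", password)
    props
  }

  // Reads the given table back as a DataFrame through DataFrameReader.jdbc.
  def readTable(sqlContext: SQLContext, url: String, table: String, props: Properties): DataFrame =
    sqlContext.read.jdbc(url, table, props)
}

// In the Spark Shell:
// val props = VerifyPerson.connectionProps("root", "root")
// VerifyPerson.readTable(sqlContext, "jdbc:mysql://hdp08:3306/sqoopdb", "sqoopdb.person", props).show()
```

Because the job writes in append mode, running it more than once adds the same three rows again, so the row count seen here grows with each submission.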