More than 3 years have passed since last update.

Scala Spark study notes ③ ~Joining DataFrames~


Joining DataFrames

dataDF.join(dataDFother, dataDF("name") === dataDFother("name"), "inner").show()
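To make the semantics concrete without a cluster, here is a minimal plain-Scala model of the inner join above. It is a sketch, not Spark's implementation: it assumes the rows fit in ordinary `Seq`s, uses a hypothetical `Person` case class in place of a DataFrame row, and reuses the sample data from the `dataDF` and `dataDFother` tables in this article. It can be pasted into the Scala REPL or run as a script.

```scala
// Minimal plain-Scala model of an inner join on "name" (a sketch, not Spark's implementation).
// Person is a hypothetical stand-in for a DataFrame row.
case class Person(name: String, age: Int, birthday: String)

// Same sample rows as dataDF (left) and dataDFother (right)
val left = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-16"),
  Person("Jules", 30, "1991-05-25"),
  Person("TD", 35, "1986-07-26")
)
val right = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-17"),
  Person("Jon", 30, "1991-07-25"),
  Person("AK", 35, "1986-07-29")
)

// Every (left, right) pair whose names are equal becomes one output row;
// keeping both sides mirrors the six-column result Spark prints for a condition-based join
val innerJoined: Seq[(Person, Person)] =
  for {
    l <- left
    r <- right
    if l.name == r.name
  } yield (l, r)

innerJoined.foreach { case (l, r) => println(s"$l | $r") }
```

Brooke, Bake and Denny match. Jules and TD have no partner on the right, and Jon and AK have none on the left, so all four are dropped.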

Besides the common join types inner, leftouter, rightouter and fullouter, there are also leftsemi and leftanti.
Here is what they do:

scala> dataDF.show()
+------+---+----------+
|  name|age|  birthday|
+------+---+----------+
|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|
| Denny| 31|1990-08-16|
| Jules| 30|1991-05-25|
|    TD| 35|1986-07-26|
+------+---+----------+

scala> dataDFother.show()
+------+---+----------+
|  name|age|  birthday|
+------+---+----------+
|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|
| Denny| 31|1990-08-17|
|   Jon| 30|1991-07-25|
|    AK| 35|1986-07-29|
+------+---+----------+

scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name"), "leftsemi").show()
+------+---+----------+
|  name|age|  birthday|
+------+---+----------+
|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|
| Denny| 31|1990-08-16|
+------+---+----------+
# leftsemi returns only the left-side columns of rows whose key exists on both sides
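A plain-Scala model makes the leftsemi rule explicit: a left row survives if at least one right row has the same `name`, and only the left row's own columns are kept. This is a sketch under stated assumptions (a hypothetical `Person` case class, in-memory `Seq`s instead of DataFrames, the article's sample data), not Spark's implementation.

```scala
// Hypothetical stand-in for a DataFrame row
case class Person(name: String, age: Int, birthday: String)

val left = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-16"),
  Person("Jules", 30, "1991-05-25"),
  Person("TD", 35, "1986-07-26")
)
val right = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-17"),
  Person("Jon", 30, "1991-07-25"),
  Person("AK", 35, "1986-07-29")
)

// leftsemi: keep a left row if some right row shares its key;
// the right row's columns are never added to the output
val leftSemi: Seq[Person] =
  left.filter(l => right.exists(r => r.name == l.name))

leftSemi.foreach(println)
```

Note that Denny keeps the left-side birthday (1990-08-16), exactly as in the Spark output.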

scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name"), "leftanti").show()
+-----+---+----------+
| name|age|  birthday|
+-----+---+----------+
|Jules| 30|1991-05-25|
|   TD| 35|1986-07-26|
+-----+---+----------+
# leftanti returns the left-side rows whose key has no match on the right
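leftanti is the complementary filter: keep a left row only when no right row shares its key. A minimal sketch, assuming a hypothetical `Person` case class and in-memory `Seq`s holding the article's sample data (not Spark's implementation):

```scala
// Hypothetical stand-in for a DataFrame row
case class Person(name: String, age: Int, birthday: String)

val left = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-16"),
  Person("Jules", 30, "1991-05-25"),
  Person("TD", 35, "1986-07-26")
)
val right = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-17"),
  Person("Jon", 30, "1991-07-25"),
  Person("AK", 35, "1986-07-29")
)

// leftanti: drop every left row that has at least one match on the right
val leftAnti: Seq[Person] =
  left.filterNot(l => right.exists(r => r.name == l.name))

leftAnti.foreach(println)
```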

As this shows, leftsemi and leftanti behave more like filtering (extracting rows from the left side) than like joining.
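One concrete way to see the "filter rather than join" point: an inner join emits one row per matching pair, so a key that appears twice on the right duplicates the left row, while a semi join returns each matching left row only once (Spark's leftsemi behaves this way too). The sketch below uses hypothetical name lists, where the duplicated "Brooke" is made up for illustration, and plain `Seq` operations rather than Spark.

```scala
// Hypothetical keys: "Brooke" appears twice on the right side
val leftNames  = Seq("Brooke", "Jules")
val rightNames = Seq("Brooke", "Brooke")

// Inner join: one row per matching (left, right) pair, so Brooke comes out twice
val inner = for { l <- leftNames; r <- rightNames; if l == r } yield (l, r)

// Semi join: a filter on the left side, so Brooke comes out once
val semi = leftNames.filter(l => rightNames.contains(l))

// Anti join: the complementary filter
val anti = leftNames.filterNot(l => rightNames.contains(l))

println(inner)
println(semi)
println(anti)
```

In this model every left row lands in exactly one of `semi` or `anti`, which is another way of saying the two are a split of the left side rather than a combination of both sides.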

scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name") and dataDF("birthday") === dataDFother("birthday"), "inner").show()
+------+---+----------+------+---+----------+
|  name|age|  birthday|  name|age|  birthday|
+------+---+----------+------+---+----------+
|Brooke| 20|2001-06-19|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|  Bake| 25|1996-07-25|
+------+---+----------+------+---+----------+
# To join on multiple keys, combine the conditions with and
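Joining on `name` and `birthday` together amounts to joining on a composite key. A minimal sketch, assuming plain in-memory `Seq`s and a hypothetical `Person` case class: it indexes the right side by the `(name, birthday)` tuple and looks each left row up, in the style of a hash join. Spark chooses its own physical join strategy; this only models which rows match.

```scala
// Hypothetical stand-in for a DataFrame row
case class Person(name: String, age: Int, birthday: String)

val left = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-16"),
  Person("Jules", 30, "1991-05-25"),
  Person("TD", 35, "1986-07-26")
)
val right = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-17"),
  Person("Jon", 30, "1991-07-25"),
  Person("AK", 35, "1986-07-29")
)

// Index the right side by the composite key (name, birthday)
val rightByKey: Map[(String, String), Seq[Person]] =
  right.groupBy(r => (r.name, r.birthday))

// Look up each left row; rows without a match contribute nothing, as in an inner join
val byNameAndBirthday: Seq[(Person, Person)] =
  left.flatMap { l =>
    rightByKey.getOrElse((l.name, l.birthday), Seq.empty[Person]).map(r => (l, r))
  }

byNameAndBirthday.foreach { case (l, r) => println(s"$l | $r") }
```

Denny drops out because the birthdays differ (1990-08-16 vs 1990-08-17), matching the Spark result.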

scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name") or dataDF("birthday") === dataDFother("birthday"), "inner").show()
+------+---+----------+------+---+----------+
|  name|age|  birthday|  name|age|  birthday|
+------+---+----------+------+---+----------+
|Brooke| 20|2001-06-19|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|  Bake| 25|1996-07-25|
| Denny| 31|1990-08-16| Denny| 31|1990-08-17|
+------+---+----------+------+---+----------+
# or works as well
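With `or`, the condition can no longer be reduced to a single composite key, so the simplest model checks every (left, right) pair. A sketch under stated assumptions (a hypothetical `Person` case class, in-memory `Seq`s with the article's sample data), not Spark's implementation:

```scala
// Hypothetical stand-in for a DataFrame row
case class Person(name: String, age: Int, birthday: String)

val left = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-16"),
  Person("Jules", 30, "1991-05-25"),
  Person("TD", 35, "1986-07-26")
)
val right = Seq(
  Person("Brooke", 20, "2001-06-19"),
  Person("Bake", 25, "1996-07-25"),
  Person("Denny", 31, "1990-08-17"),
  Person("Jon", 30, "1991-07-25"),
  Person("AK", 35, "1986-07-29")
)

// Nested loop over all pairs: a pair matches if either column is equal
val byNameOrBirthday: Seq[(Person, Person)] =
  for {
    l <- left
    r <- right
    if l.name == r.name || l.birthday == r.birthday
  } yield (l, r)

byNameOrBirthday.foreach { case (l, r) => println(s"$l | $r") }
```

Denny matches on `name` alone, which is why that output row carries two different birthdays.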

scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name") & dataDF("birthday") === dataDFother("birthday"), "inner").show()
<console>:35: error: value & is not a member of org.apache.spark.sql.Column
       dataDF.join(dataDFother, dataDF("name") === dataDFother("name") & dataDF("birthday") === dataDFother("birthday"), "inner").show()
# A single & is not a Column operator; use && (or and) instead
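The error comes from Spark's `Column` API rather than from symbols in general: `Column` defines `&&` and `||` (alongside `and` and `or`) but no single `&`, so `dataDF("name") === dataDFother("name") && dataDF("birthday") === dataDFother("birthday")` should work where `&` fails. Scala's operator precedence also matters: an operator starting with `=` (such as `===`) binds tighter than one starting with `&`, and alphanumeric operators such as `and` bind loosest, so both forms group as `(a === b) && (c === d)`. The sketch below checks this with a hypothetical `Col` class that only builds a string; it is not Spark's `Column`.

```scala
// Hypothetical stand-in for Spark's Column: it only records the expression as a string
final case class Col(expr: String) {
  def ===(other: Col): Col = Col(s"($expr = ${other.expr})")
  def &&(other: Col): Col  = Col(s"($expr AND ${other.expr})")
  def and(other: Col): Col = this && other
  // Deliberately no `&` method: `a === b & c === d` would fail to compile
  // with "value & is not a member of Col", like the error in the transcript
}

val a = Col("l.name")
val b = Col("r.name")
val c = Col("l.birthday")
val d = Col("r.birthday")

// '=' binds tighter than '&', so this parses as (a === b) && (c === d)
val symbolic = a === b && c === d

// `and` is alphanumeric (lowest precedence), so it groups the same way
val named = a === b and c === d

println(symbolic.expr) // prints ((l.name = r.name) AND (l.birthday = r.birthday))
```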