More than 3 years have passed since last update.

Scala Spark study notes (3): joining DataFrames


dataDF.join(dataDFother, dataDF("name") === dataDFother("name"), "inner").show()

Besides the usual inner, leftouter, rightouter, and fullouter, the available join types also include leftsemi and leftanti.
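To compare the join types side by side, here is a minimal sketch that rebuilds the two tables used in this note and counts the rows each join type returns. The object name `JoinTypesSketch`, the local `master("local[*]")` session, and storing `birthday` as a string are assumptions made for illustration; the original DataFrames may have been created differently.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinTypesSketch {
  // Local session for experimentation (assumption: spark-sql is on the classpath).
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("join-types-sketch")
    .getOrCreate()

  // Rebuilds the two tables shown in this note; birthday is kept as a string here.
  def sampleData(): (DataFrame, DataFrame) = {
    import spark.implicits._
    val dataDF = Seq(
      ("Brooke", 20, "2001-06-19"),
      ("Bake", 25, "1996-07-25"),
      ("Denny", 31, "1990-08-16"),
      ("Jules", 30, "1991-05-25"),
      ("TD", 35, "1986-07-26")
    ).toDF("name", "age", "birthday")
    val dataDFother = Seq(
      ("Brooke", 20, "2001-06-19"),
      ("Bake", 25, "1996-07-25"),
      ("Denny", 31, "1990-08-17"),
      ("Jon", 30, "1991-07-25"),
      ("AK", 35, "1986-07-29")
    ).toDF("name", "age", "birthday")
    (dataDF, dataDFother)
  }

  // Runs the same name-based join once per join type and returns the row counts.
  def rowCounts(): Seq[(String, Long)] = {
    val (dataDF, dataDFother) = sampleData()
    Seq("inner", "leftouter", "rightouter", "fullouter", "leftsemi", "leftanti").map { joinType =>
      joinType -> dataDF.join(dataDFother, dataDF("name") === dataDFother("name"), joinType).count()
    }
  }

  def main(args: Array[String]): Unit = {
    rowCounts().foreach { case (joinType, n) => println(s"$joinType: $n rows") }
    spark.stop()
  }
}
```

With this data, three names (Brooke, Bake, Denny) exist on both sides, so inner and leftsemi return 3 rows, leftouter and rightouter return 5, fullouter returns 7, and leftanti returns 2.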

scala> dataDF.show()
|  name|age|  birthday|
|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|
| Denny| 31|1990-08-16|
| Jules| 30|1991-05-25|
|    TD| 35|1986-07-26|

scala> dataDFother.show()
|  name|age|  birthday|
|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|
| Denny| 31|1990-08-17|
|   Jon| 30|1991-07-25|
|    AK| 35|1986-07-29|

scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name"), "leftsemi").show()
|  name|age|  birthday|
|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|
| Denny| 31|1990-08-16|
# leftsemi returns only the left-side columns of rows whose key also exists on the right side

scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name"), "leftanti").show()
| name|age|  birthday|
|Jules| 30|1991-05-25|
|   TD| 35|1986-07-26|
# leftanti returns the left-side rows whose key has no match on the right side
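One way to read the two results above is that leftsemi and leftanti split the left table into "has a match" and "has no match". The sketch below checks that on the `name` column alone. `SemiAntiSketch` and `splitLeftNames` are hypothetical names, and the local session is an assumption for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SemiAntiSketch {
  // Local session for experimentation (assumption: spark-sql is on the classpath).
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("semi-anti-sketch")
    .getOrCreate()

  // Returns (names kept by leftsemi, names kept by leftanti), each sorted.
  def splitLeftNames(): (Seq[String], Seq[String]) = {
    import spark.implicits._
    val left = Seq("Brooke", "Bake", "Denny", "Jules", "TD").toDF("name")
    val right = Seq("Brooke", "Bake", "Denny", "Jon", "AK").toDF("name")
    val cond = left("name") === right("name")

    // Both join types output only the left table's columns, so .as[String] works here.
    val semi = left.join(right, cond, "leftsemi").as[String].collect().sorted.toSeq
    val anti = left.join(right, cond, "leftanti").as[String].collect().sorted.toSeq
    (semi, anti)
  }

  def main(args: Array[String]): Unit = {
    val (semi, anti) = splitLeftNames()
    println(s"leftsemi: ${semi.mkString(", ")}")
    println(s"leftanti: ${anti.mkString(", ")}")
    spark.stop()
  }
}
```

Every left-side name lands in exactly one of the two results, which is why the two outputs above add up to the five rows of `dataDF`.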


scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name") and dataDF("birthday") === dataDFother("birthday"), "inner").show()
|  name|age|  birthday|  name|age|  birthday|
|Brooke| 20|2001-06-19|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|  Bake| 25|1996-07-25|
# To join on multiple keys, chain the conditions with and
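When the key columns have the same names on both sides, an alternative to chaining conditions is the `join` overload that takes a `Seq` of column names. The sketch below is an illustration rather than the approach used in this note; `UsingColumnsSketch` is a hypothetical name and the local session is an assumption.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UsingColumnsSketch {
  // Local session for experimentation (assumption: spark-sql is on the classpath).
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("using-columns-sketch")
    .getOrCreate()

  // Inner join on name and birthday, passing the key names as a Seq.
  def joinOnNameAndBirthday(): DataFrame = {
    import spark.implicits._
    val dataDF = Seq(
      ("Brooke", 20, "2001-06-19"),
      ("Bake", 25, "1996-07-25"),
      ("Denny", 31, "1990-08-16"),
      ("Jules", 30, "1991-05-25"),
      ("TD", 35, "1986-07-26")
    ).toDF("name", "age", "birthday")
    val dataDFother = Seq(
      ("Brooke", 20, "2001-06-19"),
      ("Bake", 25, "1996-07-25"),
      ("Denny", 31, "1990-08-17"),
      ("Jon", 30, "1991-07-25"),
      ("AK", 35, "1986-07-29")
    ).toDF("name", "age", "birthday")

    // Rows match only when both name and birthday are equal.
    dataDF.join(dataDFother, Seq("name", "birthday"), "inner")
  }

  def main(args: Array[String]): Unit = {
    joinOnNameAndBirthday().show()
    spark.stop()
  }
}
```

Unlike the condition-based join above, this form outputs each key column once. The non-key column `age` still appears once per side, so it may need renaming before it is referenced by name.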

scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name") or dataDF("birthday") === dataDFother("birthday"), "inner").show()
|  name|age|  birthday|  name|age|  birthday|
|Brooke| 20|2001-06-19|Brooke| 20|2001-06-19|
|  Bake| 25|1996-07-25|  Bake| 25|1996-07-25|
| Denny| 31|1990-08-16| Denny| 31|1990-08-17|

scala> dataDF.join(dataDFother, dataDF("name") === dataDFother("name") & dataDF("birthday") === dataDFother("birthday"), "inner").show()
<console>:35: error: value & is not a member of org.apache.spark.sql.Column
       dataDF.join(dataDFother, dataDF("name") === dataDFother("name") & dataDF("birthday") === dataDFother("birthday"), "inner").show()
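The error above arises because `Column` does not define a single `&`. It does define `&&` and `||`, which behave like `and` and `or`. The sketch below repeats the two joins with the symbolic operators; `SymbolicOperatorsSketch` is a hypothetical name, and the local session and string-typed `birthday` are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SymbolicOperatorsSketch {
  // Local session for experimentation (assumption: spark-sql is on the classpath).
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("symbolic-operators-sketch")
    .getOrCreate()

  // Returns (row count of the && join, row count of the || join).
  def counts(): (Long, Long) = {
    import spark.implicits._
    val dataDF = Seq(
      ("Brooke", "2001-06-19"),
      ("Bake", "1996-07-25"),
      ("Denny", "1990-08-16"),
      ("Jules", "1991-05-25"),
      ("TD", "1986-07-26")
    ).toDF("name", "birthday")
    val dataDFother = Seq(
      ("Brooke", "2001-06-19"),
      ("Bake", "1996-07-25"),
      ("Denny", "1990-08-17"),
      ("Jon", "1991-07-25"),
      ("AK", "1986-07-29")
    ).toDF("name", "birthday")

    val sameName = dataDF("name") === dataDFother("name")
    val sameBirthday = dataDF("birthday") === dataDFother("birthday")

    // && is the symbolic form of and; || is the symbolic form of or.
    val both = dataDF.join(dataDFother, sameName && sameBirthday, "inner").count()
    val either = dataDF.join(dataDFother, sameName || sameBirthday, "inner").count()
    (both, either)
  }

  def main(args: Array[String]): Unit = {
    val (both, either) = counts()
    println(s"&&: $both rows, ||: $either rows")
    spark.stop()
  }
}
```

With this data the `&&` join returns 2 rows and the `||` join returns 3, matching the `and` and `or` results above. Naming each condition first, as done here, also avoids having to reason about operator precedence when conditions are mixed.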
