More than 5 years have passed since last update.

Computing the difference of two lists in Python while preserving order

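#When is this useful?
The usual way to compute the difference of two lists is to convert them to the set type.
However, a set is unordered, so converting a list to a set discards the original order (small integers often come back looking sorted in ascending order, but that is not guaranteed), and duplicates are removed as well.
If the order of the list carries meaning, for example IDs sorted from highest score to lowest, that information is lost.
The techniques in this article are meant for that kind of data.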
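
As a small illustration of what is lost, the sketch below runs the set-based difference on a list that contains a duplicate. The variable names and values are made up for this example.

```python
# Hypothetical data: IDs ordered from highest score to lowest, with 3 appearing twice.
scored_ids = [5, 3, 4, 3, 1]
excluded_ids = [4]

set_diff = set(scored_ids) - set(excluded_ids)

# Removing only the 4 should leave four elements, but the duplicate 3 has collapsed.
print(len(scored_ids) - 1)  # 4
print(len(set_diff))        # 3

# A set has no positions, so the score ranking cannot be read back from it.
print(set_diff == {1, 3, 5})  # True
```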

#The idea

Example
list1 = [5, 4, 3, 2, 1]
list2 = [2, 4]

result = [5, 3, 1]

Conceptually this is list1 - list2, although Python lists do not actually support the - operator.

#Removing every occurrence of list2's values from list1
##Python version

sample.py
list1 = [5, 4, 3, 4, 2, 1]
list2 = [2, 4]

result = [i for i in list1 if i not in list2]
print(result)
[5, 3, 1]
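
The expression `i not in list2` scans list2 once for every element of list1, which can become slow when both lists are large. Below is a minimal sketch of a variant that looks values up in a set instead, assuming every element is hashable. The function name `ordered_diff` is made up for this example.

```python
def ordered_diff(values, to_remove):
    """Return the elements of values that are not in to_remove, in their original order."""
    remove_set = set(to_remove)  # assumption: all elements are hashable
    return [v for v in values if v not in remove_set]


list1 = [5, 4, 3, 4, 2, 1]
list2 = [2, 4]
print(ordered_diff(list1, list2))  # [5, 3, 1]
```

The set is used only for membership tests, so the order of the result still comes from list1.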

##PySpark version

sample.py
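import pyspark.sql.functions as F
from pyspark.sql.types import ArrayType, IntegerType

@F.udf(returnType=ArrayType(IntegerType()))
def udf_list_diff(l1, l2):
    # Keep every element of l1 that does not appear in l2, preserving order
    return [i for i in l1 if i not in l2]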

#Removing only the first occurrence of each list2 value from list1
##Python version

sample.py
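def list_diff(l1, l2):
    result = list(l1)  # work on a copy so the caller's list is not modified
    for i in l2:
        if i in result:  # list.remove raises ValueError if the value is absent
            result.remove(i)  # removes only the first occurrence
    return result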

list1 = [5, 4, 3, 4, 2, 1]
list2 = [2, 4]

result = list_diff(list1, list2)
print(result)
[5, 3, 4, 1]
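
Calling `remove` once per value rescans the list each time. As an alternative, here is a single-pass sketch that counts how many copies of each value still need to be dropped, using `collections.Counter`, and builds a new list. It assumes the elements are hashable, and the function name `remove_first_occurrences` is made up for this example.

```python
from collections import Counter


def remove_first_occurrences(values, to_remove):
    """Drop one occurrence from values for each item in to_remove, keeping order."""
    pending = Counter(to_remove)  # how many copies of each value are still to be dropped
    result = []
    for v in values:
        if pending[v] > 0:
            pending[v] -= 1  # skip this occurrence
        else:
            result.append(v)
    return result


list1 = [5, 4, 3, 4, 2, 1]
list2 = [2, 4]
print(remove_first_occurrences(list1, list2))  # [5, 3, 4, 1]
```

If list2 contains the same value twice, two occurrences are dropped from list1, so the result behaves like an order-preserving multiset difference.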

##PySpark version

sample.py
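import pyspark.sql.functions as F
from pyspark.sql.types import ArrayType, IntegerType

@F.udf(returnType=ArrayType(IntegerType()))
def udf_list_diff(l1, l2):
    result = list(l1)  # copy so the input array is left as-is
    for i in l2:
        if i in result:  # skip values that are not present
            result.remove(i)  # removes only the first occurrence
    return result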

#Appendix
##When the order does not need to be preserved

sample.py
list1 = [5, 4, 3, 4, 2, 1]
list2 = [2, 4]

result = list(set(list1) - set(list2))  # the element order of a set is not guaranteed
print(result)
[1, 3, 5]