More than 3 years have passed since last update.

[Memo] pandas: Finding duplicated values and the values tied to them

What I want to do

I have a table with ID and NAME columns.
Some NAME values appear more than once, and I want to find out which IDs share each duplicated NAME.


>>> import pandas
>>> df = pandas.DataFrame({"ID": ["a", "b", "c", "d", "e", "f", "g", "h"],
...                        "NAME": ["Alice", "Bob", "Chris", "Chris", "Chris", "Dave", "Eve", "Eve"]})

>>> df
  ID   NAME
0  a  Alice
1  b    Bob
2  c  Chris
3  d  Chris
4  e  Chris
5  f   Dave
6  g    Eve
7  h    Eve

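The code I wrote

First, extract the duplicated rows with the pandas.Series.duplicated method. Passing keep=False flags every occurrence of a duplicated value, not only the second and later ones.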

>>> df2 = df[df["NAME"].duplicated(keep=False)]

>>> df2
  ID   NAME
2  c  Chris
3  d  Chris
4  e  Chris
6  g    Eve
7  h    Eve

This shows that Chris and Eve are duplicated.
To make it easier to see which IDs belong to Chris and Eve, I lay the IDs out across columns.
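
As a side note on the filtering step above, here is a minimal sketch of how the keep parameter of Series.duplicated changes which rows are flagged. The variable names (names, first, last, all_dups) are my own and are not part of the original code.

```python
import pandas as pd

names = pd.Series(["Alice", "Bob", "Chris", "Chris", "Chris", "Dave", "Eve", "Eve"])

# keep="first" (the default): the first occurrence of each value is NOT flagged.
first = names.duplicated(keep="first").tolist()

# keep="last": the last occurrence of each value is NOT flagged.
last = names.duplicated(keep="last").tolist()

# keep=False: every member of a duplicated group is flagged.
all_dups = names.duplicated(keep=False).tolist()

print(first)     # [False, False, False, True, True, False, False, True]
print(last)      # [False, False, True, True, False, False, True, False]
print(all_dups)  # [False, False, True, True, True, False, True, True]
```

Only keep=False keeps all rows for Chris and Eve, which is what is needed here to list every ID tied to a duplicated NAME.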

First, number the rows within each NAME using the DataFrameGroupBy.cumcount method.

>>> df2 = df2.assign(group_index=df2.groupby("NAME").cumcount())

>>> df2
  ID   NAME  group_index
2  c  Chris            0
3  d  Chris            1
4  e  Chris            2
6  g    Eve            0
7  h    Eve            1
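
The rows for each NAME happen to be adjacent in this table, but cumcount does not appear to rely on that. The following small sketch, using a hypothetical shuffled frame that I made up for illustration, numbers the rows per group in their order of appearance.

```python
import pandas as pd

# Hypothetical frame in which the rows of each NAME are interleaved.
shuffled = pd.DataFrame({"ID": ["g", "c", "h", "d", "e"],
                         "NAME": ["Eve", "Chris", "Eve", "Chris", "Chris"]})

# cumcount() numbers the rows of each group 0, 1, 2, ... in order of appearance
# and returns a Series aligned to the original index.
group_index = shuffled.groupby("NAME").cumcount()

print(group_index.tolist())  # [0, 0, 1, 1, 2]
```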

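Then, after moving NAME and group_index into the index with set_index, I used the pandas.DataFrame.unstack method to spread the IDs for each NAME across columns.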

>>> df3 = df2.set_index(["NAME","group_index"])

>>> df3
                  ID
NAME  group_index   
Chris 0            c
      1            d
      2            e
Eve   0            g
      1            h

>>> df4 = df3.unstack()

>>> df4
            ID        
group_index  0  1    2
NAME                  
Chris        c  d    e
Eve          g  h  NaN
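
For reuse, the three steps above can be collected into one function. This is only a sketch under my own assumptions: the function name duplicated_ids_wide and its key and value parameters are hypothetical, and it selects the value column before unstacking, so the resulting columns are a flat 0, 1, 2 rather than the two-level header shown above. The last lines show a possible alternative that gathers the IDs into lists instead of columns.

```python
import pandas as pd


def duplicated_ids_wide(df, key="NAME", value="ID"):
    """Return one row per duplicated `key`, with its `value`s spread across columns."""
    # Keep every row whose key occurs more than once.
    dups = df[df[key].duplicated(keep=False)]
    # Number the rows within each key: 0, 1, 2, ...
    dups = dups.assign(group_index=dups.groupby(key).cumcount())
    # Move key and group_index into the index, then turn group_index into columns.
    return dups.set_index([key, "group_index"])[value].unstack()


df = pd.DataFrame({"ID": ["a", "b", "c", "d", "e", "f", "g", "h"],
                   "NAME": ["Alice", "Bob", "Chris", "Chris", "Chris", "Dave", "Eve", "Eve"]})

wide = duplicated_ids_wide(df)
print(wide)

# Alternative: collect the IDs of each duplicated NAME into a list.
id_lists = df[df["NAME"].duplicated(keep=False)].groupby("NAME")["ID"].agg(list)
print(id_lists)
```

The wide form pads shorter groups with NaN, while the list form keeps each group at its natural length, so which one is more convenient depends on how the result will be read.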
