`pandas.read_csv`: when the CSV has more columns than the `names` argument specifies, column names are assigned from the right

Environment

  • Python 3.11.2
  • pandas 1.5.3
In [37]: pandas.show_versions()
/home/vagrant/.pyenv/versions/3.11.2/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit           : 2e218d10984e9919f0296931d92ea851c6a6faf5
python           : 3.11.2.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.15.0-60-generic
Version          : #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : ja_JP.UTF-8
LOCALE           : ja_JP.UTF-8

pandas           : 1.5.3
numpy            : 1.24.2
pytz             : 2022.7.1
dateutil         : 2.8.2
setuptools       : 65.5.0
pip              : 23.0.1
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.9.2
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.11.0
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

What I wanted to do

I want to read a CSV file that has no header row with the pandas.read_csv function.

input1.csv
1,Alice,Japan
2,Bob,U.S.
3,Chris,China
In [39]: df = pandas.read_csv("input1.csv", header=None, names=["id","name","country"])

In [40]: df
Out[40]: 
   id   name country
0   1  Alice   Japan
1   2    Bob    U.S.
2   3  Chris   China

In [41]: df.dtypes
Out[41]: 
id          int64
name       object
country    object
dtype: object

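Where I got stuck

The CSV is expected to have 3 columns, but I assumed it would still load without problems if trailing commas were added to make it 5 columns, as shown below.

input2.csv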
1,Alice,Japan,,
2,Bob,U.S.,,
3,Chris,China,,

However, the values no longer lined up with the column names.
In addition, the leftover columns were used as df.index.

In [75]: df = pandas.read_csv("input2.csv", header=None, names=["id","name","country"])

In [76]: df
Out[76]: 
            id  name  country
1 Alice  Japan   NaN      NaN
2 Bob     U.S.   NaN      NaN
3 Chris  China   NaN      NaN

In [77]: df.index
Out[77]: 
MultiIndex([(1, 'Alice'),
            (2,   'Bob'),
            (3, 'Chris')],
           )
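
The session above can be reproduced without a file on disk. The following is a minimal sketch that feeds the same five-field rows through `io.StringIO` and checks where the names landed. The variable names are chosen only for illustration, and the commented values are the ones the output above shows for pandas 1.5.3; other versions may differ.

```python
import io

import pandas as pd

# Same content as input2.csv: three real fields plus two empty trailing ones.
csv_text = "1,Alice,Japan,,\n2,Bob,U.S.,,\n3,Chris,China,,\n"
names = ["id", "name", "country"]

df = pd.read_csv(io.StringIO(csv_text), header=None, names=names)

# The two leftmost fields were consumed as a two-level index.
print(df.index.nlevels)         # 2

# The three names were attached to the three rightmost fields,
# so "id" holds the country values and the rest are empty.
print(df["id"].tolist())        # ['Japan', 'U.S.', 'China']
print(df["name"].isna().all())  # True
```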

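Summary

When the CSV has more columns than the names argument passed to pandas.read_csv specifies, the column names are assigned starting from the right. The leftover columns on the left are used as the DataFrame's index.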
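
One way to avoid the shift, which is not part of the original investigation, is to tell pandas explicitly which fields to read. The sketch below assumes the wanted data always sits in the first `len(names)` fields and selects them by position with `usecols`. The pandas documentation also describes `index_col=False` for files with trailing delimiters; that option is not exercised here.

```python
import io

import pandas as pd

# Same five-field rows as input2.csv (two empty trailing fields).
csv_text = "1,Alice,Japan,,\n2,Bob,U.S.,,\n3,Chris,China,,\n"
names = ["id", "name", "country"]

# Read only the first len(names) fields by position, so the names
# line up with them and no field is left over to become the index.
fixed = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=names,
    usecols=list(range(len(names))),
)

print(fixed)
print(fixed.index)
```

With this call the frame keeps a default integer index, and `id`, `name`, and `country` hold the first, second, and third fields respectively.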
