
(Python) Specifying the type of only certain Excel columns with pandas read_excel

Posted at 2019-09-04, last updated at 2019-09-05

■Problem: make pandas read_excel return strings for specific columns only

When I read an Excel file with pandas.read_excel (or ExcelFile.parse) using the default settings, a cell containing the string "0000" came back as the int 0.

sample.xlsx

test.py

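from typing import Optional
import os

from pandas import DataFrame, ExcelFile


def isExcelFilePath(filepath: str) -> bool:
    # Accept both the newer .xlsx and the legacy .xls format
    return filepath.endswith(('.xlsx', '.xls'))


def getExcelFile(filepath: str) -> Optional[ExcelFile]:
    # Return None when the file is missing or is not an Excel file
    if not os.path.exists(filepath) or not isExcelFilePath(filepath):
        return None
    return ExcelFile(filepath)


def getDataFromExcelFile(excelFile: ExcelFile) -> DataFrame:
    # Read the first sheet with default settings (pandas infers each column's type)
    return excelFile.parse(index_col=None)


excelFile = getExcelFile('./sample.xlsx')
if excelFile is None:
    raise FileNotFoundError('./sample.xlsx is missing or is not an Excel file')
dataFrame = getDataFromExcelFile(excelFile)
print(dataFrame)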

output.txt

   社員コード   社員名
0      0  田中太郎
1      1  多田久信
2      2  安村花子

What I actually wanted was the **string "0000"**, so I looked into how to get the expected value.
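
Why does "0000" turn into 0? Roughly speaking, for a column with no explicit type, pandas tries to read the whole column as numbers and keeps the numeric result if every value parses. The function below is a deliberately simplified model of that per-column inference; the name `infer_column` is hypothetical and this is not pandas' actual implementation.

```python
from typing import Any, List


def infer_column(values: List[str]) -> List[Any]:
    """Simplified model of per-column type inference (not pandas' real code)."""
    # Try int first, then float; the column is converted only if every value parses.
    # int("0000") succeeds and returns 0, which is where the leading zeros vanish.
    for cast in (int, float):
        try:
            return [cast(v) for v in values]
        except ValueError:
            continue
    # Not numeric: keep the original text (e.g. employee names)
    return values


print(infer_column(["0000", "0001", "0002"]))  # [0, 1, 2]
print(infer_column(["田中太郎", "多田久信"]))  # ['田中太郎', '多田久信']
```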

■Solution: specify the type of particular columns with read_excel's converters parameter

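The converters parameter of read_excel (ExcelFile.parse) lets you control how particular columns are converted.
It takes a dict that maps a column, given either as its label or as its 0-based position, to a function applied to each cell value in that column. A type such as str works here because types are callable.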
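
Since each value in converters only has to be callable, you are not limited to types like str. For instance, if the codes were stored in Excel as numbers formatted to display as "0000" (rather than as text), str alone would presumably give "0", while a small padding function could handle both cases. The helper `to_employee_code` and the 4-digit width are assumptions for illustration, and whether a numeric cell reaches the converter as int or float may depend on the pandas version.

```python
from typing import Union


def to_employee_code(value: Union[str, int, float]) -> str:
    """Hypothetical converter: normalize a cell to a zero-padded 4-digit code."""
    # int() accepts "0001" (text cell), 1 (int) and 1.0 (float) alike;
    # zfill(4) then restores the leading zeros.
    return str(int(value)).zfill(4)


# Hypothetical usage: excelFile.parse(converters={0: to_employee_code})
print(to_employee_code("0001"))  # 0001
print(to_employee_code(1))       # 0001
print(to_employee_code(1.0))     # 0001
```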

test.py

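# getExcelFile is unchanged from the previous listing
def getDataFromExcelFile(excelFile: ExcelFile, converters=None) -> DataFrame:
    # converters maps a column (label or 0-based position) to a function
    # that is applied to every cell value in that column
    return excelFile.parse(
        index_col=None,
        converters=converters,
    )


# Read the first column (position 0) as str so that "0000" keeps its leading zeros
dataFrame = getDataFromExcelFile(
    getExcelFile('./sample.xlsx'),
    converters={0: str},
)
print(dataFrame)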

With this, the values in the specified first column (position 0) are read as str. The other columns are read the same way as with the default settings.

output.txt

  社員コード   社員名
0  0000  田中太郎
1  0001  多田久信
2  0002  安村花子
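
To try the same behaviour without preparing an Excel file, pandas.read_csv also accepts a converters parameter (keys may be column labels or positions) and also drops the leading zeros by default. The sketch below substitutes an in-memory CSV for sample.xlsx, which is an assumption made purely for illustration, and keys the converter by column label instead of position.

```python
import io

import pandas as pd

# In-memory CSV standing in for sample.xlsx (assumption: same columns and values)
CSV_TEXT = """社員コード,社員名
0000,田中太郎
0001,多田久信
0002,安村花子
"""

# Default: the code column is inferred as integers and the leading zeros are lost
default_df = pd.read_csv(io.StringIO(CSV_TEXT))
print(default_df["社員コード"].tolist())  # [0, 1, 2]

# converters keyed by column label: the raw text is passed to str and kept as-is
converted_df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"社員コード": str})
print(converted_df["社員コード"].tolist())  # ['0000', '0001', '0002']
```

Keying by label rather than position keeps the code working if columns are later reordered in the sheet.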

That's all.
