
Extracting the data you need from Excel with Python

Posted at 2023-02-27

Notes

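This script extracts the relevant projects from a project list and pulls the year and month out of each date.
There may be a better way to write this, so I would like to update it later.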

getdata.py python

#!/usr/bin/env python
# coding: utf-8
# Import the required modules
#import matplotlib.pyplot as plt
#import japanize_matplotlib  # <- this one
#import numpy as np
import pandas as pd
#import calendar
#import datetime
# Suppress unneeded warnings
import warnings
warnings.filterwarnings('ignore')

# Input Excel file name and sheet name
in_file   = 'リスト_xxxxx.xlsx'
in_sheet1 = '案件リスト'
# Output Excel file name
out_file = 'OutData.xlsx'

#----------
# Read the Excel file
in_df = pd.read_excel(in_file, sheet_name=in_sheet1)
# The data starts at row 3; row 2 holds the column names
df1 = in_df[3:]
col_lists   = in_df.iloc[2]
df1.columns = col_lists

#----------
# Extract the year and month
def get_int_year_and_month(obj):
    print(obj)
    yy = obj.year
    mm = obj.month
    return (yy, mm)
#----------
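# Data processing
# Columns: '担当' = person in charge, '対応開始月' = start month
def my_proccessing(df):
    # Build the DataFrame that will be written out
    print("----------")
    # Keep only the rows we need
    df = df[df['担当'] == '橋本']
    # Skip rows with no start month; .copy() gives an independent DataFrame to modify
    df = df[df['対応開始月'].notna()].copy()
    # New columns
    df['対応開始年int'] = 0
    df['対応開始月int'] = 0
    # Process the data read from Excel one row at a time
    count = 0
    for index, row in df.iterrows():
        print("index is "+ str(index))
        (yy1, mm1) = get_int_year_and_month(row['対応開始月'])
        row['対応開始年int'] = yy1
        row['対応開始月int'] = mm1
        df.iloc[count] = row
        count += 1
    return df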
#----------
# Run the processing
out_df = my_proccessing(df1)
# Reset the index and write the result to a file
out_df = out_df.reset_index(drop=True)
#print(out_df.head(5))
with pd.ExcelWriter(out_file) as writer:
    out_df.to_excel(writer, sheet_name=in_sheet1)
    in_df.to_excel(writer, sheet_name='orig')
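
As one possible "better way", the row-by-row loop can probably be replaced with pandas' vectorized `.dt` accessor. The following is only a minimal sketch under some assumptions: the helper name `extract_with_year_month` and the sample data are made up for illustration, and the date column is assumed to contain values that `pd.to_datetime` can parse.

```python
import pandas as pd


def extract_with_year_month(df, person="橋本", date_col="対応開始月"):
    """Filter rows by person and add integer year/month columns.

    Hypothetical helper, not part of the original script.
    """
    # Keep only the rows for the given person that have a start date.
    # .copy() makes the result independent of the source DataFrame.
    out = df[(df["担当"] == person) & df[date_col].notna()].copy()

    # A column read from a sheet with extra header rows is often object dtype,
    # so convert it to datetime before using the .dt accessor.
    dates = pd.to_datetime(out[date_col])
    out["対応開始年int"] = dates.dt.year
    out["対応開始月int"] = dates.dt.month
    return out.reset_index(drop=True)


# Small made-up sample standing in for the real sheet.
sample = pd.DataFrame(
    {
        "担当": ["橋本", "田中", "橋本", "橋本"],
        "対応開始月": [
            pd.Timestamp("2023-02-01"),
            pd.Timestamp("2023-03-01"),
            None,
            pd.Timestamp("2022-11-01"),
        ],
    }
)
result = extract_with_year_month(sample)
print(result)
```

In the script above, this would stand in for `my_proccessing(df1)`; the filtering and the two new columns stay the same, only the loop and the `count` bookkeeping go away.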
