A Rookie Engineer's Apocalypse

Last updated at 2024-06-22, posted at 2023-05-02

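Overview

A collection of tips.

About me

Work: designing and building web apps
Certifications: Applied Information Technology Engineer, AWS SAA, AWS DVA, DX Quest Case Study Education Program Gold Certificate of Completion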

Docker

Basic syntax

powershell

docker run --rm -it --name testcontainer python:3.9 /bin/bash

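--rm: automatically removes the container when it exits
-it: attaches an interactive terminal to the container (-i keeps STDIN open, -t allocates a pseudo-TTY)
--name: sets the container name
/bin/bash: the command to run in the container; here it starts a bash shell

Running specified commands in bash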

powershell

docker run --rm -v ${pwd}:/tmp python:3.9 /bin/bash -c "pip install virtualenv && cd /tmp && virtualenv virtualenv && source virtualenv/bin/activate && pip install -r requirements.txt"

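-v: bind-mounts a host path into the container (here, the current directory to /tmp)
-c: a bash option that takes the command string to run

Put a requirements.txt listing the required packages in the current directory and run the command above:
it creates a virtual environment and runs pip install inside it.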
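
When the same call needs to be scripted, the argument list can be assembled in Python. The following is a minimal sketch, assuming a hypothetical helper named `build_docker_run` (not part of Docker or any library); it only builds the arguments and does not start Docker.

```python
from pathlib import Path


def build_docker_run(image, commands, host_dir, mount_point="/tmp"):
    """Build the argument list for `docker run` (hypothetical helper).

    `commands` is a list of shell commands that are joined with && and
    handed to `bash -c` as a single string.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{mount_point}",  # bind-mount the host directory
        image,
        "/bin/bash", "-c", " && ".join(commands),
    ]


args = build_docker_run(
    image="python:3.9",
    commands=[
        "pip install virtualenv",
        "cd /tmp",
        "virtualenv virtualenv",
        "source virtualenv/bin/activate",
        "pip install -r requirements.txt",
    ],
    host_dir=Path.cwd(),
)
print(args)
# To actually run it (requires Docker): subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting problems on the host side; only the final string is interpreted by bash inside the container.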

Mounting and running a bash script

exec.bash

#!/bin/bash
pip install virtualenv
virtualenv virtualenv
source virtualenv/bin/activate
pip install -r requirements.txt

powershell

docker run -v ${pwd}:/tmp python:3.9 /bin/bash -c "cd /tmp && ./exec.bash"

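Put requirements.txt and exec.bash in the current directory and run the command above;
it gives the same result as the one-line command in the previous section.

Logging in to an existing container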

powershell

docker exec -it {container name} /bin/bash

Python

Organizing package imports

powershell

# Install isort
pip install isort

# Run isort in the Python source directory
isort .

Pandas

Pandas overview

A Series corresponds to a one-dimensional array (a list)
A DataFrame corresponds to a two-dimensional array (tabular data)
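
The difference in dimensionality can be checked directly. This is a small sketch with made-up sample data; the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series([10, 20, 30])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# Selecting a single column of a DataFrame gives a Series
col = df["a"]
print(type(col).__name__)  # Series
```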

Creating a Series

python

import pandas as pd
pd.Series(data=[1, 2, 3, 4, 5])

Naming a Series and its index

python

pd.Series(
    data=[30, 35, 40],
    index=['2015 Sales', '2016 Sales', '2017 Sales'],
    name='Product A'
)
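
Once the index has labels, values can be looked up by label instead of by position. A short sketch reusing the values above; the variable name `sales` is an assumption for illustration.

```python
import pandas as pd

sales = pd.Series(
    data=[30, 35, 40],
    index=["2015 Sales", "2016 Sales", "2017 Sales"],
    name="Product A",
)

print(sales["2016 Sales"])  # 35
print(sales.name)           # Product A
print(list(sales.index))    # ['2015 Sales', '2016 Sales', '2017 Sales']
```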

Creating a DataFrame

python

pd.DataFrame(
    {
        'Yes': [50, 21], 
        'No': [131, 2]
    }
)
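
In this form the dictionary keys become the column names, and the rows get a default integer index. A quick check, with `votes` as an illustrative variable name:

```python
import pandas as pd

votes = pd.DataFrame({"Yes": [50, 21], "No": [131, 2]})

print(list(votes.columns))  # ['Yes', 'No']
print(list(votes.index))    # [0, 1]
print(votes.shape)          # (2, 2)
```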

Labeling the index

python

pd.DataFrame(
    {
        'Bob': ['I liked it.', 'It was awful.'], 
        'Sue': ['Pretty good.', 'Bland.']
    },
    index=['Product A', 'Product B']
)
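
With a labeled index, a single cell or a whole row can be addressed by name through `loc`. A minimal sketch using the same data; `reviews` is an illustrative variable name.

```python
import pandas as pd

reviews = pd.DataFrame(
    {
        "Bob": ["I liked it.", "It was awful."],
        "Sue": ["Pretty good.", "Bland."],
    },
    index=["Product A", "Product B"],
)

# One cell: row label first, then column label
print(reviews.loc["Product A", "Bob"])  # I liked it.

# One row, returned as a Series indexed by the column names
row_b = reviews.loc["Product B"]
print(row_b["Sue"])  # Bland.
```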

Selecting specific rows and columns

python

df.iloc[[0,1,2,3],0:2]
df.iloc[-5:, 0:3]

df.loc[0, "country"]
df.loc[:, ["country", "points", "price"]]

Conditional selection

python

df.loc[
    (df.country == 'Italy') & (df.points >= 50),
    ["points","price"]
]
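
Each comparison produces a boolean Series, and `&` combines them element by element; the parentheses are needed because `&` binds more tightly than `==` and `>=`. A sketch with made-up rows:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    {
        "country": ["Italy", "France", "Italy", "US"],
        "points": [45, 90, 88, 70],
        "price": [10.0, 35.0, 22.0, 15.0],
    }
)

# Boolean mask: True only where both conditions hold
mask = (df.country == "Italy") & (df.points >= 50)
print(mask.tolist())  # [False, False, True, False]

result = df.loc[mask, ["points", "price"]]
print(result)
```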

Built-in conditional selectors (isin, notnull)

python

df.loc[
    (df.country.isin(["Italy", "France"])) & (df.price.notnull()),
    ["country", "price", "points"]
]
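
`isin` tests membership in a list of values and `notnull` flags rows that are not missing. The following sketch shows the two masks separately before combining them; the data is made up.

```python
import pandas as pd

# Hypothetical sample data; None becomes NaN in the float column
df = pd.DataFrame(
    {
        "country": ["Italy", "France", "US", "France"],
        "price": [10.0, None, 15.0, 30.0],
        "points": [85, 90, 88, 92],
    }
)

in_countries = df.country.isin(["Italy", "France"])
has_price = df.price.notnull()
print(in_countries.tolist())  # [True, True, False, True]
print(has_price.tolist())     # [True, False, True, True]

selected = df.loc[in_countries & has_price, ["country", "price", "points"]]
print(selected.index.tolist())  # [0, 3]
```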

Analysis

python

# Summary statistics
print(df.price.describe())
# Mean
print(df.price.mean())
# Unique values
print(df.taster_name.unique())
# Frequency of each value
print(df.taster_name.value_counts())
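
To see what these calls return, here is a small sketch on made-up data with the same column names:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    {
        "taster_name": ["Ann", "Bob", "Ann", "Cid", "Ann"],
        "price": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

print(df.price.mean())  # 30.0

# unique() keeps the order of first appearance
names = list(df.taster_name.unique())
print(names)  # ['Ann', 'Bob', 'Cid']

# value_counts() returns a Series indexed by value
counts = df.taster_name.value_counts()
print(counts["Ann"])  # 3

# describe() returns a Series of statistics (count, mean, std, min, ...)
summary = df.price.describe()
print(summary["count"], summary["max"])  # 5.0 50.0
```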

Applying a function to each column

python

# Add a len column holding the number of characters in name
df['len'] = df['name'].apply(len)

# Apply a function to every column of the DataFrame (all columns must be numeric here)
df.apply(lambda x: x/100)
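
On a Series, `apply` calls the function once per element; on a DataFrame it calls it once per column, or once per row with `axis=1`. A sketch with made-up data, restricted to the numeric columns for the DataFrame calls:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    {
        "name": ["Chianti", "Rioja"],
        "points": [90, 85],
        "price": [2000, 1500],
    }
)

# Series.apply: the function receives one value at a time
df["len"] = df["name"].apply(len)
print(df["len"].tolist())  # [7, 5]

# DataFrame.apply: the function receives one column (a Series) at a time
numeric = df[["points", "price"]]
scaled = numeric.apply(lambda col: col / 100)
print(scaled["price"].tolist())  # [20.0, 15.0]

# axis=1: the function receives one row at a time
total = numeric.apply(lambda row: row["points"] + row["price"], axis=1)
print(total.tolist())  # [2090, 1585]
```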

Grouping

python

# Group by points and count the rows in each group
df.groupby(by="points").points.count()

# Group by points and get the minimum price in each group
df.groupby(by="points").price.min()

# Use agg to run several functions at once
df.groupby(['country']).price.agg([len, 'min', 'max'])

# Group by country and province and get the mean price
df.groupby(['country', 'province']).price.mean()

# Sort by price, then points, in descending order
df.sort_values(by=["price", "points"], ascending=False)
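
The results of these calls are indexed by the group keys. A sketch on made-up rows shows how to read individual values back out:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame(
    {
        "country": ["Italy", "Italy", "France", "France", "France"],
        "province": ["Tuscany", "Tuscany", "Bordeaux", "Alsace", "Bordeaux"],
        "points": [90, 88, 90, 88, 90],
        "price": [30.0, 20.0, 50.0, 25.0, 40.0],
    }
)

counts = df.groupby("points").points.count()
print(counts.loc[90])  # 3

min_price = df.groupby("points").price.min()
print(min_price.loc[88])  # 20.0

# One row per country, one column per aggregation
stats = df.groupby("country").price.agg([len, "min", "max"])
print(stats.loc["France", "min"], stats.loc["France", "max"])  # 25.0 50.0

# Grouping by two columns gives a MultiIndex; look up with a tuple
mean_price = df.groupby(["country", "province"]).price.mean()
print(mean_price.loc[("France", "Bordeaux")])  # 45.0

ranked = df.sort_values(by=["price", "points"], ascending=False)
print(ranked.index.tolist())  # [2, 4, 0, 3, 1]
```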

PowerShell

Recursively deleting Python cache files (.pyc)

powershell

Get-ChildItem *.pyc -Recurse | Remove-Item
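
For environments without PowerShell, roughly the same cleanup can be done from Python. This is a sketch only: `remove_pyc` is a hypothetical helper name, and the demonstration runs in a throwaway temporary directory rather than a real project.

```python
import tempfile
from pathlib import Path


def remove_pyc(root):
    """Delete every *.pyc file under root and return how many were removed."""
    removed = 0
    # Materialize the matches first so files are not deleted mid-iteration
    for path in list(Path(root).rglob("*.pyc")):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed


# Demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "pkg" / "__pycache__"
    cache_dir.mkdir(parents=True)
    (cache_dir / "mod.cpython-39.pyc").write_bytes(b"")
    (Path(tmp) / "keep.py").write_text("print('hi')\n")

    removed = remove_pyc(tmp)
    remaining = sorted(p.name for p in Path(tmp).rglob("*") if p.is_file())

print(removed)    # 1
print(remaining)  # ['keep.py']
```

To clean a real project, call `remove_pyc(".")` from the project root.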

Pressing the Windows key periodically (sends Ctrl+Esc every 30 seconds)

powershell

Add-Type -AssemblyName System.Windows.Forms; Write-Host 'run...'; while ($true) { [System.Windows.Forms.SendKeys]::SendWait('^{ESC}'); Start-Sleep -Seconds 30 }
