
Frequently Used Code Memo

Memo

Last updated at 2023-10-08, posted at 2023-10-05

A collection of handy code snippets that I tend to forget.

Libraries

import os
import cv2
import glob
from urllib.request import urlopen  # scraping
from tqdm import tqdm
# from tqdm.notebook import tqdm  # this one looks cuter

How to use urllib

from urllib.request import urlopen
html = urlopen("http://www.google.com/").read()
print(html)
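
`urlopen(...).read()` returns raw `bytes`, so the `print(html)` above shows a `b'...'` literal rather than readable text. Here is a minimal sketch of decoding the response, assuming the server declares its charset in the `Content-Type` header and falling back to UTF-8 otherwise. `fetch_text` is a hypothetical helper name, not part of urllib.

```python
from urllib.request import urlopen


def fetch_text(url, default_encoding="utf-8", timeout=10):
    """Fetch a URL and return the body as str (hypothetical helper)."""
    # The context manager closes the connection when the block exits.
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes
        # Charset declared in the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset() or default_encoding
    # errors="replace" avoids an exception on bytes that cannot be decoded.
    return raw.decode(charset, errors="replace")
```

For example, `fetch_text("http://www.google.com/")` should return the page as a `str` instead of `bytes`.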

Image processing

import IPython.display
from IPython.display import display

Data processing

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Audio processing

import librosa
import librosa.display

Functions

librosa.load(".wav")
librosa.istft(data)
librosa.stft(data)
librosa.amplitude_to_db
librosa.db_to_amplitude
librosa.display -> waveshow()
librosa.display -> specshow("")
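
As a reminder of what `librosa.amplitude_to_db` and `librosa.db_to_amplitude` compute, here is a scalar sketch of the amplitude/decibel relationship using only the standard library. It is not librosa's implementation: the real functions work on arrays and have extra options (such as `top_db` clipping) that are not reproduced here. The default values for `ref` and `amin` below are assumptions for illustration.

```python
import math


def amplitude_to_db(amplitude, ref=1.0, amin=1e-5):
    """Convert one amplitude value to decibels relative to ref."""
    # Clamp to amin so that log10(0) is never evaluated.
    return 20.0 * math.log10(max(amin, abs(amplitude)) / max(amin, ref))


def db_to_amplitude(db, ref=1.0):
    """Inverse conversion: decibels back to an amplitude."""
    return ref * 10.0 ** (db / 20.0)
```

With `ref=1.0`, an amplitude of 1.0 maps to 0 dB, and every factor of 10 in amplitude adds 20 dB.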
To play back audio in the (amplitude × time) domain, i.e. a raw waveform

display(IPython.display.Audio(y, rate=sr))

Google

from google.colab import drive
drive.mount('/content/drive')

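Commands

!cp -i source destination
# Add -i if you want to be asked before overwriting an existing file

!zip -r path/to/archive.zip directory_to_zip
# -r zips a directory recursively; the archive path comes first, then the source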
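
The two shell commands above can also be done from Python with the standard library, which is handy inside a longer notebook cell. This is a rough sketch: `copy_no_overwrite` and `zip_directory` are hypothetical helper names, and instead of prompting like `cp -i`, the copy helper simply refuses to overwrite.

```python
import shutil
from pathlib import Path


def copy_no_overwrite(src, dst):
    """Copy a file, refusing to overwrite an existing destination."""
    src, dst = Path(src), Path(dst)
    if dst.is_dir():
        dst = dst / src.name  # copying into a directory keeps the file name
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    shutil.copy2(src, dst)
    return dst


def zip_directory(src_dir, archive_path):
    """Recursively zip src_dir into archive_path (expected to end in .zip)."""
    archive_path = Path(archive_path)
    # make_archive appends ".zip" itself, so pass the path without the suffix.
    created = shutil.make_archive(
        str(archive_path.with_suffix("")), "zip", root_dir=str(src_dir)
    )
    return Path(created)
```

Paths inside the archive are stored relative to `src_dir`.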

Huggingface

(Google Colaboratory)

!pip install huggingface_hub
!huggingface-cli login

Or this

# !pip install huggingface_hub
# !pip install git+
from huggingface_hub import notebook_login
notebook_login()

Dataset

!pip install datasets
from datasets import load_dataset
data = load_dataset("dataset_name")

Pipeline

When you want to quickly try out a pretrained model

!pip install transformers
from transformers import pipeline

pipe = pipeline("task_name", model="name_of_the_model_to_use")
pipe("path to the data you want to run, or a plain str, etc.")
# The model's inference result is displayed

To iterate over full datasets it is recommended to use a dataset directly. This means you don’t need to allocate the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on GPU. If it doesn’t don’t hesitate to create an issue.

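In other words: when you want to run over an entire dataset, pass the dataset directly to the pipeline. You don't have to load the whole dataset into memory at once or batch it yourself, and it should run as fast as a custom loop on the GPU. If it doesn't, don't hesitate to open an issue.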

import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
pipe("path/to/file.wav")  # the model outputs the spoken content as text
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (only *pt*) will simply return the item in the dict returned by the dataset item
# as we're not interested in the *target* part of the dataset. For sentence pair use KeyPairDataset
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....
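
To illustrate why iterating this way avoids holding everything in memory, here is a library-free sketch of lazy, batched iteration. It is not how transformers implements pipelines; `iter_batches`, `run_lazily`, and `fake_model` are hypothetical names, and `fake_model` is just a stand-in for a real model call.

```python
from itertools import islice


def iter_batches(iterable, batch_size):
    """Yield lists of up to batch_size items without materializing the input."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_lazily(model_fn, samples, batch_size=2):
    """Yield one result per sample, pulling samples only as they are needed."""
    for batch in iter_batches(samples, batch_size):
        # model_fn is a stand-in for a real model call on one batch.
        yield from model_fn(batch)


def fake_model(batch):
    """Hypothetical stand-in model: upper-cases each input string."""
    return [{"text": sample.upper()} for sample in batch]


# A generator expression: the samples are never held as a full list.
samples = (f"sample {i}" for i in range(5))
results = list(run_lazily(fake_model, samples))
```

Only one batch of inputs exists at any moment, and the caller never has to slice the data into batches by hand.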

with open syntax (jsonl files)

Reading line by line

import json
with open("file_path") as fin:
    for line in fin:
        data = json.loads(line)  # parse each line as its own JSON object
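
The loop above overwrites `data` on every line, so in practice you usually want to collect the records. Here is a minimal sketch, assuming UTF-8 input and that blank lines should be skipped. `read_jsonl` is a hypothetical helper name.

```python
import json


def read_jsonl(path, encoding="utf-8"):
    """Return a list with one parsed object per non-empty line."""
    records = []
    with open(path, encoding=encoding) as fin:
        for line_number, line in enumerate(fin, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report which line is broken instead of a bare decode error.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
    return records
```

For very large files, yielding each record instead of appending to a list would keep memory use flat.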

With f.read(), the whole file is read in as a single str (one big string):

with open("file_path") as f:
    s = f.read()
    print(s)  # prints the contents of the file

To read the whole file as a list
- https://note.nkmk.me/python-str-remove-strip/

with open("file_path") as f:
    lines = f.readlines()
    # Each element still ends with its newline character; to strip it, use instead:
    # lines = [s.rstrip() for s in f.readlines()]
lines
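
One thing to watch for when stripping newlines: `rstrip()` with no argument removes all trailing whitespace, not just the newline. The following sketch shows the difference, plus an alternative using `str.splitlines()`. `read_lines` is a hypothetical helper name, and UTF-8 is an assumed encoding.

```python
from pathlib import Path


def read_lines(path, encoding="utf-8"):
    """Return the file's lines without their line endings."""
    # splitlines() splits on line boundaries and drops the line endings;
    # spaces and tabs inside each line are kept.
    return Path(path).read_text(encoding=encoding).splitlines()


line = "  indented text \t\n"
stripped_all = line.rstrip()          # also removes the trailing space and tab
stripped_newline = line.rstrip("\n")  # removes only the newline
```

If trailing spaces or tabs are meaningful in your data, prefer `rstrip("\n")` or `splitlines()`.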
