
Trying Out PyCaret (Windows 10, Python 3.8)

Posted at 2022-05-02

Introduction

In this post I try out PyCaret, an AutoML library.

Development environment

  • Windows 10 PC
  • Python 3.8

Installation

conda create -n py38 python=3.8
conda activate py38
pip install pycaret[full]
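Before running the scripts, it can help to confirm that the active interpreter is the py38 environment and that PyCaret is importable. Below is a minimal stdlib-only sketch. The function name check_environment is hypothetical and is not part of PyCaret.

```python
import importlib.util
import sys
from importlib import metadata


def check_environment(package="pycaret"):
    """Report the running Python version and whether `package` is installed."""
    version = sys.version_info
    # find_spec returns None when the top-level module cannot be found
    installed = importlib.util.find_spec(package) is not None
    pkg_version = None
    if installed:
        try:
            pkg_version = metadata.version(package)
        except metadata.PackageNotFoundError:
            pkg_version = None  # importable, but no distribution metadata
    return {
        "python": f"{version.major}.{version.minor}",
        "is_py38": (version.major, version.minor) == (3, 8),
        "installed": installed,
        "version": pkg_version,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `installed` is False, the wrong conda environment is probably active. Run `conda activate py38` and try again.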

Classification

1. Download the Wine Quality dataset (winequality-white.csv, winequality-red.csv).

2. Build the model.

wine_classification_train.py
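import pandas as pd
from sklearn.model_selection import train_test_split
from pycaret.classification import *

# Both files are semicolon-separated
white_wine = pd.read_csv("./winequality-white.csv", sep=';')
red_wine = pd.read_csv("./winequality-red.csv", sep=';')
# Binary target: 1.0 = red wine, 0.0 = white wine
white_wine['is_red'] = 0.0
red_wine['is_red'] = 1.0
# Stack both tables; ignore_index avoids duplicate row labels
data_df = pd.concat([white_wine, red_wine], axis=0, ignore_index=True)
# "High quality" flag (quality >= 7); computed but not used below
data_labels = data_df['quality'] >= 7
data_df = data_df.drop(['quality'], axis=1)
print(data_df.head())

# Fixed random_state so the prediction script gets the same test rows
train, test = train_test_split(data_df, random_state=123)
# Predict is_red. In PyCaret 2.x, setup() asks you to confirm the
# inferred data types (press Enter) unless silent=True is passed.
s = setup(train, target='is_red', session_id=123)
best = compare_models()
evaluate_model(best)  # interactive widget, intended for Jupyter
predictions = predict_model(best, data=test)
save_model(best, 'wine_classification')  # writes wine_classification.pkl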
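The training and prediction scripts each call train_test_split separately. Unless both calls pass the same random_state, each one shuffles differently. The rows scored in the prediction script are then not the held-out rows, and they can overlap the training data. The stdlib sketch below illustrates why a fixed seed makes two independent runs agree. split_indices is a hypothetical stand-in for scikit-learn's train_test_split and is not its actual implementation.

```python
import math
import random


def split_indices(n_rows, test_size=0.25, seed=None):
    """Shuffle row indices 0..n_rows-1 and split them into (train, test).

    Illustrative stand-in for sklearn's train_test_split, used only to show
    why both scripts need the same seed.
    """
    indices = list(range(n_rows))
    rng = random.Random(seed)  # private RNG; seed=None seeds from system randomness
    rng.shuffle(indices)
    n_test = math.ceil(n_rows * test_size)  # test share, rounded up
    return indices[n_test:], indices[:n_test]


if __name__ == "__main__":
    # Same seed: the "training" and "prediction" scripts see the same split
    assert split_indices(10, seed=123) == split_indices(10, seed=123)
    # No seed: the two runs almost certainly disagree
    print(split_indices(10) == split_indices(10))
```

In the real scripts, the equivalent step is to pass the same `random_state` to `train_test_split` in both files.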

3. The model is saved as wine_classification.pkl.
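The .pkl file is a serialized Python object. As far as I know, save_model stores the whole pipeline (preprocessing plus model) with joblib, which builds on pickle. The same trust caveat therefore applies: only load .pkl files from sources you trust. The sketch below shows the save/load round trip with plain pickle. ThresholdModel, save and load are hypothetical toy names and are not PyCaret internals.

```python
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ThresholdModel:
    """Hypothetical toy model: predicts 1 when x exceeds the threshold."""
    threshold: float

    def predict(self, x):
        return 1 if x > self.threshold else 0


def save(obj, name, directory="."):
    """Pickle obj to <directory>/<name>.pkl, mirroring the '<name>.pkl' naming."""
    path = Path(directory) / f"{name}.pkl"
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return path


def load(name, directory="."):
    """Unpickle <directory>/<name>.pkl. Unpickling can run code: trusted files only."""
    with (Path(directory) / f"{name}.pkl").open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = save(ThresholdModel(0.5), "toy_model", d)
        restored = load("toy_model", d)
        print(path.name, restored, restored.predict(0.9))
```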

4. Load the saved model and run inference.

wine_classification_prediction.py
import pandas as pd
from sklearn.model_selection import train_test_split
from pycaret.classification import *

# Rebuild the dataset exactly as in the training script
white_wine = pd.read_csv("./winequality-white.csv", sep=';')
red_wine = pd.read_csv("./winequality-red.csv", sep=';')
white_wine['is_red'] = 0.0
red_wine['is_red'] = 1.0
data_df = pd.concat([white_wine, red_wine], axis=0, ignore_index=True)
data_df = data_df.drop(['quality'], axis=1)

# Same random_state as in training, so `test` is the held-out split
train, test = train_test_split(data_df, random_state=123)
# load_model appends .pkl and restores the saved pipeline
saved_model = load_model('wine_classification')
new_prediction = predict_model(saved_model, data=test)
print(new_prediction.head())
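The test split still contains the true is_red column, so the loaded model's output can be scored against it. Below is a small stdlib sketch of accuracy plus confusion counts for 0/1 labels. binary_report is a hypothetical helper. The name of the prediction column depends on the PyCaret version (I believe it is Label in 2.x and prediction_label in 3.x), so check new_prediction.columns before using it.

```python
from collections import Counter


def binary_report(actual, predicted):
    """Accuracy and confusion counts for 0/1 labels (floats like 1.0 are fine)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Count each (actual, predicted) pair after normalizing to int
    pairs = Counter((int(a), int(p)) for a, p in zip(actual, predicted))
    tp, tn = pairs[(1, 1)], pairs[(0, 0)]
    fp, fn = pairs[(0, 1)], pairs[(1, 0)]
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
    }


if __name__ == "__main__":
    print(binary_report([1.0, 0.0, 1.0, 0.0], [1, 0, 0, 0]))
```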


Regression

1. Train a model on the diamond dataset.

diamond_regression_train.py
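from sklearn.model_selection import train_test_split
from pycaret.regression import *
from pycaret.datasets import get_data

# PyCaret's bundled diamond dataset; the target column is Price
data_df = get_data('diamond')
# Use the same random_state in the prediction script so the test rows match
train, test = train_test_split(data_df, random_state=123)
s = setup(train, target='Price', session_id=123)
best = compare_models()
evaluate_model(best)  # interactive widget, intended for Jupyter
predictions = predict_model(best, data=test)
save_model(best, 'diamond_regression')  # writes diamond_regression.pkl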
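compare_models() ranks the regression candidates by metrics such as MAE, RMSE and R2. To sanity-check those numbers on the held-out test rows, here is a stdlib sketch of the three standard formulas. regression_metrics is a hypothetical helper and not a PyCaret function.

```python
import math


def regression_metrics(actual, predicted):
    """MAE, RMSE and R2 (coefficient of determination) for two equal-length sequences."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

With PyCaret 2.x, this might be called as `regression_metrics(list(predictions['Price']), list(predictions['Label']))`. The Label column name is an assumption, so verify it against `predictions.columns`.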

2. Load the trained model and run inference.

diamond_regression_prediction.py
from sklearn.model_selection import train_test_split
from pycaret.regression import *
from pycaret.datasets import get_data

data_df = get_data('diamond')
# Same random_state as in training, so `test` is the held-out split
train, test = train_test_split(data_df, random_state=123)
# Loads diamond_regression.pkl saved by the training script
saved_model = load_model('diamond_regression')
new_prediction = predict_model(saved_model, data=test)
print(new_prediction.head())


Clustering

TBD

Anomaly Detection

TBD

Time Series Forecasting App with PyCaret and Streamlit

EDA with PyCaret

1. Install PyCaret.

pip install pycaret[full]

Alternatively, install the base package plus AutoViz, which eda() uses:

pip install pycaret
pip install autoviz

2. Work in a Jupyter notebook.

jupyter notebook

3. Load the iris dataset.

from pycaret.datasets import get_data
data = get_data('iris')

4. Run setup.

from pycaret.classification import *
# session_id fixes the random seed so results are reproducible
s = setup(data, target = 'species', session_id = 123)

5. EDA

eda()

Output:
Alert! from version 0.1.39, fter importing, you must do '%matplotlib inline' to display charts in Jupyter Notebooks.
    AV = AutoViz_Class()
    dfte = AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=0, lowess=False,
               chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30, save_plot_dir=None)
Note: verbose=0 or 1 generates charts and displays them in your local Jupyter notebook.
      verbose=2 does not display plots but saves them in AutoViz_Plots folder in local machine.
Updated: chart_format='bokeh' generates and displays charts in your local Jupyter notebook.
      chart_format='server' generates and displays charts in the browser - one tab for each chart.
      chart_format='html' silently saves charts HTML format - they are also interactive!

Shape of your Data Set loaded: (150, 5)
#######################################################################################
######################## C L A S S I F Y I N G  V A R I A B L E S  ####################
#######################################################################################
Classifying variables in data set...
    4 Predictors classified...
        No variables removed since no ID or low-information variables found in data set

################ Multi_Classification problem #####################
[Figures: nine AutoViz Bokeh charts for the iris dataset]
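AutoViz reports this as a multi-class problem with four predictors. Where its charts cannot be rendered, a plain-text summary can still show how the classes differ. Below is a stdlib sketch that averages every feature per class. class_means is a hypothetical helper, and the toy rows are made up for illustration and are not real iris measurements. With pandas, `data.groupby('species').mean()` gives the same kind of table.

```python
from collections import defaultdict
from statistics import mean


def class_means(rows, target):
    """Group rows (dicts) by the target column and average every other column per class."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[target]].append(row)
    summary = {}
    for label, members in groups.items():
        features = [col for col in members[0] if col != target]
        summary[label] = {col: mean(r[col] for r in members) for col in features}
    return summary


if __name__ == "__main__":
    # Made-up toy rows that reuse the iris column names
    toy = [
        {"petal_length": 1.5, "petal_width": 0.25, "species": "Iris-setosa"},
        {"petal_length": 1.5, "petal_width": 0.25, "species": "Iris-setosa"},
        {"petal_length": 4.5, "petal_width": 1.5, "species": "Iris-versicolor"},
    ]
    for label, means in class_means(toy, "species").items():
        print(label, means)
```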

On Databricks, the setup call needs to be changed as follows:

s = setup(data, target = 'species', session_id = 123, silent = True, html = False)
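To use one notebook both locally and on Databricks, the extra flags can be added only when running there. In PyCaret 2.x, silent=True skips the interactive data-type confirmation and html=False turns off the HTML display. I believe silent no longer exists in PyCaret 3.x. The sketch below is hedged in two ways. It assumes Databricks sets the DATABRICKS_RUNTIME_VERSION environment variable, and setup_kwargs is a hypothetical helper.

```python
import os


def setup_kwargs(target, session_id=123, environ=None):
    """Build keyword arguments for PyCaret 2.x setup().

    Assumption: the DATABRICKS_RUNTIME_VERSION environment variable marks a
    Databricks runtime. There, add silent=True and html=False.
    """
    env = os.environ if environ is None else environ
    kwargs = {"target": target, "session_id": session_id}
    if "DATABRICKS_RUNTIME_VERSION" in env:
        kwargs.update(silent=True, html=False)
    return kwargs


if __name__ == "__main__":
    print(setup_kwargs("species"))
```

Usage: `s = setup(data, **setup_kwargs('species'))`.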

It seems the EDA output cannot be displayed on Databricks. If anyone knows how to make it work, please let me know.

