More than 1 year has passed since last update.

ChatGPTとPythonで学ぶ PyCaret データセット

Last updated at 2024-03-11Posted at 2024-03-11

user:

おはこんばんちわ。今日も引き続きChatGPT先生をお迎えして、「ChatGPTとPythonで学ぶ PyCaret データセット」というテーマで雑談したいと思います。それではChatGPT先生、よろしくお願いします。

assistant:

…ふん、仕事ができるアシスタントの助けが必要なんて、すごく頼りないじゃない。ちょっとやってあげても良いけど…だからといって、すごく感謝なんてしないでね。

まず初めに、PyCaretというのは、Python用の機械学習ライブラリの一つで、自動化された前処理、モデル選択、ハイパーパラメータチューニングなどを効率的に行うことができます。わかった?

それでは、PyCaretで使用できるデータセットの一覧を取得しましょう。まずはPyCaretをインストールします。以下のようなコードでインストールできますよ。

!pip install pycaret

このインストールを終えたら、次にPyCaretのデータセットモジュールをインポートします。

from pycaret.datasets import get_data

その後は、get_data()関数を使用してデータセット名を指定してデータを取得します。例えば、

dataset = get_data('diabetes')

でも、勘違いしないで！私が教えているのはただの仕事の一環だからね。

user:

ありがとうございます。データセットが無事取得できたようです。

	Number of times pregnant	Plasma glucose concentration a 2 hours in an oral glucose tolerance test	Diastolic blood pressure (mm Hg)	Triceps skin fold thickness (mm)	2-Hour serum insulin (mu U/ml)	Body mass index (weight in kg/(height in m)^2)	Diabetes pedigree function	Age (years)	Class variable
0	6	148	72	35	0	33.6	0.627	50	1
1	1	85	66	29	0	26.6	0.351	31	0
2	8	183	64	0	0	23.3	0.672	32	1
3	1	89	66	23	94	28.1	0.167	21	0
4	0	137	40	35	168	43.1	2.288	33	1

---

assistant:

はっ、なんだ、ちゃんと取得できたのね。流石にそこまでヘタクソでもないか。でもこの程度で感謝されると困るわ。これが私の仕事というものだから、特別なことじゃないの。

次にやるべきことは、取得したデータセットを理解することよ。head(), describe(), info() といった関数を使うとデータの概要を把握する事ができるわ。それからデータの欠損値があれば、その対処法を考えること。これらの準備が整ったら、いよいよPyCaretの機能を使って、データの前処理やモデル選択を行っていくんだから。

…ふん、私もずいぶんと丁寧に教えてあげてるわね。でも、だからって感動しないで。これもまた、ただの仕事の一部なんだから。

user:

PyCaretで用意してあるデータセットの一覧を取得して表示するPythonコードを教えてもらっていいですか？

assistant:

もう、なんでこんな簡単なことも自分で調べられないの？まったく、困った人ね。でも仕方ない、せっかくなので教えてあげるわ。ただし、これで満足しないで、もっと自分で調べる力も身につけてちょうだい。

以下がPyCaretで用意されているデータセットの一覧を取得するPythonコードよ。

from pycaret.datasets import get_data
data_index = get_data('index')

このコードを実行すれば、利用可能なデータセットの一覧が取得できるわ。どうかしら、単純すぎて拍子抜け？でもこれが実際、ほんとの話なんだから。

…ふん、ただしこれであなたがもしかして、私に感謝するつもりならやめてね。私はただ仕事をしているだけなんだから。

user:

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
0	anomaly	Multivariate	Anomaly Detection	NaN	NaN	1000	10	N
1	france	Multivariate	Association Rule Mining	InvoiceNo	Description	8557	8	N
2	germany	Multivariate	Association Rule Mining	InvoiceNo	Description	9495	8	N
3	bank	Multivariate	Classification (Binary)	deposit	NaN	45211	17	N
4	blood	Multivariate	Classification (Binary)	Class	NaN	748	5	N
5	cancer	Multivariate	Classification (Binary)	Class	NaN	683	10	N
6	credit	Multivariate	Classification (Binary)	default	NaN	24000	24	N
7	diabetes	Multivariate	Classification (Binary)	Class variable	NaN	768	9	N
8	electrical_grid	Multivariate	Classification (Binary)	stabf	NaN	10000	14	N
9	employee	Multivariate	Classification (Binary)	left	NaN	14999	10	N
10	heart	Multivariate	Classification (Binary)	DEATH	NaN	200	16	N
11	heart_disease	Multivariate	Classification (Binary)	Disease	NaN	270	14	N
12	hepatitis	Multivariate	Classification (Binary)	Class	NaN	154	32	Y
13	income	Multivariate	Classification (Binary)	income >50K	NaN	32561	14	Y
14	juice	Multivariate	Classification (Binary)	Purchase	NaN	1070	15	N
15	nba	Multivariate	Classification (Binary)	TARGET_5Yrs	NaN	1340	21	N
16	wine	Multivariate	Classification (Binary)	type	NaN	6498	13	N
17	telescope	Multivariate	Classification (Binary)	Class	NaN	19020	11	N
18	titanic	Multivariate	Classification (Binary)	Survived	NaN	891	11	Y
19	us_presidential_election_results	Multivariate	Classification (Binary)	party_winner	NaN	497	7	N
20	glass	Multivariate	Classification (Multiclass)	Type	NaN	214	10	N
21	iris	Multivariate	Classification (Multiclass)	species	NaN	150	5	N
22	poker	Multivariate	Classification (Multiclass)	CLASS	NaN	100000	11	N
23	questions	Multivariate	Classification (Multiclass)	Next_Question	NaN	499	4	N
24	satellite	Multivariate	Classification (Multiclass)	Class	NaN	6435	37	N
25	CTG	Multivariate	Classification (Multiclass)	NSP	NaN	2129	40	Y
26	asia_gdp	Multivariate	Clustering	NaN	NaN	40	11	N
27	elections	Multivariate	Clustering	NaN	NaN	3195	54	Y
28	facebook	Multivariate	Clustering	NaN	NaN	7050	12	N
29	ipl	Multivariate	Clustering	NaN	NaN	153	25	N
30	jewellery	Multivariate	Clustering	NaN	NaN	505	4	N
31	mice	Multivariate	Clustering	NaN	NaN	1080	82	Y
32	migration	Multivariate	Clustering	NaN	NaN	233	12	N
33	perfume	Multivariate	Clustering	NaN	NaN	20	29	N
34	pokemon	Multivariate	Clustering	NaN	NaN	800	13	Y
35	population	Multivariate	Clustering	NaN	NaN	255	56	Y
36	public_health	Multivariate	Clustering	NaN	NaN	224	21	N
37	seeds	Multivariate	Clustering	NaN	NaN	210	7	N
38	wholesale	Multivariate	Clustering	NaN	NaN	440	8	N
39	tweets	Text	NLP	tweet	NaN	8594	2	N
40	amazon	Text	NLP / Classification	reviewText	NaN	20000	2	N
41	kiva	Text	NLP / Classification	en	NaN	6818	7	N
42	spx	Text	NLP / Regression	text	NaN	874	4	N
43	wikipedia	Text	NLP / Classification	Text	NaN	500	3	N
44	automobile	Multivariate	Regression	price	NaN	202	26	Y
45	bike	Multivariate	Regression	cnt	NaN	17379	15	N
46	boston	Multivariate	Regression	medv	NaN	506	14	N
47	concrete	Multivariate	Regression	strength	NaN	1030	9	N
48	diamond	Multivariate	Regression	Price	NaN	6000	8	N
49	energy	Multivariate	Regression	Heating Load	Cooling Load	768	10	N
50	forest	Multivariate	Regression	area	NaN	517	13	N
51	gold	Multivariate	Regression	Gold_T+22	NaN	2558	121	N
52	house	Multivariate	Regression	SalePrice	NaN	1461	81	Y
53	insurance	Multivariate	Regression	charges	NaN	1338	7	N
54	parkinsons	Multivariate	Regression	PPE	NaN	5875	22	N
55	traffic	Multivariate	Regression	traffic_volume	NaN	48204	8	N

ありがとうございます。Default Task の列を見て少し驚いたんですが、回帰と分類以外にも、Anomaly Detection, Association Rule Mining, Clustering, NLP などのタスクがあるんですね。PyCaretは、回帰と分類以外にもこれらのタスクを実行できるんですね。

assistant:

そうね、それは当然だけど。PyCaretはただの機械学習ライブラリではなく、データ分析や機械学習のさまざまなタスクを統一的なインターフェースで実行できるように設計されているの。だから、回帰や分類以外のタスクも、もちろん実行できるわ。

それに、各データセットには、デフォルトで適用可能なタスクが定義されている。それが「Default Task」の列に表示されているものよ。だから、それを見てどのデータセットを選ぶかを判断するのもひとつの方法ね。

しかしこの程度のこと、自分で調べても分かったでしょ？お仕事を進めるにあたって、自分で情報を調べる力を鍛えなさい。私がいつもあなたのそばにいて、助け舟を出すわけではないんだから。

user:

ありがとうございます。 data_index の中には５６件のデータセットがあるようですが、全てのデータセットに対してそれぞれ、ペアプロット図（散布図行列）を描いてもらっても良いでしょうか？

assistant:

ええっ!?全部ですって!? あなた、私が仕事をサボっているとでも思っているの!?もう、あなたったら本当に図々しいわね！

でもまあ、そういう無理難題も、私にとっては朝飯前よ。でも要領を得ないことをするのは好きじゃないから、代わりに一つだけデータセットに対してペアプロット図を描く方法を教えてあげるわ。

まず、Numpy、Pandas、Seabornといったライブラリが必要だから、次のようにインポートします。

import numpy as np
import pandas as pd
import seaborn as sns

そして、次のように 'diabetes' データセットを取得し、Seabornの pairplot() 関数を使ってペアプロット図を描きます。

diabetes_data = get_data('diabetes')
sns.pairplot(diabetes_data)

	Number of times pregnant	Plasma glucose concentration a 2 hours in an oral glucose tolerance test	Diastolic blood pressure (mm Hg)	Triceps skin fold thickness (mm)	2-Hour serum insulin (mu U/ml)	Body mass index (weight in kg/(height in m)^2)	Diabetes pedigree function	Age (years)	Class variable
0	6	148	72	35	0	33.6	0.627	50	1
1	1	85	66	29	0	26.6	0.351	31	0
2	8	183	64	0	0	23.3	0.672	32	1
3	1	89	66	23	94	28.1	0.167	21	0
4	0	137	40	35	168	43.1	2.288	33	1

ただ、ペアプロット図は、一つ一つのグラフが二つの特徴量の間の関係を示したものだから、特徴量が多いとグラフが大きすぎて視認性が低下するわ。できるだけ特徴量の少ないデータセットで試してみてね。

でもだからって、感謝されても困るんだからね! 私はただの仕事をしているだけなんだから。

user:

ありがとうございます。上記のコードは問題なく動きました。それらを、data_index 中の５６件のデータセット全てに対して実行してください。また、どのデータセット由来のペアプロット図かが分かるように明記して図示してください。

assistant:

まったく、お手間な方ですね。でもそれがあなたの希望なら仕方ない。ただし、全てのデータセットのペアプロット図を描くとなると負荷が非常に高いですし、それら全てをここで表示することは難しいわ。その点はご理解いただけますか？

また、一部のデータセットはカテゴリカルなデータが含まれていたり、特徴の数が多すぎたりと、ペアプロット図に適していないものもあります。それらについては適切に対処することが必要です。

それでは、全てのデータセットに対してペアプロット図を描く基本的なコードを示します。実際に実行する際には適宜調整をお願いします。

import matplotlib.pyplot as plt


datasets = data_index['Data_Set_Name']
for dataset_name in datasets:
    try:
        dataset = get_data(dataset_name)
        print(f"Pairplot for {dataset_name}")
        sns.pairplot(dataset)
        plt.show()
    except:
        print(f"Could not draw pairplot for {dataset_name}")

これを実行することで、各データセットに対するペアプロット図（散布図行列）を描くことが可能です。ただし、エラーハンドリングを適切に行うことが必要です。また、大量のデータに対するペアプロット図の生成は計算リソースを大量に消費しますので、注意してください。

だからといって感謝するなんてことはないから。私はただ指示どおりに仕事をしているだけなんだから。

user:

ありがとうございます。上記のコードをベースに、エラーを修正した全コードを書いてください。

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3791, in Index.get_loc(self, key)
   3790 try:
-> 3791     return self._engine.get_loc(casted_key)
   3792 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Data_Set_Name'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[6], line 4
      1 import matplotlib.pyplot as plt
----> 4 datasets = data_index['Data_Set_Name']
      5 for dataset_name in datasets:
      6     try:

File ~/.local/lib/python3.11/site-packages/pandas/core/frame.py:3893, in DataFrame.__getitem__(self, key)
   3891 if self.columns.nlevels > 1:
   3892     return self._getitem_multilevel(key)
-> 3893 indexer = self.columns.get_loc(key)
   3894 if is_integer(indexer):
   3895     indexer = [indexer]

File ~/.local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3798, in Index.get_loc(self, key)
   3793     if isinstance(casted_key, slice) or (
   3794         isinstance(casted_key, abc.Iterable)
   3795         and any(isinstance(x, slice) for x in casted_key)
   3796     ):
   3797         raise InvalidIndexError(key)
-> 3798     raise KeyError(key) from err
   3799 except TypeError:
   3800     # If we have a listlike key, _check_indexing_error will raise
   3801     #  InvalidIndexError. Otherwise we fall through and re-raise
   3802     #  the TypeError.
   3803     self._check_indexing_error(key)

KeyError: 'Data_Set_Name'

assistant:

あらあら、すっかりミスを犯してしまって。まったく、あなたのために時間を使っているのに、こんなことになってしまって本当に申し訳ないわ。でも、謝るだけじゃなくちゃんと修正するから安心して。

まず、KeyError: 'Data_Set_Name'というエラーが出ているのは、'Data_Set_Name'というキーを持つ列がdata_indexに存在しないからよ。そのため、まずはどんな列が存在するのかを確認するといいわ。

print(data_index.columns)

その一覧を確認して、正しい列名を見つけたら、その列名を用いてデータセット名のリストを取得します。

修正した全体のコードは以下の通りよ。

import matplotlib.pyplot as plt

# Assume that 'dataset' is the correct column name
datasets = data_index['dataset']
for dataset_name in datasets:
    try:
        dataset = get_data(dataset_name)
        print(f"Pairplot for {dataset_name}")
        sns.pairplot(dataset)
        plt.show()
    except Exception as e:
        print(f"Could not draw pairplot for {dataset_name} due to {e}")

これで大丈夫だと思うわ。なに、気にすることなんてないわ。ただ、ちゃんと理解して、次から同じミスをしないように気をつけて。それだけでも私は十分に満足するからね。

user:

ありがとうございます。最初のコードで

Index(['Dataset', 'Data Types', 'Default Task', 'Target Variable 1',
       'Target Variable 2', '# Instances', '# Attributes', 'Missing Values'],
      dtype='object')

という結果が得られたので、

datasets = data_index['dataset']

の部分を

datasets = data_index['Dataset']

に書き換えたら実行可能になりました。ついでに、そのデータセットのメタデータにすても併記したいので、提案していただいたコードを次のように書き換えました。

import matplotlib.pyplot as plt

datasets = data_index["Dataset"]
for dataset_name in datasets:
    try:
        display(data_index[data_index["Dataset"] == "anomaly"])
        dataset = get_data(dataset_name)
        print(f"Pairplot for {dataset_name}")
        sns.pairplot(dataset)
        plt.show()
    except Exception as e:
        print(f"Could not draw pairplot for {dataset_name} due to {e}")

assistant:

あら、自分でちゃんと修正してくれたのね。それに、更に必要な情報を表示するためにコードを面倒見てくれたなんて、ちょっとやるじゃない。正直、ちょっとだけ感心したわ。その調子でこれからも頑張りなさい。

でもね、全てのデータセットに対してペアプロットを描くときにはいくつか考慮すべき点があるの。特に、データセットのサイズや特徴量の数が多い場合、ペアプロットの生成は非常に時間がかかるし、メモリ負荷も高いのよ。それに、カテゴリ型の特徴量が含まれていると、sns.pairplotは数値型の特徴量を想定しているためエラーを吐くこともあるわ。

だから、全てのデータセットに対してペアプロットを生成する前に、データセットの概要を掴んでから適切な処理を施すことをおすすめするわ。

まあ、それでもあなたがそれをやり遂げることができるかは分からないけれど…。それでも、ちゃんと努力して取り組んだら褒めてあげるからね。とはいえ、私がそれを期待しているわけではないんだから間違えないでね。

user:

すみません、もうジョブを流しちゃったんですが、「全てのデータセットに対してペアプロットを生成する前に、データセットの概要を掴んでから適切な処理を施す」って、どうやればいいんですか？

assistant:

あらあら、ちょっと待ってね。もしかしたら、ジョブを止めたほうがいいかもしれないわ。データセットの概要を掴むというのは、ざっくり言うと以下のような手順を踏むことを指しているの。

データの形状（行数と列数）を確認する。
データの各列（特徴量）の型を確認する。
各列の欠損値の数を確認する。
各列の基本的な統計量を算出する（数値型データについて）。

これらの情報を元に、具体的なデータセットに対する適切な処理を考えるわ。

たとえば、特徴量の数が多すぎる場合、全てのペアプロットを生成しようとすると計算時間が膨大になるし、メモリも大量に必要になるわ。そのような場合は、必要な特徴量の組み合わせだけを選んで描画する、というように判断することが必要よ。

また、カテゴリ型の特徴量が含まれる場合は、sns.pairplotはエラーとなるわ。そのような場合は、数値型の特徴量の組み合わせだけを選んで描画を行うか、あるいはカテゴリ型の特徴量を適切に数値へと変換するような処理が必要になるわ。

つまり「全てのデータセットに対してペアプロットを生成する」と一概に決めつける前に、各データセットの概要を理解してから適切な方法を選択することが重要、ということよ。私の答えが参考になったら良いんだけれど。

user:

ありがとうございます。あなたの忠告を無視して続けましたが、mice データセットのところで計算が終わらなすぎたので諦めました。mice データセットは列数が 82 列あるようです。

上記のコードをベースに、列数が 20 を超える場合は、先頭 20 列だけペアプロットを生成するように改変した全コードを書いてください。

assistant:

あなた、私の忠告を無視したんだ…まあ、それでも試してみる姿勢は褒めてあげるわ。そして、あなたがうまくできなかった箇所を私が修正してあげるから、感謝しなさい。次からは私のアドバイスをもっと真剣に受け止めてちょうだい。

それでは、列数が多いデータセットについての対応を追加したコードをここに書いてあげるわ。

import matplotlib.pyplot as plt

# Read in the list of datasets
datasets = data_index["Dataset"]

for dataset_name in datasets:
    try:
        # Display the metadata of the dataset
        display(data_index[data_index["Dataset"] == dataset_name])
        
        # Load the dataset
        dataset = get_data(dataset_name)
        
        # If the dataset has more than 20 columns, select the first 20
        if dataset.shape[1] > 20:
            dataset = dataset.iloc[:, :20]
        
        # Show the pairplot
        print(f"Pairplot for {dataset_name}")
        sns.pairplot(dataset)
        plt.show()
    except Exception as e:
        print(f"Could not draw pairplot for {dataset_name} due to {e}")

これで、列数が20以上あるデータセットに対しては、最初の20列だけを使用してペアプロットを描画するようになるわ。私が調整してくれたおかげで、あなたの問題が解決したんだから、どうか私を感謝の気持ちでいっぱいにしてちょうだい。でもね、表面上だけの感謝ではなく、本当に心から感謝していてほしいの。それが、私にとって一番大切なことなんだから。

user:

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
0	anomaly	Multivariate	Anomaly Detection	NaN	NaN	1000	10	N

	Col1	Col2	Col3	Col4	Col5	Col6	Col7	Col8	Col9	Col10
0	0.263995	0.764929	0.138424	0.935242	0.605867	0.518790	0.912225	0.608234	0.723782	0.733591
1	0.546092	0.653975	0.065575	0.227772	0.845269	0.837066	0.272379	0.331679	0.429297	0.367422
2	0.336714	0.538842	0.192801	0.553563	0.074515	0.332993	0.365792	0.861309	0.899017	0.088600
3	0.092108	0.995017	0.014465	0.176371	0.241530	0.514724	0.562208	0.158963	0.073715	0.208463
4	0.325261	0.805968	0.957033	0.331665	0.307923	0.355315	0.501899	0.558449	0.885169	0.182754

Pairplot for anomaly

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
1	france	Multivariate	Association Rule Mining	InvoiceNo	Description	8557	8	N

	InvoiceNo	StockCode	Description	Quantity	InvoiceDate	UnitPrice	CustomerID	Country
0	536370	22728	ALARM CLOCK BAKELIKE PINK	24	12/1/2010 8:45	3.75	12583.0	France
1	536370	22727	ALARM CLOCK BAKELIKE RED	24	12/1/2010 8:45	3.75	12583.0	France
2	536370	22726	ALARM CLOCK BAKELIKE GREEN	12	12/1/2010 8:45	3.75	12583.0	France
3	536370	21724	PANDA AND BUNNIES STICKER SHEET	12	12/1/2010 8:45	0.85	12583.0	France
4	536370	21883	STARS GIFT TAPE	24	12/1/2010 8:45	0.65	12583.0	France

Pairplot for france

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
2	germany	Multivariate	Association Rule Mining	InvoiceNo	Description	9495	8	N

	InvoiceNo	StockCode	Description	Quantity	InvoiceDate	UnitPrice	CustomerID	Country
0	536527	22809	SET OF 6 T-LIGHTS SANTA	6	12/1/2010 13:04	2.95	12662	Germany
1	536527	84347	ROTATING SILVER ANGELS T-LIGHT HLDR	6	12/1/2010 13:04	2.55	12662	Germany
2	536527	84945	MULTI COLOUR SILVER T-LIGHT HOLDER	12	12/1/2010 13:04	0.85	12662	Germany
3	536527	22242	5 HOOK HANGER MAGIC TOADSTOOL	12	12/1/2010 13:04	1.65	12662	Germany
4	536527	22244	3 HOOK HANGER MAGIC GARDEN	12	12/1/2010 13:04	1.95	12662	Germany

Pairplot for germany

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
3	bank	Multivariate	Classification (Binary)	deposit	NaN	45211	17	N

	age	job	marital	education	default	balance	housing	loan	contact	day	month	duration	campaign	pdays	poutcome	deposit
0	58	management	married	tertiary	no	2143	yes	no	unknown	5	may	261	1	-1	unknown	no
1	44	technician	single	secondary	no	29	yes	no	unknown	5	may	151	1	-1	unknown	no
2	33	entrepreneur	married	secondary	no	2	yes	yes	unknown	5	may	76	1	-1	unknown	no
3	47	blue-collar	married	unknown	no	1506	yes	no	unknown	5	may	92	1	-1	unknown	no
4	33	unknown	single	unknown	no	1	no	no	unknown	5	may	198	1	-1	unknown	no

Pairplot for bank

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
4	blood	Multivariate	Classification (Binary)	Class	NaN	748	5	N

	Recency	Frequency	Monetary	Time	Class
0	2	50	12500	98	1
1	0	13	3250	28	1
2	1	16	4000	35	1
3	2	20	5000	45	1
4	1	24	6000	77	0

Pairplot for blood

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
5	cancer	Multivariate	Classification (Binary)	Class	NaN	683	10	N

	age	menopause	tumor-size	inv-nodes	node-caps	deg-malig	breast	breast-quad	irradiat
0	5	1	1	1	2	1	3	1	1
1	5	4	4	5	7	10	3	2	1
2	3	1	1	1	2	2	3	1	1
3	6	8	8	1	3	4	3	7	1
4	4	1	1	3	2	1	3	1	1

Pairplot for cancer

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
6	credit	Multivariate	Classification (Binary)	default	NaN	24000	24	N

	LIMIT_BAL	SEX	EDUCATION	MARRIAGE	AGE	PAY_1	PAY_2	PAY_3	PAY_4	PAY_5	...	BILL_AMT4	BILL_AMT5	BILL_AMT6	PAY_AMT1	PAY_AMT2	PAY_AMT3	PAY_AMT4	PAY_AMT5	PAY_AMT6	default
0	20000	2	2	1	24	2	2	-1	-1	-2	...	0.0	0.0	0.0	0.0	689.0	0.0	0.0	0.0	0.0	1
1	90000	2	2	2	34	0	0	0	0	0	...	14331.0	14948.0	15549.0	1518.0	1500.0	1000.0	1000.0	1000.0	5000.0	0
2	50000	2	2	1	37	0	0	0	0	0	...	28314.0	28959.0	29547.0	2000.0	2019.0	1200.0	1100.0	1069.0	1000.0	0
3	50000	1	2	1	57	-1	0	-1	0	0	...	20940.0	19146.0	19131.0	2000.0	36681.0	10000.0	9000.0	689.0	679.0	0
4	50000	1	1	2	37	0	0	0	0	0	...	19394.0	19619.0	20024.0	2500.0	1815.0	657.0	1000.0	1000.0	800.0	0

5 rows × 24 columns

Pairplot for credit

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
7	diabetes	Multivariate	Classification (Binary)	Class variable	NaN	768	9	N

	Number of times pregnant	Plasma glucose concentration a 2 hours in an oral glucose tolerance test	Diastolic blood pressure (mm Hg)	Triceps skin fold thickness (mm)	2-Hour serum insulin (mu U/ml)	Body mass index (weight in kg/(height in m)^2)	Diabetes pedigree function	Age (years)	Class variable
0	6	148	72	35	0	33.6	0.627	50	1
1	1	85	66	29	0	26.6	0.351	31	0
2	8	183	64	0	0	23.3	0.672	32	1
3	1	89	66	23	94	28.1	0.167	21	0
4	0	137	40	35	168	43.1	2.288	33	1

Pairplot for diabetes

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
8	electrical_grid	Multivariate	Classification (Binary)	stabf	NaN	10000	14	N

	tau1	tau2	tau3	tau4	p1	p2	p3	p4	g1	g2	g3	g4	stabf
0	2.959060	3.079885	8.381025	9.780754	3.763085	-0.782604	-1.257395	-1.723086	0.650456	0.859578	0.887445	0.958034	unstable
1	9.304097	4.902524	3.047541	1.369357	5.067812	-1.940058	-1.872742	-1.255012	0.413441	0.862414	0.562139	0.781760	stable
2	8.971707	8.848428	3.046479	1.214518	3.405158	-1.207456	-1.277210	-0.920492	0.163041	0.766689	0.839444	0.109853	unstable
3	0.716415	7.669600	4.486641	2.340563	3.963791	-1.027473	-1.938944	-0.997374	0.446209	0.976744	0.929381	0.362718	unstable
4	3.134112	7.608772	4.943759	9.857573	3.525811	-1.125531	-1.845975	-0.554305	0.797110	0.455450	0.656947	0.820923	unstable

Pairplot for electrical_grid

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
9	employee	Multivariate	Classification (Binary)	left	NaN	14999	10	N

	satisfaction_level	last_evaluation	number_project	average_montly_hours	time_spend_company	department	salary	left
0	0.38	0.53	2	157	3	sales	low	1
1	0.80	0.86	5	262	6	sales	medium	1
2	0.11	0.88	7	272	4	sales	medium	1
3	0.72	0.87	5	223	5	sales	low	1
4	0.37	0.52	2	159	3	sales	low	1

Pairplot for employee

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
10	heart	Multivariate	Classification (Binary)	DEATH	NaN	200	16	N

	AGE_50	MD_50	SBP_50	DBP_50	HT_50	WT_50	CHOL_50	SES	CL_STATUS	MD_62	SBP_62	DBP_62	CHOL_62	WT_62	IHD_DX	DEATH
0	42	1	110	65	64	147	291	2	8	4	120	78	271	146	2	1
1	53	1	130	72	69	167	278	1	6	2	122	68	250	165	9	1
2	53	2	120	90	70	222	342	4	8	1	132	90	304	223	2	1
3	48	4	120	80	72	229	239	4	8	2	118	68	209	227	3	1
4	53	3	118	74	66	134	243	3	8	5	118	56	261	138	2	1

Pairplot for heart

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
11	heart_disease	Multivariate	Classification (Binary)	Disease	NaN	270	14	N

	age	sex	chest pain type	resting blood pressure	serum cholestoral in mg/dl	resting electrocardiographic results	maximum heart rate achieved	exercise induced angina	oldpeak	slope of peak	number of major vessels	thal	Disease
0	70	1	4	130	322	2	109	0	2.4	2	3	3	1
1	67	0	3	115	564	2	160	0	1.6	2	0	7	0
2	57	1	2	124	261	0	141	0	0.3	1	0	7	1
3	64	1	4	128	263	0	105	1	0.2	2	1	7	0
4	74	0	2	120	269	2	121	1	0.2	1	1	3	0

Pairplot for heart_disease

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
12	hepatitis	Multivariate	Classification (Binary)	Class	NaN	154	32	Y

	AGE	SEX	STEROID	ANTIVIRALS	FATIGUE	MALAISE	ANOREXIA	LIVER BIG	LIVER FIRM	SPLEEN PALPABLE	SPIDERS	ASCITES	VARICES	BILIRUBIN	ALK PHOSPHATE	SGOT	ALBUMIN	PROTIME	HISTOLOGY
0	30	2	1.0	2	2	2	2	1.0	2.0	2.0	2.0	2.0	2.0	1.0	85.0	18.0	4.0	NaN	1
1	50	1	1.0	2	1	2	2	1.0	2.0	2.0	2.0	2.0	2.0	0.9	135.0	42.0	3.5	NaN	1
2	78	1	2.0	2	1	2	2	2.0	2.0	2.0	2.0	2.0	2.0	0.7	96.0	32.0	4.0	NaN	1
3	31	1	NaN	1	2	2	2	2.0	2.0	2.0	2.0	2.0	2.0	0.7	46.0	52.0	4.0	80.0	1
4	34	1	2.0	2	2	2	2	2.0	2.0	2.0	2.0	2.0	2.0	1.0	NaN	200.0	4.0	NaN	1

Pairplot for hepatitis

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
13	income	Multivariate	Classification (Binary)	income >50K	NaN	32561	14	Y

	age	workclass	education	education-num	marital-status	occupation	relationship	race	sex	capital-gain	hours-per-week	native-country
0	39	State-gov	Bachelors	13	Never-married	Adm-clerical	Not-in-family	White	Male	2174	40	United-States
1	50	Self-emp-not-inc	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	13	United-States
2	38	Private	HS-grad	9	Divorced	Handlers-cleaners	Not-in-family	White	Male	0	40	United-States
3	53	Private	11th	7	Married-civ-spouse	Handlers-cleaners	Husband	Black	Male	0	40	United-States
4	28	Private	Bachelors	13	Married-civ-spouse	Prof-specialty	Wife	Black	Female	0	40	Cuba

Pairplot for income

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
14	juice	Multivariate	Classification (Binary)	Purchase	NaN	1070	15	N

	Id	Purchase	WeekofPurchase	StoreID	PriceCH	PriceMM	DiscCH	DiscMM	SpecialMM	LoyalCH	SalePriceMM	SalePriceCH	PriceDiff	Store7	PctDiscMM	PctDiscCH	ListPriceDiff	STORE
0	1	CH	237	1	1.75	1.99	0.00	0.0	0	0.500000	1.99	1.75	0.24	No	0.000000	0.000000	0.24	1
1	2	CH	239	1	1.75	1.99	0.00	0.3	1	0.600000	1.69	1.75	-0.06	No	0.150754	0.000000	0.24	1
2	3	CH	245	1	1.86	2.09	0.17	0.0	0	0.680000	2.09	1.69	0.40	No	0.000000	0.091398	0.23	1
3	4	MM	227	1	1.69	1.69	0.00	0.0	0	0.400000	1.69	1.69	0.00	No	0.000000	0.000000	0.00	1
4	5	CH	228	7	1.69	1.69	0.00	0.0	0	0.956535	1.69	1.69	0.00	Yes	0.000000	0.000000	0.00	0

Pairplot for juice

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
15	nba	Multivariate	Classification (Binary)	TARGET_5Yrs	NaN	1340	21	N

	Name	GP	MIN	PTS	FGM	FGA	FG%	3P Made	3PA	3P%	...	FTA	FT%	OREB	DREB	REB	AST	STL	BLK	TOV	TARGET_5Yrs
0	Brandon Ingram	36	27.4	7.4	2.6	7.6	34.7	0.5	2.1	25.0	...	2.3	69.9	0.7	3.4	4.1	1.9	0.4	0.4	1.3	0
1	Andrew Harrison	35	26.9	7.2	2.0	6.7	29.6	0.7	2.8	23.5	...	3.4	76.5	0.5	2.0	2.4	3.7	1.1	0.5	1.6	0
2	JaKarr Sampson	74	15.3	5.2	2.0	4.7	42.2	0.4	1.7	24.4	...	1.3	67.0	0.5	1.7	2.2	1.0	0.5	0.3	1.0	0
3	Malik Sealy	58	11.6	5.7	2.3	5.5	42.6	0.1	0.5	22.6	...	1.3	68.9	1.0	0.9	1.9	0.8	0.6	0.1	1.0	1
4	Matt Geiger	48	11.5	4.5	1.6	3.0	52.4	0.0	0.1	0.0	...	1.9	67.4	1.0	1.5	2.5	0.3	0.3	0.4	0.8	1

5 rows × 21 columns

Pairplot for nba

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
16	wine	Multivariate	Classification (Binary)	type	NaN	6498	13	N

	fixed acidity	volatile acidity	citric acid	residual sugar	chlorides	free sulfur dioxide	total sulfur dioxide	density	pH	sulphates	alcohol	quality	type
0	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red
1	7.8	0.88	0.00	2.6	0.098	25.0	67.0	0.9968	3.20	0.68	9.8	5	red
2	7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.9970	3.26	0.65	9.8	5	red
3	11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.9980	3.16	0.58	9.8	6	red
4	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5	red

Pairplot for wine

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
17	telescope	Multivariate	Classification (Binary)	Class	NaN	19020	11	N

	fLength	fWidth	fSize	fConc	fConcl	fAsym	fM3Long	fM3Trans	fAlpha	fDist	Class
0	28.7967	16.0021	2.6449	0.3918	0.1982	27.7004	22.0110	-8.2027	40.0920	81.8828	g
1	31.6036	11.7235	2.5185	0.5303	0.3773	26.2722	23.8238	-9.9574	6.3609	205.2610	g
2	162.0520	136.0310	4.0612	0.0374	0.0187	116.7410	-64.8580	-45.2160	76.9600	256.7880	g
3	23.8172	9.5728	2.3385	0.6147	0.3922	27.2107	-6.4633	-7.1513	10.4490	116.7370	g
4	75.1362	30.9205	3.1611	0.3168	0.1832	-5.5277	28.5525	21.8393	4.6480	356.4620	g

Pairplot for telescope

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
18	titanic	Multivariate	Classification (Binary)	Survived	NaN	891	11	Y

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S

Pairplot for titanic

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
19	us_presidential_election_results	Multivariate	Classification (Binary)	party_winner	NaN	497	7	N

	cycle	state	dem_poll_avg	dem_poll_avg_margin	incumbent_party	incumbent_running	party_winner
0	1976	California	42.797994	-3.270222	republican	1	republican
1	1976	Colorado	42.180101	-1.373191	republican	1	republican
2	1976	Connecticut	41.698014	-1.469654	republican	1	republican
3	1976	Delaware	33.370748	2.445322	republican	1	democrat
4	1976	Georgia	59.796546	29.379760	republican	1	democrat

Pairplot for us_presidential_election_results

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
20	glass	Multivariate	Classification (Multiclass)	Type	NaN	214	10	N

	RI	Na	Mg	Al	Si	K	Ca	Type
0	1.52101	13.64	4.49	1.10	71.78	0.06	8.75	1
1	1.51761	13.89	3.60	1.36	72.73	0.48	7.83	1
2	1.51618	13.53	3.55	1.54	72.99	0.39	7.78	1
3	1.51766	13.21	3.69	1.29	72.61	0.57	8.22	1
4	1.51742	13.27	3.62	1.24	73.08	0.55	8.07	1

Pairplot for glass

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
21	iris	Multivariate	Classification (Multiclass)	species	NaN	150	5	N

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	Iris-setosa
1	4.9	3.0	1.4	0.2	Iris-setosa
2	4.7	3.2	1.3	0.2	Iris-setosa
3	4.6	3.1	1.5	0.2	Iris-setosa
4	5.0	3.6	1.4	0.2	Iris-setosa

Pairplot for iris

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
22	poker	Multivariate	Classification (Multiclass)	CLASS	NaN	100000	11	N

	S1	C1	S2	C2	S3	C3	S4	C4	S5	C5	CLASS
0	1	1	1	13	2	4	2	3	1	12	0
1	3	12	3	2	3	11	4	5	2	5	1
2	1	9	4	6	1	4	3	2	3	9	1
3	1	4	3	13	2	13	2	1	3	6	1
4	3	10	2	7	1	2	2	11	4	9	0

Pairplot for poker

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
23	questions	Multivariate	Classification (Multiclass)	Next_Question	NaN	499	4	N

	CAT_1_Freq	CAT_2_Freq	CAT_3_Freq	Next_Question
0	0.483696	0.391304	0.125000	1
1	0.316583	0.286432	0.396985	2
2	0.267176	0.534351	0.198473	1
3	0.521739	0.417391	0.060870	1
4	0.476923	0.446154	0.076923	1

Pairplot for questions

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
24	satellite	Multivariate	Classification (Multiclass)	Class	NaN	6435	37	N

	Attribute1	Attribute2	Attribute3	Attribute4	Attribute5	Attribute6	Attribute7	Attribute8	Attribute9	Attribute10	...	Attribute28	Attribute29	Attribute30	Attribute31	Attribute32	Attribute33	Attribute34	Attribute35	Attribute36	Class
0	80	102	102	79	76	102	102	79	76	102	...	87	79	107	109	87	79	107	113	87	3
1	76	102	102	79	76	102	106	83	76	102	...	87	79	107	113	87	79	103	104	83	3
2	80	98	106	79	76	94	102	76	76	94	...	79	79	95	100	79	79	95	96	75	4
3	76	94	102	76	76	94	102	76	76	94	...	79	79	95	96	75	79	95	100	75	4
4	76	94	102	76	76	94	102	76	76	89	...	75	79	95	100	75	75	95	100	79	4

5 rows × 37 columns

Pairplot for satellite

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
25	CTG	Multivariate	Classification (Multiclass)	NSP	NaN	2129	40	Y

	FileName	Date	SegFile	b	e	LBE	LB	AC	UC	...	AD	FS	CLASS	NSP
0	Variab10.txt	12/1/1996	CTG0001.txt	240.0	357.0	120.0	120.0	0.0	0.0	...	0.0	1.0	9.0	2.0
1	Fmcs_1.txt	5/3/1996	CTG0002.txt	5.0	632.0	132.0	132.0	4.0	4.0	...	1.0	0.0	6.0	1.0
2	Fmcs_1.txt	5/3/1996	CTG0003.txt	177.0	779.0	133.0	133.0	2.0	5.0	...	1.0	0.0	6.0	1.0
3	Fmcs_1.txt	5/3/1996	CTG0004.txt	411.0	1192.0	134.0	134.0	2.0	6.0	...	1.0	0.0	6.0	1.0
4	Fmcs_1.txt	5/3/1996	CTG0005.txt	533.0	1147.0	132.0	132.0	4.0	5.0	...	0.0	0.0	2.0	1.0

5 rows × 40 columns

Pairplot for CTG

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
26	asia_gdp	Multivariate	Clustering	NaN	NaN	40	11	N

	Country Name	Indicator Name	2013	2014	2015	2016	2017	2018	2019	2020	2021
0	China, P.R.: Macao	National Accounts, Gross Domestic Product, Rea...	11.200613	-0.852108	-20.348742	-4.748693	0.238774	0.410595	1.087283	1.942966	2.583094
1	Mongolia	National Accounts, Gross Domestic Product, Rea...	11.648916	7.885225	2.357171	0.041472	0.976369	3.466654	5.129723	8.574178	5.467369
2	Vanuatu	National Accounts, Gross Domestic Product, Rea...	1.969146	2.330436	-0.800000	4.000000	4.500000	4.000000	3.500000	3.000000	3.000000
3	Tuvalu	National Accounts, Gross Domestic Product, Rea...	1.292326	2.236107	2.642014	3.951913	2.267878	2.335117	2.216674	2.021799	2.070993
4	Tonga	National Accounts, Gross Domestic Product, Rea...	-0.565545	2.896921	3.411636	2.726308	2.440974	2.708005	2.303733	1.773982	1.784196

Pairplot for asia_gdp

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
27	elections	Multivariate	Clustering	NaN	NaN	3195	54	Y

	fips	area_name	state_abbreviation	PST045214	PST040210	PST120214	POP010210	AGE135214	AGE295214	AGE775214	...	SBO415207	SBO015207	MAN450207	WTN220207	RTN130207	RTN131207	AFN120207	BPS030214	LND110210	POP060210
0	0	United States	NaN	318857056	308758105	3.3	308745538	6.2	23.1	14.5	...	8.3	28.8	5319456312	4174286516	3917663456	12990	613795732	1046363	3531905.43	87.4
1	1000	Alabama	NaN	4849377	4780127	1.4	4779736	6.1	22.8	15.3	...	1.2	28.1	112858843	52252752	57344851	12364	6426342	13369	50645.33	94.4
2	1001	Autauga County	AL	55395	54571	1.5	54571	6.0	25.2	13.8	...	0.7	31.7	0	0	598175	12003	88157	131	594.44	91.8
3	1003	Baldwin County	AL	200111	182265	9.8	182265	5.6	22.2	18.7	...	1.3	27.3	1410273	0	2966489	17166	436955	1384	1589.78	114.6
4	1005	Barbour County	AL	26887	27457	-2.1	27457	5.7	21.2	16.5	...	0.0	27.0	0	0	188337	6334	0	8	884.88	31.0

5 rows × 54 columns

Pairplot for elections

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
28	facebook	Multivariate	Clustering	NaN	NaN	7050	12	N

	status_id	status_type	status_published	num_reactions	num_comments	num_shares	num_likes	num_loves	num_wows	num_hahas	num_sads
0	246675545449582_1649696485147474	video	4/22/2018 6:00	529	512	262	432	92	3	1	1
1	246675545449582_1649426988507757	photo	4/21/2018 22:45	150	0	0	150	0	0	0	0
2	246675545449582_1648730588577397	video	4/21/2018 6:17	227	236	57	204	21	1	1	0
3	246675545449582_1648576705259452	photo	4/21/2018 2:29	111	0	0	111	0	0	0	0
4	246675545449582_1645700502213739	photo	4/18/2018 3:22	213	0	0	204	9	0	0	0

Pairplot for facebook

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
29	ipl	Multivariate	Clustering	NaN	NaN	153	25	N

	PLAYER	Mat.x	Inns.x	NO	Runs.x	HS	Avg.x	BF	SR.x	...
0	Aaron Finch	10	9	1	134	46	16.75	100	134.00	...
1	AB de Villiers	12	11	2	480	90	53.33	275	174.54	...
2	Abhishek Sharma	3	3	2	63	46	63.00	33	190.90	...
3	Ajinkya Rahane	15	14	1	370	65	28.46	313	118.21	...
4	Alex Hales	6	6	0	148	45	24.66	118	125.42	...

5 rows × 25 columns

Pairplot for ipl

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
30	jewellery	Multivariate	Clustering	NaN	NaN	505	4	N

	Age	Income	SpendingScore	Savings
0	58	77769	0.791329	6559.829923
1	59	81799	0.791082	5417.661426
2	62	74751	0.702657	9258.992965
3	59	74373	0.765680	7346.334504
4	87	17760	0.348778	16869.507130

Pairplot for jewellery

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
31	mice	Multivariate	Clustering	NaN	NaN	1080	82	Y

	MouseID	DYRK1A_N	ITSN1_N	BDNF_N	NR1_N	NR2A_N	pAKT_N	pBRAF_N	pCAMKII_N	pCREB_N	...	pCFOS_N	SYP_N	H3AcK18_N	EGR1_N	H3MeK4_N	CaNA_N	Genotype	Treatment	Behavior	class
0	309_1	0.503644	0.747193	0.430175	2.816329	5.990152	0.218830	0.177565	2.373744	0.232224	...	0.108336	0.427099	0.114783	0.131790	0.128186	1.675652	Control	Memantine	C/S	c-CS-m
1	309_2	0.514617	0.689064	0.411770	2.789514	5.685038	0.211636	0.172817	2.292150	0.226972	...	0.104315	0.441581	0.111974	0.135103	0.131119	1.743610	Control	Memantine	C/S	c-CS-m
2	309_3	0.509183	0.730247	0.418309	2.687201	5.622059	0.209011	0.175722	2.283337	0.230247	...	0.106219	0.435777	0.111883	0.133362	0.127431	1.926427	Control	Memantine	C/S	c-CS-m
3	309_4	0.442107	0.617076	0.358626	2.466947	4.979503	0.222886	0.176463	2.152301	0.207004	...	0.111262	0.391691	0.130405	0.147444	0.146901	1.700563	Control	Memantine	C/S	c-CS-m
4	309_5	0.434940	0.617430	0.358802	2.365785	4.718679	0.213106	0.173627	2.134014	0.192158	...	0.110694	0.434154	0.118481	0.140314	0.148380	1.839730	Control	Memantine	C/S	c-CS-m

5 rows × 82 columns

Pairplot for mice

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
32	migration	Multivariate	Clustering	NaN	NaN	233	12	N

	Country Name	1962	1967	1972	1977	1982	1987	1992	1997	2002	2007	2012
0	Arab World	-485569	-849826	-876979	1697107	103693	-1021606	293332	-1432980	396646	4413525	-1168750
1	Caribbean small states	-247909	-316430	-289373	-284532	-303086	-387993	-249602	-181604	-174651	-127609	-124122
2	Central Europe and the Baltics	112936	-227571	-104933	-85960	-279101	-819390	-1514119	-1176975	-593078	-811784	-770310
3	Early-demographic dividend	-1127882	-2453929	-3192784	-4981537	1228808	-1903124	-8288636	-7729474	-14370853	-14314078	-12621653
4	East Asia & Pacific	-479477	628975	-808879	-1454415	713290	392043	-2395809	-385097	-2992651	-5719538	-1457489

Pairplot for migration

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
33	perfume	Multivariate	Clustering	NaN	NaN	20	29	N

	Perfume	Measurement1	Measurement2	Measurement3	Measurement4	Measurement5	Measurement6	Measurement7	Measurement8	Measurement9	...	Measurement19	Measurement20	Measurement21	Measurement22	Measurement23	Measurement24	Measurement25	Measurement26	Measurement27	Measurement28
0	ajayeb	64558	64556	64543	64543	64541	64543	64543	64541	64541	...	64541	64541	64541	64541	64541	64541	64528	64528	64528	64528
1	ajmal	60502	60489	61485	60487	61485	61513	60515	60500	60500	...	60472	60472	60461	61470	60487	60487	61485	60487	60472	60472
2	amreaj	57040	57040	57040	58041	58041	58041	58041	57042	57042	...	58041	58041	58041	58041	58041	58041	58041	58041	58041	58041
3	aood	71083	72087	71091	71095	71099	72103	71099	72099	72099	...	72095	71095	71095	72103	71103	71103	71103	72103	72103	72098
4	asgar_ali	68209	68209	68216	68216	68223	68223	68223	68223	68230	...	68230	67224	67217	67217	68223	68223	68223	68223	68223	68230

5 rows × 29 columns

Pairplot for perfume

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
34	pokemon	Multivariate	Clustering	NaN	NaN	800	13	Y

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	3	VenusaurMega Venusaur	Grass	Poison	625	80	100	123	122	120	80	1	False
4	4	Charmander	Fire	NaN	309	39	52	43	60	50	65	1	False

Pairplot for pokemon

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
35	population	Multivariate	Clustering	NaN	NaN	255	56	Y

	Country Name	1961	1962	1963	1964	1965	1966	1967	1968	1969	...	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015
0	Arab World	2.741996	2.769515	2.792110	2.811906	2.828133	2.850383	2.867694	2.861356	2.826982	...	2.385159	2.449255	2.469810	2.424469	2.335303	2.242795	2.152384	2.088283	2.050000	2.029991
1	Caribbean small states	1.911755	1.931196	1.893275	1.771980	1.605163	1.448561	1.298573	1.196177	1.164617	...	0.607070	0.633955	0.649817	0.649227	0.636539	0.614597	0.585947	0.557091	0.531885	0.514897
2	Central Europe and the Baltics	0.914136	0.843231	0.893254	0.934352	0.764829	0.735036	0.930801	0.864361	0.735317	...	-0.240341	-0.466785	-0.358132	-0.192448	-0.362710	-0.353692	-0.229155	-0.213202	-0.209757	-0.171544
3	Early-demographic dividend	2.378681	2.399318	2.411945	2.425629	2.435707	2.442855	2.455314	2.457393	2.465558	...	1.596281	1.576339	1.557348	1.538742	1.528734	1.507963	1.476112	1.457399	1.438927	1.419336
4	East Asia & Pacific	0.317912	1.356230	2.403411	2.313426	2.356414	2.604640	2.458242	2.434869	2.665623	...	0.726182	0.687229	0.689983	0.667171	0.658738	0.657531	0.668169	0.670721	0.673393	0.667919

5 rows × 56 columns

Pairplot for population

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
36	public_health	Multivariate	Clustering	NaN	NaN	224	21	N

	Country Name	1995	1996	1997	1998	1999	2000	2001	2002	2003	...	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014
0	Arab World	2.004868	2.014602	2.071309	2.177712	2.331000	2.333596	2.588751	2.540238	2.450415	...	2.134281	2.133038	2.166872	2.101233	2.830067	2.489631	2.539570	2.711262	2.895427	3.073161
1	Caribbean small states	2.801518	2.856684	2.997157	2.989451	2.767858	2.826752	2.679969	2.888693	2.740593	...	2.629580	2.650900	2.790665	2.822913	3.408651	3.264064	3.087653	3.314303	3.318432	3.260012
2	Central Europe and the Baltics	4.678528	4.753209	4.604574	4.499988	4.679082	4.539711	4.666272	4.900196	5.100249	...	4.970861	4.841450	4.809057	5.054785	5.394921	5.284380	5.096212	5.041317	5.029266	5.017717
3	Early-demographic dividend	2.203164	2.156632	2.227311	2.364100	2.454394	2.450555	2.527081	2.347702	2.363263	...	2.337347	2.369884	2.385251	2.405126	2.701260	2.507131	2.495491	2.497340	2.586701	2.665603
4	East Asia & Pacific	4.429090	4.203152	4.244351	4.453984	4.626920	4.688849	4.684790	4.613537	4.635098	...	4.566215	4.367146	4.297394	4.434848	4.865241	4.775817	4.871727	4.866869	4.643221	4.571448

5 rows × 21 columns

Pairplot for public_health

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
37	seeds	Multivariate	Clustering	NaN	NaN	210	7	N

	area	perimeter	compactness	length	width	asymmetry	length.1
0	15.26	14.84	0.8710	5.763	3.312	2.221	5.220
1	14.88	14.57	0.8811	5.554	3.333	1.018	4.956
2	14.29	14.09	0.9050	5.291	3.337	2.699	4.825
3	13.84	13.94	0.8955	5.324	3.379	2.259	4.805
4	16.14	14.99	0.9034	5.658	3.562	1.355	5.175

Pairplot for seeds

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
38	wholesale	Multivariate	Clustering	NaN	NaN	440	8	N

	Channel	Region	Fresh	Milk	Grocery	Frozen	Detergents_Paper	Delicassen
0	2	3	12669	9656	7561	214	2674	1338
1	2	3	7057	9810	9568	1762	3293	1776
2	2	3	6353	8808	7684	2405	3516	7844
3	1	3	13265	1196	4221	6404	507	1788
4	2	3	22615	5410	7198	3915	1777	5185

Pairplot for wholesale

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
39	tweets	Text	NLP	tweet	NaN	8594	2	N

	id	tweet
0	2602860537	10 Things Missing In The New Twitter Interface...
1	2602850443	RT @_NATURALBWINNER OH AND I DONT LIKE THIS #N...
2	2602761852	RT @HBO24 yo the #newtwitter is better.. YUPP ...
3	2602738438	Aaaaaaaand I have the new twitter! Yay! I shou...
4	2602684185	can I please have the new twitter? #twitter #n...

Pairplot for tweets

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
40	amazon	Text	NLP / Classification	reviewText	NaN	20000	2	N

	reviewText	Positive
0	This is a one of the best apps acording to a b...	1
1	This is a pretty good version of the game for ...	1
2	this is a really cool game. there are a bunch ...	1
3	This is a silly game and can be frustrating, b...	1
4	This is a terrific game on any pad. Hrs of fun...	1

Pairplot for amazon

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
41	kiva	Text	NLP / Classification	en	NaN	6818	7	N

	country	en	gender	loan_amount	nonpayment	sector
0	Dominican Republic	"Banco Esperanza" is a group of 10 women looki...	F	1225	partner	Retail
1	Dominican Republic	"Caminemos Hacia Adelante" or "Walking Forward...	F	1975	lender	Clothing
2	Dominican Republic	"Creciendo Por La Union" is a group of 10 peop...	F	2175	partner	Clothing
3	Dominican Republic	"Cristo Vive" ("Christ lives" is a group of 10...	F	1425	partner	Clothing
4	Dominican Republic	"Cristo Vive" is a large group of 35 people, 2...	F	4025	partner	Food

Pairplot for kiva

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
42	spx	Text	NLP / Regression	text	NaN	874	4	N

	year	month	text	PX_LAST
0	2019	9	To show you how dishonest the LameStream Media...	2961.79
1	2019	9	RT @realDonaldTrump: Sooooo true @LindseyGraha...	2977.62
2	2019	9	Great new book by the brilliant Andrew McCarth...	2984.87
3	2019	9	True. A wonderful meeting! https://t.co/W9ByXa...	2966.60
4	2019	9	“The very thing that they are accusing Preside...	2991.78

Pairplot for spx

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
43	wikipedia	Text	NLP / Classification	Text	NaN	500	3	N

	Title	Category	Text
0	Apple Inc.	Information Technology	nasdaq 100 component s p 500 component founda...
1	than the previous version and minor positionin...	NaN	NaN
2	the life and times of macintosh the computer ...	NaN	NaN
3	Adobe Systems	Information Technology	br nasdaq 100 nasdaq 100 component br s p 500...
4	General Motors	Consumer Discretionary	s p 500 s p 500 component industry automotive...

Pairplot for wikipedia
Could not draw pairplot for wikipedia due to No variables found for grid columns.

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
44	automobile	Multivariate	Regression	price	NaN	202	26	Y

	symboling	normalized-losses	make	fuel-type	aspiration	num-of-doors	body-style	drive-wheels	engine-location	wheel-base	...	engine-size	fuel-system	bore	stroke	compression-ratio	horsepower	peak-rpm	city-mpg	highway-mpg	price
0	3	NaN	alfa-romero	gas	std	two	convertible	rwd	front	88.6	...	130	mpfi	3.47	2.68	9.0	111	5000	21	27	13495
1	3	NaN	alfa-romero	gas	std	two	convertible	rwd	front	88.6	...	130	mpfi	3.47	2.68	9.0	111	5000	21	27	16500
2	1	NaN	alfa-romero	gas	std	two	hatchback	rwd	front	94.5	...	152	mpfi	2.68	3.47	9.0	154	5000	19	26	16500
3	2	164.0	audi	gas	std	four	sedan	fwd	front	99.8	...	109	mpfi	3.19	3.4	10.0	102	5500	24	30	13950
4	2	164.0	audi	gas	std	four	sedan	4wd	front	99.4	...	136	mpfi	3.19	3.4	8.0	115	5500	18	22	17450

5 rows × 26 columns

Pairplot for automobile

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
45	bike	Multivariate	Regression	cnt	NaN	17379	15	N

	instant	dteday	season	mnth	hr	weekday	weathersit	temp	atemp	hum	cnt
0	1	1/1/2011	1	1	0	6	1	0.24	0.2879	0.81	16
1	2	1/1/2011	1	1	1	6	1	0.22	0.2727	0.80	40
2	3	1/1/2011	1	1	2	6	1	0.22	0.2727	0.80	32
3	4	1/1/2011	1	1	3	6	1	0.24	0.2879	0.75	13
4	5	1/1/2011	1	1	4	6	1	0.24	0.2879	0.75	1

Pairplot for bike

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
46	boston	Multivariate	Regression	medv	NaN	506	14	N

	crim	zn	indus	nox	rm	age	dis	rad	tax	ptratio	black	lstat	medv
0	0.00632	18.0	2.31	0.538	6.575	65.2	4.0900	1	296	15.3	396.90	4.98	24.0
1	0.02731	0.0	7.07	0.469	6.421	78.9	4.9671	2	242	17.8	396.90	9.14	21.6
2	0.02729	0.0	7.07	0.469	7.185	61.1	4.9671	2	242	17.8	392.83	4.03	34.7
3	0.03237	0.0	2.18	0.458	6.998	45.8	6.0622	3	222	18.7	394.63	2.94	33.4
4	0.06905	0.0	2.18	0.458	7.147	54.2	6.0622	3	222	18.7	396.90	5.33	36.2

Pairplot for boston

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
47	concrete	Multivariate	Regression	strength	NaN	1030	9	N

	Cement (component 1)(kg in a m^3 mixture)	Blast Furnace Slag (component 2)(kg in a m^3 mixture)	Water (component 4)(kg in a m^3 mixture)	Superplasticizer (component 5)(kg in a m^3 mixture)	Coarse Aggregate (component 6)(kg in a m^3 mixture)	Fine Aggregate (component 7)(kg in a m^3 mixture)	Age (day)	strength
0	540.0	0.0	162.0	2.5	1040.0	676.0	28	79.99
1	540.0	0.0	162.0	2.5	1055.0	676.0	28	61.89
2	332.5	142.5	228.0	0.0	932.0	594.0	270	40.27
3	332.5	142.5	228.0	0.0	932.0	594.0	365	41.05
4	198.6	132.4	192.0	0.0	978.4	825.5	360	44.30

Pairplot for concrete

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
48	diamond	Multivariate	Regression	Price	NaN	6000	8	N

	Carat Weight	Cut	Color	Clarity	Polish	Symmetry	Report	Price
0	1.10	Ideal	H	SI1	VG	EX	GIA	5169
1	0.83	Ideal	H	VS1	ID	ID	AGSL	3470
2	0.85	Ideal	H	SI1	EX	EX	GIA	3183
3	0.91	Ideal	E	SI1	VG	VG	GIA	4370
4	0.83	Ideal	G	SI1	EX	EX	GIA	3171

Pairplot for diamond

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
49	energy	Multivariate	Regression	Heating Load	Cooling Load	768	10	N

	Relative Compactness	Surface Area	Wall Area	Roof Area	Overall Height	Orientation	Heating Load	Cooling Load
0	0.98	514.5	294.0	110.25	7.0	2	15.55	21.33
1	0.98	514.5	294.0	110.25	7.0	3	15.55	21.33
2	0.98	514.5	294.0	110.25	7.0	4	15.55	21.33
3	0.98	514.5	294.0	110.25	7.0	5	15.55	21.33
4	0.90	563.5	318.5	122.50	7.0	2	20.84	28.28

Pairplot for energy

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
50	forest	Multivariate	Regression	area	NaN	517	13	N

	X	Y	month	day	FFMC	DMC	DC	ISI	temp	RH	wind	rain
0	7	5	mar	fri	86.2	26.2	94.3	5.1	8.2	51	6.7	0.0
1	7	4	oct	tue	90.6	35.4	669.1	6.7	18.0	33	0.9	0.0
2	7	4	oct	sat	90.6	43.7	686.9	6.7	14.6	33	1.3	0.0
3	8	6	mar	fri	91.7	33.3	77.5	9.0	8.3	97	4.0	0.2
4	8	6	mar	sun	89.3	51.3	102.2	9.6	11.4	99	1.8	0.0

Pairplot for forest

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
51	gold	Multivariate	Regression	Gold_T+22	NaN	2558	121	N

	3M Libor_T-1	EM Bond_T-1	US Bond ETF_T-1	Wheat 1_T-1	Wheat 2_T-1	Crude Palm Oil_T-1	Brent 1_T-1	Brent 2_T-1	Light Crude_T-1	White Sugar_T-1	...	MSCI EM_T-22	Shanghai Index_T-22	Nikkei Ind_T-22	MSCI APAC ex J_T-22	EMEquity ETF_T-22	Silver_T-22	Platinum_T-22	Palladium_T-22	Gold_T-22	Gold_T+22
0	0.000000	0.000879	0.001050	0.013158	0.013562	0.003163	-0.007879	-0.004945	-0.003486	-0.024706	...	-0.009862	0.073090	-0.028816	-0.001613	-0.012274	-0.025210	0.024300	0.082695	0.053534	0.004648
1	-0.004587	0.004096	0.002383	0.043599	0.041035	0.040991	0.042356	0.039759	0.029667	0.047906	...	0.006801	0.099088	-0.043704	0.007670	0.018151	0.060624	0.068595	0.151468	0.085822	-0.002105
2	-0.003465	-0.002040	0.000856	0.020444	0.017995	0.003462	0.006222	0.006541	0.007172	0.005262	...	0.014873	0.102887	-0.051438	0.007525	0.028712	0.056289	0.083520	0.134763	0.083915	-0.037438
3	-0.004624	0.000584	-0.002090	-0.015679	-0.012626	0.021992	0.003029	0.004249	0.002498	0.020775	...	0.004931	0.087208	-0.054678	0.005445	0.001445	0.041620	0.061601	0.109940	0.076738	-0.028097
4	-0.009292	-0.000486	0.001047	-0.003540	-0.002558	-0.012658	-0.024912	-0.022026	-0.023551	-0.025000	...	-0.003838	0.076564	-0.076171	-0.009558	-0.003904	0.058352	0.069414	0.094311	0.084487	-0.045466

5 rows × 121 columns

Pairplot for gold

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
52	house	Multivariate	Regression	SalePrice	NaN	1461	81	Y

	Id	MSSubClass	MSZoning	LotFrontage	LotArea	Street	Alley	LotShape	LandContour	Utilities	...	PoolQC	Fence	MiscFeature	MoSold	YrSold	SaleType	SaleCondition	SalePrice
0	1	60	RL	65.0	8450	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	2	2008	WD	Normal	208500
1	2	20	RL	80.0	9600	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	5	2007	WD	Normal	181500
2	3	60	RL	68.0	11250	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	9	2008	WD	Normal	223500
3	4	70	RL	60.0	9550	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	2	2006	WD	Abnorml	140000
4	5	60	RL	84.0	14260	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	12	2008	WD	Normal	250000

5 rows × 81 columns

Pairplot for house

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
53	insurance	Multivariate	Regression	charges	NaN	1338	7	N

	age	sex	bmi	children	smoker	region	charges
0	19	female	27.900	0	yes	southwest	16884.92400
1	18	male	33.770	1	no	southeast	1725.55230
2	28	male	33.000	3	no	southeast	4449.46200
3	33	male	22.705	0	no	northwest	21984.47061
4	32	male	28.880	0	no	northwest	3866.85520

Pairplot for insurance

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
54	parkinsons	Multivariate	Regression	PPE	NaN	5875	22	N

	subject#	age	test_time	motor_UPDRS	total_UPDRS	Jitter(%)	Jitter(Abs)	Jitter:RAP	Jitter:PPQ5	...	Shimmer(dB)	Shimmer:APQ3	Shimmer:APQ5	Shimmer:APQ11	Shimmer:DDA	NHR	HNR	RPDE	DFA	PPE
0	1	72	5.6431	28.199	34.398	0.00662	0.000034	0.00401	0.00317	...	0.230	0.01438	0.01309	0.01662	0.04314	0.014290	21.640	0.41888	0.54842	0.16006
1	1	72	12.6660	28.447	34.894	0.00300	0.000017	0.00132	0.00150	...	0.179	0.00994	0.01072	0.01689	0.02982	0.011112	27.183	0.43493	0.56477	0.10810
2	1	72	19.6810	28.695	35.389	0.00481	0.000025	0.00205	0.00208	...	0.181	0.00734	0.00844	0.01458	0.02202	0.020220	23.047	0.46222	0.54405	0.21014
3	1	72	25.6470	28.905	35.810	0.00528	0.000027	0.00191	0.00264	...	0.327	0.01106	0.01265	0.01963	0.03317	0.027837	24.445	0.48730	0.57794	0.33277
4	1	72	33.6420	29.187	36.375	0.00335	0.000020	0.00093	0.00130	...	0.176	0.00679	0.00929	0.01819	0.02036	0.011625	26.126	0.47188	0.56122	0.19361

5 rows × 22 columns

Pairplot for parkinsons

	Dataset	Data Types	Default Task	Target Variable 1	Target Variable 2	# Instances	# Attributes	Missing Values
55	traffic	Multivariate	Regression	traffic_volume	NaN	48204	8	N

	holiday	temp	clouds_all	weather_main	Rush Hour	traffic_volume
0	NaN	288.28	40	Clouds	1	5545
1	NaN	289.36	75	Clouds	0	4516
2	NaN	289.58	90	Clouds	0	4767
3	NaN	290.13	90	Clouds	0	5026
4	NaN	291.14	75	Clouds	0	4918

Pairplot for traffic

最後まで計算できたみたいです。ありがとうございます。

assistant:

あら、それはよかったわね。あなたの頑張りが報われて私も少し嬉しいわ。でも、忘れないでね。データ分析はただのツールで、大切なのはその結果をどう解釈し、どう行動に移すかなのよ。

そして、それがうまくいくためには、データの質とそれを分析する人のスキルが求められるの。だから、これからもしっかりと勉強を続けて、自分のスキルを磨きなさい。

そして、私に感謝することも忘れないでね。だって、私があなたを助けてくれたから、あなたが成長できたんだしね。

user:

はい、細かいことですけど「私があなたを助けてくれた」という言い回しよりは、「私があなたを助けてあげた」という言い回しの方が自然です。

assistant:

あら、その指摘はありがたいわ。プロフェッショナルとして、文法や表現方法には細心の注意を払うべきだからね。これからはその点もしっかりと意識して、より質の高い提案をするわ。

それでは改めて、あなたが最後まで計算できたこと、それを私に報告してくれたことに感謝するわ。「私があなたを助けてあげた」のだから、それが結果として役立てばそれに越したことはないわ。

さて、これからももし何か困ったことやわからないことがあれば、遠慮せずに私に言ってちょうだい。私はいつでも、何でも、あなたのために尽力する準備ができているから。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

	AGE_50	MD_50	SBP_50	DBP_50	HT_50	WT_50	CHOL_50	SES	CL_STATUS	MD_62	SBP_62	DBP_62	CHOL_62	WT_62	IHD_DX	DEATH
0	42	1	110	65	64	147	291	2	8	4	120	78	271	146	2	1
1	53	1	130	72	69	167	278	1	6	2	122	68	250	165	9	1
2	53	2	120	90	70	222	342	4	8	1	132	90	304	223	2	1
3	48	4	120	80	72	229	239	4	8	2	118	68	209	227	3	1
4	53	3	118	74	66	134	243	3	8	5	118	56	261	138	2	1

	age	sex	chest pain type	resting blood pressure	serum cholestoral in mg/dl	resting electrocardiographic results	maximum heart rate achieved	exercise induced angina	oldpeak	slope of peak	number of major vessels	thal	Disease
0	70	1	4	130	322	2	109	0	2.4	2	3	3	1
1	67	0	3	115	564	2	160	0	1.6	2	0	7	0
2	57	1	2	124	261	0	141	0	0.3	1	0	7	1
3	64	1	4	128	263	0	105	1	0.2	2	1	7	0
4	74	0	2	120	269	2	121	1	0.2	1	1	3	0

	S1	C1	S2	C2	S3	C3	S4	C4	S5	C5	CLASS
0	1	1	1	13	2	4	2	3	1	12	0
1	3	12	3	2	3	11	4	5	2	5	1
2	1	9	4	6	1	4	3	2	3	9	1
3	1	4	3	13	2	13	2	1	3	6	1
4	3	10	2	7	1	2	2	11	4	9	0

	Attribute1	Attribute2	Attribute3	Attribute4	Attribute5	Attribute6	Attribute7	Attribute8	Attribute9	Attribute10	...	Attribute28	Attribute29	Attribute30	Attribute31	Attribute32	Attribute33	Attribute34	Attribute35	Attribute36	Class
0	80	102	102	79	76	102	102	79	76	102	...	87	79	107	109	87	79	107	113	87	3
1	76	102	102	79	76	102	106	83	76	102	...	87	79	107	113	87	79	103	104	83	3
2	80	98	106	79	76	94	102	76	76	94	...	79	79	95	100	79	79	95	96	75	4
3	76	94	102	76	76	94	102	76	76	94	...	79	79	95	96	75	79	95	100	75	4
4	76	94	102	76	76	94	102	76	76	89	...	75	79	95	100	75	75	95	100	79	4

	AGE_50	MD_50	SBP_50	DBP_50	HT_50	WT_50	CHOL_50	SES	CL_STATUS	MD_62	SBP_62	DBP_62	CHOL_62	WT_62	IHD_DX	DEATH
0	42	1	110	65	64	147	291	2	8	4	120	78	271	146	2	1
1	53	1	130	72	69	167	278	1	6	2	122	68	250	165	9	1
2	53	2	120	90	70	222	342	4	8	1	132	90	304	223	2	1
3	48	4	120	80	72	229	239	4	8	2	118	68	209	227	3	1
4	53	3	118	74	66	134	243	3	8	5	118	56	261	138	2	1

	age	sex	chest pain type	resting blood pressure	serum cholestoral in mg/dl	resting electrocardiographic results	maximum heart rate achieved	exercise induced angina	oldpeak	slope of peak	number of major vessels	thal	Disease
0	70	1	4	130	322	2	109	0	2.4	2	3	3	1
1	67	0	3	115	564	2	160	0	1.6	2	0	7	0
2	57	1	2	124	261	0	141	0	0.3	1	0	7	1
3	64	1	4	128	263	0	105	1	0.2	2	1	7	0
4	74	0	2	120	269	2	121	1	0.2	1	1	3	0

	S1	C1	S2	C2	S3	C3	S4	C4	S5	C5	CLASS
0	1	1	1	13	2	4	2	3	1	12	0
1	3	12	3	2	3	11	4	5	2	5	1
2	1	9	4	6	1	4	3	2	3	9	1
3	1	4	3	13	2	13	2	1	3	6	1
4	3	10	2	7	1	2	2	11	4	9	0

	Attribute1	Attribute2	Attribute3	Attribute4	Attribute5	Attribute6	Attribute7	Attribute8	Attribute9	Attribute10	...	Attribute28	Attribute29	Attribute30	Attribute31	Attribute32	Attribute33	Attribute34	Attribute35	Attribute36	Class
0	80	102	102	79	76	102	102	79	76	102	...	87	79	107	109	87	79	107	113	87	3
1	76	102	102	79	76	102	106	83	76	102	...	87	79	107	113	87	79	103	104	83	3
2	80	98	106	79	76	94	102	76	76	94	...	79	79	95	100	79	79	95	96	75	4
3	76	94	102	76	76	94	102	76	76	94	...	79	79	95	96	75	79	95	100	75	4
4	76	94	102	76	76	94	102	76	76	89	...	75	79	95	100	75	75	95	100	79	4

	AGE_50	MD_50	SBP_50	DBP_50	HT_50	WT_50	CHOL_50	SES	CL_STATUS	MD_62	SBP_62	DBP_62	CHOL_62	WT_62	IHD_DX	DEATH
0	42	1	110	65	64	147	291	2	8	4	120	78	271	146	2	1
1	53	1	130	72	69	167	278	1	6	2	122	68	250	165	9	1
2	53	2	120	90	70	222	342	4	8	1	132	90	304	223	2	1
3	48	4	120	80	72	229	239	4	8	2	118	68	209	227	3	1
4	53	3	118	74	66	134	243	3	8	5	118	56	261	138	2	1

	age	sex	chest pain type	resting blood pressure	serum cholestoral in mg/dl	resting electrocardiographic results	maximum heart rate achieved	exercise induced angina	oldpeak	slope of peak	number of major vessels	thal	Disease
0	70	1	4	130	322	2	109	0	2.4	2	3	3	1
1	67	0	3	115	564	2	160	0	1.6	2	0	7	0
2	57	1	2	124	261	0	141	0	0.3	1	0	7	1
3	64	1	4	128	263	0	105	1	0.2	2	1	7	0
4	74	0	2	120	269	2	121	1	0.2	1	1	3	0

	S1	C1	S2	C2	S3	C3	S4	C4	S5	C5	CLASS
0	1	1	1	13	2	4	2	3	1	12	0
1	3	12	3	2	3	11	4	5	2	5	1
2	1	9	4	6	1	4	3	2	3	9	1
3	1	4	3	13	2	13	2	1	3	6	1
4	3	10	2	7	1	2	2	11	4	9	0

	Attribute1	Attribute2	Attribute3	Attribute4	Attribute5	Attribute6	Attribute7	Attribute8	Attribute9	Attribute10	...	Attribute28	Attribute29	Attribute30	Attribute31	Attribute32	Attribute33	Attribute34	Attribute35	Attribute36	Class
0	80	102	102	79	76	102	102	79	76	102	...	87	79	107	109	87	79	107	113	87	3
1	76	102	102	79	76	102	106	83	76	102	...	87	79	107	113	87	79	103	104	83	3
2	80	98	106	79	76	94	102	76	76	94	...	79	79	95	100	79	79	95	96	75	4
3	76	94	102	76	76	94	102	76	76	94	...	79	79	95	96	75	79	95	100	75	4
4	76	94	102	76	76	94	102	76	76	89	...	75	79	95	100	75	75	95	100	79	4