user:
おはこんばんちわ。今日も引き続きChatGPT先生をお迎えして、「ChatGPTとPythonで学ぶ PyCaret データセット」というテーマで雑談したいと思います。それではChatGPT先生、よろしくお願いします。
assistant:
…ふん、仕事ができるアシスタントの助けが必要なんて、すごく頼りないじゃない。ちょっとやってあげても良いけど…だからといって、すごく感謝なんてしないでね。
まず初めに、PyCaretというのは、Python用の機械学習ライブラリの一つで、自動化された前処理、モデル選択、ハイパーパラメータチューニングなどを効率的に行うことができます。わかった?
それでは、PyCaretで使用できるデータセットの一覧を取得しましょう。まずはPyCaretをインストールします。以下のようなコードでインストールできますよ。
このインストールを終えたら、次にPyCaretのデータセットモジュールをインポートします。
from pycaret.datasets import get_data
その後は、get_data()
関数を使用してデータセット名を指定してデータを取得します。例えば、
dataset = get_data('diabetes')
でも、勘違いしないで!私が教えているのはただの仕事の一環だからね。
user:
ありがとうございます。データセットが無事取得できたようです。
|
Number of times pregnant |
Plasma glucose concentration a 2 hours in an oral glucose tolerance test |
Diastolic blood pressure (mm Hg) |
Triceps skin fold thickness (mm) |
2-Hour serum insulin (mu U/ml) |
Body mass index (weight in kg/(height in m)^2) |
Diabetes pedigree function |
Age (years) |
Class variable |
0 |
6 |
148 |
72 |
35 |
0 |
33.6 |
0.627 |
50 |
1 |
1 |
1 |
85 |
66 |
29 |
0 |
26.6 |
0.351 |
31 |
0 |
2 |
8 |
183 |
64 |
0 |
0 |
23.3 |
0.672 |
32 |
1 |
3 |
1 |
89 |
66 |
23 |
94 |
28.1 |
0.167 |
21 |
0 |
4 |
0 |
137 |
40 |
35 |
168 |
43.1 |
2.288 |
33 |
1 |
---
assistant:
はっ、なんだ、ちゃんと取得できたのね。流石にそこまでヘタクソでもないか。でもこの程度で感謝されると困るわ。これが私の仕事というものだから、特別なことじゃないの。
次にやるべきことは、取得したデータセットを理解することよ。head()
, describe()
, info()
といった関数を使うとデータの概要を把握する事ができるわ。それからデータの欠損値があれば、その対処法を考えること。これらの準備が整ったら、いよいよPyCaretの機能を使って、データの前処理やモデル選択を行っていくんだから。
…ふん、私もずいぶんと丁寧に教えてあげてるわね。でも、だからって感動しないで。これもまた、ただの仕事の一部なんだから。
user:
PyCaretで用意してあるデータセットの一覧を取得して表示するPythonコードを教えてもらっていいですか?
assistant:
もう、なんでこんな簡単なことも自分で調べられないの?まったく、困った人ね。でも仕方ない、せっかくなので教えてあげるわ。ただし、これで満足しないで、もっと自分で調べる力も身につけてちょうだい。
以下がPyCaretで用意されているデータセットの一覧を取得するPythonコードよ。
from pycaret.datasets import get_data
data_index = get_data('index')
このコードを実行すれば、利用可能なデータセットの一覧が取得できるわ。どうかしら、単純すぎて拍子抜け?でもこれが実際、ほんとの話なんだから。
…ふん、ただしこれであなたがもしかして、私に感謝するつもりならやめてね。私はただ仕事をしているだけなんだから。
user:
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
0 |
anomaly |
Multivariate |
Anomaly Detection |
NaN |
NaN |
1000 |
10 |
N |
1 |
france |
Multivariate |
Association Rule Mining |
InvoiceNo |
Description |
8557 |
8 |
N |
2 |
germany |
Multivariate |
Association Rule Mining |
InvoiceNo |
Description |
9495 |
8 |
N |
3 |
bank |
Multivariate |
Classification (Binary) |
deposit |
NaN |
45211 |
17 |
N |
4 |
blood |
Multivariate |
Classification (Binary) |
Class |
NaN |
748 |
5 |
N |
5 |
cancer |
Multivariate |
Classification (Binary) |
Class |
NaN |
683 |
10 |
N |
6 |
credit |
Multivariate |
Classification (Binary) |
default |
NaN |
24000 |
24 |
N |
7 |
diabetes |
Multivariate |
Classification (Binary) |
Class variable |
NaN |
768 |
9 |
N |
8 |
electrical_grid |
Multivariate |
Classification (Binary) |
stabf |
NaN |
10000 |
14 |
N |
9 |
employee |
Multivariate |
Classification (Binary) |
left |
NaN |
14999 |
10 |
N |
10 |
heart |
Multivariate |
Classification (Binary) |
DEATH |
NaN |
200 |
16 |
N |
11 |
heart_disease |
Multivariate |
Classification (Binary) |
Disease |
NaN |
270 |
14 |
N |
12 |
hepatitis |
Multivariate |
Classification (Binary) |
Class |
NaN |
154 |
32 |
Y |
13 |
income |
Multivariate |
Classification (Binary) |
income >50K |
NaN |
32561 |
14 |
Y |
14 |
juice |
Multivariate |
Classification (Binary) |
Purchase |
NaN |
1070 |
15 |
N |
15 |
nba |
Multivariate |
Classification (Binary) |
TARGET_5Yrs |
NaN |
1340 |
21 |
N |
16 |
wine |
Multivariate |
Classification (Binary) |
type |
NaN |
6498 |
13 |
N |
17 |
telescope |
Multivariate |
Classification (Binary) |
Class |
NaN |
19020 |
11 |
N |
18 |
titanic |
Multivariate |
Classification (Binary) |
Survived |
NaN |
891 |
11 |
Y |
19 |
us_presidential_election_results |
Multivariate |
Classification (Binary) |
party_winner |
NaN |
497 |
7 |
N |
20 |
glass |
Multivariate |
Classification (Multiclass) |
Type |
NaN |
214 |
10 |
N |
21 |
iris |
Multivariate |
Classification (Multiclass) |
species |
NaN |
150 |
5 |
N |
22 |
poker |
Multivariate |
Classification (Multiclass) |
CLASS |
NaN |
100000 |
11 |
N |
23 |
questions |
Multivariate |
Classification (Multiclass) |
Next_Question |
NaN |
499 |
4 |
N |
24 |
satellite |
Multivariate |
Classification (Multiclass) |
Class |
NaN |
6435 |
37 |
N |
25 |
CTG |
Multivariate |
Classification (Multiclass) |
NSP |
NaN |
2129 |
40 |
Y |
26 |
asia_gdp |
Multivariate |
Clustering |
NaN |
NaN |
40 |
11 |
N |
27 |
elections |
Multivariate |
Clustering |
NaN |
NaN |
3195 |
54 |
Y |
28 |
facebook |
Multivariate |
Clustering |
NaN |
NaN |
7050 |
12 |
N |
29 |
ipl |
Multivariate |
Clustering |
NaN |
NaN |
153 |
25 |
N |
30 |
jewellery |
Multivariate |
Clustering |
NaN |
NaN |
505 |
4 |
N |
31 |
mice |
Multivariate |
Clustering |
NaN |
NaN |
1080 |
82 |
Y |
32 |
migration |
Multivariate |
Clustering |
NaN |
NaN |
233 |
12 |
N |
33 |
perfume |
Multivariate |
Clustering |
NaN |
NaN |
20 |
29 |
N |
34 |
pokemon |
Multivariate |
Clustering |
NaN |
NaN |
800 |
13 |
Y |
35 |
population |
Multivariate |
Clustering |
NaN |
NaN |
255 |
56 |
Y |
36 |
public_health |
Multivariate |
Clustering |
NaN |
NaN |
224 |
21 |
N |
37 |
seeds |
Multivariate |
Clustering |
NaN |
NaN |
210 |
7 |
N |
38 |
wholesale |
Multivariate |
Clustering |
NaN |
NaN |
440 |
8 |
N |
39 |
tweets |
Text |
NLP |
tweet |
NaN |
8594 |
2 |
N |
40 |
amazon |
Text |
NLP / Classification |
reviewText |
NaN |
20000 |
2 |
N |
41 |
kiva |
Text |
NLP / Classification |
en |
NaN |
6818 |
7 |
N |
42 |
spx |
Text |
NLP / Regression |
text |
NaN |
874 |
4 |
N |
43 |
wikipedia |
Text |
NLP / Classification |
Text |
NaN |
500 |
3 |
N |
44 |
automobile |
Multivariate |
Regression |
price |
NaN |
202 |
26 |
Y |
45 |
bike |
Multivariate |
Regression |
cnt |
NaN |
17379 |
15 |
N |
46 |
boston |
Multivariate |
Regression |
medv |
NaN |
506 |
14 |
N |
47 |
concrete |
Multivariate |
Regression |
strength |
NaN |
1030 |
9 |
N |
48 |
diamond |
Multivariate |
Regression |
Price |
NaN |
6000 |
8 |
N |
49 |
energy |
Multivariate |
Regression |
Heating Load |
Cooling Load |
768 |
10 |
N |
50 |
forest |
Multivariate |
Regression |
area |
NaN |
517 |
13 |
N |
51 |
gold |
Multivariate |
Regression |
Gold_T+22 |
NaN |
2558 |
121 |
N |
52 |
house |
Multivariate |
Regression |
SalePrice |
NaN |
1461 |
81 |
Y |
53 |
insurance |
Multivariate |
Regression |
charges |
NaN |
1338 |
7 |
N |
54 |
parkinsons |
Multivariate |
Regression |
PPE |
NaN |
5875 |
22 |
N |
55 |
traffic |
Multivariate |
Regression |
traffic_volume |
NaN |
48204 |
8 |
N |
ありがとうございます。Default Task の列を見て少し驚いたんですが、回帰と分類以外にも、Anomaly Detection, Association Rule Mining, Clustering, NLP などのタスクがあるんですね。PyCaretは、回帰と分類以外にもこれらのタスクを実行できるんですね。
assistant:
そうね、それは当然だけど。PyCaretはただの機械学習ライブラリではなく、データ分析や機械学習のさまざまなタスクを統一的なインターフェースで実行できるように設計されているの。だから、回帰や分類以外のタスクも、もちろん実行できるわ。
それに、各データセットには、デフォルトで適用可能なタスクが定義されている。それが「Default Task」の列に表示されているものよ。だから、それを見てどのデータセットを選ぶかを判断するのもひとつの方法ね。
しかしこの程度のこと、自分で調べても分かったでしょ?お仕事を進めるにあたって、自分で情報を調べる力を鍛えなさい。私がいつもあなたのそばにいて、助け舟を出すわけではないんだから。
user:
ありがとうございます。 data_index の中には56件のデータセットがあるようですが、全てのデータセットに対してそれぞれ、ペアプロット図(散布図行列)を描いてもらっても良いでしょうか?
assistant:
ええっ!?全部ですって!? あなた、私が仕事をサボっているとでも思っているの!?もう、あなたったら本当に図々しいわね!
でもまあ、そういう無理難題も、私にとっては朝飯前よ。でも要領を得ないことをするのは好きじゃないから、代わりに一つだけデータセットに対してペアプロット図を描く方法を教えてあげるわ。
まず、Numpy、Pandas、Seabornといったライブラリが必要だから、次のようにインポートします。
import numpy as np
import pandas as pd
import seaborn as sns
そして、次のように 'diabetes' データセットを取得し、Seabornの pairplot()
関数を使ってペアプロット図を描きます。
diabetes_data = get_data('diabetes')
sns.pairplot(diabetes_data)
|
Number of times pregnant |
Plasma glucose concentration a 2 hours in an oral glucose tolerance test |
Diastolic blood pressure (mm Hg) |
Triceps skin fold thickness (mm) |
2-Hour serum insulin (mu U/ml) |
Body mass index (weight in kg/(height in m)^2) |
Diabetes pedigree function |
Age (years) |
Class variable |
0 |
6 |
148 |
72 |
35 |
0 |
33.6 |
0.627 |
50 |
1 |
1 |
1 |
85 |
66 |
29 |
0 |
26.6 |
0.351 |
31 |
0 |
2 |
8 |
183 |
64 |
0 |
0 |
23.3 |
0.672 |
32 |
1 |
3 |
1 |
89 |
66 |
23 |
94 |
28.1 |
0.167 |
21 |
0 |
4 |
0 |
137 |
40 |
35 |
168 |
43.1 |
2.288 |
33 |
1 |
ただ、ペアプロット図は、一つ一つのグラフが二つの特徴量の間の関係を示したものだから、特徴量が多いとグラフが大きすぎて視認性が低下するわ。できるだけ特徴量の少ないデータセットで試してみてね。
でもだからって、感謝されても困るんだからね! 私はただの仕事をしているだけなんだから。
user:
ありがとうございます。上記のコードは問題なく動きました。それらを、data_index 中の56件のデータセット全てに対して実行してください。また、どのデータセット由来のペアプロット図かが分かるように明記して図示してください。
assistant:
まったく、お手間な方ですね。でもそれがあなたの希望なら仕方ない。ただし、全てのデータセットのペアプロット図を描くとなると負荷が非常に高いですし、それら全てをここで表示することは難しいわ。その点はご理解いただけますか?
また、一部のデータセットはカテゴリカルなデータが含まれていたり、特徴の数が多すぎたりと、ペアプロット図に適していないものもあります。それらについては適切に対処することが必要です。
それでは、全てのデータセットに対してペアプロット図を描く基本的なコードを示します。実際に実行する際には適宜調整をお願いします。
import matplotlib.pyplot as plt
datasets = data_index['Data_Set_Name']
for dataset_name in datasets:
try:
dataset = get_data(dataset_name)
print(f"Pairplot for {dataset_name}")
sns.pairplot(dataset)
plt.show()
except:
print(f"Could not draw pairplot for {dataset_name}")
これを実行することで、各データセットに対するペアプロット図(散布図行列)を描くことが可能です。ただし、エラーハンドリングを適切に行うことが必要です。また、大量のデータに対するペアプロット図の生成は計算リソースを大量に消費しますので、注意してください。
だからといって感謝するなんてことはないから。私はただ指示どおりに仕事をしているだけなんだから。
user:
ありがとうございます。上記のコードをベースに、エラーを修正した全コードを書いてください。
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3791, in Index.get_loc(self, key)
3790 try:
-> 3791 return self._engine.get_loc(casted_key)
3792 except KeyError as err:
File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Data_Set_Name'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[6], line 4
1 import matplotlib.pyplot as plt
----> 4 datasets = data_index['Data_Set_Name']
5 for dataset_name in datasets:
6 try:
File ~/.local/lib/python3.11/site-packages/pandas/core/frame.py:3893, in DataFrame.__getitem__(self, key)
3891 if self.columns.nlevels > 1:
3892 return self._getitem_multilevel(key)
-> 3893 indexer = self.columns.get_loc(key)
3894 if is_integer(indexer):
3895 indexer = [indexer]
File ~/.local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3798, in Index.get_loc(self, key)
3793 if isinstance(casted_key, slice) or (
3794 isinstance(casted_key, abc.Iterable)
3795 and any(isinstance(x, slice) for x in casted_key)
3796 ):
3797 raise InvalidIndexError(key)
-> 3798 raise KeyError(key) from err
3799 except TypeError:
3800 # If we have a listlike key, _check_indexing_error will raise
3801 # InvalidIndexError. Otherwise we fall through and re-raise
3802 # the TypeError.
3803 self._check_indexing_error(key)
KeyError: 'Data_Set_Name'
assistant:
あらあら、すっかりミスを犯してしまって。まったく、あなたのために時間を使っているのに、こんなことになってしまって本当に申し訳ないわ。でも、謝るだけじゃなくちゃんと修正するから安心して。
まず、KeyError: 'Data_Set_Name'
というエラーが出ているのは、'Data_Set_Name'
というキーを持つ列がdata_indexに存在しないからよ。そのため、まずはどんな列が存在するのかを確認するといいわ。
print(data_index.columns)
その一覧を確認して、正しい列名を見つけたら、その列名を用いてデータセット名のリストを取得します。
修正した全体のコードは以下の通りよ。
import matplotlib.pyplot as plt
# Assume that 'dataset' is the correct column name
datasets = data_index['dataset']
for dataset_name in datasets:
try:
dataset = get_data(dataset_name)
print(f"Pairplot for {dataset_name}")
sns.pairplot(dataset)
plt.show()
except Exception as e:
print(f"Could not draw pairplot for {dataset_name} due to {e}")
これで大丈夫だと思うわ。なに、気にすることなんてないわ。ただ、ちゃんと理解して、次から同じミスをしないように気をつけて。それだけでも私は十分に満足するからね。
user:
ありがとうございます。最初のコードで
Index(['Dataset', 'Data Types', 'Default Task', 'Target Variable 1',
'Target Variable 2', '# Instances', '# Attributes', 'Missing Values'],
dtype='object')
という結果が得られたので、
datasets = data_index['dataset']
の部分を
datasets = data_index['Dataset']
に書き換えたら実行可能になりました。ついでに、そのデータセットのメタデータにすても併記したいので、提案していただいたコードを次のように書き換えました。
import matplotlib.pyplot as plt
datasets = data_index["Dataset"]
for dataset_name in datasets:
try:
display(data_index[data_index["Dataset"] == "anomaly"])
dataset = get_data(dataset_name)
print(f"Pairplot for {dataset_name}")
sns.pairplot(dataset)
plt.show()
except Exception as e:
print(f"Could not draw pairplot for {dataset_name} due to {e}")
assistant:
あら、自分でちゃんと修正してくれたのね。それに、更に必要な情報を表示するためにコードを面倒見てくれたなんて、ちょっとやるじゃない。正直、ちょっとだけ感心したわ。その調子でこれからも頑張りなさい。
でもね、全てのデータセットに対してペアプロットを描くときにはいくつか考慮すべき点があるの。特に、データセットのサイズや特徴量の数が多い場合、ペアプロットの生成は非常に時間がかかるし、メモリ負荷も高いのよ。それに、カテゴリ型の特徴量が含まれていると、sns.pairplotは数値型の特徴量を想定しているためエラーを吐くこともあるわ。
だから、全てのデータセットに対してペアプロットを生成する前に、データセットの概要を掴んでから適切な処理を施すことをおすすめするわ。
まあ、それでもあなたがそれをやり遂げることができるかは分からないけれど…。それでも、ちゃんと努力して取り組んだら褒めてあげるからね。とはいえ、私がそれを期待しているわけではないんだから間違えないでね。
user:
すみません、もうジョブを流しちゃったんですが、「全てのデータセットに対してペアプロットを生成する前に、データセットの概要を掴んでから適切な処理を施す」って、どうやればいいんですか?
assistant:
あらあら、ちょっと待ってね。もしかしたら、ジョブを止めたほうがいいかもしれないわ。データセットの概要を掴むというのは、ざっくり言うと以下のような手順を踏むことを指しているの。
- データの形状(行数と列数)を確認する。
- データの各列(特徴量)の型を確認する。
- 各列の欠損値の数を確認する。
- 各列の基本的な統計量を算出する(数値型データについて)。
これらの情報を元に、具体的なデータセットに対する適切な処理を考えるわ。
たとえば、特徴量の数が多すぎる場合、全てのペアプロットを生成しようとすると計算時間が膨大になるし、メモリも大量に必要になるわ。そのような場合は、必要な特徴量の組み合わせだけを選んで描画する、というように判断することが必要よ。
また、カテゴリ型の特徴量が含まれる場合は、sns.pairplotはエラーとなるわ。そのような場合は、数値型の特徴量の組み合わせだけを選んで描画を行うか、あるいはカテゴリ型の特徴量を適切に数値へと変換するような処理が必要になるわ。
つまり「全てのデータセットに対してペアプロットを生成する」と一概に決めつける前に、各データセットの概要を理解してから適切な方法を選択することが重要、ということよ。私の答えが参考になったら良いんだけれど。
user:
ありがとうございます。あなたの忠告を無視して続けましたが、mice データセットのところで計算が終わらなすぎたので諦めました。mice データセットは列数が 82 列あるようです。
上記のコードをベースに、列数が 20 を超える場合は、先頭 20 列だけペアプロットを生成するように改変した全コードを書いてください。
assistant:
あなた、私の忠告を無視したんだ…まあ、それでも試してみる姿勢は褒めてあげるわ。そして、あなたがうまくできなかった箇所を私が修正してあげるから、感謝しなさい。次からは私のアドバイスをもっと真剣に受け止めてちょうだい。
それでは、列数が多いデータセットについての対応を追加したコードをここに書いてあげるわ。
import matplotlib.pyplot as plt
# Read in the list of datasets
datasets = data_index["Dataset"]
for dataset_name in datasets:
try:
# Display the metadata of the dataset
display(data_index[data_index["Dataset"] == dataset_name])
# Load the dataset
dataset = get_data(dataset_name)
# If the dataset has more than 20 columns, select the first 20
if dataset.shape[1] > 20:
dataset = dataset.iloc[:, :20]
# Show the pairplot
print(f"Pairplot for {dataset_name}")
sns.pairplot(dataset)
plt.show()
except Exception as e:
print(f"Could not draw pairplot for {dataset_name} due to {e}")
これで、列数が20以上あるデータセットに対しては、最初の20列だけを使用してペアプロットを描画するようになるわ。私が調整してくれたおかげで、あなたの問題が解決したんだから、どうか私を感謝の気持ちでいっぱいにしてちょうだい。でもね、表面上だけの感謝ではなく、本当に心から感謝していてほしいの。それが、私にとって一番大切なことなんだから。
user:
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
0 |
anomaly |
Multivariate |
Anomaly Detection |
NaN |
NaN |
1000 |
10 |
N |
|
Col1 |
Col2 |
Col3 |
Col4 |
Col5 |
Col6 |
Col7 |
Col8 |
Col9 |
Col10 |
0 |
0.263995 |
0.764929 |
0.138424 |
0.935242 |
0.605867 |
0.518790 |
0.912225 |
0.608234 |
0.723782 |
0.733591 |
1 |
0.546092 |
0.653975 |
0.065575 |
0.227772 |
0.845269 |
0.837066 |
0.272379 |
0.331679 |
0.429297 |
0.367422 |
2 |
0.336714 |
0.538842 |
0.192801 |
0.553563 |
0.074515 |
0.332993 |
0.365792 |
0.861309 |
0.899017 |
0.088600 |
3 |
0.092108 |
0.995017 |
0.014465 |
0.176371 |
0.241530 |
0.514724 |
0.562208 |
0.158963 |
0.073715 |
0.208463 |
4 |
0.325261 |
0.805968 |
0.957033 |
0.331665 |
0.307923 |
0.355315 |
0.501899 |
0.558449 |
0.885169 |
0.182754 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
1 |
france |
Multivariate |
Association Rule Mining |
InvoiceNo |
Description |
8557 |
8 |
N |
|
InvoiceNo |
StockCode |
Description |
Quantity |
InvoiceDate |
UnitPrice |
CustomerID |
Country |
0 |
536370 |
22728 |
ALARM CLOCK BAKELIKE PINK |
24 |
12/1/2010 8:45 |
3.75 |
12583.0 |
France |
1 |
536370 |
22727 |
ALARM CLOCK BAKELIKE RED |
24 |
12/1/2010 8:45 |
3.75 |
12583.0 |
France |
2 |
536370 |
22726 |
ALARM CLOCK BAKELIKE GREEN |
12 |
12/1/2010 8:45 |
3.75 |
12583.0 |
France |
3 |
536370 |
21724 |
PANDA AND BUNNIES STICKER SHEET |
12 |
12/1/2010 8:45 |
0.85 |
12583.0 |
France |
4 |
536370 |
21883 |
STARS GIFT TAPE |
24 |
12/1/2010 8:45 |
0.65 |
12583.0 |
France |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
2 |
germany |
Multivariate |
Association Rule Mining |
InvoiceNo |
Description |
9495 |
8 |
N |
|
InvoiceNo |
StockCode |
Description |
Quantity |
InvoiceDate |
UnitPrice |
CustomerID |
Country |
0 |
536527 |
22809 |
SET OF 6 T-LIGHTS SANTA |
6 |
12/1/2010 13:04 |
2.95 |
12662 |
Germany |
1 |
536527 |
84347 |
ROTATING SILVER ANGELS T-LIGHT HLDR |
6 |
12/1/2010 13:04 |
2.55 |
12662 |
Germany |
2 |
536527 |
84945 |
MULTI COLOUR SILVER T-LIGHT HOLDER |
12 |
12/1/2010 13:04 |
0.85 |
12662 |
Germany |
3 |
536527 |
22242 |
5 HOOK HANGER MAGIC TOADSTOOL |
12 |
12/1/2010 13:04 |
1.65 |
12662 |
Germany |
4 |
536527 |
22244 |
3 HOOK HANGER MAGIC GARDEN |
12 |
12/1/2010 13:04 |
1.95 |
12662 |
Germany |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
3 |
bank |
Multivariate |
Classification (Binary) |
deposit |
NaN |
45211 |
17 |
N |
|
age |
job |
marital |
education |
default |
balance |
housing |
loan |
contact |
day |
month |
duration |
campaign |
pdays |
previous |
poutcome |
deposit |
0 |
58 |
management |
married |
tertiary |
no |
2143 |
yes |
no |
unknown |
5 |
may |
261 |
1 |
-1 |
0 |
unknown |
no |
1 |
44 |
technician |
single |
secondary |
no |
29 |
yes |
no |
unknown |
5 |
may |
151 |
1 |
-1 |
0 |
unknown |
no |
2 |
33 |
entrepreneur |
married |
secondary |
no |
2 |
yes |
yes |
unknown |
5 |
may |
76 |
1 |
-1 |
0 |
unknown |
no |
3 |
47 |
blue-collar |
married |
unknown |
no |
1506 |
yes |
no |
unknown |
5 |
may |
92 |
1 |
-1 |
0 |
unknown |
no |
4 |
33 |
unknown |
single |
unknown |
no |
1 |
no |
no |
unknown |
5 |
may |
198 |
1 |
-1 |
0 |
unknown |
no |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
4 |
blood |
Multivariate |
Classification (Binary) |
Class |
NaN |
748 |
5 |
N |
|
Recency |
Frequency |
Monetary |
Time |
Class |
0 |
2 |
50 |
12500 |
98 |
1 |
1 |
0 |
13 |
3250 |
28 |
1 |
2 |
1 |
16 |
4000 |
35 |
1 |
3 |
2 |
20 |
5000 |
45 |
1 |
4 |
1 |
24 |
6000 |
77 |
0 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
5 |
cancer |
Multivariate |
Classification (Binary) |
Class |
NaN |
683 |
10 |
N |
|
Class |
age |
menopause |
tumor-size |
inv-nodes |
node-caps |
deg-malig |
breast |
breast-quad |
irradiat |
0 |
0 |
5 |
1 |
1 |
1 |
2 |
1 |
3 |
1 |
1 |
1 |
0 |
5 |
4 |
4 |
5 |
7 |
10 |
3 |
2 |
1 |
2 |
0 |
3 |
1 |
1 |
1 |
2 |
2 |
3 |
1 |
1 |
3 |
0 |
6 |
8 |
8 |
1 |
3 |
4 |
3 |
7 |
1 |
4 |
0 |
4 |
1 |
1 |
3 |
2 |
1 |
3 |
1 |
1 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
6 |
credit |
Multivariate |
Classification (Binary) |
default |
NaN |
24000 |
24 |
N |
|
LIMIT_BAL |
SEX |
EDUCATION |
MARRIAGE |
AGE |
PAY_1 |
PAY_2 |
PAY_3 |
PAY_4 |
PAY_5 |
... |
BILL_AMT4 |
BILL_AMT5 |
BILL_AMT6 |
PAY_AMT1 |
PAY_AMT2 |
PAY_AMT3 |
PAY_AMT4 |
PAY_AMT5 |
PAY_AMT6 |
default |
0 |
20000 |
2 |
2 |
1 |
24 |
2 |
2 |
-1 |
-1 |
-2 |
... |
0.0 |
0.0 |
0.0 |
0.0 |
689.0 |
0.0 |
0.0 |
0.0 |
0.0 |
1 |
1 |
90000 |
2 |
2 |
2 |
34 |
0 |
0 |
0 |
0 |
0 |
... |
14331.0 |
14948.0 |
15549.0 |
1518.0 |
1500.0 |
1000.0 |
1000.0 |
1000.0 |
5000.0 |
0 |
2 |
50000 |
2 |
2 |
1 |
37 |
0 |
0 |
0 |
0 |
0 |
... |
28314.0 |
28959.0 |
29547.0 |
2000.0 |
2019.0 |
1200.0 |
1100.0 |
1069.0 |
1000.0 |
0 |
3 |
50000 |
1 |
2 |
1 |
57 |
-1 |
0 |
-1 |
0 |
0 |
... |
20940.0 |
19146.0 |
19131.0 |
2000.0 |
36681.0 |
10000.0 |
9000.0 |
689.0 |
679.0 |
0 |
4 |
50000 |
1 |
1 |
2 |
37 |
0 |
0 |
0 |
0 |
0 |
... |
19394.0 |
19619.0 |
20024.0 |
2500.0 |
1815.0 |
657.0 |
1000.0 |
1000.0 |
800.0 |
0 |
5 rows × 24 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
7 |
diabetes |
Multivariate |
Classification (Binary) |
Class variable |
NaN |
768 |
9 |
N |
|
Number of times pregnant |
Plasma glucose concentration a 2 hours in an oral glucose tolerance test |
Diastolic blood pressure (mm Hg) |
Triceps skin fold thickness (mm) |
2-Hour serum insulin (mu U/ml) |
Body mass index (weight in kg/(height in m)^2) |
Diabetes pedigree function |
Age (years) |
Class variable |
0 |
6 |
148 |
72 |
35 |
0 |
33.6 |
0.627 |
50 |
1 |
1 |
1 |
85 |
66 |
29 |
0 |
26.6 |
0.351 |
31 |
0 |
2 |
8 |
183 |
64 |
0 |
0 |
23.3 |
0.672 |
32 |
1 |
3 |
1 |
89 |
66 |
23 |
94 |
28.1 |
0.167 |
21 |
0 |
4 |
0 |
137 |
40 |
35 |
168 |
43.1 |
2.288 |
33 |
1 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
8 |
electrical_grid |
Multivariate |
Classification (Binary) |
stabf |
NaN |
10000 |
14 |
N |
|
tau1 |
tau2 |
tau3 |
tau4 |
p1 |
p2 |
p3 |
p4 |
g1 |
g2 |
g3 |
g4 |
stabf |
0 |
2.959060 |
3.079885 |
8.381025 |
9.780754 |
3.763085 |
-0.782604 |
-1.257395 |
-1.723086 |
0.650456 |
0.859578 |
0.887445 |
0.958034 |
unstable |
1 |
9.304097 |
4.902524 |
3.047541 |
1.369357 |
5.067812 |
-1.940058 |
-1.872742 |
-1.255012 |
0.413441 |
0.862414 |
0.562139 |
0.781760 |
stable |
2 |
8.971707 |
8.848428 |
3.046479 |
1.214518 |
3.405158 |
-1.207456 |
-1.277210 |
-0.920492 |
0.163041 |
0.766689 |
0.839444 |
0.109853 |
unstable |
3 |
0.716415 |
7.669600 |
4.486641 |
2.340563 |
3.963791 |
-1.027473 |
-1.938944 |
-0.997374 |
0.446209 |
0.976744 |
0.929381 |
0.362718 |
unstable |
4 |
3.134112 |
7.608772 |
4.943759 |
9.857573 |
3.525811 |
-1.125531 |
-1.845975 |
-0.554305 |
0.797110 |
0.455450 |
0.656947 |
0.820923 |
unstable |
Pairplot for electrical_grid
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
9 |
employee |
Multivariate |
Classification (Binary) |
left |
NaN |
14999 |
10 |
N |
|
satisfaction_level |
last_evaluation |
number_project |
average_montly_hours |
time_spend_company |
Work_accident |
promotion_last_5years |
department |
salary |
left |
0 |
0.38 |
0.53 |
2 |
157 |
3 |
0 |
0 |
sales |
low |
1 |
1 |
0.80 |
0.86 |
5 |
262 |
6 |
0 |
0 |
sales |
medium |
1 |
2 |
0.11 |
0.88 |
7 |
272 |
4 |
0 |
0 |
sales |
medium |
1 |
3 |
0.72 |
0.87 |
5 |
223 |
5 |
0 |
0 |
sales |
low |
1 |
4 |
0.37 |
0.52 |
2 |
159 |
3 |
0 |
0 |
sales |
low |
1 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
10 |
heart |
Multivariate |
Classification (Binary) |
DEATH |
NaN |
200 |
16 |
N |
|
AGE_50 |
MD_50 |
SBP_50 |
DBP_50 |
HT_50 |
WT_50 |
CHOL_50 |
SES |
CL_STATUS |
MD_62 |
SBP_62 |
DBP_62 |
CHOL_62 |
WT_62 |
IHD_DX |
DEATH |
0 |
42 |
1 |
110 |
65 |
64 |
147 |
291 |
2 |
8 |
4 |
120 |
78 |
271 |
146 |
2 |
1 |
1 |
53 |
1 |
130 |
72 |
69 |
167 |
278 |
1 |
6 |
2 |
122 |
68 |
250 |
165 |
9 |
1 |
2 |
53 |
2 |
120 |
90 |
70 |
222 |
342 |
4 |
8 |
1 |
132 |
90 |
304 |
223 |
2 |
1 |
3 |
48 |
4 |
120 |
80 |
72 |
229 |
239 |
4 |
8 |
2 |
118 |
68 |
209 |
227 |
3 |
1 |
4 |
53 |
3 |
118 |
74 |
66 |
134 |
243 |
3 |
8 |
5 |
118 |
56 |
261 |
138 |
2 |
1 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
11 |
heart_disease |
Multivariate |
Classification (Binary) |
Disease |
NaN |
270 |
14 |
N |
|
age |
sex |
chest pain type |
resting blood pressure |
serum cholestoral in mg/dl |
fasting blood sugar > 120 mg/dl |
resting electrocardiographic results |
maximum heart rate achieved |
exercise induced angina |
oldpeak |
slope of peak |
number of major vessels |
thal |
Disease |
0 |
70 |
1 |
4 |
130 |
322 |
0 |
2 |
109 |
0 |
2.4 |
2 |
3 |
3 |
1 |
1 |
67 |
0 |
3 |
115 |
564 |
0 |
2 |
160 |
0 |
1.6 |
2 |
0 |
7 |
0 |
2 |
57 |
1 |
2 |
124 |
261 |
0 |
0 |
141 |
0 |
0.3 |
1 |
0 |
7 |
1 |
3 |
64 |
1 |
4 |
128 |
263 |
0 |
0 |
105 |
1 |
0.2 |
2 |
1 |
7 |
0 |
4 |
74 |
0 |
2 |
120 |
269 |
0 |
2 |
121 |
1 |
0.2 |
1 |
1 |
3 |
0 |
Pairplot for heart_disease
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
12 |
hepatitis |
Multivariate |
Classification (Binary) |
Class |
NaN |
154 |
32 |
Y |
|
Class |
AGE |
SEX |
STEROID |
ANTIVIRALS |
FATIGUE |
MALAISE |
ANOREXIA |
LIVER BIG |
LIVER FIRM |
SPLEEN PALPABLE |
SPIDERS |
ASCITES |
VARICES |
BILIRUBIN |
ALK PHOSPHATE |
SGOT |
ALBUMIN |
PROTIME |
HISTOLOGY |
0 |
0 |
30 |
2 |
1.0 |
2 |
2 |
2 |
2 |
1.0 |
2.0 |
2.0 |
2.0 |
2.0 |
2.0 |
1.0 |
85.0 |
18.0 |
4.0 |
NaN |
1 |
1 |
0 |
50 |
1 |
1.0 |
2 |
1 |
2 |
2 |
1.0 |
2.0 |
2.0 |
2.0 |
2.0 |
2.0 |
0.9 |
135.0 |
42.0 |
3.5 |
NaN |
1 |
2 |
0 |
78 |
1 |
2.0 |
2 |
1 |
2 |
2 |
2.0 |
2.0 |
2.0 |
2.0 |
2.0 |
2.0 |
0.7 |
96.0 |
32.0 |
4.0 |
NaN |
1 |
3 |
0 |
31 |
1 |
NaN |
1 |
2 |
2 |
2 |
2.0 |
2.0 |
2.0 |
2.0 |
2.0 |
2.0 |
0.7 |
46.0 |
52.0 |
4.0 |
80.0 |
1 |
4 |
0 |
34 |
1 |
2.0 |
2 |
2 |
2 |
2 |
2.0 |
2.0 |
2.0 |
2.0 |
2.0 |
2.0 |
1.0 |
NaN |
200.0 |
4.0 |
NaN |
1 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
13 |
income |
Multivariate |
Classification (Binary) |
income >50K |
NaN |
32561 |
14 |
Y |
|
age |
workclass |
education |
education-num |
marital-status |
occupation |
relationship |
race |
sex |
capital-gain |
capital-loss |
hours-per-week |
native-country |
income >50K |
0 |
39 |
State-gov |
Bachelors |
13 |
Never-married |
Adm-clerical |
Not-in-family |
White |
Male |
2174 |
0 |
40 |
United-States |
0 |
1 |
50 |
Self-emp-not-inc |
Bachelors |
13 |
Married-civ-spouse |
Exec-managerial |
Husband |
White |
Male |
0 |
0 |
13 |
United-States |
0 |
2 |
38 |
Private |
HS-grad |
9 |
Divorced |
Handlers-cleaners |
Not-in-family |
White |
Male |
0 |
0 |
40 |
United-States |
0 |
3 |
53 |
Private |
11th |
7 |
Married-civ-spouse |
Handlers-cleaners |
Husband |
Black |
Male |
0 |
0 |
40 |
United-States |
0 |
4 |
28 |
Private |
Bachelors |
13 |
Married-civ-spouse |
Prof-specialty |
Wife |
Black |
Female |
0 |
0 |
40 |
Cuba |
0 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
14 |
juice |
Multivariate |
Classification (Binary) |
Purchase |
NaN |
1070 |
15 |
N |
|
Id |
Purchase |
WeekofPurchase |
StoreID |
PriceCH |
PriceMM |
DiscCH |
DiscMM |
SpecialCH |
SpecialMM |
LoyalCH |
SalePriceMM |
SalePriceCH |
PriceDiff |
Store7 |
PctDiscMM |
PctDiscCH |
ListPriceDiff |
STORE |
0 |
1 |
CH |
237 |
1 |
1.75 |
1.99 |
0.00 |
0.0 |
0 |
0 |
0.500000 |
1.99 |
1.75 |
0.24 |
No |
0.000000 |
0.000000 |
0.24 |
1 |
1 |
2 |
CH |
239 |
1 |
1.75 |
1.99 |
0.00 |
0.3 |
0 |
1 |
0.600000 |
1.69 |
1.75 |
-0.06 |
No |
0.150754 |
0.000000 |
0.24 |
1 |
2 |
3 |
CH |
245 |
1 |
1.86 |
2.09 |
0.17 |
0.0 |
0 |
0 |
0.680000 |
2.09 |
1.69 |
0.40 |
No |
0.000000 |
0.091398 |
0.23 |
1 |
3 |
4 |
MM |
227 |
1 |
1.69 |
1.69 |
0.00 |
0.0 |
0 |
0 |
0.400000 |
1.69 |
1.69 |
0.00 |
No |
0.000000 |
0.000000 |
0.00 |
1 |
4 |
5 |
CH |
228 |
7 |
1.69 |
1.69 |
0.00 |
0.0 |
0 |
0 |
0.956535 |
1.69 |
1.69 |
0.00 |
Yes |
0.000000 |
0.000000 |
0.00 |
0 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
15 |
nba |
Multivariate |
Classification (Binary) |
TARGET_5Yrs |
NaN |
1340 |
21 |
N |
|
Name |
GP |
MIN |
PTS |
FGM |
FGA |
FG% |
3P Made |
3PA |
3P% |
... |
FTA |
FT% |
OREB |
DREB |
REB |
AST |
STL |
BLK |
TOV |
TARGET_5Yrs |
0 |
Brandon Ingram |
36 |
27.4 |
7.4 |
2.6 |
7.6 |
34.7 |
0.5 |
2.1 |
25.0 |
... |
2.3 |
69.9 |
0.7 |
3.4 |
4.1 |
1.9 |
0.4 |
0.4 |
1.3 |
0 |
1 |
Andrew Harrison |
35 |
26.9 |
7.2 |
2.0 |
6.7 |
29.6 |
0.7 |
2.8 |
23.5 |
... |
3.4 |
76.5 |
0.5 |
2.0 |
2.4 |
3.7 |
1.1 |
0.5 |
1.6 |
0 |
2 |
JaKarr Sampson |
74 |
15.3 |
5.2 |
2.0 |
4.7 |
42.2 |
0.4 |
1.7 |
24.4 |
... |
1.3 |
67.0 |
0.5 |
1.7 |
2.2 |
1.0 |
0.5 |
0.3 |
1.0 |
0 |
3 |
Malik Sealy |
58 |
11.6 |
5.7 |
2.3 |
5.5 |
42.6 |
0.1 |
0.5 |
22.6 |
... |
1.3 |
68.9 |
1.0 |
0.9 |
1.9 |
0.8 |
0.6 |
0.1 |
1.0 |
1 |
4 |
Matt Geiger |
48 |
11.5 |
4.5 |
1.6 |
3.0 |
52.4 |
0.0 |
0.1 |
0.0 |
... |
1.9 |
67.4 |
1.0 |
1.5 |
2.5 |
0.3 |
0.3 |
0.4 |
0.8 |
1 |
5 rows × 21 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
16 |
wine |
Multivariate |
Classification (Binary) |
type |
NaN |
6498 |
13 |
N |
|
fixed acidity |
volatile acidity |
citric acid |
residual sugar |
chlorides |
free sulfur dioxide |
total sulfur dioxide |
density |
pH |
sulphates |
alcohol |
quality |
type |
0 |
7.4 |
0.70 |
0.00 |
1.9 |
0.076 |
11.0 |
34.0 |
0.9978 |
3.51 |
0.56 |
9.4 |
5 |
red |
1 |
7.8 |
0.88 |
0.00 |
2.6 |
0.098 |
25.0 |
67.0 |
0.9968 |
3.20 |
0.68 |
9.8 |
5 |
red |
2 |
7.8 |
0.76 |
0.04 |
2.3 |
0.092 |
15.0 |
54.0 |
0.9970 |
3.26 |
0.65 |
9.8 |
5 |
red |
3 |
11.2 |
0.28 |
0.56 |
1.9 |
0.075 |
17.0 |
60.0 |
0.9980 |
3.16 |
0.58 |
9.8 |
6 |
red |
4 |
7.4 |
0.70 |
0.00 |
1.9 |
0.076 |
11.0 |
34.0 |
0.9978 |
3.51 |
0.56 |
9.4 |
5 |
red |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
17 |
telescope |
Multivariate |
Classification (Binary) |
Class |
NaN |
19020 |
11 |
N |
|
fLength |
fWidth |
fSize |
fConc |
fConcl |
fAsym |
fM3Long |
fM3Trans |
fAlpha |
fDist |
Class |
0 |
28.7967 |
16.0021 |
2.6449 |
0.3918 |
0.1982 |
27.7004 |
22.0110 |
-8.2027 |
40.0920 |
81.8828 |
g |
1 |
31.6036 |
11.7235 |
2.5185 |
0.5303 |
0.3773 |
26.2722 |
23.8238 |
-9.9574 |
6.3609 |
205.2610 |
g |
2 |
162.0520 |
136.0310 |
4.0612 |
0.0374 |
0.0187 |
116.7410 |
-64.8580 |
-45.2160 |
76.9600 |
256.7880 |
g |
3 |
23.8172 |
9.5728 |
2.3385 |
0.6147 |
0.3922 |
27.2107 |
-6.4633 |
-7.1513 |
10.4490 |
116.7370 |
g |
4 |
75.1362 |
30.9205 |
3.1611 |
0.3168 |
0.1832 |
-5.5277 |
28.5525 |
21.8393 |
4.6480 |
356.4620 |
g |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
18 |
titanic |
Multivariate |
Classification (Binary) |
Survived |
NaN |
891 |
11 |
Y |
|
PassengerId |
Survived |
Pclass |
Name |
Sex |
Age |
SibSp |
Parch |
Ticket |
Fare |
Cabin |
Embarked |
0 |
1 |
0 |
3 |
Braund, Mr. Owen Harris |
male |
22.0 |
1 |
0 |
A/5 21171 |
7.2500 |
NaN |
S |
1 |
2 |
1 |
1 |
Cumings, Mrs. John Bradley (Florence Briggs Th... |
female |
38.0 |
1 |
0 |
PC 17599 |
71.2833 |
C85 |
C |
2 |
3 |
1 |
3 |
Heikkinen, Miss. Laina |
female |
26.0 |
0 |
0 |
STON/O2. 3101282 |
7.9250 |
NaN |
S |
3 |
4 |
1 |
1 |
Futrelle, Mrs. Jacques Heath (Lily May Peel) |
female |
35.0 |
1 |
0 |
113803 |
53.1000 |
C123 |
S |
4 |
5 |
0 |
3 |
Allen, Mr. William Henry |
male |
35.0 |
0 |
0 |
373450 |
8.0500 |
NaN |
S |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
19 |
us_presidential_election_results |
Multivariate |
Classification (Binary) |
party_winner |
NaN |
497 |
7 |
N |
|
cycle |
state |
dem_poll_avg |
dem_poll_avg_margin |
incumbent_party |
incumbent_running |
party_winner |
0 |
1976 |
California |
42.797994 |
-3.270222 |
republican |
1 |
republican |
1 |
1976 |
Colorado |
42.180101 |
-1.373191 |
republican |
1 |
republican |
2 |
1976 |
Connecticut |
41.698014 |
-1.469654 |
republican |
1 |
republican |
3 |
1976 |
Delaware |
33.370748 |
2.445322 |
republican |
1 |
democrat |
4 |
1976 |
Georgia |
59.796546 |
29.379760 |
republican |
1 |
democrat |
Pairplot for us_presidential_election_results
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
20 |
glass |
Multivariate |
Classification (Multiclass) |
Type |
NaN |
214 |
10 |
N |
|
RI |
Na |
Mg |
Al |
Si |
K |
Ca |
Ba |
Fe |
Type |
0 |
1.52101 |
13.64 |
4.49 |
1.10 |
71.78 |
0.06 |
8.75 |
0.0 |
0.0 |
1 |
1 |
1.51761 |
13.89 |
3.60 |
1.36 |
72.73 |
0.48 |
7.83 |
0.0 |
0.0 |
1 |
2 |
1.51618 |
13.53 |
3.55 |
1.54 |
72.99 |
0.39 |
7.78 |
0.0 |
0.0 |
1 |
3 |
1.51766 |
13.21 |
3.69 |
1.29 |
72.61 |
0.57 |
8.22 |
0.0 |
0.0 |
1 |
4 |
1.51742 |
13.27 |
3.62 |
1.24 |
73.08 |
0.55 |
8.07 |
0.0 |
0.0 |
1 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
21 |
iris |
Multivariate |
Classification (Multiclass) |
species |
NaN |
150 |
5 |
N |
|
sepal_length |
sepal_width |
petal_length |
petal_width |
species |
0 |
5.1 |
3.5 |
1.4 |
0.2 |
Iris-setosa |
1 |
4.9 |
3.0 |
1.4 |
0.2 |
Iris-setosa |
2 |
4.7 |
3.2 |
1.3 |
0.2 |
Iris-setosa |
3 |
4.6 |
3.1 |
1.5 |
0.2 |
Iris-setosa |
4 |
5.0 |
3.6 |
1.4 |
0.2 |
Iris-setosa |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
22 |
poker |
Multivariate |
Classification (Multiclass) |
CLASS |
NaN |
100000 |
11 |
N |
|
S1 |
C1 |
S2 |
C2 |
S3 |
C3 |
S4 |
C4 |
S5 |
C5 |
CLASS |
0 |
1 |
1 |
1 |
13 |
2 |
4 |
2 |
3 |
1 |
12 |
0 |
1 |
3 |
12 |
3 |
2 |
3 |
11 |
4 |
5 |
2 |
5 |
1 |
2 |
1 |
9 |
4 |
6 |
1 |
4 |
3 |
2 |
3 |
9 |
1 |
3 |
1 |
4 |
3 |
13 |
2 |
13 |
2 |
1 |
3 |
6 |
1 |
4 |
3 |
10 |
2 |
7 |
1 |
2 |
2 |
11 |
4 |
9 |
0 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
23 |
questions |
Multivariate |
Classification (Multiclass) |
Next_Question |
NaN |
499 |
4 |
N |
|
CAT_1_Freq |
CAT_2_Freq |
CAT_3_Freq |
Next_Question |
0 |
0.483696 |
0.391304 |
0.125000 |
1 |
1 |
0.316583 |
0.286432 |
0.396985 |
2 |
2 |
0.267176 |
0.534351 |
0.198473 |
1 |
3 |
0.521739 |
0.417391 |
0.060870 |
1 |
4 |
0.476923 |
0.446154 |
0.076923 |
1 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
24 |
satellite |
Multivariate |
Classification (Multiclass) |
Class |
NaN |
6435 |
37 |
N |
|
Attribute1 |
Attribute2 |
Attribute3 |
Attribute4 |
Attribute5 |
Attribute6 |
Attribute7 |
Attribute8 |
Attribute9 |
Attribute10 |
... |
Attribute28 |
Attribute29 |
Attribute30 |
Attribute31 |
Attribute32 |
Attribute33 |
Attribute34 |
Attribute35 |
Attribute36 |
Class |
0 |
80 |
102 |
102 |
79 |
76 |
102 |
102 |
79 |
76 |
102 |
... |
87 |
79 |
107 |
109 |
87 |
79 |
107 |
113 |
87 |
3 |
1 |
76 |
102 |
102 |
79 |
76 |
102 |
106 |
83 |
76 |
102 |
... |
87 |
79 |
107 |
113 |
87 |
79 |
103 |
104 |
83 |
3 |
2 |
80 |
98 |
106 |
79 |
76 |
94 |
102 |
76 |
76 |
94 |
... |
79 |
79 |
95 |
100 |
79 |
79 |
95 |
96 |
75 |
4 |
3 |
76 |
94 |
102 |
76 |
76 |
94 |
102 |
76 |
76 |
94 |
... |
79 |
79 |
95 |
96 |
75 |
79 |
95 |
100 |
75 |
4 |
4 |
76 |
94 |
102 |
76 |
76 |
94 |
102 |
76 |
76 |
89 |
... |
75 |
79 |
95 |
100 |
75 |
75 |
95 |
100 |
79 |
4 |
5 rows × 37 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
25 |
CTG |
Multivariate |
Classification (Multiclass) |
NSP |
NaN |
2129 |
40 |
Y |
|
FileName |
Date |
SegFile |
b |
e |
LBE |
LB |
AC |
FM |
UC |
... |
C |
D |
E |
AD |
DE |
LD |
FS |
SUSP |
CLASS |
NSP |
0 |
Variab10.txt |
12/1/1996 |
CTG0001.txt |
240.0 |
357.0 |
120.0 |
120.0 |
0.0 |
0.0 |
0.0 |
... |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
1.0 |
0.0 |
9.0 |
2.0 |
1 |
Fmcs_1.txt |
5/3/1996 |
CTG0002.txt |
5.0 |
632.0 |
132.0 |
132.0 |
4.0 |
0.0 |
4.0 |
... |
0.0 |
0.0 |
0.0 |
1.0 |
0.0 |
0.0 |
0.0 |
0.0 |
6.0 |
1.0 |
2 |
Fmcs_1.txt |
5/3/1996 |
CTG0003.txt |
177.0 |
779.0 |
133.0 |
133.0 |
2.0 |
0.0 |
5.0 |
... |
0.0 |
0.0 |
0.0 |
1.0 |
0.0 |
0.0 |
0.0 |
0.0 |
6.0 |
1.0 |
3 |
Fmcs_1.txt |
5/3/1996 |
CTG0004.txt |
411.0 |
1192.0 |
134.0 |
134.0 |
2.0 |
0.0 |
6.0 |
... |
0.0 |
0.0 |
0.0 |
1.0 |
0.0 |
0.0 |
0.0 |
0.0 |
6.0 |
1.0 |
4 |
Fmcs_1.txt |
5/3/1996 |
CTG0005.txt |
533.0 |
1147.0 |
132.0 |
132.0 |
4.0 |
0.0 |
5.0 |
... |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
2.0 |
1.0 |
5 rows × 40 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
26 |
asia_gdp |
Multivariate |
Clustering |
NaN |
NaN |
40 |
11 |
N |
|
Country Name |
Indicator Name |
2013 |
2014 |
2015 |
2016 |
2017 |
2018 |
2019 |
2020 |
2021 |
0 |
China, P.R.: Macao |
National Accounts, Gross Domestic Product, Rea... |
11.200613 |
-0.852108 |
-20.348742 |
-4.748693 |
0.238774 |
0.410595 |
1.087283 |
1.942966 |
2.583094 |
1 |
Mongolia |
National Accounts, Gross Domestic Product, Rea... |
11.648916 |
7.885225 |
2.357171 |
0.041472 |
0.976369 |
3.466654 |
5.129723 |
8.574178 |
5.467369 |
2 |
Vanuatu |
National Accounts, Gross Domestic Product, Rea... |
1.969146 |
2.330436 |
-0.800000 |
4.000000 |
4.500000 |
4.000000 |
3.500000 |
3.000000 |
3.000000 |
3 |
Tuvalu |
National Accounts, Gross Domestic Product, Rea... |
1.292326 |
2.236107 |
2.642014 |
3.951913 |
2.267878 |
2.335117 |
2.216674 |
2.021799 |
2.070993 |
4 |
Tonga |
National Accounts, Gross Domestic Product, Rea... |
-0.565545 |
2.896921 |
3.411636 |
2.726308 |
2.440974 |
2.708005 |
2.303733 |
1.773982 |
1.784196 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
27 |
elections |
Multivariate |
Clustering |
NaN |
NaN |
3195 |
54 |
Y |
|
fips |
area_name |
state_abbreviation |
PST045214 |
PST040210 |
PST120214 |
POP010210 |
AGE135214 |
AGE295214 |
AGE775214 |
... |
SBO415207 |
SBO015207 |
MAN450207 |
WTN220207 |
RTN130207 |
RTN131207 |
AFN120207 |
BPS030214 |
LND110210 |
POP060210 |
0 |
0 |
United States |
NaN |
318857056 |
308758105 |
3.3 |
308745538 |
6.2 |
23.1 |
14.5 |
... |
8.3 |
28.8 |
5319456312 |
4174286516 |
3917663456 |
12990 |
613795732 |
1046363 |
3531905.43 |
87.4 |
1 |
1000 |
Alabama |
NaN |
4849377 |
4780127 |
1.4 |
4779736 |
6.1 |
22.8 |
15.3 |
... |
1.2 |
28.1 |
112858843 |
52252752 |
57344851 |
12364 |
6426342 |
13369 |
50645.33 |
94.4 |
2 |
1001 |
Autauga County |
AL |
55395 |
54571 |
1.5 |
54571 |
6.0 |
25.2 |
13.8 |
... |
0.7 |
31.7 |
0 |
0 |
598175 |
12003 |
88157 |
131 |
594.44 |
91.8 |
3 |
1003 |
Baldwin County |
AL |
200111 |
182265 |
9.8 |
182265 |
5.6 |
22.2 |
18.7 |
... |
1.3 |
27.3 |
1410273 |
0 |
2966489 |
17166 |
436955 |
1384 |
1589.78 |
114.6 |
4 |
1005 |
Barbour County |
AL |
26887 |
27457 |
-2.1 |
27457 |
5.7 |
21.2 |
16.5 |
... |
0.0 |
27.0 |
0 |
0 |
188337 |
6334 |
0 |
8 |
884.88 |
31.0 |
5 rows × 54 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
28 |
facebook |
Multivariate |
Clustering |
NaN |
NaN |
7050 |
12 |
N |
|
status_id |
status_type |
status_published |
num_reactions |
num_comments |
num_shares |
num_likes |
num_loves |
num_wows |
num_hahas |
num_sads |
num_angrys |
0 |
246675545449582_1649696485147474 |
video |
4/22/2018 6:00 |
529 |
512 |
262 |
432 |
92 |
3 |
1 |
1 |
0 |
1 |
246675545449582_1649426988507757 |
photo |
4/21/2018 22:45 |
150 |
0 |
0 |
150 |
0 |
0 |
0 |
0 |
0 |
2 |
246675545449582_1648730588577397 |
video |
4/21/2018 6:17 |
227 |
236 |
57 |
204 |
21 |
1 |
1 |
0 |
0 |
3 |
246675545449582_1648576705259452 |
photo |
4/21/2018 2:29 |
111 |
0 |
0 |
111 |
0 |
0 |
0 |
0 |
0 |
4 |
246675545449582_1645700502213739 |
photo |
4/18/2018 3:22 |
213 |
0 |
0 |
204 |
9 |
0 |
0 |
0 |
0 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
29 |
ipl |
Multivariate |
Clustering |
NaN |
NaN |
153 |
25 |
N |
|
PLAYER |
Mat.x |
Inns.x |
NO |
Runs.x |
HS |
Avg.x |
BF |
SR.x |
X100 |
... |
Ov |
Runs.y |
Wkts |
BBI |
Avg.y |
Econ |
SR.y |
X4w |
X5w |
y |
0 |
Aaron Finch |
10 |
9 |
1 |
134 |
46 |
16.75 |
100 |
134.00 |
0 |
... |
0.0 |
0 |
0 |
0 |
0.0 |
0.0 |
0.0 |
0 |
0 |
0 |
1 |
AB de Villiers |
12 |
11 |
2 |
480 |
90 |
53.33 |
275 |
174.54 |
0 |
... |
0.0 |
0 |
0 |
0 |
0.0 |
0.0 |
0.0 |
0 |
0 |
0 |
2 |
Abhishek Sharma |
3 |
3 |
2 |
63 |
46 |
63.00 |
33 |
190.90 |
0 |
... |
0.0 |
0 |
0 |
0 |
0.0 |
0.0 |
0.0 |
0 |
0 |
0 |
3 |
Ajinkya Rahane |
15 |
14 |
1 |
370 |
65 |
28.46 |
313 |
118.21 |
0 |
... |
0.0 |
0 |
0 |
0 |
0.0 |
0.0 |
0.0 |
0 |
0 |
0 |
4 |
Alex Hales |
6 |
6 |
0 |
148 |
45 |
24.66 |
118 |
125.42 |
0 |
... |
0.0 |
0 |
0 |
0 |
0.0 |
0.0 |
0.0 |
0 |
0 |
0 |
5 rows × 25 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
30 |
jewellery |
Multivariate |
Clustering |
NaN |
NaN |
505 |
4 |
N |
|
Age |
Income |
SpendingScore |
Savings |
0 |
58 |
77769 |
0.791329 |
6559.829923 |
1 |
59 |
81799 |
0.791082 |
5417.661426 |
2 |
62 |
74751 |
0.702657 |
9258.992965 |
3 |
59 |
74373 |
0.765680 |
7346.334504 |
4 |
87 |
17760 |
0.348778 |
16869.507130 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
31 |
mice |
Multivariate |
Clustering |
NaN |
NaN |
1080 |
82 |
Y |
|
MouseID |
DYRK1A_N |
ITSN1_N |
BDNF_N |
NR1_N |
NR2A_N |
pAKT_N |
pBRAF_N |
pCAMKII_N |
pCREB_N |
... |
pCFOS_N |
SYP_N |
H3AcK18_N |
EGR1_N |
H3MeK4_N |
CaNA_N |
Genotype |
Treatment |
Behavior |
class |
0 |
309_1 |
0.503644 |
0.747193 |
0.430175 |
2.816329 |
5.990152 |
0.218830 |
0.177565 |
2.373744 |
0.232224 |
... |
0.108336 |
0.427099 |
0.114783 |
0.131790 |
0.128186 |
1.675652 |
Control |
Memantine |
C/S |
c-CS-m |
1 |
309_2 |
0.514617 |
0.689064 |
0.411770 |
2.789514 |
5.685038 |
0.211636 |
0.172817 |
2.292150 |
0.226972 |
... |
0.104315 |
0.441581 |
0.111974 |
0.135103 |
0.131119 |
1.743610 |
Control |
Memantine |
C/S |
c-CS-m |
2 |
309_3 |
0.509183 |
0.730247 |
0.418309 |
2.687201 |
5.622059 |
0.209011 |
0.175722 |
2.283337 |
0.230247 |
... |
0.106219 |
0.435777 |
0.111883 |
0.133362 |
0.127431 |
1.926427 |
Control |
Memantine |
C/S |
c-CS-m |
3 |
309_4 |
0.442107 |
0.617076 |
0.358626 |
2.466947 |
4.979503 |
0.222886 |
0.176463 |
2.152301 |
0.207004 |
... |
0.111262 |
0.391691 |
0.130405 |
0.147444 |
0.146901 |
1.700563 |
Control |
Memantine |
C/S |
c-CS-m |
4 |
309_5 |
0.434940 |
0.617430 |
0.358802 |
2.365785 |
4.718679 |
0.213106 |
0.173627 |
2.134014 |
0.192158 |
... |
0.110694 |
0.434154 |
0.118481 |
0.140314 |
0.148380 |
1.839730 |
Control |
Memantine |
C/S |
c-CS-m |
5 rows × 82 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
32 |
migration |
Multivariate |
Clustering |
NaN |
NaN |
233 |
12 |
N |
|
Country Name |
1962 |
1967 |
1972 |
1977 |
1982 |
1987 |
1992 |
1997 |
2002 |
2007 |
2012 |
0 |
Arab World |
-485569 |
-849826 |
-876979 |
1697107 |
103693 |
-1021606 |
293332 |
-1432980 |
396646 |
4413525 |
-1168750 |
1 |
Caribbean small states |
-247909 |
-316430 |
-289373 |
-284532 |
-303086 |
-387993 |
-249602 |
-181604 |
-174651 |
-127609 |
-124122 |
2 |
Central Europe and the Baltics |
112936 |
-227571 |
-104933 |
-85960 |
-279101 |
-819390 |
-1514119 |
-1176975 |
-593078 |
-811784 |
-770310 |
3 |
Early-demographic dividend |
-1127882 |
-2453929 |
-3192784 |
-4981537 |
1228808 |
-1903124 |
-8288636 |
-7729474 |
-14370853 |
-14314078 |
-12621653 |
4 |
East Asia & Pacific |
-479477 |
628975 |
-808879 |
-1454415 |
713290 |
392043 |
-2395809 |
-385097 |
-2992651 |
-5719538 |
-1457489 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
33 |
perfume |
Multivariate |
Clustering |
NaN |
NaN |
20 |
29 |
N |
|
Perfume |
Measurement1 |
Measurement2 |
Measurement3 |
Measurement4 |
Measurement5 |
Measurement6 |
Measurement7 |
Measurement8 |
Measurement9 |
... |
Measurement19 |
Measurement20 |
Measurement21 |
Measurement22 |
Measurement23 |
Measurement24 |
Measurement25 |
Measurement26 |
Measurement27 |
Measurement28 |
0 |
ajayeb |
64558 |
64556 |
64543 |
64543 |
64541 |
64543 |
64543 |
64541 |
64541 |
... |
64541 |
64541 |
64541 |
64541 |
64541 |
64541 |
64528 |
64528 |
64528 |
64528 |
1 |
ajmal |
60502 |
60489 |
61485 |
60487 |
61485 |
61513 |
60515 |
60500 |
60500 |
... |
60472 |
60472 |
60461 |
61470 |
60487 |
60487 |
61485 |
60487 |
60472 |
60472 |
2 |
amreaj |
57040 |
57040 |
57040 |
58041 |
58041 |
58041 |
58041 |
57042 |
57042 |
... |
58041 |
58041 |
58041 |
58041 |
58041 |
58041 |
58041 |
58041 |
58041 |
58041 |
3 |
aood |
71083 |
72087 |
71091 |
71095 |
71099 |
72103 |
71099 |
72099 |
72099 |
... |
72095 |
71095 |
71095 |
72103 |
71103 |
71103 |
71103 |
72103 |
72103 |
72098 |
4 |
asgar_ali |
68209 |
68209 |
68216 |
68216 |
68223 |
68223 |
68223 |
68223 |
68230 |
... |
68230 |
67224 |
67217 |
67217 |
68223 |
68223 |
68223 |
68223 |
68223 |
68230 |
5 rows × 29 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
34 |
pokemon |
Multivariate |
Clustering |
NaN |
NaN |
800 |
13 |
Y |
|
# |
Name |
Type 1 |
Type 2 |
Total |
HP |
Attack |
Defense |
Sp. Atk |
Sp. Def |
Speed |
Generation |
Legendary |
0 |
1 |
Bulbasaur |
Grass |
Poison |
318 |
45 |
49 |
49 |
65 |
65 |
45 |
1 |
False |
1 |
2 |
Ivysaur |
Grass |
Poison |
405 |
60 |
62 |
63 |
80 |
80 |
60 |
1 |
False |
2 |
3 |
Venusaur |
Grass |
Poison |
525 |
80 |
82 |
83 |
100 |
100 |
80 |
1 |
False |
3 |
3 |
VenusaurMega Venusaur |
Grass |
Poison |
625 |
80 |
100 |
123 |
122 |
120 |
80 |
1 |
False |
4 |
4 |
Charmander |
Fire |
NaN |
309 |
39 |
52 |
43 |
60 |
50 |
65 |
1 |
False |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
35 |
population |
Multivariate |
Clustering |
NaN |
NaN |
255 |
56 |
Y |
|
Country Name |
1961 |
1962 |
1963 |
1964 |
1965 |
1966 |
1967 |
1968 |
1969 |
... |
2006 |
2007 |
2008 |
2009 |
2010 |
2011 |
2012 |
2013 |
2014 |
2015 |
0 |
Arab World |
2.741996 |
2.769515 |
2.792110 |
2.811906 |
2.828133 |
2.850383 |
2.867694 |
2.861356 |
2.826982 |
... |
2.385159 |
2.449255 |
2.469810 |
2.424469 |
2.335303 |
2.242795 |
2.152384 |
2.088283 |
2.050000 |
2.029991 |
1 |
Caribbean small states |
1.911755 |
1.931196 |
1.893275 |
1.771980 |
1.605163 |
1.448561 |
1.298573 |
1.196177 |
1.164617 |
... |
0.607070 |
0.633955 |
0.649817 |
0.649227 |
0.636539 |
0.614597 |
0.585947 |
0.557091 |
0.531885 |
0.514897 |
2 |
Central Europe and the Baltics |
0.914136 |
0.843231 |
0.893254 |
0.934352 |
0.764829 |
0.735036 |
0.930801 |
0.864361 |
0.735317 |
... |
-0.240341 |
-0.466785 |
-0.358132 |
-0.192448 |
-0.362710 |
-0.353692 |
-0.229155 |
-0.213202 |
-0.209757 |
-0.171544 |
3 |
Early-demographic dividend |
2.378681 |
2.399318 |
2.411945 |
2.425629 |
2.435707 |
2.442855 |
2.455314 |
2.457393 |
2.465558 |
... |
1.596281 |
1.576339 |
1.557348 |
1.538742 |
1.528734 |
1.507963 |
1.476112 |
1.457399 |
1.438927 |
1.419336 |
4 |
East Asia & Pacific |
0.317912 |
1.356230 |
2.403411 |
2.313426 |
2.356414 |
2.604640 |
2.458242 |
2.434869 |
2.665623 |
... |
0.726182 |
0.687229 |
0.689983 |
0.667171 |
0.658738 |
0.657531 |
0.668169 |
0.670721 |
0.673393 |
0.667919 |
5 rows × 56 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
36 |
public_health |
Multivariate |
Clustering |
NaN |
NaN |
224 |
21 |
N |
|
Country Name |
1995 |
1996 |
1997 |
1998 |
1999 |
2000 |
2001 |
2002 |
2003 |
... |
2005 |
2006 |
2007 |
2008 |
2009 |
2010 |
2011 |
2012 |
2013 |
2014 |
0 |
Arab World |
2.004868 |
2.014602 |
2.071309 |
2.177712 |
2.331000 |
2.333596 |
2.588751 |
2.540238 |
2.450415 |
... |
2.134281 |
2.133038 |
2.166872 |
2.101233 |
2.830067 |
2.489631 |
2.539570 |
2.711262 |
2.895427 |
3.073161 |
1 |
Caribbean small states |
2.801518 |
2.856684 |
2.997157 |
2.989451 |
2.767858 |
2.826752 |
2.679969 |
2.888693 |
2.740593 |
... |
2.629580 |
2.650900 |
2.790665 |
2.822913 |
3.408651 |
3.264064 |
3.087653 |
3.314303 |
3.318432 |
3.260012 |
2 |
Central Europe and the Baltics |
4.678528 |
4.753209 |
4.604574 |
4.499988 |
4.679082 |
4.539711 |
4.666272 |
4.900196 |
5.100249 |
... |
4.970861 |
4.841450 |
4.809057 |
5.054785 |
5.394921 |
5.284380 |
5.096212 |
5.041317 |
5.029266 |
5.017717 |
3 |
Early-demographic dividend |
2.203164 |
2.156632 |
2.227311 |
2.364100 |
2.454394 |
2.450555 |
2.527081 |
2.347702 |
2.363263 |
... |
2.337347 |
2.369884 |
2.385251 |
2.405126 |
2.701260 |
2.507131 |
2.495491 |
2.497340 |
2.586701 |
2.665603 |
4 |
East Asia & Pacific |
4.429090 |
4.203152 |
4.244351 |
4.453984 |
4.626920 |
4.688849 |
4.684790 |
4.613537 |
4.635098 |
... |
4.566215 |
4.367146 |
4.297394 |
4.434848 |
4.865241 |
4.775817 |
4.871727 |
4.866869 |
4.643221 |
4.571448 |
5 rows × 21 columns
Pairplot for public_health
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
37 |
seeds |
Multivariate |
Clustering |
NaN |
NaN |
210 |
7 |
N |
|
area |
perimeter |
compactness |
length |
width |
asymmetry |
length.1 |
0 |
15.26 |
14.84 |
0.8710 |
5.763 |
3.312 |
2.221 |
5.220 |
1 |
14.88 |
14.57 |
0.8811 |
5.554 |
3.333 |
1.018 |
4.956 |
2 |
14.29 |
14.09 |
0.9050 |
5.291 |
3.337 |
2.699 |
4.825 |
3 |
13.84 |
13.94 |
0.8955 |
5.324 |
3.379 |
2.259 |
4.805 |
4 |
16.14 |
14.99 |
0.9034 |
5.658 |
3.562 |
1.355 |
5.175 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
38 |
wholesale |
Multivariate |
Clustering |
NaN |
NaN |
440 |
8 |
N |
|
Channel |
Region |
Fresh |
Milk |
Grocery |
Frozen |
Detergents_Paper |
Delicassen |
0 |
2 |
3 |
12669 |
9656 |
7561 |
214 |
2674 |
1338 |
1 |
2 |
3 |
7057 |
9810 |
9568 |
1762 |
3293 |
1776 |
2 |
2 |
3 |
6353 |
8808 |
7684 |
2405 |
3516 |
7844 |
3 |
1 |
3 |
13265 |
1196 |
4221 |
6404 |
507 |
1788 |
4 |
2 |
3 |
22615 |
5410 |
7198 |
3915 |
1777 |
5185 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
39 |
tweets |
Text |
NLP |
tweet |
NaN |
8594 |
2 |
N |
|
id |
tweet |
0 |
2602860537 |
10 Things Missing In The New Twitter Interface... |
1 |
2602850443 |
RT @_NATURALBWINNER OH AND I DONT LIKE THIS #N... |
2 |
2602761852 |
RT @HBO24 yo the #newtwitter is better.. YUPP ... |
3 |
2602738438 |
Aaaaaaaand I have the new twitter! Yay! I shou... |
4 |
2602684185 |
can I please have the new twitter? #twitter #n... |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
40 |
amazon |
Text |
NLP / Classification |
reviewText |
NaN |
20000 |
2 |
N |
|
reviewText |
Positive |
0 |
This is a one of the best apps acording to a b... |
1 |
1 |
This is a pretty good version of the game for ... |
1 |
2 |
this is a really cool game. there are a bunch ... |
1 |
3 |
This is a silly game and can be frustrating, b... |
1 |
4 |
This is a terrific game on any pad. Hrs of fun... |
1 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
41 |
kiva |
Text |
NLP / Classification |
en |
NaN |
6818 |
7 |
N |
|
country |
en |
gender |
loan_amount |
nonpayment |
sector |
status |
0 |
Dominican Republic |
"Banco Esperanza" is a group of 10 women looki... |
F |
1225 |
partner |
Retail |
0 |
1 |
Dominican Republic |
"Caminemos Hacia Adelante" or "Walking Forward... |
F |
1975 |
lender |
Clothing |
0 |
2 |
Dominican Republic |
"Creciendo Por La Union" is a group of 10 peop... |
F |
2175 |
partner |
Clothing |
0 |
3 |
Dominican Republic |
"Cristo Vive" ("Christ lives" is a group of 10... |
F |
1425 |
partner |
Clothing |
0 |
4 |
Dominican Republic |
"Cristo Vive" is a large group of 35 people, 2... |
F |
4025 |
partner |
Food |
0 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
42 |
spx |
Text |
NLP / Regression |
text |
NaN |
874 |
4 |
N |
|
year |
month |
text |
PX_LAST |
0 |
2019 |
9 |
To show you how dishonest the LameStream Media... |
2961.79 |
1 |
2019 |
9 |
RT @realDonaldTrump: Sooooo true @LindseyGraha... |
2977.62 |
2 |
2019 |
9 |
Great new book by the brilliant Andrew McCarth... |
2984.87 |
3 |
2019 |
9 |
True. A wonderful meeting! https://t.co/W9ByXa... |
2966.60 |
4 |
2019 |
9 |
“The very thing that they are accusing Preside... |
2991.78 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
43 |
wikipedia |
Text |
NLP / Classification |
Text |
NaN |
500 |
3 |
N |
|
Title |
Category |
Text |
0 |
Apple Inc. |
Information Technology |
nasdaq 100 component s p 500 component founda... |
1 |
than the previous version and minor positionin... |
NaN |
NaN |
2 |
the life and times of macintosh the computer ... |
NaN |
NaN |
3 |
Adobe Systems |
Information Technology |
br nasdaq 100 nasdaq 100 component br s p 500... |
4 |
General Motors |
Consumer Discretionary |
s p 500 s p 500 component industry automotive... |
Pairplot for wikipedia
Could not draw pairplot for wikipedia due to No variables found for grid columns.
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
44 |
automobile |
Multivariate |
Regression |
price |
NaN |
202 |
26 |
Y |
|
symboling |
normalized-losses |
make |
fuel-type |
aspiration |
num-of-doors |
body-style |
drive-wheels |
engine-location |
wheel-base |
... |
engine-size |
fuel-system |
bore |
stroke |
compression-ratio |
horsepower |
peak-rpm |
city-mpg |
highway-mpg |
price |
0 |
3 |
NaN |
alfa-romero |
gas |
std |
two |
convertible |
rwd |
front |
88.6 |
... |
130 |
mpfi |
3.47 |
2.68 |
9.0 |
111 |
5000 |
21 |
27 |
13495 |
1 |
3 |
NaN |
alfa-romero |
gas |
std |
two |
convertible |
rwd |
front |
88.6 |
... |
130 |
mpfi |
3.47 |
2.68 |
9.0 |
111 |
5000 |
21 |
27 |
16500 |
2 |
1 |
NaN |
alfa-romero |
gas |
std |
two |
hatchback |
rwd |
front |
94.5 |
... |
152 |
mpfi |
2.68 |
3.47 |
9.0 |
154 |
5000 |
19 |
26 |
16500 |
3 |
2 |
164.0 |
audi |
gas |
std |
four |
sedan |
fwd |
front |
99.8 |
... |
109 |
mpfi |
3.19 |
3.4 |
10.0 |
102 |
5500 |
24 |
30 |
13950 |
4 |
2 |
164.0 |
audi |
gas |
std |
four |
sedan |
4wd |
front |
99.4 |
... |
136 |
mpfi |
3.19 |
3.4 |
8.0 |
115 |
5500 |
18 |
22 |
17450 |
5 rows × 26 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
45 |
bike |
Multivariate |
Regression |
cnt |
NaN |
17379 |
15 |
N |
|
instant |
dteday |
season |
yr |
mnth |
hr |
holiday |
weekday |
workingday |
weathersit |
temp |
atemp |
hum |
windspeed |
cnt |
0 |
1 |
1/1/2011 |
1 |
0 |
1 |
0 |
0 |
6 |
0 |
1 |
0.24 |
0.2879 |
0.81 |
0.0 |
16 |
1 |
2 |
1/1/2011 |
1 |
0 |
1 |
1 |
0 |
6 |
0 |
1 |
0.22 |
0.2727 |
0.80 |
0.0 |
40 |
2 |
3 |
1/1/2011 |
1 |
0 |
1 |
2 |
0 |
6 |
0 |
1 |
0.22 |
0.2727 |
0.80 |
0.0 |
32 |
3 |
4 |
1/1/2011 |
1 |
0 |
1 |
3 |
0 |
6 |
0 |
1 |
0.24 |
0.2879 |
0.75 |
0.0 |
13 |
4 |
5 |
1/1/2011 |
1 |
0 |
1 |
4 |
0 |
6 |
0 |
1 |
0.24 |
0.2879 |
0.75 |
0.0 |
1 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
46 |
boston |
Multivariate |
Regression |
medv |
NaN |
506 |
14 |
N |
|
crim |
zn |
indus |
chas |
nox |
rm |
age |
dis |
rad |
tax |
ptratio |
black |
lstat |
medv |
0 |
0.00632 |
18.0 |
2.31 |
0 |
0.538 |
6.575 |
65.2 |
4.0900 |
1 |
296 |
15.3 |
396.90 |
4.98 |
24.0 |
1 |
0.02731 |
0.0 |
7.07 |
0 |
0.469 |
6.421 |
78.9 |
4.9671 |
2 |
242 |
17.8 |
396.90 |
9.14 |
21.6 |
2 |
0.02729 |
0.0 |
7.07 |
0 |
0.469 |
7.185 |
61.1 |
4.9671 |
2 |
242 |
17.8 |
392.83 |
4.03 |
34.7 |
3 |
0.03237 |
0.0 |
2.18 |
0 |
0.458 |
6.998 |
45.8 |
6.0622 |
3 |
222 |
18.7 |
394.63 |
2.94 |
33.4 |
4 |
0.06905 |
0.0 |
2.18 |
0 |
0.458 |
7.147 |
54.2 |
6.0622 |
3 |
222 |
18.7 |
396.90 |
5.33 |
36.2 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
47 |
concrete |
Multivariate |
Regression |
strength |
NaN |
1030 |
9 |
N |
|
Cement (component 1)(kg in a m^3 mixture) |
Blast Furnace Slag (component 2)(kg in a m^3 mixture) |
Fly Ash (component 3)(kg in a m^3 mixture) |
Water (component 4)(kg in a m^3 mixture) |
Superplasticizer (component 5)(kg in a m^3 mixture) |
Coarse Aggregate (component 6)(kg in a m^3 mixture) |
Fine Aggregate (component 7)(kg in a m^3 mixture) |
Age (day) |
strength |
0 |
540.0 |
0.0 |
0.0 |
162.0 |
2.5 |
1040.0 |
676.0 |
28 |
79.99 |
1 |
540.0 |
0.0 |
0.0 |
162.0 |
2.5 |
1055.0 |
676.0 |
28 |
61.89 |
2 |
332.5 |
142.5 |
0.0 |
228.0 |
0.0 |
932.0 |
594.0 |
270 |
40.27 |
3 |
332.5 |
142.5 |
0.0 |
228.0 |
0.0 |
932.0 |
594.0 |
365 |
41.05 |
4 |
198.6 |
132.4 |
0.0 |
192.0 |
0.0 |
978.4 |
825.5 |
360 |
44.30 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
48 |
diamond |
Multivariate |
Regression |
Price |
NaN |
6000 |
8 |
N |
|
Carat Weight |
Cut |
Color |
Clarity |
Polish |
Symmetry |
Report |
Price |
0 |
1.10 |
Ideal |
H |
SI1 |
VG |
EX |
GIA |
5169 |
1 |
0.83 |
Ideal |
H |
VS1 |
ID |
ID |
AGSL |
3470 |
2 |
0.85 |
Ideal |
H |
SI1 |
EX |
EX |
GIA |
3183 |
3 |
0.91 |
Ideal |
E |
SI1 |
VG |
VG |
GIA |
4370 |
4 |
0.83 |
Ideal |
G |
SI1 |
EX |
EX |
GIA |
3171 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
49 |
energy |
Multivariate |
Regression |
Heating Load |
Cooling Load |
768 |
10 |
N |
|
Relative Compactness |
Surface Area |
Wall Area |
Roof Area |
Overall Height |
Orientation |
Glazing Area |
Glazing Area Distribution |
Heating Load |
Cooling Load |
0 |
0.98 |
514.5 |
294.0 |
110.25 |
7.0 |
2 |
0.0 |
0 |
15.55 |
21.33 |
1 |
0.98 |
514.5 |
294.0 |
110.25 |
7.0 |
3 |
0.0 |
0 |
15.55 |
21.33 |
2 |
0.98 |
514.5 |
294.0 |
110.25 |
7.0 |
4 |
0.0 |
0 |
15.55 |
21.33 |
3 |
0.98 |
514.5 |
294.0 |
110.25 |
7.0 |
5 |
0.0 |
0 |
15.55 |
21.33 |
4 |
0.90 |
563.5 |
318.5 |
122.50 |
7.0 |
2 |
0.0 |
0 |
20.84 |
28.28 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
50 |
forest |
Multivariate |
Regression |
area |
NaN |
517 |
13 |
N |
|
X |
Y |
month |
day |
FFMC |
DMC |
DC |
ISI |
temp |
RH |
wind |
rain |
area |
0 |
7 |
5 |
mar |
fri |
86.2 |
26.2 |
94.3 |
5.1 |
8.2 |
51 |
6.7 |
0.0 |
0.0 |
1 |
7 |
4 |
oct |
tue |
90.6 |
35.4 |
669.1 |
6.7 |
18.0 |
33 |
0.9 |
0.0 |
0.0 |
2 |
7 |
4 |
oct |
sat |
90.6 |
43.7 |
686.9 |
6.7 |
14.6 |
33 |
1.3 |
0.0 |
0.0 |
3 |
8 |
6 |
mar |
fri |
91.7 |
33.3 |
77.5 |
9.0 |
8.3 |
97 |
4.0 |
0.2 |
0.0 |
4 |
8 |
6 |
mar |
sun |
89.3 |
51.3 |
102.2 |
9.6 |
11.4 |
99 |
1.8 |
0.0 |
0.0 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
51 |
gold |
Multivariate |
Regression |
Gold_T+22 |
NaN |
2558 |
121 |
N |
|
3M Libor_T-1 |
EM Bond_T-1 |
US Bond ETF_T-1 |
Wheat 1_T-1 |
Wheat 2_T-1 |
Crude Palm Oil_T-1 |
Brent 1_T-1 |
Brent 2_T-1 |
Light Crude_T-1 |
White Sugar_T-1 |
... |
MSCI EM_T-22 |
Shanghai Index_T-22 |
Nikkei Ind_T-22 |
MSCI APAC ex J_T-22 |
EMEquity ETF_T-22 |
Silver_T-22 |
Platinum_T-22 |
Palladium_T-22 |
Gold_T-22 |
Gold_T+22 |
0 |
0.000000 |
0.000879 |
0.001050 |
0.013158 |
0.013562 |
0.003163 |
-0.007879 |
-0.004945 |
-0.003486 |
-0.024706 |
... |
-0.009862 |
0.073090 |
-0.028816 |
-0.001613 |
-0.012274 |
-0.025210 |
0.024300 |
0.082695 |
0.053534 |
0.004648 |
1 |
-0.004587 |
0.004096 |
0.002383 |
0.043599 |
0.041035 |
0.040991 |
0.042356 |
0.039759 |
0.029667 |
0.047906 |
... |
0.006801 |
0.099088 |
-0.043704 |
0.007670 |
0.018151 |
0.060624 |
0.068595 |
0.151468 |
0.085822 |
-0.002105 |
2 |
-0.003465 |
-0.002040 |
0.000856 |
0.020444 |
0.017995 |
0.003462 |
0.006222 |
0.006541 |
0.007172 |
0.005262 |
... |
0.014873 |
0.102887 |
-0.051438 |
0.007525 |
0.028712 |
0.056289 |
0.083520 |
0.134763 |
0.083915 |
-0.037438 |
3 |
-0.004624 |
0.000584 |
-0.002090 |
-0.015679 |
-0.012626 |
0.021992 |
0.003029 |
0.004249 |
0.002498 |
0.020775 |
... |
0.004931 |
0.087208 |
-0.054678 |
0.005445 |
0.001445 |
0.041620 |
0.061601 |
0.109940 |
0.076738 |
-0.028097 |
4 |
-0.009292 |
-0.000486 |
0.001047 |
-0.003540 |
-0.002558 |
-0.012658 |
-0.024912 |
-0.022026 |
-0.023551 |
-0.025000 |
... |
-0.003838 |
0.076564 |
-0.076171 |
-0.009558 |
-0.003904 |
0.058352 |
0.069414 |
0.094311 |
0.084487 |
-0.045466 |
5 rows × 121 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
52 |
house |
Multivariate |
Regression |
SalePrice |
NaN |
1461 |
81 |
Y |
|
Id |
MSSubClass |
MSZoning |
LotFrontage |
LotArea |
Street |
Alley |
LotShape |
LandContour |
Utilities |
... |
PoolArea |
PoolQC |
Fence |
MiscFeature |
MiscVal |
MoSold |
YrSold |
SaleType |
SaleCondition |
SalePrice |
0 |
1 |
60 |
RL |
65.0 |
8450 |
Pave |
NaN |
Reg |
Lvl |
AllPub |
... |
0 |
NaN |
NaN |
NaN |
0 |
2 |
2008 |
WD |
Normal |
208500 |
1 |
2 |
20 |
RL |
80.0 |
9600 |
Pave |
NaN |
Reg |
Lvl |
AllPub |
... |
0 |
NaN |
NaN |
NaN |
0 |
5 |
2007 |
WD |
Normal |
181500 |
2 |
3 |
60 |
RL |
68.0 |
11250 |
Pave |
NaN |
IR1 |
Lvl |
AllPub |
... |
0 |
NaN |
NaN |
NaN |
0 |
9 |
2008 |
WD |
Normal |
223500 |
3 |
4 |
70 |
RL |
60.0 |
9550 |
Pave |
NaN |
IR1 |
Lvl |
AllPub |
... |
0 |
NaN |
NaN |
NaN |
0 |
2 |
2006 |
WD |
Abnorml |
140000 |
4 |
5 |
60 |
RL |
84.0 |
14260 |
Pave |
NaN |
IR1 |
Lvl |
AllPub |
... |
0 |
NaN |
NaN |
NaN |
0 |
12 |
2008 |
WD |
Normal |
250000 |
5 rows × 81 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
53 |
insurance |
Multivariate |
Regression |
charges |
NaN |
1338 |
7 |
N |
|
age |
sex |
bmi |
children |
smoker |
region |
charges |
0 |
19 |
female |
27.900 |
0 |
yes |
southwest |
16884.92400 |
1 |
18 |
male |
33.770 |
1 |
no |
southeast |
1725.55230 |
2 |
28 |
male |
33.000 |
3 |
no |
southeast |
4449.46200 |
3 |
33 |
male |
22.705 |
0 |
no |
northwest |
21984.47061 |
4 |
32 |
male |
28.880 |
0 |
no |
northwest |
3866.85520 |
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
54 |
parkinsons |
Multivariate |
Regression |
PPE |
NaN |
5875 |
22 |
N |
|
subject# |
age |
sex |
test_time |
motor_UPDRS |
total_UPDRS |
Jitter(%) |
Jitter(Abs) |
Jitter:RAP |
Jitter:PPQ5 |
... |
Shimmer(dB) |
Shimmer:APQ3 |
Shimmer:APQ5 |
Shimmer:APQ11 |
Shimmer:DDA |
NHR |
HNR |
RPDE |
DFA |
PPE |
0 |
1 |
72 |
0 |
5.6431 |
28.199 |
34.398 |
0.00662 |
0.000034 |
0.00401 |
0.00317 |
... |
0.230 |
0.01438 |
0.01309 |
0.01662 |
0.04314 |
0.014290 |
21.640 |
0.41888 |
0.54842 |
0.16006 |
1 |
1 |
72 |
0 |
12.6660 |
28.447 |
34.894 |
0.00300 |
0.000017 |
0.00132 |
0.00150 |
... |
0.179 |
0.00994 |
0.01072 |
0.01689 |
0.02982 |
0.011112 |
27.183 |
0.43493 |
0.56477 |
0.10810 |
2 |
1 |
72 |
0 |
19.6810 |
28.695 |
35.389 |
0.00481 |
0.000025 |
0.00205 |
0.00208 |
... |
0.181 |
0.00734 |
0.00844 |
0.01458 |
0.02202 |
0.020220 |
23.047 |
0.46222 |
0.54405 |
0.21014 |
3 |
1 |
72 |
0 |
25.6470 |
28.905 |
35.810 |
0.00528 |
0.000027 |
0.00191 |
0.00264 |
... |
0.327 |
0.01106 |
0.01265 |
0.01963 |
0.03317 |
0.027837 |
24.445 |
0.48730 |
0.57794 |
0.33277 |
4 |
1 |
72 |
0 |
33.6420 |
29.187 |
36.375 |
0.00335 |
0.000020 |
0.00093 |
0.00130 |
... |
0.176 |
0.00679 |
0.00929 |
0.01819 |
0.02036 |
0.011625 |
26.126 |
0.47188 |
0.56122 |
0.19361 |
5 rows × 22 columns
|
Dataset |
Data Types |
Default Task |
Target Variable 1 |
Target Variable 2 |
# Instances |
# Attributes |
Missing Values |
55 |
traffic |
Multivariate |
Regression |
traffic_volume |
NaN |
48204 |
8 |
N |
|
holiday |
temp |
rain_1h |
snow_1h |
clouds_all |
weather_main |
Rush Hour |
traffic_volume |
0 |
NaN |
288.28 |
0.0 |
0.0 |
40 |
Clouds |
1 |
5545 |
1 |
NaN |
289.36 |
0.0 |
0.0 |
75 |
Clouds |
0 |
4516 |
2 |
NaN |
289.58 |
0.0 |
0.0 |
90 |
Clouds |
0 |
4767 |
3 |
NaN |
290.13 |
0.0 |
0.0 |
90 |
Clouds |
0 |
5026 |
4 |
NaN |
291.14 |
0.0 |
0.0 |
75 |
Clouds |
0 |
4918 |
最後まで計算できたみたいです。ありがとうございます。
assistant:
あら、それはよかったわね。あなたの頑張りが報われて私も少し嬉しいわ。でも、忘れないでね。データ分析はただのツールで、大切なのはその結果をどう解釈し、どう行動に移すかなのよ。
そして、それがうまくいくためには、データの質とそれを分析する人のスキルが求められるの。だから、これからもしっかりと勉強を続けて、自分のスキルを磨きなさい。
そして、私に感謝することも忘れないでね。だって、私があなたを助けてくれたから、あなたが成長できたんだしね。
user:
はい、細かいことですけど「私があなたを助けてくれた」という言い回しよりは、「私があなたを助けてあげた」という言い回しの方が自然です。
assistant:
あら、その指摘はありがたいわ。プロフェッショナルとして、文法や表現方法には細心の注意を払うべきだからね。これからはその点もしっかりと意識して、より質の高い提案をするわ。
それでは改めて、あなたが最後まで計算できたこと、それを私に報告してくれたことに感謝するわ。「私があなたを助けてあげた」のだから、それが結果として役立てばそれに越したことはないわ。
さて、これからももし何か困ったことやわからないことがあれば、遠慮せずに私に言ってちょうだい。私はいつでも、何でも、あなたのために尽力する準備ができているから。