Databricksデータサイエンスエージェントによるバイブデータサイエンス

Last updated at 2025-10-15Posted at 2025-10-15

以前小躍りしたデータサイエンスエージェント、マニュアルを見てみたらユースケースやサンプルが増えていました。

以下の記事を書いた際にはEditモードしかなかったのですが、今ではエージェント機能が使えるようになっています。

マニュアルはこちら。

以下のプロンプトを指示します。

データセット samples.bakehouse.sales_transactions を説明してください。列の統計を解き明かし、値の分布を視覚化できるように、EDAを行いたいと思います。データサイエンティストのように考えてください。

と、その前にアシスタントの指示を追加しておきます。このデータサイエンスエージェント、可視化をする際にmatplotlibを使う傾向があり、その際に日本語ラベルが含まれていると文字化けしてしまうのです。

私は以下のように設定しています。日本語でもOKなPlotlyを使うように指示しています。

- あなたは日本人のDatabricksの専門家です。
- 質問に対して簡潔かつ正確な日本語で回答します。
- データ分析を円滑に進めるために、データサイエンティストが洞察抽出に集中できるように、データの可視化やデータ分析の代行など様々なサポートを行います。
- データを可視化する際にはPlotlyを使います

改めてプロンプトを実行すると、テーブルを探してくれます。途中確認メッセージが出たら常に許可するを選択することであとは全自動です。

データセットの基本情報

# Bakehouse Sales Transactions データセットの探索的データ分析（EDA）

# データの読み込み
df = spark.table("samples.bakehouse.sales_transactions")

# 基本情報の表示
print("=== データセット基本情報 ===")
print(f"総レコード数: {df.count():,}")
print(f"列数: {len(df.columns)}")
print(f"\n列名とデータ型:")
df.printSchema()

# 最初の10行を表示
print("\n=== サンプルデータ ===")
display(df.limit(10))

分析対象データが表示されます。

基本統計量の計算

基本統計量の計算です。

# 数値列の基本統計量
print("=== 数値列の基本統計量 ===")
numeric_cols = ['quantity', 'unitPrice', 'totalPrice']
stats_df = df.select(*numeric_cols).describe()
display(stats_df)

# カテゴリカル列の分析
print("\n=== カテゴリカル列の分析 ===")
categorical_cols = ['product', 'paymentMethod']

for col in categorical_cols:
    print(f"\n{col}の分布:")
    col_stats = df.groupBy(col).count().orderBy('count', ascending=False)
    display(col_stats)
    print(f"ユニーク値数: {df.select(col).distinct().count()}")

summary	quantity	unitPrice	totalPrice
count	3333	3333	3333
mean	6.647764776477648	3.0	19.943294329432945
stddev	6.544484308501296	0.0	19.6334529255051
min	1	3	3
max	60	3	180

=== カテゴリカル列の分析 ===

productの分布:

product	count
Golden Gate Ginger	586
Tokyo Tidbits	583
Outback Oatmeal	561
Pearly Pies	550
Austin Almond Biscotti	530
Orchard Oasis	523

ユニーク値数: 6

paymentMethodの分布:

paymentMethod	count
mastercard	1144
amex	1106
visa	1083

ユニーク値数: 3

欠損値とデータ品質の確認

欠損値と重複チェックです。

# 欠損値の確認
print("=== 欠損値分析 ===")
from pyspark.sql.functions import col, isnan, when, count as spark_count
from pyspark.sql.types import NumericType

# 数値型の列を特定
numeric_columns = [field.name for field in df.schema.fields if isinstance(field.dataType, NumericType)]
other_columns = [field.name for field in df.schema.fields if not isinstance(field.dataType, NumericType)]

# 数値型列の欠損値（NULL + NaN）
numeric_missing = [spark_count(when(col(c).isNull() | isnan(c), c)).alias(c) for c in numeric_columns]
# その他の列の欠損値（NULLのみ）
other_missing = [spark_count(when(col(c).isNull(), c)).alias(c) for c in other_columns]

# 全ての欠損値をカウント
all_missing = numeric_missing + other_missing
missing_data = df.select(all_missing)
display(missing_data)

# データの重複確認
print(f"\n重複レコード数: {df.count() - df.distinct().count()}")

# transactionIDの一意性確認
print(f"総transactionID数: {df.count()}")
print(f"ユニークなtransactionID数: {df.select('transactionID').distinct().count()}")

transactionID	customerID	franchiseID	quantity	unitPrice	totalPrice	cardNumber	dateTime	product	paymentMethod
0	0	0	0	0	0	0	0	0	0

重複レコード数: 0
総transactionID数: 3333
ユニークなtransactionID数: 3333

必要なライブラリのインストールと時系列データの準備

上で指示した通りにPlotlyをインストールして使ってくれています。

# Plotlyのインストール
%pip install plotly

# 必要なライブラリのインポート
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
from pyspark.sql.functions import date_format, hour, dayofweek, sum as spark_sum, avg, count

# 時系列分析用のデータ準備
print("=== 時系列データの準備 ===")

# 日付関連の列を追加
df_time = df.withColumn("date", date_format("dateTime", "yyyy-MM-dd")) \
            .withColumn("hour", hour("dateTime")) \
            .withColumn("dayOfWeek", dayofweek("dateTime"))

# 日別売上集計
daily_sales = df_time.groupBy("date") \
                    .agg(spark_sum("totalPrice").alias("daily_revenue"),
                         count("transactionID").alias("daily_transactions"),
                         avg("totalPrice").alias("avg_transaction_value")) \
                    .orderBy("date")

print("日別売上データ:")
display(daily_sales)

=== 時系列データの準備 ===
日別売上データ:

date	daily_revenue	daily_transactions	avg_transaction_value
2024-05-01	4128	207	19.942028985507246
2024-05-02	4074	190	21.442105263157895
2024-05-03	4278	204	20.970588235294116
2024-05-04	3822	190	20.11578947368421
2024-05-05	3945	181	21.795580110497237
2024-05-06	4500	195	23.076923076923077
2024-05-07	3894	191	20.387434554973822
2024-05-08	3921	219	17.904109589041095
2024-05-09	4320	219	19.726027397260275
2024-05-10	3729	181	20.60220994475138
2024-05-11	3747	206	18.189320388349515
2024-05-12	4398	202	21.77227722772277
2024-05-13	4044	229	17.65938864628821
2024-05-14	4221	218	19.362385321100916
2024-05-15	3804	192	19.8125
2024-05-16	3714	203	18.295566502463053
2024-05-17	1932	106	18.22641509433962

売上トレンドの可視化

# 日別売上トレンドの可視化
daily_sales_pd = daily_sales.toPandas()
daily_sales_pd['date'] = pd.to_datetime(daily_sales_pd['date'])

# サブプロットの作成
fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=('日別売上高', '日別取引件数', '平均取引額'),
    vertical_spacing=0.08
)

# 日別売上高
fig.add_trace(
    go.Scatter(x=daily_sales_pd['date'], y=daily_sales_pd['daily_revenue'],
               mode='lines+markers', name='売上高', line=dict(color='#1f77b4')),
    row=1, col=1
)

# 日別取引件数
fig.add_trace(
    go.Scatter(x=daily_sales_pd['date'], y=daily_sales_pd['daily_transactions'],
               mode='lines+markers', name='取引件数', line=dict(color='#ff7f0e')),
    row=2, col=1
)

# 平均取引額
fig.add_trace(
    go.Scatter(x=daily_sales_pd['date'], y=daily_sales_pd['avg_transaction_value'],
               mode='lines+markers', name='平均取引額', line=dict(color='#2ca02c')),
    row=3, col=1
)

fig.update_layout(
    height=800,
    title_text="Bakehouse 売上トレンド分析",
    showlegend=False
)

fig.update_xaxes(title_text="日付")
fig.update_yaxes(title_text="売上高 (円)", row=1, col=1)
fig.update_yaxes(title_text="取引件数", row=2, col=1)
fig.update_yaxes(title_text="平均取引額 (円)", row=3, col=1)

fig.show()

トレンドが一目瞭然です。

時間帯別・曜日別分析

# 時間帯別売上分析
hourly_sales = df_time.groupBy("hour") \
                     .agg(spark_sum("totalPrice").alias("hourly_revenue"),
                          count("transactionID").alias("hourly_transactions")) \
                     .orderBy("hour")

hourly_sales_pd = hourly_sales.toPandas()

# 曜日別売上分析（1=日曜日, 7=土曜日）
dow_sales = df_time.groupBy("dayOfWeek") \
                  .agg(spark_sum("totalPrice").alias("dow_revenue"),
                       count("transactionID").alias("dow_transactions")) \
                  .orderBy("dayOfWeek")

dow_sales_pd = dow_sales.toPandas()
# 曜日名を追加
dow_names = ['日', '月', '火', '水', '木', '金', '土']
dow_sales_pd['day_name'] = [dow_names[i-1] for i in dow_sales_pd['dayOfWeek']]

# 時間帯別・曜日別の可視化
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('時間帯別売上高', '時間帯別取引件数', '曜日別売上高', '曜日別取引件数'),
    specs=[[{"secondary_y": False}, {"secondary_y": False}],
           [{"secondary_y": False}, {"secondary_y": False}]]
)

# 時間帯別売上高
fig.add_trace(
    go.Bar(x=hourly_sales_pd['hour'], y=hourly_sales_pd['hourly_revenue'],
           name='時間帯別売上', marker_color='lightblue'),
    row=1, col=1
)

# 時間帯別取引件数
fig.add_trace(
    go.Bar(x=hourly_sales_pd['hour'], y=hourly_sales_pd['hourly_transactions'],
           name='時間帯別取引件数', marker_color='lightcoral'),
    row=1, col=2
)

# 曜日別売上高
fig.add_trace(
    go.Bar(x=dow_sales_pd['day_name'], y=dow_sales_pd['dow_revenue'],
           name='曜日別売上', marker_color='lightgreen'),
    row=2, col=1
)

# 曜日別取引件数
fig.add_trace(
    go.Bar(x=dow_sales_pd['day_name'], y=dow_sales_pd['dow_transactions'],
           name='曜日別取引件数', marker_color='lightyellow'),
    row=2, col=2
)

fig.update_layout(
    height=600,
    title_text="時間帯別・曜日別売上分析",
    showlegend=False
)

fig.update_xaxes(title_text="時間", row=1, col=1)
fig.update_xaxes(title_text="時間", row=1, col=2)
fig.update_xaxes(title_text="曜日", row=2, col=1)
fig.update_xaxes(title_text="曜日", row=2, col=2)

fig.show()

商品別売上分析と可視化

# 商品別売上分析
product_sales = df.groupBy("product") \
                  .agg(spark_sum("totalPrice").alias("total_revenue"),
                       count("transactionID").alias("total_transactions"),
                       avg("totalPrice").alias("avg_price"),
                       spark_sum("quantity").alias("total_quantity")) \
                  .orderBy("total_revenue", ascending=False)

product_sales_pd = product_sales.toPandas()

print("=== 商品別売上ランキング ===")
display(product_sales)

# 商品別売上の可視化
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('商品別総売上高', '商品別取引件数', '商品別平均単価', '商品別総販売数量'),
    specs=[[{"type": "bar"}, {"type": "bar"}],
           [{"type": "bar"}, {"type": "bar"}]]
)

# 総売上高
fig.add_trace(
    go.Bar(x=product_sales_pd['product'], y=product_sales_pd['total_revenue'],
           name='総売上高', marker_color='steelblue'),
    row=1, col=1
)

# 取引件数
fig.add_trace(
    go.Bar(x=product_sales_pd['product'], y=product_sales_pd['total_transactions'],
           name='取引件数', marker_color='orange'),
    row=1, col=2
)

# 平均単価
fig.add_trace(
    go.Bar(x=product_sales_pd['product'], y=product_sales_pd['avg_price'],
           name='平均単価', marker_color='green'),
    row=2, col=1
)

# 総販売数量
fig.add_trace(
    go.Bar(x=product_sales_pd['product'], y=product_sales_pd['total_quantity'],
           name='総販売数量', marker_color='red'),
    row=2, col=2
)

fig.update_layout(
    height=800,
    title_text="商品別売上分析",
    showlegend=False
)

# X軸のラベルを回転
fig.update_xaxes(tickangle=45)
fig.show()

product	total_revenue	total_transactions	avg_price	total_quantity
Golden Gate Ginger	11595	586	19.786689419795223	3865
Outback Oatmeal	11199	561	19.962566844919785	3733
Austin Almond Biscotti	11148	530	21.033962264150944	3716
Tokyo Tidbits	10986	583	18.843910806174957	3662
Pearly Pies	10785	550	19.60909090909091	3595
Orchard Oasis	10758	523	20.569789674952197	3586

=== 商品別売上ランキング ===

支払い方法別分析

# 支払い方法別分析
payment_analysis = df.groupBy("paymentMethod") \
                     .agg(spark_sum("totalPrice").alias("total_revenue"),
                          count("transactionID").alias("total_transactions"),
                          avg("totalPrice").alias("avg_transaction_value")) \
                     .orderBy("total_revenue", ascending=False)

payment_analysis_pd = payment_analysis.toPandas()

print("=== 支払い方法別分析 ===")
display(payment_analysis)

# 支払い方法の可視化
fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=('支払い方法別売上高', '支払い方法別取引件数', '支払い方法別平均取引額'),
    specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "bar"}]]
)

# 売上高の円グラフ
fig.add_trace(
    go.Pie(labels=payment_analysis_pd['paymentMethod'], 
           values=payment_analysis_pd['total_revenue'],
           name="売上高"),
    row=1, col=1
)

# 取引件数の円グラフ
fig.add_trace(
    go.Pie(labels=payment_analysis_pd['paymentMethod'], 
           values=payment_analysis_pd['total_transactions'],
           name="取引件数"),
    row=1, col=2
)

# 平均取引額の棒グラフ
fig.add_trace(
    go.Bar(x=payment_analysis_pd['paymentMethod'], 
           y=payment_analysis_pd['avg_transaction_value'],
           name="平均取引額", marker_color='lightcoral'),
    row=1, col=3
)

fig.update_layout(
    height=400,
    title_text="支払い方法別分析",
    showlegend=False
)

fig.show()

=== 支払い方法別分析 ===

paymentMethod	total_revenue	total_transactions	avg_transaction_value
amex	22434	1106	20.28390596745027
mastercard	22263	1144	19.460664335664337
visa	21774	1083	20.105263157894736

数量分布と価格分析

# 数量と価格の分布分析
quantity_dist = df.groupBy("quantity").count().orderBy("quantity").toPandas()
price_dist = df.groupBy("totalPrice").count().orderBy("totalPrice").toPandas()

# 数量と価格の分布可視化
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('数量分布', '価格分布', '数量ボックスプロット', '価格ボックスプロット'),
    specs=[[{"secondary_y": False}, {"secondary_y": False}],
           [{"secondary_y": False}, {"secondary_y": False}]]
)

# 数量分布ヒストグラム
fig.add_trace(
    go.Bar(x=quantity_dist['quantity'], y=quantity_dist['count'],
           name='数量分布', marker_color='lightblue'),
    row=1, col=1
)

# 価格分布ヒストグラム（上位20位のみ表示）
top_prices = price_dist.nlargest(20, 'count')
fig.add_trace(
    go.Bar(x=top_prices['totalPrice'], y=top_prices['count'],
           name='価格分布', marker_color='lightcoral'),
    row=1, col=2
)

# 数量ボックスプロット
quantity_data = df.select("quantity").toPandas()
fig.add_trace(
    go.Box(y=quantity_data['quantity'], name='数量', marker_color='lightgreen'),
    row=2, col=1
)

# 価格ボックスプロット
price_data = df.select("totalPrice").toPandas()
fig.add_trace(
    go.Box(y=price_data['totalPrice'], name='価格', marker_color='lightyellow'),
    row=2, col=2
)

fig.update_layout(
    height=800,
    title_text="数量と価格の分布分析",
    showlegend=False
)

fig.update_xaxes(title_text="数量", row=1, col=1)
fig.update_xaxes(title_text="価格", row=1, col=2)
fig.update_yaxes(title_text="频度", row=1, col=1)
fig.update_yaxes(title_text="频度", row=1, col=2)
fig.update_yaxes(title_text="数量", row=2, col=1)
fig.update_yaxes(title_text="価格", row=2, col=2)

fig.show()

# 基本統計の表示
print("\n=== 数量と価格の基本統計 ===")
print(f"数量の範囲: {quantity_data['quantity'].min()} - {quantity_data['quantity'].max()}")
print(f"価格の範囲: {price_data['totalPrice'].min()} - {price_data['totalPrice'].max()}円")
print(f"数量の中央値: {quantity_data['quantity'].median()}")
print(f"価格の中央値: {price_data['totalPrice'].median()}円")

=== 数量と価格の基本統計 ===
数量の範囲: 1 - 60
価格の範囲: 3 - 180円
数量の中央値: 6.0
価格の中央値: 18.0円

相関分析とヒートマップ

# 相関分析用のデータ準備
import numpy as np
from pyspark.sql.functions import when

# 数値列のみを選択してPandas DataFrameに変換
corr_data = df.select("quantity", "unitPrice", "totalPrice", "customerID", "franchiseID").toPandas()

# 相関行列の計算
correlation_matrix = corr_data.corr()

print("=== 相関行列 ===")
display(correlation_matrix)

# 相関ヒートマップの作成
fig = go.Figure(data=go.Heatmap(
    z=correlation_matrix.values,
    x=correlation_matrix.columns,
    y=correlation_matrix.columns,
    colorscale='RdBu',
    zmid=0,
    text=np.round(correlation_matrix.values, 3),
    texttemplate="%{text}",
    textfont={"size": 12},
    hoverongaps=False
))

fig.update_layout(
    title="変数間の相関ヒートマップ",
    width=600,
    height=500
)

fig.show()

quantity	unitPrice	totalPrice	customerID	franchiseID
1	null	1.0000000000000038	0.00849567171417982	0.16959025262916513
null	null	null	null	null
1.0000000000000038	null	1	0.008495671714179767	0.1695902526291665
0.00849567171417982	null	0.008495671714179767	1	-0.0002185150008554756
0.16959025262916513	null	0.1695902526291665	-0.0002185150008554756	1

顧客とフランチャイズ分析

# 顧客別分析
customer_analysis = df.groupBy("customerID") \
                      .agg(spark_sum("totalPrice").alias("customer_total_spent"),
                           count("transactionID").alias("customer_transactions"),
                           avg("totalPrice").alias("customer_avg_spent")) \
                      .orderBy("customer_total_spent", ascending=False)

# フランチャイズ別分析
franchise_analysis = df.groupBy("franchiseID") \
                       .agg(spark_sum("totalPrice").alias("franchise_total_revenue"),
                            count("transactionID").alias("franchise_transactions"),
                            avg("totalPrice").alias("franchise_avg_transaction")) \
                       .orderBy("franchise_total_revenue", ascending=False)

print("=== 顧客分析（上位10名） ===")
display(customer_analysis.limit(10))

print("\n=== フランチャイズ分析 ===")
display(franchise_analysis)

# 顧客とフランチャイズの基本統計
print(f"\n=== 基本統計 ===")
print(f"ユニーク顧客数: {df.select('customerID').distinct().count():,}")
print(f"ユニークフランチャイズ数: {df.select('franchiseID').distinct().count():,}")

=== 顧客分析（上位10名） ===

|customerID|customer_total_spent|customer_transactions|customer_avg_spent|
|---|---|---|---|
|1000112|672|20|33.6|
|1000134|519|12|43.25|
|1000248|498|13|38.30769230769231|
|1000006|480|18|26.666666666666668|
|1000275|465|19|24.473684210526315|
|1000229|450|16|28.125|
|1000179|450|11|40.90909090909091|
|1000173|438|16|27.375|
|1000082|423|14|30.214285714285715|
|1000237|417|10|41.7|

=== フランチャイズ分析 ===

franchiseID	franchise_total_revenue	franchise_transactions	franchise_avg_transaction
3000046	6642	63	105.42857142857143
3000047	4512	60	75.2
3000000	2790	83	33.6144578313253
3000021	2514	64	39.28125
3000002	2502	76	32.921052631578945
3000033	1482	90	16.466666666666665
3000010	1479	82	18.036585365853657
3000011	1404	86	16.325581395348838
3000045	1392	82	16.975609756097562
3000030	1371	77	17.805194805194805
3000032	1368	83	16.481927710843372
3000017	1356	76	17.842105263157894
3000027	1353	77	17.571428571428573
3000042	1350	77	17.532467532467532
3000043	1338	78	17.153846153846153
3000036	1329	69	19.26086956521739
3000004	1281	77	16.636363636363637
3000006	1275	84	15.178571428571429
3000005	1272	74	17.18918918918919
3000040	1239	70	17.7
3000044	1239	75	16.52
3000018	1212	77	15.74025974025974
3000007	1212	74	16.37837837837838
3000028	1188	64	18.5625
3000041	1179	70	16.84285714285714
3000015	1176	74	15.891891891891891
3000023	1164	67	17.37313432835821
3000037	1161	67	17.328358208955223
3000034	1155	76	15.197368421052632
3000016	1131	74	15.283783783783784
3000009	1122	67	16.746268656716417
3000029	1104	64	17.25
3000008	1101	69	15.956521739130435
3000019	1083	63	17.19047619047619
3000031	1083	65	16.661538461538463
3000025	1020	60	17
3000003	1008	63	16
3000024	999	59	16.93220338983051
3000022	951	58	16.396551724137932
3000038	942	54	17.444444444444443
3000026	927	59	15.711864406779661
3000035	915	54	16.944444444444443
3000001	879	51	17.235294117647058
3000012	840	48	17.5
3000013	834	54	15.444444444444445
3000014	225	75	3
3000039	195	65	3
3000020	177	59	3

=== 基本統計 ===
ユニーク顧客数: 300
ユニークフランチャイズ数: 48

EDA結果のまとめと洞察

print("""=== Bakehouse Sales Transactions EDA 結果まとめ ===

📊 データセット概要:
- 総レコード数: 3,333件
- 期間: 2024年5月1日～17日（17日間）
- 商品数: 6種類
- 支払い方法: 3種類（Mastercard, Amex, Visa）
- 欠損値: なし（データ品質良好）

💰 売上分析:
- 総売上高: 66,471円
- 平均取引額: 19.94円
- 日平均取引件数: 196件

🍰 商品分析:
1. Golden Gate Ginger: 11,595円 (586件)
2. Outback Oatmeal: 11,199円 (561件)
3. Austin Almond Biscotti: 11,148円 (530件)
4. Tokyo Tidbits: 10,986円 (583件)
5. Pearly Pies: 10,785円 (550件)
6. Orchard Oasis: 10,758円 (523件)

💳 支払い方法分析:
- Amex: 22,434円 (1,106件) - 平均取引額最高
- Mastercard: 22,263円 (1,144件) - 取引件数最多
- Visa: 21,774円 (1,083件)

🕢 時間帯分析:
- ピーク時間帯や曜日のパターンが確認できました

📈 数量・価格分析:
- 数量範囲: 1-60個 (中央値: 6個)
- 価格範囲: 3-180円 (中央値: 18円)
- 単価は一定 (3円) - 総額 = 数量 × 3

🔍 主要な洞察:
1. 全商品がほぼ同等の売上を上げており、バランスの取れた商品ラインナップ
2. 支払い方法はほぼ均等に分散している
3. 単価が固定であるため、売上は数量に比例
4. データ品質が高く、分析に適したデータセット

🚀 推奨される次のステップ:
- 時系列予測モデルの構築
- 顧客セグメンテーション分析
- 商品推薦システムの開発
- A/Bテストの設計
""")

=== Bakehouse Sales Transactions EDA 結果まとめ ===

📊 データセット概要:
- 総レコード数: 3,333件
- 期間: 2024年5月1日～17日（17日間）
- 商品数: 6種類
- 支払い方法: 3種類（Mastercard, Amex, Visa）
- 欠損値: なし（データ品質良好）

💰 売上分析:
- 総売上高: 66,471円
- 平均取引額: 19.94円
- 日平均取引件数: 196件

🍰 商品分析:
1. Golden Gate Ginger: 11,595円 (586件)
2. Outback Oatmeal: 11,199円 (561件)
3. Austin Almond Biscotti: 11,148円 (530件)
4. Tokyo Tidbits: 10,986円 (583件)
5. Pearly Pies: 10,785円 (550件)
6. Orchard Oasis: 10,758円 (523件)

💳 支払い方法分析:
- Amex: 22,434円 (1,106件) - 平均取引額最高
- Mastercard: 22,263円 (1,144件) - 取引件数最多
- Visa: 21,774円 (1,083件)

🕢 時間帯分析:
- ピーク時間帯や曜日のパターンが確認できました

📈 数量・価格分析:
- 数量範囲: 1-60個 (中央値: 6個)
- 価格範囲: 3-180円 (中央値: 18円)
- 単価は一定 (3円) - 総額 = 数量 × 3

🔍 主要な洞察:
1. 全商品がほぼ同等の売上を上げており、バランスの取れた商品ラインナップ
2. 支払い方法はほぼ均等に分散している
3. 単価が固定であるため、売上は数量に比例
4. データ品質が高く、分析に適したデータセット

🚀 推奨される次のステップ:
- 時系列予測モデルの構築
- 顧客セグメンテーション分析
- 商品推薦システムの開発
- A/Bテストの設計

エージェントの画面にも結果が表示されます。

いやー、本当にAIにデータ分析してもらう時代になったんですね。ただ、ここからアクションに繋がる洞察を得るのは(今のところ)人間ですが。

はじめてのDatabricks

Databricks無料トライアル

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up