0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

pandas: PostgreSQL のテーブルの重複を削除

Last updated at Posted at 2024-09-16

Primary Key を設定しないテーブルでは、データが重複する可能性があります。
その重複を削除するプログラムです。

実行前

  id   |  name  | population |  date_mod  
-------+--------+------------+------------
 t3461 | 広島   |      72814 | 2001-09-14
 t3461 | 広島   |      72814 | 2001-09-14
 t3462 | 福山   |      41738 | 2001-07-21
 t3462 | 福山   |      41738 | 2001-07-21
 t3463 | 東広島 |      92513 | 2001-06-12
 t3463 | 東広島 |      92513 | 2001-06-12
 t3464 | 呉     |      93167 | 2001-09-29
 t3464 | 呉     |      93167 | 2001-09-29
(8 rows)

実行後

  id   |  name  | population |  date_mod  
-------+--------+------------+------------
 t3461 | 広島   |      72814 | 2001-09-14
 t3462 | 福山   |      41738 | 2001-07-21
 t3463 | 東広島 |      92513 | 2001-06-12
 t3464 | 呉     |      93167 | 2001-09-29
(4 rows)

プログラム

db_duplicate_omit.py
#! /usr/bin/python
#
#	postgresql/pandas/db_duplicate_omit.py
#
#						Sep/16/2024
# ------------------------------------------------------------------
import pandas as pd
import sys
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine
from sqlalchemy import text as q_text
#
# ------------------------------------------------------------------
def single_proc(engine,schema,table_src):
	sys.stderr.write("*** single_proc *** start\n")
	sql_str = 'SELECT * FROM ' + schema + "." + table_src
	print(sql_str)
	q = q_text(sql_str)
	try:
#
		df=pd.read_sql(sql=q, con=engine.connect())
		print(df)
		print(df.shape)
		dg = df.drop_duplicates()
		print(dg.shape)
#
		dg.to_sql(name=table_src,con=engine,schema=schema,if_exists='replace', index=False)
#
	except Exception as ee:
		sys.stderr.write("*** error *** in read_sql_query ***\n")
		sys.stderr.write(str(ee) + "\n")
# ------------------------------------------------------------------
sys.stderr.write("*** 開始 ***\n")

dotenv_path = '.env'
load_dotenv(dotenv_path)
user=os.environ.get("user")
password=os.environ.get("password")
data_base=os.environ.get("data_base")

host ='localhost'

SQLALCHEMY_DATABASE_URL = "postgresql://" + user + ":" + password + "@" + host + "/" + data_base
engine = create_engine(SQLALCHEMY_DATABASE_URL)
#
#
schema="public"
table_src = "cities"
#
single_proc(engine,schema,table_src)
#
sys.stderr.write("*** 終了 ***\n")
# ------------------------------------------------------------------
.env
user='scott'
password='tiger123'
data_base='city'

SQL

テーブルの作成

postgre_create.sql
drop table if exists cities;
create table cities (id varchar(10), name text, population int, date_mod date);

データの挿入

postgre_insert.sql
insert into cities values ('t3461','広島',72814,'2001-9-14');
insert into cities values ('t3462','福山',41738,'2001-7-21');
insert into cities values ('t3463','東広島',92513,'2001-6-12');
insert into cities values ('t3464','呉',93167,'2001-9-29');

データの読み込み

postgre_read.sql
select * from cities order by id;
0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?