0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 3 years have passed since last update.

[OCI]Autonomous Databaseのサンプルデータセットから1000万件のJSONドキュメントを生成してmongoimportでAutonomous Databaseにインポートしてみた

Last updated at Posted at 2022-02-18

はじめに

こちらのドキュメントに記載があるようにAutonomous Databaseには、サンプルデータセットが用意されています。

今回は、サンプルデータセットのStar Schema Benchmark (SSB)スキーマのデータを使用して、Oracle Database API for MongoDBのパフォーマンス検証用の1000万件のJSONドキュメントを作成してみます。

1.サンプルデータセットの確認

SQL*Plusでadminユーザとして、Autonomous JSON Databaseに接続します。

[opc@work ~]$ sqlplus admin/Demo#1Demo#1@ajd1_medium

SQL*Plus: Release 19.0.0.0.0 - Production on 金 2月 18 08:45:58 2022
Version 19.11.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

最終正常ログイン時間: 金 2月  18 2022 08:12:52 +09:00


Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.14.0.1.0
に接続されました。
SQL> 

SSBスキーマのテーブル一覧と各テーブルの行数を確認します。

SQL> SELECT table_name,num_rows
  2  FROM dba_tables
  3  WHERE owner='SSB';

TABLE_NAME	         NUM_ROWS
-------------------- ----------
CUSTOMER	           30000000
DWDATE			           2556
SUPPLIER		        2000000
PART			        2000000
LINEORDER	         5999989709

SQL> 

今回は3000万レコードあるCUSTOMERテーブルから1000万件のJSONドキュメントを作成したいと思います。
CUSTOMERテーブルの構造を確認します。

SQL> desc ssb.customer
 名前                                       NULL?    
 ----------------------------------------- -------- ----------------------------
 C_CUSTKEY				                   NOT NULL NUMBER
 C_NAME 					                        VARCHAR2(25)
 C_ADDRESS					                        VARCHAR2(25)
 C_CITY 					                        CHAR(10)
 C_NATION					                        CHAR(15)
 C_REGION					                        CHAR(12)
 C_PHONE					                        CHAR(15)
 C_MKTSEGMENT					                    CHAR(10)

SQL> 

CUSTOMERテーブルのレコード数を確認します。

SQL> SELECT COUNT(c_custkey) FROM ssb.customer;               

COUNT(C_CUSTKEY)
----------------
	30000000

SQL>

CUSTOMERテーブルの内容を確認します。

SQL> set linesize 200
SQL> SELECT * FROM ssb.customer
  2  WHERE rownum < 6;

 C_CUSTKEY C_NAME		             C_ADDRESS		           C_CITY	  C_NATION	      C_REGION     C_PHONE	       C_MKTSEGME
---------- ------------------------- ------------------------- ---------- --------------- ------------ --------------- ----------
  17635532 Customer#017635532	     vZYASXLARjB8pS	           UNITED ST8 UNITED STATES   AMERICA      34-113-411-8772 MACHINERY
  17635533 Customer#017635533	     Q2Bv2IPN,		           ETHIOPIA 8 ETHIOPIA	      AFRICA       15-922-544-9678 BUILDING
  17635534 Customer#017635534	     hvHOx1N9DeW5LwiWJzQm7     ETHIOPIA 4 ETHIOPIA  	  AFRICA       15-455-270-1411 MACHINERY
  17635535 Customer#017635535	     wJmDOr		               CHINA	9 CHINA    	      ASIA	       28-756-199-5197 AUTOMOBILE
  17635536 Customer#017635536	     Nq1YMYBx		           UNITED KI8 UNITED KINGDOM  EUROPE       33-241-398-9805 FURNITURE

SQL> 

2.JSONドキュメントの生成

JSON_OBJECTファンクションを使用して、CUSTOMERテーブルの各レコードをJSONドキュメントとして抽出してみます。

SQL> SELECT 
  2  JSON_OBJECT(
  3  	KEY	'custkey'	VALUE	c_custkey,
  4  	KEY	'name'		VALUE	c_name,
  5  	KEY	'address'	VALUE	c_address,
  6  	KEY	'city'		VALUE	RTRIM(c_city),
  7  	KEY	'nation'	VALUE	RTRIM(c_nation),
  8  	KEY	'region'	VALUE	RTRIM(c_region),
  9  	KEY	'phone'		VALUE	RTRIM(c_phone),
 10  	KEY 'segment'	VALUE	RTRIM(c_mktsegment)
 11  ) customer_object
 12  FROM ssb.customer
 13  WHERE rownum < 6;

CUSTOMER_OBJECT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"custkey":17635532,"name":"Customer#017635532","address":"vZYASXLARjB8pS ","city":"UNITED ST8","nation":"UNITED STATES","region":"AMERICA","phone":"34-113-411-8772","segment":"MACHINERY"}
{"custkey":17635533,"name":"Customer#017635533","address":"Q2Bv2IPN,","city":"ETHIOPIA 8","nation":"ETHIOPIA","region":"AFRICA","phone":"15-922-544-9678","segment":"BUILDING"}
{"custkey":17635534,"name":"Customer#017635534","address":"hvHOx1N9DeW5LwiWJzQm7","city":"ETHIOPIA 4","nation":"ETHIOPIA","region":"AFRICA","phone":"15-455-270-1411","segment":"MACHINERY"}
{"custkey":17635535,"name":"Customer#017635535","address":"wJmDOr","city":"CHINA    9","nation":"CHINA","region":"ASIA","phone":"28-756-199-5197","segment":"AUTOMOBILE"}
{"custkey":17635536,"name":"Customer#017635536","address":"Nq1YMYBx","city":"UNITED KI8","nation":"UNITED KINGDOM","region":"EUROPE","phone":"33-241-398-9805","segment":"FURNITURE"}

SQL>

JSON_OBJECTファンクションを使用して、CUSTOMERテーブルの各レコードをJSONドキュメントとして抽出できることがわかりました。

CUSTOMERテーブルの各レコードをJSONドキュメント化したものを、UTL_FILEプロシージャを使用してAutonomous DatabaseのDATA_PUMP_DIRにファイル"customer_object.json"として書き出すPL/SQLブロックを作成し、実行します。

SQL> set timing on
SQL> DECLARE
  2  	v_filedir VARCHAR2(50) := 'DATA_PUMP_DIR';
  3  	v_filename VARCHAR2(50) := 'customer_object.json';
  4  	v_filehandle UTL_FILE.FILE_TYPE;
  5  
  6  	CURSOR customer_object_cur IS
  7  		SELECT json_object(
  8  			KEY	'custkey'	VALUE	c_custkey,
  9  			KEY	'name'		VALUE	c_name,
 10  			KEY	'address'	VALUE	c_address,
 11  			KEY	'city'		VALUE	RTRIM(c_city),
 12  			KEY	'nation'	VALUE	RTRIM(c_nation),
 13  			KEY	'region'	VALUE	RTRIM(c_region),
 14  			KEY	'phone'		VALUE	RTRIM(c_phone),
 15  			KEY 'segment'	VALUE	RTRIM(c_mktsegment)
 16  			) customer_object
 17  		FROM ssb.customer
 18  		WHERE rownum < 10000001;
 19  
 20  BEGIN
 21  	v_filehandle := UTL_FILE.FOPEN(v_filedir,v_filename,'w');
 22  	FOR customer_object_record IN customer_object_cur LOOP
 23  		UTL_FILE.PUT_LINE(v_filehandle,customer_object_record.customer_object);
 24  	END LOOP;
 25  	UTL_FILE.FCLOSE(v_filehandle);
 26  END;
 27  /

PL/SQLプロシージャが正常に完了しました。

経過: 00:26:38.70
SQL>

DBMS_CLOUD.PUT_OBJECTプロシージャを使用して、DATA_PUMP_DIRに作成されたファイル"customer_object.json"をオブジェクト・ストレージ・バケットにコピーします。

SQL> BEGIN
  2  	DBMS_CLOUD.PUT_OBJECT (
  3  		credential_name	=> 'OCI$RESOURCE_PRINCIPAL',
  4  		object_uri		=> 'https://objectstorage.ap-tokyo-1.oraclecloud.com/n/_namespace_/b/_bucketname_/o/customer_object.json',
  5  		directory_name	=> 'DATA_PUMP_DIR',
  6  		file_name		=> 'customer_object.json'
  7  	);
  8  END;
  9  /

PL/SQLプロシージャが正常に完了しました。

経過: 00:00:44.72
SQL>

OCI CLIを使用して、オブジェクト・ストレージ・バケットからcustomer_object.jsonをComputeインスタンス上にダウンロードします。

[opc@work dump]$ time oci os object get -bn _bucketname_ --name customer_object.json --file customer_object.json
Downloading object  [####################################]  100%          

real	0m17.290s
user	0m4.781s
sys	0m3.144s
[opc@work dump]$

ファイルcustomer_object.jsonのサイズを確認します。

[opc@work dump]$ ls -l
合計 1782352
-rw-rw-r--. 1 opc opc 1825124829  2月 17 18:08 customer_object.json
[opc@work dump]$

サイズが約1.8GBであることがわかりました。

ファイルcustomer_object.jsonの行数を確認します。

[opc@work dump]$ wc -l customer_object.json 
10000000 customer_object.json
[opc@work dump]$ 

行数が1000万行であることがわかりました。

ファイルcustomer_object.jsonの内容を確認します。

[opc@work dump]$ head customer_object.json 
{"custkey":14523948,"name":"Customer#014523948","address":"enR9e VbzXa02e","city":"CHINA    4","nation":"CHINA","region":"ASIA","phone":"28-283-959-5402","segment":"MACHINERY"}
{"custkey":14523949,"name":"Customer#014523949","address":"ldY9gfFUDBIr5Z","city":"GERMANY  1","nation":"GERMANY","region":"EUROPE","phone":"17-714-560-4125","segment":"AUTOMOBILE"}
{"custkey":14523950,"name":"Customer#014523950","address":"LlKmD23VEG5vQaa","city":"UNITED KI9","nation":"UNITED KINGDOM","region":"EUROPE","phone":"33-201-707-1834","segment":"BUILDING"}
{"custkey":14523951,"name":"Customer#014523951","address":"bjUAS4","city":"ARGENTINA6","nation":"ARGENTINA","region":"AMERICA","phone":"11-947-201-7378","segment":"MACHINERY"}
{"custkey":14523952,"name":"Customer#014523952","address":"6gPVwvuMK","city":"EGYPT    3","nation":"EGYPT","region":"MIDDLE EAST","phone":"14-958-713-5993","segment":"AUTOMOBILE"}
{"custkey":14523953,"name":"Customer#014523953","address":"NJ1BBYuC","city":"PERU     3","nation":"PERU","region":"AMERICA","phone":"27-858-598-1805","segment":"HOUSEHOLD"}
{"custkey":14523954,"name":"Customer#014523954","address":"oG0zywN,6oYyCploLewVEB33","city":"SAUDI ARA1","nation":"SAUDI ARABIA","region":"MIDDLE EAST","phone":"30-851-573-2054","segment":"MACHINERY"}
{"custkey":14523955,"name":"Customer#014523955","address":"Kx39O Bh","city":"IRAQ     0","nation":"IRAQ","region":"MIDDLE EAST","phone":"21-586-616-9338","segment":"FURNITURE"}
{"custkey":14523956,"name":"Customer#014523956","address":"tmqX6VCZGyPV abB4hA6F ","city":"CHINA    9","nation":"CHINA","region":"ASIA","phone":"28-329-169-1291","segment":"MACHINERY"}
{"custkey":14523957,"name":"Customer#014523957","address":"IubpA88IlSIxPY  ,","city":"RUSSIA   6","nation":"RUSSIA","region":"EUROPE","phone":"32-191-872-3532","segment":"BUILDING"}
[opc@work dump]$ 
[opc@work dump]$ tail customer_object.json
{"custkey":5019247,"name":"Customer#005019247","address":"tFYGoDrqe1T","city":"MOZAMBIQU9","nation":"MOZAMBIQUE","region":"AFRICA","phone":"26-495-236-3327","segment":"BUILDING"}
{"custkey":5019248,"name":"Customer#005019248","address":"gZFrX2r5N6sW,","city":"ETHIOPIA 6","nation":"ETHIOPIA","region":"AFRICA","phone":"15-660-496-8108","segment":"BUILDING"}
{"custkey":5019249,"name":"Customer#005019249","address":"TyyqY8uRA","city":"SAUDI ARA8","nation":"SAUDI ARABIA","region":"MIDDLE EAST","phone":"30-476-103-1877","segment":"MACHINERY"}
{"custkey":5019250,"name":"Customer#005019250","address":"V6CDL7d8","city":"CHINA    0","nation":"CHINA","region":"ASIA","phone":"28-981-601-6054","segment":"BUILDING"}
{"custkey":5019251,"name":"Customer#005019251","address":"Lh7,tMh7gEvKbC6L6","city":"IRAN     6","nation":"IRAN","region":"MIDDLE EAST","phone":"20-557-423-3568","segment":"AUTOMOBILE"}
{"custkey":5019252,"name":"Customer#005019252","address":"y,p8ICpZxVR8U9LH,2XQMIBR","city":"PERU     4","nation":"PERU","region":"AMERICA","phone":"27-767-937-1287","segment":"HOUSEHOLD"}
{"custkey":5019253,"name":"Customer#005019253","address":"OePOQuxrpV55P","city":"PERU     1","nation":"PERU","region":"AMERICA","phone":"27-698-594-3714","segment":"MACHINERY"}
{"custkey":5019254,"name":"Customer#005019254","address":"J6M8HA","city":"SAUDI ARA2","nation":"SAUDI ARABIA","region":"MIDDLE EAST","phone":"30-616-468-4485","segment":"BUILDING"}
{"custkey":5019255,"name":"Customer#005019255","address":"aQv nL83zMUqLcGFaE,yH","city":"MOROCCO  7","nation":"MOROCCO","region":"AFRICA","phone":"25-361-463-9295","segment":"MACHINERY"}
{"custkey":5019256,"name":"Customer#005019256","address":"8o5sYWq48qzMV","city":"IRAN     2","nation":"IRAN","region":"MIDDLE EAST","phone":"20-382-211-8690","segment":"FURNITURE"}
[opc@work dump]$

想定通り、CUSTOMERテーブルの各レコードがJSONドキュメント化されてファイルcustomer_object.jsonに出力されていることがわかりました。

3.mongoimportを用いたAutonomous JSON DatabaseへのJSONデータのインポート

データの準備ができたので、mongoimportコマンドを使用して、Oracle Database API for MongoDB経由でAutonomous Databaseにcustomer_object.jsonをインポートしてみます。
※mongoimport 100.5.1には不具合があるため、正常にロードができませんでした。
 mongoimportを100.5.2にアップデートすることで正常に動作しました。

[opc@work ~]$ mongoimport --uri='mongodb://mongo:Demo%231Demo%231@XXXXXXXXXXXXXXX-AJD1.adb.ap-tokyo-1.oraclecloudapps.com:27016/mongo?authMechanism=PLAIN&authSource=$external&ssl=true&retryWrites=false' --collection=customer --type=json --file=/home/opc/dump/customer_object.json
2022-02-18T07:40:56.324+0900	connected to: mongodb://[**REDACTED**]@XXXXXXXXXXXXXXX-AJD1.adb.ap-tokyo-1.oraclecloudapps.com:27016/mongo?authMechanism=PLAIN&authSource=$external&ssl=true&retryWrites=false
2022-02-18T07:40:59.324+0900	[........................] mongo.customer	4.88MB/1.70GB (0.3%)
2022-02-18T07:41:02.325+0900	[........................] mongo.customer	10.6MB/1.70GB (0.6%)
2022-02-18T07:41:05.324+0900	[........................] mongo.customer	17.3MB/1.70GB (1.0%)
2022-02-18T07:41:08.325+0900	[........................] mongo.customer	22.3MB/1.70GB (1.3%)
2022-02-18T07:41:11.325+0900	[........................] mongo.customer	28.8MB/1.70GB (1.7%)
<中略>
2022-02-18T07:54:44.324+0900	[#######################.] mongo.customer	1.68GB/1.70GB (98.6%)
2022-02-18T07:54:47.324+0900	[#######################.] mongo.customer	1.68GB/1.70GB (98.9%)
2022-02-18T07:54:50.324+0900	[#######################.] mongo.customer	1.69GB/1.70GB (99.3%)
2022-02-18T07:54:53.324+0900	[#######################.] mongo.customer	1.69GB/1.70GB (99.7%)
2022-02-18T07:54:56.167+0900	[########################] mongo.customer	1.70GB/1.70GB (100.0%)
2022-02-18T07:54:56.167+0900	10000000 document(s) imported successfully. 0 document(s) failed to import.
[opc@work ~]$ 

1000万件のJSONドキュメント(約1.8GB)がおよそ14分でインポートできました。

4.インポートしたデータの確認

mongoshからAutonomous JSON Databaseに接続します。

[opc@work ~]$ mongosh 'mongodb://mongo:Demo%231Demo%231@XXXXXXXXXXXXXXX-AJD1.adb.ap-tokyo-1.oraclecloudapps.com:27017/mongo?authMechanism=PLAIN&authSource=$external&ssl=true&retryWrites=false&loadBalanced=true'
Current Mongosh Log ID:	620ed12187672a2e206e7db3
Connecting to:		mongodb://<credentials>@XXXXXXXXXXXXXXX-AJD1.adb.ap-tokyo-1.oraclecloudapps.com:27017/mongo?authMechanism=PLAIN&authSource=$external&ssl=true&retryWrites=false&loadBalanced=true
Using MongoDB:		4.2.14
Using Mongosh:		1.1.8

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

mongo>

コレクションcustomerのドキュメント件数を確認してみます。

mongo> db.customer.find().count()
10000000
mongo>

1000万件のドキュメントがあることが確認できました。
ファイル"customer_object.json"内の全てのドキュメントが問題なくインポートできたことが確認できました。

find()メソッドを使用して、custkeyが14523948のドキュメントを抽出してみます。

mongo> db.customer.find({"custkey":14523948})
[
  {
    _id: ObjectId("620ecef8fb140ef6065e37f6"),
    custkey: 14523948,
    name: 'Customer#014523948',
    address: 'enR9e VbzXa02e',
    city: 'CHINA    4',
    nation: 'CHINA',
    region: 'ASIA',
    phone: '28-283-959-5402',
    segment: 'MACHINERY'
  }
]
mongo>

explain()メソッドを使用して、実行計画を確認してみます。

mongo> db.customer.find({"custkey":14523948}).explain()
{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'mongo.customer',
    indexFilterSet: false,
    parsedQuery: { custkey: { '$numberOnly': 14523948 } },
    winningPlan: {
      stage: 'SELECT STATEMENT',
      inputStage: {
        stage: 'TABLE ACCESS',
        options: 'STORAGE FULL',
        source: 'CUSTOMER',
        columns: '"ID"[VARCHAR2,255], "CREATED_ON"[TIMESTAMP,11], "LAST_MODIFIED"[TIMESTAMP,11], "VERSION"[VARCHAR2,255], "DATA"[LOB,4000]'
      }
    },
    rejectPlans: []
  },
  serverInfo: { host: 'localhost', port: 27017, version: '4.2.14' },
  ok: 1
}
mongo>

「TABLE ACCESS STORAGE FULL」となっているので、Autonomous Databaseの基盤となっているExadata固有の高速化処理「Smart Scan」が使用されていることがわかりました。

参考情報

[OCI]MongoDBクライアントからAutonomous JSON Databaseに接続してみた(MongoDB Shell編)
[OCI]MongoDBクライアントからAutonomous JSON Databaseに接続してみた(mongoexport/mongoimport編)
JSON_OBJECTファンクション
UTIL_FILEパッケージ
DBMS_CLOUD.PUT_OBJECTプロシージャ
OCI CLI
oci os object getコマンド
mongoimportコマンド

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?