Saving your posted articles together with their image files using Python and the Qiita API

Last updated at 2025-03-23, posted at 2025-03-23

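Introduction

To get the data of a Qiita article, you can choose the "View body in Markdown" option and obtain it in Markdown format. However, the images in an article are stored on Amazon S3, and only their paths are written in the Markdown. Fetching the Markdown alone therefore does not save the images, and an extra step is needed to retrieve them.
This article summarizes how to use the Qiita API and Python to save the articles you have posted, images included.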

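Library	Purpose
requests	Accesses the Qiita API over HTTP.
json	Parses the data fetched from the Qiita API so it can be held temporarily as JSON.
re	Processes the fetched data, for example to build file names.
os	Handles folder operations when saving files.
tqdm	Shows processing progress. Optional.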

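Set user to the author whose articles you want to fetch. Since the goal here is to fetch your own articles, set it to your own user name. Set token to the access token obtained from Qiita.
The data fetched from the Qiita API is held temporarily in POST_jsonlist in JSON form. sub_post_download() then saves the articles from POST_jsonlist one at a time. Each article is saved in Markdown format, with the article title used as both the folder name and the file name. Images in the article are saved by sub_image() into the image subfolder.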

import requests
import json
import re
import os
from tqdm import tqdm

#---------------------------
# Save the article in post as a .md file and its images in the image folder.
# post: one article as a dict
#---------------------------
def sub_post_download(post):
	imgcnt =0
	dir =''
	
	#----------------------
	# Save one image (used as the re.sub callback)
	#----------------------
	def sub_image(match):
		nonlocal imgcnt,dir
		imgcnt+=1
		
		# Create the image folder (dir/image); exist_ok=True makes repeated calls harmless
		os.makedirs(os.path.join(dir, 'image'), exist_ok=True)
	
		# File name (dir/image/file)
		file =  f'{imgcnt:02d}_{match.group(1)}'
		file_path = os.path.join(dir, 'image', file)
		
		# Download the image and write it to disk
		with open(file_path, "wb") as f:
			img_data = requests.get(match.group(2)).content
			f.write(img_data)
		
		# Return the Markdown image link rewritten to the local path
		return f'![{file}](image\\{file})'
	
	# Read the fields of the article
	title = post['title']
	url   = post['url']
	tags  = ','.join(t['name'] for t in post['tags'])
	body  = post['body']
	
	# Create the destination folder: sanitize the title first so it is usable as a file name
	dir = re.sub(r'[\\/*?:"<>|]', '_', title)
	os.makedirs(dir, exist_ok=True)
	
	# Save the image files and rewrite the links in body to the new location
	body = re.sub(r'!\[(.*?)\]\((.*?)\)',sub_image, body)
	
	# Build the text to save
	md   = f'---\ntitle : {title}\nurl   : {url}\ntags  : {tags}\n---\n{body}'
	file_path = os.path.join(dir, f"{dir}.md")
	# Save the Markdown file
	with open(file_path, "w", encoding="utf-8") as f:
		f.write(md)

#-----------------------
# main() processing
#-----------------------
def main(user,token):
	# Look up how many articles the user has
	qiita_res = requests.get('https://qiita.com/api/v2/users/'+user)
	print(qiita_res)
	user_info = json.loads(qiita_res.text)["items_count"]
	
	# Fetch the articles 100 at a time
	POST_jsonlist=[]
	for i in tqdm(range((user_info+100-1)//100)):
		if token !='':
			qiita_res = requests.get('https://qiita.com/api/v2/users/'+user+'/items', params = {"page":f'{(i+1):d}', "per_page":"100"}, headers= { "Authorization": "Bearer "+token})
		else:
			qiita_res = requests.get('https://qiita.com/api/v2/users/'+user+'/items', params = {"page":f'{(i+1):d}', "per_page":"100"})
		
		# Append this page of parsed JSON to the list
		POST_jsonlist.append(json.loads(qiita_res.text))
	
	# Save each article
	for cnt in range(len(POST_jsonlist)):
		for POST in POST_jsonlist[cnt]:
			sub_post_download(POST)

if __name__ == "__main__":
	user = input('User name: ')
	token = input('Access token: ')
	main(user,token)
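
main() fetches the articles in pages of 100 and sends the token only when one was entered. As a minimal sketch of those two pieces in isolation, the hypothetical helpers page_count() and build_headers() below (names not in the original script) show the ceiling division and the optional Authorization header without making any network request.

```python
def page_count(items_count, per_page=100):
    """Ceiling division: number of pages needed to cover items_count items."""
    return (items_count + per_page - 1) // per_page


def build_headers(token):
    """Return an Authorization header only when a token was given."""
    return {"Authorization": "Bearer " + token} if token else {}


# Show how the page count grows with the number of articles.
for n in (0, 1, 100, 101, 250):
    print(n, page_count(n))
```

With 0 articles the loop in main() would not run at all, and 101 articles need a second request for the single remaining item.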


Key points of the code

Data fetched from the Qiita API

title = post['title'] # title
url   = post['url'] # URL of the article
tags  = ','.join(t['name'] for t in post['tags']) # comma-separated list of tags
body  = post['body'] # Markdown body of the article
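
To make the shape of these fields concrete, here is a small sketch that reads them from a hand-written dictionary. The sample values and the helper name extract_fields() are hypothetical and only mimic the keys used above; they are not real API output.

```python
# Hypothetical sample shaped like one article from the API; all values are made up.
sample_post = {
    "title": "Sample article",
    "url": "https://qiita.com/example_user/items/0123456789abcdef0123",
    "tags": [{"name": "Python"}, {"name": "QiitaAPI"}],
    "body": "# Heading\n\nSome text.",
}


def extract_fields(post):
    """Return (title, url, tags, body) with the tag names joined by commas."""
    title = post["title"]
    url = post["url"]
    tags = ",".join(t["name"] for t in post["tags"])
    body = post["body"]
    return title, url, tags, body


title, url, tags, body = extract_fields(sample_post)
print(tags)  # Python,QiitaAPI
```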

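How to get the data from Amazon S3

In the Markdown body, each image is written with its Amazon S3 path as shown below, so a regular expression is used to extract the file name to save under and the path.

Image data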

![aaos.gif](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/1299802/9f2fe24f-2d8a-d53c-a859-8f25b1ae5149.gif)

Regular expression

body = re.sub(r'!\[(.*?)\]\((.*?)\)',sub_image, body)
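
As a quick check of what the two capture groups contain, the sketch below applies the same pattern to the sample image line with re.search instead of re.sub. The variable names are illustrative only.

```python
import re

# Same pattern as in the article: group 1 is the alt text, group 2 is the URL.
IMAGE_PATTERN = re.compile(r'!\[(.*?)\]\((.*?)\)')

line = ("![aaos.gif](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com"
        "/0/1299802/9f2fe24f-2d8a-d53c-a859-8f25b1ae5149.gif)")

match = IMAGE_PATTERN.search(line)
alt_text = match.group(1)   # used as the base of the saved file name
image_url = match.group(2)  # passed to requests.get() in the script
print(alt_text)
print(image_url)
```

Note that the pattern only matches the `![alt](url)` form; an image written as an HTML img tag would be left untouched.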

Points to note when saving

The article title may contain characters that cannot be used in folder names, so they are replaced with _ as shown below.

dir = re.sub(r'[\\/*?:"<>|]', '_', title)
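
For illustration, the same substitution can be wrapped in a small function and tried on a made-up title. The function name sanitize_title() and the sample title are assumptions for this sketch.

```python
import re


def sanitize_title(title):
    """Replace characters that are not allowed in Windows file names with '_'."""
    return re.sub(r'[\\/*?:"<>|]', '_', title)


print(sanitize_title('Python/Qiita API: "backup"?'))
```

Each forbidden character becomes exactly one underscore, so the slash, the colon, both double quotes, and the question mark in the sample are all replaced while the spaces are kept.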

Because the images are saved in the image folder, the image paths inside the article are rewritten to match.

return f'![{file}](image\\{file})'
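
The link rewriting can be tried without downloading anything. The sketch below keeps only the numbering and the replacement string from sub_image(); the function name rewrite_image_links() and the example.com URLs are hypothetical.

```python
import re


def rewrite_image_links(body):
    """Renumber image links and point them at the local image folder."""
    count = 0

    def replace(match):
        nonlocal count
        count += 1
        file = f'{count:02d}_{match.group(1)}'
        return f'![{file}](image\\{file})'

    return re.sub(r'!\[(.*?)\]\((.*?)\)', replace, body)


sample_body = ("Intro\n"
               "![a.png](https://example.com/x/a.png)\n"
               "![b.gif](https://example.com/y/b.gif)")
rewritten = rewrite_image_links(sample_body)
print(rewritten)
```

The numeric prefix keeps file names unique even when two images share the same alt text. The backslash separator follows the original script and suits Windows; a viewer on another OS may expect a forward slash instead.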

That's all.