
MemoryEnhancer > MEDC > test_get_list_180130.py > Getting the list of Markdown file names (*.html.md) on GitHub | GitHub REST API v3 | JSON module | BeautifulSoup

Last updated at 2018-01-30, posted at 2018-01-30

Operating environment

Ubuntu 16.04.3 LTS desktop amd64
tmux 2.1-3build1
Python 2.7.12
Python 3.5.2

Overview

Get the list of Markdown file names under /data on GitHub.

This is for use in MEDC (short for Memory Enhancer Data Collector).

code

test_get_list_180130.py

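import requests as rq

# IN_URL = "https://github.com/yasokada/TechEnglish_170903/blob/master/data/5290.html.md"
IN_URL = "https://github.com/yasokada/TechEnglish_170903/tree/master/data"

res = rq.get(IN_URL)
# res.text is the decoded page, so splitlines() works on both Python 2 and 3
# (str(res.content) only contains a literal "\n" to split on under Python 3).
for elem in res.text.splitlines():
    if "html.md" in elem:
        # Turn the tag brackets into spaces and split the line into tokens.
        wrk = elem.replace(">", " ")
        wrk = wrk.replace("<", " ")
        wrk = wrk.split(" ")
        # In the page layout at the time, the file name was the 5th token from the end.
        print(wrk[-5])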

run

$ python3 test_get_list_180130.py  | head -n 5
10.html.md
1096.html.md
1097.html.md
1098.html.md
1099.html.md

What I learned from the comments

(Added 2018/01/30)

@SaitoAtsushi introduced the GitHub API in a comment.

Thanks for the information.
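
The following is a minimal sketch of what an API-based version might look like. It is not the code from the comment. It assumes Python 3, uses the standard library's urllib.request and json module instead of requests, and assumes the "contents" endpoint of the GitHub REST API for the /data directory. The names API_URL, markdown_names and fetch_markdown_names are made up for this illustration.

```python
import json
import urllib.request

# Assumption: the "contents" endpoint of the GitHub REST API for the /data
# directory of the repository used in this article.
API_URL = "https://api.github.com/repos/yasokada/TechEnglish_170903/contents/data"


def markdown_names(entries, suffix=".html.md"):
    """Return the names of file entries whose name ends with suffix."""
    return [
        entry["name"]
        for entry in entries
        if entry.get("type") == "file" and entry["name"].endswith(suffix)
    ]


def fetch_markdown_names(url=API_URL):
    """Fetch the directory listing as JSON and filter it (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as res:
        entries = json.loads(res.read().decode("utf-8"))
    return markdown_names(entries)
```

Calling fetch_markdown_names() and printing each element should give a list similar to the run above, without depending on the HTML layout of the page. As far as I know, this endpoint returns at most 1,000 entries per directory, so a larger directory would probably need a different endpoint such as the Git Trees API.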

@shiracamus introduced BeautifulSoup in a comment, together with concrete code.

Thanks for the information.
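
The code from that comment is not reproduced here. As a rough sketch of the idea only, the file names can be taken from the text of the link elements instead of splitting each line on spaces. The function name markdown_names_from_html and the SAMPLE_HTML fragment are hypothetical, and the real GitHub markup is different and may change.

```python
from bs4 import BeautifulSoup


def markdown_names_from_html(html, suffix=".html.md"):
    """Return the text of every <a> element whose text ends with suffix."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for anchor in soup.find_all("a"):
        text = anchor.get_text(strip=True)
        # Skip unrelated links and names that were already collected.
        if text.endswith(suffix) and text not in names:
            names.append(text)
    return names


# Hypothetical fragment standing in for the directory listing page.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/yasokada/TechEnglish_170903/blob/master/data/10.html.md">10.html.md</a></td></tr>
  <tr><td><a href="/yasokada/TechEnglish_170903/blob/master/data/1096.html.md">1096.html.md</a></td></tr>
  <tr><td><a href="/yasokada/TechEnglish_170903/commits/master">History</a></td></tr>
</table>
"""

print(markdown_names_from_html(SAMPLE_HTML))
```

With a real page, the html argument would be the res.text of the request. Matching on the link text avoids depending on a fixed token position such as wrk[-5].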
