
Removing tags from a string and extracting the elements

Python

Posted at 2017-04-04

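def remove_select_tags(string, start_tag, end_tag):
    # Replace every start_tag ... end_tag span with a space, then split on whitespace.
    start = string.find(start_tag)
    while start != -1:
        end = string.find(end_tag, start + len(start_tag))
        if end == -1:
            break  # unmatched start_tag: stop instead of looping forever
        string = string[:start] + " " + string[end + len(end_tag):]
        start = string.find(start_tag, start)
    return string.split()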

def test_case():
    target_string = '''<h1>Title</h1><p>This is a
                        <a href="mt-takao.top">link</a>.<p>'''
    assert remove_select_tags(target_string, '<', '>') == ['Title', 'This', 'is', 'a', 'link', '.']
    target_string = "[test]アイウエオ[test][next]カキクケコ[next]"
    assert remove_select_tags(target_string, '[', ']') == ['アイウエオ', 'カキクケコ']
    print('test ok')
test_case()

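This function removes the parts of a string enclosed by the specified tags and extracts the remaining elements.
str.find returns -1 when the tag is not found, so the loop runs only while a start_tag is still present.
The part of the string before the index found for start_tag is joined, via a single space, to the part after the end_tag.
The newly built string is then searched for the next start_tag.
Finally, the result is split on whitespace and returned as a list.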
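
To make the loop easier to follow, here is a minimal sketch that records the intermediate string after each tag is replaced. The helper name trace_tag_removal is hypothetical and not part of the original code. It follows the same find-and-slice approach, and it assumes that an unmatched start_tag should simply be left in place.

```python
def trace_tag_removal(string, start_tag, end_tag):
    """Return the intermediate strings produced as each tag is replaced by a space.

    Hypothetical helper for illustration only.
    """
    steps = [string]
    start = string.find(start_tag)  # -1 when no start_tag is present
    while start != -1:
        end = string.find(end_tag, start + len(start_tag))
        if end == -1:
            break  # assumption: an unmatched start_tag is left untouched
        # Keep the text before the tag, insert a space, keep the text after the tag.
        string = string[:start] + " " + string[end + len(end_tag):]
        steps.append(string)
        start = string.find(start_tag, start)
    return steps


steps = trace_tag_removal("<b>Hi</b> there", "<", ">")
for step in steps:
    print(repr(step))
# '<b>Hi</b> there'
# ' Hi</b> there'
# ' Hi  there'
print(steps[-1].split())  # ['Hi', 'there']
```

Calling split() with no arguments collapses the runs of spaces left behind by the removed tags, which is why the final list contains no empty strings.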
