More than 5 years have passed since last update.

Speeding Up Crawling with Multi-Core Parallel Processing in Python

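I needed to collect information from several sites and wrote a simple crawler, but it became painfully slow as the number of sites grew. Running it in parallel with multiprocessing seemed to speed it up roughly in proportion to the number of cores, so I am publishing the code here.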

from multiprocessing import Pool

import feedparser
import time

feed_urls = [
    'https://www.theverge.com/rss/index.xml',
    'https://gizmodo.com/rss',
    'https://www.cnet.com/rss/all/',
    'https://techcrunch.com/feed/',
    'https://news.ycombinator.com/rss',
    'http://feeds.arstechnica.com/arstechnica/index/',
    'http://feeds.mashable.com/Mashable',
    'https://hub.packtpub.com/feed/'
]

def function(url, keyword):
    # Fetch one feed, print the entries that mention the keyword, and return the hit count.
    count = 0
    feed_result = feedparser.parse(url)
    for entry in feed_result.entries:
        # entry.content is a list of dicts in feedparser, so join their 'value' fields
        # instead of calling .lower() on the list itself.
        content = ' '.join(c.get('value') or '' for c in entry.get('content', []))
        texts = [entry.get('title', ''), content, entry.get('description', '')]
        if any(keyword in text.lower() for text in texts):
            print(entry.get('title', ''))
            print(entry.get('link', ''))
            print()
            count += 1
    return count

def multi(urls, keyword):
    # Pass the keyword explicitly: with the spawn start method (Windows, macOS),
    # worker processes do not see a global that was assigned inside main().
    with Pool(4) as p:  # maximum number of processes
        return p.starmap(function, [(url, keyword) for url in urls])

def main():
    keyword = input("input keyword:").rstrip().lower()

    start = time.time()
    data = multi(feed_urls, keyword)
    hit_count = sum(data)
    print(hit_count)
    print(time.time() - start)

# The guard keeps worker processes from re-running main() when they import this module.
if __name__ == '__main__':
    main()
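
To check the speedup without touching the network, here is a minimal sketch that runs the same tasks sequentially and through a Pool and returns both timings. `simulated_fetch` and `compare` are hypothetical names that are not part of the crawler above, and the 0.2-second sleep is an assumed stand-in for the time a feed download takes. Because the work here is mostly waiting, the gain in this sketch likely tracks the number of worker processes rather than the number of cores.

```python
import time
from multiprocessing import Pool


def simulated_fetch(n):
    # Hypothetical stand-in for feedparser.parse(): sleep instead of a network call.
    time.sleep(0.2)
    return n * 2


def compare(task_count=8, processes=4):
    # Run the same tasks sequentially and through a Pool; return results and timings.
    start = time.time()
    sequential = [simulated_fetch(n) for n in range(task_count)]
    sequential_time = time.time() - start

    start = time.time()
    with Pool(processes) as p:
        parallel = p.map(simulated_fetch, range(task_count))
    parallel_time = time.time() - start

    return sequential, parallel, sequential_time, parallel_time


if __name__ == '__main__':
    sequential, parallel, sequential_time, parallel_time = compare()
    print(sequential == parallel)
    print(sequential_time, parallel_time)
```

Process startup adds some overhead, so with only a few short tasks the parallel run may not come out ahead.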

I used this page as a reference.
