RSSFeedLoader: "exception: newspaper package not found, please install it with `pip install newspaper3k`" does not go away

While loading documents from an RSS feed with LangChain's RSSFeedLoader, the following error occurred.

Source (rss/test.py)

from langchain_community.document_loaders import RSSFeedLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.inmemory import InMemoryVectorStore

urls = ["https://rss.itmedia.co.jp/rss/2.0/aiplus.xml"]

loader = RSSFeedLoader(urls=urls)

text_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 400,
    chunk_overlap = 0,
    length_function = len,
)

index = VectorstoreIndexCreator(
    vectorstore_cls=InMemoryVectorStore,
    embedding=OpenAIEmbeddings(),
    text_splitter=text_splitter,
).from_loaders([loader])

Error message

Error processing entry https://www.itmedia.co.jp/business/articles/2406/25/news053.html, exception: newspaper package not found, please install it with `pip install newspaper3k`

Even after installing newspaper3k, the same error still appears.

% pip install newspaper3k
% python rss/test.py
Error processing entry https://www.itmedia.co.jp/news/articles/2406/25/news101.html, exception: newspaper package not found, please install it with `pip install newspaper3k`
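
Because the message stays the same even with newspaper3k installed, it may be hiding a different import failure. The sketch below is one way to surface the underlying exception by importing the modules directly. It assumes (this is not confirmed in the original post) that newspaper3k imports `lxml.html.clean`, which recent lxml releases ship as a separate package. `try_import` is a hypothetical helper name used only for illustration.

```python
import importlib
import traceback


def try_import(module_name: str) -> tuple[bool, str]:
    """Try to import a module and return (ok, detail).

    On failure, detail holds the full traceback so the underlying
    error is visible instead of a generic wrapper message.
    """
    try:
        importlib.import_module(module_name)
    except Exception:  # anything raised at import time is of interest here
        return False, traceback.format_exc()
    return True, "import succeeded"


if __name__ == "__main__":
    # "newspaper" is the module installed by newspaper3k.
    # "lxml.html.clean" is checked on the assumption that newspaper needs it.
    for name in ("newspaper", "lxml.html.clean"):
        ok, detail = try_import(name)
        print(f"{name}: {'OK' if ok else 'FAILED'}")
        if not ok:
            print(detail)
```

If the traceback for `newspaper` points at `lxml.html.clean`, the missing piece is on the lxml side rather than newspaper3k itself.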

Installing lxml[html_clean] resolved the error.

Installing lxml[html_clean]

% pip install "lxml[html_clean]" 

Error resolved

% python rss/test.py            
Created a chunk of size 850, which is longer than the specified 400
Created a chunk of size 593, which is longer than the specified 400