Cannot load .zst data with datasets.load_dataset() on Google Colab

Posted at 2023-10-17

TL;DR

Bottom line: restart the runtime.

Background

I ran into this error while working through the big-data section of Chapter 5 of the Hugging Face NLP Course.

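The problem

I tried to load zstandard-compressed data with load_dataset(), but it raised an ImportError telling me to run pip install zstandard.

However, running that command did not help: the same error appeared again.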

from datasets import load_dataset

data_files = "https://the-eye.eu/public/AI/training_data/code_clippy_data/code_clippy_dedup_data/train/data_1761_time1626321250_default.jsonl.zst"
pubmed_dataset = load_dataset('json', data_files=data_files, split="train")
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-f85ab24d8195> in <cell line: 5>()
      3 # This takes a few minutes to run, so go grab a tea or coffee while you wait :)
      4 data_files = "https://the-eye.eu/public/AI/training_data/code_clippy_data/code_clippy_dedup_data/train/data_1761_time1626321250_default.jsonl.zst"
----> 5 pubmed_dataset = load_dataset('json', data_files=data_files, split="train")
      6 pubmed_dataset

14 frames
/usr/local/lib/python3.10/dist-packages/datasets/utils/extract.py in extract(input_path, output_path)
    223     def extract(input_path: Union[Path, str], output_path: Union[Path, str]) -> None:
    224         if not config.ZSTANDARD_AVAILABLE:
--> 225             raise ImportError("Please pip install zstandard")
    226         import zstandard as zstd
    227 

ImportError: Please pip install zstandard

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

Cause

  • pip install datasets
  • load_dataset()
  • pip install zstandard

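If you run the steps in the order listed above, installing zstandard after load_dataset() has already been called does not fix the error.

The config.ZSTANDARD_AVAILABLE variable that appears in the error message is presumably not updated once the code has run.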
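The suspected mechanism can be illustrated with a small, self-contained simulation. This is only a sketch: Config, is_installed, and the installed set are hypothetical stand-ins, not the actual datasets internals. It assumes that the availability flag is computed once when the library is first loaded and then simply read afterwards.

```python
# Hypothetical stand-in for the set of packages that pip has installed.
installed = set()


def is_installed(name):
    """Pretend lookup of whether a package is installed."""
    return name in installed


class Config:
    """Hypothetical stand-in for a library's config module."""

    def __init__(self):
        # Evaluated once, at "import time"; never refreshed afterwards.
        self.ZSTANDARD_AVAILABLE = is_installed("zstandard")


def extract(config):
    """Mimics the check seen in the traceback."""
    if not config.ZSTANDARD_AVAILABLE:
        raise ImportError("Please pip install zstandard")
    return "extracted"


# 1. The library is loaded before zstandard exists.
config = Config()

# 2. zstandard is installed afterwards.
installed.add("zstandard")

# 3. The flag is stale, so the same error is raised again.
try:
    stale_result = extract(config)
except ImportError as err:
    stale_result = str(err)

# 4. A "restart" builds the config from scratch, so the flag is recomputed.
fresh_result = extract(Config())

print(stale_result)
print(fresh_result)
```

In this toy model, only rebuilding the config object picks up the newly installed package, which matches the observation that a runtime restart is needed.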

Solution

Restart the runtime, then install zstandard before running load_dataset().
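One way to enforce this order in a fresh runtime is sketched below. The helpers ensure_installed and load_zst_json are hypothetical names introduced for illustration, and the sketch assumes that the pip package name and the importable module name are the same, as they are for zstandard. It does not help if datasets has already been imported in the current session; in that case a restart is still needed.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_installed(package: str) -> bool:
    """Install `package` with pip if it is missing.

    Returns True if pip was invoked, False if the package was already there.
    """
    if importlib.util.find_spec(package) is not None:
        return False
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    # Let the import system notice the newly installed package.
    importlib.invalidate_caches()
    return True


def load_zst_json(data_files: str):
    """Load a .jsonl.zst file, making sure zstandard is present first."""
    ensure_installed("zstandard")
    # Imported only after zstandard is available.
    from datasets import load_dataset

    return load_dataset("json", data_files=data_files, split="train")


# Usage in a fresh runtime (downloads data, so it is left commented out):
# dataset = load_zst_json("<URL or path to a .jsonl.zst file>")
```

In a Colab notebook, the simpler equivalent is to put !pip install datasets zstandard in the first cell and to import datasets only in a later cell.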
