Managing large files on GitHub with Git Large File Storage

1. About this article

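I hadn't noticed how large a Jupyter Notebook had grown, and git push was rejected because the file exceeded 100 MB. I fixed this with Git Large File Storage (Git LFS), and this article walks through the steps.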
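The root cause was not noticing the notebook's size before committing, so a quick scan of the working tree can catch this early. The bash sketch below is not part of my original procedure. The function name find_large_files is hypothetical. The default limit of 104857600 bytes (100 MiB) is my assumption for the "100.00 MB" limit quoted in GitHub's error message.

```shell
# Sketch: list files in a working tree larger than a byte limit, so an
# oversized notebook is spotted before it is committed.
# The .git directory is skipped. The default limit (100 MiB) is an assumption.
# Usage: find_large_files [DIR] [LIMIT_BYTES]
find_large_files() {
    local dir="${1:-.}"
    local limit="${2:-104857600}"
    # -size +Nc matches regular files strictly larger than N bytes.
    find "$dir" -name .git -prune -o -type f -size +"${limit}c" -print
}
```

Running find_large_files . in the repository root before git add should list any file above the limit, such as the 016 notebook in this case.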

The overall flow is as follows.

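  1. Go back to the commit before the file over 100 MB was added
    • I did this by running git clone again, which works because the rejected commits never reached the remote
  2. Configure Git LFS
  3. Commit and push the file over 100 MB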
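For step 1, I re-cloned the repository. Another way is git reset --mixed @{upstream}. It assumes that the rejected commits exist only locally and that the branch tracks a remote branch. This command moves the branch back to the last pushed commit and leaves the files in the working tree as uncommitted changes. Those changes can then be committed again once Git LFS is configured. The helper name rewind_to_upstream is hypothetical. The dropped commits stay reachable through git reflog for a while.

```shell
# Sketch: drop local, unpushed commits while keeping their files on disk.
# Assumes the current branch tracks a remote branch (e.g. origin/main)
# and that the commits being dropped were never pushed.
rewind_to_upstream() {
    # Stop if the current branch has no upstream to rewind to.
    if ! git rev-parse --verify --quiet '@{upstream}' >/dev/null 2>&1; then
        echo "rewind_to_upstream: no upstream branch configured" >&2
        return 1
    fi
    # --mixed moves the branch and resets the index but leaves the working
    # tree alone, so the large notebook stays on disk as an uncommitted change.
    git reset --quiet --mixed '@{upstream}'
}
```

After rewind_to_upstream, git status shows the notebook as an uncommitted change. It can be committed again after the Git LFS setup in 3-2.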

1-1. Environment

  • WSL2 (Ubuntu 22.04 "jammy")

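2. Background

The error I got on git push is shown below.
The problem came from three in-progress revisions of notebooks/016_Sample-Compare-Features-VGG16-PyTorch.ipynb. Each of the three local commits held its own copy of the notebook (154 to 157 MB), and the error lists every one of them, not just the latest.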

$ git push
Enumerating objects: 39, done.
Counting objects: 100% (39/39), done.
Delta compression using up to 16 threads
Compressing objects: 100% (28/28), done.
Writing objects: 100% (31/31), 237.08 MiB | 9.45 MiB/s, done.
Total 31 (delta 16), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (16/16), completed with 5 local objects.
remote: error: Trace: 918c861b20b3921941b018e3179b321c748a6d281d65dc2b0c7d558f685784a6
remote: error: See https://gh.io/lfs for more information.
remote: error: File notebooks/016_Sample-Compare-Features-VGG16-PyTorch.ipynb is 156.98 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: File notebooks/016_Sample-Compare-Features-VGG16-PyTorch.ipynb is 154.33 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: File notebooks/016_Sample-Compare-Features-VGG16-PyTorch.ipynb is 156.55 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
To https://github.com/ryoma-jp/machine_learning.git
 ! [remote rejected] main -> main (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/ryoma-jp/machine_learning.git'
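The same file appears three times because the error lists one entry per oversized blob in the pushed commits, and each commit held a different version of the notebook. The sketch below lists the oversized blobs that a push would send, using only standard git plumbing (rev-list --objects and cat-file --batch-check). The helper find_large_unpushed_blobs is hypothetical and assumes the branch has an upstream such as origin/main. The 100 MiB default is again my assumption for GitHub's limit.

```shell
# Sketch: list blobs larger than LIMIT_BYTES that a push would send,
# i.e. objects reachable from HEAD but not from the upstream branch.
# Assumes the current branch has an upstream (e.g. origin/main).
# Usage: find_large_unpushed_blobs [LIMIT_BYTES]   (default: 100 MiB)
# Output: one line per blob, "<size in bytes><TAB><path>".
find_large_unpushed_blobs() {
    local limit="${1:-104857600}"
    # rev-list prints "<sha> <path>" for each object in the range;
    # cat-file prepends the object type and size and keeps the path via %(rest).
    git rev-list --objects '@{upstream}..HEAD' |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk -v limit="$limit" '
            $1 == "blob" && $3 + 0 > limit + 0 {
                size = $3
                sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")  # drop type, sha and size
                print size "\t" $0
            }'
}
```

Running this before git push would list each committed version of the notebook separately, one line per version.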

3. Fixing the problem with Git Large File Storage

3-1. Installing Git Large File Storage

Install Git Large File Storage by following "Installing on Linux using packagecloud".

$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
Detected operating system as Ubuntu/jammy.
Checking for curl...
Detected curl...
Checking for gpg...
Detected gpg...
Detected apt version as 2.4.12
Running apt-get update... done.
Installing apt-transport-https... done.
Installing /etc/apt/sources.list.d/github_git-lfs.list...done.
Importing packagecloud gpg key... Packagecloud gpg key imported to /etc/apt/keyrings/github_git-lfs-archive-keyring.gpg
done.
Running apt-get update... done.

The repository is setup! You can now install packages.
$ sudo apt install git-lfs
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  linux-tools-5.15.0-60 linux-tools-5.15.0-60-generic
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 15 not upgraded.
Need to get 7420 kB of archives.
After this operation, 16.5 MB of additional disk space will be used.
Get:1 https://packagecloud.io/github/git-lfs/ubuntu jammy/main amd64 git-lfs amd64 3.5.1 [7420 kB]
Fetched 7420 kB in 1s (7406 kB/s)
Selecting previously unselected package git-lfs.
(Reading database ... 58152 files and directories currently installed.)
Preparing to unpack .../git-lfs_3.5.1_amd64.deb ...
Unpacking git-lfs (3.5.1) ...
Setting up git-lfs (3.5.1) ...
Git LFS initialized.
Processing triggers for man-db (2.10.2-1) ...
$ git lfs --version
git-lfs/3.5.1 (GitHub; linux amd64; go 1.21.8)

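3-2. Configuring Git Large File Storage

Clone the repository again, then register *.ipynb as a pattern tracked by Git LFS. git lfs track writes the pattern to .gitattributes, which is then committed like any other file so the setting is shared with other clones.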

$ git clone https://github.com/ryoma-jp/machine_learning.git
$ cd machine_learning
$ git lfs track "*.ipynb"
Tracking "*.ipynb"
$ cat .gitattributes 
*.ipynb filter=lfs diff=lfs merge=lfs -text
$ git add .gitattributes
$ git commit -m "add gitattributes"
[main 6373371] add gitattributes
 1 file changed, 1 insertion(+)
 create mode 100644 .gitattributes
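Before adding files, git check-attr can confirm which paths the *.ipynb pattern actually covers, because it reports the filter attribute for a given path. The wrapper below is a hypothetical helper sketch. It only reads .gitattributes, so it works without git-lfs installed and even for paths that do not exist yet.

```shell
# Sketch: report whether each given path goes through the Git LFS filter,
# based on the patterns in .gitattributes. Run inside the repository.
# Prints "<path><TAB>lfs" or "<path><TAB>git" for each argument.
lfs_tracking_status() {
    local path attr
    for path in "$@"; do
        # check-attr prints "<path>: filter: <value>"; the value is "lfs" for
        # paths matched by `git lfs track`, "unspecified" otherwise.
        attr=$(git check-attr filter -- "$path")
        if [ "${attr##*: }" = "lfs" ]; then
            printf '%s\tlfs\n' "$path"
        else
            printf '%s\tgit\n' "$path"
        fi
    done
}
```

For example, lfs_tracking_status notebooks/016_Sample-Compare-Features-VGG16-PyTorch.ipynb utils/utils.py should report lfs for the notebook and git for the Python file.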

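3-3. Committing and pushing the large file

First, the existing notebooks other than 016 are committed so that they are stored through Git LFS. Then 016 is committed together with utils/utils.py from the repository root.

$ cd notebooks/
$ git add *.ipynb
$ git restore --staged 016_Sample-Compare-Features-VGG16-PyTorch.ipynb
$ git commit -m "add ipynb for git lfs"
$ cd ..
$ git add utils/utils.py notebooks/016_Sample-Compare-Features-VGG16-PyTorch.ipynb
$ git commit -m "add sample code to compare features each preprocessing for input image"
(repeated to apply the changes from all three revisions)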
$ git push
Uploading LFS objects: 100% (25/25), 536 MB | 29 MB/s, done.
Enumerating objects: 77, done.
Counting objects: 100% (77/77), done.
Delta compression using up to 16 threads
Compressing objects: 100% (55/55), done.
Writing objects: 100% (59/59), 5.91 MiB | 4.10 MiB/s, done.
Total 59 (delta 18), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (18/18), completed with 5 local objects.
To https://github.com/ryoma-jp/machine_learning.git
  3f31c39..87abb05  main -> main

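3-4. Checking the Git LFS status

git lfs ls-files lists the files managed by Git LFS, and -s adds each file's size. All the notebooks, including the 165 MB 016_Sample-Compare-Features-VGG16-PyTorch.ipynb, are now stored in Git LFS. The * after each ID means the full file content is present locally, not just a pointer.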

$ git lfs ls-files -s
8900fc87c1 * notebooks/001_ImageClassification-CIFAR10-SimpleCNN-PyTorch.ipynb (32 KB)
04834a705a * notebooks/002_ImageClassification-Food101-SimpleCNN-PyTorch.ipynb (75 KB)
266a054efe * notebooks/003_ImageClassification-Food101-SimpleCNN-GradCAM_Heatmap-PyTorch.ipynb (88 KB)
91c038b405 * notebooks/004_ImageClassification-Food101-SimpleCNN-EigenCAM_Heatmap-PyTorch.ipynb (93 KB)
a3ea5a68ab * notebooks/005_ImageClassification-Food101-SimpleCNN-ParameterSpaceSaliency_Heatmap-PyTorch.ipynb (313 KB)
3e11f60d5e * notebooks/006_Algorithm_Welfords-Method-for-Computing-Variance.ipynb (8.5 KB)
96bba755e0 * notebooks/007_DomainAdaptation-OfficeHome-VGG16-PyTorch.ipynb (30 KB)
b683cf3833 * notebooks/008_Sample_ModelSeparate.ipynb (58 KB)
149caf1aa6 * notebooks/009_ImageClassification-Food101-VGG16-PyTorch.ipynb (73 KB)
4aaf91047c * notebooks/010_Sample-Compare-Weights-of-VGG16-PyTorch-and-Keras.ipynb (19 KB)
b8310b6d9c * notebooks/011_ImageClassification-COCO2014-VGG16-PyTorch.ipynb (2.4 MB)
a70440fb27 * notebooks/012_Sample-PyTorch-Learning-Rate-Scheduler.ipynb (332 KB)
b8d764646c * notebooks/013_Sample-Evaluation-of-ObjectDetection-SSD-PyTorch.ipynb (1.8 MB)
65e6696bb9 * notebooks/014-2_ImageClassification-CIFAR100-SimpleCNN-SGD-PyTorch.ipynb (4.4 MB)
e930ef0708 * notebooks/014-3_ImageClassification-CIFAR100-SimpleCNN-Momentum-PyTorch.ipynb (4.5 MB)
7ffa0ddedc * notebooks/014-4_ImageClassification-CIFAR100-SimpleCNN-Adagrad-PyTorch.ipynb (4.6 MB)
5dcfdefa20 * notebooks/014-5_ImageClassification-CIFAR100-SimpleCNN-RMSProp-PyTorch.ipynb (4.6 MB)
306fb4cd72 * notebooks/014-6_ImageClassification-CIFAR100-SimpleCNN-Adadelta-PyTorch.ipynb (4.5 MB)
3921aff1a8 * notebooks/014-7_ImageClassification-CIFAR100-SimpleCNN-Adam-PyTorch.ipynb (4.7 MB)
545d7ee7f2 * notebooks/014-8_ImageClassification-CIFAR100-SimpleCNN-AdamW-PyTorch.ipynb (4.8 MB)
b3e2076ffd * notebooks/014_Sample-Comparing-Optimizers.ipynb (58 KB)
6e091fe987 * notebooks/015_Sample-Feature-Extraction-VGG16-PyTorch.ipynb (8.2 MB)
ed954fa9d4 * notebooks/016_Sample-Compare-Features-VGG16-PyTorch.ipynb (165 MB)
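Git itself stores only a small pointer for each of these files, and the 10-character IDs above are the beginning of each file's SHA-256 object ID. The bash sketch below builds such a pointer by hand to show its format: a version line, oid sha256:..., and the size in bytes. It assumes GNU coreutils' sha256sum, and the function name lfs_pointer is hypothetical. With git-lfs installed, git lfs pointer --file=PATH should print the same text.

```shell
# Sketch: build the Git LFS pointer text for a file by hand.
# Assumes GNU coreutils (sha256sum); not the git-lfs implementation itself.
lfs_pointer() {
    local file="$1" oid size
    # Hash from stdin so unusual file names cannot alter sha256sum's output.
    oid=$(sha256sum < "$file" | cut -d ' ' -f 1)
    # Arithmetic expansion strips any padding that wc may add.
    size=$(( $(wc -c < "$file") ))
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}
```

The first 10 hex characters of the oid line should match the ID that git lfs ls-files shows for the same file.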

4. Closing thoughts

I normally don't keep large files under Git, but since I had just learned about Git LFS, I decided to push this one, partly as a trial.
