
How to quickly survey what kinds of files a directory contains

Posted on 2019-06-15

Goal

  • Get a rough picture of how a repository is organized internally.

Approach

  • In the terminal, combine the find, awk, sort, uniq, and file commands to produce a quick tally of file types.

Command

find . -type f -exec file {} \; | awk -F: '{ print $2 }' | env LC_ALL=C sort | uniq -c | env LC_ALL=C sort -nr
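The same pipeline can be wrapped in a small reusable shell function. This is a sketch, not the command used in this article: the name `count_file_types` is hypothetical, and it swaps `awk -F:` for `file -b` (brief mode, which omits the `filename:` prefix), so descriptions that themselves contain colons are not cut short. Because no file names are printed, it can also use `-exec ... +` to run `file` on many files per invocation.

```shell
#!/bin/sh
# count_file_types is a hypothetical helper name.
# Usage: count_file_types [DIR]   (DIR defaults to the current directory)
#
# find ... -exec file -b {} +  : describe every regular file without the
#                                "filename:" prefix, batching many files per call
# LC_ALL=C sort                : byte-wise sort so identical descriptions are adjacent
# uniq -c                      : count each distinct description
# LC_ALL=C sort -nr            : most frequent description first
count_file_types() {
  find "${1:-.}" -type f -exec file -b {} + |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -nr
}
```

For example, `count_file_types .` produces the tally for the current directory. Setting `LC_ALL` rather than `LANG` matters because `LC_ALL` overrides any other locale variables already set in your environment, whereas `LANG` has the lowest priority.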

Excluding the .git directory

find . -type d -name .git -prune -o -type f -exec file {} \; | \
awk -F: '{ print $2 }' | env LC_ALL=C sort | uniq -c | env LC_ALL=C sort -nr
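The `-prune` idiom extends to several directories at once by grouping the names inside `\(` ... `\)`. A sketch, assuming you also want to skip a dependency directory such as `node_modules` (an illustrative name, not something the commands in this article require); `count_file_types_pruned` is a hypothetical helper, and it uses `file -b` instead of `awk -F:`.

```shell
#!/bin/sh
# count_file_types_pruned is a hypothetical helper name.
# Directories named .git or node_modules (illustrative list; edit as needed)
# are pruned, i.e. find does not descend into them at all.
count_file_types_pruned() {
  find "${1:-.}" \( -name .git -o -name node_modules \) -type d -prune -o \
       -type f -exec file -b {} + |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -nr
}
```

The branch to the left of `-o` only prunes; files are described only in the `-type f` branch, and since the expression contains `-exec`, find prints nothing for the pruned directories themselves.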

For example, on the Visual Studio Code repository:

$ git clone https://github.com/microsoft/vscode.git
$ cd vscode
$ find ./ -type f -exec file {} \; | awk -F: '{ print $2 }' | sort | uniq -c | sort -nr
1499 Java source, ASCII text
1156 ASCII text
 700 SVG Scalable Vector Graphics image
 355 C++ source text, ASCII text
 70 Java source, ASCII text, with very long lines
 68 Java source, UTF-8 Unicode text
 55 ASCII text, with very long lines
 48 HTML document text, ASCII text
 41 UTF-8 Unicode text
 28 C++ source text, ASCII text, with very long lines
 28 Bourne-Again shell script text executable, ASCII text
 27 MS Windows icon resource - 9 icons, 16x16, 32 bits/pixel, 20x20, 32 bits/pixel
 22 Algol 68 source text, ASCII text
 20 c program text, ASCII text
 17 C++ source text, UTF-8 Unicode text
 16 empty
 16 ASCII text, with no line terminators
 14 UTF-8 Unicode text, with very long lines
 13 POSIX shell script text executable, ASCII text
 13 ISO-8859 text
 10 exported SGML document text, ASCII text
 10 PNG image data, 60 x 60, 8-bit/color RGBA, non-interlaced
 9 XML 1.0 document text
 9 Java source, UTF-8 Unicode text, with very long lines
 9 DOS batch file text, ASCII text, with CRLF line terminators
 8 data
 6 troff or preprocessor input text, ASCII text
 5 HTML document text, ASCII text, with very long lines
 5 C++ source text, UTF-8 Unicode text, with very long lines
 4 XML 1.0 document text, ASCII text
 4 PNG image data, 128 x 128, 8-bit/color RGBA, non-interlaced
 4 PNG image data, 128 x 128, 8-bit colormap, non-interlaced
 4 Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
 4 ISO-8859 text, with very long lines
 4 HTML document text, UTF-8 Unicode text
 3 exported SGML document text, ASCII text, with very long lines
 3 c program text, UTF-8 Unicode text
 3 assembler source text, ASCII text
 3 UTF-8 Unicode text, with no line terminators
 3 UTF-8 Unicode (with BOM) text
 3 SVG XML document
 3 ISO-8859 text, with no line terminators
 2 UTF-8 Unicode (with BOM) text, with no line terminators
 2 Python script text executable, ASCII text
 2 PNG image data, 73 x 67, 8-bit/color RGB, non-interlaced
 2 PNG image data, 395 x 287, 8-bit/color RGB, non-interlaced
 2 PNG image data, 383 x 383, 8-bit colormap, non-interlaced
 2 PNG image data, 120 x 120, 8-bit colormap, non-interlaced
 2 PE32 executable (console) Intel 80386 Mono/.Net assembly, for MS Windows
 2 Objective-C source text, ASCII text
 2 Non-ISO extended-ASCII text, with no line terminators
 2 Big-endian UTF-16 Unicode text, with CRLF line terminators
 2 AppleScript compiled
 2 Algol 68 source text, UTF-8 Unicode text, with very long lines
 2 ASCII text, with CRLF line terminators
 1 unified diff output text, ASCII text
 1 makefile script text, ASCII text
 1 exported SGML document text, UTF-8 Unicode text, with very long lines
 1 c program text, ASCII text, with very long lines
 1 a /usr/bin/env node script text executable, ASCII text
 1 Zip archive data, at least v1.0 to extract
 1 XML document text, ASCII text
 1 XML 1.0 document text, UTF-8 Unicode text
 1 X pixmap image text, ASCII text, with very long lines
 1 Web Open Font Format, TrueType, length 33904, version 1.0
 1 Web Open Font Format, TrueType, length 18824, version 1.0
 1 UTF-8 Unicode text, with very long lines, with CRLF line terminators
 1 UTF-8 Unicode text, with CRLF line terminators
 1 UTF-8 Unicode (with BOM) text, with very long lines
 1 TrueType font data
 1 Ruby module source text, ASCII text
 1 Perl5 module source text, ASCII text
 1 Perl script text executable
 1 PNG image data, 9 x 9, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 70 x 70, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 576 x 364, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 4 x 4, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 393 x 393, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 34 x 38, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 28 x 21, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 256 x 256, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 242 x 242, 4-bit colormap, non-interlaced
 1 PNG image data, 150 x 150, 2-bit colormap, non-interlaced
 1 PNG image data, 128 x 67, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 114 x 255, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 1024 x 1024, 4-bit colormap, non-interlaced
 1 PE32 executable (GUI) Intel 80386, for MS Windows
 1 PE32 executable (DLL) (console) Intel 80386, for MS Windows
 1 PDF document, version 1.3
 1 PC bitmap, Windows 3.x format, 55 x -58 x 32
 1 PC bitmap, Windows 3.x format, 164 x -314 x 32
 1 Non-ISO extended-ASCII text, with very long lines, with LF, NEL line terminators
 1 Non-ISO extended-ASCII text, with very long lines
 1 Non-ISO extended-ASCII text, with NEL line terminators
 1 Non-ISO extended-ASCII text, with LF, NEL line terminators
 1 Mac OS X icon, 67384 bytes, "info" type
 1 Mac OS X icon, 64483 bytes, "info" type
 1 Mac OS X icon, 61881 bytes, "info" type
 1 Mac OS X icon, 60031 bytes, "info" type
 1 Mac OS X icon, 59688 bytes, "info" type
 1 Mac OS X icon, 59381 bytes, "info" type
 1 Mac OS X icon, 59170 bytes, "info" type
 1 Mac OS X icon, 58531 bytes, "info" type
 1 Mac OS X icon, 57981 bytes, "info" type
 1 Mac OS X icon, 57873 bytes, "info" type
 1 Mac OS X icon, 57227 bytes, "info" type
 1 Mac OS X icon, 57134 bytes, "info" type
 1 Mac OS X icon, 56862 bytes, "info" type
 1 Mac OS X icon, 56473 bytes, "info" type
 1 Mac OS X icon, 56288 bytes, "info" type
 1 Mac OS X icon, 55805 bytes, "info" type
 1 Mac OS X icon, 55756 bytes, "info" type
 1 Mac OS X icon, 55607 bytes, "info" type
 1 Mac OS X icon, 55465 bytes, "info" type
 1 Mac OS X icon, 54090 bytes, "info" type
 1 Mac OS X icon, 53497 bytes, "info" type
 1 Mac OS X icon, 53290 bytes, "info" type
 1 Mac OS X icon, 52800 bytes, "info" type
 1 Mac OS X icon, 52375 bytes, "info" type
 1 Mac OS X icon, 51394 bytes, "info" type
 1 Mac OS X icon, 50848 bytes, "info" type
 1 Mac OS X icon, 49224 bytes, "s8mk" type
 1 Mac OS X icon, 48708 bytes, "info" type
 1 Mac OS X icon, 47448 bytes, "info" type
 1 MS Windows icon resource - 4 icons, 16x16, 32 bits/pixel, 32x32, 32 bits/pixel
 1 Little-endian UTF-16 Unicode text, with very long lines
 1 Little-endian UTF-16 Unicode text, with no line terminators
 1 Git pack, version 2, 706768 objects
 1 Git pack index, version 2
 1 Git index, version 2, 4402 entries
 1 Big-endian UTF-16 Unicode text, with very long lines
 1 Apache Avro version 101
 1 Algol 68 source text, UTF-8 Unicode text
$
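The listing above keeps lines separate even when they differ only in details after the first comma (encoding, line length, image dimensions, icon sizes). For a coarser overview, one option is to keep only the text before the first comma. A sketch; `count_file_kinds` is a hypothetical helper, and this grouping also discards the encoding, so `Java source, ASCII text` and `Java source, UTF-8 Unicode text` both count as `Java source`.

```shell
#!/bin/sh
# count_file_kinds is a hypothetical helper name.
# cut -d, -f1 keeps only the text before the first comma, e.g.
#   "PNG image data, 60 x 60, 8-bit/color RGBA, non-interlaced" -> "PNG image data"
#   "Mac OS X icon, 67384 bytes, \"info\" type"                  -> "Mac OS X icon"
count_file_kinds() {
  find "${1:-.}" -type f -exec file -b {} + |
    cut -d, -f1 |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -nr
}
```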

For example, on the gVisor repository:

$ git clone https://github.com/google/gvisor.git
$ cd gvisor
$ find ./ -type f -exec file {} \; | awk -F: '{ print $2 }' | sort | uniq -c | sort -nr
1161 ASCII text
 224 c program text, ASCII text
 65 C++ source text, ASCII text
 24 c program text, UTF-8 Unicode text
 11 POSIX shell script text executable, ASCII text
 10 Bourne-Again shell script text executable, ASCII text
 8 Perl5 module source text, ASCII text
 5 UTF-8 Unicode text
 4 ASCII text, with very long lines
 3 Python script text executable, ASCII text
 3 Algol 68 source text, ASCII text
 2 assembler source text, ASCII text
 1 makefile script text, ASCII text
 1 Ruby script text, ASCII text
 1 Perl script text executable
 1 PNG image data, 608 x 208, 8-bit/color RGBA, non-interlaced
 1 PNG image data, 1705 x 923, 8-bit/color RGBA, non-interlaced
 1 HTML document text, ASCII text
 1 Git pack, version 2, 22494 objects
 1 Git pack index, version 2
 1 Git index, version 2, 1507 entries
$


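Supplementary notes

  • For Office documents, the output looks like this:
  • I half wondered whether Office documents such as design specifications might be sitting somewhere in the Visual Studio Code repository, but there were none. Times have changed. What a relief.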
$ ls
example.doc example.docx example.ppt example.pptx example.xls example.xlsx
$
$ find . -type f -exec file {} \;
./example.docx: Microsoft Word 2007+
./example.doc: Composite Document File V2 Document, Little Endian, Os: MacOS, Version 13.10, Code page: 10001, Author: ~c j, Template: Normal.dotm, Last Saved By: ~c j, Revision Number: 2, Name of Creating Application: Microsoft Office Word, Create Time/Date: Sat Jun 15 12:07:00 2019, Last Saved Time/Date: Sat Jun 15 12:07:00 2019, Number of Pages: 1, Number of Words: 0, Number of Characters: 5, Security: 0
./example.ppt: Composite Document File V2 Document, Little Endian, Os: MacOS, Version 13.10, Code page: 10001, Title: ?????, Author: ~c j, Last Saved By: ~c j, Revision Number: 1, Name of Creating Application: Microsoft Macintosh PowerPoint, Total Editing Time: 00:43, Create Time/Date: Sat Jun 15 12:08:17 2019, Last Saved Time/Date: Sat Jun 15 12:09:01 2019, Number of Words: 6
./example.pptx: Microsoft PowerPoint 2007+
./example.xls: Composite Document File V2 Document, Little Endian, Os: MacOS, Version 13.10, Code page: 10001, Author: ~c j, Last Saved By: ~c j, Name of Creating Application: Microsoft Macintosh Excel, Create Time/Date: Sat Jun 15 12:09:32 2019, Last Saved Time/Date: Sat Jun 15 12:10:43 2019, Security: 0
./example.xlsx: Microsoft Excel 2007+
$
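The .doc, .ppt, and .xls descriptions above contain colons of their own (for example `Create Time/Date: Sat Jun 15 12:07:00 2019`), so the `awk -F: '{ print $2 }'` step of the main command keeps only the text up to the next colon. A minimal demonstration, using a shortened version of the example.doc line above:

```shell
#!/bin/sh
# Shortened from the example.doc line above.
line='./example.doc: Composite Document File V2 Document, Create Time/Date: Sat Jun 15 12:07:00 2019'

# Splitting on every colon: $2 stops at the colon after "Create Time/Date".
printf '%s\n' "$line" | awk -F: '{ print $2 }'
# prints " Composite Document File V2 Document, Create Time/Date"

# Stripping only the leading "name: " prefix keeps the full description
# (but the prefix is not removed if the file name itself contains a colon).
printf '%s\n' "$line" | sed 's/^[^:]*: //'
# prints "Composite Document File V2 Document, Create Time/Date: Sat Jun 15 12:07:00 2019"
```

`file -b`, which omits the file name entirely, sidesteps both problems.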

Note: some JavaScript and TypeScript files are misidentified as "Algol 68 source".

For example, in the Visual Studio Code repository:

$ git clone https://github.com/microsoft/vscode.git
$ cd vscode
$ find . -type f -exec file {} \; | grep Algol
./build/gulpfile.extensions.js: Algol 68 source text, ASCII text
./build/lib/util.js: Algol 68 source text, ASCII text
./build/lib/nls.js: Algol 68 source text, ASCII text
./build/lib/optimize.js: Algol 68 source text, ASCII text
./build/lib/nls.ts: Algol 68 source text, ASCII text
./build/lib/util.ts: Algol 68 source text, ASCII text
./build/lib/optimize.ts: Algol 68 source text, ASCII text
./src/vs/workbench/test/electron-browser/api/extHostWorkspace.test.ts: Algol 68 source text, ASCII text
./src/vs/workbench/contrib/preferences/browser/keybindingsEditor.ts: Algol 68 source text, UTF-8 Unicode text, with very long lines
./src/vs/workbench/contrib/webview/browser/webviewEditor.ts: Algol 68 source text, ASCII text
./src/vs/workbench/contrib/output/browser/outputPanel.ts: Algol 68 source text, ASCII text
./src/vs/workbench/contrib/welcome/walkThrough/browser/editor/editorWalkThrough.ts: Algol 68 source text, ASCII text
./src/vs/workbench/contrib/files/browser/editors/textFileEditor.ts: Algol 68 source text, ASCII text
./src/vs/workbench/contrib/files/browser/editors/binaryFileEditor.ts: Algol 68 source text, ASCII text
./src/vs/workbench/contrib/debug/common/debug.ts: Algol 68 source text, ASCII text
./src/vs/workbench/browser/parts/editor/textDiffEditor.ts: Algol 68 source text, ASCII text
./src/vs/workbench/browser/parts/editor/binaryEditor.ts: Algol 68 source text, ASCII text
./src/vs/workbench/browser/parts/editor/textResourceEditor.ts: Algol 68 source text, ASCII text
./src/vs/workbench/api/browser/mainThreadEditors.ts: Algol 68 source text, UTF-8 Unicode text, with very long lines
./src/vs/workbench/services/extensions/electron-browser/cachedExtensionScanner.ts: Algol 68 source text, ASCII text
./src/vs/workbench/services/search/test/common/replace.test.ts: Algol 68 source text, ASCII text
./src/vs/workbench/services/editor/test/browser/editorGroupsService.test.ts: Algol 68 source text, ASCII text
./src/vs/workbench/services/editor/browser/codeEditorService.ts: Algol 68 source text, ASCII text
./src/vs/base/test/common/json.test.ts: Algol 68 source text, UTF-8 Unicode text
./src/vs/base/browser/ui/tree/dataTree.ts: Algol 68 source text, ASCII text
$
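Rather than grepping for a single label, you can tally every label `file` assigns to .js and .ts files, which shows how many are reported as Algol 68 source relative to the rest. A sketch; `js_ts_labels` is a hypothetical helper, and selecting files by extension is an assumption about how the files are named.

```shell
#!/bin/sh
# js_ts_labels is a hypothetical helper name.
# Prunes .git, then describes only files ending in .js or .ts and
# counts each distinct description, most frequent first.
js_ts_labels() {
  find "${1:-.}" -type d -name .git -prune -o \
       -type f \( -name '*.js' -o -name '*.ts' \) -exec file -b {} + |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -nr
}
```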