A shell script that extracts English words from text data (ChatGPT)

Posted at 2025-05-08 (last updated at 2025-05-08)

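Hello.
I asked ChatGPT to write a shell script that extracts English words from text data, then edited the script based on its answer until it was complete. I think the result is of a reasonably satisfactory standard.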

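Question to ChatGPT

"Please write a shell script that extracts all natural English words, and only those, from the given text data, along with its help message."

Answer from ChatGPT (excerpt)

Understood!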

# Read from the input file if one is given, otherwise from standard input
INPUT="${1:-/dev/stdin}"
# Match against the system dictionary (print only words that exist in it)
tr -cs 'A-Za-z' '\n' < "$INPUT" | grep -iFx -f /usr/share/dict/words
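
To make the two stages of this pipeline easier to follow, here is a minimal sketch that runs it against a tiny word list instead of the system dictionary. The temporary file created with `mktemp` and its five words are assumptions made purely for illustration; with the real /usr/share/dict/words the result depends on what that file contains.

```shell
#!/bin/sh
# Hypothetical stand-in for /usr/share/dict/words so the demo runs anywhere.
dict=$(mktemp)
printf '%s\n' this is an example text > "$dict"

# Stage 1: tr -c takes the complement of A-Za-z (every non-letter), replaces it
# with a newline, and -s squeezes repeated newlines: one candidate per line.
# Stage 2: grep -F reads the dictionary lines as fixed strings, -x requires a
# whole-line match, and -i ignores case.
result=$(printf 'Ths is an example text, v2.\n' \
  | tr -cs 'A-Za-z' '\n' \
  | grep -iFx -f "$dict")

printf '%s\n' "$result"
rm -f "$dict"
```

With this word list, "Ths" and the stray "v" left over from "v2" are dropped because neither appears in the list as a whole line, while "is", "an", "example", and "text" pass through.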

The completed shell script

I edited the script based on the answer above and completed it.

extract_words.sh

#!/bin/sh

# constants
TRUE="true"
LF=$(printf '\n_');LF=${LF%_}

# messages
PROGNAME=$(basename "$0");
WARNING_HEAD="Warning: Type \"$PROGNAME --help\" for usage instructions."
USAGE="
Usage:
  $PROGNAME [INPUT_FILE]$LF
Description:
  Extracts all Latin-alphabet words (A–Z, a–z) from the input and filters out unnatural or invalid ones by comparing them against a system dictionary (/usr/share/dict/words).$LF
  If no input file is specified, standard input is used.$LF
Options:
  -h, --help    Show this help message and exit.$LF
Examples:
  echo \"Ths is an example text.\" | $PROGNAME | sort -u
"

# functions
append_warning_f() {
  [ -z "$warning" ] && warning="$1" || warning="$warning$LF""$1"
}

# parsing optional arguments
while [ $# -gt 0 ]; do
  case "$1" in
    -h|--help)   help_enabled=$TRUE;;
    -*)          append_warning_f "Unknown option: '$1'";;
    *)           break;;
  esac
  shift
done

# confirmations
for file in "$@"; do [ -e "$file" ] || append_warning_f "file not found: '$file'"; done

# warn or print usage instruction
if [ -n "$warning" ]; then
  echo "$WARNING_HEAD" >&2
  echo "$warning" | sed -e 's/^/- /' >&2
  exit 1
elif [ -n "$help_enabled" ]; then
  echo "$USAGE" | fmt -w78
  exit
fi

# main
cat "$@" | tr -cs 'A-Za-z' '\n' | grep -iFx -f /usr/share/dict/words
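
The `LF` constant near the top of the script may look odd at first. Command substitution strips trailing newlines, so a sentinel character is appended and then removed with `${LF%_}`. The following is a small sketch of that behavior; the variable name `naive` is introduced here only for comparison and is not part of the script.

```shell
#!/bin/sh
# Command substitution removes trailing newlines, so this yields an empty string.
naive=$(printf '\n')

# The sentinel "_" protects the newline; ${LF%_} then strips the sentinel.
LF=$(printf '\n_'); LF=${LF%_}

printf 'naive length: %s\n' "${#naive}"
printf 'LF length: %s\n' "${#LF}"
```

The first length is 0 and the second is 1, which is why the script uses the sentinel form to build its multi-line messages.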
