A grep look-alike in PowerShell

Posted at 2017-07-02

Purpose

I built a text search tool for Windows in PowerShell. It should also be handy for testing and checking regular expressions.

Environment

OS: Windows 7 or later
PowerShell: Version 3 or later

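Features

The tool searches multiple files at once using regular expressions and highlights the matched text. It can also search for patterns that span multiple lines (for example C-style comments, as in the usage example given in the header comment of the RegExS function).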
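As a rough illustration of the multi-line matching described above, the sketch below runs a simplified version of the comment pattern over an in-memory string. The sample source text and the variable names are assumptions made for this example, and the line-number calculation is a simplification rather than the script's own bookkeeping.

```powershell
# Sample C source (illustrative input, not taken from the article).
$source = @'
int main(void) {
    /* first line
       second line */
    return 0; // done
}
'@
# Normalize line endings to "\n" only, as RegExS does before matching.
$source = $source -replace "`r", ""

# (?m) lets ^ and $ match at line boundaries; [\S\s] also matches newlines,
# so a block comment spanning several lines is captured as one match.
$pattern = '(?m)(/\*[\S\s]*?\*/|//.*$)'

$found = [regex]::Matches($source, $pattern)
foreach ($m in $found) {
    # Line number = number of newlines before the match start, plus one.
    $line = ($source.Substring(0, $m.Index) -split "`n").Count
    Write-Host ("{0,5}:" -f $line) -NoNewline
    Write-Host $m.Groups[1].Value -BackgroundColor Blue -ForegroundColor White
}
```

The first match spans two lines, which a line-by-line search would miss.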

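Notes

  • The script relies heavily on the Write-Host cmdlet (sending the results down the pipeline is not an intended use). Redirecting the sixth stream (the information stream) to a file and saving it also breaks the layout. Since file redirection does not work well, there is an option to write the results to an HTML file instead.
  • If you set the background color for matches to DarkMagenta, the same as the PowerShell console background (-bc DarkMagenta), the highlight becomes less conspicuous.
  • A pipeline-input version is available here.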
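The first note can be seen in a small experiment. The function below is a hypothetical stand-in for the script's highlighted output, not part of RegExS. The sketch assumes PowerShell 5 or later, where Write-Host writes to the information stream (stream 6); on versions 3 and 4 that stream does not exist.

```powershell
# Hypothetical stand-in for one highlighted result line.
function Show-Highlighted {
    Write-Host "    1:int x = " -NoNewline
    Write-Host "42" -BackgroundColor Blue -ForegroundColor White -NoNewline
    Write-Host ";"
}

# Nothing reaches the success stream, so the pipeline receives no objects.
$piped = Show-Highlighted

# Merging stream 6 into the success stream yields one record per Write-Host call.
$records = Show-Highlighted 6>&1

# Each fragment is now a separate item; the single-line layout is gone.
$records | ForEach-Object { $_.MessageData.Message }
```

The colors survive only as properties on each record's MessageData, which is one reason a plain-text redirect cannot reproduce the console view.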

Output example

Example run in the console

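Code

Although this is a PowerShell script, it has a special structure that also lets it run as a Windows batch file (it can be started with a double click; to run it as a PowerShell script, change the extension to .ps1). The multi-line search version also supports .docx/.pptx/.xlsx and PDF files on an experimental basis.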
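The PowerShell half of this structure can be sketched in isolation. In the listing, the batch preamble reads the file's own text and passes it to [ScriptBlock]::Create; the example below does the same with a small in-memory script. The script text and parameter set are assumptions for illustration, trimmed down from the real param block.

```powershell
# Stand-in for the file content. The leading <# ... #> block is a comment to
# PowerShell, which is how the batch commands at the top are skipped.
$scriptText = @'
<# : batch commands would sit here
@echo off
#>
param (
    [String]$Pattern,
    [Alias("i")][Switch]$IgnoreCase
)
"Pattern=$Pattern IgnoreCase=$IgnoreCase"
'@

# Build a script block from the text and invoke it with command-line style
# arguments, so positional parameters and aliases bind as they would for a .ps1 file.
$result = & ([ScriptBlock]::Create($scriptText)) 'foo.*bar' -i
$result
```

Because the text is compiled as a script block rather than run as a file, the .bat extension does not get in PowerShell's way.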

If you download the zip file, unblock it in the file's Properties dialog before extracting it.

File Properties dialog
RegExS_multiline_en.bat
<# : Batch Commands (PowerShell Comments) Start
@echo off & setlocal
rem
rem  Multi-line Regular Expression Text Search
rem
rem
rem     RegExS.bat  [<Regular Expression>] [<Target Folder>] [<Target Files>] 
rem                 [-e <Encoding>] [-g <Capture Group>]
rem                 [-h] [-i] [-n] [-r] [-s] [-x <Exclude Files>]
rem                 [-bc <Console-Color>] [-cc <Console-Color>] [-fc <Console-Color>]
rem                 [-Q/-Interactive]
rem
rem
rem     <Regular Expression>      text pattern in .NET regular expression
rem                               "^" and "$" characters match the beginning and the end of
rem                               lines respectively. In order to make "."(period) match to
rem                               "\n"(newline), prepend "(?s)" to the regular expression.
rem                               You'll be prompted for it if not specified in the command
rem                               line arguments or when this script is started by a double click.
rem     <Target Folder>           target folder in which search is performed
rem                               You'll be prompted to choose one in the popped-up selection
rem                               dialog box if not specified.
rem     <Target Files>            comma-delimited list of filenames to be searched
rem                               Wildcards may be used. The default value is "*". 
rem                               (e.g. "*.cpp, *.h")
rem     -e <Encoding>             character encoding for the files without BOMs(Byte Order Marks)
rem                               The default value is "Default"
rem     -g <Capture-Group>        capture-group name or number
rem                               The default value is "0".
rem     -h                        Search result is output to an HTML file.
rem     -i                        Uppercase/lowercase are ignored.
rem     -n                        Wide characters are converted into narrow ones prior to 
rem                               search. Useful when you don't want to distinguish them.
rem     -r                        Search is performed in subfolders recursively.
rem     -s                        Search is performed by a simple text match.
rem     -x <Exclude Files>        comma-delimited list of filenames to be excluded from search
rem                               Wildcards may be used. (e.g. "*.img, *.dat")
rem                               The following binary format files are excluded by default.
rem                               "*.exe, *.dll, *.lnk, *.zip, *.bmp, *.gif, *.jpg, *.png"
rem     -bc <ConsoleColor>        background color for matches
rem     -cc <ConsoleColor>        foreground color for capture-group matches
rem     -fc <ConsoleColor>        foreground color for matches
rem     -Q                        Interactive mode is forced on.
rem
rem     Remarks: Files with .docx/.pptx/.xlsx/.xlsm extensions are supported experimentally.
rem              In addition, PDF files can be searched if "itextsharp.dll" exists in the same
rem              folder as this script (except for rasterized texts and numbers/dates in Excel).
rem
rem     Created by earthdiver1
rem     Version 2.00
rem     Licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
rem
rem ----------------------------------------------------------------------------
echo %CMDCMDLINE% | findstr /i /c:"%~nx0" >NUL && set DC=1
rem The following preamble turns a PowerShell script into a polyglot that
rem also runs as a batch script. When debugging, change the file extension
rem to .ps1 and run it from the PowerShell console.
set BATCH_ARGS=%*
if defined BATCH_ARGS set BATCH_ARGS=%BATCH_ARGS:"=\"%
if defined BATCH_ARGS set BATCH_ARGS=%BATCH_ARGS:^^=^% 
set P_CMD=$DoubleClicked=0%DC%;$PSScriptRoot='%~dp0'.TrimEnd('\');$input^|^
&([ScriptBlock]::Create((${%~f0}^|Out-String)))
endlocal & PowerShell -NoProfile -Command "%P_CMD%" %BATCH_ARGS%
exit/b
rem ----------------------------------------------------------------------------
: Batch Commands (PowerShell Comments) End #>
#Requires -Version 3
param (
    [String]$Pattern,
    [String]$Dir,
    [String]$Include                              = "*",
    [Alias("e")][String]$Encoding                 = "Default",
    [Alias("x")][String]$Exclude                  = "",
    [Alias("g")][String]$Group                    = "0",
    [Alias("h")][Switch]$HtmlOutput               = $False,
    [Alias("i")][Switch]$IgnoreCase               = $False,
    [Alias("n")][Switch]$Narrow                   = $False,
    [Alias("r")][Switch]$Recurse                  = $False,
    [Alias("s")][Switch]$SimpleMatch              = $False,
    [Alias("bc")][ConsoleColor]$BackgroundColor   = "Blue",
    [Alias("cc")][ConsoleColor]$CapturegroupColor = "Red",
    [Alias("fc")][ConsoleColor]$ForegroundColor   = "White",

    [Alias("Q")][Switch]$Interactive = $False
)

#$DebugPreference = "Continue"

Function RegExS {
<#
    .SYNOPSIS
        Regular-Expression text Search

    .DESCRIPTION
        This function searches for text patterns in multiple text files.
        Matched characters are highlighted.

    .PARAMETER Pattern
        Specifies the text to find. Type a string or regular expression. 
        If you type a string, use the SimpleMatch parameter.

    .PARAMETER Dir
        Specifies the target folder in which search is performed.

    .PARAMETER Include
        Specifies the comma-delimited list of filenames to be searched.
        Wildcards may be used; e.g. "*.cpp, *.h". 
        The default value is "*".

    .PARAMETER Encoding
        Specifies the character encoding for the files without a BOM. (Alias: -e)
        The default value is "Default".

    .PARAMETER BackgroundColor
        Specifies the background color for matches. (Alias: -bc)
        The default value is "Blue".

    .PARAMETER CapturegroupColor
        Specifies the foreground color for capture-group matches. (Alias: -cc)
        The default value is "Red".

    .PARAMETER ForegroundColor
        Specifies the foreground color for matches. (Alias: -fc)
        The default value is "White".

    .PARAMETER Group
        Specifies the name or number of capture group. (Alias: -g)
        The default value is "0".

    .PARAMETER HtmlOutput
        Redirects output to an HTML file. (Alias: -h)

    .PARAMETER IgnoreCase
        Makes matches case-insensitive. By default, matches are case-sensitive. (Alias: -i)

    .PARAMETER Narrow
        Converts wide characters into narrow ones. (Alias: -n)
        Useful when you don't want to distinguish between narrow and wide characters.

    .PARAMETER Recurse
        Searches all files in all subfolders. (Alias: -r)

    .PARAMETER SimpleMatch
        Uses a simple match rather than a regular expression match. (Alias: -s)

    .PARAMETER Exclude
        Specifies the comma-delimited list of filenames to be excluded. (Alias: -x)
        Wildcards may be used; e.g. "*.img, *.dat".
        Note that the following binary format files are excluded by default:
        *.exe, *.dll, *.lnk, *.zip, *.bmp, *.gif, *.jpg, *.png .

    .NOTES
        Author:   earthdiver1
        Version:  V2.00
        Licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

        Files with .docx/.pptx/.xlsx/.xlsm extensions are supported experimentally.
        In addition, PDF files can be searched if "itextsharp.dll" exists in the same
        folder as this script (except for rasterized text and numbers/dates in Excel).

    .EXAMPLE
        ./RegExS.ps1 """(?:[^""\\\n]|\\.)*""|'(?:[^'\\\n]|\\.)*'|(/\*[\S\s]*?\*/|//.*$)" . "*.cpp,*.h" -g 1
        Comments in the C++ source files in the current working folder are searched.
#>
    param (
        [String]$Pattern,
        [String]$Dir,
        [String]$Include = "*",
        [ValidateSet("", "Unknown", "String", "Unicode", "Byte", "BigEndianUnicode", `
                     "UTF8", "UTF7", "UTF32", "Ascii", "Default", "Oem", "BigEndianUTF32")]
        [Alias("e")][String]$Encoding                 = "Default",
        [Alias("x")][String]$Exclude                  = "",
        [Alias("g")][String]$Group                    = "0",
        [Alias("h")][Switch]$HtmlOutput,
        [Alias("i")][Switch]$IgnoreCase,
        [Alias("n")][Switch]$Narrow,
        [Alias("r")][Switch]$Recurse,
        [Alias("s")][Switch]$SimpleMatch,
        [Alias("bc")][ConsoleColor]$BackgroundColor   = "Blue",
        [Alias("cc")][ConsoleColor]$CapturegroupColor = "Red",
        [Alias("fc")][ConsoleColor]$ForegroundColor   = "White"
    )
############ Edit Here ############
    $BinaryFiles = "*.exe, *.dll, *.lnk, *.zip, *.bmp, *.gif, *.jpg, *.png"
    $HtmlFile = "$($script:PSScriptRoot)\RegExS_Output.htm"
#   $HtmlFile = "$($script:PSScriptRoot)\RegExS_Output_$((Get-Date).ToString('yyyyMMddHHmm')).htm"
###################################
    if (-not $Pattern) { Write-Host "'Pattern' not specified." -Fore Red; Get-Help RegExS; return }
    if (-not $Dir)     { Write-Host "'Dir' not specified."    -Fore Red; Get-Help RegExS; return }
    $Dir = Convert-Path -LiteralPath $Dir
    if (-not $Dir)     { Write-Host "'Dir' does not exist."   -Fore Red; Get-Help RegExS; return }
    [Array]$IsContainer = Get-Item $Dir | %{ $_.PSIsContainer }
    if ((-not $Recurse) -and $IsContainer.Length -eq 1 -and $IsContainer[0]) { $Dir += "\*" }
    if ($SimpleMatch) {
        $Pattern = [regex]::Escape($Pattern) 
    } else {
        $Pattern = "(?m)" + $Pattern
    }
    if ($IgnoreCase) { $Pattern = "(?i)" + $Pattern }
    if ($HtmlOutput) { HtmlHeader }
    [String[]]$IncludeList = $Include -Split "," | %{ $_.Trim() }
    [String[]]$ExcludeList = ($Exclude + "," + $BinaryFiles) -Split "," | %{ $_.Trim() } | ?{ $_ }
    $global:ErrorView = "CategoryView"
    if ($IncludeList.Length -eq 1) { # "-Filter" is faster than "-Include"
        $files = Get-ChildItem $Dir -Filter $Include      -Exclude $ExcludeList -Recurse:$Recurse -File
    } else {
        $files = Get-ChildItem $Dir -Include $IncludeList -Exclude $ExcludeList -Recurse:$Recurse -File
    }
    $global:ErrorView = "NormalView"
    $ErrorActionPreference = "Stop"
    trap [System.Exception] {
        Write-Host "System error occurred." -Fore Red
        Write-Host $Error[0].ToString() $Error[0].InvocationInfo.PositionMessage
        if ($HtmlOutput) {
            HtmlFooter
            try { 
                [Text.Encoding]::UTF8.GetBytes($script:Html.ToString()) | Set-Content -Path $HtmlFile -Encoding Byte # UTF-8N
                [void]$script:Html.Clear()
            } catch [System.Exception] {}
        }
        return
    }
    [Int]$Nmfile = 0
    [Int]$Nmatch = 0
    [Int]$Nmline = 0
    [Int]$Nmchar = 0
    if ($files) {
        :LOOP foreach ($file in $files) {
            Write-Debug "RegExS: $($file.FullName)"
            try {
                switch ($file.Extension) {
                    ".docx" { $io = ReadDocxFile $file.FullName ; break }
                    ".pptx" { $io = ReadPptxFile $file.FullName ; break }
                    ".xlsm" { $io = ReadXlsxFile $file.FullName ; break }
                    ".xlsx" { $io = ReadXlsxFile $file.FullName ; break }
                    ".pdf"  { $io = ReadPdfFile  $file.FullName ; break }
                    Default {
                        $enc = GetEncodingFromBOM $file
                        if (-not $enc) { 
                            if (IsBinary $file) {
                                Write-Host "Skipping binary format file: $($file.FullName)." -Fore Green
                                continue LOOP
                            }
                            $enc = $Encoding 
                        }
                        # -LiteralPath keeps names containing [ or ] from being treated as wildcards
                        $io = (Get-Content -LiteralPath $file.FullName -Encoding $enc -Raw) -Replace "\r",""
                    }
                }
            } catch [System.UnauthorizedAccessException], `
                    [System.Management.Automation.ItemNotFoundException], `
                    [System.IO.IOException] {
                Write-Host "$($Error[0].Exception.Message) Continuing processing." -Fore Red
                if ($HtmlOutput) {
                    [void]$script:Html.Append("<span class=`"y`">$(Htmlify $Error[0].Exception.Message) Continuing processing.</span>`n")
                }
                continue
            }
            if ($Narrow) {
                Add-Type -AssemblyName "Microsoft.VisualBasic"
                $io = [Microsoft.VisualBasic.Strings]::StrConv($io,[Microsoft.VisualBasic.VbStrConv]::Narrow)
                $Pattern = [Microsoft.VisualBasic.Strings]::StrConv($Pattern,[Microsoft.VisualBasic.VbStrConv]::Narrow)
            }
            [Array]$Matches = Select-String -InputObject $io -Pattern $Pattern -CaseSensitive -AllMatches `
                            | %{ $_.Matches } | ?{ $_.Groups[$Group].Success }
            if ($Matches) {
                $Nmfile++
                Write-Host $file.FullName -Fore Yellow
                $bol = [Array](Select-String -InputObject $io -Pattern '(?m)^' -AllMatches | %{ $_.Matches } | %{ $_.Index }) `
                     + ($io.Length + 1)
                [Int]$nl = -1
                if (-not $HtmlOutput) {
                    if ($Group -eq "0") {
                        for ([Int]$i=0; $i -lt $Matches.Length; $i++) {
                            $MatchIndex  = $Matches[$i].Groups[0].Index
                            $MatchLength = $Matches[$i].Groups[0].Length
                            $MatchString = $Matches[$i].Groups[0].Value -Replace "`n","`n`t"
                            $NextMatch   = $Matches[$i+1]
                            if ($bol[$nl+1] -le $MatchIndex) {
                                while ($bol[$nl+1] -le $MatchIndex) { $nl++ }
                                $index = $bol[$nl]
                                if ($index -ge $io.Length) { break }
                                $Nmline++
                                if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
                                    Write-Host $("{0,5}:" -F $script:cell[$nl]) -NoNewline
                                } else {
                                    Write-Host $("{0,5}:" -F ($nl+1)) -NoNewline
                                }
                            }
                            $Nmatch++
                            if ($MatchLength -gt 0) {
                                $Nmchar += $MatchLength
                                Write-Host $io.SubString($index, $MatchIndex - $index) -NoNewline
                                Write-Host $MatchString -Back $BackgroundColor -Fore $ForegroundColor -NoNewline
                                $index = $MatchIndex + $MatchLength
                                while ($bol[$nl+1] -le $index) {
                                    $nl++
                                    $Nmline++
                                }
                            }
                            if ($NextMatch -and $NextMatch.Index -lt $bol[$nl+1]) { continue }
                            $eol = $bol[$nl+1] - 1
                            if ($eol -eq $io.Length) { $eol-- }
                            if ($io[$eol] -eq "`n")  { $eol-- }
                            if ($index -le $eol) {
                                Write-Host $io.SubString($index, $eol+1 - $index)
                            } else {
                                Write-Host
                            }
                        }
                    } else {
                        for ([Int]$i=0; $i -lt $Matches.Length; $i++) {
                            $Match0Index  = $Matches[$i].Groups[0].Index
                            $Match0Length = $Matches[$i].Groups[0].Length
                            $MatchIndex   = $Matches[$i].Groups[$Group].Index
                            $MatchLength  = $Matches[$i].Groups[$Group].Length
                            $MatchString  = $Matches[$i].Groups[$Group].Value -Replace "`n","`n`t"
                            $NextMatch    = $Matches[$i+1]
                            if ($bol[$nl+1] -le $Match0Index) {
                                while ($bol[$nl+1] -le $Match0Index) { $nl++ }
                                $index = $bol[$nl]
                                if ($index -ge $io.Length) { break }
                                $Nmline++
                                if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
                                    Write-Host $("{0,5}:" -F $script:cell[$nl]) -NoNewline
                                } else {
                                    Write-Host $("{0,5}:" -F ($nl+1)) -NoNewline
                                }
                            }
                            $Nmatch++
                            if ($Match0Length -gt 0) {
                                $Nmchar += $MatchLength
                                Write-Host $io.SubString($index, $Match0Index - $index) -NoNewline
                                Write-Host $($io.SubString($Match0Index, $MatchIndex - $Match0Index) -Replace "`n","`n`t") `
                                           -Back $BackgroundColor -Fore $ForegroundColor -NoNewline
                                Write-Host $MatchString -Back $BackgroundColor -Fore $CapturegroupColor -NoNewline 
                                $index  = $Match0Index + $Match0Length
                                $index0 = $MatchIndex + $MatchLength
                                Write-Host $($io.SubString($index0, $index - $index0) -Replace "`n","`n`t") `
                                           -Back $BackgroundColor -Fore $ForegroundColor -NoNewline
                                while ($bol[$nl+1] -le $index) {
                                    $nl++
                                    $Nmline++
                                }
                            }
                            if ($NextMatch -and $NextMatch.Index -lt $bol[$nl+1]) { continue }
                            $eol = $bol[$nl+1] - 1
                            if ($eol -eq $io.Length) { $eol-- }
                            if ($io[$eol] -eq "`n")  { $eol-- }
                            if ($index -le $eol) {
                                Write-Host $io.SubString($index, $eol+1 - $index)
                            } else {
                                Write-Host
                            }
                        }
                    }
                } else {
                    [void]$script:Html.Append("<a href=`"file:///$($file.FullName)`" type=`"text`" class=`"y`">$(Htmlify $file.FullName)</a>`n")
                    if ($Group -eq "0") {
                        for ([Int]$i=0; $i -lt $Matches.Length; $i++) {
                            $MatchIndex  = $Matches[$i].Groups[0].Index
                            $MatchLength = $Matches[$i].Groups[0].Length
                            $MatchString = $Matches[$i].Groups[0].Value -Replace "`n","`n`t"
                            $NextMatch   = $Matches[$i+1]
                            if ($bol[$nl+1] -le $MatchIndex) {
                                while ($bol[$nl+1] -le $MatchIndex) { $nl++ }
                                $index = $bol[$nl]
                                if ($index -ge $io.Length) { break }
                                $Nmline++
                                if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
                                    [void]$script:Html.Append(("{0,5}:" -F $script:cell[$nl]))
                                } else {
                                    [void]$script:Html.Append(("{0,5}:" -F ($nl+1)))
                                }
                            }
                            $Nmatch++
                            if ($MatchLength -gt 0) {
                                $Nmchar += $MatchLength
                                [void]$script:Html.Append($(Htmlify $io.SubString($index, $MatchIndex - $index)))
                                [void]$script:Html.Append("<span class=`"fc`">$(Htmlify $MatchString)</span>")
                                $index = $MatchIndex + $MatchLength
                                while ($bol[$nl+1] -le $index) {
                                    $nl++
                                    $Nmline++
                                }
                            }
                            if ($NextMatch -and $NextMatch.Index -lt $bol[$nl+1]) { continue }
                            $eol = $bol[$nl+1] - 1
                            if ($eol -eq $io.Length) { $eol-- }
                            if ($io[$eol] -eq "`n")  { $eol-- }
                            if ($index -le $eol) {
                                [void]$script:Html.Append("$(Htmlify $io.SubString($index, $eol+1 - $index))`n")
                            } else {
                                [void]$script:Html.Append("`n")
                            }
                        }
                    } else {
                        for ([Int]$i=0; $i -lt $Matches.Length; $i++) {
                            $Match0Index  = $Matches[$i].Groups[0].Index
                            $Match0Length = $Matches[$i].Groups[0].Length
                            $MatchIndex   = $Matches[$i].Groups[$Group].Index
                            $MatchLength  = $Matches[$i].Groups[$Group].Length
                            $MatchString  = $Matches[$i].Groups[$Group].Value -Replace "`n","`n`t"
                            $NextMatch    = $Matches[$i+1]
                            if ($bol[$nl+1] -le $Match0Index) {
                                while ($bol[$nl+1] -le $Match0Index) { $nl++ }
                                $index = $bol[$nl]
                                if ($index -ge $io.Length) { break }
                                $Nmline++
                                if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
                                    [void]$script:Html.Append(("{0,5}:" -F $script:cell[$nl]))
                                } else {
                                    [void]$script:Html.Append(("{0,5}:" -F ($nl+1)))
                                }
                            }
                            $Nmatch++
                            if ($Match0Length -gt 0) {
                                $Nmchar += $MatchLength
                                [void]$script:Html.Append((Htmlify $io.SubString($index, $Match0Index - $index)))
                                [void]$script:Html.Append("<span class=`"fc`">$(Htmlify $io.SubString($Match0Index, $MatchIndex - $Match0Index))</span>")
                                [void]$script:Html.Append("<span class=`"cc`">$(Htmlify $MatchString)</span>")
                                $index = $Match0Index + $Match0Length
                                $index0 = $MatchIndex + $MatchLength
                                [void]$script:Html.Append("<span class=`"fc`">$(Htmlify $io.SubString($index0, $index - $index0))</span>")
                                while ($bol[$nl+1] -le $index) {
                                    $nl++
                                    $Nmline++
                                }
                            }
                            if ($NextMatch -and $NextMatch.Index -lt $bol[$nl+1]) { continue }
                            $eol = $bol[$nl+1] - 1
                            if ($eol -eq $io.Length) { $eol-- }
                            if ($io[$eol] -eq "`n")  { $eol-- }
                            if ($index -le $eol) {
                                [void]$script:Html.Append("$(Htmlify $io.SubString($index, $eol+1 - $index))`n")
                            } else {
                                [void]$script:Html.Append("`n")
                            }
                        }
                    }
                }
            }
            if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
                $script:cell.Clear()
            }
        }
    }
    Write-Host "$Nmfile file, $Nmline line, $Nmatch string, $Nmchar character matches found." -Fore Green
    if ($HtmlOutput) {
        [void]$script:Html.Append("<span class=`"g`">$Nmfile file, $Nmline line, $Nmatch string, $Nmchar character matches found.</span>`n")
        HtmlFooter
        try {
            [Text.Encoding]::UTF8.GetBytes($script:Html.ToString()) | Set-Content -Path $HtmlFile -Encoding Byte # UTF-8N
            [void]$script:Html.Clear()
            Write-Host "The result has been output to $HtmlFile." -Fore Green
            Start-Process -FilePath "file:///$HtmlFile"
        } catch [System.Exception] {
            Write-Host "Failed to output an HTML file." -Fore Red
            Write-Host $Error[0].Exception.Message
        }
    }
}

Function IsBinary($File) {
    if ($File.Length -lt 2)        { return $False }
    if ($File.Length -gt 20000000) { return $True  }
    $bytes = Get-Content $File.FullName -ReadCount 4096 -TotalCount 4096 -Encoding Byte
    [Int]$Nbo=0
    [Int]$Nbe=0
    [Int]$Nzo=0
    [Int]$Nze=0
    for ([Int]$i=0; $i -lt $bytes.Length; $i+=2) {
        $Nbo++
        if ($bytes[$i] -eq 0) { $Nzo++ }
    }
    for ([Int]$i=1; $i -lt $bytes.Length; $i+=2) {
        $Nbe++
        if ($bytes[$i] -eq 0) { $Nze++ }
    }
    if (($Nzo+$Nze -gt 0) -and ([System.Math]::Abs($Nzo/$Nbo-$Nze/$Nbe)/($Nbo+$Nbe) -lt 0.1)) { return $True }
    Write-Debug "IsBinary: $($file.Name) $($Nzo+$Nze), $([System.Math]::Abs($Nzo/$Nbo-$Nze/$Nbe)/($Nbo+$Nbe))"
    return $False
}

Function GetEncodingFromBOM($File) {
    $bytes = Get-Content $File.FullName -ReadCount 4 -TotalCount 4 -Encoding Byte
    $string = ($bytes | %{ "{0:X2}" -F $_ }) -Join ""
    switch -Regex ($string) {
       "^EFBBBF"              { $enc="UTF8"             ; break }
       "^FFFE0000"            { $enc="UTF32"            ; break }
       "^FFFE"                { $enc="Unicode"          ; break }
       "^0000FEFF"            { $enc="BigEndianUTF32"   ; break }
       "^FEFF"                { $enc="BigEndianUnicode" ; break }
       "^2B2F76(38|39|2B|2F)" { $enc="UTF7"             ; break }
       Default                { $enc="" }
    }
    Write-Debug "GetEncodingFromBOM: $($File.Name) $($string) $($enc)"
    return $enc
}

Function ReadDocxFile($DocxFile) {
    Add-Type -AssemblyName WindowsBase
    $file      = (Get-Item -LiteralPath $DocxFile).FullName
    $package   = [System.IO.Packaging.Package]::Open($file, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
    $parts     = $package.GetParts() | %{ $_ }
    $document  = $parts | ?{ $_.Uri.OriginalString -eq "/word/document.xml"  }
    $footnotes = $parts | ?{ $_.Uri.OriginalString -eq "/word/footnotes.xml" }
    $endnotes  = $parts | ?{ $_.Uri.OriginalString -eq "/word/endnotes.xml"  }
    $comments  = $parts | ?{ $_.Uri.OriginalString -eq "/word/comments.xml"  }
    $enc       = [System.Text.Encoding]::UTF8
    $sr        = New-Object System.IO.StreamReader $document.GetStream(),$enc
    $text      = New-Object System.Text.StringBuilder
    [void]$text.Append($sr.ReadToEnd())
    $sr.Close()
    if ($footnotes) {
        $sr = New-Object System.IO.StreamReader $footnotes.GetStream(),$enc
        [void]$text.Append($sr.ReadToEnd())
        $sr.Close()
    }
    if ($endnotes) {
        $sr = New-Object System.IO.StreamReader $endnotes.GetStream(),$enc
        [void]$text.Append($sr.ReadToEnd())
        $sr.Close()
    }
    if ($comments) {
        $sr = New-Object System.IO.StreamReader $comments.GetStream(),$enc
        [void]$text.Append("</w:r></w:p>" + $sr.ReadToEnd())
        $sr.Close()
    }
    $package.Close()
    $t = $text.ToString()
    [void]$text.Clear()
    $t = $t -Replace "\r?\n",""
    $t = $t -Replace "</w:r></w:p></w:tc><w:tc>"," "
    $t = $t -Replace "\s+"," "
    $t = $t -Replace "</w:r></w:p>","`n"
    $t = $t -Replace "<[^>]+>",""
    # Decode "&amp;" last so that text such as "&amp;lt;" is not decoded twice.
    $t = $t -Replace "&lt;","<"
    $t = $t -Replace "&gt;",">"
    $t = $t -Replace "&amp;","&"
    $t = $t -Replace "(?m)^ ?\r?\n",""
    return $t
}

Function ReadPptxFile($PptxFile) {
    Add-Type -AssemblyName WindowsBase
    $file    = (Get-Item -LiteralPath $PptxFile).FullName
    $package = [System.IO.Packaging.Package]::Open($file, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
    $parts   = $package.GetParts() | %{ $_ }
    $enc     = [System.Text.Encoding]::UTF8
    $text    = New-Object System.Text.StringBuilder
    for ([Int]$i = 1; $i -le 999; $i++) {
        $slide = $parts | ?{ $_.Uri.OriginalString -eq "/ppt/slides/slide$i.xml" }
        if (-not $slide) { break }
        $sr  = New-Object System.IO.StreamReader $slide.GetStream(),$enc
        $tmp = $sr.ReadToEnd()
        $sr.Close()
        $sliderels = $parts | ?{ $_.Uri.OriginalString -eq "/ppt/slides/_rels/slide$i.xml.rels" }
        if ($sliderels) {
            $sr  = New-Object System.IO.StreamReader $sliderels.GetStream(),$enc
            $rel = [xml]$sr.ReadToEnd() | %{ $_.Relationships.Relationship } | ?{ $_.Type -Match "notesSlide" }
            $sr.Close()
            if ($rel) {
                $nfile = $([io.path]::GetFileName($rel.Target))
                $note  = $parts | ?{ $_.Uri.OriginalString -eq "/ppt/notesSlides/$nfile" }
                $sr    = New-Object System.IO.StreamReader $note.GetStream(),$enc
                $tmp  += " " + $sr.ReadToEnd()
                $sr.Close()
            }
        }
        $tmp = $tmp -Replace "\r?\n",""
        $tmp = $tmp -Replace "\s+"," "
        [void]$text.Append($tmp + "`n")
    }
    $package.Close()
    $t = $text.ToString()
    [void]$text.Clear()
    $t = $t -Replace "<[^>]+>",""
    # Decode "&amp;" last so that text such as "&amp;lt;" is not decoded twice.
    $t = $t -Replace "&lt;","<"
    $t = $t -Replace "&gt;",">"
    $t = $t -Replace "&amp;","&"
    return $t
}

Function ReadXlsxFile($XlsxFile) {
    Add-Type -AssemblyName WindowsBase
    $enc     = [System.Text.Encoding]::UTF8
    $file    = (Get-Item -LiteralPath $XlsxFile).FullName
    $package = [System.IO.Packaging.Package]::Open($file, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
    $parts   = $package.GetParts() | %{ $_ }
    $text    = New-Object System.Text.StringBuilder
    $script:Cell = New-Object 'System.Collections.Generic.List[System.String]' 100000
    $sst = $parts | ?{ $_.Uri.OriginalString -eq "/xl/sharedStrings.xml" }
    if ($sst) {
        $strings = New-Object 'System.Collections.Generic.List[System.String]' 5000
        $sr = New-Object System.IO.StreamReader $sst.GetStream(),$enc
        $si = [xml]$sr.ReadToEnd() | %{ $_.sst.si }
        foreach ($s in $si) {
            $tmp = ""   # reset so an empty entry does not reuse the previous string
            if ($s.t -and $s.t.GetType().Name -eq "String") {
                $tmp = $s.t
            } elseif ($s.t."#text") {
                $tmp = $s.t."#text"
            } elseif ($s.r) {
                $tmp = ""
                foreach ($r in $s.r) {
                    if ($r.t -and $r.t.GetType().Name -eq "String") {
                        $tmp += $r.t
                    } elseif ($r.t."#text") { 
                        $tmp += $r.t."#text"
                    }
                }
            }
            $tmp = $tmp -Replace "\r?\n",""
            $tmp = $tmp -Replace "\s+"," "
            [void]$strings.Add($tmp)
        }
        $sr.Close()
    }
    for ([Int]$i = 1; $i -le 999; $i++) {
        $sheet = $parts | ?{ $_.Uri.OriginalString -eq "/xl/worksheets/sheet$i.xml" }
        if (-not $sheet) { break }
        $sr   = New-Object System.IO.StreamReader $sheet.GetStream(),$enc
        $rows = [xml]$sr.ReadToEnd() | %{ $_.worksheet.sheetdata.row }
        foreach ($row in $rows) {
            foreach ($col in $row.c) {
                switch ($col.t) {
                    "s"         { [void]$text.Append($strings[$col.v] + "`n") 
                                  [void]$script:Cell.Add("S$($i.ToString())$($col.r)") }
                    "inlineStr" { if ($col.is.t -and $col.is.t.GetType().Name -eq "String") {
                                      $tmp = $col.is.t
                                  } elseif ($col.is.t."#text") {
                                      $tmp = $col.is.t."#text"
                                  } elseif ($col.is.r) {
                                      $tmp = ""
                                      foreach ($r in $col.is.r) {
                                          if ($r.t -and $r.t.GetType().Name -eq "String") {
                                              $tmp += $r.t
                                          } elseif ($r.t."#text") { 
                                              $tmp += $r.t."#text"
                                          }
                                      }
                                  }
                                  $tmp = $tmp -Replace "\r?\n",""
                                  $tmp = $tmp -Replace "\s+"," "
                                  [void]$text.Append($tmp + "`n")
                                  [void]$script:Cell.Add("S$($i.ToString())$($col.r)") }
                }
            }
        }
        $sr.Close()
    }
    if ($strings) { $strings.Clear() }
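    # Count only the worksheet parts (Where-Object filters; ForEach-Object would emit one boolean per part).
    $NumSheets = @($parts | ?{ $_.Uri.OriginalString -Match '^/xl/worksheets/sheet[0-9]+\.xml$' }).Count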
    for ([Int]$i = 1; $i -le $NumSheets; $i++) {
        $sheetrels = $parts | ?{ $_.Uri.OriginalString -eq "/xl/worksheets/_rels/sheet$i.xml.rels" }
        if (-not $sheetrels) { continue }
        $sr  = New-Object System.IO.StreamReader $sheetrels.GetStream(),$enc
        $rel = [xml]$sr.ReadToEnd() | %{ $_.Relationships.Relationship } | ?{ $_.Type -Match "comments" }
        $sr.Close()
        if (-not $rel) { continue }
        $cfile = $([io.path]::GetFileName($rel.Target))
        $comment = $parts | ?{ $_.Uri.OriginalString -eq "/xl/$cfile" }
        $sr = New-Object System.IO.StreamReader $comment.GetStream(),$enc
        $cc = [xml]$sr.ReadToEnd() | %{ $_.comments.commentList.comment }
        foreach ($c in $cc) {
            $tmp = ""
            foreach ($r in $c.text.r) {
                if ($r.t -and $r.t.GetType().Name -eq "String") {
                    $tmp += $r.t
                } elseif ($r.t."#text") {
                    $tmp += $r.t."#text"
                }
            }
            $tmp = $tmp -Replace "\r?\n",""
            $tmp = $tmp -Replace "\s+"," "
            [void]$text.Append($tmp + "`n")
            [void]$script:Cell.Add("S$($i.ToString())$($c.ref)C")
        }
        $sr.Close()
    }
    $package.Close()
    $t = $text.ToString()
    [void]$text.Clear()
    return $t
}

Function ReadPdfFile($PdfFile) {
    if (-not (Test-Path $script:PSScriptRoot\itextsharp.dll)) { 
        Write-Host "Unable to search in $PdfFile because itextsharp.dll doesn't exist." -Fore Red
        return 
    }
    Add-Type -Path  $script:PSScriptRoot\itextsharp.dll
    $reader = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $PdfFile
    $text   = New-Object System.Text.StringBuilder
    for ([Int]$i = 1; $i -le $reader.NumberOfPages; $i++) {
        $strategy = New-Object iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
        [void]$text.Append([iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $i, $strategy))
        Remove-Variable strategy
    }
    $reader.Close()
    $t = $text.ToString()
    [void]$text.Clear()
    return $t
}

Function RGBColor([ConsoleColor]$Color) {
    switch -Exact ($Color) {
        "Black"       { return "#000000" }
        "DarkBlue"    { return "#000080" }
        "DarkGreen"   { return "#008000" }
        "DarkCyan"    { return "#008080" }
        "DarkRed"     { return "#800000" }
        "DarkMagenta" { return "#012456" }   # PowerShell console background color
        "DarkYellow"  { return "#EEEDF0" }   # PowerShell console foreground color
        "Gray"        { return "#C0C0C0" }
        "DarkGray"    { return "#808080" }
        "Blue"        { return "#0000FF" }
        "Green"       { return "#00FF00" }
        "Cyan"        { return "#00FFFF" }
        "Red"         { return "#FF0000" }
        "Magenta"     { return "#FF00FF" }
        "Yellow"      { return "#FFFF00" }
        "White"       { return "#FFFFFF" }
        Default       { return ""        }
    }
}

Function HtmlHeader() {
    $script:Html = New-Object System.Text.StringBuilder
    [void]$script:Html.Append("<!DOCTYPE html>`n")
    [void]$script:Html.Append("<html lang=`"ja`">`n")        # <- You may want to change this
    [void]$script:Html.Append("<head>`n")
    [void]$script:Html.Append("<meta charset=`"UTF-8`">`n")
    [void]$script:Html.Append("<title>RegExS Output</title>`n")
    [void]$script:Html.Append("<style type=`"text/css`">`n")
    [void]$script:Html.Append("body {color:#ffffff; background-color:$(RGBColor("DarkMagenta"))}`n")
    [void]$script:Html.Append("pre{white-space:pre-wrap}`n")
    [void]$script:Html.Append(".fc {color:$(RGBColor($ForegroundColor)); background-color:$(RGBColor($BackgroundColor))}`n")
    [void]$script:Html.Append(".cc {color:$(RGBColor($CapturegroupColor)); background-color:$(RGBColor($BackgroundColor))}`n")
    [void]$script:Html.Append(".g {color:lime}`n")
    [void]$script:Html.Append(".y {color:yellow}`n")
    [void]$script:Html.Append("</style>`n")
    [void]$script:Html.Append("</head>`n<body>`n<pre>`n")
}

Function HtmlFooter() {
    [void]$script:Html.Append("</pre>`n</body>`n</html>")
}

Function Htmlify($t) {
    $t = $t -Replace "&", "&amp;"
    $t = $t -Replace ">", "&gt;"
    $t = $t -Replace "<", "&lt;"
    $t = $t -Replace """", "&quot;"
    return $t
}

Function Pause_and_Exit() {
    Write-Host "Press any key to exit . . ." -Fore Green
    $Host.UI.RawUI.FlushInputBuffer()
    $Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyUp") | Out-Null
    exit
}

Function Main() {
    if ($DoubleClicked) { $Interactive = $True  }
    if ($Interactive) {
        if ($Pattern) {
            Write-Host "Enter a search text. [$Pattern]:" -NoNewline -Fore Green
            $Answer = ""
            try { $Answer = Read-Host } catch [System.Exception] {}
            if ($Answer) { $Pattern = $Answer }
        } else {
            Write-Host "Enter a search text.:" -NoNewline -Fore Green
            try { $Pattern = Read-Host } catch [System.Exception] {}
            if (-not $Pattern) { Pause_and_Exit }
        }

        Write-Host "Enter a capture-group name or number. [$Group]:" -NoNewline -Fore Green
        $Answer = ""
        try { $Answer = (Read-Host).Trim() } catch [System.Exception] {}
        if ($Answer) { $Group = $Answer }

        Write-Host "Select a target folder." -Fore Green
        Add-Type -AssemblyName System.Windows.Forms
        $fbd = New-Object System.Windows.Forms.FolderBrowserDialog
        $fbd.ShowNewFolderButton = $false
        $fbd.Description = "Select a target folder."
        if ($Dir -and (Test-Path -LiteralPath $Dir)) {
            $fbd.SelectedPath = Convert-Path -LiteralPath $Dir
        } else {
            $fbd.SelectedPath = [string]$PWD
        }
        $Result = $fbd.ShowDialog((New-Object System.Windows.Forms.Form -Property @{TopMost = $true}))
        if ($Result -eq [System.Windows.Forms.DialogResult]::OK) { 
            $Dir = $fbd.SelectedPath 
        } else {
            Write-Host "invalid input" -Fore Red; Pause_and_Exit
        }
        $fbd.Dispose()
        #Begin Get-WindowFocus
        Add-Type @"
            using System;
            using System.Runtime.InteropServices;
            public class SFW {
                [DllImport("user32.dll")]
                [return: MarshalAs(UnmanagedType.Bool)]
                public static extern bool SetForegroundWindow(IntPtr hWnd);
            }
"@
        $PPID=$PID
        For ([Int]$i=0; $i -lt 2; $i++) {
            $PPID=(Get-WmiObject Win32_Process -Filter "ProcessID=$PPID").ParentProcessID
            try {
                $WindowTitle  = (Get-Process -ID $PPID -ErrorAction SilentlyContinue).MainWindowTitle
                $WindowHandle = (Get-Process -ID $PPID -ErrorAction SilentlyContinue).MainWindowHandle
                if ($WindowTitle) { 
                    [void][SFW]::SetForegroundWindow($WindowHandle)
                    break
                }
            } catch [System.Exception] { break }
        }
        #End Get-WindowFocus

        Write-Host "Include sub-folders? Y/N [$(if($Recurse){"Y"}else{"N"})]:" -NoNewline -Fore Green
        $Answer = ""
        try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
        switch ($Answer) {
               "Y" { $Recurse = $True  }
               "N" { $Recurse = $False }
               ""  { }
          default  { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
        }

        Write-Host "Enter file names. (wildcard allowed) [$Include]:" -NoNewline -Fore Green
        $Answer = ""
        try { $Answer = (Read-Host).Trim() } catch [System.Exception] {}
        if ($Answer) { $Include = $Answer }

        Write-Host "Specify the other options? Y/N [N]:" -NoNewline -Fore Green
        $Answer = ""
        try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
        if ($Answer -eq "Y") {
            Write-Host "Output the search result to an HTML file? Y/N [$(if($HtmlOutput){"Y"}else{"N"})]:" -NoNewline -Fore Green
            $Answer = ""
            try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
            switch ($Answer) {
                   "Y" { $HtmlOutput = $True  }
                   "N" { $HtmlOutput = $False }
                   ""  { }
              default  { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
            }

            Write-Host "Disable regular expression search? Y/N [$(if($SimpleMatch){"Y"}else{"N"})]:" -NoNewline -Fore Green
            $Answer = ""
            try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
            switch ($Answer) {
                   "Y" { $SimpleMatch = $True  }
                   "N" { $SimpleMatch = $False }
                   ""  { }
              default  { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
            }

            Write-Host "Ignore upper/lower case? Y/N [$(if($IgnoreCase){"Y"}else{"N"})]:" -NoNewline -Fore Green
            $Answer = ""
            try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
            switch ($Answer) {
                   "Y" { $IgnoreCase = $True  }
                   "N" { $IgnoreCase = $False }
                   ""  { }
              default  { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
            }

            Write-Host "Convert wide characters into narrow ones before search? Y/N [$(if($Narrow){"Y"}else{"N"})]:" -NoNewline -Fore Green
            $Answer = ""
            try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
            switch ($Answer) {
                   "Y" { $Narrow = $True  }
                   "N" { $Narrow = $False }
                   ""  { }
              default  { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
            }

            Write-Host "Enter file names to be excluded. (wildcard allowed) [$Exclude]:" -NoNewline -Fore Green
            $Answer = ""
            try { $Answer = (Read-Host).Trim() } catch [System.Exception] {}
            if ($Answer) { $Exclude = $Answer }

            Write-Host "Enter a character encoding for the files without BOMs. [$Encoding]:" -NoNewline -Fore Green
            $Answer = ""
            try { $Answer = (Read-Host).Trim() } catch [System.Exception] {}
            if ($Answer) { 
                $Encoding = $Answer 
                $EncodingList = @("unknown", "string", "unicode", "byte", "bigendianunicode", `
                                  "utf8", "utf7", "utf32", "ascii", "default", "oem", "bigendianutf32")
                if (-not ($EncodingList -Contains $Encoding.ToLower())) {
                    Write-Host "invalid input" -Fore Red; Pause_and_Exit
                }
            }
        } elseif ($Answer -ne "N" -and $Answer -ne "") {
            Write-Host "invalid input" -Fore Red; Pause_and_Exit
        }
    }

    RegExS $Pattern $Dir $Include -e $Encoding -g $Group -h:$HtmlOutput -i:$IgnoreCase -n:$Narrow -r:$Recurse `
                                  -s:$SimpleMatch -x $Exclude `
                                  -bc $BackgroundColor -cc $CapturegroupColor -fc $ForegroundColor

    if ($Interactive) { Pause_and_Exit }
}

Main
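
The `Htmlify` function in the script escapes `&` before `<`, `>`, and `"`. The order matters: if `&` were escaped last, the `&` characters introduced by the earlier replacements would be escaped a second time. The following is a minimal sketch of that point only. The function names `ConvertTo-HtmlText` and `ConvertTo-HtmlTextWrongOrder` are hypothetical and are not part of the script.

```powershell
# Hypothetical helper names for illustration; not part of RegExS.

# Correct order: escape "&" first, so the "&" characters introduced by the
# later replacements ("&lt;", "&gt;", "&quot;") are not escaped again.
function ConvertTo-HtmlText([string]$Text) {
    $Text = $Text -replace '&', '&amp;'
    $Text = $Text -replace '<', '&lt;'
    $Text = $Text -replace '>', '&gt;'
    $Text = $Text -replace '"', '&quot;'
    return $Text
}

# Wrong order, for comparison: "&" is escaped after "<",
# so the "&" of "&lt;" is escaped a second time.
function ConvertTo-HtmlTextWrongOrder([string]$Text) {
    $Text = $Text -replace '<', '&lt;'
    $Text = $Text -replace '&', '&amp;'
    return $Text
}

$good = ConvertTo-HtmlText 'if (a < b && s -eq "x")'
$bad  = ConvertTo-HtmlTextWrongOrder 'a < b'

Write-Output $good
Write-Output $bad
```

When converting in the opposite direction (entities back to plain text), the same reasoning suggests handling `&amp;` last.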