#Purpose
I built a text search tool for Windows with PowerShell. It should also come in handy for testing and checking regular expressions.
#Environment
OS: Windows 7 or later
PowerShell: Version 3 or later
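#Features
It searches multiple files at once with a regular expression and highlights the matched text. Patterns that span multiple lines can be searched as well (for example, C-style comments, as in the usage example in the header comment of the RegExS function).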
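The multi-line matching can be tried on a small string before running the tool. The sketch below is illustrative only: the sample C source is made up, and it calls `[regex]::Matches` directly instead of going through the script's own Select-String pipeline. It reuses the C comment pattern from the RegExS help example and keeps only matches in which capture group 1 succeeded, which is roughly what `-g 1` highlights.

```powershell
# Hypothetical sample input; RegExS also strips "`r" before matching.
$source = @'
int x = 1; // line comment
/* block comment
   spanning two lines */
char *s = "not // a comment";
'@ -replace "`r", ""

# Pattern from the RegExS .EXAMPLE, with the (?m) that RegExS prepends, so "$"
# matches at the end of every line. String literals are consumed by the first
# two alternatives (and thus skipped); comments are captured in group 1.
$pattern = '(?m)"(?:[^"\\\n]|\\.)*"|''(?:[^''\\\n]|\\.)*''|(/\*[\S\s]*?\*/|//.*$)'

$comments = @([regex]::Matches($source, $pattern) |
    Where-Object { $_.Groups[1].Success } |
    ForEach-Object { $_.Groups[1].Value })

$comments.Count      # 2: the "//" comment and the two-line block comment
$comments[1]

# "[\S\s]" matches any character including "\n"; "(?s)" makes "." do the same.
([regex]::Match($source, '(?s)/\*.*?\*/')).Value -eq $comments[1]
```

Note that the `//` inside the string literal is not reported, because the string alternative consumes it first and group 1 does not participate in that match.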
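#Notes
- The script makes heavy use of the Write-Host cmdlet, so it is not meant for sending results down the pipeline. You can save the output by redirecting stream 6 (the Information stream, available in PowerShell 5.0 and later) to a file, but the layout is lost. As an alternative to redirection, there is an option to output the results as an HTML file.
- If you set the match background color to DarkMagenta (-bc DarkMagenta), the same color as the PowerShell console background, matches become hard to see.
- There is also a separate version that accepts pipeline input.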
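A rough illustration of why redirecting stream 6 loses the layout, assuming PowerShell 5.0 or later: RegExS prints each result line with several `Write-Host -NoNewline` calls, and each call becomes a separate InformationRecord when the stream is captured. The line content below is made up.

```powershell
# Requires PowerShell 5.0 or later, where Write-Host writes to stream 6.
# One result line, printed the way RegExS prints it:
# line-number prefix, highlighted match, rest of the line.
$records = & {
    Write-Host ("{0,5}:" -f 12) -NoNewline
    Write-Host "foo" -BackgroundColor Blue -ForegroundColor White -NoNewline
    Write-Host " = bar;"
} 6>&1

# Each Write-Host call yields its own InformationRecord; the colors survive
# only as properties of MessageData, not as part of the text.
$records.Count                                   # 3
$records | ForEach-Object { $_.MessageData.Message }
```

Writing these records to a file with `6>` likely puts each piece on its own line, which would explain why the saved output no longer looks like the console.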
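#Sample output
Example run in the console
#Code
This is a PowerShell script, but it has a special structure that also lets it run as a Windows batch file (so it can be launched by double-clicking). To run it as a PowerShell script, change the file extension to .ps1. The multi-line search version also has experimental support for .docx/.pptx/.xlsx and PDF files.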
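A .docx file is a ZIP package whose body text lives in `word/document.xml`. The script reads it through `System.IO.Packaging.Package`; the sketch below swaps in `System.IO.Compression.ZipArchive` (assumed available, i.e. .NET Framework 4.5 or later) to show the idea in isolation. `Get-DocxText` is a hypothetical helper, and the two-paragraph package is built in memory only for the demonstration. Unlike the script's ReadDocxFile, it ignores footnotes, endnotes, comments, table cells and entity decoding.

```powershell
Add-Type -AssemblyName System.IO.Compression

# Hypothetical helper: returns the paragraph text of word/document.xml.
function Get-DocxText([byte[]]$Bytes) {
    $stream = New-Object System.IO.MemoryStream (,$Bytes)
    $zip = New-Object System.IO.Compression.ZipArchive ($stream, [System.IO.Compression.ZipArchiveMode]::Read)
    try {
        $reader = New-Object System.IO.StreamReader ($zip.GetEntry("word/document.xml").Open())
        $xml = $reader.ReadToEnd()
        $reader.Close()
    } finally {
        $zip.Dispose()
    }
    $xml = $xml -replace "</w:p>", "`n"    # one output line per paragraph
    return ($xml -replace "<[^>]+>", "")     # drop all remaining tags
}

# Minimal in-memory package with two paragraphs (demonstration data only).
$ms = New-Object System.IO.MemoryStream
$zip = New-Object System.IO.Compression.ZipArchive ($ms, [System.IO.Compression.ZipArchiveMode]::Create, $true)
$writer = New-Object System.IO.StreamWriter ($zip.CreateEntry("word/document.xml").Open())
$writer.Write('<w:document><w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p><w:p><w:r><w:t>World</w:t></w:r></w:p></w:body></w:document>')
$writer.Close()
$zip.Dispose()

$text = Get-DocxText $ms.ToArray()
$text -split "`n" | Where-Object { $_ }
```

Once the XML is flattened to one paragraph per line like this, the same regular expression search used for plain text files can run on it.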
- RegExS.zip (contains the four files below)
- RegExS_multiline.bat (Japanese)
- RegExS_multiline_en.bat (English)
- RegExS.bat (Japanese)
- RegExS_en.bat (English)
If you download the zip file, unblock it in the file's Properties dialog before extracting it.
RegExS_multiline_en.bat
<# : Batch Commands (PowerShell Comments) Start
@echo off & setlocal
rem
rem Multi-line Regular Expression Text Search
rem
rem
rem RegExS.bat [<Regular Expression>] [<Target Folder>] [<Target Files>]
rem [-e <Encoding>] [-g <Capture Group>]
rem [-h] [-i] [-n] [-r] [-s] [-x <Exclude Files>]
rem [-bc <Console-Color>] [-cc <Console-Color>] [-fc <Console-Color>]
rem [-Q/-Interactive]
rem
rem
rem <Regular Expression> text pattern as a .NET regular expression
rem "^" and "$" match the beginning and the end of each line,
rem respectively. To make "." (period) match "\n" (newline),
rem prepend "(?s)" to the regular expression.
rem You'll be prompted for it if it is not specified on the command
rem line or when this script is started by a double-click.
rem <Target Folder> target folder in which search is performed
rem You'll be prompted to choose one in the popped-up selection
rem dialog box if not specified.
rem <Target Files> comma-delimited list of filenames to be searched
rem Wildcards may be used. The default value is "*".
rem (e.g. "*.cpp, *.h")
rem -e <Encoding> character encoding for files without a BOM (byte order mark)
rem The default value is "Default".
rem -g <Capture-Group> capture-group name or number
rem The default value is "0".
rem -h The search result is output to an HTML file.
rem -i Uppercase/lowercase are ignored.
rem -n Wide characters are converted into narrow ones prior to
rem search. Useful when you don't want to distinguish them.
rem -r Search is performed in subfolders recursively.
rem -s Search is performed by a simple text match.
rem -x <Exclude Files> comma-delimited list of filenames to be excluded from search
rem Wildcards may be used. (e.g. "*.img, *.dat")
rem The following binary format files are excluded by default.
rem "*.exe, *.dll, *.lnk, *.zip, *.bmp, *.gif, *.jpg, *.png"
rem -bc <Console-Color> background color for matches
rem -cc <Console-Color> foreground color for capture-group matches
rem -fc <Console-Color> foreground color for matches
rem -Q Forces interactive mode.
rem
rem Remarks: Files with .docx/.pptx/.xlsx/.xlsm extensions are supported experimentally.
rem In addition, PDF files can be searched if "itextsharp.dll" exists in the same
rem folder as this script (except for rasterized text and numbers/dates in Excel).
rem
rem Created by earthdiver1
rem Version 2.00
rem Licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
rem
rem ----------------------------------------------------------------------------
echo %CMDCMDLINE% | findstr /i /c:"%~nx0" >NUL && set DC=1
rem The following is a preamble that turns this PowerShell script into a polyglot
rem that also runs as a batch script. When debugging, change the file extension
rem to .ps1 and run it from a PowerShell console.
set BATCH_ARGS=%*
if defined BATCH_ARGS set BATCH_ARGS=%BATCH_ARGS:"=\"%
if defined BATCH_ARGS set BATCH_ARGS=%BATCH_ARGS:^^=^%
set P_CMD=$DoubleClicked=0%DC%;$PSScriptRoot='%~dp0'.TrimEnd('\');$input^|^
&([ScriptBlock]::Create((${%~f0}^|Out-String)))
endlocal & PowerShell -NoProfile -Command "%P_CMD%" %BATCH_ARGS%
exit/b
rem ----------------------------------------------------------------------------
: Batch Commands (PowerShell Comments) End #>
#Requires -Version 3
param (
[String]$Pattern,
[String]$Dir,
[String]$Include = "*",
[Alias("e")][String]$Encoding = "Default",
[Alias("x")][String]$Exclude = "",
[Alias("g")][String]$Group = "0",
[Alias("h")][Switch]$HtmlOutput = $False,
[Alias("i")][Switch]$IgnoreCase = $False,
[Alias("n")][Switch]$Narrow = $False,
[Alias("r")][Switch]$Recurse = $False,
[Alias("s")][Switch]$SimpleMatch = $False,
[Alias("bc")][ConsoleColor]$BackgroundColor = "Blue",
[Alias("cc")][ConsoleColor]$CapturegroupColor = "Red",
[Alias("fc")][ConsoleColor]$ForegroundColor = "White",
[Alias("Q")][Switch]$Interactive = $False
)
#$DebugPreference = "Continue"
Function RegExS {
<#
.SYNOPSIS
Regular-Expression text Search
.DESCRIPTION
This function searches for text patterns in multiple text files.
Matched characters are highlighted.
.PARAMETER Pattern
Specifies the text to find. Type a string or regular expression.
If you type a string, use the SimpleMatch parameter.
.PARAMETER Dir
Specifies the target folder in which search is performed.
.PARAMETER Include
Specifies the comma-delimited list of filenames to be searched.
Wildcards may be used; e.g. "*.cpp, *.h".
The default value is "*".
.PARAMETER Encoding
Specifies the character encoding for files without a BOM. (Alias: -e)
The default value is "Default".
.PARAMETER BackgroundColor
Specifies the background color for matches. (Alias: -bc)
The default value is "Blue".
.PARAMETER CapturegroupColor
Specifies the foreground color for capture-group matches. (Alias: -cc)
The default value is "Red".
.PARAMETER ForegroundColor
Specifies the foreground color for matches. (Alias: -fc)
The default value is "White".
.PARAMETER Group
Specifies the name or number of capture group. (Alias: -g)
The default value is "0".
.PARAMETER HtmlOutput
Outputs the search result to an HTML file. (Alias: -h)
.PARAMETER IgnoreCase
Makes matches case-insensitive. By default, matches are case-sensitive. (Alias: -i)
.PARAMETER Narrow
Converts wide characters into narrow ones. (Alias: -n)
Useful when you don't want to distinguish between narrow and wide characters.
.PARAMETER Recurse
Searches all files in all subfolders. (Alias: -r)
.PARAMETER SimpleMatch
Uses a simple match rather than a regular expression match. (Alias: -s)
.PARAMETER Exclude
Specifies the comma-delimited list of filenames to be excluded. (Alias: -x)
Wildcards may be used; e.g. "*.img, *.dat".
Note that the following binary format files are excluded by default:
*.exe, *.dll, *.lnk, *.zip, *.bmp, *.gif, *.jpg, *.png .
.NOTES
Author: earthdiver1
Version: V2.00
Licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Files with .docx/.pptx/.xlsx/.xlsm extensions are supported experimentally.
In addition, PDF files can be searched if "itextsharp.dll" exists in the same
folder as this script (except for rasterized text and numbers/dates in Excel).
.EXAMPLE
./RegExS.ps1 """(?:[^""\\\n]|\\.)*""|'(?:[^'\\\n]|\\.)*'|(/\*[\S\s]*?\*/|//.*$)" . "*.cpp,*.h" -g 1
Searches the C++ source files in the current folder for comments.
#>
param (
[String]$Pattern,
[String]$Dir,
[String]$Include = "*",
[ValidateSet("", "Unknown", "String", "Unicode", "Byte", "BigEndianUnicode", `
"UTF8", "UTF7", "UTF32", "Ascii", "Default", "Oem", "BigEndianUTF32")]
[Alias("e")][String]$Encoding = "Default",
[Alias("x")][String]$Exclude = "",
[Alias("g")][String]$Group = "0",
[Alias("h")][Switch]$HtmlOutput,
[Alias("i")][Switch]$IgnoreCase,
[Alias("n")][Switch]$Narrow,
[Alias("r")][Switch]$Recurse,
[Alias("s")][Switch]$SimpleMatch,
[Alias("bc")][ConsoleColor]$BackgroundColor = "Blue",
[Alias("cc")][ConsoleColor]$CapturegroupColor = "Red",
[Alias("fc")][ConsoleColor]$ForegroundColor = "White"
)
############ Edit Here ############
$BinaryFiles = "*.exe, *.dll, *.lnk, *.zip, *.bmp, *.gif, *.jpg, *.png"
$HtmlFile = "$($script:PSScriptRoot)\RegExS_Output.htm"
# $HtmlFile = "$($script:PSScriptRoot)\RegExS_Output_$((Get-Date).ToString('yyyyMMddHHmm')).htm"
###################################
if (-not $Pattern) { Write-Host "'Pattern' not specified." -Fore Red; Get-Help RegExS; return }
if (-not $Dir) { Write-Host "'Dir' not specified." -Fore Red; Get-Help RegExS; return }
$Dir = Convert-Path -LiteralPath $Dir
if (-not $Dir) { Write-Host "'Dir' does not exist." -Fore Red; Get-Help RegExS; return }
[Array]$IsContainer = Get-Item $Dir | %{ $_.PSIsContainer }
if ((-not $Recurse) -and $IsContainer.Length -eq 1 -and $IsContainer[0]) { $Dir += "\*" }
if ($SimpleMatch) {
$Pattern = [regex]::Escape($Pattern)
} else {
$Pattern = "(?m)" + $Pattern
}
if ($IgnoreCase) { $Pattern = "(?i)" + $Pattern }
if ($HtmlOutput) { HtmlHeader }
[String[]]$IncludeList = $Include -Split "," | %{ $_.Trim() }
[String[]]$ExcludeList = ($Exclude + "," + $BinaryFiles) -Split "," | %{ $_.Trim() } | ?{ $_ }
$global:ErrorView = "CategoryView"
if ($IncludeList.Length -eq 1) { # "-Filter" is faster than "-Include"
$files = Get-ChildItem $Dir -Filter $Include -Exclude $ExcludeList -Recurse:$Recurse -File
} else {
$files = Get-ChildItem $Dir -Include $IncludeList -Exclude $ExcludeList -Recurse:$Recurse -File
}
$global:ErrorView = "NormalView"
$ErrorActionPreference = "Stop"
trap [System.Exception] {
Write-Host "System error occurred." -Fore Red
Write-Host $Error[0].ToString() $Error[0].InvocationInfo.PositionMessage
if ($HtmlOutput) {
HtmlFooter
try {
[Text.Encoding]::UTF8.GetBytes($script:Html.ToString()) | Set-Content -Path $HtmlFile -Encoding Byte # UTF-8N
[void]$script:Html.Clear()
} catch [System.Exception] {}
}
return
}
[Int]$Nmfile = 0
[Int]$Nmatch = 0
[Int]$Nmline = 0
[Int]$Nmchar = 0
if ($files) {
:LOOP foreach ($file in $files) {
Write-Debug "RegExS: $($file.FullName)"
try {
switch ($file.Extension) {
".docx" { $io = ReadDocxFile $file.FullName ; break }
".pptx" { $io = ReadPptxFile $file.FullName ; break }
".xlsm" { $io = ReadXlsxFile $file.FullName ; break }
".xlsx" { $io = ReadXlsxFile $file.FullName ; break }
".pdf" { $io = ReadPdfFile $file.FullName ; break }
Default {
$enc = GetEncodingFromBOM $file
if (-not $enc) {
if (IsBinary $file) {
Write-Host "Skipping binary format file: $($file.FullName)." -Fore Green
continue LOOP
}
$enc = $Encoding
}
$io = (Get-Content $file.FullName -Encoding $enc -Raw) -Replace "\r",""
}
}
} catch [System.UnauthorizedAccessException], `
[System.Management.Automation.ItemNotFoundException], `
[System.IO.IOException] {
Write-Host "$($Error[0].Exception.Message) Continuing processing." -Fore Red
if ($HtmlOutput) {
[void]$script:Html.Append("<span class=`"y`">$(Htmlify $Error[0].Exception.Message) Continuing processing.</span>`n")
}
continue
}
if ($Narrow) {
Add-Type -AssemblyName "Microsoft.VisualBasic"
$io = [Microsoft.VisualBasic.Strings]::StrConv($io,[Microsoft.VisualBasic.VbStrConv]::Narrow)
$Pattern = [Microsoft.VisualBasic.Strings]::StrConv($Pattern,[Microsoft.VisualBasic.VbStrConv]::Narrow)
}
[Array]$Matches = Select-String -InputObject $io -Pattern $Pattern -CaseSensitive -AllMatches `
| %{ $_.Matches } | ?{ $_.Groups[$Group].Success }
if ($Matches) {
$Nmfile++
Write-Host $file.FullName -Fore Yellow
$bol = [Array](Select-String -InputObject $io -Pattern '(?m)^' -AllMatches | %{ $_.Matches } | %{ $_.Index }) `
+ ($io.Length + 1)
[Int]$nl = -1
if (-not $HtmlOutput) {
if ($Group -eq "0") {
for ([Int]$i=0; $i -lt $Matches.Length; $i++) {
$MatchIndex = $Matches[$i].Groups[0].Index
$MatchLength = $Matches[$i].Groups[0].Length
$MatchString = $Matches[$i].Groups[0].Value -Replace "`n","`n`t"
$NextMatch = $Matches[$i+1]
if ($bol[$nl+1] -le $MatchIndex) {
while ($bol[$nl+1] -le $MatchIndex) { $nl++ }
$index = $bol[$nl]
if ($index -ge $io.Length) { break }
$Nmline++
if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
Write-Host $("{0,5}:" -F $script:cell[$nl]) -NoNewline
} else {
Write-Host $("{0,5}:" -F ($nl+1)) -NoNewline
}
}
$Nmatch++
if ($MatchLength -gt 0) {
$Nmchar += $MatchLength
Write-Host $io.SubString($index, $MatchIndex - $index) -NoNewline
Write-Host $MatchString -Back $BackgroundColor -Fore $ForegroundColor -NoNewline
$index = $MatchIndex + $MatchLength
while ($bol[$nl+1] -le $index) {
$nl++
$Nmline++
}
}
if ($NextMatch -and $NextMatch.Index -lt $bol[$nl+1]) { continue }
$eol = $bol[$nl+1] - 1
if ($eol -eq $io.Length) { $eol-- }
if ($io[$eol] -eq "`n") { $eol-- }
if ($index -le $eol) {
Write-Host $io.SubString($index, $eol+1 - $index)
} else {
Write-Host
}
}
} else {
for ([Int]$i=0; $i -lt $Matches.Length; $i++) {
$Match0Index = $Matches[$i].Groups[0].Index
$Match0Length = $Matches[$i].Groups[0].Length
$MatchIndex = $Matches[$i].Groups[$Group].Index
$MatchLength = $Matches[$i].Groups[$Group].Length
$MatchString = $Matches[$i].Groups[$Group].Value -Replace "`n","`n`t"
$NextMatch = $Matches[$i+1]
if ($bol[$nl+1] -le $Match0Index) {
while ($bol[$nl+1] -le $Match0Index) { $nl++ }
$index = $bol[$nl]
if ($index -ge $io.Length) { break }
$Nmline++
if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
Write-Host $("{0,5}:" -F $script:cell[$nl]) -NoNewline
} else {
Write-Host $("{0,5}:" -F ($nl+1)) -NoNewline
}
}
$Nmatch++
if ($Match0Length -gt 0) {
$Nmchar += $MatchLength
Write-Host $io.SubString($index, $Match0Index - $index) -NoNewline
Write-Host $($io.SubString($Match0Index, $MatchIndex - $Match0Index) -Replace "`n","`n`t") `
-Back $BackgroundColor -Fore $ForegroundColor -NoNewline
Write-Host $MatchString -Back $BackgroundColor -Fore $CapturegroupColor -NoNewline
$index = $Match0Index + $Match0Length
$index0 = $MatchIndex + $MatchLength
Write-Host $($io.SubString($index0, $index - $index0) -Replace "`n","`n`t") `
-Back $BackgroundColor -Fore $ForegroundColor -NoNewline
while ($bol[$nl+1] -le $index) {
$nl++
$Nmline++
}
}
if ($NextMatch -and $NextMatch.Index -lt $bol[$nl+1]) { continue }
$eol = $bol[$nl+1] - 1
if ($eol -eq $io.Length) { $eol-- }
if ($io[$eol] -eq "`n") { $eol-- }
if ($index -le $eol) {
Write-Host $io.SubString($index, $eol+1 - $index)
} else {
Write-Host
}
}
}
} else {
[void]$script:Html.Append("<a href=`"file:///$($file.FullName)`" type=`"text`" class=`"y`">$($file.FullName)</a>`n")
if ($Group -eq "0") {
for ([Int]$i=0; $i -lt $Matches.Length; $i++) {
$MatchIndex = $Matches[$i].Groups[0].Index
$MatchLength = $Matches[$i].Groups[0].Length
$MatchString = $Matches[$i].Groups[0].Value -Replace "`n","`n`t"
$NextMatch = $Matches[$i+1]
if ($bol[$nl+1] -le $MatchIndex) {
while ($bol[$nl+1] -le $MatchIndex) { $nl++ }
$index = $bol[$nl]
if ($index -ge $io.Length) { break }
$Nmline++
if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
[void]$script:Html.Append(("{0,5}:" -F $script:cell[$nl]))
} else {
[void]$script:Html.Append(("{0,5}:" -F ($nl+1)))
}
}
$Nmatch++
if ($MatchLength -gt 0) {
$Nmchar += $MatchLength
[void]$script:Html.Append($(Htmlify $io.SubString($index, $MatchIndex - $index)))
[void]$script:Html.Append("<span class=`"fc`">$(Htmlify $MatchString)</span>")
$index = $MatchIndex + $MatchLength
while ($bol[$nl+1] -le $index) {
$nl++
$Nmline++
}
}
if ($NextMatch -and $NextMatch.Index -lt $bol[$nl+1]) { continue }
$eol = $bol[$nl+1] - 1
if ($eol -eq $io.Length) { $eol-- }
if ($io[$eol] -eq "`n") { $eol-- }
if ($index -le $eol) {
[void]$script:Html.Append("$(Htmlify $io.SubString($index, $eol+1 - $index))`n")
} else {
[void]$script:Html.Append("`n")
}
}
} else {
for ([Int]$i=0; $i -lt $Matches.Length; $i++) {
$Match0Index = $Matches[$i].Groups[0].Index
$Match0Length = $Matches[$i].Groups[0].Length
$MatchIndex = $Matches[$i].Groups[$Group].Index
$MatchLength = $Matches[$i].Groups[$Group].Length
$MatchString = $Matches[$i].Groups[$Group].Value -Replace "`n","`n`t"
$NextMatch = $Matches[$i+1]
if ($bol[$nl+1] -le $Match0Index) {
while ($bol[$nl+1] -le $Match0Index) { $nl++ }
$index = $bol[$nl]
if ($index -ge $io.Length) { break }
$Nmline++
if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
[void]$script:Html.Append(("{0,5}:" -F $script:cell[$nl]))
} else {
[void]$script:Html.Append(("{0,5}:" -F ($nl+1)))
}
}
$Nmatch++
if ($Match0Length -gt 0) {
$Nmchar += $MatchLength
[void]$script:Html.Append((Htmlify $io.SubString($index, $Match0Index - $index)))
[void]$script:Html.Append("<span class=`"fc`">$(Htmlify $io.SubString($Match0Index, $MatchIndex - $Match0Index))</span>")
[void]$script:Html.Append("<span class=`"cc`">$(Htmlify $MatchString)</span>")
$index = $Match0Index + $Match0Length
$index0 = $MatchIndex + $MatchLength
[void]$script:Html.Append("<span class=`"fc`">$(Htmlify $io.SubString($index0, $index - $index0))</span>")
while ($bol[$nl+1] -le $index) {
$nl++
$Nmline++
}
}
if ($NextMatch -and $NextMatch.Index -lt $bol[$nl+1]) { continue }
$eol = $bol[$nl+1] - 1
if ($eol -eq $io.Length) { $eol-- }
if ($io[$eol] -eq "`n") { $eol-- }
if ($index -le $eol) {
[void]$script:Html.Append("$(Htmlify $io.SubString($index, $eol+1 - $index))`n")
} else {
[void]$script:Html.Append("`n")
}
}
}
}
}
if ($file.Extension -eq ".xlsx" -or $file.Extension -eq ".xlsm") {
$script:cell.Clear()
}
}
}
Write-Host "$Nmfile file, $Nmline line, $Nmatch string, $Nmchar character matches found." -Fore Green
if ($HtmlOutput) {
[void]$script:Html.Append("<span class=`"g`">$Nmfile file, $Nmline line, $Nmatch string, $Nmchar character matches found.</span>`n")
HtmlFooter
try {
[Text.Encoding]::UTF8.GetBytes($script:Html.ToString()) | Set-Content -Path $HtmlFile -Encoding Byte # UTF-8N
[void]$script:Html.Clear()
Write-Host "The result has been output to $HtmlFile." -Fore Green
Start-Process -FilePath "file:///$HtmlFile"
} catch [System.Exception] {
Write-Host "Failed to output an HTML file." -Fore Red
Write-Host $Error[0].Exception.Message
}
}
}
Function IsBinary($File) {
if ($File.Length -lt 2) { return $False }
if ($File.Length -gt 20000000) { return $True }
$bytes = Get-Content $File.FullName -ReadCount 4096 -TotalCount 4096 -Encoding Byte
[Int]$Nbo=0
[Int]$Nbe=0
[Int]$Nzo=0
[Int]$Nze=0
for ([Int]$i=0; $i -lt $bytes.Length; $i+=2) {
$Nbo++
if ($bytes[$i] -eq 0) { $Nzo++ }
}
for ([Int]$i=1; $i -lt $bytes.Length; $i+=2) {
$Nbe++
if ($bytes[$i] -eq 0) { $Nze++ }
}
if (($Nzo+$Nze -gt 0) -and ([System.Math]::Abs($Nzo/$Nbo-$Nze/$Nbe)/($Nbo+$Nbe) -lt 0.1)) { return $True }
Write-Debug "IsBinary: $($file.Name) $($Nzo+$Nze), $([System.Math]::Abs($Nzo/$Nbo-$Nze/$Nbe)/($Nbo+$Nbe))"
return $False
}
Function GetEncodingFromBOM($File) {
$bytes = Get-Content $File.FullName -ReadCount 4 -TotalCount 4 -Encoding Byte
$string = ($bytes | %{ "{0:X2}" -F $_ }) -Join ""
switch -Regex ($string) {
"^EFBBBF" { $enc="UTF8" ; break }
"^FFFE0000" { $enc="UTF32" ; break }
"^FFFE" { $enc="Unicode" ; break }
"^0000FEFF" { $enc="BigEndianUTF32" ; break }
"^FEFF" { $enc="BigEndianUnicode" ; break }
"^2B2F76(38|39|2B|2F)" { $enc="UTF7" ; break }
Default { $enc="" }
}
Write-Debug "GetEncodingFromBOM: $($File.Name) $($string) $($enc)"
return $enc
}
Function ReadDocxFile($DocxFile) {
Add-Type -AssemblyName WindowsBase
$file = (Get-Childitem -Path $DocxFile).FullName
$package = [System.IO.Packaging.Package]::Open($file, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
$parts = $package.GetParts() | %{ $_ }
$document = $parts | ?{ $_.Uri.OriginalString -eq "/word/document.xml" }
$footnotes = $parts | ?{ $_.Uri.OriginalString -eq "/word/footnotes.xml" }
$endnotes = $parts | ?{ $_.Uri.OriginalString -eq "/word/endnotes.xml" }
$comments = $parts | ?{ $_.Uri.OriginalString -eq "/word/comments.xml" }
$enc = [System.Text.Encoding]::UTF8
$sr = New-Object System.IO.StreamReader $document.GetStream(),$enc
$text = New-Object System.Text.StringBuilder
[void]$text.Append($sr.ReadToEnd())
$sr.Close()
if ($footnotes) {
$sr = New-Object System.IO.StreamReader $footnotes.GetStream(),$enc
[void]$text.Append($sr.ReadToEnd())
$sr.Close()
}
if ($endnotes) {
$sr = New-Object System.IO.StreamReader $endnotes.GetStream(),$enc
[void]$text.Append($sr.ReadToEnd())
$sr.Close()
}
if ($comments) {
$sr = New-Object System.IO.StreamReader $comments.GetStream(),$enc
[void]$text.Append("</w:r></w:p>" + $sr.ReadToEnd())
$sr.Close()
}
$package.Close()
$t = $text.ToString()
[void]$text.Clear()
$t = $t -Replace "\r?\n",""
$t = $t -Replace "</w:r></w:p></w:tc><w:tc>"," "
$t = $t -Replace "\s+"," "
$t = $t -Replace "</w:r></w:p>","`n"
$t = $t -Replace "<[^>]+>",""
$t = $t -Replace "&lt;","<"
$t = $t -Replace "&gt;",">"
$t = $t -Replace "&amp;","&"   # decode &amp; last so that "&amp;lt;" becomes "&lt;", not "<"
$t = $t -Replace "(?m)^ ?\r?\n",""
return $t
}
Function ReadPptxFile($PptxFile) {
Add-Type -AssemblyName WindowsBase
$file = (Get-Childitem -Path $PptxFile).FullName
$package = [System.IO.Packaging.Package]::Open($file, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
$parts = $package.GetParts() | %{ $_ }
$enc = [System.Text.Encoding]::UTF8
$text = New-Object System.Text.StringBuilder
for ([Int]$i = 1; $i -le 999; $i++) {
$slide = $parts | ?{ $_.Uri.OriginalString -eq "/ppt/slides/slide$i.xml" }
if (-not $slide) { break }
$sr = New-Object System.IO.StreamReader $slide.GetStream(),$enc
$tmp = $sr.ReadToEnd()
$sr.Close()
$sliderels = $parts | ?{ $_.Uri.OriginalString -eq "/ppt/slides/_rels/slide$i.xml.rels" }
if ($sliderels) {
$sr = New-Object System.IO.StreamReader $sliderels.GetStream(),$enc
$rel = [xml]$sr.ReadToEnd() | %{ $_.Relationships.Relationship } | ?{ $_.Type -Match "notesSlide" }
$sr.Close()
if ($rel) {
$nfile = $([io.path]::GetFileName($rel.Target))
$note = $parts | ?{ $_.Uri.OriginalString -eq "/ppt/notesSlides/$nfile" }
$sr = New-Object System.IO.StreamReader $note.GetStream(),$enc
$tmp += " " + $sr.ReadToEnd()
$sr.Close()
}
}
$tmp = $tmp -Replace "\r?\n",""
$tmp = $tmp -Replace "\s+"," "
[void]$text.Append($tmp + "`n")
}
$package.Close()
$t = $text.ToString()
[void]$text.Clear()
$t = $t -Replace "<[^>]+>",""
$t = $t -Replace "&lt;","<"
$t = $t -Replace "&gt;",">"
$t = $t -Replace "&amp;","&"   # decode &amp; last so that "&amp;lt;" becomes "&lt;", not "<"
return $t
}
Function ReadXlsxFile($XlsxFile) {
Add-Type -AssemblyName WindowsBase
$enc = [System.Text.Encoding]::UTF8
$file = (Get-Childitem -Path $XlsxFile).FullName
$package = [System.IO.Packaging.Package]::Open($file, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
$parts = $package.GetParts() | %{ $_ }
$text = New-Object System.Text.StringBuilder
$script:Cell = New-Object 'System.Collections.Generic.List[System.String]' 100000
$sst = $parts | ?{ $_.Uri.OriginalString -eq "/xl/sharedStrings.xml" }
if ($sst) {
$strings = New-Object 'System.Collections.Generic.List[System.String]' 5000
$sr = New-Object System.IO.StreamReader $sst.GetStream(),$enc
$si = [xml]$sr.ReadToEnd() | %{ $_.sst.si }
foreach ($s in $si) {
if ($s.t -and $s.t.GetType().Name -eq "String") {
$tmp = $s.t
} elseif ($s.t."#text") {
$tmp = $s.t."#text"
} elseif ($s.r) {
$tmp = ""
foreach ($r in $s.r) {
if ($r.t -and $r.t.GetType().Name -eq "String") {
$tmp += $r.t
} elseif ($r.t."#text") {
$tmp += $r.t."#text"
}
}
}
$tmp = $tmp -Replace "\r?\n",""
$tmp = $tmp -Replace "\s+"," "
[void]$strings.Add($tmp)
}
$sr.Close()
}
for ([Int]$i = 1; $i -le 999; $i++) {
$sheet = $parts | ?{ $_.Uri.OriginalString -eq "/xl/worksheets/sheet$i.xml" }
if (-not $sheet) { break }
$sr = New-Object System.IO.StreamReader $sheet.GetStream(),$enc
$rows = [xml]$sr.ReadToEnd() | %{ $_.worksheet.sheetdata.row }
foreach ($row in $rows) {
foreach ($col in $row.c) {
switch ($col.t) {
"s" { [void]$text.Append($strings[$col.v] + "`n")
[void]$script:Cell.Add("S$($i.ToString())$($col.r)") }
"inlineStr" { if ($col.is.t -and $col.is.t.GetType().Name -eq "String") {
$tmp = $col.is.t
} elseif ($col.is.t."#text") {
$tmp = $col.is.t."#text"
} elseif ($col.is.r) {
$tmp = ""
foreach ($r in $col.is.r) {
if ($r.t -and $r.t.GetType().Name -eq "String") {
$tmp += $r.t
} elseif ($r.t."#text") {
$tmp += $r.t."#text"
}
}
}
$tmp = $tmp -Replace "\r?\n",""
$tmp = $tmp -Replace "\s+"," "
[void]$text.Append($tmp + "`n")
[void]$script:Cell.Add("S$($i.ToString())$($col.r)") }
}
}
}
$sr.Close()
}
if ($strings) { $strings.Clear() }
$NumSheets = @($parts | ?{ $_.Uri.OriginalString -Match '^/xl/worksheets/sheet[0-9]+\.xml$' }).Count
for ([Int]$i = 1; $i -le $NumSheets; $i++) {
$sheetrels = $parts | ?{ $_.Uri.OriginalString -eq "/xl/worksheets/_rels/sheet$i.xml.rels" }
if (-not $sheetrels) { continue }
$sr = New-Object System.IO.StreamReader $sheetrels.GetStream(),$enc
$rel = [xml]$sr.ReadToEnd() | %{ $_.Relationships.Relationship } | ?{ $_.Type -Match "comments" }
$sr.Close()
if (-not $rel) { continue }
$cfile = $([io.path]::GetFileName($rel.Target))
$comment = $parts | ?{ $_.Uri.OriginalString -eq "/xl/$cfile" }
$sr = New-Object System.IO.StreamReader $comment.GetStream(),$enc
$cc = [xml]$sr.ReadToEnd() | %{ $_.comments.commentList.comment }
foreach ($c in $cc) {
$tmp = ""
foreach ($r in $c.text.r) {
if ($r.t -and $r.t.GetType().Name -eq "String") {
$tmp += $r.t
} elseif ($r.t."#text") {
$tmp += $r.t."#text"
}
}
$tmp = $tmp -Replace "\r?\n",""
$tmp = $tmp -Replace "\s+"," "
[void]$text.Append($tmp + "`n")
[void]$script:Cell.Add("S$($i.ToString())$($c.ref)C")
}
$sr.Close()
}
$package.Close()
$t = $text.ToString()
[void]$text.Clear()
return $t
}
Function ReadPdfFile($PdfFile) {
if (-not (Test-Path $script:PSScriptRoot\itextsharp.dll)) {
Write-Host "Unable to search in $PdfFile because itextsharp.dll doesn't exist." -Fore Red
return
}
Add-Type -Path $script:PSScriptRoot\itextsharp.dll
$reader = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $PdfFile
$text = New-Object System.Text.StringBuilder
for ([Int]$i = 1; $i -le $reader.NumberOfPages; $i++) {
$strategy = New-Object iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
[void]$text.Append([iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $i, $strategy))
remove-variable strategy
}
$reader.Close()
$t = $text.ToString()
[void]$text.Clear()
return $t
}
Function RGBColor([ConsoleColor]$Color) {
switch -Exact ($Color) {
"Black" { return "#000000" }
"DarkBlue" { return "#000080" }
"DarkGreen" { return "#008000" }
"DarkCyan" { return "#008080" }
"DarkRed" { return "#800000" }
"DarkMagenta" { return "#012456" }
"DarkYellow" { return "#EEEDF0" }
"Gray" { return "#C0C0C0" }
"DarkGray" { return "#808080" }
"Blue" { return "#0000FF" }
"Green" { return "#00FF00" }
"Cyan" { return "#00FFFF" }
"Red" { return "#FF0000" }
"Magenta" { return "#FF00FF" }
"Yellow" { return "#FFFF00" }
"White" { return "#FFFFFF" }
Default { return "" }
}
}
Function HtmlHeader() {
$script:Html = New-Object System.Text.StringBuilder
[void]$script:Html.Append("<!DOCTYPE html>`n")
[void]$script:Html.Append("<html lang=`"ja`">`n") # <- You may want to change this
[void]$script:Html.Append("<head>`n")
[void]$script:Html.Append("<meta charset=`"UTF-8`">`n")
[void]$script:Html.Append("<title>RegExS Output</title>`n")
[void]$script:Html.Append("<style type=`"text/css`">`n")
[void]$script:Html.Append("body {color:#ffffff; background-color:$(RGBColor("DarkMagenta"))}`n")
[void]$script:Html.Append("pre{white-space:pre-wrap}`n")
[void]$script:Html.Append(".fc {color:$(RGBColor($ForegroundColor)); background-color:$(RGBColor($BackgroundColor))}`n")
[void]$script:Html.Append(".cc {color:$(RGBColor($CapturegroupColor)); background-color:$(RGBColor($BackgroundColor))}`n")
[void]$script:Html.Append(".g {color:lime}`n")
[void]$script:Html.Append(".y {color:yellow}`n")
[void]$script:Html.Append("</style>`n")
[void]$script:Html.Append("</head>`n<body>`n<pre>`n")
}
Function HtmlFooter() {
[void]$script:Html.Append("</pre>`n</body>`n</html>")
}
Function Htmlify($t) {
$t = $t -Replace "&", "&amp;"   # must come first so the entities added below are not re-escaped
$t = $t -Replace ">", "&gt;"
$t = $t -Replace "<", "&lt;"
$t = $t -Replace """", "&quot;"
return $t
}
Function Pause_and_Exit() {
Write-Host "Press any key to exit . . ." -Fore Green
$Host.UI.RawUI.FlushInputBuffer()
$Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyUp") | Out-Null
exit
}
Function Main() {
if ($DoubleClicked) { $Interactive = $True }
if ($Interactive) {
if ($Pattern) {
Write-Host "Enter a search text. [$Pattern]:" -NoNewline -Fore Green
$Answer = ""
try { $Answer = Read-Host } catch [System.Exception] {}
if ($Answer) { $Pattern = $Answer }
} else {
Write-Host "Enter a search text.:" -NoNewline -Fore Green
try { $Pattern = Read-Host } catch [System.Exception] {}
if (-not $Pattern) { Pause_and_Exit }
}
Write-Host "Enter a capture-group name or number. [$Group]:" -NoNewline -Fore Green
$Answer = ""
try { $Answer = (Read-Host).Trim() } catch [System.Exception] {}
if ($Answer) { $Group = $Answer }
Write-Host "Select a target folder." -Fore Green
Add-Type -AssemblyName System.Windows.Forms
$fbd = New-Object System.Windows.Forms.FolderBrowserDialog
$fbd.ShowNewFolderButton = $false
$fbd.Description = "Select a target folder."
if ($Dir -and (Test-Path -LiteralPath $Dir)) {
$fbd.SelectedPath = Convert-Path -LiteralPath $Dir
} else {
$fbd.SelectedPath = [string]$PWD
}
$Result = $fbd.ShowDialog((New-Object System.Windows.Forms.Form -Property @{TopMost = $true}))
if ($Result -eq [System.Windows.Forms.DialogResult]::OK) {
$Dir = $fbd.SelectedPath
} else {
Write-Host "invalid input" -Fore Red; Pause_and_Exit
}
$fbd.Dispose()
#Begin Get-WindowFocus
Add-Type @"
using System;
using System.Runtime.InteropServices;
public class SFW {
[DllImport("user32.dll")]
[return: MarshalAs(UnmanagedType.Bool)]
public static extern bool SetForegroundWindow(IntPtr hWnd);
}
"@
$PPID=$PID
For ([Int]$i=0; $i -lt 2; $i++) {
$PPID=(Get-WmiObject Win32_Process -Filter "ProcessID=$PPID").ParentProcessID
try {
$WindowTitle = (Get-Process -ID $PPID -ErrorAction SilentlyContinue).MainWindowTitle
$WindowHandle = (Get-Process -ID $PPID -ErrorAction SilentlyContinue).MainWindowHandle
if ($WindowTitle) {
[void][SFW]::SetForegroundWindow($WindowHandle)
break
}
} catch [System.Exception] { break }
}
#End Get-WindowFocus
Write-Host "Include sub-folders? Y/N [$(if($Recurse){"Y"}else{"N"})]:" -NoNewline -Fore Green
try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
switch ($Answer) {
"Y" { $Recurse = $True }
"N" { $Recurse = $False }
"" { }
default { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
}
Write-Host "Enter file names. (wildcard allowed) [$Include]:" -NoNewline -Fore Green
$Answer = ""
try { $Answer = (Read-Host).Trim() } catch [System.Exception] {}
if ($Answer) { $Include = $Answer }
Write-Host "Specify the other options? Y/N [N]:" -NoNewline -Fore Green
$Answer = ""
try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
if ($Answer -eq "Y") {
Write-Host "Output the search result to an HTML file? Y/N [$(if($HtmlOutput){"Y"}else{"N"})]:" -NoNewline -Fore Green
try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
switch ($Answer) {
"Y" { $HtmlOutput = $True }
"N" { $HtmlOutput = $False }
"" { }
default { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
}
Write-Host "Disable regular expression search? Y/N [$(if($SimpleMatch){"Y"}else{"N"})]:" -NoNewline -Fore Green
try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
switch ($Answer) {
"Y" { $SimpleMatch = $True }
"N" { $SimpleMatch = $False }
"" { }
default { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
}
Write-Host "Ignore upper/lower cases? Y/N [$(if($IgnoreCase){"Y"}else{"N"})]:" -NoNewline -Fore Green
try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
switch ($Answer) {
"Y" { $IgnoreCase = $True }
"N" { $IgnoreCase = $False }
"" { }
default { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
}
Write-Host "Convert wide characters into narrow ones before search? Y/N [$(if($Narrow){"Y"}else{"N"})]:" -NoNewline -Fore Green
try { $Answer = (Read-Host).Trim().ToUpper() } catch [System.Exception] {}
switch ($Answer) {
"Y" { $Narrow = $True }
"N" { $Narrow = $False }
"" { }
default { Write-Host "invalid input" -Fore Red; Pause_and_Exit }
}
Write-Host "Enter file names to be excluded. (wildcard allowed) [$Exclude]:" -NoNewline -Fore Green
$Answer = ""
try { $Answer = (Read-Host).Trim() } catch [System.Exception] {}
if ($Answer) { $Exclude = $Answer }
Write-Host "Enter a character encoding for the files without BOMs. [$Encoding]:" -NoNewline -Fore Green
$Answer = ""
try { $Answer = (Read-Host).Trim() } catch [System.Exception] {}
if ($Answer) {
$Encoding = $Answer
$EncodingList = @("unknown", "string", "unicode", "byte", "bigendianunicode", `
"utf8", "utf7", "utf32", "ascii", "default", "oem", "bigendianutf32")
if (-not ($EncodingList -Contains $Encoding.ToLower())) {
Write-Host "invalid input" -Fore Red; Pause_and_Exit
}
}
} elseif ($Answer -ne "N" -and $Answer -ne "") {
Write-Host "invalid input" -Fore Red; Pause_and_Exit
}
}
RegExS $Pattern $Dir $Include -e $Encoding -g $Group -h:$HtmlOutput -i:$IgnoreCase -n:$Narrow -r:$Recurse `
-s:$SimpleMatch -x $Exclude `
-bc $BackgroundColor -cc $CapturegroupColor -fc $ForegroundColor
if ($Interactive) { Pause_and_Exit }
}
Main