1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

VBAとPowerQueryでUTF-8で文字の読み書きを行う

Posted at
  • VBAでUTF-8を扱う方法といえばADODB.Streamが定番
  • PowerQueryでもUTF-8が扱えるのでそれを利用してVBAで読み書きを試みる
  • ある程度動作確認はしているがちゃんと整理はしていないメモ書き
  • バイト列の書き込みにはOpenとかPutを使うがここでは細かく解説しない

手順

  • ファイルの読み込みはPowerQueryを使って行うのが楽なので、PowerQueryで行う
    • M言語部分はプレースホルダを用意して、そこにファイルパスなどを渡して処理する
  • テキストの書き込みではテキストをテーブルに分割して入力して、PowerQueryで連結して指定の文字コードのバイト列にする
    • 1行や1列で表現するには大きいのでテーブルに折り返しで1セル1バイトとして展開してVBAから読み取ってバイト列として書き込みを行う

使える文字コード

使える番号は以下の通りでEnumを作って管理したほうがやりやすいはず

Public Enum TextEncoding
    teUtf8 = 65001
    teUtf16 = 1200
    teUtf16Le = 1200
    teUtf16Be = 1201
    teShiftJis = 932
End Enum

テキストファイルを読み取るためのM言語コード

{filepath}{encoding}にファイルパスとエンコード用の番号を指定する
ここの処理ではテキストファイルを指定のエンコードで読み込み、読み込んだ内容をtxtChunkSizeで指定する大きさに分割して折り返しつつテーブルに展開する
これによってVBAからもテキストを読み事ができるようにする

 let
     txt = Text.FromBinary(File.Contents({filepath}), {encoding}),
     getOffset = (length as number, chunkSize as number) => 
         List.Numbers(0, Number.RoundUp(length / chunkSize, 0), chunkSize),
     blocks = let
             txtLength = Text.Length(txt),
             txtChunkSize = 500,
             txtOffset = getOffset(txtLength, txtChunkSize),
             txtCount = List.Numbers(
                 txtChunkSize, List.Count(txtOffset) - 1, 0
             ) & {Number.Mod(txtLength, txtChunkSize)},
             txtZip = List.Zip({txtOffset, txtCount})
         in
             List.Transform(txtZip, each Text.Range(txt,{0},{1})),
     blockChunkSize = 4,
     blockOffset = getOffset(List.Count(blocks), blockChunkSize),
     fold = List.Transform(blockOffset, each let
         block = List.Range(blocks,, blockChunkSize),
         pad = List.Transform(
             List.Numbers(0, blockChunkSize - List.Count(block)),
             each null
         )
         in Record.FromList(
             block & pad, 
             List.Transform(
                 List.Numbers(1, blockChunkSize),
                 (num as number) => Column & Text.From(num)
             )
         )
     )
in
     Table.FromRecords(fold)

テキストをバイト列に変換するM言語

{table_name}{encoding}にテキストを展開したテーブル名とエンコードを入れる

let
     tableName = "{table_name}",
     raw = Excel.CurrentWorkbook(){{[Name=tableName]}}[Content],
     txt = Text.Combine(
         Table.TransformRows(raw, each Text.Combine(Record.FieldValues(_)))
     ),
     bin = Text.ToBinary(txt, {encoding}, false),
     TABLE_WIDTH = 50,
     arr = Binary.ToList(bin),
     rowsOffset = List.Numbers(0, Number.RoundUp(List.Count(arr) / TABLE_WIDTH, 0), TABLE_WIDTH),
     fold = List.Transform(rowsOffset, each let
         bytes = List.Range(arr,, TABLE_WIDTH),
         pad = List.Transform(List.Numbers(0, TABLE_WIDTH - List.Count(bytes)), each null)
         in Record.FromList(
             bytes & pad,
             List.Transform(List.Numbers(1, TABLE_WIDTH), (num as number) => Column & Text.From(num))
         )
     ),
     return = Table.FromRecords(fold)
 in
     return

テキストをシートに書き込むための2次元配列にする

引数で1セルあたりの文字数や展開するテーブルで何列使って表現するかを決める
分割した文字が数字だけになると変な変換をされることがあるので「'」を先頭に付与して文字列として扱わせる

Private Function SplitTextChunk(ByVal Text As String, Optional ByVal ChunkSize As Long = 500, Optional ByVal TableWidth As Long = 8) As Variant()
    Dim TextRemainder As Long
    TextRemainder = Len(Text) Mod ChunkSize
    Dim ChunkCount As Long
    ChunkCount = (Len(Text) \ ChunkSize) + IIf(TextRemainder, 1, 0)
    Dim ChunkRemainder As Long
    ChunkRemainder = ChunkCount Mod TableWidth
    Dim TextBlock() As Variant
    ReDim TextBlock(1 To (ChunkCount \ TableWidth) + IIf(ChunkRemainder, 1, 0), 1 To TableWidth) As Variant
    Dim i As Long
    For i = LBound(TextBlock, 1) To UBound(TextBlock, 1)
        Dim j As Long
        For j = LBound(TextBlock, 2) To UBound(TextBlock, 2)
            Dim Index As Long
            Index = ((i - 1) * TableWidth + j - 1) * ChunkSize + 1
            Dim Chunk As String
            Chunk = Mid(Text, Index, ChunkSize)
            If Chunk <> "" Then
                TextBlock(i, j) = "'" & Chunk
            End If
        Next
    Next
    SplitTextChunk = TextBlock
End Function

クエリを使い捨てするための関数

文字列の読み込みはこれとクエリ用のM言語のやつがあればStringの配列を返すので、それでJoinすればいい

Private Function ExecuteQuery(ByVal QueryText As String) As Variant()
    On Error Resume Next
    ThisWorkbook.Queries("temp_query").Delete
    On Error GoTo 0
    Dim Query As Excel.WorkbookQuery
    Set Query = ThisWorkbook.Queries.Add("temp_query", QueryText)
    Dim TempSheet As Excel.Worksheet
    Set TempSheet = ThisWorkbook.Worksheets.Add
    Dim QueryText As String
    QueryText = "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""" & Query.Name & """;Extended Properties="""""
    On Error GoTo Finally
    With TempSheet.ListObjects.Add(xlSrcQuery, QueryText, , xlYes, TempSheet.Range("A1"))
        With .QueryTable
            .CommandType = xlCmdSql
            .CommandText = "select * from [" & Query.Name & "]"
            .Refresh False
        End With
        ExecuteQuery = FlattenFromMatrix(.DataBodyRange.Value)
        .Delete
    End With
Finally:
    Query.Delete True
    If Application.DisplayAlerts Then
        Application.DisplayAlerts = False
        TempSheet.Delete
        Application.DisplayAlerts = True
    Else
        TempSheet.Delete
    End If
    If Err Then
        Debug.Print Err.Description
    End If
End Function

2次元配列を1次元配列にする関数

読み書きの際に出てくるテキストを分割したテーブルやバイト列のテーブルは折り返す都合で末尾に空値のセルが出るので、それを抜いた値の入ったセルだけを使って平たくした配列を返す

Private Function FlattenFromMatrix(ByVal Matrix As Variant) As Variant()
    Dim y As Long
    y = UBound(Matrix, 1)
    Dim x As Long
    For x = LBound(Matrix, 2) To UBound(Matrix, 2)
        If IsEmpty(Matrix(y, x)) Then
            Dim TailNullCount As Long
            TailNullCount = TailNullCount + 1
        End If
    Next
    Dim Items() As Variant
    ReDim Items(1 To UBound(Matrix, 1) * UBound(Matrix, 2) - TailNullCount) As Variant
    For y = LBound(Matrix, 1) To UBound(Matrix, 1)
        For x = LBound(Matrix, 2) To UBound(Matrix, 2)
            Dim Index As Long
            Index = (y - 1) * UBound(Matrix, 2) + x
            If Index <= UBound(Items) Then
                Items(Index) = Matrix(y, x)
            End If
        Next
    Next
    FlattenFromMatrix = Items
End Function

テキストをPowerQueryを使って任意の文字コードのバイト列に変換する関数

CONVERT_BINARY_QUERYにM言語の文字列を定義しておく
文字列を文字列を分割して2次元配列にする関数を使ってテーブルに展開して、それをPowerQueryでバイト列に変換させている
あとはこのバイト列をファイルに書き込めばExcelの機能だけで文字コード変換が行える

Private Function TextToBytes(ByVal Text As String, ByVal Encoding As TextEncoding) As Byte()
    Dim TempSheet As Excel.Worksheet
    Set TempSheet = ThisWorkbook.Worksheets.Add
    Dim StartCell As Excel.Range
    Set StartCell = TempSheet.Range("A1")
    Dim Chunk() As Variant
    Chunk = SplitTextChunk(Text)
    Dim Header() As String
    ReDim Header(LBound(Chunk, 2) To UBound(Chunk, 2)) As String
    Dim i As Long
    For i = LBound(Chunk, 2) To UBound(Chunk, 2)
        Header(i) = "Column" & i
    Next
    StartCell.Resize(1, UBound(Chunk, 2)).Value = Header
    StartCell.Offset(1).Resize(UBound(Chunk, 1), UBound(Chunk, 2)).Value = Chunk
    Dim TempTable As Excel.ListObject
    Set TempTable = TempSheet.ListObjects.Add(xlSrcRange, StartCell.CurrentRegion, , xlYes)
    Dim QueryText As String
    QueryText = Replace(Replace(CONVERT_BINARY_QUERY, "{table_name}", TempTable.Name),"{encoding}",Encoding)
    Dim Result() As Variant
    Result = ExecuteQuery(Query)
    Dim Bytes() As Byte
    ReDim Bytes(LBound(Result) To UBound(Result)) As Byte
    For i = LBound(Result) To UBound(Result)
        Bytes(i) = Result(i)
    Next
    TextToBytes = Bytes
    Application.DisplayAlerts = False
    TempSheet.Delete
    Application.DisplayAlerts = True
End Function
1
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?