
C#: Reading text line by line from the end of a file

Last updated at 2023-03-11 / Posted at 2023-03-11

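Background

C# can be developed and used on pretty much any Windows machine, so I recently started learning it.
Along the way, I got to thinking about something that has always been on my mind: automating log reading.

Logs are basically appended at the end of the file (I doubt any language offers prepending),
so surely everyone has wanted to read a file starting from the end (the newest log entries) at some point.

As far as I know, neither C# nor other languages ship a method that reads a file from the end.
So this time I'm going to build that long-dreamed-of method for reading a file from the end!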

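Design policy

① Development takes buffer size into account (log files get big, after all).
② Lines can be taken out one at a time, starting from the end of the file.
③ Buffer size is considered, but a single line can only be as large as a string can hold,
　so a line bigger than that is out of scope (honestly, rethink how you write your logs in that case).
④ Usage is the same as the ReadLine method of the StreamReader class.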
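For reference, the usage that ④ aims to mirror is the ordinary forward-reading loop of StreamReader. The following is only a minimal sketch, written as a top-level program (C# 9 or later); the in-memory sample data and the variable names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Sample data held in memory so the sketch runs without a real log file (assumption)
byte[] sampleData = Encoding.UTF8.GetBytes("line1\nline2\nline3");
var forwardLines = new List<string>();

using (var reader = new StreamReader(new MemoryStream(sampleData), Encoding.UTF8))
{
    // StreamReader.Peek() returns -1 once no characters remain
    while (reader.Peek() >= 0)
    {
        // ReadLine() returns the next line without its line break
        var line = reader.ReadLine();
        forwardLines.Add(line);
    }
}

// Lines come out in file order: first line first
Console.WriteLine(string.Join(",", forwardLines));
```

The goal of the class in this article is to keep this loop shape while making the lines come out in the opposite order.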

Source code

Without further ado, here is the class I wrote.

RevStreamReader.cs

using System;
using System.IO;
using System.Text;

class RevStreamReader : StreamReader
    {
        private int peekIndex = 0;

        public RevStreamReader(Stream stream) : base(stream)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

        public RevStreamReader(string path) : base(path)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

        public RevStreamReader(Stream stream, bool detectEncodingFromByteOrderMarks) : base(stream, detectEncodingFromByteOrderMarks)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

        public RevStreamReader(Stream stream, Encoding encoding) : base(stream, encoding)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

        public RevStreamReader(string path, bool detectEncodingFromByteOrderMarks) : base(path, detectEncodingFromByteOrderMarks)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

        public RevStreamReader(string path, Encoding encoding) : base(path, encoding)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

        public RevStreamReader(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks) : base(stream, encoding, detectEncodingFromByteOrderMarks)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

        public RevStreamReader(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks) : base(path, encoding, detectEncodingFromByteOrderMarks)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

        public RevStreamReader(Stream stream, Encoding encoding, bool detectEncodingFromByteOrderMarks, int bufferSize) : base(stream, encoding, detectEncodingFromByteOrderMarks, bufferSize)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

        public RevStreamReader(string path, Encoding encoding, bool detectEncodingFromByteOrderMarks, int bufferSize) : base(path, encoding, detectEncodingFromByteOrderMarks, bufferSize)
        {
            this.BaseStream.Position = this.BaseStream.Seek(0, SeekOrigin.End);
        }

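        //Returns 0 while unread lines remain and -1 once the start of the file has been reached
        //(note: this differs from StreamReader.Peek, which returns the next character)
        public override int Peek()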
        {
            return this.peekIndex;
        }

        public override string ReadLine()
        {
            const int bufferSize = 4096;
            string lineText = "";
            int crIndex = -1;
            int lfIndex = -1;
            long bufferLength;
            long startPosition;
            //Bytes for CR and LF in the current encoding (only the first byte of each is compared below)
            byte[] crByte = this.CurrentEncoding.GetBytes("\r");
            byte[] lfByte = this.CurrentEncoding.GetBytes("\n");

            while (true)
            {
                //Start of the file reached (this also covers a file that begins with a line break)
                if (this.BaseStream.Position == 0)
                {
                    this.peekIndex = -1;
                    return lineText;
                }
                //Shrink the buffer when fewer than bufferSize bytes remain before the position
                else if (this.BaseStream.Position < bufferSize)
                {
                    bufferLength = this.BaseStream.Position;
                    this.BaseStream.Position = 0;
                }
                else
                {
                    bufferLength = bufferSize;
                    this.BaseStream.Position -= bufferSize;
                }

                //Included just in case, though it may be pointless
                if (!this.BaseStream.CanSeek)
                {
                    return lineText;
                }

                //Remember the starting position before reading one buffer's worth of data
                startPosition = this.BaseStream.Position;
                byte[] bytes = new byte[bufferLength];

                //Store the bytes in the array one at a time
                for (int index = 0; index < bytes.Length; index++)
                {
                    int read = this.BaseStream.ReadByte();

                    //Remember the last position of each line break character
                    //CR
                    if (crByte[0] == (byte)read)
                    {
                        crIndex = index;
                    }
                    //LF
                    else if (lfByte[0] == (byte)read)
                    {
                        lfIndex = index;
                    }
                    bytes[index] = (byte)read;
                }

                //CRLF
                if (crIndex >= 0 && lfIndex >= 0 && crIndex == lfIndex - 1)
                {
                    //Convert the bytes after the line break (one line) to a string
                    byte[] copies = new byte[bytes.Length - (lfIndex + 1)];
                    Array.Copy(bytes, lfIndex + 1, copies, 0, copies.Length);
                    lineText = this.CurrentEncoding.GetString(copies) + lineText;
                    this.BaseStream.Position = startPosition + crIndex;
                    return lineText;
                }
                //CR
                else if (crIndex >= 0 && lfIndex < crIndex)
                {
                    //Convert the bytes after the line break (one line) to a string
                    byte[] copies = new byte[bytes.Length - (crIndex + 1)];
                    Array.Copy(bytes, crIndex + 1, copies, 0, copies.Length);
                    lineText = this.CurrentEncoding.GetString(copies) + lineText;
                    this.BaseStream.Position = startPosition + crIndex;
                    return lineText;
                }
                //LF
                else if (lfIndex >= 0 && lfIndex > crIndex)
                {
                    //Convert the bytes after the line break (one line) to a string
                    byte[] copies = new byte[bytes.Length - (lfIndex + 1)];
                    Array.Copy(bytes, lfIndex + 1, copies, 0, copies.Length);
                    lineText = this.CurrentEncoding.GetString(copies) + lineText;
                    this.BaseStream.Position = startPosition + lfIndex;
                    return lineText;
                }
                //No line break in this buffer
                else
                {
                    //Convert the whole buffer to a string and keep scanning backward
                    lineText = this.CurrentEncoding.GetString(bytes) + lineText;
                    this.BaseStream.Position = startPosition;
                }
            }
        }
    }

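Explanation

First, so that it can be used in the same way as the StreamReader class, I inherit from it
and create a new RevStreamReader class.

Each constructor calls the StreamReader constructor and then overwrites the position so that it points to the end of the stream.

To allow the same usage as StreamReader, the Peek and ReadLine methods are overridden.

Peek probably returns different values from the original: here it returns -1 once reading has reached the beginning of the file (the end of the reverse read), and 0 otherwise.
I didn't think much about repeated use, so once reading has reached that point, you cannot start over from the end of the file with the same instance.
If you want to read again, create a new instance.

ReadLine first works out the buffer size from the current position.
It then reads the binary data covering that many bytes before the current position.
While doing so, it remembers the position of the last line break.
If a line break is found, it takes the binary data after the line break, converts it to a string, updates the position, and returns the string.
If no line break is found, it prepends the decoded buffer to the line being built, updates the position, and goes back to working out the buffer size.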
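The "remember the position of the last line break" step can be looked at in isolation. The sketch below is not the code from the class above: it scans a chunk backward instead of forward, and the helper name FindLastLineBreak and the sample text are hypothetical. It assumes an encoding such as UTF-8 in which CR and LF are the single bytes 0x0D and 0x0A, and it is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Text;

// Hypothetical helper: returns the index where the last line break in the chunk
// starts (-1 if there is none) and, via tailStart, where the text after it begins.
static int FindLastLineBreak(byte[] chunk, out int tailStart)
{
    for (int i = chunk.Length - 1; i >= 0; i--)
    {
        if (chunk[i] == (byte)'\n')
        {
            tailStart = i + 1;
            // A CR directly before the LF belongs to the same line break (CRLF)
            return (i > 0 && chunk[i - 1] == (byte)'\r') ? i - 1 : i;
        }
        if (chunk[i] == (byte)'\r')
        {
            tailStart = i + 1;
            return i;
        }
    }
    // No line break: the whole chunk belongs to the current line
    tailStart = 0;
    return -1;
}

byte[] sample = Encoding.UTF8.GetBytes("first\r\nsecond\r\nthird");
int breakAt = FindLastLineBreak(sample, out int tailAt);
string lastLine = Encoding.UTF8.GetString(sample, tailAt, sample.Length - tailAt);

Console.WriteLine(lastLine); // third
Console.WriteLine(breakAt);  // 13
```

In terms of the class above, breakAt corresponds to the offset added to startPosition when the position is updated, and the bytes from tailAt onward correspond to the part that is converted to a string.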

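Usage

It works the same way as the ReadLine method of the StreamReader class, but here it is for completeness.

Usage.cs

    //Specify the path of the file to read and its character encoding
    using (RevStreamReader rsr = new RevStreamReader("C:\\Users\\userName\\Desktop\\test.txt", Encoding.GetEncoding("UTF-8")))
    {
        //Repeat until the beginning of the file is reached
        while (rsr.Peek() >= 0)
        {
            //Get one line, starting from the end of the file
            string txt = rsr.ReadLine();
            //Process the retrieved line from here on
        }
    }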

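Caveats

RevStreamReader only implements reverse reading through ReadLine (with Peek as the loop condition),
so if you need any other method, please use the StreamReader class instead.
Otherwise the position handling will go wrong.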

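About processing speed

I haven't measured it, but because the data is kept in binary form as much as possible, I expect it to be fairly fast.
And since it deliberately works on byte arrays, I expect memory usage to stay low as well.
One further speed-up that looks feasible is to keep lineText as a byte array.
The byte arrays could be joined on return and converted to a string in one go.
That said, I think this is good enough as it is.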
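The idea of keeping the line as bytes and converting it to a string only once could look roughly like the sketch below. It is a simplified stand-alone function, not a drop-in replacement for the class above: the name ReadLineBackward, the chunkSize parameter, and the sample text are hypothetical, only LF is treated as a line break (a trailing CR is trimmed after decoding), and it is written as a top-level program (C# 9 or later). A side benefit is that a multi-byte character that straddles two chunks is decoded intact, because decoding happens after the pieces are joined.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: reads the line that ends at stream.Position, moving backward
// in chunks, and leaves stream.Position on the LF that precedes that line (or at 0).
static string ReadLineBackward(Stream stream, Encoding encoding, int chunkSize)
{
    var pieces = new List<byte[]>(); // raw pieces of the line, last piece first
    int total = 0;
    while (stream.Position > 0)
    {
        int length = (int)Math.Min(chunkSize, stream.Position);
        long chunkStart = stream.Position - length;
        stream.Position = chunkStart;
        byte[] chunk = new byte[length];
        int read = 0;
        while (read < length)
        {
            int n = stream.Read(chunk, read, length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        // Keep only the bytes after the last LF in this chunk (all of them if none)
        int lf = Array.LastIndexOf(chunk, (byte)'\n');
        byte[] piece = new byte[length - (lf + 1)];
        Array.Copy(chunk, lf + 1, piece, 0, piece.Length);
        pieces.Add(piece);
        total += piece.Length;

        if (lf >= 0)
        {
            stream.Position = chunkStart + lf; // the next call continues before this LF
            break;
        }
        stream.Position = chunkStart;
    }

    // Join the pieces in file order, then decode once
    byte[] line = new byte[total];
    int offset = total;
    foreach (byte[] piece in pieces)
    {
        offset -= piece.Length;
        Array.Copy(piece, 0, line, offset, piece.Length);
    }
    return encoding.GetString(line).TrimEnd('\r');
}

// A tiny chunk size forces multi-byte characters to be split across chunks
var memory = new MemoryStream(Encoding.UTF8.GetBytes("alpha\nβeta\r\nガンマ"));
memory.Position = memory.Length;
var reversed = new List<string>();
while (memory.Position > 0)
{
    reversed.Add(ReadLineBackward(memory, Encoding.UTF8, 4));
}
Console.WriteLine(string.Join(",", reversed)); // ガンマ,βeta,alpha
```

Whether this is actually faster than prepending strings would still need to be measured; the sketch only shows the shape of the change.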
