[Reinventing the wheel] Parsing fixed-length strings: slicing characters by byte count

Java

Last updated at 2014-06-27, posted at 2014-06-25

The version I wrote the other day sliced by character count, so I modified it to slice by byte count.

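This should handle strings that mix full-width and half-width characters.
Frankly, the first priority should be to stop exchanging mixed full-width and half-width data as fixed-length records in an interface at all, but I'm writing this just in case.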
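To see why character count and byte count diverge for mixed input, here is a minimal sketch. The class and method names are made up for illustration and are not part of the parser. It encodes one string and compares `String.length()` with the encoded byte length. Shift_JIS is only tried when the running JDK reports it as supported, which is an assumption about your runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original parser.
class ByteLengthDemo {

	/** Returns how many bytes the text occupies once encoded in the given charset. */
	static int encodedLength(String text, Charset charset) {
		return text.getBytes(charset).length;
	}

	public static void main(String[] args) {
		// "1", HIRAGANA LETTER A (U+3042), "bcd"; written with an escape so the
		// encoding of this source file does not matter.
		String text = "1\u3042bcd";
		System.out.println("chars: " + text.length());
		System.out.println("UTF-8 bytes: " + encodedLength(text, StandardCharsets.UTF_8));
		// Shift_JIS is an optional charset, so check before using it.
		if (Charset.isSupported("Shift_JIS")) {
			System.out.println("Shift_JIS bytes: " + encodedLength(text, Charset.forName("Shift_JIS")));
		}
	}
}
```

The string has 5 characters but occupies 7 bytes in UTF-8 (1 + 3 + 3), and 6 bytes in Shift_JIS, where the hiragana takes 2 bytes.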

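Addendum:
I wrote this to process UTF-8, but Japanese characters in UTF-8 take 3 to 4 bytes each, so be aware that a slice can land at an unintended position, in the middle of a character. An encoding such as Shift_JIS or EUC-JP looks like a better fit.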
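One way to notice a slice that lands inside a multi-byte character is to decode strictly instead of leniently. The sketch below is an assumption about how that could be done, not something the parser in this post does: `decodeStrict` is a hypothetical helper that uses `CharsetDecoder` with `CodingErrorAction.REPORT` and returns `null` when the byte range is not valid in the charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original parser.
class StrictSliceDemo {

	/**
	 * Decodes bytes[offset, offset + length) and returns null if the range
	 * starts or ends in the middle of a character.
	 */
	static String decodeStrict(byte[] bytes, int offset, int length, Charset charset) {
		try {
			return charset.newDecoder()
					.onMalformedInput(CodingErrorAction.REPORT)
					.onUnmappableCharacter(CodingErrorAction.REPORT)
					.decode(ByteBuffer.wrap(bytes, offset, length))
					.toString();
		} catch (CharacterCodingException e) {
			// The slice split a multi-byte character (or hit other invalid bytes).
			return null;
		}
	}

	public static void main(String[] args) {
		// "1", HIRAGANA LETTER A (3 bytes in UTF-8), "bcd"
		byte[] bytes = "1\u3042bcd".getBytes(StandardCharsets.UTF_8);
		// Lenient decoding silently substitutes U+FFFD for the broken tail.
		String lenient = new String(bytes, 0, 3, StandardCharsets.UTF_8);
		System.out.println(lenient.indexOf('\uFFFD') >= 0); // true
		// Strict decoding reports the same 3-byte slice as invalid.
		System.out.println(decodeStrict(bytes, 0, 3, StandardCharsets.UTF_8)); // null
		// A 4-byte slice ends on a character boundary and decodes cleanly.
		System.out.println(decodeStrict(bytes, 0, 4, StandardCharsets.UTF_8) != null); // true
	}
}
```

Whether a broken slice should become `null`, an exception, or a padded value depends on the record layout, so treat the `null` return here as a placeholder policy.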

FixedLengthStringByteParser.java

package tools;

import java.nio.charset.Charset;
import java.util.Objects;

/**
 * Fixed-length string parser that slices by byte count
 */
public class FixedLengthStringByteParser {

	/** Byte array of the fixed-length string */
	private final byte[] source;
	/** Character encoding */
	private final Charset charset;
	/** Current position (byte offset) */
	private int position = 0;

	/**
	 * Constructor
	 * @param source fixed-length string
	 * @param charset name of the character encoding
	 */
	public FixedLengthStringByteParser(String source, String charset) {
		Objects.requireNonNull(source);
		Objects.requireNonNull(charset);
		this.charset = Charset.forName(charset);
		this.source = source.getBytes(this.charset);
	}

	/**
	 * Extracts the next string
	 * @param length length in bytes
	 * @return the extracted string, or null if length is less than 1 or no bytes remain
	 */
	public String next(int length) {
		if (length < 1) {
			return null;
		}
		if (position >= source.length) {
			return null;
		}
		// Clamp the slice so a field that runs past the end returns only the remaining bytes
		int len = Math.min(length, source.length - position);
		String ret = new String(source, position, len, charset);
		position += length;
		return ret;
	}

	public static void main(String[] args) {
		// In UTF-8, a full-width Japanese character usually takes 3 bytes
		// (note that some kanji are encoded as 4 bytes)
		FixedLengthStringByteParser parser = new FixedLengthStringByteParser("1あbcd文字列", "UTF-8");
		System.out.println(parser.next(4));
		System.out.println(parser.next(3));
		System.out.println(parser.next(3));
		System.out.println(parser.next(7));
		System.out.println(parser.next(5));
	}
}
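For reference, the same walk-through can be condensed into one helper that takes all the field widths at once. This is only a sketch: `splitByBytes` and the class name are hypothetical, and the helper mirrors the behavior of `next` (a shortened last field, then `null` once the data runs out) without replacing it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original parser.
class FixedLengthRecordDemo {

	/**
	 * Splits text into fields of the given byte widths. A field that runs past
	 * the end of the data is shortened; a field with a width below 1, or one
	 * that starts past the end, becomes null.
	 */
	static List<String> splitByBytes(String text, Charset charset, int... widths) {
		byte[] bytes = text.getBytes(charset);
		List<String> fields = new ArrayList<>();
		int position = 0;
		for (int width : widths) {
			if (width < 1 || position >= bytes.length) {
				fields.add(null);
				continue;
			}
			int len = Math.min(width, bytes.length - position);
			fields.add(new String(bytes, position, len, charset));
			position += width;
		}
		return fields;
	}

	public static void main(String[] args) {
		// Same sample as the parser's main, written with escapes:
		// "1", U+3042, "bcd", U+6587, U+5B57, U+5217 (16 bytes in UTF-8).
		String record = "1\u3042bcd\u6587\u5B57\u5217";
		for (String field : splitByBytes(record, StandardCharsets.UTF_8, 4, 3, 3, 7, 5)) {
			System.out.println(field);
		}
	}
}
```

With widths 4, 3, 3, 7, 5 in UTF-8, the fields come out as `1あ`, `bcd`, `文`, `字列`, and `null`. These are the same five values that the parser's `main` prints, since the sample string is only 16 bytes long.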
