Running text through the Java 8 Stream API (terminal operations)

Last updated at 2021-12-26, posted at 2015-10-18

Running text through the Java 8 Stream API (stream creation) - Qiita
Running text through the Java 8 Stream API (terminal operations) - Qiita
Running text through the Java 8 Stream API (intermediate operations) - Qiita

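In short

A Java 8 Stream holds no data, either before or after it runs.

The Stream class is not a collection; it only assembles a pipeline of operations applied to each element flowing through it. So to handle the elements processed by a Stream as data, they must eventually be turned back into some data structure or value.

The methods that do this are called "terminal operations", as distinct from the other "intermediate operations".

An intermediate operation appends a function to the pipeline and returns a new Stream object. A terminal operation is called at the end of the method chain and converts the elements leaving the pipeline into another collection or an aggregate value.

Just as a Stream can be created from many kinds of data sources, the Stream class offers many terminal operations, and the final product is determined by which one you choose.

The source is traversed only when a terminal operation is called. The Stream object is then used up like a firework (it is consumable). In that sense, too, it is a terminator.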

    Stream<String> stream = Stream.of("アカ","アオ","キ","モモ","ミド");
            
    stream.forEach(s -> {
       System.out.println(s);
    });
    
    // A Stream object cannot be reused.
    long c = stream.count(); 
    // java.lang.IllegalStateException: stream has already been operated upon or closed

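I see.

It is not very intuitive, but it is easy to imagine that this design was settled on after a great deal of discussion. In the Stack Overflow answer below, Stuart Marks, a Java insider, explains how the Stream design came about; it is a useful read.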

Why are Java Streams once-off? - Stack Overflow
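
The two points above, lazy traversal and single use, can be sketched as follows. This is a minimal illustration and not code from the article: the class name StreamLifecycleSketch and its methods are hypothetical, and wrapping the source in a Supplier is just one common way to get a fresh Stream for each terminal operation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical demo class; the names are made up for illustration.
class StreamLifecycleSketch {

    // Records the order in which things happen around a pipeline.
    static List<String> traceLazyTraversal() {
        List<String> log = new ArrayList<>();
        Stream<String> pipeline = Stream.of("a", "b", "c")
                .peek(s -> log.add("peek:" + s));           // only registered, not run
        log.add("built");
        pipeline.forEach(s -> log.add("forEach:" + s));     // traversal starts here
        return log;
    }

    // A Supplier hands out a new Stream for every terminal operation.
    static String countThenFind(List<String> source) {
        Supplier<Stream<String>> streams = source::stream;
        long count = streams.get().count();
        String first = streams.get().findFirst().orElse("");
        return count + ":" + first;
    }

    public static void main(String[] args) {
        System.out.println(traceLazyTraversal());
        System.out.println(countThenFind(Arrays.asList("aka", "ao", "ki")));
    }
}
```

In traceLazyTraversal() the "built" entry is logged before any "peek:" entry, because nothing flows through the pipeline until forEach() is called.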

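2. Terminal operations

The terminal operation methods offer such a wide range of functionality that it is a little bewildering. Here I want to sort them out from the standpoint of text processing.

Suppose you have run text data through a Stream and done various things to it; what do you ultimately want to do with the result? For now it can be sorted into the following kinds of function.

Search
Aggregate
Convert
Output

From these viewpoints I will sort the terminal operation methods and examine which ones apply and how usable they are.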

2.1. Searching

2.1.1. findFirst()/findAny()

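Despite their names, findFirst() and findAny() do not search for elements themselves. That is the job of intermediate operations such as filter().

findFirst() returns the first element as an Optional.
findAny() returns some element (not necessarily the first) as an Optional.
The Optional may be empty.

There is no findLast()-style terminal operation for getting the last element.

List.2-1_Basics of the find terminal operations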

    String[] words = {"aaaaaa", "bbbbbb", "cccccc"};

    List<String> list = Arrays.asList(words);
    Optional<String> first = list.stream().findFirst();
    first.ifPresent(s -> {
        System.out.println(s);  // "aaaaaa"
    });

    Set<String> set = new HashSet<>(list);
    Optional<String> any = set.stream().findAny();
    any.ifPresent(s -> {
        System.out.println(s);  // some element, e.g. "cccccc"
    });

List.2-2_Getting just one element from a Set

    // ...is surprisingly tedious
    Set<String> set = ...

    // via Iterator
    final String one = set.isEmpty() ? "N/A" : set.iterator().next();

    // via Stream
    final String any = set.stream().findAny().orElse("N/A");

List.2-3_Extracting the title from HTML

        Pattern p = Pattern.compile("<title>(.+?)</title>", Pattern.CASE_INSENSITIVE);

        // Reading the file stops at the line where the first title is found.
        String title = Files.lines(Paths.get("index.html"))
                .map(s -> p.matcher(s))
                .filter(m -> m.find())
                .map(m -> m.group(1))
                .findFirst()
                .orElse("*** NO TITLE ***");

Simply swapping findFirst() for findAny() here gives the same result. The difference between the two shows up when the Stream is parallelized.

List.2-4_Finding a prime number

    final int from = 1_000_000;
    final int to = from + new Random(System.currentTimeMillis()).nextInt(from); // random jitter

    int p = IntStream.range(from, to)
            .parallel()
            //.sequential() // when sequential, both give the same result (=1000003)
            .filter(n -> {
                return IntStream.range(2, n)
                        .noneMatch(m -> n % m == 0);
            })
            .findAny()     // in parallel, there is no telling which prime you get
            //.findFirst() // even in parallel, yields the smallest prime (=1000003)
            .getAsInt();

    System.out.println(p);

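As soon as an element is obtained, the rest of the Stream processing is cut off (a short-circuiting operation). In parallel processing, findAny() may return a result sooner than findFirst().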
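
Short-circuiting is easiest to see on an infinite source. The sketch below counts how many elements a pipeline pulls before findFirst() is satisfied. The class name ShortCircuitSketch is made up for illustration, and the count assumes a sequential stream; a parallel run may evaluate extra elements.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the original article.
class ShortCircuitSketch {

    // First positive multiple of 7, taken from an infinite stream.
    static int firstMultipleOfSeven() {
        return Stream.iterate(1, n -> n + 1)
                .filter(n -> n % 7 == 0)
                .findFirst()
                .get();
    }

    // Number of elements pulled from the source before findFirst() stopped.
    static int elementsVisited() {
        AtomicInteger visited = new AtomicInteger();
        Stream.iterate(1, n -> n + 1)                   // infinite source
                .peek(n -> visited.incrementAndGet())
                .filter(n -> n % 7 == 0)
                .findFirst();                           // ends the traversal
        return visited.get();
    }

    public static void main(String[] args) {
        System.out.println(firstMultipleOfSeven());
        System.out.println(elementsVisited());
    }
}
```

Without the short circuit neither method would ever return, since Stream.iterate() never runs out of elements.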

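In text processing you often want the context around a match, or its position, rather than the match itself, but that is hard with a Stream, which is meant to work without side effects or state.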
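
One workaround that is sometimes used, and not something this article proposes, is to stream over indices instead of elements, so that the position itself becomes the element. A minimal sketch, assuming the source is a random-access List; IndexSearchSketch and indexOfFirst are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.IntStream;

// Hypothetical helper; streams indices rather than elements.
class IndexSearchSketch {

    // Index of the first line containing the keyword, or empty if none does.
    static OptionalInt indexOfFirst(List<String> lines, String keyword) {
        return IntStream.range(0, lines.size())
                .filter(i -> lines.get(i).contains(keyword))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("alpha", "beta", "gamma", "delta");
        OptionalInt hit = indexOfFirst(lines, "mm");
        // With the index in hand, the neighbors are ordinary list lookups.
        hit.ifPresent(i -> {
            if (i > 0) {
                System.out.println("before: " + lines.get(i - 1));
            }
            System.out.println("match at " + i + ": " + lines.get(i));
        });
    }
}
```

This only pays off when lines.get(i) is cheap; for a LinkedList or a file being read lazily, a plain loop is likely the better fit.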

List.2-5_Checking a list for duplicate elements

public boolean hasDuplicate(List<String> list) {
    Set<String> set = new HashSet<>();
    return list.stream()
            .filter(e -> !set.add(e))
            .findFirst()
            .isPresent();
}

List.2-6_Loose Map lookup

// Get an entry from a Map, ignoring the case of the key
public Optional<Map.Entry<String, String>> looseGetEntry(Map<String, String> map, String key) {
    return map.entrySet().stream()
            .filter(e -> e.getKey().equalsIgnoreCase(key))
            .findAny();
}

    Map<String, String> emails = ...;

    looseGetEntry(emails, "Suzuki").ifPresent(e -> {
        System.out
                .format("%s <%s>", e.getKey(), e.getValue())
                .println();
                // SUZUKI <suzuki@exemple.jp>
    });

2.1.2. allMatch() / anyMatch() / noneMatch()

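allMatch()/anyMatch()/noneMatch() test the elements coming out of the Stream against a given predicate function (Predicate) and report whether matching elements exist.

The three differ just as their names suggest, but they feel a bit like formal logic. In particular, when the predicate is a negated condition, the meaning is not obvious at a glance and I end up thinking for about five seconds. In practice, names like forall() and exists() would have been more intuitive.

The result is only a boolean, nothing fancy, but they come in handy for things like input validation.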
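
When a negated predicate makes one of these hard to read, it can help to remember that the three are interchangeable. The sketch below writes "every element is non-blank" in three equivalent ways; the class name MatchEquivalenceSketch and the BLANK predicate are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical demo class showing three equivalent spellings of one check.
class MatchEquivalenceSketch {

    static final Predicate<String> BLANK = s -> s.trim().isEmpty();

    // "for all x: not blank"
    static boolean allNonBlank(List<String> list) {
        return list.stream().allMatch(BLANK.negate());
    }

    // "there is no x that is blank"
    static boolean noneBlank(List<String> list) {
        return list.stream().noneMatch(BLANK);
    }

    // "not (there exists an x that is blank)"
    static boolean notAnyBlank(List<String> list) {
        return !list.stream().anyMatch(BLANK);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", " ", "b");
        System.out.println(allNonBlank(input) + " " + noneBlank(input) + " " + notAnyBlank(input));
    }
}
```

Picking the form whose predicate needs no negation, here noneMatch(BLANK), usually reads best.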

List.2-7_Basics of the match terminal operations

    List<String> list = ... ;
    boolean ok;
    
    // lambda expression
    ok = list.stream()
            .allMatch(s -> s != null && !s.isEmpty()); // no nulls and no empty strings
    // method reference
    ok = list.stream()
            .allMatch(Objects::nonNull);    // no nulls
    // predicate function
    ok = list.stream()
            .noneMatch(Predicate.isEqual("")); // no empty strings (nulls allowed)

List.2-8_Checking that all files are present

    boolean ok = fileNames.parallelStream()
            .map(Paths::get)
            .allMatch(path -> Files.exists(path, LinkOption.NOFOLLOW_LINKS));

List.2-9_Checking that all elements of a list are equal

    String sample = resultList.get(0);
    assertTrue("all elements of the result are equal", resultList.stream().allMatch(sample::equals));

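The match operations are short-circuiting terminal operations.

Once an element that settles the answer is found (a counterexample for allMatch() and noneMatch(), a match for anyMatch()), the remaining processing is skipped. If none is found, every element is processed.

List.2-10_Printing the header

    // The file is read up to the first empty line.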
    Files.lines(path)
            .peek(s -> {
                System.out.println(s);
            })
            .anyMatch(String::isEmpty);

"Every element" also covers the zero-element case, so take care. If the following behavior on an empty Stream surprises you, you may have planted a latent bug somewhere.

List.2-11_Results of the match operations on an empty stream

    Stream.empty().allMatch(e -> true);   // true
    Stream.empty().allMatch(e -> false);  // true
    Stream.empty().anyMatch(e -> true);   // false
    Stream.empty().anyMatch(e -> false);  // false
    Stream.empty().noneMatch(e -> true);  // true
    Stream.empty().noneMatch(e -> false); // true
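
If an empty input should not count as a pass, the check has to say so explicitly. A minimal sketch of such a guard, using the "all elements are equal" check as the example; EmptyMatchGuardSketch and allSame are hypothetical names, and treating an empty list as a failure is an assumption about what the caller wants.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical helper that refuses to accept an empty list as "all equal".
class EmptyMatchGuardSketch {

    // True only when the list is non-empty and every element equals the first.
    static boolean allSame(List<String> results) {
        if (results.isEmpty()) {
            return false;               // assumption: "nothing to check" is a failure
        }
        String first = results.get(0);
        return results.stream().allMatch(s -> Objects.equals(s, first));
    }

    public static void main(String[] args) {
        // Unguarded allMatch() is vacuously true on an empty stream.
        System.out.println(Collections.<String>emptyList().stream().allMatch(s -> false));
        System.out.println(allSame(Collections.emptyList()));
        System.out.println(allSame(Arrays.asList("x", "x", "x")));
    }
}
```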

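2.2. Aggregating

2.2.1. count() / min() / max()

count() does exactly what it says: it counts the elements of the Stream.

It really does count them, so it has a corresponding cost.

List.2-12_Number of lines in a text file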

    int lc = (int) Files.lines(Paths.get("text.txt")).count();

List.2-13_Number of words in a text

    int wc = (int) Pattern.compile("\\W+").splitAsStream(text).count();

List.2-14_Counting distinct characters

    int vc = (int) text.codePoints().distinct().count();

List.2-15_Number of occurrences of a pattern

    String text = "あんたあたしのことあんたあんたいうけどあたしもあんたのことあんたあんたいわへんからもうあんたもあたしのことあんたあんたいわんといてよあんた";
    String word = "あんた";

    text = text + "\0"; // <- guard for a match at the very end of the text
    word = "(?<=" + word + ")";
    int count = (int) Pattern.compile(word).splitAsStream(text).count() -1;

List.2-16_Counting the files under a directory

    Path dir = ... ;
    int fileCount = 0;
    try (Stream<Path> files = 
            Files.find(dir, 100, (path, attrs) -> attrs.isRegularFile())
    ) {
        fileCount = (int) files.count();
    } catch (IOException e) {
        e.printStackTrace();
    }

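count() is a terminal operation, so once you know the number of elements, that Stream object can no longer be used.

List.2-17_Printing the matching lines along with their count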

    List<String> lines = ...

    long c = lines.stream()
            .filter(line -> line.contains("status:404"))
            .peek(line -> {
                System.out.println(line);
            })
            .count();
    System.out
            .format("************ Search results: %d found *************", c)
            .println();

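min()/max() take a comparison function (Comparator) and return the smallest or largest element.
The return value is an Optional<String> (for a Stream<String>), which is empty when there were no elements.
The Stream is always read through to the end.

List.2-18_Basics of min()/max()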

    List<String> list = ... ;
    Optional<String> min;
    
    // Comparator
    min = list.stream()
            .min(Comparator.naturalOrder()); // smallest string in dictionary order
    // method reference
    min = list.stream()
            .min(String::compareToIgnoreCase); // case-insensitive
    // lambda expression
    min = list.stream()
            .min((l, r) -> l.length() - r.length()); // shortest string
    // Comparable key
    min = list.stream()
            .min(Comparator.comparing(s -> s.toUpperCase())); // case-insensitive

List.2-19_Deleting the file with the oldest file name

    // With file names rotated by a date suffix, the oldest should be the min.
    // /var/log/error.log-yyyymmdd
    try (Stream<Path> logs = Files.list(Paths.get("/var/log"))) {
        Optional<Path> oldest = logs
            .filter(path -> path.getFileName().toString().startsWith("error.log-"))
            .min(Path::compareTo);
        if(oldest.isPresent()) {
            Files.delete(oldest.get());  
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

List.2-20_Comparing by Map values

    Map<String, String> report = new TreeMap<String, String>(){{
        put("国語", "C");
        put("算数", "A+");
        put("理科", "A-");
        put("社会", "D");
        put("機械学習理論II", "A+++");
    }};

    // The value that is smallest as a string should be the best grade.
    Map.Entry<String, String> best = report.entrySet().stream()
            .min(Comparator.comparing(gp -> gp.getValue() + ','))  // '+'<','<'-'
            .get();

    System.out.println(best); // "機械学習理論II=A+++"

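Getting the largest or smallest string may seem of limited practical use, but nothing restricts you to plain string comparison. The point is that you can pass any function, as long as it evaluates each element string into a number or a Comparable object.

List.2-21_Finding the mail address with the longest user name part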

        List<String> mails = Arrays.asList(
                "very.vary.long.name@example.jp",
                "jugemu_jugemu.gokounosurikire1234+qiita@example.jp",
                ...
               );
        
        String longest = mails.stream()
                .max(Comparator.comparingInt(m -> m.indexOf("@")))
                .get();

List.2-22_Suggestions (by Levenshtein distance)

// Apache Commons
import org.apache.commons.lang3.StringUtils; 

    List<String> names = Arrays.asList("エレン", "アルミン", "ベルトルト", "ベルベル", "ヴェルナンデス");
    final String someone = "ベルナントカ";

    // scoring function based on Levenshtein distance
    ToIntFunction<String> dist = s -> StringUtils.getLevenshteinDistance(someone, s);
    
    String closest = names.stream()
            .min(Comparator.comparingInt(dist))
            .orElse(someone);

    if (!closest.equals(someone)) {
        System.out.println("Did you mean " + closest); // "Did you mean ベルトルト"
    }

List.2-23_Comparing Japanese-era dates

    // Java 8 also added date-time formatting classes.
    final DateTimeFormatter wareki = DateTimeFormatter.ofPattern("Gy年M月d日")
            .withChronology(JapaneseChronology.INSTANCE);

    List<String> birthdays = Arrays.asList(
        "平成12年3月4日", "昭和12年3月4日", "大正12年3月4日", "明治12年3月4日");

    String oldest = birthdays.stream()
            .min(Comparator.comparing(s -> LocalDate.parse(s, wareki))) // LocalDate is Comparable
            .get();
    System.out.println(oldest); // "明治12年3月4日"

List.2-24_Getting the last element of a Stream (part 1)

    String last = lines.stream()
            .max((l, r) -> -1)
            .get();

By the way, do remember once in a while that java.util.Collections itself has min()/max(). If you need no intermediate operations, these are lighter.
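
For comparison, a minimal sketch of the Collections versions; the class and method names are hypothetical. Note that Collections.min()/max() throw NoSuchElementException on an empty collection instead of returning an Optional.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class for Collections.min()/max().
class CollectionsMinMaxSketch {

    // Shortest string, using an explicit Comparator.
    static String shortest(List<String> words) {
        return Collections.min(words, Comparator.comparingInt(String::length));
    }

    // Last string in natural (dictionary) order.
    static String lastInDictionaryOrder(List<String> words) {
        return Collections.max(words);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana", "kiwi");
        System.out.println(shortest(words));
        System.out.println(lastInDictionaryOrder(words));
    }
}
```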

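If element strings can be evaluated as some numeric value, simple aggregates over them are possible too. The primitive Streams such as IntStream and DoubleStream provide not only count()/min()/max() but also terminal operations such as sum() and average().

Getting an aggregate without a loop is convenient, but once you want to compute further with those aggregates you end up running the Stream several times, which is actually less efficient. Perhaps for that reason there is also a terminal operation, summaryStatistics(), that collects all of these aggregates in one pass.

With those tools it feels as if somewhat elaborate, statistics-style calculations could also be done in one go, but in practice they are not that convenient. Rather than wasting time on trial and error, you will be happier simply writing a loop.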

On the other hand, the fact that aggregation with a Stream is easy to parallelize is an appeal that is hard to give up.
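
As a small illustration of summaryStatistics() mentioned above, the sketch below gathers count, min, max, sum and average of line lengths in a single pass. LineLengthStatsSketch is a hypothetical name, and line length is just a stand-in for whatever numeric evaluation the text needs.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

// Hypothetical demo class: one pass, five aggregates.
class LineLengthStatsSketch {

    static IntSummaryStatistics lineLengths(List<String> lines) {
        return lines.stream()
                .mapToInt(String::length)   // Stream<String> -> IntStream
                .summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = lineLengths(Arrays.asList("a", "bbb", "cc"));
        System.out.println(stats.getCount() + " lines, "
                + stats.getMin() + ".." + stats.getMax() + " chars, "
                + "total " + stats.getSum() + ", average " + stats.getAverage());
    }
}
```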

List.2-25_Finding the deepest indentation level

    int depth = lines.stream()
            .map(l -> l.replaceFirst("(\\s*)(.*)", "$1"))
            .map(idt -> idt.replaceAll("(    | {0,3}\\t)", "1234")) // count a TAB as 4 spaces
            .mapToInt(idt -> idt.length() / 4) // IntStream
            .max().orElse(0);

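List.2-26_Rating the readability of English text (by the Flesch-Kincaid formulas)

// Note that these are not strict Flesch-Kincaid values. (Syllable analysis is beyond a Japanese speaker.)
// Still, given how widely results vary even among published implementations,
// I consider this to be well within tolerance.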
public static void approxFleschKincaidReadabilityTests(String text) {
    final Pattern paragraphBreak = Pattern.compile("\\n\\s+");
    final Pattern period = Pattern.compile("[.!?](\\s+|$)");
    final Pattern nonAlphabet = Pattern.compile("[^a-z]+", Pattern.CASE_INSENSITIVE);
    final Pattern silentE = Pattern.compile("(?<!^[^eaiou]{1,2})e$", Pattern.CASE_INSENSITIVE);
    final Pattern pastEd = Pattern.compile("(?<!(^[^eaiou]{1,2})|([td]))ed$", Pattern.CASE_INSENSITIVE);
    final Pattern vowels = Pattern.compile("[eaiouy]+", Pattern.CASE_INSENSITIVE);
    final Pattern quotes = Pattern.compile("((?<=^|\\s)['‘\"“(])|([)'’\"”](?=[,;:]?\\s))");
    
    String[] paragraphs = paragraphBreak.split(text);
    
    Stream<String> sentences = Stream.of(paragraphs)
            .parallel()
            .map(p -> quotes.matcher(p).replaceAll(""))
            .flatMap(p -> period.splitAsStream(p));
    IntSummaryStatistics wsStat = sentences
            .mapToInt(s -> (int)nonAlphabet.splitAsStream(s).count())
            .summaryStatistics();

    Stream<String> words = Stream.of(paragraphs)
            .parallel()
            .flatMap(p -> nonAlphabet.splitAsStream(p));
    IntSummaryStatistics vwStat = words
            .map(w -> silentE.matcher(w).replaceFirst(""))
            .map(w -> pastEd.matcher(w).replaceFirst("d"))
            .map(w -> String.join(w, " ", " "))
            .mapToInt(w -> vowels.split(w).length -1)
            .summaryStatistics();

    System.out
            .format("Sentence count:            %8d\n",  wsStat.getCount())
            .format("Word count:                %8d\n",  wsStat.getSum())
            .format("Syllable count:            %8d\n",  vwStat.getSum())
            .format("Words per sentence:        %8.2f\n",  wsStat.getAverage())
            .format("Syllables per word:        %8.2f\n",  vwStat.getAverage())
            .println();

    double fre = 206.835 - wsStat.getAverage() * 1.015 - vwStat.getAverage() * 84.6;
    double fkgl = 0.39 * wsStat.getAverage() + 11.8 * vwStat.getAverage() -15.59;        
    System.out
            .format("Flesch Reading Ease:       %8.2f\n", fre)
            .format("Flesch–Kincaid Grade Level:%8.2f\n", fkgl)
            .println();
}

2.2.2. reduce()

To apply a custom aggregate function, use reduce(). For example, it can handle concatenation that a plain join cannot express.

reduce() has the following three overloads.

Optional<T> reduce(BinaryOperator<T> accumulator)
T reduce(T identity, BinaryOperator<T> accumulator)
<U> U reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)

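I have no idea what this means.

Pulling myself together, I read the API documentation.

Performs a reduction on the elements of this stream, using the provided identity value and an associative accumulation function, and returns the reduced value.
https://docs.oracle.com/javase/jp/8/api/java/util/stream/Stream.html#reduce-T-java.util.function.BinaryOperator-

Er, in plain language, please.

Pulling myself together once more, let's look at the details.

From here on I will call them reduce(1), reduce(2), and reduce(3) after their number of arguments. When the element type is String, the call patterns expand into lambda expressions roughly as follows.

List.2-27_Basics of the reduce terminal operations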

    Stream<String> stream = Stream.of("a", "b", "c","d");

    // reduce(1)
    Optional<String> result = stream.reduce(
            // accumulator
            (String joined, String element) -> {
                return joined + '/' + element;
            }
    );
    // "a/b/c/d"

    // reduce(2)
    String result = stream.reduce(
            // identity
            "",
            // accumulator
            (String joined, String element) -> {
                return joined + '/' + element;
            }
    );
    // "/a/b/c/d"

    // reduce(3) parallel
    String result = stream.parallel().reduce(
            // identity
            "",
            // accumulator
            (String joined, String element) -> {
                return joined + "/" + element;
            },
            // combiner
            (String left, String right) -> {
                return left + "*" + right;
            }
    );
    //  "/a*/b*/c*/d"

    // reduce(3) sequential
    StringBuilder result = stream.sequential().reduce(
            // identity
            new StringBuilder(),
            // accumulator
            (StringBuilder joined, String element) -> {
                return joined.append("/").append(element);
            },
            // combiner
            // not called when sequential
            (StringBuilder left, StringBuilder right) -> {
                return left.append("*").append(right);
            }
    );
    // "/a/b/c/d"

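The formal parameters are named identity, accumulator, and combiner.

According to the API documentation, these values and functions are expected to satisfy some rather difficult mathematical requirements.

But that is no concern of a programmer in the trenches. If you squint, reduce() looks like nothing more than an internal iterator with a variable attached.

The identity is really supposed to have the properties of an identity element in the mathematical sense, but in sequential processing it works fine as a mere initial value.

The first argument of the accumulator is carried over from one call of the function to the next, so it can stand in for an external variable, which is awkward to handle in a lambda expression.

List.2-28_Taking the last element (part 2)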

    Optional<String> last = stream.reduce((l, r) -> r);

List.2-29_Converting a domain name to a package name

    String domain = "hoge.example.co.jp";
    // reverse the order of the elements
    String pkg = Stream.of(domain.split("\\."))
            .reduce((l, r) -> r + "." + l).get();
    // jp.co.example.hoge

List.2-30_Simple word wrap

public static String wordwrap(String text, int len) {
    return Pattern.compile(" ").splitAsStream(text)
            .reduce((wrapped, word) -> {
                if (wrapped.length() - wrapped.lastIndexOf('\n') + word.length() > len)
                    return wrapped + '\n' + word;
                else
                    return wrapped + ' ' + word;
            }).get();    
}

    String text = "All work and no play makes Jack a dull boy. "
                + "All work and no play makes Jack a dull boy. ";
    System.out.println(wordwrap(text, 20));

12345678901234567890

All work and no play
makes Jack a dull
boy. All work and no
play makes Jack a
dull boy.

List.2-31_Wrapping in tags

    String text = "請求書";
            
    String emp = Stream.of("b", "i", "em", "strong", "span", "h1")
            .reduce(text, 
                   (str, tag) -> String.format("<%2$s>%s</%2$s>", str, tag)
            );
    // <h1><span><strong><em><i><b>請求書</b></i></em></strong></span></h1>

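The combiner, the third argument of reduce(3), is called in parallel processing to merge the partial results after the Stream has been split up. Where the Stream gets split probably depends on the type of source and on the platform, such as the number of CPU cores, so you cannot rely on it.

I tried various things, but the combiner never seems to be called in sequential processing. Even if it goes unused, passing null causes an error, so you have to pass a function, even a dummy one.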

Also, reduce(3) differs from reduce(1) and reduce(2) in that it can return a type other than the element type.
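
A minimal sketch of reduce(3) with a result type that differs from the element type, and with a real combiner instead of a dummy so that it also works in parallel. The names ReduceThreeArgsSketch and totalLength are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: Stream<String> reduced to an Integer.
class ReduceThreeArgsSketch {

    // Total number of characters over all words.
    static int totalLength(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        return stream.reduce(
                0,                              // identity: adding 0 changes nothing
                (sum, w) -> sum + w.length(),   // accumulator: (Integer, String) -> Integer
                Integer::sum);                  // combiner: merges two partial sums
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ab", "cde", "f");
        System.out.println(totalLength(words, false));
        System.out.println(totalLength(words, true));
    }
}
```

Because 0 is a true identity for addition and both functions are associative and stateless, the sequential and parallel runs give the same total.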

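However, care is needed in parallel processing. If you pass a mutable object such as a StringBuilder as the identity of reduce(3), the result may be as expected sequentially yet wildly wrong in parallel. In that case, using collect(3) instead of reduce(3) is somewhat safer, because it takes a supplier that creates a fresh container.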

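Limited to sequential processing, reduce(3) is not without its uses, but it is a bit tricky and obscures the logic, so there is little merit in forcing it. Simply writing the loop yourself is probably safer.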
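
For the mutable-container case, the collect(3) form might look like the sketch below. The class and method names are hypothetical; the three-argument StringBuilder pattern itself follows the example given in the Stream.collect() API documentation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: mutable reduction that stays correct in parallel.
class MutableReductionSketch {

    static String concat(List<String> parts) {
        return parts.parallelStream()
                .collect(StringBuilder::new,      // supplier: a fresh container per chunk
                         StringBuilder::append,   // accumulator: add one element
                         StringBuilder::append)   // combiner: append one container to another
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(Arrays.asList("a", "b", "c", "d")));
    }
}
```

The difference from reduce(3) is that no single StringBuilder instance is shared between chunks: each chunk asks the supplier for its own.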

List.2-32_Output with line numbers

    lines.stream()
            .reduce( 1, 
                    (i, line) -> {
                        System.out.format("%4d:   %s\n", i, line);
                        return ++i;
                    },
                    (l,r) -> null  // dummy
             );

List.2-33_Named string replacement

    Map<String, String> params = new HashMap<>();
    params.put(":USER", "佐藤");
    params.put(":AMOUNT", "100");
    String template = "こんにちは:USERさん。早く:AMOUNT円返してください。";
    
    String message =  params.entrySet().stream()
            .reduce(template,
                    (t, e) -> t.replace(e.getKey(), e.getValue()), 
                    (l, r) -> l // dummy
            );
    System.out.println(message);
    // こんにちは佐藤さん。早く100円返してください。

List.2-34_Reversing a stream

    // Well, just for fun...
    Stream<String> reflux = stream
            .reduce(Stream.empty(),
                    (reversed, e) -> Stream.concat(Stream.of(e), reversed),
                    (l,r) -> null // dummy
             );

List.2-35_Building a polynomial function (by Horner's method)

    // Yes, of course this is text processing. Why do you ask?
    // f(x) = -4.0x^3 + 3.0x^2 + -2.0x + 1.0
    String coefficients = "-4.0, 3.0, -2.0, 1.0";

    DoubleUnaryOperator f = Stream.of(coefficients.split(",\\s*"))
            .map(Double::parseDouble)
            .reduce((double x) -> 0.0, 
                    (l, r) -> (double x) -> x * l.applyAsDouble(x) + r, 
                    (l, r) -> null // dummy
            );

    DoubleStream.iterate(0.0, x -> x + .01)
            .limit(101)
            .peek(x -> System.out.format("% 10.2f", x))
            .map(f)
            .forEach(y -> System.out.format("% 10.2f\n", y));

      0.00      1.00
      0.01      0.98
      0.02      0.96
      0.03      0.94
      0.04      0.92
      ...       ...
      0.99     -1.92
      1.00     -2.00

2.3. Converting

2.3.1. toArray()

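toArray() converts a Stream into an array of its elements.

Common sense says a Stream<String> should naturally yield an array of String (String[]), but what actually comes back, with a straight face, is an array of Object (Object[]) when called with no arguments.

The backstory, I gather, is that the curse of erasure, cast when Java was granted generics, wipes out all memory of the type.

To get an array of String, pass toArray() a function that creates a String[].

The standard idiom is to pass a constructor reference for an array of the element type, as in stream.toArray(String[]::new). It is a nuisance, but memorize it as a new incantation.

List.2-36_Basics of the toArray terminal operations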

Stream<String> stream = Stream.of("a", "b", ...);

// no arguments
Object[] arr = stream.toArray();

// array constructor reference
String[] arr = stream.toArray(String[]::new);

// lambda expression
String[] arr = stream.toArray((size) -> new String[size]);

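Code that manipulates arrays tends to be fiddly, so converting to a collection such as ArrayList to make life easier has long been common. Stream, too, may serve as a new handy tool for array manipulation.

List.2-37_Converting back and forth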


    // Stream -> array
    String[] arr = stream.toArray(String[]::new);
    
    // array -> Stream
    Stream<String> stream = Arrays.stream(arr);
    
    // Stream -> List
    List<String> list = stream.collect(Collectors.toList());
    
    // List -> Stream
    Stream<String> stream = list.stream();
    
    // List -> array
    String[] arr = list.toArray(new String[0]);
    
    // array -> List
    List<String> list = Arrays.asList(arr);

List.2-38_Converting the type of an array

    int[] intArr = {1, 2, 3};
    
    // int[] -> Integer[]
    Integer[] boxedArr = Arrays.stream(intArr)
            .mapToObj(i -> i)         // autoboxing
            .toArray(Integer[]::new);

    // int[] -> String[]
    String[] strArr = Arrays.stream(intArr)
            .mapToObj(String::valueOf)
            .toArray(String[]::new);

    // String[] -> int[]
    int[] intarr2 = Arrays.stream(strArr)
            .mapToInt(Integer::parseInt) // IntStream
            .toArray();

    // String[] -> CharSequence[] (array of an interface type)
    CharSequence[] seqArr = Arrays.stream(strArr)
            .toArray(CharSequence[]::new);

    // Object[] -> String[]
      Object[] objs = {1, "Hello", LocalDate.now()};
      String[] tostrs = Arrays.stream(objs)
              .map(Objects::toString)
              .toArray(String[]::new); // [1, Hello, 2015-01-01]

List.2-39_Array manipulation

    String[] arr1 = {"A", "B", "C"};
    String[] arr2 = {"L", "M", "N"};
    String[] arr3 = {"X", "Y", "Z"};
    
    String[] arr;
    
    // initialize
    arr = IntStream.range(0,  10)
            .mapToObj(i -> "")
            .toArray(String[]::new);    // [, , , , , , , , , ]
    // copy
    arr = Arrays.stream(arr1)
            .toArray(String[]::new);    // [A, B, C]
    // concatenate
    arr = Stream.of(arr1, arr2, arr3)
            .flatMap(Stream::of)
            .toArray(String[]::new);    // [A, B, C, L, M, N, X, Y, Z]
    // sub-array
    arr = Arrays.stream(arr, 1, 7)
            .toArray(String[]::new);    // [B, C, L, M, N, X]
    // partial replacement
    
    // partial removal
            
    // remove duplicates
    arr = Arrays.stream(arr)
             .distinct()
             .toArray(String[]::new);
    // prepend
    arr = Stream.of(new String[]{"あ"}, arr)
            .flatMap(Stream::of)
            .toArray(String[]::new);
    // remove the first element
    arr = Arrays.stream(arr)
            .skip(1)
            .toArray(String[]::new);
    // append
    arr = Stream.of(arr1, new String[]{"あ"})
            .flatMap(Stream::of)
            .toArray(String[]::new);
    // remove the last element
    arr = Arrays.stream(arr)
             .limit(arr.length -1)
             .toArray(String[]::new);

    // get one column of a 2D array
    String[][] matrix = {arr1, arr2, arr3};
    int col = 0;
    arr = Stream.of(matrix)
            .map(a -> a[col])
            .toArray(String[]::new);    // [A, L, X]

List.2-40_Converting CSV to a 2D array

    List<String> lines = Arrays.asList("aaa,bbb,ccc", "ddd,eee,fff");
    String[][] matrix = lines.stream()
            .map(l -> l.split(","))
            .toArray(String[][]::new);

    System.out.println(Arrays.deepToString(matrix));
    // [[aaa, bbb, ccc], [ddd, eee, fff]]

List.2-41_Converting to a heterogeneous array

    Pattern p = Pattern.compile(
            "^((?<d>[-+]?\\d+)|(?<f>[-+]?\\d*\\.\\d+([e][-+]?\\d+)?)|(?<b>(true|false)))$",
            Pattern.CASE_INSENSITIVE);
    
    String data = "123, 10000000.0, -.01E-5, FALSE, OK";
    Object[] arr = Stream.of(data.split(",\\s*"))
            .map(s -> {
                Matcher m = p.matcher(s);
                if (m.find()) {
                    if (m.group("d") != null) return Integer.parseInt(s);
                    if (m.group("f") != null) return Double.parseDouble(s);
                    if (m.group("b") != null) return Boolean.parseBoolean(s);
                } 
                return s;
            }).toArray();
    
    for (Object o : arr) {
        System.out.println("" + o + "\t" + o.getClass());
    }

123     class java.lang.Integer
1.0E7   class java.lang.Double
-1.0E-7 class java.lang.Double
false   class java.lang.Boolean
OK      class java.lang.String

2.3.2. collect()

If you filter a List<String>, you would naturally expect to get a List<String> back, but Stream is not that considerate.

    List<String> filtered = list.stream()
            .filter(s -> s.startsWith("a"))
            .collect(Collectors.toList());

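I am in no position to say out loud the word that Java programmers across the country must have swallowed when they first saw this in 2014. After all, I live in Saitama.

To convert the result of a Stream into a collection, you have to tell the terminal operation collect() how to build it.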

<R,A> R collect(Collector<? super T,A,R> collector)
<R> R collect(Supplier<R> supplier, BiConsumer<R,? super T> accumulator, BiConsumer<R,R> combiner)

List.2-42_Basics of the collect terminal operations


    Stream<String> stream = ... ;
    List<String> result;
    
    // collect(1)
    result = stream.collect(Collectors.toList());

    // collect(3) expanded into lambda expressions
    result = stream.collect(
            // supplier
            () -> {
                return new ArrayList<>();
            },
            // accumulator
            (List<String> l, String e) -> { 
                l.add(e);
            },
            // combiner 
            (List<String> l, List<String> r) -> {  // BiConsumer<R,R>
                l.addAll(r);
            }
    );
        
    // collect(3) with method references
    result = stream.collect(
            ArrayList::new, // supplier
            List::add,      // accumulator      
            List::addAll    // combiner 
    );
        
    // collect(1) + Collector.of()
    result = stream.collect(
            Collector.of(
                    // supplier
                    () -> new ArrayList<>(),                
                    // accumulator
                    (List<String> l, String e) -> l.add(e),
                    // combiner
                    (List<String> l, List<String> r) -> {  // BinaryOperator<A>
                        l.addAll(r); 
                        return l;
                    },
                    // finisher
                    (List<String> l) -> Collections.unmodifiableList(l)
            )
    );

    // collect(1) + Collector.of() with method references
    String joined = stream.collect(
            Collector.of(
                    StringBuilder::new,     // supplier
                    StringBuilder::append,  // accumulator
                    StringBuilder::append,  // combiner
                    StringBuilder::toString // finisher
            ));

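It is every bit as complicated as reduce.

I do not know whether passing three or four functions is normal practice in functional programming. From an object-oriented point of view, though, it looks as if methods that ought to be encapsulated in a class (creating and initializing an object, operating on its internals, operating between objects) are implemented exposed and scattered, with the responsibility for their consistency dumped on whoever happens to be writing the code.

It is as though object orientation had been turned inside out, which feels rather like a splatter film.

In Java you would of course want to gather these into a class. The interface java.util.stream.Collector sounds from its name as if it collects something and looks like just the thing, but it disappoints.

Collector is not meant for classes that implement the collecting itself; it is merely a holder for the functions given to collect(). You prepare a Collector loaded with the set of functions that define how to convert to a collection, plug it into a method that accepts a Collector, such as collect(1), and the functions get sucked right out of it.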

You could call this design pattern a "cartridge interface" or a "function feeder", but I do not know what it is actually called in the functional programming world.
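
To make the "holder of functions" idea concrete, the sketch below pulls the functions out of a Collector by hand and drives them with an ordinary loop. It is only a rough picture of what a sequential collect(1) does, not the JDK's actual implementation; CollectorByHandSketch and collectByHand are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class: using a Collector without any Stream.
class CollectorByHandSketch {

    static <T, A, R> R collectByHand(Iterable<T> source, Collector<? super T, A, R> collector) {
        A container = collector.supplier().get();                   // create the container
        for (T element : source) {
            collector.accumulator().accept(container, element);     // feed each element
        }
        return collector.finisher().apply(container);               // final conversion
    }

    static String joinDemo() {
        List<String> letters = Arrays.asList("a", "b", "c");
        return collectByHand(letters, Collectors.joining("-"));
    }

    static List<String> listDemo() {
        List<String> letters = Arrays.asList("a", "b", "c");
        return collectByHand(letters, Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(joinDemo());
        System.out.println(listDemo());
    }
}
```

The combiner is never touched here because there is only one container; it comes into play when a parallel stream builds several containers and has to merge them.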

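The Stream API already ships many general-purpose Collector implementations. However, there are no public classes that implement Collector; instead they are gathered as a collection of static factory methods in its companion class, java.util.stream.Collectors.

With the Java API design of the past this would have become a flood of classes, but instead everything hangs off the single Collectors class like keys on a key ring. Even Kagi Baasan, the key-carrying granny of the children's books, would be amazed.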

Java Collectorメモ(Hishidama's Java8 Collector Memo)

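Sorting those keys out gives the following.

Collection builders
  toList()
  toMap() / toConcurrentMap()
  toSet()
  toCollection()
Classifiers
  groupingBy() / groupingByConcurrent()
  partitioningBy()
Transformers
  mapping()
Aggregators
  counting()
  joining()
  maxBy()/minBy()
  summarizingXxx()
  summingXxx()
  reducing()

All of these return a Collector, and what is more, they can be combined.

There is no need whatsoever to understand and master all of them, and straining to use them all only produces code that is harder to follow. For a start, it is enough to learn the basic usages as idioms.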

List.2-43_Basics of the collect terminal operations

// Statically importing Collectors lets you omit the class name.
// Depending on IDE settings, the wildcard (*) may not be allowed.
import static java.util.stream.Collectors.*;

    Stream<String> stream = ... ;
        
    // convert to a string (concatenate)
    String text = stream.collect(joining());
        
    // convert to a string (with a delimiter)
    String csv = src.stream().collect(joining(", "));
        
    // convert to a List
    List<String> list = stream.collect(toList()); // ArrayList

    // convert to a specific List class
    List<String> list = stream
            .collect(toCollection(LinkedList::new)); // LinkedList

    // convert to a Set
    Set<String> set = stream.collect(toSet());  // HashSet

    // convert to a specific Set class
    SortedSet<String> set = stream
            .collect(toCollection(TreeSet::new)); // TreeSet, sorted

    LinkedHashSet<String> set = stream
            .collect(toCollection(LinkedHashSet::new)); // LinkedHashSet, keeps element order

    // convert to a Map
    // id -> object
    Map<Integer, User> map = users.stream()
            .collect(toMap(
                    e -> e.getId(),  // key; a duplicate throws an exception
                    e -> e           // value
            )); // HashMap

    // id -> name
    Map<Integer, String> map = users.stream()
            .collect(toMap(User::getId, User::getName));
    
    // convert to a specific Map class
    SortedMap<Integer, User> map = users.stream()
            .collect(toMap(
                    e -> e.getId(),
                    e -> e,
                    (l, r) -> r, // overwrite on duplicate keys
                    TreeMap::new
            )); // TreeMap

List.2-44_Converting an object to CSV

    Item item = ... ;
    String qcsv = Stream.of(
                item.getMaker(),   // String
                item.getName(),    // String
                item.getPrice(),   // Integer
                item.getMfgDate()  // LocalDate
            )
            .map(Objects::toString)
            .map(s -> s.replace("'", "\\'"))
            .collect(Collectors.joining("', '", "'", "'"));
    // 'Gyahtol\'s Foods', 'あの肉(L)', '100', '2015-01-01'

List.2-45_Inverting a Map

    // fails if any value is duplicated
    Map<String, String> flip = map.entrySet().stream()
            .collect(Collectors.toMap(e -> e.getValue(), e -> e.getKey()));

List.2-46_Counting word frequencies

import static java.util.stream.Collectors.*;

    String text = "Humpty Dumpty sat on a wall, "
                + "Humpty Dumpty had a great fall. " 
                + "All the king's horses and all the king's men "
                + "Couldn't put Humpty together again. ";

    String[] words = text.toLowerCase().split("[.,]?\\s+");
       
    // word -> count
    Map<String, Long> freqs = Stream.of(words)
            .collect(groupingBy(w -> w, counting()));
    System.out.println(freqs);
    // {all=2, a=2, again=1, sat=1, couldn't=1, had=1, great=1, put=1, humpty=3, the=2,
    //  dumpty=2, king's=2, fall=1, and=1, men=1, wall=1, together=1, horses=1, on=1}

    // count -> words
    Map<Long, Set<String>> dist = freqs.entrySet().stream()
            .collect(groupingBy(e -> e.getValue(),
                    TreeMap::new, // sorted
                    mapping(e -> e.getKey(), 
                            toCollection(TreeSet::new)))); // sorted
    System.out.println(dist);
    // {1=[again, and, couldn't, fall, great, had, horses, men, on, put, sat, together, wall], 
    //  2=[a, all, dumpty, king's, the], 
    //  3=[humpty]}

List.2-47_Word occurrence probability

import static java.util.stream.Collectors.*;
    List<String> words = ... ;

    Map<String, Double> p = words.stream()
            .collect(groupingBy(w -> w,
                    collectingAndThen(counting(), c -> (double)c / words.size())));

List.2-48_Vocabulary (unique words)

    SortedSet<String> vocabulary = words.stream()
            .collect(Collectors.toCollection(TreeSet::new));

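Combine the methods of the Collectors class well and you can express fairly complex transformations compactly. That is easier said than done, though; it is territory where a talent for puzzles matters more than a talent for programming. Try it and you will see: type inference bursting into tears will give you plenty of trouble.

What is more, the freedom that Collectors offers has little to do with the original Stream paradigm. The Collector interface and the Collectors class look like a library with its own independent concept and design philosophy, enough that it could be called the Collector API or the Transform API.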

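List.2-49_Extracting the record with the maximum value in each group

// Some problems are hard to express in SQL.
// For example, "extract the date and store with the highest sales for each month"
// looks easy and turns out to be a surprising hassle. (Especially you, MySQL.)
// Some Collectors look like they can fill that gap.
// Of course, those in the know just use an Excel pivot table.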

import static java.util.stream.Collectors.*;
import static java.util.Comparator.comparing;
...

    // Batting records of baseball players
    List<Player> players = ... ;

    // Extract the player with the highest batting average on each team.
    Map<String, Player> topHitters = players.stream()
           .collect(groupingBy(
                   Player::getTeam,
                   collectingAndThen(
                           maxBy(comparing(Player::getBattingAverage)),
                           Optional::get)));
    
    // Split players into those batting .300 or better and those below, then count them per team
    Map<Boolean, Map<String, Long>> hitterCounts = players.stream()
            .collect(partitioningBy(
                    e -> e.getBattingAverage() >= .300,
                    groupingBy(
                            Player::getTeam,
                            counting())));

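When you need a special transformation that the Collectors class cannot cover, you can pass your own functions to the three-argument collect() or implement a custom Collector class. That drifts quite far from text processing, though, so I will not go any deeper here.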

Java8のCollectorをfor文と比較しながら一から作って理解する - Qiita
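
For orientation only, the three-argument collect() takes a supplier, an accumulator and a combiner. The sketch below shows it next to the same reduction wrapped with Collector.of(). The class name Collect3 and the helper names are hypothetical; the StringBuilder pattern itself follows the example in the Stream.collect() Javadoc.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

public class Collect3 {

    // collect(supplier, accumulator, combiner): concatenates all words into one String.
    static String concat(Stream<String> words) {
        return words
                .collect(StringBuilder::new,      // creates an empty result container
                         StringBuilder::append,   // adds one element to a container
                         StringBuilder::append)   // merges two containers (needed for parallel streams)
                .toString();
    }

    // The same reduction packaged as a reusable Collector.
    static Collector<String, StringBuilder, String> concatenating() {
        return Collector.of(
                StringBuilder::new,
                StringBuilder::append,
                StringBuilder::append,
                StringBuilder::toString);   // finisher: container -> final result
    }

    public static void main(String[] args) {
        System.out.println(concat(Stream.of("a", "b", "c")));                  // abc
        System.out.println(Stream.of("a", "b", "c").collect(concatenating())); // abc
    }
}
```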

2.4. Output

2.4.1. forEach() / forEachOrdered()

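forEach() combined with lambda expressions provides the internal-iterator syntax that Java programmers had long wished for.

In Java 8, forEach() was added as a default method of the Iterable interface, which makes it available on Collection types such as List and Set.
In some circles, forEach() is recommended over the old-fashioned for statement.

Stream, meanwhile, has a similar forEach() method as a terminal operation.
On a Stream, however, forEach() is not recommended; it is positioned as something you use reluctantly, in limited situations.
Some even say that using forEach() means you have "lost".

One reason is probably that forEach() is an operation whose whole purpose is side effects.

The function (action) passed to forEach() cannot return a value; all it can do is change state outside itself, such as writing to standard output or updating an external object.
This is called a side effect, and in functional programming it is regarded as a depraved act that defiles the paradigm.

Of course, that is no concern of us secular programmers.
If it works, we use it.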
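
To make the distinction concrete, here is a small sketch that builds the same list twice: once by mutating an outside list from forEach(), and once by letting collect() return the result. The class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SideEffect {

    // Side-effect style: the lambda mutates a list that lives outside the pipeline.
    static List<String> upperWithForEach(List<String> src) {
        List<String> result = new ArrayList<>();
        src.stream()
                .map(String::toUpperCase)
                .forEach(s -> {
                    result.add(s);   // would not be thread-safe on a parallel stream
                });
        return result;
    }

    // Side-effect-free style: the terminal operation itself produces the result.
    static List<String> upperWithCollect(List<String> src) {
        return src.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> src = Arrays.asList("a", "b", "c");
        System.out.println(upperWithForEach(src));  // [A, B, C]
        System.out.println(upperWithCollect(src));  // [A, B, C]
    }
}
```

Both work on a sequential stream; the second simply leaves nothing outside the pipeline to go wrong.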

List.2-50_Basics of forEach()

    List<String> list = Arrays.asList("a", "b", "c");

    // Use a block to make it explicit that
    // Stream's forEach() is a side effect
    list.stream()
            .forEach(s -> {
                System.out.println(s);
            });

    // Without a block it looks, at a glance, as if it returned a value
    list.stream().forEach(s -> System.out.println(s));

    // A method reference in forEach()? How presumptuous
    list.stream().forEach(System.out::println);

    // forEach() of Iterable
    list.forEach(s -> {
        System.out.println(s);
    });

    // The equivalent enhanced for statement
    for (String s : list) {
        System.out.println(s);
    }

    Map<String, String> map = new HashMap<>();

    // Map actually has one too
    map.forEach((key, val) -> {
        System.out.format("%s=%s\n", key, val);
    });

    // If you want a Stream on a Map as well
    map.entrySet().stream()
            .forEach(e -> {
                System.out.format("%s=%s\n", e.getKey(), e.getValue());
            });

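Once you actually use forEach(), though, the restrictions that come with lambda expressions kick in, and it is not a drop-in replacement for the for statement.

You cannot assign to local variables of the enclosing scope (they are treated as final).
You cannot interrupt processing (break).
You cannot throw checked exceptions.

The last two aside, the first one hurts in a quiet way.
For example, merely numbering the output takes effort.
This is why people say Java's lambda expressions are not closures.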

Java 8 forEach with index - Stack Overflow
ラムダ式や無名クラスのメソッドから、外部に値を渡す方法 - Qiita
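
As a rough sketch of the usual workarounds for the index problem: either stream over the indices instead of the elements, or keep the counter inside a mutable object so the local variable itself stays effectively final. The class name IndexedOutput and both method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class IndexedOutput {

    // Workaround 1: stream over the indices and look each element up by index.
    static List<String> numberByIndex(List<String> lines) {
        return IntStream.range(0, lines.size())
                .mapToObj(i -> (i + 1) + ":" + lines.get(i))
                .collect(Collectors.toList());
    }

    // Workaround 2: the variable n is never reassigned; only the object it refers to changes.
    static List<String> numberByCounter(List<String> lines) {
        AtomicInteger n = new AtomicInteger();
        List<String> out = new ArrayList<>();
        lines.stream()
                .forEachOrdered(line -> {   // sequential use only: numbering relies on encounter order
                    out.add(n.incrementAndGet() + ":" + line);
                });
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Fizz", "Buzz");
        System.out.println(numberByIndex(lines));    // [1:Fizz, 2:Buzz]
        System.out.println(numberByCounter(lines));  // [1:Fizz, 2:Buzz]
    }
}
```

The first form only suits sources with cheap random access, such as an ArrayList or an array; the second works for any sequential stream but reintroduces a side effect.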

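List.2-51_Search with line numbers

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;

public class GrepN {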
    public static void main(String[] args) throws IOException {
        String keyword = args[0];
        
        LineNumberReader reader = 
                new LineNumberReader(new InputStreamReader(System.in));
        reader.lines()
                .filter(line -> line.contains(keyword))
                .forEach(line -> {
                    System.out
                            .format("%d:%s", reader.getLineNumber(), line)
                            .println();
                });
    }
}

$ java GrepN Buzz < fizzbuzz.txt
5:Buzz
10:Buzz
15:Fizz Buzz
20:Buzz
25:Buzz
30:Fizz Buzz
35:Buzz
...

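List.2-52_Showing the last lines of a text file (part 4)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.Objects;
import java.util.stream.Stream;

// Version that skips the dry-run read for counting, so it stays comfortable even on heavy files
public class Tail4 {
    public static void main(String[] args) throws IOException {
        final int nLines = Integer.parseInt(args[0]);
        final String fileName = args[1];

        // Buffer list, pre-filled with nLines nulls
        final LinkedList<String> tail = new LinkedList<>(Arrays.asList(new String[nLines]));
        // try-with-resources closes the file opened by Files.lines()
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            lines.forEach(line -> {
                // Even when the local variable is final, the object it refers to can still be modified.
                tail.addLast(line);
                tail.removeFirst();
            });
        }

        tail.stream()
                .filter(Objects::nonNull)
                .forEach(line -> {
                    System.out.println(line);
                });
    }
}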

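List.2-53_Tallying text files

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Objects;
import java.util.function.Consumer;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class Wc {
    final String name;
    int lineCount, wordCount, charCount; // instance variables
    static int maxLineLength;            // class variable
    
    public Wc(String name) { this.name = name; }
    
    public Consumer<String> getCounter() {
        // Local variable outside the lambda expression
        Pattern space = Pattern.compile("\\s+");

        return (line) -> {

            // A local variable of the enclosing method can be read,
            // but not assigned to (it is treated as final)
            wordCount += (int) space.splitAsStream(line).count();
           
            // The lambda's own parameter can be reassigned as usual
            line += "\n"; 
            charCount += (int) line.codePoints().count();

            // Instance variables can be modified
            lineCount ++;
            
            // Class variables can be modified too
            maxLineLength = Math.max(maxLineLength, line.length());
        };
    }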
        
    public Wc add(Wc wc) {
        this.lineCount += wc.lineCount;
        this.wordCount += wc.wordCount;
        this.charCount += wc.charCount;
        return this;
    }
    
    public static void main(String[] args) throws IOException {
        
        Consumer<Wc> print = wc -> 
                System.out.format("%7d %7d %7d %s\n", wc.lineCount, wc.wordCount, wc.charCount, wc.name);
        
        Wc total = Stream.of(args)
            .map(Wc::new)
            .map(wc -> {
                try (Stream<String> lines = Files.lines(Paths.get(wc.name))) {
                    lines.forEach(wc.getCounter());
                    return wc;
                } catch (IOException e) {
                    System.err.println(wc.name + ": No such file or directory");
                    return null;
                }
            })
            .filter(Objects::nonNull)
            .peek(print)
            .reduce(new Wc("total"), Wc::add);
        
        if (args.length > 1) {
            print.accept(total);
        }
    }
}

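On a parallel Stream, the function passed to forEach() is not called in element order; the calls come out scattered.
On top of that they are not synchronized, so the output gets garbled.

Using forEachOrdered() instead of forEach() keeps element order in the output while the intermediate operations still run in parallel.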

List.2-54_Effect of forEachOrdered()

    IntStream.range(1, 20)
            .parallel() // parallel processing
            .peek(i -> {
                System.out
                        .format("%02d%"+ i + "s%s", i, "", "★")
                        .println();
            })
         // .forEach(i -> {
            .forEachOrdered(i -> {
                System.out
                        .format("%02d%"+ i + "s%s", i, "", "-=☆")
                        .println();
            });

Of course, the pattern differs depending on the execution environment.
12            ★06      ★
03   ★
17                 ★
04    ★
05     ★

08        ★
02  ★
19                   ★
01 ★
09         ★
14              ★
07       ★
01 -=☆
18                  ★
02  -=☆
16                ★
13             ★
11           ★
03   -=☆
04    -=☆
15               ★
05     -=☆
10          ★
06      -=☆
07       -=☆
08        -=☆
09         -=☆
10          -=☆
11           -=☆
12            -=☆
13             -=☆
14              -=☆
15               -=☆
16                -=☆
17                 -=☆
18                  -=☆
19                   -=☆

List.2-55_Measuring web server latency

    String[] urls = {
            "http://qiita.com/",
            "http://goo.gl/z5AEEC",
            "http://localhost/1.html",
            "http://localhost/2.html",
            ...
            "http://localhost/10.html"
    };
    
    Stream.of(urls)
            .parallel()
            .forEach(url -> {
                try {
                    HttpURLConnection client = (HttpURLConnection) new URL(url).openConnection();
                    client.setRequestMethod("HEAD");

                    long ms = System.currentTimeMillis();
                    int code = client.getResponseCode();
                    ms = System.currentTimeMillis() - ms;

                    synchronized (System.out) {
                        System.out
                            .format("%2d %6d %d %s", Thread.currentThread().getId(), ms, code, url)
                            .println();
                    }
                } catch (IOException e) {
                    System.err.println(e.getMessage() + " : " + url);
                }
            });

The number of threads is fixed and they are reused.
11     10 404 http://localhost/2.html
12     10 404 http://localhost/9.html
 1     10 404 http://localhost/6.html
12      3 404 http://localhost/10.html
 1      3 404 http://localhost/7.html
12      1 404 http://localhost/8.html
11      6 404 http://localhost/4.html
 1      3 404 http://localhost/5.html
12      3 404 http://localhost/3.html
 1      2 404 http://localhost/1.html
13    129 301 http://goo.gl/z5AEEC
11     92 301 http://qiita.com/
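
The handful of thread IDs that repeat in the output above is consistent with parallel streams running on the shared ForkJoinPool.commonPool() together with the calling thread. The following is only a sketch for checking which threads take part on a given machine; the class name PoolThreads is made up, and the set of names it prints will vary with the environment.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class PoolThreads {

    // Runs a parallel stream over n elements and records the name of every thread that handled one.
    static Set<String> threadNames(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet();   // thread-safe set
        IntStream.range(0, n)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("threads used: " + threadNames(1000));
    }
}
```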

2.4.2. iterator()

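Expecting side effects from Java's lambda expressions is what strains things.
It would be enough to run the loop outside the Stream, without a lambda expression.

By some act of mercy, Stream provides iterator(), which allows element access through an external iterator.

The API documentation describes this as an "escape hatch", which bugs me a little.
Oh well. With this, a Stream can now be looped over with an enhanced for statement...
Argh, no it can't!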

    Stream<String> stream = ... ;

    for (String s : stream) {
        System.out.println(s);
    }
    // java.lang.Error: Unresolved compilation problem: 
    // Can only iterate over an array or an instance of java.lang.Iterable

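What the enhanced for statement accepts is the iterator() of the java.lang.Iterable interface.
Stream has an iterator() method but is not an Iterable.
A self-styled iterator, so to speak.

Am I really supposed to grab the iterator into a variable and painstakingly write hasNext()/next() loops at this point?
It feels like nothing but petty harassment.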
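
For reference, that hand-written loop looks roughly like the sketch below. The class name IteratorLoop and the stop-at-first-empty-string rule are assumptions chosen only to show that break works here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

public class IteratorLoop {

    // Drains a stream through its external iterator, stopping at the first empty string.
    static List<String> untilBlank(Stream<String> stream) {
        List<String> taken = new ArrayList<>();
        Iterator<String> it = stream.iterator();   // terminal operation
        while (it.hasNext()) {
            String s = it.next();
            if (s.isEmpty()) {
                break;          // an ordinary loop, so break is available
            }
            taken.add(s);       // assigning to outer state is no problem either
        }
        return taken;
    }

    public static void main(String[] args) {
        System.out.println(untilBlank(Stream.of("a", "b", "", "c")));  // [a, b]
    }
}
```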

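Hmm...

Look closely and the definition of Iterable happens to have only one abstract method, iterator().
It does not carry @FunctionalInterface, but it satisfies the requirements of a functional interface.
So wherever an Iterable is expected, it should be possible to use a lambda expression or a method reference instead.

Take that!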

List.2-56_Using a Stream in an enhanced for statement

    Stream<String> stream = ... ;

    for (String s : (Iterable<String>) stream::iterator) {
        System.out.println(s);
    }

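I win. (T^T)

Whether it is worth going that far is another matter, but it does no harm to know it as an idiom.
Note, though, that there is no guarantee it will survive future changes to the Iterable interface specification.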
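
One more caveat worth a sketch: an Iterable made from stream::iterator is good for a single pass only, because every call to iterator() is a fresh terminal operation on the same Stream. The class name OneShotIterable is hypothetical.

```java
import java.util.stream.Stream;

public class OneShotIterable {

    // Returns true if a second pass over the stream-backed Iterable fails.
    static boolean secondPassFails() {
        Stream<String> stream = Stream.of("a", "b", "c");
        Iterable<String> once = stream::iterator;

        for (String s : once) {       // first pass: consumes the stream
            System.out.println(s);
        }
        try {
            for (String s : once) {   // second pass: calls stream.iterator() again
                System.out.println(s);
            }
            return false;
        } catch (IllegalStateException e) {
            // stream has already been operated upon or closed
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondPassFails());  // true
    }
}
```

So the cast is fine for a loop that runs once, but such an Iterable should not be handed to code that may iterate twice.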

List.2-57_Inverted index of bigrams (for Aozora Bunko)

// This was too long for a sample, so I have withdrawn it and substituted 2-58.

List.2-58_Removing comments from source code

    Path src = Paths.get(file);
    Path dst = Paths.get(file + ".nocomment");
    Path tmp = Files.createTempFile(this.getClass().getName(), ".tmp");
            
    try (Stream<String> lines = Files.lines(src);
        BufferedWriter bw = Files.newBufferedWriter(tmp);
    ) {
        Iterator<String> iter = lines
                .map(l -> l.replaceAll("/\\*.*?\\*/", "")) /* remove block comments contained in one line */
                .map(l -> l.replaceFirst("//.*", ""))      // remove line comments
                .flatMap(l -> {
                    // Break the line before and after block comment delimiters
                    return Stream.of(l.split("(?=/\\*)|(?<=\\*/)"));
                })
                .filter(l -> !l.matches("\\s+"))          // also drop lines that contain only whitespace
                .iterator();

        // State can be kept
        boolean isComment = false;
        for (String line : (Iterable<String>) () -> iter) {
            if (isComment) {
                if (line.contains("*/")) {
                    isComment = false;
                }
                continue;                       // continue is OK
            } else {
                if (line.contains("/*")) {
                    isComment = true;
                    continue;                   // continue as much as you like
                } else if (line.contains("*/")) {
                    System.err.println("Mismatched block comment");
                    break;                      // break as much as you like
                }
            }
            // throw checked exceptions as much as you like
            bw.write(line);
            bw.newLine();
            bw.flush();
        }
        bw.close();
        Files.move(tmp, dst);
    }

To be continued

Java 8 Stream API にテキストを流してみて（中間操作編）- Qiita

References

java.util.stream (Java Platform SE 8 API仕様)
http://docs.oracle.com/javase/jp/8/api/index.html?java/util/stream/package-summary.html
Java Streamメモ(Hishidama's Java8 Stream Memo)
http://www.ne.jp/asahi/hishidama/home/tech/java/stream.html
Java Collectorメモ(Hishidama's Java8 Collector Memo)
http://www.ne.jp/asahi/hishidama/home/tech/java/collector.html
Java Streamサンプル(Hishidama's Java8 Stream Example)
http://www.ne.jp/asahi/hishidama/home/tech/java/stream_example.html
Java8 Stream APIの基本(6) - 終端操作の概要 - エンタープライズギークス (Enterprise Geeks)
http://enterprisegeeks.hatenablog.com/entry/2014/05/27/183000
Java8 Stream APIの基本(7) - 終端操作2(Stream#collect) - エンタープライズギークス (Enterprise Geeks)
http://enterprisegeeks.hatenablog.com/entry/2014/06/19/093000
Reduction (The Java™ Tutorials > Collections > Aggregate Operations)
https://docs.oracle.com/javase/tutorial/collections/streams/reduction.html
Java 8 Friday: 10 Subtle Mistakes When Using the Streams API
http://blog.jooq.org/2014/06/13/java-8-friday-10-subtle-mistakes-when-using-the-streams-api/#comments
Java SE 8 Lambda & Stream API Overview from History
http://www.slideshare.net/OracleMiddleJP/java-se-8-overview-from-history
Why are Java Streams once-off? - Stack Overflow
http://stackoverflow.com/questions/28459498/why-are-java-streams-once-off
Java8のlambda構文がどのようにクロージャーではないか - きしだのはてな
http://d.hatena.ne.jp/nowokay/20130522
forEach書いたら負け、for文禁止 - torutkの日記
http://d.hatena.ne.jp/torutk/20140518/p1
Java8 Streamでバリバリやれるようになりたい人のためのFunctional Interfaceまとめ - mike-neckのブログ
http://mike-neck.hatenadiary.com/entry/2014/08/01/132527
Java8での文字列連結 - Qiita
http://qiita.com/lonerydeveloper/items/9f7c977c039ad4d24d30
Javaで文字列の類似度を測るライブラリの紹介 - Qiita
http://qiita.com/ssaito/items/aae5522618d4296c5178
Flesch–Kincaid readability tests - Wikipedia
https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
Readability-Score.com
https://readability-score.com/
069.音節 - 英語で悩むあなたのために
http://roundsquaretriangle.web.fc2.com/text/002_6.html
