Converting regex search results into maps in Clojure

The goal is to convert the results of matching multiple regular expressions against multiple lines (of a file) into a map.

re-find

First, the simplest case: converting the result of re-find into a map.

re-findmap.clj
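;; re-find returns a plain string when the regex has no groups and a vector
;; when it has groups; wrap-vec normalizes both cases to a vector.
(defn wrap-vec [x] (if-not (sequential? x) [x] (vec x)))
;; Pair each key with the matching element (whole match first, then groups).
(defn re-find->map [re s & keys]
  (when-let [v (re-find re s)]
    (apply hash-map (interleave keys (wrap-vec v)))))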
user=> (re-find #".\d(.)" "some1234abc")
["e12" "2"]
user=> (re-find->map #".\d(.)" "some1234abc" :1 :2)
{:1 "e12", :2 "2"}
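
As a small illustration of why `wrap-vec` is needed: `re-find` returns a plain string when the pattern has no capturing groups, a vector when it has them, and nil when nothing matches. The sketch below repeats the two definitions above so that it runs on its own; the sample calls are made up for illustration.

```clojure
(defn wrap-vec [x] (if-not (sequential? x) [x] (vec x)))

(defn re-find->map [re s & keys]
  (when-let [v (re-find re s)]
    (apply hash-map (interleave keys (wrap-vec v)))))

;; No capturing group: re-find returns the string "1", wrap-vec makes it ["1"].
(re-find->map #"\d" "some1234abc" :digit)
;; => {:digit "1"}

;; No match: re-find returns nil, so the whole call returns nil.
(re-find->map #"X" "some1234abc" :x)
;; => nil

;; Fewer keys than elements: interleave stops at the shorter sequence.
(re-find->map #".\d(.)" "some1234abc" :whole)
;; => {:whole "e12"}
```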

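Building on this, we implement matching of multiple regular expressions against multiple lines.
For each regex, the first match found is the one stored in the result.
In the example below, regexes that have already matched are not tried again, and once every regex has matched, the remaining lines are skipped as well.

Note: this can also be implemented with reduce, but that turned out to be more complicated.
Note: without the skipping, it becomes about as simple as the re-seq example.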

re-find-linesmap.clj
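;; Transpose: (zipseq [1 2] [:a :b]) => ((1 :a) (2 :b))
(defn zipseq [& colls] (apply map list colls))
;; Pair each element with its index: ((0 x0) (1 x1) ...)
(defn indexed [coll] (zipseq (range) coll))

;; Apply the comparison f to the counts of the given collections.
(defn- count<> [colls f] (->> (map count colls) (apply f)))
(defn count<  [& colls] (count<> colls <))

(defn re-find-lines->map [lines & re-keys-pairs]
  {:pre [(even? (count re-keys-pairs))]}
  (let [rk-list (indexed (partition 2 re-keys-pairs))
        ;; mutable map from regex index to the result of its first match
        m (java.util.LinkedHashMap.)]
    (doseq [line lines
            [i [re keys]] rk-list
            :while (count< m rk-list)      ; stop trying once every regex has matched
            :when (not (contains? m i))    ; skip regexes that already matched
            :let [v (apply re-find->map re line (wrap-vec keys))]
            :when v] (.put m i v))
    ;; merge in reverse insertion order so earlier results win on duplicate keys
    (apply merge (reverse (vals m)))))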
user=> (re-find-lines->map ["some1234abc", "A9"] #".\d(.)" [:1 :2] #"A" :a)
{:a "A", :1 "e12", :2 "2"}
user=> (re-find-lines->map ["some1234abc", "A9"] #".\d(.)" [:1 :2] #"\d" :a)
{:a "1", :1 "e12", :2 "2"}
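
The note above says that dropping the skip logic makes this about as simple as the re-seq version. One possible sketch of that idea follows; `re-find-lines->map-simple` is a hypothetical name, not part of the original code. It tries every regex on every line and merges the results in reverse order, so that the earliest match for each key wins. The helper definitions are repeated so the block runs on its own.

```clojure
(defn wrap-vec [x] (if-not (sequential? x) [x] (vec x)))

(defn re-find->map [re s & keys]
  (when-let [v (re-find re s)]
    (apply hash-map (interleave keys (wrap-vec v)))))

;; Hypothetical no-skip variant: every regex is tried against every line.
(defn re-find-lines->map-simple [lines & re-keys-pairs]
  {:pre [(even? (count re-keys-pairs))]}
  (->> (for [line lines
             [re keys] (partition 2 re-keys-pairs)]
         (apply re-find->map re line (wrap-vec keys)))
       reverse            ; put later matches first so earlier ones overwrite them
       (apply merge)))

(re-find-lines->map-simple ["some1234abc" "A9"] #".\d(.)" [:1 :2] #"\d" :a)
;; => {:a "1", :1 "e12", :2 "2"}   (printed key order may vary)
```

For the sample inputs used here this should give the same maps as the skipping version, at the cost of running every regex against every line.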

re-seq

The re-seq case can be handled in much the same way.

re-seqmap.clj
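;; Collect all matches with re-seq, transpose them so that each position
;; (whole match, group 1, ...) becomes one sequence, then pair them with keys.
(defn re-seq->map [re s & keys]
  (when-let [v (re-seq re s)]
    (apply hash-map (interleave keys (apply zipseq (map wrap-vec v))))))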
user=> (re-seq #".\d(.)" "some1234abc")
(["e12" "2"] ["34a" "a"])
user=> (re-seq->map #".\d(.)" "some1234abc" :1 :2)
{:1 ("e12" "34a"), :2 ("2" "a")}
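
The key step in `re-seq->map` is the transposition done by `(apply zipseq ...)`: it turns a sequence of per-match vectors into one sequence per position. A minimal illustration using the same sample data as above (the `matches` var and the `zipmap` variant are only for illustration):

```clojure
(defn zipseq [& colls] (apply map list colls))

;; re-seq yields one vector per match: [whole-match group-1]
(def matches (re-seq #".\d(.)" "some1234abc"))
;; matches => (["e12" "2"] ["34a" "a"])

;; Transposing groups the values by position instead of by match.
(apply zipseq matches)
;; => (("e12" "34a") ("2" "a"))

;; zipmap is another way to pair keys with the transposed columns.
(zipmap [:1 :2] (apply zipseq matches))
;; => {:1 ("e12" "34a"), :2 ("2" "a")}
```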

Next, multiple lines and multiple regular expressions.
This is simpler than the re-find example.

re-seq-linesmap.clj
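;; Run every regex against every line and concatenate the results per key.
;; Pairs that do not match yield nil, which merge-with tolerates.
(defn re-seq-lines->map [lines & re-keys-pairs]
  {:pre [(even? (count re-keys-pairs))]}
  (->> (for [line lines [re keys] (partition 2 re-keys-pairs)]
         (apply re-seq->map re line (wrap-vec keys)))
       (apply merge-with concat)))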
user=> (re-seq-lines->map ["some1234abc", "A9"] #".\d(.)" [:1 :2] #"A" :a)
{:a ("A"), :1 ("e12" "34a"), :2 ("2" "a")}
user=> (re-seq-lines->map ["some1234abc", "A9"] #".\d(.)" [:1 :2] #"\d" :a)
{:a ("1" "2" "3" "4" "9"), :1 ("e12" "34a"), :2 ("2" "a")}
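
Since the stated goal is to process lines of a file, here is a rough sketch of how the function might be fed from a reader. `re-seq-source->map` is a hypothetical helper, not part of the original code; it assumes the source is anything `clojure.java.io/reader` accepts (a file path, a `java.io.File`, a `Reader`, and so on). The earlier definitions are repeated so the block runs on its own, and a `StringReader` stands in for a real file.

```clojure
(require '[clojure.java.io :as io])

(defn wrap-vec [x] (if-not (sequential? x) [x] (vec x)))
(defn zipseq [& colls] (apply map list colls))

(defn re-seq->map [re s & keys]
  (when-let [v (re-seq re s)]
    (apply hash-map (interleave keys (apply zipseq (map wrap-vec v))))))

(defn re-seq-lines->map [lines & re-keys-pairs]
  {:pre [(even? (count re-keys-pairs))]}
  (->> (for [line lines [re keys] (partition 2 re-keys-pairs)]
         (apply re-seq->map re line (wrap-vec keys)))
       (apply merge-with concat)))

;; Hypothetical helper: read lines from a source and match them.
;; merge-with consumes all lines before with-open closes the reader.
(defn re-seq-source->map [source & re-keys-pairs]
  (with-open [rdr (io/reader source)]
    (apply re-seq-lines->map (line-seq rdr) re-keys-pairs)))

;; A StringReader is used here in place of a file.
(re-seq-source->map (java.io.StringReader. "some1234abc\nA9")
                    #".\d(.)" [:1 :2] #"A" :a)
;; => {:a ("A"), :1 ("e12" "34a"), :2 ("2" "a")}   (printed key order may vary)
```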

This works as intended.
