Trying out Java class inheritance in Clojure

Clojure

Posted at 2014-09-29 (last updated at 2014-09-29)

I have been experimenting with extending a Java class from Clojure and ran into a number of pitfalls, so here are my notes.

Thanks to @tnoda_ for all the advice.

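Note: this is unfinished.
I will add to it and correct it as I go.
Note: for what it's worth, the Scala version is here (it works properly):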
https://github.com/FScoward/scraping

Note: the library this is based on:
https://code.google.com/p/crawler4j/

The class generated by gen-class gets the fully qualified class name given in :name.
A class declared with gen-class is not generated unless the ns containing the gen-class form is compiled. So project.clj needs :aot [scraping-clj.crawler] to make sure that ns is AOT compiled.
crawler.clj: Java does not allow multiple inheritance, so write :extends foo.bar.Baz, not :extends [foo.bar.Baz].
A function that overrides a non-static method must take this as its first parameter.
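
The notes above can be condensed into one small sketch. It deliberately does not use crawler4j: it extends java.lang.Object so that it needs no dependencies, and the namespace demo.greeter and the class name demo.Greeter are made up for illustration.

```clojure
(ns demo.greeter
  (:gen-class
   :name demo.Greeter           ; hypothetical fully qualified class name
   :main false
   :extends java.lang.Object))  ; a single class symbol, not a vector

;; Overrides the non-static method Object#toString.
;; The Java method takes no arguments, but the Clojure function
;; still receives the instance (`this`) as its first parameter.
(defn -toString [this]
  "hello from Greeter")
```

Assuming this ns is listed under :aot in project.clj, (str (demo.Greeter.)) should end up calling -toString. Without AOT compilation the gen-class clause does nothing, and -toString is just an ordinary function in demo.greeter.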

crawler.clj


(ns scraping-clj.crawler
  (:import [java.io PrintWriter File]
           [edu.uci.ics.crawler4j.crawler Page WebCrawler]
           [edu.uci.ics.crawler4j.url WebURL]
           [org.jsoup Jsoup])
  (:gen-class
   :name scraping-clj.crawler.Crawler
   :main false
   :extends edu.uci.ics.crawler4j.crawler.WebCrawler))

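;; Overrides WebCrawler#shouldVisit. `this` is the Crawler instance.
(defn -shouldVisit [this ^WebURL url]
  ;; Bind with let: a def inside defn would create a global var shared by all crawler threads.
  (let [href (-> url .getURL .toLowerCase)]
    (and (.startsWith href "http://ameblo.jp/takagakiayahi-blog/entry")
         (.endsWith href ".html"))))

;; Overrides WebCrawler#visit. Receives each page the crawler has fetched.
(defn -visit [this ^Page page]
  (println "ぱろぱろー2")
  (let [url (-> page .getWebURL .getURL)]
    (println url)))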
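
Because a gen-class method can only be exercised after AOT compilation, it may be handy to pull the URL check out into a plain function that can be tried at the REPL. The following is only a sketch of that idea: entry-prefix and entry-url? are hypothetical names that do not appear in the project, and the function takes the URL as a string instead of a WebURL.

```clojure
;; Hypothetical helper: the same check as -shouldVisit, on a plain string.
(def entry-prefix "http://ameblo.jp/takagakiayahi-blog/entry")

(defn entry-url?
  "True when url (a String) looks like a blog entry page."
  [^String url]
  (let [href (.toLowerCase url)]
    (and (.startsWith href ^String entry-prefix)
         (.endsWith href ".html"))))

(println (entry-url? "http://ameblo.jp/takagakiayahi-blog/entry-123.html"))
(println (entry-url? "http://ameblo.jp/takagakiayahi-blog/"))
```

With a helper like this, -shouldVisit could shrink to (entry-url? (.getURL url)), and the filtering rule can be checked without starting a crawl.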

project.clj


(defproject scraping-clj "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [edu.uci.ics/crawler4j "3.5"]
                 [org.jsoup/jsoup "1.7.2"]]
  :aot [scraping-clj.crawler]
  :main scraping-clj.core)

core.clj


(ns scraping-clj.core
  (:import (edu.uci.ics.crawler4j.crawler CrawlController CrawlConfig)
           (edu.uci.ics.crawler4j.fetcher PageFetcher)
           (edu.uci.ics.crawler4j.robotstxt RobotstxtServer RobotstxtConfig)
           (scraping-clj.crawler Crawler)))

(def crawl-storage "scrape")

(defn -main [& args]
  ;; Build everything locally with let instead of def-ing globals inside the function.
  (let [config (doto (CrawlConfig.)
                 (.setCrawlStorageFolder crawl-storage)
                 (.setPolitenessDelay 1000))
        page-fetcher (PageFetcher. config)
        robotstxt-server (RobotstxtServer. (RobotstxtConfig.) page-fetcher)
        controller (CrawlController. config page-fetcher robotstxt-server)]
    (.addSeed controller "http://ameblo.jp/takagakiayahi-blog/")
    ;; The second argument is the number of crawler threads.
    (.start controller Crawler 1)))
