More than 5 years have passed since last update.

Fess Advent Calendar 2016

Day 9

Fess: Indexing Only Specific HTML Tags


Fess 10.3
This article shows how to put only specific HTML tags into a document's content field.

Procedure

  1. Edit app/WEB-INF/classes/fess_config.properties
  2. Restart Fess
  3. Run a crawl
fess_config.properties
# html
crawler.document.html.content.xpath=//BODY
crawler.document.html.lang.xpath=//HTML/@lang
crawler.document.html.digest.xpath=//META[@name='description']/@content
crawler.document.html.cannonical.xpath=//LINK[@rel='canonical']/@href
crawler.document.html.pruned.tags=noscript,script,style
crawler.document.html.max.digest.length=200

Specifying HTML tags

HTML tags are specified with XPath.
XML Path Language (XPath)

By default, the tag whose text goes into content is set by
crawler.document.html.content.xpath=//BODY
which selects the body tag.
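
To illustrate what the default settings amount to, here is a minimal sketch that selects the body element and drops the pruned tags (noscript, script, style) before collecting text. It uses Python's standard xml.etree.ElementTree as a stand-in and is not how Fess itself is implemented. The sample markup and the function name extract_text are assumptions made for this example. ElementTree matches tag names case-sensitively, so the sketch writes the path in lowercase to match the sample markup, while the Fess setting keeps the uppercase form shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from the article).
SAMPLE = """<html lang="ja">
  <head><title>Sample</title><style>p { color: red; }</style></head>
  <body>
    <p>Hello Fess</p>
    <script>var x = 1;</script>
    <noscript>Enable JavaScript</noscript>
  </body>
</html>"""

# Mirrors crawler.document.html.pruned.tags=noscript,script,style
PRUNED_TAGS = ("noscript", "script", "style")


def extract_text(markup, path, pruned_tags=PRUNED_TAGS):
    """Return whitespace-normalized text of the elements matched by path."""
    root = ET.fromstring(markup)
    # Remove pruned elements from their parents. Text stored in a removed
    # element's .tail is dropped too; the sample only has whitespace there.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in pruned_tags:
                parent.remove(child)
    parts = []
    for element in root.findall(path):
        parts.append(" ".join("".join(element.itertext()).split()))
    return " ".join(parts)


# ".//body" plays the role of //BODY in the Fess setting.
print(extract_text(SAMPLE, ".//body"))  # prints "Hello Fess"
```

The script and noscript text never reaches the result because those elements are removed before the text is collected.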

Example: specifying class and id

To put <div class="document" id="doc1"> into content:

fess_config.properties
crawler.document.html.content.xpath=//DIV[@class='document'][@id='doc1']

Change the setting as shown above.
When HTML is crawled after restarting Fess,
the data inside the specified tag is stored in the document's content.
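
The effect of the class and id predicates can be checked outside Fess with a small sketch. It again uses Python's standard xml.etree.ElementTree rather than Fess's own parser, so it only illustrates how the XPath predicates narrow the selection. The page markup, the constant names, and the function select_content are hypothetical, and the path is written in lowercase because ElementTree is case-sensitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with several div elements (not taken from the article).
PAGE = """<html>
  <body>
    <div class="header">Site navigation</div>
    <div class="document" id="doc1"><p>Main article text.</p></div>
    <div class="document" id="doc2"><p>Another document.</p></div>
    <div class="footer">Copyright notice</div>
  </body>
</html>"""

# Lowercase counterpart of //DIV[@class='document'][@id='doc1']
CONTENT_XPATH = ".//div[@class='document'][@id='doc1']"


def select_content(markup, path):
    """Return whitespace-normalized text of every element matched by path."""
    root = ET.fromstring(markup)
    parts = []
    for element in root.findall(path):
        parts.append(" ".join("".join(element.itertext()).split()))
    return " ".join(parts)


print(select_content(PAGE, CONTENT_XPATH))  # prints "Main article text."
```

Both predicates must hold, so the doc2 div and the header and footer are left out. Note that @class='document' compares the whole attribute value, so an element with class="document wide" would not match this expression.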
