
Cleanly extracting a list of links with JavaScript

Posted at 2019-09-14

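I needed to grab a quick list of a site's links in a situation that didn't really call for scraping, so I wrote a script.
If you want practical scraping or something more advanced, go ahead and consider a Chrome extension such as Scraper.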
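# Requirements
What "cleanly" means here:

  • Show each link's text and URL side by side, and copy the list to the clipboard
  • Remove duplicate URLs
  • Remove unnecessary line breaks from the link text
  • Allow narrowing the list down to specific domains
  • Don't use for statements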
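As a rough illustration of these rules, here is a sketch that applies them to plain `{ text, href }` objects instead of DOM nodes, so the logic can be tried in Node.js without a browser. `cleanLinks`, the `'No text'` placeholder, and the sample data are hypothetical. It uses a `Set` for de-duplication and `filter`/`map` chains instead of for statements.

```javascript
// Hypothetical helper: applies the "clean" rules to plain { text, href } objects
// so the logic can be tried outside the browser (no DOM, no for statements).
const cleanLinks = (links, targetWords) => {
  const seen = new Set();
  return links
    // Keep only links whose href contains at least one target word
    .filter(({ href }) => targetWords.some((word) => href.includes(word)))
    // Drop duplicate hrefs, keeping the first occurrence
    .filter(({ href }) => {
      if (seen.has(href)) return false;
      seen.add(href);
      return true;
    })
    // Remove line breaks from the text; use a placeholder when it is blank
    .map(({ text, href }) => {
      const label = text.trim() === '' ? 'No text' : text.replace(/\r?\n/g, '');
      return `${label}||${href}`;
    })
    .join('\r\n');
};

// Hypothetical sample input
const sample = [
  { text: 'Homepage', href: 'https://www.bbc.com/' },
  { text: 'Tech\nTech selected', href: 'https://www.bbc.com/news/technology' },
  { text: 'Home again', href: 'https://www.bbc.com/' },
  { text: '  ', href: 'https://www.bbc.com/news' },
  { text: 'Other site', href: 'https://example.com/' },
];
console.log(cleanLinks(sample, ['www.bbc.com']));
```

The duplicate `https://www.bbc.com/` and the `example.com` link are dropped, the line break in the second entry is removed, and the blank-text link gets the placeholder.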

# Code
This relies on copy(), a Console Utilities function of Chrome DevTools, so it assumes Chrome.
Paste it into the browser's Console and run it there.


// Change the search words as needed.
const targetLinkWords = ['www.bbc.com'];

const createLinkList = (el) => {
  const existsList = new Set();
  const lines = [];
  // forEach rather than filter: the callback is run only for its side effects
  Array.prototype.forEach.call(el, (node) => {
    // Skip the link if its href is a duplicate or contains none of the words in targetLinkWords
    if (!existsList.has(node.href) &&
      targetLinkWords.some((val) => node.href.includes(val))) {
      existsList.add(node.href);
      const text = node.text.trim() === '' ?
        'No text' : node.text.replace(/\r?\n/g, '');
      lines.push(`${text}||${node.href}`);
    }
  });
  // Join with CRLF so the result does not start with an empty line
  return lines.join('\r\n');
};

const result = createLinkList(document.querySelectorAll('a'));
console.log(result);
// copy() is a DevTools Console utility and is not available to page scripts
copy(result);

# Result
As a test, I ran it on the BBC NEWS Tech page.

Homepage||https://www.bbc.com/
Skip to content||https://www.bbc.com/news/technology#skip-to-content
Accessibility Help||https://www.bbc.com/accessibility/
Sign in||https://session.bbc.com/session?ptrt=https%3A%2F%2Fwww.bbc.com%2Fnews%2Ftechnology&context=news_gnl&userOrigin=news_gnl
Notifications||https://www.bbc.com/news/technology#
News||https://www.bbc.com/news
Sport||https://www.bbc.com/sport
Reel||https://www.bbc.com/reel
Worklife||https://www.bbc.com/worklife
Travel||https://www.bbc.com/travel
Future||https://www.bbc.com/future
Culture||https://www.bbc.com/culture
Music||https://www.bbc.com/culture/music
Weather||https://www.bbc.com/weather
More||https://www.bbc.com/news/technology#orb-footer
Video||https://www.bbc.com/news/video_and_audio/headlines
World||https://www.bbc.com/news/world
Asia||https://www.bbc.com/news/world/asia
UK||https://www.bbc.com/news/uk
Business||https://www.bbc.com/news/business
TechTech selected||https://www.bbc.com/news/technology
Science||https://www.bbc.com/news/science_and_environment
Stories||https://www.bbc.com/news/stories
Entertainment & Arts||https://www.bbc.com/news/entertainment_and_arts
Health||https://www.bbc.com/news/health
World News TV||https://www.bbc.com/news/world_radio_and_tv
In Pictures||https://www.bbc.com/news/in_pictures
Reality Check||https://www.bbc.com/news/reality_check
Newsbeat||https://www.bbc.com/news/newsbeat
Special Reports||https://www.bbc.com/news/special_reports
Explainers||https://www.bbc.com/news/explainers
Long Reads||https://www.bbc.com/news/the_reporters
Have Your Say||https://www.bbc.com/news/have_your_say
Africa||https://www.bbc.com/news/world/africa
Australia||https://www.bbc.com/news/world/australia
Europe||https://www.bbc.com/news/world/europe
Latin America||https://www.bbc.com/news/world/latin_america
・・・

The list was copied to the clipboard as expected.
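One thing visible in the result: the "Sign in" link points to `session.bbc.com`, yet it passes the filter, because `targetLinkWords` is matched as a substring of the whole URL and that link's query string contains `www.bbc.com`. If an exact host match is wanted instead, a minimal sketch follows, assuming the standard `URL` parser. `matchesHost` is a hypothetical helper that could stand in for the substring check on `targetLinkWords`.

```javascript
// Hypothetical variant of the domain check: compare the parsed hostname exactly
// instead of searching for the word anywhere in the URL string.
const matchesHost = (href, hosts) => {
  try {
    return hosts.includes(new URL(href).hostname);
  } catch {
    // href was empty or not an absolute URL
    return false;
  }
};

const signIn = 'https://session.bbc.com/session?ptrt=https%3A%2F%2Fwww.bbc.com%2Fnews%2Ftechnology';
console.log(signIn.includes('www.bbc.com'));      // true: matched via the query string
console.log(matchesHost(signIn, ['www.bbc.com'])); // false: hostname is session.bbc.com
console.log(matchesHost('https://www.bbc.com/news', ['www.bbc.com'])); // true
```

The trade-off is that subdomains no longer match implicitly: each host you want (for example `www.bbc.com` and `session.bbc.com`) has to be listed explicitly.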
