Hello, I'm yamaimo (@yarnaimodev). This is my first post on Qiita... in fact, I think it's the first time I've published a proper article anywhere online.
I was born in 1998 and have been teaching myself programming and web design for about three years. I'm especially fond of TypeScript, Firebase, Node.js, and React.
When I tried Can't Unsee, which coliss featured the other day, I scored 7,630 points on my first attempt and 7,930 on my second. [1]
I run a small Mastodon instance. I also recently built a Helix keyboard [2], but I'm struggling to memorize the changed key layout.
My development environment is basically WSL + Hyper + fish shell, plus VSCode.
This time I built a tool that uses Puppeteer to save web pages locally exactly as they appear, so I'd like to introduce it.
🤔🤔
When you save a web article locally as HTML, choosing "Webpage, Complete" in the Save As dialog works fine for some sites. For others, web fonts, CSS `background-image` files, and the contents of `<iframe>` elements are not saved, and you end up with an incomplete page.
Looking around, I found libraries that use Puppeteer to save pages as images or PDFs, but the output size is fixed, which makes it awkward to work with, and saving videos/GIFs or doing full-text search is difficult.
So I wanted a tool that could save everything in a page, and started building one. Various problems came up and the design changed substantially along the way, but after two months it reached a usable state. (For example, at first I prioritized saving to a single file and embedded everything directly as Base64, but gave that up when the file sizes got out of hand.)
Here is what I built ✨
Vanilla Clipper
https://github.com/yarnaimo/vanilla-clipper
With a single command you can save videos, CSS, web fonts, iframes, Shadow DOM, and other content in a page to local storage.
ℹ️ Usage
To use Vanilla Clipper you first need Chrome and Node.js installed.
You can download and install the latest version of Node.js from https://nodejs.org/ja/.
📦 Installation
Install Vanilla Clipper with npm (or yarn).
npm i -g vanilla-clipper
Once installation finishes, a `.vanilla-clipper` directory is created in your home directory. [3]
Its contents are described in the directory layout section.
Saving a web page
For example, running the following command saves https://qiita.com as `.vanilla-clipper/pages/main/{date}-qiita.com.html`.
vanilla-clipper https://qiita.com
# if that fails, try the -n option
vanilla-clipper -n https://qiita.com
In the saved page, web fonts and the iframe (the reCAPTCHA part) are displayed correctly.
The page still renders when you switch to Offline in the developer tools, which shows that everything is stored locally.
vanilla-clipper command options
Option | Default | Description | Example |
---|---|---|---|
--headless, -h | true | Whether to launch Chrome in headless mode. | -h false |
--language, -l | | Browser language. | -l ja-JP |
--directory, -d | 'main' | Destination folder. | -d tech |
--account-label, -a | 'default' | Label of the login account to use (explained later). | -a sub |
--device | | Device to emulate. | --device 'Pixel 2' |
--element, -e | | Selector of the HTML element to clip. | -e '[role=main]' |
--click, -c | | Selector of an HTML element to click. | |
--scroll, -s | | Selector of the HTML element to scroll to the bottom. Defaults to `<html>` and `<body>`. | |
--max-scrolls, -x | 10 | Number of times to scroll the element given by -s to the bottom. Useful for infinite-scroll timelines and the like. | -x 5 |
If you want different options for each page, you can install vanilla-clipper locally and run it directly from a script to save pages in bulk.
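The library's own script API is not shown in this article, so the following is only a rough sketch of a different approach: shelling out to the `vanilla-clipper` CLI once per page, using flags from the table above. `buildArgs`, `clipAll`, and the shape of the page objects are hypothetical names for illustration, and the sketch assumes the `vanilla-clipper` command is on your PATH.

```javascript
const { spawnSync } = require('child_process')

// Hypothetical helper: turns per-page options into CLI arguments.
// Only flags listed in the options table are used.
function buildArgs({ url, directory, device, element, maxScrolls }) {
    const args = []
    if (directory) args.push('--directory', directory)
    if (device) args.push('--device', device)
    if (element) args.push('--element', element)
    if (maxScrolls !== undefined) args.push('--max-scrolls', String(maxScrolls))
    args.push(url) // the URL goes last, as in `vanilla-clipper -n https://qiita.com`
    return args
}

// Hypothetical helper: runs the CLI once for each page, one after another.
function clipAll(pages) {
    for (const page of pages) {
        spawnSync('vanilla-clipper', buildArgs(page), { stdio: 'inherit' })
    }
}
```

Calling `clipAll([{ url: 'https://qiita.com', directory: 'tech' }, { url: 'https://example.com', device: 'Pixel 2' }])` would then save each page with its own options.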
⚙️ Configuration file
If you write the information needed to log in into the configuration file, Vanilla Clipper can log in for you automatically.
module.exports = {
    resource: { maxSize: 50 * 1024 * 1024 },
    sites: [
        {
            url: 'example.com', // site URL
            accounts: {
                default: {
                    // ↑ account label
                    username: 'main',
                    password: 'password1'
                },
                sub: {
                    // ↑ account label
                    username: 'sub_account',
                    password: 'password2'
                }
            },
            login: [
                // login steps
                [
                    'goto',
                    'https://example.com/login' // URL
                ],
                [
                    'input',
                    'input[name="session[username_or_email]"]', // selector
                    '$username' // -> accounts.{account label}.username
                ],
                [
                    'input',
                    'input[name="session[password]"]', // selector
                    '$password' // -> accounts.{account label}.password
                ],
                [
                    'submit',
                    '[role=button]' // selector
                ]
            ]
        }
    ]
}
- You can store login information for multiple accounts. `default` is used by default; if you specify an account label on the command line, as in `-a sub`, the values under `sub` are used instead.
- Login information can also be given as a function that returns a string.
- A value in the `login` field that starts with `$`, such as `$username`, is replaced with the corresponding login information from `accounts`. For example, if you specify the `sub` account on the command line, `sub_account` is entered in place of `$username` and `password2` in place of `$password`.
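The substitution described above can be pictured roughly as follows. This is a minimal sketch, not Vanilla Clipper's actual code: `resolveLoginSteps` and `exampleSite` are hypothetical names, and the error handling is an assumption.

```javascript
// Hypothetical helper: replaces `$`-prefixed values in the login steps
// with the matching field of the selected account.
function resolveLoginSteps(site, accountLabel = 'default') {
    const account = site.accounts[accountLabel]
    if (!account) {
        throw new Error(`Unknown account label: ${accountLabel}`)
    }

    const resolveValue = value => {
        if (typeof value !== 'string' || !value.startsWith('$')) return value
        const key = value.slice(1) // '$username' -> 'username'
        if (!(key in account)) {
            throw new Error(`Account "${accountLabel}" has no field "${key}"`)
        }
        const credential = account[key]
        // Login information may also be a function that returns a string.
        return typeof credential === 'function' ? credential() : credential
    }

    return site.login.map(step => step.map(resolveValue))
}

// Example data in the same shape as the configuration file above.
const exampleSite = {
    url: 'example.com',
    accounts: {
        default: { username: 'main', password: 'password1' },
        sub: { username: 'sub_account', password: () => 'password2' }
    },
    login: [
        ['goto', 'https://example.com/login'],
        ['input', 'input[name="user"]', '$username'],
        ['input', 'input[name="pass"]', '$password'],
        ['submit', '[role=button]']
    ]
}
```

With this data, `resolveLoginSteps(exampleSite, 'sub')` yields steps in which `$username` has become `'sub_account'` and `$password` has become `'password2'`, while the `goto` and `submit` steps are left untouched.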
📁 Layout of the .vanilla-clipper directory
📁 .vanilla-clipper
    📁 pages
        📁 main
            📄 20190213-page1.html
            ︙
        📁 {any folder}
            📄 20190213-page2.html
            📄 20190214-page3.html
            ︙
    📁 resources
        📁 20190213
            📄 {26 random characters}.jpg
            📄 {26 random characters}.svg
            ︙
        📁 20190214
            📄 {26 random characters}.woff2
            ︙
    📄 resources.json
    📄 config.js
📁 pages
Where downloaded HTML is saved. Pages go into a folder of your choice inside it (the `main` folder if none is specified).
📁 resources
Where images and other external files referenced by the HTML are saved. They are stored in subfolders named in `yyyyMMdd` format.
📄 resources.json
A database of information about the external files, such as each file's hash value.
📄 config.js
The configuration file.
✨ What's good about Vanilla Clipper
- CSS is read from `document.styleSheets`, so CSS in JS and similar approaches are supported too
- External files referenced inside CSS, such as `background-image`, are saved as well
- CSS `@import` rules are processed recursively
- When an `@font-face` rule includes multiple formats such as WOFF and TTF, the most suitable one is saved
- The contents of iframes and Shadow DOM are embedded too
- An external file is not saved again if it matches one that is already stored
Multiple versions of a file can be stored for the same URL. If the hash value matches a file that is already saved, the existing file is reused; if the file has been updated, it is saved as a new version. This keeps disk usage down.
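The hash-based reuse just described can be sketched like this. It is only an illustration under stated assumptions: the article does not say which hash function is used or how `resources.json` is structured, so SHA-256, the in-memory `resourceIndex`, and the `storeResource` function are all hypothetical.

```javascript
const crypto = require('crypto')

// In-memory stand-in for resources.json: URL -> list of saved versions.
const resourceIndex = new Map()

// Hypothetical helper: records a resource, reusing an existing version
// when the content hash matches one already stored for the same URL.
function storeResource(url, content) {
    // SHA-256 is an assumption; the article only mentions "a hash value".
    const hash = crypto.createHash('sha256').update(content).digest('hex')
    const versions = resourceIndex.get(url) || []

    const existing = versions.find(v => v.hash === hash)
    if (existing) {
        return { ...existing, reused: true } // same content: nothing new is written
    }

    const version = { hash, version: versions.length + 1 }
    resourceIndex.set(url, [...versions, version])
    return { ...version, reused: false } // updated content: stored as a new version
}
```

Storing the same bytes twice for one URL returns the first version with `reused: true`; storing different bytes for that URL adds version 2.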
🕯️ Implementation notes
Using Puppeteer together with jsdom
If you rewrite the page directly while it is displayed in Puppeteer, the page's own scripts may modify it partway through. To avoid that, the HTML obtained with Puppeteer is first moved into jsdom, and DOM manipulation is done there.
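As a rough illustration of this hand-off (not the tool's actual implementation), the sketch below loads a page with Puppeteer, takes its HTML, and edits the copy in jsdom. It assumes the `puppeteer` and `jsdom` packages are installed; `snapshotForEditing` and the `edit` callback are hypothetical names, and the `networkidle2` wait condition is just one reasonable choice.

```javascript
// Hypothetical helper: fetches rendered HTML with Puppeteer, then edits
// a detached copy in jsdom. Packages are required lazily inside the function.
async function snapshotForEditing(url, edit) {
    const puppeteer = require('puppeteer')
    const { JSDOM } = require('jsdom')

    const browser = await puppeteer.launch()
    let html
    try {
        const page = await browser.newPage()
        await page.goto(url, { waitUntil: 'networkidle2' })
        html = await page.content() // serialized HTML of the rendered page
    } finally {
        await browser.close()
    }

    // jsdom does not execute the page's scripts unless asked to,
    // so nothing else changes this copy while it is being edited.
    const dom = new JSDOM(html, { url })
    edit(dom.window.document)
    return dom.serialize()
}
```

For example, `snapshotForEditing('https://qiita.com', doc => doc.querySelectorAll('script').forEach(s => s.remove()))` would return the page's HTML with its script elements stripped.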
Scrolling the page
Some content is not loaded until you scroll, so the page is scrolled to the bottom automatically. The elements scrolled by default are `<html>` and `<body>`, but you can change this with the `-s` option.
When the scroll reaches the bottom, the tool waits 2.5 seconds; if the element's height has changed after the wait, it scrolls again. This "scroll to the bottom + wait 2.5 seconds" step is repeated up to 10 times by default.
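The loop described above might look roughly like this. It is a sketch with the browser interaction injected as callbacks, so the logic can be read (and exercised) on its own; `scrollUntilSettled`, `getHeight`, `scrollToBottom`, and `wait` are hypothetical names, and stopping as soon as the height stops changing is my reading of the description.

```javascript
// Hypothetical helper: scrolls to the bottom, waits, and repeats while the
// element keeps growing, up to maxScrolls times. Returns the scroll count.
async function scrollUntilSettled({
    getHeight, // async () => current height of the scrolled element
    scrollToBottom, // async () => scrolls the element to its bottom
    wait, // async ms => resolves after the given delay
    maxScrolls = 10,
    waitMs = 2500
}) {
    let scrolls = 0
    let lastHeight = await getHeight()

    while (scrolls < maxScrolls) {
        await scrollToBottom()
        scrolls += 1
        await wait(waitMs)

        const height = await getHeight()
        if (height === lastHeight) break // nothing new was loaded
        lastHeight = height
    }
    return scrolls
}
```

With Puppeteer, the callbacks could be thin wrappers around `page.evaluate`, with `wait` implemented as `ms => new Promise(resolve => setTimeout(resolve, ms))`.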
CSS optimization
Vanilla Clipper also saves files referenced by `url()` in CSS, but left as is, that would include files the page never uses. So the CSS is optimized first.
- Some unused rules are removed
  Among the rules that use the `url()` function, those that are not used in the page are removed.
- Only the best web font format is saved
  When a web font defined with `@font-face` includes multiple formats, `woff2` and `woff` are saved in preference to the others.
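The font format preference can be sketched as below. This is an assumption-laden illustration, not the tool's code: `parseFontSources` and `pickFontSource` are hypothetical names, the regular expression handles only simple `url(...) format(...)` pairs, and falling back to the first source when neither woff2 nor woff is present is my own choice.

```javascript
// Preferred formats, best first, as described in the text.
const FORMAT_PRIORITY = ['woff2', 'woff']

// Hypothetical helper: extracts { url, format } pairs from the value of an
// @font-face `src` descriptor. Handles only simple url()/format() pairs.
function parseFontSources(src) {
    const pattern = /url\(\s*(['"]?)(.*?)\1\s*\)(?:\s*format\(\s*(['"]?)(.*?)\3\s*\))?/g
    return Array.from(src.matchAll(pattern), m => ({ url: m[2], format: m[4] }))
}

// Hypothetical helper: picks the single source worth saving.
function pickFontSource(sources) {
    for (const format of FORMAT_PRIORITY) {
        const match = sources.find(s => s.format === format)
        if (match) return match
    }
    return sources[0] // assumption: otherwise keep the first listed source
}
```

For a `src` value listing TTF, WOFF, and WOFF2 files, `pickFontSource(parseFontSources(src))` selects the WOFF2 entry, so only that file would need to be downloaded.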
Summary
"The job of sorting through the 800+ web pages I've accumulated over the past two years awaits me"
(@yarnaimo, February 13, 2019)
I'd like a GUI for managing the saved pages. Or rather, I'm thinking of building something like a personal knowledge base with this built in.
Thank you for reading!
1. https://twitter.com/yarnaimo/status/1095243124694077441
2. https://twitter.com/yarnaimo/status/1085088898638766080
3. On Windows this is `C:\Users\{username}`; on macOS it is `/Users/{username}`.