robots.txt

Scraping

Last updated at 2024-04-28. Posted at 2024-04-28.

What is robots.txt

robots.txt is a text file that webmasters create to tell web robots (typically search engine crawlers) how to crawl pages on their website. It is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content to users. The REP also includes directives such as meta robots, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links (such as "follow" or "nofollow").

How to use robots.txt

The robots.txt file is a plain text file placed at the root of your web server (served as /robots.txt) that tells web crawlers such as Googlebot which paths they may or may not access. Rules are grouped by crawler: a User-agent line names the crawler the group applies to (* matches any crawler), and each Disallow line that follows gives a path prefix that crawler should not fetch.

Example

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
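
From the scraping side, a crawler is expected to check these rules before requesting a page. The following is a minimal sketch of that check using Python's standard urllib.robotparser module. The host example.com, the user agent name ExampleBot, and the helper is_allowed are assumptions made for illustration; a real crawler would normally load the rules from the site's own /robots.txt rather than from a string.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, inlined so the sketch runs offline.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""


def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules permit user_agent to fetch url.

    "ExampleBot" is a hypothetical crawler name; it matches the
    "User-agent: *" group because no more specific group exists.
    """
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not a site from the article.
    for path in ("/index.html", "/tmp/cache.dat", "/cgi-bin/search"):
        url = "https://example.com" + path
        print(path, "allowed" if is_allowed(url) else "disallowed")
```

Note that robots.txt is advisory: it only has an effect when the crawler chooses to run a check like this one.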
