
What is a Robots.txt File?

Robots.txt is a plain text file that webmasters create to tell web robots (typically search engine crawlers) how to crawl the pages of their website. The file must be placed in the root directory of the site (e.g. https://example.com/robots.txt) and controls which parts of the site robots may access and crawl. Robots.txt is part of the Robots Exclusion Protocol (REP), a group of web standards that govern how robots crawl the web, access and index content, and serve that content to users.

For example, if website owners do not want a search engine to index a particular directory, they can use the robots.txt file to deny all robots access to files in that directory. The syntax is simple: a "Disallow" directive prohibits a robot from accessing a certain part of the website, while an "Allow" directive explicitly permits access, as shown in the example below. While most compliant robots follow the rules in a robots.txt file, it is not a security mechanism: malware and other non-compliant bots can simply ignore it.
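Here is a minimal robots.txt sketch illustrating these directives; the domain, paths, and the "ExampleBot" crawler name are placeholders, not values taken from this article:

```
# Placed at https://example.com/robots.txt (domain and paths are placeholders)

# Rules for all crawlers
User-agent: *
Disallow: /private/              # keep robots out of this directory
Allow: /private/press-kit.html   # exception inside the blocked directory

# Stricter rules for one specific crawler
User-agent: ExampleBot
Disallow: /

# Optionally point crawlers at the sitemap
Sitemap: https://example.com/sitemap.xml
```

Rules are grouped by User-agent, and a crawler follows the group that most specifically matches its name; "Allow" is useful for carving out exceptions inside an otherwise disallowed path.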

Effective use of robots.txt can help control crawler traffic, reduce the load on web servers, and keep a website's indexed content as the webmaster intends.
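To illustrate the crawler side, here is a minimal sketch of how a compliant bot can check robots.txt before fetching a page, using Python's standard-library urllib.robotparser; the URLs and the "MyCrawler" user agent are placeholder assumptions:

```python
# A minimal sketch of a compliant crawler's robots.txt check,
# using Python's standard-library urllib.robotparser.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # placeholder URL
parser.read()  # downloads and parses the robots.txt file

# can_fetch() returns True if the given user agent may crawl the URL
if parser.can_fetch("MyCrawler", "https://example.com/private/report.html"):
    print("Allowed to crawl")
else:
    print("Disallowed by robots.txt")
```

A well-behaved crawler performs this check (and caches the parsed file) before every request to a host, which is exactly how robots.txt reduces unwanted crawler load.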


Ready to Level Up Your Analytics?

Try Pirsch Analytics free of charge for 30 days, no credit card required. Pick the best Google Analytics alternative; setting up your first website only takes a few minutes.