The robots.txt file, located in the root Web directory of a Web site, is read by robots such as Googlebot, MSNBot, or Yahoo! Slurp (Yahoo!'s Web crawler) to learn which pages of the site may be indexed by the search engine, and which pages may not. This robots.txt file is a plain text file containing sections such as:
User-agent: Googlebot
Disallow: /private_content/
Disallow: /images/
In this example, the search engine will exclude from its index the pages located in the private_content and images directories.
The syntax for a robots.txt entry should be used exactly as presented here; note the space between the ":" and the page or directory path.
Comments may be inserted in the robots.txt file. A comment line starts with a "#" character.
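For example, a comment line can document why a rule exists (the directory name below is only illustrative):

```
# Keep crawlers out of the temporary reports area
User-agent: *
Disallow: /tmp_reports/
```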
A more generic syntax exists to disallow files or directories for all User-agents:
User-agent: *
Disallow: /cgi-bin/
Disallow: /family/
Should you combine both syntaxes, "User-agent: Bot-Name" and "User-agent: *", take care to place the "User-agent: *" section after all the "named" sections.
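Putting the sections in that order, a combined file built from the article's examples would look like this:

```
User-agent: Googlebot
Disallow: /private_content/
Disallow: /images/

User-agent: *
Disallow: /cgi-bin/
Disallow: /family/
```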
For example, Google's robot, Googlebot, reads the robots.txt file and uses the first User-agent section matching the pattern Googlebot*; Googlebot then stops reading the file. Other bots (Yahoo! Slurp, MSNBot) should behave the same way.
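You can check how a crawler would interpret such a file with Python's standard urllib.robotparser module. This is a minimal sketch: the rules string combines the article's two examples, and the bot names are only illustrative.

```python
from urllib.robotparser import RobotFileParser

# The combined rules from the examples above: a named Googlebot section
# first, then the generic "*" section.
rules = """\
User-agent: Googlebot
Disallow: /private_content/
Disallow: /images/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot matches its named section, so only that section applies to it:
print(parser.can_fetch("Googlebot", "/private_content/page.html"))  # False
print(parser.can_fetch("Googlebot", "/cgi-bin/script"))             # True

# Any other bot falls through to the "User-agent: *" section:
print(parser.can_fetch("SomeOtherBot", "/cgi-bin/script"))          # False
```

This mirrors the behavior described above: a robot uses the first matching User-agent section and ignores the rest of the file.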