The robots.txt file controls crawler activity on a website or blog. It lets you keep some directories away from crawlers while allowing others. For example, if you have two folders, Articles and Javascripts, and you wish to exclude Javascripts from crawling by robots, you can specify that in the robots.txt file.

A few basics about the robots.txt file:
- It is found in the root folder, e.g. www.google.com/robots.txt
- It’s a text file and can be edited
- It tells robots what to crawl and what not to crawl
- It helps crawlers locate the sitemap on your site
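
Since the file always lives in the root folder, its address can be derived from any page URL on the same site. Here is a minimal Python sketch of that idea; the helper name robots_url and the example.com address are my own illustrations, not part of any standard tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    robots_url is a hypothetical helper written for this article.
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt sits at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# example.com is a placeholder domain.
print(robots_url("https://www.example.com/articles/post-1.html"))
```

Whatever folder or query string the page URL carries is discarded, because crawlers only look for the file at the root.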

If you are on the Blogger platform, you can’t upload a robots.txt file. Don’t panic: there is another option, which I’ll discuss towards the end of this article. First, let’s discuss a normal robots.txt implementation on a hosted site.

Implementing the robots.txt file on a web-hosted site (WordPress)

Prerequisites: I assume that you have a hosted WordPress site with cPanel/FTP access.

- Find the file in your public_html folder. If it isn’t there, create a blank text document named robots.txt.

Excluding a folder from crawling by search engine bots
Suppose you don’t want Google to crawl one of your folders.
In the robots.txt file, you have to specify two things: which crawler agent (Google, Yahoo, MSN) you want to keep out, and which folder or folders you want to exclude.

The general syntax to write in the robots.txt file is this:

User-agent: *
Disallow: /yourfolder/

Here, User-agent: * means all search agents (Google, MSN, Yahoo etc.).
Disallow: /yourfolder/ keeps that folder from being crawled. Note that its sub-folders will not be crawled either.
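
If you want to check how a rule like this will be read before uploading it, Python’s standard urllib.robotparser module can evaluate it locally. This is only a sketch: the paths tested below are made-up examples, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same two lines shown above.
rules = """\
User-agent: *
Disallow: /yourfolder/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical paths, chosen only to illustrate the rule.
for path in ("/yourfolder/page.html",
             "/yourfolder/subfolder/page.html",
             "/articles/page.html"):
    verdict = "allowed" if parser.can_fetch("*", path) else "blocked"
    print(path, "->", verdict)
```

The two paths under /yourfolder/ are reported as blocked, including the one in a sub-folder, while the /articles/ path stays allowed.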

In order to keep all agents away from crawling ALL folders, use this code.

User-agent: *
Disallow: /

You can address an individual crawler by replacing * with its name, for example Googlebot. If you want a rule to apply to all search engine crawlers, keep the * in the User-agent line.
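
To see how named agents behave differently from one another, here is a small Python sketch using the standard urllib.robotparser module. The rule set is my own illustration: Googlebot is kept out of one folder, while ExampleBot (a made-up crawler name) is kept out of everything.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: one group per named crawler, separated by a blank line.
rules = """\
User-agent: Googlebot
Disallow: /yourfolder/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def may_crawl(agent: str, path: str) -> bool:
    """Ask the parsed rules whether the named agent may fetch the path."""
    return parser.can_fetch(agent, path)


print(may_crawl("Googlebot", "/yourfolder/page.html"))   # Googlebot's group applies
print(may_crawl("Googlebot", "/articles/page.html"))
print(may_crawl("ExampleBot", "/articles/page.html"))    # ExampleBot is shut out entirely
print(may_crawl("OtherBot", "/yourfolder/page.html"))    # no group matches this agent
```

Because there is no User-agent: * group in this example, a crawler that is not named in the file is not restricted at all.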

Specifying a sitemap with the robots.txt file
Following a recent agreement among the major search engines, they all support a common directive for detecting sitemaps from the robots.txt file. The directive is:

Sitemap: Sitemap url here
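
As a quick check that the sitemap line is picked up, Python’s urllib.robotparser can list the sitemap URLs it finds. This sketch assumes Python 3.8 or newer (where the site_maps() method is available), and the example.com sitemap address is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules; replace the Sitemap value with your own sitemap URL.
rules = """\
User-agent: *
Disallow: /yourfolder/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file has none.
sitemaps = parser.site_maps()
print(sitemaps)
```

The Sitemap line is independent of the User-agent groups, so it can sit anywhere in the file.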

Robots.txt for Blogger users

Blogger users cannot upload a robots.txt file. Instead, they can use the robots meta tag to control how bots crawl particular pages.

These tags should be included in the HEAD section of the particular page template, enclosed in angle brackets.

<META NAME="ROBOTS" CONTENT="NOINDEX">

This tag tells bots not to index the page in which it is included.

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

This tag tells bots not to follow the links on the page whose head section contains it. Blogger users can use this option to their advantage when making posts. If you want every new page to be crawled by the bots, include the following code in the head section of your Blogger template.

<meta name="robots" content="index, follow">
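
To confirm which robots directives a page actually carries, you can read them back out of its HTML. The following is a rough Python sketch using the standard html.parser module; the class name RobotsMetaParser, the helper robots_directives and the sample page are all my own illustrations, and fetching the page from your blog is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every robots meta tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# A made-up page, used only to exercise the parser.
page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><body></body></html>'
found = robots_directives(page)
print("noindex" in found, "nofollow" in found)
```

Tag and attribute case does not matter here, so the upper-case and lower-case forms of the tag are treated the same way.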

Happy driving the robots. :)

Related SEO Tips and Articles:

  1. [...] have already seen how to control robots crawling your website using optimised robots.txt useage and..Avoiding duplicate content using robots.txt [...]

  2. Hi…
    This is regarding tk domain. I’m using tk domain to promote my blogspot.
    After I switchover to tk domain even my meta tag verification get failed. How can I do successful meta tag verification in tk domain websites? Is it wise to promote my blog with tk domain? what are the hurdles I may face in future? please let me know…

    I look forward to hearing from you ASAP.

    Thanx in advanz…

  3. by d way.. niz site man.. keep it up! :)

  4. Hi,

    I was wondering if you know how to control how googlebot crawls the individual pages of a blogspot site. I have one particular post that I want to exclude from google searches. Is there any way to do that? How does one edit the html for just one particular post to include the meta tag you have given above? I want the rest of my blog site crawled by the bots but just one particular post to be excluded.

    Thanks.

  5. Han,

    Add the following tag to your blogspot page.

    <META NAME="ROBOTS" CONTENT="NOINDEX">

    A page with this tag won’t get indexed by the bots.

  6. Jamal,

    It’s definitely not a good idea to promote the .tk domain, because it’s a domain beyond your control. If you are on Blogger, I suggest using the custom domain option. Try to get a domain; it’s less than $10 for a year.
    If you are using the .tk domain, go to the settings for the domain, where it asks for relevant keywords and a description. If you skipped these during registration, take time to fill in the keywords and description fields. The .tk domain will automatically create the keywords and description meta tags in the site, which spiders can crawl.

    Having said that, it’s only an option if you are keen to use the domain. I suggest you rush to GoDaddy right now. :)

  7. hi,
    My blogspot is not showing adsense ads from yesterday, i think something is blocking google adsense crawler to my blogspot. I dont know how to allow google adsense crawler to crawl my blog. Plz help me..

  8. But it still show the same message:
    User-agent: *
    Disallow: /search

    How will i fixed this problem.

  9. Just a question , I have seen your robots.txt but you are not blocking the Category and even RSS links.. Is there any reason for this.. Since this creates duplicate contents.

  10. Romba nalla Article.. Hi very nice article.. really working on my site..thankyou very much..

  11. Nice article .. but can you help me out in fixing my blog, in my blog not a single post is indexed my google..
    thanks in advance.

    Mani Karthik replied:

    Sure Robin, Please post your question with your site details here, and we can sort it.

  12. hey plz check my blog once and see if u can optimize its perfomance http://pcsoftwarez.blogspot.com

  13. Hi Mani Karthik,

    my blog is yeeern.blogspot.com.
    i am having problem adding the feed to the google webmaster tools.

    How do i change my robots.txt?

    Thanks in Advance.

  14. [...] Optimize the robots.txt file for Wordpress, allow your blog to rank high - Robots.txt is an often ignored file, which is actually an excellent tool that will help you get more files indexed on google and thus rank high. Here ate the tweaks. [...]