How To Add Robots.txt To Any Website? | Next Earning

What is robots.txt?

You may not want search engine crawlers to visit every page of your site. To keep search engine spiders away from particular pages, you use robots.txt. Robots.txt is a plain text file, placed in the root directory of your site, that tells search engines which parts of the site they may access and index and which parts they may not.

Adding a robots.txt file to the site directory

The robots.txt file uses a protocol with a small set of commands that filter access to your site by category, section, and so on; this protocol is called the Robots Exclusion Standard. Every major search engine respects robots.txt, but spambots that collect e-mail addresses for spammers simply ignore it. Robots.txt is therefore not enough to secure a site. Files that must stay private belong in a protected directory.

Why is the robots.txt file important for SEO?

Robots.txt is very important for the SEO of your site. Search engine spiders look for the robots.txt file before they crawl and index your site. If the file is missing, each of those repeated requests returns your 404 error page, and a large 404 page wastes bandwidth, so adding a robots.txt file saves it. You should also use robots.txt to keep search engine spiders from indexing your graphic files. It helps avoid wasting server resources and removes clutter from your web statistics. Finally, whenever you don't want certain pages to be crawled and shown in search engines, robots.txt is the place to say so.

How to Create and Configure Your Robots.txt File?

Creating a robots.txt file is very easy: open a text editor and save a blank file as robots.txt. Then add a few lines depending on your needs, making clear which pages search engines may crawl and which they may not.

Some popular search engine spiders are:
  • Googlebot for Google
  • Googlebot-Image for Google Images
  • Googlebot-News for Google News
  • Bingbot for Bing
  • Teoma for Ask

The Robots Exclusion Standard has a few terms and rules, known as directives. Let’s understand them before using them:

Main directives:
User-agent: Names the search engine spider to which the rules that follow apply.
Disallow: Tells those spiders not to crawl and index the given page, file or directory.

Other important directives:
Allow: Tells search engines that they may crawl and index the given page, file or directory.
Crawl-delay: Sets the number of seconds a spider should wait between requests to your server. Not every search engine honors it; Googlebot, for example, ignores this directive.
Sitemap: Gives the location of your site’s sitemap.
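
To see how these directives fit together in one file, here is a minimal Python sketch that assembles robots.txt text from a list of rules. The helper name build_robots_txt and the sample values are my own assumptions for illustration, not part of any standard tool.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text from (user_agent, rules) pairs.

    Each rule is a (directive, value) tuple such as ("Disallow", "/search").
    This is a hypothetical helper, written only for illustration.
    """
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for directive, value in rules:
            lines.append(f"{directive}: {value}")
        lines.append("")  # a blank line separates one group from the next
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    # End the file with exactly one newline.
    return "\n".join(lines).rstrip("\n") + "\n"


robots_txt = build_robots_txt(
    [("*", [("Disallow", "/search"), ("Crawl-delay", "10")])],
    sitemaps=["http://www.yourwebsite.com/sitemap.xml"],
)
print(robots_txt)
```

Each group starts with a User-agent line followed by its rules, and the Sitemap line stands apart because it is not tied to any particular spider.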

Let me show various robots.txt examples and what each one does:

To let all search engines crawl every page and directory:

User-agent: *
Disallow:
Here the asterisk (*) stands for all search engines.

If you want to block every search engine from crawling and indexing your whole site:

User-agent: *
Disallow: /
(Here the forward slash (/) refers to the root directory of the domain.)
This is suitable if your website is under construction and you don’t want any search engine to index it.
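
If you want to check the difference between an empty Disallow and "Disallow: /" yourself, Python's standard urllib.robotparser module can evaluate the rules. This is only a quick sketch; the helper name can_crawl and the page URL are assumptions of mine.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

page = "http://www.yourwebsite.com/any/page.html"  # hypothetical URL
print(can_crawl(ALLOW_ALL, "Googlebot", page))  # True
print(can_crawl(BLOCK_ALL, "Googlebot", page))  # False
```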

If you want to hide a specific folder or directory (e.g. the search directory):

User-agent: *
Disallow: /search
The page www.nextearning.com/search, along with any other URL whose path starts with /search, will not appear in the search engine results. Writing “/search” is enough; do not write the full address “www.nextearning.com/search”.

If you want to block a specific directory for a specific search engine:

User-agent: Bingbot
Disallow: /search
The search directory of your site is blocked for Bing only, so it won’t get indexed in the Bing search engine. Other search engines can still crawl it.
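
Here is a rough way to confirm that a rule written for one spider leaves the others alone, again using Python's urllib.robotparser. The /about path is a made-up page added only for comparison.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Bingbot
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "http://www.nextearning.com"
# "/about" is a hypothetical page; "/search/results" shows prefix matching.
for agent in ("Bingbot", "Googlebot"):
    for path in ("/search", "/search/results", "/about"):
        print(agent, path, parser.can_fetch(agent, base + path))
```

Bingbot is refused for /search and everything below it, while Googlebot, which has no group of its own in this file, is allowed everywhere.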

If you want to allow some files or directories to be crawled even though the directory containing them is blocked:

User-agent: Googlebot-Image
Disallow: /images/
Allow: /images/logo.png
The whole “images” folder is blocked for Google Images, but your logo will still be indexed and shown.
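
Why does the Allow line win here? Google's documentation and RFC 9309 describe a "longest match" rule: the most specific matching path decides, and Allow wins a tie. The sketch below is my own simplified version of that idea (plain prefixes only, no wildcards), and is_allowed is a hypothetical name. Some parsers behave differently; Python's urllib.robotparser, for instance, applies the first matching rule in file order.

```python
def is_allowed(path, rules):
    """Apply longest-match precedence to (directive, prefix) rules.

    The longest matching prefix wins; on a tie, Allow beats Disallow.
    Paths that match no rule are allowed.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/images/"), ("Allow", "/images/logo.png")]
print(is_allowed("/images/logo.png", rules))    # True
print(is_allowed("/images/banner.png", rules))  # False
```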

You can add your site’s sitemap anywhere inside your robots.txt file; you may put it at the beginning of the file or at the end.
User-agent: *
Disallow:
Sitemap: http://www.yourwebsite.com/sitemap.xml
Note: Allowing everything (i.e. Allow: /) and disallowing nothing (i.e. an empty Disallow:) have the same effect.

If you want to declare the time period (in seconds) between requests to your server:

User-agent: *
Disallow:
Crawl-delay: 10
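
A crawler written in Python could read the delay and the sitemap location with the standard urllib.robotparser module. A small sketch, assuming Python 3.8 or newer (site_maps() was added in that version):

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow:
Crawl-delay: 10
Sitemap: http://www.yourwebsite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "Googlebot" has no group of its own here, so the * group applies.
delay = parser.crawl_delay("Googlebot")
sitemaps = parser.site_maps()  # requires Python 3.8+
print(delay)     # 10
print(sitemaps)  # ['http://www.yourwebsite.com/sitemap.xml']
```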

Advanced robots.txt:

User-agent: *
Disallow: /web/seo*.jpg
The above code blocks seo.jpg, seo-new.jpg, seo-tips.jpg, etc. inside the /web/ directory. The asterisk (*) matches any sequence of characters.


User-agent: *
Disallow: /*?*
The above code blocks every URL that contains a question mark (?).


User-agent: *
Disallow: /"
Similarly, the above code blocks search engine spiders from crawling pages whose URL path starts with a quotation mark.


User-agent: *
Disallow: /web.html
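This code blocks search engine spiders from crawling and indexing every URL whose path begins with /web.html, such as /web.html itself and /web.html?page=2. It does not block pages like /web-tutorials.html, /web-themes.html or /web123.html; to block those as well, use the shorter prefix "Disallow: /web". If you want to block only the page "/web.html" then use the following code: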
User-agent: *
Disallow: /web.html$
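
To make the wildcard rules above concrete, here is a minimal sketch of how a crawler might interpret * and $ in a path pattern. It is my own simplified matcher, not the code of any real search engine, and the function name matches is an assumption.

```python
import re


def matches(pattern, path):
    """Return True if a robots.txt path pattern matches a URL path.

    '*' matches any run of characters. A trailing '$' means the path must
    end there; without it, the pattern only has to match a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None


print(matches("/web.html", "/web.html?page=2"))      # True
print(matches("/web.html", "/web-tutorials.html"))   # False
print(matches("/web.html$", "/web.html?page=2"))     # False
print(matches("/web/seo*.jpg", "/web/seo-tips.jpg"))  # True
print(matches("/*?*", "/page?id=1"))                 # True
```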

If you want to place a comment in the robots.txt file, it is best to put it at the beginning of the file or after a directive. To write a comment, start it with the hash sign (#),
i.e.  # Block only the page /web.html
User-agent: *
Disallow: /web.html$
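
Spiders ignore everything from the hash sign to the end of the line. As a rough illustration, the following sketch strips comments the way a simple parser might; active_lines is a hypothetical helper name.

```python
def active_lines(robots_txt):
    """Return the directive lines of robots_txt with comments removed."""
    result = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the comment, if any
        if line:
            result.append(line)
    return result


SAMPLE = """\
# Block only the page /web.html
User-agent: *
Disallow: /web.html$  # trailing comments are dropped too
"""
print(active_lines(SAMPLE))
# ['User-agent: *', 'Disallow: /web.html$']
```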

The robots.txt file of wordpress.org looks like:

[Image: WordPress's robots.txt code]

Where to place robots.txt file in your website?

Before placing your robots.txt file on your site, make sure that you’ve created it successfully and that it is short; under 5000 characters is a reasonable target. WordPress users may use a plugin instead.
After creating the robots.txt file, place it in the root (main) directory of your site, so that it is reachable at www.yourwebsite.com/robots.txt.
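
Search engine spiders look for the file only at the root of the host. As a small sketch, this is how a crawler could work out the robots.txt address from any page URL; robots_txt_url is a name I made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query/fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.nextearning.com/search?q=seo"))
# http://www.nextearning.com/robots.txt
```

A robots.txt file saved inside a subfolder would never be requested, so it would have no effect.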

In conclusion,

Using a robots.txt file to tell search engines which pages or directories to crawl and index, and which to leave alone, is an important part of SEO. It is a widely accepted tool across the various search engines, and it’s effective too. But it doesn’t work for spambots.

We went through the introduction to the robots.txt file, its importance, how to create it and where to add it. If you have queries, comment below and I will try to help you. I hope this post was helpful for you; comment below with your feedback.

Follow my next article:

How To Setup Bing Webmaster Tools?




Pacific Prakash Regmi

Pacific is an Android Developer at Sirseni Technology and author of Viralandroid.

Pratikshya Regmi

Pratikshya is a WordPress developer and creative writer. She is also senior editor for Next Earning.