Before creating a robots.txt file, it's a good idea to know what one is. A robots.txt file tells search engine crawlers which URLs on your site they may access. It can be used to keep crawlers away from parts of your site, but it cannot by itself keep a URL out of Google's index. To keep a URL out of the index, use a noindex meta tag or the X-Robots-Tag response header on that URL.
Limitations of a robots.txt file
If you want to create or edit a robots.txt file, you should know the limits of this URL-blocking method. Depending on your goals and situation, consider other mechanisms to ensure your URLs are not discoverable on the web.
- robots.txt rules may not be supported by every search engine. The instructions in a robots.txt file cannot force crawler behavior on your site; each crawler chooses whether or not to comply. While Googlebot and other well-known web crawlers follow the instructions in a robots.txt file, other crawlers may not. If you want to keep information safe from web crawlers, use other blocking methods, such as password-protecting private files on your server.
- Different crawlers interpret the syntax in different ways. Although well-known web crawlers follow the rules in a robots.txt file, each crawler may interpret them differently. You should know the appropriate syntax for the crawlers you are targeting, because some may not understand certain instructions.
- Pages disallowed in robots.txt can still be indexed if linked from other sites. While Google will not crawl or index content blocked by robots.txt, it may still find and index a disallowed URL that is linked from elsewhere on the web. As a result, the URL address and, possibly, other publicly available information such as the anchor text of links to the page can still appear in Google search results. To keep a URL out of Google search results entirely, password-protect the files on your server, use a noindex response header or meta tag, or remove the page altogether.
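As the list above notes, it is noindex, not robots.txt, that actually keeps a page out of the index. A minimal sketch of the two common forms (the response-header option assumes your server lets you set custom headers):

```
<!-- Option 1: a meta tag in the <head> of the page to be excluded -->
<meta name="robots" content="noindex">

<!-- Option 2: for non-HTML files such as PDFs, send an HTTP response
     header instead of a meta tag:

     X-Robots-Tag: noindex
-->
```

Note that the page must not be disallowed in robots.txt at the same time; if the crawler is blocked, it never fetches the page and never sees the noindex rule.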
So how do you create or customize robots.txt in Blogger?
Here are the steps:
Step 1. Go to your blogger admin dashboard.
Step 2. Select "Settings" >> scroll down and find "Custom robots.txt".
Step 3. Fill in your robots.txt rules.
Step 4. Click Save.
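If you are unsure what to paste in Step 3, the following is a common starting point for Blogger blogs; the domain in the Sitemap line is a placeholder, so replace it with your own blog's address:

```
# Allow AdSense's crawler everywhere
User-agent: Mediapartners-Google
Disallow:

# Block internal search and label result pages, allow everything else
User-agent: *
Disallow: /search
Allow: /

# Placeholder domain; use your own blog's address
Sitemap: https://yourblog.blogspot.com/sitemap.xml
```

Blocking /search keeps Blogger's search and label result pages (which are thin, duplicate-content pages) out of the crawl while leaving posts and static pages crawlable.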
The default behavior is that every user agent is allowed to crawl the entire site. Both of the following express that:
User-agent: *
Disallow:
# or
User-agent: *
Allow: /
# Example 1: Block only Googlebot
User-agent: Googlebot
Disallow: /
# Example 2: Block Googlebot and Adsbot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /
# Example 3: Block all crawlers except AdsBot (AdsBot ignores * and must be named explicitly)
User-agent: *
Disallow: /
# Example 4: Block only Googlebot on folder "nogooglebot"
User-agent: Googlebot
Disallow: /nogooglebot/
# Example 5: Block Googlebot everywhere except the folder "nogooglebot"
User-agent: Googlebot
Disallow: /
Allow: /nogooglebot/
Okay, that's enough from me for now. If you have any questions, please leave them in the comments column. If anything is wrong, I apologize. Hopefully this was helpful, and thank you.
"Look for someone who is willing to accept your situation, your family and your job. In fact, happiness is about being together and being grateful."