In this article, we are going to discuss a small but quite powerful file on your website: the robots.txt file. What is the robots.txt file, and how do you create it?
It’s an important file when it comes to technical SEO, and we’re going to explore what the file does, how it works, and the implications it has for your SEO.
What Is The Robots.txt File?
A robots.txt file is simply a file that tells web crawlers, such as Google’s crawler that crawls your content to index your website, what to do when they hit certain areas of your site.
Most robots.txt files are pretty simple. The majority you see will have just a couple of lines covering a few areas of the website, but some can be a lot more complex.
It is a text file residing in the root directory of your website that gives search engine crawlers instructions as to which pages they can crawl and index.
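For example, a minimal robots.txt might look like this (the directory name here is just a placeholder for illustration):

```
User-agent: *
Disallow: /admin/
```

This tells every crawler not to crawl anything under the /admin/ section of the site.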
One of the first things you need to check and optimize when working on your technical SEO is the robots.txt file. A problem or misconfiguration in your robots.txt can cause critical SEO issues that can negatively impact your rankings and traffic.
How To Edit Your Robots.txt
To edit your robots.txt, use your favorite FTP client and connect to your website’s root directory. Robots.txt is always located in the root folder. Download the file to your PC and open it with a text editor. Make the necessary changes and upload the file back to your server.
How To Create a Robots.txt
If you don’t already have a robots.txt file:
- Create a new text file using a text editor or a Robots.txt Generator Tool,
- Add your directives,
- Save the file and upload it to the root directory of your website.
Make sure that your file is named robots.txt and not anything else. Also, keep in mind that the file name is case-sensitive, so it should be all lowercase.
The robots.txt file has a very simple structure. There are some predefined keyword/value combinations you can use.
The most common are:
Technically speaking, there are usually two parameters here. The first one is the user-agent, which is the name of the crawler. If you want to address Google specifically, it will be Googlebot; Bing has Bingbot.
There are many different crawler types out there for different search engines and different platforms as well.
So you start by naming the user agent you want to allow or block from areas of your site.
If you just put a * here, this character means everything: anyone and everyone. You don’t need to list every crawler you can think of in your robots.txt file. Simply putting a star in there tells all crawlers whether or not they can crawl or access different areas of your website.
User-agent: * – includes all crawlers
User-agent: Googlebot – instructions are for Googlebot only
The next part is the Disallow line, where you define the URL or the section of your website you don’t want crawlers to access.
After the forward slash of your website, you list the pages, subfolders, or sections of your site you don’t want to be crawled.
By defining this in the second line of your robots.txt file, you can tell different crawlers not to access those areas of your website.
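Putting those two lines together, a Disallow rule might look like this (the directory names here are hypothetical):

```
User-agent: *
Disallow: /checkout/
Disallow: /private/
```

Here, every crawler is asked to stay out of the /checkout/ and /private/ sections of the site.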
Why would you do this? Well, some areas might pose a security risk, and you wouldn’t necessarily want Google to crawl very sensitive data. You might have a platform running in the background of your website, perhaps a software-as-a-service product that holds a lot of secure details or information, or maybe the area simply doesn’t provide value to users coming from Google.
Maybe you don’t want that content to be indexed because it could be harmful to your rankings. There are several different reasons you might want to do this, and by using the robots.txt file you can instruct all the crawlers out there to crawl or not crawl different areas of your website.
Some robots.txt files might also include time delays. If you don’t want crawlers hitting your website too quickly, you can add a crawl delay that tells the bots to wait a specified number of seconds between requests.
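A crawl delay of ten seconds for Bingbot could be written like this. Note that not all crawlers honor the Crawl-delay directive; Googlebot, for example, ignores it:

```
User-agent: Bingbot
Crawl-delay: 10
```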
The ‘Allow’ directive explicitly specifies which pages or subfolders can be accessed. This is applicable to Googlebot only.
You can use the ‘allow’ directive to give access to a specific sub-folder on your website, even though the parent directory is disallowed.
For example, you can disallow access to your ‘Photos’ directory but allow access to your ‘cars’ sub-folder, which is located under Photos.
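The Photos/cars example above could be sketched like this (the exact paths and casing depend on your site’s structure):

```
User-agent: Googlebot
Disallow: /photos/
Allow: /photos/cars/
```

Googlebot would then skip everything under /photos/ except the /photos/cars/ sub-folder.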
One more thing to remember: if you have an XML sitemap (and you should), include a directive in your robots.txt file telling crawlers where to find your sitemap, so they have a good understanding of your content as well.
The ‘sitemap’ directive is supported by all major search engines and you can use it to specify the location of your XML Sitemap.
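The sitemap directive usually sits on its own line and uses the full URL of the sitemap (example.com is a placeholder here):

```
Sitemap: https://www.example.com/sitemap.xml
```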
Create a Robots.txt File Using the Robots.txt Generator at ceevee-seotools.com
Just go to the tool and enter the required details using an easy dropdown list.
The first dropdown is “Default – All Robots are”, which lets you choose whether all robots are allowed or refused by default.
The second option lets you select a crawl delay time. The given options are 5, 10, 20, 60, and 120 seconds, or you can select the “No Delay” option.
Next, it gives you an option to enter your sitemap URL. Leave blank if you don’t have one.
After that, the tool gives you the option to set permissions for individual robots as you wish. Lastly, you can enter your restricted directories. Then simply click “Create and save as robots.txt” and upload the file to your root directory.
How To Test Your Robots.txt
As mentioned before, your robots.txt file should always be uploaded to your root directory. If you go to your browser and type in your URL followed by a forward slash and robots.txt, all in lowercase, that should give you access to your robots.txt file.
Now, you can obviously go ahead and test your robots.txt file to make sure you’re not blocking off good areas of your website that you want Google or other bots to crawl.
This can be done in Google Search Console: open a new browser window and navigate to the ‘Robots.txt Tester Tool’.
Choose a verified property from the dropdown list and click the Test button. If everything is OK, the Test button will turn green and the label will change to ALLOWED.
If there is a problem, the line that causes a disallow will be highlighted.
A few more things to know about the ‘robots tester tool’: You can use the URL Tester to enter a URL from your website and test if it is blocked or not.
You can make changes in the editor and check new rules, but in order for these to be applied to your live robots.txt, you need to edit your file with a text editor and upload it to your website’s root folder.
To inform Google that you have made changes to your robots.txt, click the SUBMIT button and then select ‘Ask Google to update’.