Why are log files important for SEO?
For starters, they contain information that is not available elsewhere.
Log files are also one of the only ways to see Google’s actual behavior on your site. They provide useful data for analysis and can help inform valuable optimizations and data-driven decisions.
Performing log file analysis regularly can help you to understand which content is being crawled and how often, and answer other questions about search engines' crawling behavior on your site.
It can be an intimidating task to perform, so this post provides a starting point for your log file analysis journey.
What Are Log Files?
Log files are records of who accessed a website and what content they requested. Each entry contains information on who made the request (also known as 'the client').
This could be a search engine bot, such as Googlebot or Bingbot, or a person viewing the site. Log file records are collected and kept by the web server of the site, and they are usually kept for a certain period of time.
What Data Does a Log File Contain?
A log file typically looks like this:
27.30.14.1 - - [14/Sep/2017:17:10:07 -0400] "GET https://allthedogs.com/dog1/ HTTP/1.1" 200 "https://allthedogs.com" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Broken down, this contains:
The client IP.
A timestamp with the date and time of the request.
The request method, which is typically GET or POST.
The requested URL, i.e. the page or resource that was accessed.
The status code of the requested page, which indicates the success or failure of the request.
The user agent, which contains extra information about the client making the request, such as the browser or bot type (for example, whether the request came from mobile or desktop).
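To make that breakdown concrete, here is a minimal Python sketch that parses the example line above with a regular expression. The field names and pattern are illustrative only; servers vary in their exact log format, so you would adapt the pattern to match your own logs.

```python
import re

# Pattern matching the example line above (note: this sample format has no
# bytes field after the status code; many real logs do, so adjust as needed).
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)"'
)

line = (
    '27.30.14.1 - - [14/Sep/2017:17:10:07 -0400] '
    '"GET https://allthedogs.com/dog1/ HTTP/1.1" 200 '
    '"https://allthedogs.com" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

match = LOG_PATTERN.match(line)
if match:
    fields = match.groupdict()
    print(fields["client_ip"])   # 27.30.14.1
    print(fields["status"])      # 200
    print(fields["user_agent"])  # Mozilla/5.0 (compatible; Googlebot/2.1; ...)
```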
Certain hosting solutions may also provide other information, which could include:
The host name.
The server IP.
Bytes downloaded.
The time taken to serve the request.
How to Access Log Files
As mentioned, log files are stored by the web server for a certain period of time and are only made available to the webmaster(s) of the site.
The method of accessing them depends on your hosting solution, and the best way to find out is to search its documentation, or even to Google it!
With some setups, you can access log files through a CDN or even from the command line. The files can then be downloaded to your computer and parsed from whatever format they are exported in.
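As a rough illustration, assuming your host exports logs as a gzipped text file (the filename here is a placeholder), a first pass in Python might look like this:

```python
import gzip

# 'access.log.gz' is a placeholder; use whatever file your host exports.
googlebot_lines = []
with gzip.open("access.log.gz", "rt", encoding="utf-8", errors="replace") as f:
    for line in f:
        # Crude first pass: keep lines whose user agent mentions Googlebot.
        # (User agents can be spoofed; see the verification sketch later on.)
        if "Googlebot" in line:
            googlebot_lines.append(line)

print(f"{len(googlebot_lines)} Googlebot requests found")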
Why is Log File Analysis Important?
Performing log file analysis can help provide useful insights into how your website is seen by search engine crawlers.
This can help you inform an SEO strategy, find answers to questions, or justify optimizations you may be looking to make.
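One caveat before trusting those insights: user agent strings can be spoofed, so requests claiming to be Googlebot are worth verifying. Google's documented method is a reverse DNS lookup followed by a forward lookup to confirm the IP. Here is a minimal Python sketch of that check:

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Verify a Googlebot claim: reverse DNS must resolve to googlebot.com
    or google.com, and a forward lookup on that hostname must return the
    original IP. Any lookup failure is treated as 'not verified'."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

# Example (hypothetical IP from the sample line):
# print(is_verified_googlebot("27.30.14.1"))
```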
It’s Not All About Crawl Budget
Crawl budget is, in effect, an allowance for how many pages Googlebot will crawl on your site within a given timeframe. Google's John Mueller has confirmed that the majority of sites don't need to worry too much about crawl budget.
However, it is still beneficial to understand which pages Google is crawling and how frequently it is crawling them.
I like to view it as making sure the site is being crawled both efficiently and effectively. Ensuring that the key pages are being crawled, and that new and frequently changing pages are found and crawled quickly, is important for all websites.
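As a simple illustration of the 'which pages, how often' question, you could count requests per URL from your parsed log entries. The `parsed_hits` data below is hypothetical stand-in data in the shape the earlier parsing sketch produces:

```python
from collections import Counter

# Hypothetical parsed log entries; in practice these would come from
# running the parsing sketch over your whole log file.
parsed_hits = [
    {"url": "https://allthedogs.com/dog1/", "status": "200"},
    {"url": "https://allthedogs.com/dog1/", "status": "200"},
    {"url": "https://allthedogs.com/dog2/", "status": "404"},
]

# Count crawl requests per URL and show the most frequently crawled pages.
crawl_counts = Counter(hit["url"] for hit in parsed_hits)
for url, hits in crawl_counts.most_common(10):
    print(f"{hits:>5}  {url}")
```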
Different SEO Analyzers
There are several different tools available to help with log file analysis, including:
Splunk.
Screaming Frog Log File Analyser.
If you are using a crawling tool, you can often combine your log file data with a crawl of your site to expand your data set and gain even richer insights from the combined data.
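As a hypothetical sketch of that combination, assuming you have exported your crawler's URL list and a per-URL summary of your logs to CSV (the filenames and column names here are placeholders), pandas can join the two:

```python
import pandas as pd

# crawl.csv: URLs from your site crawl (with a 'url' column).
# log_hits.csv: per-URL request counts from your logs
# ('url' and 'googlebot_hits' columns). Both are placeholder files.
crawl = pd.read_csv("crawl.csv")
logs = pd.read_csv("log_hits.csv")

combined = crawl.merge(logs, on="url", how="left")
combined["googlebot_hits"] = combined["googlebot_hits"].fillna(0).astype(int)

# Pages found in the crawl that Googlebot never requested:
never_crawled = combined[combined["googlebot_hits"] == 0]
print(never_crawled[["url"]].head())
```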
Search Console Log Stats
Google also offers some insight into how it is crawling your site within the Google Search Console Crawl Stats Report.
I won't go into too much detail in this post, as Google's own documentation covers the report in depth.
Essentially, the report allows you to see crawl requests from Googlebot for the last 90 days.
You will be able to see a breakdown of status codes and file type requests, which Googlebot type (desktop, mobile, ad, image, etc.) is making the request, and whether the requests are for newly found pages (discovery) or previously crawled pages (refresh).