The Importance of Crawling Your Entire eCommerce Website
If you are a marketing professional, a crawler belongs in your toolkit. There is an important rule to learn about organic search: a search engine must crawl your page first, and only once that is done can the page start ranking and driving traffic and sales. Until the crawl succeeds, your pages simply don't exist in the eyes of the search engine; rankings are only available to pages the engine has already discovered.
You can tell search engines that a page exists by creating an XML sitemap, but that alone won't be enough. A sitemap only gets your pages indexed, which may be fine if your site faces almost no competition in the rankings, but otherwise an XML sitemap on its own won't do much for your rank.
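As a minimal sketch of what such a sitemap looks like, the snippet below builds one with Python's standard library; the product URLs are hypothetical placeholders:

```python
# Sketch: generate a minimal XML sitemap for a handful of product URLs.
# The URLs below are invented for illustration.
from xml.etree import ElementTree as ET

def build_sitemap(urls):
    # The sitemaps.org namespace is required on the <urlset> root element.
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    "https://example.com/products/red-shirt",
    "https://example.com/products/blue-shirt",
])
```

In practice you would write this string to a `sitemap.xml` file at the site root and reference it from robots.txt.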
Your site's SEO performance is closely tied to how deeply it can be crawled, so to optimize your site, you must analyze your crawl first. We will discuss some crawler recommendations at the bottom of this article; for now, let's focus on a few reasons to start crawling your website.
Can Spiders Crawl Your Website?
You need to identify every page associated with your site without missing a single one. This is an easy task if you look at your site through the eyes of a crawler similar to Google's main web crawlers. Are all of your products actually on the site? Are they in the appropriate categories? Are there additional pages you didn't know about? Perhaps merchandising or some other marketing effort duplicated your pages.
What are Crawl Blocks?
If the crawler cannot access your web pages, they simply won't show up in the report when the crawl is done. Pay close attention to what is missing when you inspect the output file. If any of your pages are absent, either the crawler was obstructed by errors and did not complete (check the error messages), or those pages are inaccessible to it.
Once a crawl block appears, you can determine its nature by working out exactly which pages are missing. If your size, style and color filter pages are the ones absent, that points to a common but damaging SEO problem: AJAX filters that narrow the products shown on screen without changing the URL.
There may be missing pages that share a particular string in their URLs. Robots.txt disallows can also cause problems by blocking more than you planned. Is your whole site missing? Look through the robots.txt for a global disallow, and check your pages for a meta robots noindex directive.
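One way to see whether a disallow rule blocks more than intended is to test sample URLs against your robots.txt with Python's standard-library parser. The rules and paths below are invented to show how a short-looking rule can be overly broad:

```python
# Sketch: test URL paths against a robots.txt to spot disallow rules
# that block more than intended. Rules and URLs are invented examples.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /checkout/
Disallow: /p
"""
# The second rule looks narrow, but it blocks EVERY path starting with /p.

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/products/red-shirt"))  # False: caught by /p
print(rp.can_fetch("*", "https://example.com/about"))               # True
```

Running your real product, category and filter URLs through a check like this quickly surfaces unintended global blocks.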
Which URLs are Disallowed?
Some crawlers notify you when they bump into pages that could be crawled but are disallowed by robots.txt. This is a great feature for finding and fixing those rules, so you can re-allow the pages that were unintentionally disallowed.
Find and Fix 404 Errors
Most ecommerce sites have 404 errors, and many of them return a 404 for every discontinued product. Those error pages can still be useful to customers, but most of the time they are no longer reachable through the site's navigation, since you can't link to a discontinued product from a category page. The search engines indexed the page before, so they know it was there; once they re-crawl it and see a 404, the page will be de-indexed.
When a search engine finds 404 pages linked within your site navigation, however, it takes that as a sign of poor user experience. The more 404 errors your site carries, the more your search rankings can suffer, not to mention the other negative signals that can compound the damage.
You can get 404 reports in various ways, but those are usually just documents listing URLs that return 404 errors. A crawler can show you the specific pages that link to each error page, following the same paths search engines do. That tells you why your 404 errors appear so they can be resolved, and it shows which pages link to error pages and how many there are overall.
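The difference between a bare URL list and a useful report can be sketched in a few lines. Assuming you already have crawl output as "page → outgoing links" plus an HTTP status per URL (the data below is invented), this maps each 404 back to the page that links to it:

```python
# Sketch: given crawl output (page -> outgoing links, url -> HTTP status),
# list every internal link pointing at a 404 and the page that carries it.
# All paths and statuses below are invented for illustration.
def broken_link_report(links, statuses):
    report = {}
    for page, outlinks in links.items():
        bad = [u for u in outlinks if statuses.get(u) == 404]
        if bad:
            report[page] = bad
    return report

links = {
    "/category/shirts": ["/products/red-shirt", "/products/old-shirt"],
    "/home": ["/category/shirts"],
}
statuses = {
    "/products/red-shirt": 200,
    "/products/old-shirt": 404,
    "/category/shirts": 200,
}
report = broken_link_report(links, statuses)
```

Knowing that `/category/shirts` is the page carrying the dead link is what makes the 404 fixable, rather than just visible.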
The Identification of Redirects
Crawlers are quite effective at identifying 404 errors, and they can do the same with redirects. First, look for opportunities to convert 302 redirects into 301s. Second, review all redirects to determine how many hops occur before the crawler reaches a page that returns a 200 OK. And last, check whether the final destination page is the one you intended visitors to land on.
According to Google, roughly 15 percent of a page's authority is lost each time it passes through a 301 redirect to the receiving page. That loss compounds when one redirect points to another, so redirect chains should be kept as short as possible.
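The compounding effect is simple to work out: if each hop passes about 85 percent of authority (per the figure quoted above), a chain of n hops passes 0.85 to the power n. A quick sketch:

```python
# Sketch: model authority surviving a redirect chain, assuming each 301 hop
# retains roughly 85% (the ~15% loss figure quoted in the text above).
def authority_after_hops(hops, retained_per_hop=0.85):
    return retained_per_hop ** hops

print(round(authority_after_hops(1), 3))  # 0.85  - a single redirect
print(round(authority_after_hops(3), 3))  # 0.614 - three chained redirects
```

Three chained redirects leave barely 61 percent of the original authority, which is why flattening chains to a single hop matters.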
Find Weak Meta Data
Poorly written or duplicate title tags can be sorted out in Excel by arranging them alphabetically, assuming you have the data. A crawler is an outstanding tool for this task, since it collects meta keyword and meta description fields as well. Make optimization easier for yourself by quickly prioritizing the areas that need the most help.
Reviewing all the meta data on your pages without a crawler usually takes far too long. Not only is it tedious, you don't even know how many pages need to be sampled before you can be confident the meta data is correct. Even then, you may have skipped pages that contain incorrect tags. A few unsampled pages can cost you dearly when they carry tags like a robots noindex, which prevents search engines from indexing them at all.
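This kind of audit is mechanical, which is exactly why a crawler beats manual sampling. As a sketch, the parser below pulls the title, meta description and any robots noindex directive out of raw HTML using only Python's standard library (the markup fed to it is invented):

```python
# Sketch: extract <title>, meta description and robots directives from raw
# HTML with the stdlib parser, so weak or dangerous meta data stands out.
from html.parser import HTMLParser

class MetaAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.noindex = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name == "description":
                self.description = attrs.get("content")
            elif name == "robots" and "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

audit = MetaAudit()
audit.feed('<html><head><title>Red Shirt</title>'
           '<meta name="robots" content="noindex,follow"></head></html>')
```

Run over every crawled page, results like these let you sort for duplicate titles, empty descriptions, and stray noindex tags in one pass.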
Check Canonical Tags
A lot of companies are using canonical tags nowadays, but it's still easy to use them incorrectly. You will find them on many pages, where their purpose is to point at the canonical version of that page. A tag that merely references the page itself defeats the purpose, since it leaves in place the duplicate content the tags are meant to consolidate.
Take a look at the canonical tags on your pages that have duplicate content and determine whether each duplicate version points to a true canonical page.
Analyzing and Understanding Custom Data
Go beyond a crawler's standard data set and examine custom fields to learn more about them: do they exist, are they populated, and what do they contain? Get comfortable with regular expressions (regex, which match character patterns) or XPath (which addresses parts of an XML or HTML document), and you can instruct the crawler to grab the analytics code from each page, the price of each product, Open Graph tags, structured data and more.
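To make the regex side concrete, here is a sketch of two custom extractions, pulling a Google Analytics property ID and a product price out of raw HTML. The patterns and markup are invented examples, not any crawler's built-in syntax:

```python
# Sketch: custom extraction with regular expressions. The HTML snippet and
# patterns below are invented for illustration.
import re

html = """
<script>ga('create', 'UA-12345-6', 'auto');</script>
<span class="price">$24.99</span>
"""

# A classic Universal Analytics property ID: "UA-" then two digit groups.
ga_id = re.search(r"UA-\d+-\d+", html)

# Capture the numeric price inside the price span.
price = re.search(r'class="price">\$([\d.]+)<', html)

print(ga_id.group(0))   # UA-12345-6
print(price.group(1))   # 24.99
```

The same patterns can be dropped into any crawler that supports regex-based custom extraction, letting you verify at scale that every page carries the right tracking code and a populated price.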
Gather Data from Google Tools
Google Search Console and Google Analytics can feed per-page data into many crawlers, which report it alongside each crawled URL and save you precious time when weighing the relative value of your page optimizations. For any given page, ask whether it should be driving more traffic; a report combining crawl data with analytics data lets you answer that question with confidence.
How to Crawl Your Website?
Find a crawler that suits your needs and don't forget to use it often. One of the best is Screaming Frog's SEO Spider, which can do everything I mentioned above.
Screaming Frog has developed an outstanding crawler with some excellent features. SEO Spider can satisfy most needs, and it generates reports that open in Excel. The £99 investment is absolutely worth it in my opinion, since this tool will help you a great deal, and it is regularly updated by its developer, with new features added from time to time.
If you have a smaller site, consider downloading the free demo. Although its features are limited, it can still crawl up to 500 URLs, which is fine for starters. Once you have used those up, try free tools like GSite Crawler or Xenu's Link Sleuth. Of course, you can find others on the web, but both of these are known to be quite good.
Xenu's Link Sleuth was developed by a programmer who also uses the Link Sleuth site to share his religious views. He made an outstanding tool that is recommended for everyone. However, it is over ten years old and no longer updated or supported, so don't be surprised if you get varied results.
On the other hand, Link Sleuth crawls deeper than SEO Spider and needs far less system memory. It can export to CSV, and the data can be used to count the pages on your website, look for crawl blocks, and find 404 errors and redirects.
The creator of GSite Crawler is an ex-Google employee who built it primarily as an XML sitemap generator. It can still be used to see which pages are on your site and to find crawl blocks, but it offers no additional features.