Wish your product listings could rank better on search engines?
Feel like your success is as limited as your crawl budget?
Effectively ranking thousands of products, hundreds of categories, and millions of links requires a level of organization that can sometimes feel out of reach.
This is particularly true when the performance of your ecommerce site depends on a limited crawl budget or disconnected teams.
For large ecommerce sites, it’s an immense challenge.
In this post, you’ll find proven solutions for some of the most persistent technical SEO issues plaguing sites like yours. You’ll learn how to tackle issues around crawl budget, site architecture, internal linking, and more that are holding your site’s performance back.
Let’s get to it.
1. Crawl Budget Is Often Too Limited To Provide Actionable Insights
Growing your ecommerce business is great, but it can lead to a massive volume of pages and a disorganized, outdated website structure.
Your company’s incredible growth has likely led to:
- Extensive SEO crawl budget needs.
- Lengthy crawling processes.
- High crawl budget waste from easily missed, outdated content, such as orphan and zombie pages, that no longer needs to be crawled or indexed.
- Difficult-to-follow reports filled with repetitive fundamental technical errors on millions of pages.
- Incomplete and segmented crawl data, or partial crawls.
Trying to solve SEO problems using partial crawls isn’t a great idea; you won’t be able to locate all the errors, causing you to make SEO decisions that may do more harm than good.
Whether your crawl budget limitations come from website size or desktop-based crawling tools, you need a solution that allows you to review and understand your full website, with no limits.
The Solution: Use Raw Logs Instead Of Crawl Reports
To overcome the issue of slow, limited crawl budgets, we recommend using raw logs instead of crawl reports.
Raw logs give you the power to:
- Monitor crawling, indexation, and detailed content analysis at a more reasonable price.
- Understand which pages are impacting your crawl budget and optimize accordingly.
- Eliminate critical errors right after a product update.
- Fix issues before Google bots discover them.
- Quickly identify pages with 2XX, 3XX, 4XX, and 5XX status codes.
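As a rough illustration of the last point, the sketch below counts status-code classes for requests whose user agent contains "Googlebot". It assumes logs in the common "combined" access-log format; the sample lines, the regular expression, and the function name are hypothetical, and matching on the user-agent string alone does not prove a request really came from Google.

```python
import re
from collections import Counter

# Hypothetical sample lines in the "combined" access-log format.
SAMPLE_LOG = """\
66.249.66.1 - - [10/Jan/2023:10:00:00 +0000] "GET /product/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [10/Jan/2023:10:00:01 +0000] "GET /old-category HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [10/Jan/2023:10:00:02 +0000] "GET /product/999 HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.7 - - [10/Jan/2023:10:00:03 +0000] "GET /product/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
66.249.66.1 - - [10/Jan/2023:10:00:04 +0000] "GET /checkout HTTP/1.1" 500 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"""

# Captures the request path, the status code, and the user agent.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def status_classes_for_bot(log_text, bot_token="Googlebot"):
    """Count 2XX/3XX/4XX/5XX responses served to a given bot."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and bot_token in match.group("agent"):
            # "404" -> "4XX"
            counts[match.group("status")[0] + "XX"] += 1
    return counts


if __name__ == "__main__":
    print(dict(status_classes_for_bot(SAMPLE_LOG)))
```

On a real site, the same loop would read from the server's log files rather than an inline string.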
Using a raw log tool also gives you the exact picture of a site’s SEO efficiency.
You’ll be able to pull reports that show the number of pages in the site structure, the pages getting search bot visits, and the pages getting impressions in SERPs.
This gives you a clearer picture of where structure and crawling issues occur, at any depth.
For example, we can see there are more than 4 million pages in the site structure above.
Only 725,161 are visited by search bots.
And only 29,277 of these pages are ranked and getting impressions.
Meanwhile, 24,189,025 pages visited by search bots aren’t even part of the site structure.
What a missed opportunity!
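The comparison above is essentially set arithmetic over three URL lists: pages in the site structure, pages visited by bots, and pages with impressions. Here is a minimal sketch, assuming you can export those lists yourself; the function name and the toy URLs are hypothetical and are not part of any JetOctopus API.

```python
def crawl_summary(structure_urls, bot_visited_urls, impression_urls):
    """Compare the site structure with bot visits and SERP impressions."""
    structure = set(structure_urls)
    visited = set(bot_visited_urls)
    with_impressions = set(impression_urls)

    crawled_in_structure = structure & visited
    return {
        "in_structure": len(structure),
        "crawled_in_structure": len(crawled_in_structure),
        # In the structure, but never visited by bots.
        "not_crawled": len(structure - visited),
        # Visited by bots, but missing from the structure (orphans).
        "orphan_visited": len(visited - structure),
        "with_impressions": len(crawled_in_structure & with_impressions),
        "crawl_ratio": (
            len(crawled_in_structure) / len(structure) if structure else 0.0
        ),
    }


if __name__ == "__main__":
    summary = crawl_summary(
        structure_urls=["/", "/a", "/b", "/c"],
        bot_visited_urls=["/", "/a", "/old-1", "/old-2"],
        impression_urls=["/"],
    )
    print(summary)
```

Applied to the figures in this example, 725,161 crawled pages out of more than 4 million is a crawl ratio of at most about 18%, and 29,277 pages with impressions is roughly 4% of the crawled pages.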
How To Discover & Solve SEO Crawl Issues Faster With Raw Logs
Implement a no-limit SEO analysis tool that can crawl full websites of any size and structure.
Blazing fast, JetOctopus can crawl up to 250 pages per second or 50,000 pages in 5 minutes, in order to help you understand how your crawl budget is affected.
- Create an account at JetOctopus.
- Access the Impact section.
- Evaluate your Crawl Ratio and missed pages.
In seconds, you can measure the percentage of SEO-effective pages and know how to improve the situation.
Our Log Analyzer tracks crawl budget, zombie and orphan pages, accessibility errors, areas of crawl deficiency, bot behavior by distance from the index and by content size, inbound links, most active pages, and more.
With its effective visual representation, you can boost indexability while optimizing the crawl budget.
Crawl budget optimization is central to any SEO effort and even more so for large websites. Here are a few points to help you get started.
Identify whether your crawl budget is being wasted.
Log file analysis can help you identify the reasons for crawl budget waste.
Visit the ‘Log File Analysis’ section to determine this.
Get rid of error pages.
Review the site’s crawl through log file analysis to find pages that return 301 redirects or 4XX and 5XX errors.
Improve crawl efficiency.
Use SEO crawl and log file data to determine the disparities between the crawled and indexed pages. Consider the following to improve crawl efficiency.
- Make sure the GSC parameter setting is up to date.
- Check if any important pages are included as non-indexable pages. The data in log files will help you locate them.
- Add disallow paths in the robots.txt file to save your crawl budget for priority pages.
- Add relevant noindex and canonical tags to pages to indicate their level of importance to the search bots. However, noindex meta tags cannot be placed inside non-HTML resources, namely videos and PDF files. In such cases, use robots.txt.
- Look for disallowed pages being crawled by search bots.
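The last check in this list can be approximated with Python's standard `urllib.robotparser` module. The sketch below is an assumption-laden illustration: the robots.txt rules and the crawled paths are made up, and Python's parser may not mirror Googlebot's rule matching in every edge case (wildcard patterns, for instance).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an ecommerce site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""


def disallowed_but_crawled(robots_txt, crawled_paths, user_agent="Googlebot"):
    """Return the crawled paths that robots.txt tells this bot to skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path for path in crawled_paths
        if not parser.can_fetch(user_agent, path)
    ]


if __name__ == "__main__":
    # Paths that, in this made-up example, appeared in the bot's log entries.
    crawled = ["/product/1", "/cart/checkout", "/search", "/category/shoes"]
    print(disallowed_but_crawled(ROBOTS_TXT, crawled))
```

Any path this returns is a place where crawl budget is being spent despite a disallow rule, which is worth investigating in the logs.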
2. Managing A Massive Internal Linking Structure Can Be Complicated
Internal linking is one of the best ways you can inform Google of what exists on your website.
When you create links to your products from pages on your site, you give Google a clear path to crawl in order to rank your pages.
Google’s crawlers use a website’s internal linking structure and the anchor text to derive contextual meaning and discover other pages on the site.
However, creating a crawl-friendly internal linking structure is tough for large-scale websites.
Keeping up with internally linked products that constantly go in and out of stock isn’t always sustainable on a large ecommerce site.
You need a way to see where dead ends happen during a Google crawl.
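To make the idea of a dead end concrete, here is a minimal sketch that walks an internal-link graph breadth-first from the homepage. It assumes the graph is already available as a dictionary of page to linked pages; the function name and the toy graph are hypothetical. It reports click depth, pages with no outgoing internal links, and pages that cannot be reached from the homepage at all.

```python
from collections import deque


def crawl_graph(links, start="/"):
    """Breadth-first walk of an internal-link graph.

    links maps each page to the list of pages it links to.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)

    # Reachable pages with no outgoing internal links.
    dead_ends = sorted(page for page in depth if not links.get(page))
    # Known pages that the walk never reached from the start page.
    unreachable = sorted(set(links) - set(depth))
    return depth, dead_ends, unreachable


if __name__ == "__main__":
    toy_links = {
        "/": ["/shoes", "/bags"],
        "/shoes": ["/shoes/red", "/"],
        "/bags": [],
        "/shoes/red": ["/shoes"],
        "/old-sale": ["/"],
    }
    depth, dead_ends, unreachable = crawl_graph(toy_links)
    print(depth)
    print(dead_ends)
    print(unreachable)
```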
Why Internal Linking Structure Matters
Google relies on internal linking to help it understand how visitors can quickly and easily navigate through the website.
If your homepage ranks well for a specific keyword, internal links help in distributing PageRank to other, more focused pages throughout the site. This helps those linked pages rank higher.
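The way internal links pass PageRank along can be illustrated with the classic simplified PageRank iteration. This is a textbook sketch under stated assumptions (a damping factor of 0.85, a made-up four-page graph, a hypothetical function name), not a description of how Google computes rankings in practice.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over an internal-link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links spreads its rank over all pages.
            targets = links.get(page) or list(pages)
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_links = {
        "/": ["/shoes", "/bags"],
        "/shoes": ["/"],
        "/bags": ["/"],
        "/orphan": [],
    }
    for page, score in sorted(pagerank(toy_links).items()):
        print(f"{page}: {score:.3f}")
```

In this toy graph, the pages linked from the homepage end up with a higher score than the page that nothing links to, which is the effect the paragraph above describes.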
The Solution: Find Crawl Dead-Ends With Interlinking Structure Efficiency Tools
Our Interlinking Structure Efficiency solves this issue by giving you a clear view of your site’s internal linking health.
- On the dashboard, go to Ideas -> Structure Efficiency.
- This screenshot shows the list of directories on the website and, for each directory, the number of pages, the percentage of indexable pages, the average number of internal links to a page, the bot’s behavior, SERP impressions, and clicks. It shows SEO efficiency by directory, so you can analyze which experiments worked and repeat them.
Check out how our client DOM.RIA Doubled Their Googlebot Visits by experimenting with it.
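A directory-level view like the one described above can be approximated from per-page crawl data. The sketch below groups pages by their top-level directory and computes the share of indexable pages and the average number of internal inlinks. The field names (`path`, `indexable`, `inlinks`) and the sample rows are assumptions for illustration, not an export format of any particular tool.

```python
from collections import defaultdict


def directory_report(pages):
    """Aggregate per-page data by top-level directory.

    pages: iterable of dicts with 'path', 'indexable' (bool), 'inlinks' (int).
    """
    groups = defaultdict(list)
    for page in pages:
        segments = page["path"].strip("/").split("/")
        directory = "/" + segments[0] if segments[0] else "/"
        groups[directory].append(page)

    report = {}
    for directory, members in groups.items():
        count = len(members)
        report[directory] = {
            "pages": count,
            "indexable_pct": 100.0 * sum(m["indexable"] for m in members) / count,
            "avg_inlinks": sum(m["inlinks"] for m in members) / count,
        }
    return report


if __name__ == "__main__":
    sample = [
        {"path": "/shoes/red", "indexable": True, "inlinks": 12},
        {"path": "/shoes/blue", "indexable": False, "inlinks": 2},
        {"path": "/bags/tote", "indexable": True, "inlinks": 5},
    ]
    for directory, stats in directory_report(sample).items():
        print(directory, stats)
```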
- Crawlability: JS content limits a crawler’s ability to navigate the website page by page, impacting its indexability.
- Obtainability: Though a crawler can read a JS page, it cannot figure out what the content pertains to. Thus, it will not be able to rank the pages for the relevant keywords.
As a result, ecommerce webmasters cannot determine which pages are rendered and which aren’t.
It also shows the JS errors.
Here’s how to put this feature to work for you:
- Go to JS Performance in the Crawler tab.
4. Few Tools Offer In-Depth Insights For Large Websites
Core Web Vitals and Page Speed are significant technical SEO metrics to be monitored. However, few tools track these page by page.