How To Use The Google Indexation Checker
Posted on: February 23rd, 2015 by Patrick Hathaway in Guides

This is a step-by-step guide on how to use the Google Indexation Checker feature in URL Profiler, and how to interpret the results.

To get a more complete overview of how this function works, I also recommend you read the accompanying blog post, which introduces the Google Index Checker feature.

How To Set Up Indexation Checks

The most common use-case for this feature is to profile all the URLs on a single site, as part of a technical site audit.

We could import a Screaming Frog crawl, but for the purpose of this example we’ll just import our sitemap. Then select ‘Google Indexation’ under the ‘Google’ option.

[Screenshot: URL Profiler Google Indexation]

Adding Proxies

If you’ve not added proxies to URL Profiler before, the first time you select this option you will be shown a warning:

[Screenshot: Add Proxies Warning]

If you choose ‘No’, you can get away with it for smaller runs, but it will take a lot longer. Anything over about 100 URLs is likely to get your IP banned by Google.

Proxies are required because this feature automatically queries Google in bulk, and Google do NOT like you doing this! (There is no other way to check URL-level indexation.)

Recommended Proxies

We recommend a provider called BuyProxies.org, and we suggest you use their Dedicated Proxies.

I have written a full, detailed guide which explains exactly how to use proxies with URL Profiler and how to get set up with BuyProxies.org.

More Proxies = Faster Results

If speed is your priority, more proxies will get the job done faster for you. You can in theory use as many proxies as you want, but we wouldn’t recommend using fewer than 10.

These are not hard and fast rules, as they also depend upon the speed and reliability of your proxies. If you start to see results slow down dramatically, you might need to check your proxies are still working ok.

The table below will give you an idea of how to work out what you may need. Again, please see the proxy guide post for more detail on this.

No. Proxies | Checking Speed    | 1,000 URLs Will Take | Suggested Max*
10          | 1 every 4 seconds | Approx 70 minutes    | 1,250 URLs
20          | 1 every 3 seconds | Approx 50 minutes    | 2,500 URLs
50          | 1 every 2 seconds | Approx 35 minutes    | 6,500 URLs

*Suggested maximum per profile
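The timings in the table follow from simple arithmetic: the number of URLs multiplied by the seconds per check. As a rough sketch (the function name `estimated_minutes` is hypothetical, and real runs will vary with the speed and reliability of your proxies), you could estimate a run like this:

```python
def estimated_minutes(url_count: int, seconds_per_check: float) -> float:
    """Return the approximate run time in minutes for url_count checks."""
    return url_count * seconds_per_check / 60


# Checking speeds taken from the table: number of proxies -> seconds per check.
SECONDS_PER_CHECK = {10: 4, 20: 3, 50: 2}

for proxies, seconds in SECONDS_PER_CHECK.items():
    minutes = estimated_minutes(1000, seconds)
    print(f"{proxies} proxies: about {minutes:.0f} minutes per 1,000 URLs")
```

This gives roughly 67, 50 and 33 minutes, which lines up with the rounded figures in the table.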

Once you have your URLs in, and some proxies loaded, you are ready to go! Just hit ‘Run Profiler’ and wait for the program to complete.

Interpreting The Results

Again, I strongly suggest you read the accompanying blog post, so that you understand where we’re coming from with our results.

The index checker will return 5 columns of data, as follows:

  • Google Indexed: Can we find the URL in the base index? Result is either Yes, No or Alternative URL.
  • Google Info: Indexed: We only check this if the URL is not in the base index (i.e. did not get a ‘Yes’ in the first column). Result is either Yes, No, Not Checked or Alternative URL.
  • Google Index: Based on the checks above, we determine which index the URL is in. Result is either Base, Deep, or None.
  • Google Indexed Alternative URL: If we found an alternative URL indexed instead of the one we searched for, we display this here.
  • Google Cache Date: Simply displays the last cache date for each URL. If there is no cache date, the result is listed as ‘Not Cached’. On occasion we are unable to check the cache date, in which case the message ‘Check Failed’ is displayed instead.

This doesn’t really make a lot of sense without examples, so I will give examples below for each of the logical options.

URL in Base Index

This is the most ‘normal’ result. The URL is properly indexed in the Base index, and it returns as the top result when you search Google for the exact URL.

[Screenshot: Perfect Indexation]

In this case we don’t check indexation using the info: operator, as there is no need.

URL in Deep Index

This is probably the most abnormal result, and the one we discovered when testing the index checker in the first place. It is possible for URLs to be returned when queried using the info: operator, but not to appear when searched for more generally, or at all when you search Google for the exact URL.

[Screenshot: Deep Indexed URL]

As far as we are able to determine, URLs in the ‘Deep’ index are not findable under normal circumstances, and are therefore not in the Base index from which Google serve their results (i.e. these URLs will never get you any traffic).

URL Not Indexed

A very useful result, but not one you want to see a lot of, I’d guess. This means we couldn’t find the URL in the Base index or the Deep index: it is not indexed at all.

[Screenshot: Not Indexed]

Alternative URL Indexed

So far we have seen the ‘Alternative URL’ column empty. This comes into play when we process an info: command and are given a different URL to the one we specified. We say that the URL is ‘None’ for the Google Index column, as the specific URL you requested is not actually indexed.

[Screenshot: Alternative URL Indexed]

The 4th column, ‘Google Indexed Alternative URL’, is where we show the different URL that Google returned.

Typically this happens for canonical URLs:

[Screenshot: Not The URL We Searched For]
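The logical options covered so far can be condensed into a small decision function. This is only a sketch of the behaviour described in this guide, not URL Profiler’s actual code: the function name `google_index` and its parameters are hypothetical, and the string values are the column results listed earlier.

```python
def google_index(google_indexed: str, info_indexed: str = "Not Checked") -> str:
    """Derive the 'Google Index' column from the first two columns.

    google_indexed: 'Yes', 'No' or 'Alternative URL'
    info_indexed:   'Yes', 'No', 'Not Checked' or 'Alternative URL'
    """
    if google_indexed == "Yes":
        # Found in the base index, so the info: check is not needed.
        return "Base"
    if info_indexed == "Yes":
        # Only returned via the info: operator.
        return "Deep"
    # Not found at all, or only an alternative URL was returned.
    return "None"


print(google_index("Yes"))
print(google_index("No", "Yes"))
print(google_index("No", "Alternative URL"))
```

The three calls print Base, Deep and None respectively, matching the ‘URL in Base Index’, ‘URL in Deep Index’ and ‘Alternative URL Indexed’ cases.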

Cache Date

The final column we return is ‘Cache Date’. It is not on the screenshots above as I wanted to make sure the distinction was clear between indexing and caching, as they are not the same thing.

Cache Date is the last date Google cached your page, or in cases where your page has not actually changed, it is the date that Google last requested your page for crawling.

Typically these results will look like this:

[Screenshot: Google Cache Date]

The data is pretty straightforward, and can generally be considered a good proxy for ‘last crawl date’.

There are also some less straightforward results that this check generates:

[Screenshot: Cache Check Odd Results]

Not Cached

This simply means we were unable to find the cache link, meaning the URL is not cached by Google at all.

You can see this in the SERPs by the absence of the green dropdown link.

[Screenshot: Not Cached]

Of all the results we have tested, ‘Not Cached’ seems to have given us the most false positives (still a single-digit percentage, though), so if you are concerned, it might be worth re-running your ‘Not Cached’ results.
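If you want to pull out the ‘Not Cached’ URLs for a re-run, something like the following would do it. This is a minimal sketch that assumes you have exported your results to CSV; the `URL` column name, the function name `urls_to_recheck` and the sample rows are assumptions made for illustration, so adjust them to match your own export.

```python
import csv
import io


def urls_to_recheck(csv_text: str) -> list[str]:
    """Return the URLs whose cache check came back as 'Not Cached'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["URL"] for row in reader if row["Google Cache Date"] == "Not Cached"]


# Hypothetical sample export; a real export will contain more columns.
sample = """URL,Google Cache Date
http://example.com/a,Not Cached
http://example.com/b,Check Failed
http://example.com/c,Not Cached
"""

print(urls_to_recheck(sample))
```

The resulting list can then be imported as a fresh URL list and profiled again.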

No Date Found

This is a bit more unusual, and it represents URLs that are cached, but for some reason Google are not displaying a cache date.

[Screenshot: No Cache Date]

In this example, the page redirects via 301 to a downloadable PDF. The cached content is an HTML version of the document, but Google offer no cache date.

Not Found

This is different to both of the above cases, and represents URLs that serve a 404 when we request the webcache:

[Screenshot: Cache 404 Error]

When Gareth told me about this, his words were (quite literally):

“It’s Google fucking with us.”

Whether this is the case or not, we are unable to get the results.

The Most Important KPI for SEO

The reason for checking index status is simple: if your pages are not indexed, they cannot generate organic search traffic.

Further, if your pages are only indexed in the ‘Deep’ index, they cannot generate organic search traffic either.

Often, you’ll be looking for the inverse – you will have pages you don’t want indexed and you’ll want to make sure that they’re not.

Either way, thoroughly checking indexation can be one of the most important stages in a technical site audit.
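Both checks (pages that should be indexed but aren’t, and pages that shouldn’t be indexed but are) can be run over the same set of results. The sketch below assumes you already have each URL’s ‘Google Index’ value in a dictionary; the function name `audit_indexation` and the example URLs are hypothetical.

```python
def audit_indexation(results: dict[str, str], should_be_indexed: set[str]):
    """Split results into two problem lists.

    results: URL -> 'Base', 'Deep' or 'None' (the 'Google Index' column).
    should_be_indexed: the URLs you want Google to serve in its results.
    """
    # Wanted pages that are not in the Base index cannot earn organic traffic.
    missing = sorted(
        url for url, index in results.items()
        if url in should_be_indexed and index != "Base"
    )
    # Pages you did not want indexed, but which appear in some index anyway.
    unwanted = sorted(
        url for url, index in results.items()
        if url not in should_be_indexed and index != "None"
    )
    return missing, unwanted


results = {
    "http://example.com/": "Base",
    "http://example.com/guide": "Deep",
    "http://example.com/old-page": "None",
    "http://example.com/admin": "Base",
}
wanted = {"http://example.com/", "http://example.com/guide", "http://example.com/old-page"}

missing, unwanted = audit_indexation(results, wanted)
print("Should be indexed but are not in Base:", missing)
print("Should not be indexed but are:", unwanted)
```

Treating ‘Deep’ the same as ‘None’ for wanted pages reflects the point above that Deep-indexed URLs do not generate organic search traffic.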

URL Profiler’s index checker will allow you to do this more accurately and more thoroughly than any other SEO tool on the market.

Patrick Hathaway

I seem to be the one that writes all the blog posts, so I am going to unofficially name myself 'Editor'. In fact, I think I prefer Editor-in-chief.

    Hi Patrick, if I get a “Connection Failed” result in Google Indexed, that means my proxy connection failed, right? How can I avoid it? I have set maximum retries to 5. Is there any chance you can update the tool to switch to the next proxy in the list instead of letting it fail?

    • HathawayP

      There is no way to completely avoid it, every time. The max retries already works by trying another proxy if it gets a fail. So you can bump that up to 10 to help it a bit. But for some reason sometimes Google just does not want to process a particular query at that time (them messing with us again). So if you wait a few minutes and try again you can generally get them all.

    Patrick, could you automatically check for indexation of alternative url? I have to re-run these checks for my domains because in my case i don’t store info if a site is www or non-www. Of course i’m aware that if alternate url is indexed it could mean many ugly things but anyway i’d expect this tool to add alternate url’s to the list and check them separately.

    • HathawayP

      The tool is already doing this. If an alternative URL is found, ‘Alternative URL’ will be listed. Then in the column ‘Google Indexed Alternative URL’ you will find the alternative URL which is listed.

      Any listed alternative URL is indexed by definition.

      As an example you can try http://www.urlprofiler.com, this will come back with an alternative URL of http://urlprofiler.com – which is the indexed version of the page.

      Hope that answers your question.

        well, it’s not doing it – or at least it’s not a full check – i’ve asked because i was surprised that so many of my sites got not in index info, and after re-adding real url i got correct data.

        • HathawayP

          If you share a URL or two I can debug and get it fixed if it’s not doing what it is supposed to – please email support@urlprofiler.com and we’ll get on it.

    Have you ever thought about repeating this with Bing?

  • Google index checker analyses on how easily and quickly google is able to crawl or index on a website. This tool is also useful in checking the google index stats of multiple websites at a time.

    Hi Patrick, in your opinion if a page is in the deep index, does Google follow the links on that page, i.e. does the link and its anchor text count?

    • HathawayP

      Sheesh. Hard to say for definite, we’ve never tested it. But if a URL is in the deep index, it suggests that Google probably doesn’t think it’s a good result to show users. So personally, I wouldn’t trust it in terms of passing PageRank.

  • great tool and post. I have just migrated to HTTPS and only 75% of my urls were being indexed in GWM. This tool shows me that most of those missing are still indexed with the former HTTP url. Only 10 posts actually showed up unindexed. Question: How on earth do you get google to index those remaining posts that are truly unindexed? They are not that bad, and evergreen content. Just forgotten….

    • HathawayP

      Glad it proved useful! Based on what you’ve said, it sounds as though 75% of your URLs are indexed as the correct HTTPS version, which is a good thing. That means Google has crawled the HTTP URLs, seen the 301 redirects, and updated the index. Of the remaining 25%, it is likely that they have not yet been recrawled, or maybe only crawled once or twice.

      In time they will change across to HTTPS as well, but if you are concerned you can do things like ‘Fetch and Render’ in Google Search Console.

      The final 10 URLs that are not indexed at all – it is highly likely that these were actually not indexed before the migration (rather than they have suddenly dropped out when you migrated). So if you are convinced that the content is good and they ‘should’ be indexed (i.e. not thin/duplicate content) then it probably means that Google aren’t crawling them very regularly. So you can do things like internally link to them from a page high up in your site hierarchy, or make sure they are included in your XML sitemap, or build some external links to them, or do the ‘Fetch and Render’ job on them all (or all of the above!).

      • thanks. i will do that. btw, some other people mentioned it – I think the very act of scraping google triggers a proper google recrawl rather than a half hearted GWM crawl. Can’t be sure, but my indexed posts, having been stuck on the same number for a while have just jumped after running this software

        • HathawayP

          Yeah I’ve seen the same thing too, although never tied it down as a causal link. Although it would more likely be the act of searching for a URL (and/or the info: search), rather than the scraping bit.
