How To Use The Google Indexation Checker

Posted on: February 23rd, 2015 by Patrick Hathaway in Guides

This is a step-by-step guide on how to use the Google Indexation Checker feature in URL Profiler, and how to interpret the results.

To get a more complete overview of how this function works, I also recommend you read the accompanying blog post, which introduces the Google Index Checker feature.

How To Set Up Indexation Checks

The most common use-case for this feature is to profile all the URLs on a single site, as part of a technical site audit.

We could import a Screaming Frog crawl, but for the purpose of this example we’ll just import our sitemap. Then select ‘Google Indexation’ under the ‘Google’ option.

Adding Proxies

If you’ve not added proxies to URL Profiler before, the first time you select this option you will be shown a warning:

If you choose ‘No’, you can get away with it for smaller runs, but it will take a lot longer. Anything over about 100 URLs is likely to get your IP banned by Google.

Proxies are required because this feature automatically queries Google in bulk, and Google do not like you doing this. There is no other way to check URL-level indexation.

Recommended Proxies

We recommend a provider called BuyProxies.org, and we suggest you use their Dedicated Proxies.

I have written a full, detailed guide which explains exactly how to use proxies with URL Profiler and how to get set up with BuyProxies.org.

More Proxies = Faster Results

If speed is your priority, more proxies will get the job done faster for you. You can in theory use as many proxies as you want, but we wouldn’t recommend using fewer than 10.

These are not hard and fast rules, as they also depend upon the speed and reliability of your proxies. If you start to see results slow down dramatically, you might need to check your proxies are still working ok.

The table below will give you an idea of how to work out what you may need. Again, please see the proxy guide post for more detail on this.

No. Proxies | Checking Speed    | 1000 URLs Will Take | Suggested Max*
10          | 1 every 4 seconds | Approx 70 minutes   | 1,250 URLs
20          | 1 every 3 seconds | Approx 50 minutes   | 2,500 URLs
50          | 1 every 2 seconds | Approx 35 minutes   | 6,500 URLs

*Suggested maximum per profile
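
The ‘1000 URLs Will Take’ column follows from the checking speed, so you can estimate the run time for a list of any size. Here is a minimal sketch of that arithmetic. The function name is ours, for illustration only, and is not part of URL Profiler. The raw figures come out slightly lower than the table, which appears to round up to allow for retries and slow proxies.

```python
def estimated_minutes(url_count, seconds_per_check):
    """Rough run time in minutes for a given list size and checking speed."""
    return url_count * seconds_per_check / 60


# Checking speeds taken from the table above: number of proxies -> seconds per check.
SPEEDS = {10: 4, 20: 3, 50: 2}

for proxies, seconds in SPEEDS.items():
    minutes = estimated_minutes(1000, seconds)
    print(f"{proxies} proxies: about {minutes:.0f} minutes per 1000 URLs")
```

Treat the result as a lower bound, since real runs also depend on the speed and reliability of your proxies.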

Once you have your URLs in, and some proxies loaded, you are ready to go! Just hit ‘Run Profiler’ and wait for the program to complete.

Interpreting The Results

Again, I strongly suggest you read the accompanying blog post, so that you understand where we’re coming from with our results.

The index checker will return 5 columns of data, as follows:

  • Google Indexed: Can we find the URL in the base index? Result is either Yes, No or Alternative URL.
  • Google Info: Indexed: We only check this if the URL is not in the base index (i.e. did not get a ‘Yes’ in the first column). Result is either Yes, No, Not Checked or Alternative URL.
  • Google Index: Based on the checks above, we determine which index the URL is in. Result is either Base, Deep, or None.
  • Google Indexed Alternative URL: If we found an alternative URL indexed instead of the one we searched for, we display this here.
  • Google Cache Date: Simply displays the last cache date for each URL. If there is no cache date, the result is listed as ‘Not Cached’. On occasion we are unable to check the cache date, in which case the message ‘Check Failed’ is displayed instead.

This doesn’t really make a lot of sense without examples, so I will give examples below for each of the logical options.
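
Before the examples, the way the ‘Google Index’ column follows from the first two columns can be sketched in a few lines. This is only our reading of the column descriptions in this guide, not URL Profiler’s actual code, and the function name is hypothetical.

```python
def google_index(google_indexed, info_indexed):
    """Derive the 'Google Index' value from the first two result columns.

    Both arguments are the strings described above, e.g. 'Yes', 'No',
    'Not Checked' or 'Alternative URL'.
    """
    if google_indexed == "Yes":
        # Found by searching for the exact URL: the info: check is skipped.
        return "Base"
    if info_indexed == "Yes":
        # Only returned by the info: operator.
        return "Deep"
    # 'No' on both checks, or a different URL came back instead.
    return "None"


print(google_index("Yes", "Not Checked"))
print(google_index("No", "Yes"))
print(google_index("No", "Alternative URL"))
```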

URL in Base Index

This is the most ‘normal’ result. The URL is properly indexed in the Base index, and it returns as the top result when you search Google for the exact URL.

In this case we don’t check indexation using the info: operator, as there is no need.

URL in Deep Index

This is probably the most abnormal result, and the one we discovered when testing the index checker in the first place. It is possible for a URL to be returned when queried using the info: operator, yet not be returned at all when you search Google for the exact URL.

As far as we are able to determine, URLs in the ‘Deep’ index are not findable under normal circumstances, and are therefore not in the Base index from which Google serve their results (i.e. these URLs will never get you any traffic).

URL Not Indexed

A very useful result, but not one you want to see a lot of I’d guess. This means we couldn’t find the URL in the Base Index or the Deep Index – it is not indexed at all.

Alternative URL Indexed

So far we have seen the ‘Alternative URL’ column empty. This comes into play when we process an info: command and are given a different URL to the one we specified. We set the Google Index column to ‘None’, as the specific URL you requested is not actually indexed.

The 4th column ‘Google Indexed Alternative URL’ is where we specify what the different URL is that Google returned.

Typically this happens for canonical URLs.

Cache Date

The final column we return is ‘Cache Date’. It is not on the screenshots above as I wanted to make sure the distinction was clear between indexing and caching, as they are not the same thing.

Cache Date is the last date Google cached your page, or in cases where your page has not actually changed, it is the date that Google last requested your page for crawling.

Typically the result is simply a date for each URL. This data is pretty straightforward, and can generally be considered a good proxy for ‘last crawl date’.

There are also some less straightforward results that this check generates:

Not Cached

This simply means we were unable to find the cache link, meaning it is not cached by Google at all.

You can see this in the SERPs by the absence of the green dropdown link.

Of all the results we have tested, ‘Not Cached’ seems to have given us the most false positives (still single digit % though) – so if you are concerned, it might be worth re-running your ‘Not Cached’ results.
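
If you do want to re-run them, you can pull the ‘Not Cached’ URLs out of your results and paste them back in as a new list. Here is a minimal sketch, assuming the results are available as CSV with a ‘URL’ column alongside ‘Google Cache Date’. The ‘URL’ column name, the function name and the sample rows are all assumptions for illustration.

```python
import csv
import io


def urls_to_recheck(rows, column="Google Cache Date"):
    """Return the URLs whose cache check came back as 'Not Cached'."""
    return [row["URL"] for row in rows if row[column].strip() == "Not Cached"]


# Made-up sample standing in for an exported results file.
sample = io.StringIO(
    "URL,Google Cache Date\n"
    "http://example.com/a,Not Cached\n"
    "http://example.com/b,Check Failed\n"
    "http://example.com/c,Not Cached\n"
)

recheck = urls_to_recheck(csv.DictReader(sample))
print("\n".join(recheck))
```

With a real file you would pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample.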

No Date Found

This is a bit more unusual, and it represents URLs that are cached, but for some reason Google are not displaying a cache date.

In this example, the page redirects via 301 to a downloadable PDF. The cached content is an HTML version of the document, but Google offer no cache date for it.

Not Found

This is different to both of the above cases, and represents URLs that serve a 404 when we request the webcache.

When Gareth told me about this, his words were (quite literally):

“It’s Google fucking with us.”

Whether this is the case or not, we are unable to get the results.

The Most Important KPI for SEO

The reasons for checking index status are simple: if your pages are not indexed they cannot generate organic search traffic.

Further, if your pages are only indexed in the ‘Deep’ index, they cannot generate organic search traffic either.

Often, you’ll be looking for the inverse – you will have pages you don’t want indexed and you’ll want to make sure that they’re not.

Either way, thoroughly checking indexation can be one of the most important stages in a technical site audit.
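
Both directions of this check can be sketched against the ‘Google Index’ column. The example below is illustrative only: the function name, the ‘URL’ key and the sample data are assumptions, and `wanted` stands for whatever list of URLs you expect to be indexed.

```python
def audit_indexation(rows, wanted):
    """Split results into pages that should be findable but are not,
    and pages that are indexed when they should not be."""
    not_findable = []
    unexpectedly_indexed = []
    for row in rows:
        url, index = row["URL"], row["Google Index"]
        if url in wanted and index != "Base":
            # 'Deep' or 'None': cannot generate organic search traffic.
            not_findable.append(url)
        elif url not in wanted and index != "None":
            # In the Base or Deep index despite not being wanted there.
            unexpectedly_indexed.append(url)
    return not_findable, unexpectedly_indexed


# Made-up sample results.
rows = [
    {"URL": "http://example.com/", "Google Index": "Base"},
    {"URL": "http://example.com/guide", "Google Index": "Deep"},
    {"URL": "http://example.com/old", "Google Index": "None"},
    {"URL": "http://example.com/admin", "Google Index": "Base"},
    {"URL": "http://example.com/tmp", "Google Index": "None"},
]
wanted = {
    "http://example.com/",
    "http://example.com/guide",
    "http://example.com/old",
}

not_findable, unexpectedly_indexed = audit_indexation(rows, wanted)
print("Should be findable but are not:", not_findable)
print("Indexed but should not be:", unexpectedly_indexed)
```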

URL Profiler’s index checker will allow you to do this more accurately and more thoroughly than any other SEO tool on the market.

Comments

  • Wolfgang Koehlerr
  • LinkiCZ

    Hi Patrick, if I get a “Connection Failed” result in Google Indexed, that means my proxy connection failed, right? How can I avoid it? I have set maximum retries to 5. Is there any chance you can update the tool to switch to the next proxy in the list instead of letting it fail?

    • HathawayP

      There is no way to completely avoid it, every time. The max retries already works by trying another proxy if it gets a fail. So you can bump that up to 10 to help it a bit. But for some reason sometimes Google just does not want to process a particular query at that time (them messing with us again). So if you wait a few minutes and try again you can generally get them all.

  • lury

    Patrick, could you automatically check for indexation of alternative url? I have to re-run these checks for my domains because in my case i don’t store info if a site is www or non-www. Of course i’m aware that if alternate url is indexed it could mean many ugly things but anyway i’d expect this tool to add alternate url’s to the list and check them separately.

    • HathawayP

      The tool is already doing this. If an alternative URL is found, ‘Alternative URL’ will be listed. Then in the column ‘Google Indexed Alternative URL’ you will find the alternative URL which is listed.

      Any listed alternative URL is indexed by definition.

      As an example you can try http://www.urlprofiler.com, this will come back with an alternative URL of http://urlprofiler.com – which is the indexed version of the page.

      Hope that answers your question.

      • lury

        well, it’s not doing it – or at least it’s not a full check – i’ve asked because i was surprised that so many of my sites got not in index info, and after re-adding real url i got correct data.

        • HathawayP

          If you share a URL or two I can debug and get it fixed if it’s not doing what it is supposed to – please email support@urlprofiler.com and we’ll get on it.

  • Marston Gould

    Have you ever thought about repeating this with Bing?

  • Google index checker analyses on how easily and quickly google is able to crawl or index on a website. This tool is also useful in checking the google index stats of multiple websites at a time.
