Update 1.92 – Revamped Google Indexation Checker
Since our last big update, when we added Lighthouse data to Page Speed, we’ve shipped a number of low-key updates that didn’t warrant a blog post. But this time we’ve got a fix to a core feature, so we suggest everyone jumps on this new version of the software.
Improved: Google Indexation Checker
URL Profiler’s Google indexation checker has been one of our most popular features for many years. Back in 2015 we published some revolutionary findings, and ended up building the most advanced indexation checker on the market. We used a tiered combination of checks, including the info: operator, to determine whether a URL was in the ‘main’ index or the ‘deep’ index. It was very cool.
But times have changed. True to form, Google became more aggressive against scrapers, and earlier this year they killed the info command. Recently, my good friend Bill Sebald told me that Google had finally killed the secret API they were using to power an index checker of their own, over on the Greenlane site. Very shortly after, we started receiving complaints that our index checker was returning inconsistent results.
And it had suddenly become very inconsistent. You could run the same set of URLs several times in a row, and sometimes a URL would be indexed and sometimes it wouldn’t.
WTF was going on? Quite simply, Google was deliberately screwing with the requests. Although we had dropped the info command, they were detecting an automated pattern and blocking it all the same. Gareth and I banged our heads together and came up with a methodology that would circumvent their attempts to block the scrape. We’ve been testing and tweaking for the last few days, and it now works really well.
It gives a super clear ‘Yes/No’ answer, and works correctly almost all of the time. While it will never offer a false ‘Yes’, we have seen some URLs occasionally generate a false ‘No’, which means the tool reports a URL as not indexed when it actually is. In all our testing we have experienced this very rarely, and only on specific types of URL, seemingly where Google occasionally changes the type of SERP they want to display. While this won’t affect most users, I think it is worth pointing out – when scraping Google it is practically impossible to offer a 100% cast-iron guarantee.
As per usual, you can skip to the bottom of this post and grab the update links if you want to try it out, or read further below for more detail about what else we have changed in the tool.
If you have not yet had a play with the indexation checker, I have updated our guide on how to use it:
During our testing, we also noticed that Google has become a LOT more strict on blocking proxies. We have added a number of human emulation features to help stop this from happening, such as randomising the user agent, and adding a random wait time between requests. We have also slowed the whole thing down, so that URL Profiler will simply not check as fast as it used to.
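To illustrate the idea, here is a minimal Python sketch of randomising the user agent and the wait time between requests. It is not URL Profiler’s actual code: the user agent strings, the 2 to 8 second range and the function names are all assumptions made up for this example, and it only plans the requests rather than sending any.

```python
import random

# Placeholder user agent strings (assumed for this example only).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/12.0 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) "
    "Gecko/20100101 Firefox/63.0",
]


def pick_user_agent(rng=random):
    """Return one user agent chosen at random from the pool."""
    return rng.choice(USER_AGENTS)


def random_delay(min_seconds=2.0, max_seconds=8.0, rng=random):
    """Return a wait time in seconds, drawn uniformly from the given range."""
    return rng.uniform(min_seconds, max_seconds)


def plan_requests(urls, rng=random):
    """Pair each URL with a random user agent and a random wait time."""
    return [
        {"url": url, "user_agent": pick_user_agent(rng), "wait": random_delay(rng=rng)}
        for url in urls
    ]


if __name__ == "__main__":
    for step in plan_requests(["https://example.com/a", "https://example.com/b"]):
        print(f"wait {step['wait']:.1f}s, then fetch {step['url']}")
```

A real checker would sleep for each `wait` value before sending the request with the chosen user agent, which is also why the whole process runs slower than a fixed-rate loop.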
There are a few truths we can’t really influence any further:
- It is still the case that more proxies will allow you to check URLs quicker (although not quite as fast as before)
- It is still the case that eventually, your proxies will start to burn out and return ‘Connection Failed’
- It is still the case that you will get better proxy performance with dedicated proxies
We have also updated our guide on using proxies with URL Profiler, so you may want to check that out too.
Added: Manage Your Own Licenses
This will please anyone who has had their computer die on them, then tried to activate URL Profiler on the replacement machine, only to be told that the license was still active on the old (and now dead!) machine, forcing them to email support to get it sorted. Well…no longer!
Now you can just follow the on-screen prompt, and deactivate the old machine all on your own.
Improved: Updated Safe Browsing API
One of the lesser known features of URL Profiler is the Malware check – enter a massive list of domains and you can check them against Google’s Safe Browsing API, and if they come back with ‘Dangerous’… avoid!
This is an update to the latest version (v4) of the API, which required a bit of work to make sure we were playing nicely with their rate limits. You need to set up a free API key to use it, but once this is done you basically never need to touch it again.
The Malware check will take your list of URLs and run them against Google’s constantly updated lists of unsafe web resources. Examples of unsafe web resources are social engineering sites (phishing and deceptive sites) and sites that host malware or unwanted software. URLs listed as ‘Dangerous’ are considered unsafe, and should be avoided.
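If you are curious what a v4 lookup looks like under the hood, the sketch below builds the JSON bodies for the Safe Browsing Lookup API (`threatMatches:find`), which accepts up to 500 URLs per request. This is an illustration only, not how URL Profiler itself is written: the `client_id` and `client_version` values and the function name are placeholders, and no request is actually sent.

```python
import json

# The v4 Lookup API accepts at most 500 URLs in one threatMatches:find call.
MAX_URLS_PER_REQUEST = 500

# Endpoint for the POST request; the API key goes in the "key" query parameter.
ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def build_lookup_payloads(urls, client_id="example-client", client_version="1.0"):
    """Split the URLs into batches and build one request body per batch."""
    payloads = []
    for start in range(0, len(urls), MAX_URLS_PER_REQUEST):
        batch = urls[start:start + MAX_URLS_PER_REQUEST]
        payloads.append({
            "client": {"clientId": client_id, "clientVersion": client_version},
            "threatInfo": {
                "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
                "platformTypes": ["ANY_PLATFORM"],
                "threatEntryTypes": ["URL"],
                "threatEntries": [{"url": url} for url in batch],
            },
        })
    return payloads


if __name__ == "__main__":
    bodies = build_lookup_payloads(["http://example.com/", "http://example.org/"])
    print(json.dumps(bodies[0], indent=2))
```

Each body would be sent as a POST to the endpoint with your API key. A response with a non-empty `matches` list means at least one URL in that batch is on a threat list, while an empty response means nothing was flagged.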
Improved: Cleaner URL Importing
URL Profiler will now intelligently fix misconfigured URLs that you enter:
- It will add https:// if you enter URLs with no protocol (e.g. ‘example.com/page1’)
- It will fix errors like https:/ by adding the missing second slash
This means you get to be extra lazy when entering URLs 🙂
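The two fixes above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the code that ships in the tool, and `normalise_url` is a made-up name.

```python
import re


def normalise_url(raw):
    """Repair a single-slash protocol and add https:// when none is given."""
    url = raw.strip()
    # Turn "https:/example.com" into "https://example.com" (also handles http).
    url = re.sub(r"^(https?):/(?!/)", r"\1://", url, flags=re.IGNORECASE)
    # Add a default protocol if the URL still does not start with one.
    if not re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", url):
        url = "https://" + url
    return url


if __name__ == "__main__":
    for raw in ["example.com/page1", "https:/example.com/page2", "http://example.com"]:
        print(raw, "->", normalise_url(raw))
```

URLs that already have a valid protocol pass through unchanged, so it is safe to run over a mixed list.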
Bug Fixes & Smaller Updates
As always, there are a couple of bugs to fix, and some of these have been quite annoying for a few users, so we wanted to make sure we got them fixed:
- Improved URL importing on Mac, which is now much faster.
- Resolved an issue with ‘Import CSV and Merge Data’ which meant it would not work with CSV files that used semi-colons instead of commas as the delimiter, which can happen on some regional settings.
- Fixed the WHOIS dates, which were not coming back in date format in the Excel sheet.
- On some regional settings (where comma is the decimal), the IP address was being returned as a string of numbers without the dots (e.g. 94.82.76.5 would come out as 9482765). This formatting was being forced by Excel, in a way that we cannot change. Now, we wrap the IP address in quote marks, to stop Excel changing the formatting (i.e. “94.82.76.5”).
- Removed LinkedIn share count, which has been discontinued.
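For anyone handling these regional CSV files in their own scripts, the semi-colon problem can be worked around by detecting the delimiter before parsing. The Python sketch below uses the standard library’s `csv.Sniffer` for this. It is only an illustration of the general technique, not a description of how URL Profiler does it, and `read_rows` is a made-up helper name.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text whose delimiter may be a comma or a semi-colon."""
    # Only consider the two delimiters we expect from regional settings.
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=",;")
    return list(csv.reader(io.StringIO(text), dialect))


if __name__ == "__main__":
    sample = "url;status\nhttp://a.com;200\nhttp://b.com;404\n"
    for row in read_rows(sample):
        print(row)
```

The sniffer is a heuristic and raises `csv.Error` when it cannot decide, so a real import routine would want a fallback for very short or unusual files.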
Downloads
Existing customers or existing trial users can grab the new update from here:
If you’ve not tried URL Profiler yet, you can start a free 14 day trial here. The trial is fully featured, and you don’t need to give us any payment details to get started.