
Scraping Twitter Lists To Boost Social Outreach (+ Free Tool!)

Posted on: October 27th, 2014 by Patrick Hathaway in How To

I published a post a few weeks ago describing how to build your own Twitter custom audience list, outlining a variety of techniques for building it up.

This post outlines another method (hat tip to Ade Lewis for the idea) which requires you to scrape Twitter directly.

If you want to skip all the explanations and just want to download the Twitter List Scraper tool, here you go…

Download the Twitter Scraper Tool for Windows or Mac (completely free)

Disclaimer: Scraping Twitter is against their Terms of Service, so if you decide to do this you do it at your own risk.

Some Benchmarks

Building custom audiences on Twitter requires you to identify Twitter usernames that might be interested in your service or product.

In my previous post, one of the methods I employed was to pull a competitor’s link profile and scrape social accounts from the linking domains.

Once you upload a custom list, Twitter goes through a process of ‘matching’ against profiles in their system, to make sure the user exists and hasn’t opted out of tailored ads.

As our data was scraped from a list of unqualified websites, the data matching wasn’t likely to be perfect.

Experiments

Since I published that post, I have been experimenting a fair bit with list building, and have built up around 10 custom audience lists. I’ve uploaded a total of 48,857 Twitter usernames using this method, but only 29,260 were matched by Twitter (a match rate of just under 60%).

In other experiments, where I had better control over the input data, the match rate was between 70% and 80%.

Since we’ll be scraping Twitter directly, I expect our match rate to be much higher – 90%+

Finding Relevant Twitter Lists

So, we’re going to scrape Twitter, and the first step is to find Twitter lists that will contain users potentially interested in what we have to offer.

As an example, we’ll pretend we’re marketing a music website, and we’ve produced a survey we want to collect responses for.

An advanced Google query can give us lists of music bloggers: site:twitter.com inurl:lists inurl:members inurl:music “music blogger”

Google Advanced Query - Music

A similar query can give us lists of music journalists.

A really quick and easy way to scrape these Twitter URLs from the Google SERPs is to use a link copier extension like Linkclump. If you first set your Google results to display 100 results (here’s how), you can just copy them straight off the page.

Linkclump

Dragging this to the bottom of the results page will give us a list we can paste right into Excel:

Linkclump Results

Any one of these pages shows us a load of Twitter users, editorially curated by someone else, specifically because they “think music is swell”.

Twitter Music List

Swell.
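Before scraping, it can help to tidy the URLs copied from the results page. As noted in the comments further down, the scraper tool expects list URLs that end in /members. Here is a minimal Python sketch of that clean-up. The function name to_members_url is made up for illustration, and the sketch assumes list URLs follow the twitter.com/USER/lists/LIST pattern seen in the examples in this post.

```python
from urllib.parse import urlsplit, urlunsplit


def to_members_url(url):
    """Return the /members form of a Twitter list URL, or None if it is not one.

    Assumes the path looks like /<user>/lists/<list>[/anything].
    """
    parts = urlsplit(url.strip())
    if parts.netloc.lower() not in ("twitter.com", "www.twitter.com"):
        return None

    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 3 or segments[1] != "lists":
        return None

    # Keep only user/lists/list and force the /members suffix.
    path = "/" + "/".join(segments[:3]) + "/members"
    return urlunsplit(("https", "twitter.com", path, "", ""))


copied = [
    "https://twitter.com/danbarker/lists/brightonseo",
    "https://twitter.com/danbarker/lists/brightonseo/members",
    "https://example.com/not-a-list",
]

# Drop anything that is not a list URL and remove duplicates, keeping order.
cleaned = list(dict.fromkeys(u for u in map(to_members_url, copied) if u))
print(cleaned)
# ['https://twitter.com/danbarker/lists/brightonseo/members']
```

Note that anything after the list name (a query string, or a different suffix) is discarded, so check the output if your copied URLs look unusual.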

Put Your Scraping Hat On

From the 2 Google queries we used, we have 66 Twitter lists we want to extract usernames from. We thought this would be a great advert for URL Profiler, which has a nice Custom Scraper function. But…it didn’t really work. Here’s what happened:

First we uploaded the list of URLs into the white box, and hit ‘Custom Scraper (Beta)’ under ‘Content Analysis’.

URL Profiler Custom Scraper

We wanted to scrape usernames from our list pages, as well as the number of members in each list. Using Inspect Element in Chrome, we pulled out the CSS selector we needed (more details on how to do this in this post).

CSS Selector

So we set up the tool to scrape the username as text from:

span.username

And similarly, the member number from:

#page-container ul.stats li:nth-child(1) strong

Which looks like this in URL Profiler:

Custom Scraper

Then we just ran the profiler and waited for the results. And realised the problem…
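As an aside, the span.username selector amounts to "take the text of every span element that has the class username". Here is a minimal Python sketch of that idea, using only the standard library. The sample markup is invented for illustration and is not Twitter's real page source, and this is not how URL Profiler works internally.

```python
from html.parser import HTMLParser


class UsernameParser(HTMLParser):
    """Collect the text inside every <span> whose class list includes 'username'."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while we are inside a matching span
        self.current = []     # text fragments of the span being read
        self.usernames = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1   # nested span inside a matching span
        elif "username" in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.usernames.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract_usernames(html):
    parser = UsernameParser()
    parser.feed(html)
    parser.close()
    return parser.usernames


# Made-up markup, loosely shaped like a list of accounts.
SAMPLE_HTML = """
<div class="account">
  <strong class="fullname">Example Person</strong>
  <span class="username">@example_one</span>
</div>
<div class="account">
  <span class="username js-extra">@Example_Two</span>
  <span class="bio">not a username</span>
</div>
"""

print(extract_usernames(SAMPLE_HTML))
# ['@example_one', '@Example_Two']
```

Like the custom scraper, this only sees whatever HTML it is handed, so anything loaded later by the page's scripts is missed.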

The Infinite Scroll Problem

Once we opened up the Excel output, we found the data we were looking for under ‘Data 1’ and ‘Data 2’.

Excel Output

Although it looks a bit weird on the screenshot, the usernames are populated in that cell, separated by semi-colons. It is trivial to sanitise this data with a bit of Excel data wrangling. Once we’d cleaned the data up, we saw that we hadn’t managed to scrape every username on the page. This is why:

Infinite Scroll

The custom scraper can’t currently process the infinite scroll to keep loading more usernames. In fact this is an issue most scraping software tools encounter, as sites auto-load the data in different ways.
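The semi-colon separated cell mentioned above can also be cleaned up outside Excel. Here is a minimal Python sketch, assuming the cell content has been copied out as a single string. The function name split_usernames and the sample value are made up for illustration.

```python
def split_usernames(cell):
    """Split a 'name; name; name' cell into a clean list of usernames.

    Strips whitespace and a leading '@', skips blanks, and drops duplicates
    (compared case-insensitively) while keeping the original order.
    """
    seen = set()
    result = []
    for raw in cell.split(";"):
        name = raw.strip().lstrip("@")
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            result.append(name)
    return result


cell_value = "@alice; @bob;@Alice; ;carol"
print(split_usernames(cell_value))
# ['alice', 'bob', 'carol']
```

The original casing of the first occurrence is kept here; lowercasing for the upload is a separate step.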

A Custom Solution (Free Tool Download)

Once we’d identified the problem, Gareth stopped what he was doing and spent a couple of hours knocking together a quick Twitter List scraper. Here’s how it works:

Twitter List Scraper

That’s all there is to it! Whereas the custom scraper can only currently grab the first 25 results or so (so we’d end up with 66 x 25 = 1650 usernames), this will get you the lot:

All Twitter Usernames

So in about 10 minutes total work, we have almost 8000 targeted Twitter usernames that we can advertise to.

Download the Twitter Scraper Tool for Windows or Mac (completely free)

Twitter Custom Audiences

All we need to do now is upload this list as a custom audience on Twitter, then we can start serving ads to them.

I covered step-by-step instructions for this in my previous post, but the main thing to remember is to make all your usernames lowercase (just use =LOWER in Excel).

Then just head over to Twitter ads, go to Tools -> Audience Manager and hit ‘Create New List Audience’. You’ll end up on a page like this, where you need to select ‘Twitter usernames’ from the data type options.

Twitter List Audience

Before we started, I estimated that we could perhaps expect a 90%+ match rate on our upload. Well…not quite:

Tailored Audience Ready

The matched audience size is 5430, out of an import of 7733 rows – just over 70%. This is clearly better than the ‘links’ method which had a match rate of 60%, but still not as high as I had hoped.

Possibly this is simply a result of the scale with which we are working. I’ll follow up on the blog when I’ve done more experiments.
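For anyone who wants to check the arithmetic, the two match rates quoted in this post can be reproduced in a few lines of Python. The figures are the ones given above; the function name match_rate is just for illustration.

```python
def match_rate(matched, uploaded):
    """Share of uploaded usernames that Twitter matched."""
    return matched / uploaded


links_method = match_rate(29_260, 48_857)   # usernames scraped from linking domains
lists_method = match_rate(5_430, 7_733)     # usernames scraped from Twitter lists

print(f"Links method: {links_method:.1%}")  # Links method: 59.9%
print(f"Lists method: {lists_method:.1%}")  # Lists method: 70.2%
```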

Interest-based Segmentation

Twitter Lists are a very powerful (and often under-used) feature, as they form an interest-based segmentation of the Twitter user-base, created by users themselves.

This method allows you to tap into this segment and advertise to it directly for your own benefit. Download the Twitter List Scraper and get going today!

If you have any other ideas for ways to use the tool or build Twitter Custom Audience lists, please share them below.

Downloads

Without further ado, here are the download links for the Twitter List Scraper:


By Patrick Hathaway

I seem to be the one that writes all the blog posts, so I am going to unofficially name myself 'Editor'. In fact, I think I prefer Editor-in-chief. You can follow me on Twitter or 'encircle me' on .

Comments

  • Davide Di Prossimo

    Hey Patrick,

    This was great, great, and super great. Thanks for putting it together. Loved it. One quick question for you: Do you think Twitter will ever do (or perhaps is doing that already) anything about the fact Twitter is so “hackable?” I hope not, but you know … it is a question that comes to mind.

    Thanks Patrick

    • HathawayP

      Thanks for the comment Davide, glad you enjoyed it.

      I think they probably will, to be honest. I would if I were them. The infinite scroll already makes it a little more challenging, as I illustrated in the article. Dedicated scrapers like Kimonify and Import.io really struggle with this sort of thing currently.

      I guess what comes with it is the moral question – should we really be doing this?

      • Davide Di Prossimo

        You’re most welcome Patrick. I loved it, at the very least.

        Yes, I understand your point, but the way I see it is that you can put a weapon in the hands of a citizen, but that does not make that person a killer. You can scrape as many @usernames and emails as you want, but then you must stop and think about the ethical aspect of how to use that info. I am very fascinated with scraping and data science … hope that makes sense.

        Thanks once more Patrick. I’ll share this article with our audience (57K+ people newsletter). You deserve that 🙂 I am sure they’d appreciate it.

        Cheers

        • HathawayP

          Fantastic! Thanks, sounds awesome.

          Although I guess gun crime rates in the US don’t exactly support the ‘if you give a man a gun’ argument(!)

          • Davide Di Prossimo

            Oh well, that is correct I suppose. I don’t live in the U.S. though, I did not know.

          • HathawayP

            Davide! Your newsletter is Follow Weekly, yes? I saw some traffic from you in our Analytics and only just put two and two together that it is you! Thank you for including us (you have a new subscriber).

          • Davide Di Prossimo

            Patrick yes it was me (us) from Follow.net. You’re most welcome, your piece was/is great, I was super happy to include it. Oh, you subscribed?! Glad to hear that. I am sure you’ll like our newsletter.

            Thanks Patrick, and keep up the good work, as you’ve done so far 🙂

  • Trevor Cherewka

    I submitted my email but never received a confirmation email nor a link. Suggestions?

    • HathawayP

      Hi Trevor, we decided to remove the email requirement, sounds as though you filled it in during the crossover. Please hit Ctrl+F5 and you should find the direct download links at the bottom of the post.

  • brank87

    Patrick! Thanks for the tool and the guide. Unfortunately I’m trying to use it but only get empty csvs. Is it just me or is the tool having some issues?
    Cheers!

    • HathawayP

      Working ok for me. Make sure your URLs are in the right format. For example it won’t scrape a URL like this: https://twitter.com/danbarker/lists/brightonseo

      But it will scrape a URL like this: https://twitter.com/danbarker/lists/brightonseo/members

      So you need the /members bit on the end.

      • brank87

        mmm still nothing, that’s how I was entering urls. I even tried with your example but I’m still getting empty csvs. I just wrote a support ticket. Let’s see. Thanks!

        • I had the same issue; try changing your list to Public, and that might correct the issue.

  • Francisco García

    Any news about the empty csv file?

    Thank you.

    • HathawayP

      Hi Francisco. This seems to happen when lists are private as opposed to public, which makes sense. However in some cases on Mac this will happen even if everything is formatted correctly, which we have yet to figure out – not sure if it is a problem with Twitter or with the Mac version.

  • Andy Barr

    Patrick,

    Any word about those match rates? We’re getting our IDs using your tool and our match rate is anywhere from <50% to 75%.

    We just compiled the list so I'm wondering if the full list takes time to propagate, after the "Ready" status in the manager? I've heard similar for FB lists.

    Thanks!

    • HathawayP

      After some more testing my gut feeling is that match rate takes 2 things into account:
      – If the user has opted out of advertising
      – If the user has been active on Twitter over some given time period

      Since there can’t be that many people opting out of advertising, I think the inactive users must be the thing that’s cutting our lists down. This would at least make sense – since anyone can be added to a Twitter list, active or not – and some lists have been hanging around for years.

      I have yet to design a test that would confirm this however.

  • Absolutely brilliant, downloaded the scraper today, followed your instructions perfectly (but only tested 10 URLS), and already have 397 Twitter User Names. Going to continue reading…..

    • HathawayP

      Great! Make sure to build up a decent sized list. Still not 100% clear how they match up so you want to get a fair number over 500 so they don’t all disappear!

  • Felix

    Hi Patrick, I like your post to scrape twitter lists. I have learned how to scrape Twitter accounts through this UK Startup http://bit.ly/ScrapeTwitter

    They can actually even scrape for you or teach you, I thought I share this for all the non-techs out there 🙂

  • LinkiCZ

    I am receiving an empty csv file, does that mean the tool is not working anymore?

  • Thanks for this nice blog. Keep it up. 🙂

    http://www.alquranonlinelearning.com

  • gg80302

    Hey Patrick, Great Tool! It only seems to want to scrape 1 list URL at a time…any ideas why it is not behaving as it did for you?

  • Merry Thomas

    Tool is not working….getting blank excel file.

  • Hi, just dropped two list URLs and got a zero byte file as a result
