Sunday, February 12, 2012

The short answer: Abysmal Facebook data quality. For the longer answer, read on…

Brands, consultants, and even mainstream news articles hail Radian6 as a leader in the social media listening space. Forrester says so (or rather said so 18 months ago). So do thought leaders like Chris Brogan. Clearly Salesforce.com agrees, or else they wouldn’t have shelled out over $300 million to buy them outright. And there are specific use cases where Radian6 can make sense:

  • Lead generation. This is pretty self-explanatory; if this weren’t the case, Salesforce.com would not have opened their checkbook.
  • You’re an agency or consultant working for an unsophisticated client. You fall into this category if this workflow sounds familiar:
  1. Write a keyword consisting entirely of a couple of terms in Radian6’s CONTAINS bucket
  2. Create a widget on the dashboard
  3. Take a screenshot of the pretty chart
  4. Put it in a PowerPoint with a note that buzz volume increased 35% month over month
  5. Repeat steps 1 - 4 (more than) as necessary
  6. Send the report to the client and pat yourself on the back for a job well done

Congratulations, you have an unsustainable business model! You may be getting away with this now, but you will need to step up your game in a timeframe measured in days, weeks, or months. Bank on the shorter end of that scale if your stakeholder has a quarterly business review coming up. Their boss is going to eat them alive, and they’re going to blame you (and deservedly so).

  • You have tremendous amounts of manpower. I’m not talking about a staff of interns, but rather an army of consultants (or some crowdsourced equivalent) poring through tweets for you and categorizing them in a useful way.
  • You’re doing crisis management, and speed matters above all else.
  • You’re doing all of the above exclusively in the English language.
  • You’re paying for multiple tools based on social media channel and use case.

But that’s about it. I may be exaggerating ever so slightly, but the reality is that if you’re looking for a one-size-fits-all social media listening tool, you need to consider how many of these Radian6 pitfalls could potentially impact you.

Before we go any further, some things to keep in mind:

I am going to call out some specific scenarios. I freely acknowledge that the plural of anecdote is not data, but I believe these situations to be representative of serious issues, based on my personal experience.

And about that personal experience… I’ve been using social media data as a research input for the better part of a decade. I’ve led studies conducted across multiple countries, in multiple languages, in multiple use cases, for multiple Fortune 50 companies. A large portion of that time was spent at a Radian6 competitor, in various research and client service roles. I would understand if you thought I had an axe to grind against them. The reality is that this is a space with hundreds of competitors, and the issues I’m going to cover are by no means exclusive to one vendor. If you ask any of the product managers I’ve collaborated with, they’ll tell you I am an equal opportunity offender when it comes to criticism of tools.

I’ve been providing recommendations and insight to my clients based on Radian6 data since I joined my current employer a year ago. So it’s at no small amount of professional risk that I call out Radian6 by name when I bring these issues to your attention. Hopefully, it also means they will take meaningful and rapid steps to address these problems. 

Data data data data data

The foundation of good research is good data (and also good questions, but for the purposes of this analysis, let’s stick to data). To my mind, there are three key questions you need to ask when assessing data quality for a social media research project:

  • Where does data come from?
  • How do I access the data?
  • How do I manipulate the data?

I’ll address each of these in turn, and talk about where Radian6 is not making the grade. 

Where is the data coming from?

The volume of social media data is staggering. Big Data experts like IBM and SAS have entered the social media monitoring space because it takes that kind of expertise to process data growing at an exponential rate. Vendors tend to classify data sources into large categories like blogs, message boards, Usenet groups, consumer review sites, and social networks like Twitter, Facebook, Orkut, YouTube, and even MySpace (remember them?). In sales material, vendor reps like to hype up their millions of sources, of which 99% end up being blogs. The reality is that most of those blogs are hosted by TypePad, WordPress, or LiveJournal, and exist solely to earn their creators cash through Google AdSense ad placements and Amazon affiliate links. More on this later.

For Radian6 in particular, there are a number of large sites full of content that could be extremely relevant to you that they simply don’t offer. If you want insight into B2B or SMB conversation, the groups formed by LinkedIn’s 135 million+ members would be a great place to start. But Radian6 doesn’t capture them. It’s a similar story for consumer reviews from retailers like Amazon, Best Buy, and Office Depot. And if you want perspective on the rapidly growing Google+ user base? Look elsewhere. You’ll have to judge on a case-by-case basis whether these are omissions you can live with.

Data Sources

How is the data collected?

In Radian6’s defense, it’s not always their fault when they don’t offer data from a site. The Q&A site Quora, for example, prohibits most services from indexing its site content. (The morbidly curious can look at http://www.quora.com/robots.txt and see for themselves.) The Wall Street Journal’s “What They Know” series on web privacy explains some of the technical and ethical issues here. And some sites like Orkut request in their terms of service that third parties like social media listening tool vendors hold onto data for only a limited period of time.
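If you would rather not read a robots.txt file by eye, here is a minimal sketch of how a crawler could check one using Python's standard `urllib.robotparser` module. The sample rules, the `HypotheticalListeningBot` name, and the example.com URL are all made up for illustration; they are not Quora's actual file, and this is not a claim about how any vendor's crawler works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# It is NOT a copy of any real site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/some-question"
    for agent in ("Googlebot", "HypotheticalListeningBot"):
        print(agent, can_crawl(SAMPLE_ROBOTS_TXT, agent, url))
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which fetch the file over the network before you call `can_fetch()`.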

But in some cases there is simply no excuse for missing data, and it can have an incredibly detrimental impact on the quality of data available to you.  

Let’s talk about the elephant in the room: Facebook. I think everyone can agree on the premise that a significant portion of content shared on Facebook is not available for public consumption, although sometimes people make things more public than they intended, with severe consequences (yes, that’s a link to the story of the dad who unloaded a clip of hollow point rounds into his daughter’s laptop). But there is a tremendous amount of publicly consumable data on the Facebook fan pages of brands, products, and people. What you need to know about Radian6, and what I feel is such a critical issue that I’m just a few HTML tags shy of having text blink and scroll across the screen, is that Radian6 is not able to pick up comments made in response to fan page posts. Radian6 also does not pick up posts proactively made by fans to pages unless the fans make ALL of their content public.

I honestly have no idea how long this has been an issue for them, but it came into sharp relief after the major round of privacy changes Facebook made in early September. Look at this chart trending Radian6 Facebook messages and weep (about the data, not my horrible graphic design skills):

That’s a volume drop of over 90% between the first and third red dotted lines. Yeah.

Now, if you use social media management software like Buddy Media or Vitrue, and you’re only interested in conversation on fan pages you have administrator access to, then this isn’t an issue; you have all that data already. But if you have any interest in competitors, or in benchmarking against a brand you aspire to, like a Starbucks or a Nike, you are up a creek without a paddle. And unless you have someone intensely poring over the data, you have no way of knowing it!
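You don't need a person staring at charts to catch a collapse like this one. Below is a minimal sketch of an automated check, assuming you can export message counts per period from your tool. The function names, the window and threshold defaults, and the sample numbers are all hypothetical; they are not the figures behind my chart.

```python
from statistics import mean


def percent_drop(before: float, after: float) -> float:
    """Percentage decrease from before to after (negative if it rose)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def flag_drops(counts, window=3, threshold_pct=50.0):
    """Return indices whose count is more than threshold_pct below
    the mean of the preceding `window` periods."""
    flagged = []
    for i in range(window, len(counts)):
        baseline = mean(counts[i - window:i])
        if baseline > 0 and percent_drop(baseline, counts[i]) > threshold_pct:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical weekly Facebook message counts, for illustration only.
    weekly = [1000, 1050, 980, 990, 300, 90, 85]
    print(flag_drops(weekly))             # periods that look suspicious
    print(percent_drop(weekly[0], weekly[-1]))  # overall drop, first to last
```

A trailing-average baseline is a crude choice that ignores seasonality, but it is enough to turn a silent collection failure into an alert.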

There are tools out there that collect all this content without fuss. Brandwatch and Nielsen MyBuzzmetrics come to mind. So it’s not as if this is infeasible.

Based on Radian6’s responses to my concerns here, any remedy is far off. They are putting priority on shoring up collection from brand-owned accounts on Twitter and Facebook, each of which sounds like a major release unto itself. If you need Facebook data, which is a pretty safe assumption, are you OK waiting months or quarters for it? Are you OK with the fact that Radian6 didn’t make an attempt to bring this to your attention?

The dirty little secret is that Radian6 does not take responsibility for actually harvesting all their data; they contract out to third-party crawlers along the lines of Boardreader and Board Tracker. If they don’t have data from a forum on the immensely popular FatWallet deal site (which was an issue I encountered a few months ago), they can pass the blame to their suppliers.

Their other data comes from publicly available APIs. But there are still some big sites that they simply don’t have. You may not have any interest in studying Chinese social media yet, but when you do, you’d better hope Radian6 offers access to the country’s largest social network by then.

How clean is the data?

So back to the TypePads, WordPresses, Bloggers, and such of the world. Most of the tens of millions of blogs they host contain loads of content, none of it written by a consumer. These sites are breeding grounds for spammy blogs designed to make money from Google AdSense advertising and Amazon affiliate product links. Unless you go out of the tool to examine the source pages, you may not realize this is fouling up your data. Even then, you may be in the dark about it until you run some unique phrases in a post through a Google search and see tons of duplicate results. The methodology in most of the social media listening tool evaluations I’ve seen doesn’t seem to account for this, and so Radian6 ends up getting praised for high volumes of spam-free data, when that couldn’t be further from the truth.
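If you can export the post text, you can approximate that Google-search check locally. Here is a minimal sketch that flags near-duplicate posts by comparing overlapping five-word phrases (shingles) with Jaccard similarity. To be clear, this is a swapped-in technique, not the manual search described above and not anything Radian6 provides; the function names, the 0.6 threshold, and the sample posts are assumptions for illustration.

```python
import re


def shingles(text, k=5):
    """Return the set of overlapping k-word phrases in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Short posts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_near_duplicates(posts, k=5, threshold=0.6):
    """Return index pairs (i, j) of posts whose shingle sets overlap heavily."""
    sigs = [shingles(p, k) for p in posts]
    pairs = []
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if jaccard(sigs[i], sigs[j]) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Hypothetical post text, for illustration only.
    posts = [
        "Buy the best cheap laptop deals today and save big on every top brand",
        "Buy the best cheap laptop deals today and save big on every top brand now",
        "My new laptop arrived yesterday and the battery life is honestly disappointing",
    ]
    print(find_near_duplicates(posts))
```

The pairwise comparison is quadratic in the number of posts, so it suits a sample of a few thousand rather than a full export; at larger scale something like MinHash would be the usual next step.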

How quickly is the data collected?

This can be less of an issue depending on your use case. If you are trying to do community management or crisis monitoring, this could be paramount. For research on a longer timeline, it’s far less of an issue. But Radian6’s dependence on third parties to supply them with data unquestionably increases the lag time in seeing data.

To be continued

I think 5 single-spaced pages of text in Word is quite enough for one blog post. In Part II, I’ll cover data access, data manipulation, and customer service.

Notes

  1. measurematt posted this