How much does that penny for your thoughts weigh?

I’m surprised that neither Stephen Wolfram nor Nick Felton has yet tackled the “change in my pocket” analysis.
Spam-churian candidates
I’ve talked before about issues with using social media data to predict election outcomes. Yesterday, Mashable’s 78th infographic of the day looked at a new wrench in the gears, spam:
The same techniques used by social spammers advertising free iPads and Viagra are now being used to spread bogus political messages across social media, blogs and news sites.
Yet another instance that should bring home the point that social media mention volume isn’t everything. A low-noise data source (likely coming from a tool with robust filtering capabilities) is the only way to reduce the impact of this nonsense on data quality.
Data on Rails

This is the blog post-equivalent of arriving on the platform just as the 3 train arrives. Two items on NYC Subway data came on my radar this morning:
- The infographic above uses ridership data to reveal the busiest and calmest stations.
- The MTA is going to open up real-time data on trains to developers. I have an app on my phone that does some whiz-bang things with augmented reality and planning trips, but it has a kludgy system for notifying me of delays and can’t tell me when the next train will actually arrive.
I’m Not a Real Activist, But I Play One on the Internet | Truthiness in Digital Media

Here’s a Big Problem for practitioners of social listening to solve: what happens when the “people” responsible for Consumer-Generated Media aren’t actually people? Whether you’re taking a sample or analyzing in aggregate, the pool is contaminated.
Conde Nast to Provide Ad Metrics for Tablets
Forgive the Mashable link, but it gets the point across.
March Mathness

I didn’t have a chance to pick an NCAA bracket this year. I’m not too upset, as it means that my winning streak is intact (I won my office pool several years ago with a bracket titled “I actually hate Duke”). While they don’t account for the psychology of an office pool, I take a hard look at predictions from FiveThirtyEight’s Nate Silver and others before I complete my bracket.
This time of year is also exciting to mathematically-inclined sports fans because it means that the MIT Sloan School of Management hosts its Sports Analytics Conference.

6 Surprising Pizza Pie Charts
Happy π Day!

Predicting a story's popularity on Digg | Digg Topnews

I’ve been playing with this car metaphor for using social media data. I really need to give it its own post. But the idea boils down to using it as a rearview mirror (that is, backward-looking), as a dashboard and a dipstick (seeing how you’re doing right now), and as a windshield (seeing where you’re going). The first two are quite common; the third, much less so. But here’s a cool example.
Finding the Right Problem to Tackle: When Web Analytics Technologies Chase Problems - SemAngel
I recognize that the point of this post is measuring what matters in the digital space, but this section totally reads like the treatment for Moneyball 2. Somewhere Jonah Hill is getting ready for his second Oscar nomination.
Why Klout really matters: Money, money, money — GigaOm
I’m going to skip past the “be wary of any black box algorithm” rant.
I assume many companies are already taking similar approaches to using Twitter as a marketing campaign. Step 1 might be finding out how people feel about a particular product, show, etc., by analyzing the Twitter firehose. But Step 2 should be finding out which Twitter users are influential in that space and trying to make them happy. Or maybe part of Step 1 is weighting sentiment based on who expressed it — an influential voice coming out in support of or against something might be worth more than someone with relatively low influence in determining how something will play out.
I’m totally on board with ideation as Step 1. Take a look at how people are talking and make sure your campaign reflects that language or content. But successful influencer outreach veers in a different direction after that. Take a look at who generates the content that promotes the most engagement, be it retweets, @mentions, link clicks, or traffic to your site (it’s always nice when you have some other channel data). And then see what those authors’ attitudes are to your brand/product/service. If they’re a fan, target them to your heart’s content. If they’re not, you should either avoid them or use this as an opportunity to convert them into one.
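Here’s a rough sketch of what “weighting sentiment based on who expressed it” could look like in practice. Everything in it is an assumption on my part: the `Mention` record, the -1 to 1 sentiment scale, and the `influence` weight (which might come from retweets, link clicks, or whatever engagement data you have) are all hypothetical, not anything GigaOm or Klout describes.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    author: str
    sentiment: float  # hypothetical scale: -1.0 (negative) to 1.0 (positive)
    influence: float  # hypothetical non-negative weight, e.g. an engagement score


def weighted_sentiment(mentions):
    """Influence-weighted mean sentiment; 0.0 if there is no weight at all."""
    total_weight = sum(m.influence for m in mentions)
    if total_weight == 0:
        return 0.0
    return sum(m.sentiment * m.influence for m in mentions) / total_weight


# Made-up data: one influential critic and two low-influence fans.
mentions = [
    Mention("big_account", -1.0, 8.0),
    Mention("small_account_1", 1.0, 1.0),
    Mention("small_account_2", 1.0, 1.0),
]

unweighted = sum(m.sentiment for m in mentions) / len(mentions)
weighted = weighted_sentiment(mentions)

print(f"unweighted: {unweighted:+.2f}")
print(f"weighted:   {weighted:+.2f}")
```

With this toy data, a plain average comes out positive because the fans outnumber the critic, while the weighted average comes out negative because the critic carries most of the influence. How you define influence is the hard part, and this sketch doesn’t pretend to answer it.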
Vote for me: How data will change the 2012 elections — Cloud Computing News
I find it interesting how the tone here is so much more nonchalant than it was for the New York Times piece on Target a little while ago.
- The video associated with this post is great and well worth your 15 minutes. But two bullet points deserve to be shared in their entirety:
- 5. "Data quality sucks, just get over it."
- That is the title of my post from June 2006. And look how far we've come. : )
- The core thrust of my post was that data on the web will never get to 95% clean and it will have big holes and it will be sparse in some areas. We should aim to collect, process and store data as cleanly as humanly possible, but after that we should move on to using the data, because we will still have more data about the web than what God's blessed any other channel with. Let's not become the type of people who continue to waste time on quality beyond the point of diminishing returns. Let's not become persistent javascript hackers and sprop variable tweakers at the cost of delivering value from data now.
- Multiply all of that a million times when it comes to big data. We will have dirty data. We will have no idea what to do with videos or spoken text or (omg!) social media overload. We will be missing primary keys. We will suffer from a lack of clean meta data (or sometimes any meta data!). We will realize the shallow limits of sentiment analysis. We will cry from the pain of the painful business process fixes that usually result in good data.
- And yet, we are standing on a mountain of gold.
- Do the best you can in terms of collecting, processing, and storing data of the cleanest possible quality. Know when to shift to data analysis. Start making decisions. Make small ones at first. (Remember, even they will be revolutionary, as these datasets have never come together!) Make bigger ones over time, as you understand the limitations of what you are dealing with.
- Here's the kiss of death: Big data implementation projects where the first touch of an Analyst will come 18 months after the project was first conceived. You see, the world would have changed so dramatically in 18 months that nothing you possibly spec'ed for is relevant any more.
- Think smart. Move fast. Slowly become Godlike over time.
- 6. Eliminating noise is even more important than finding a signal.
- This might be a little controversial. But stay with me.
- Thus far in the history of data analysis the objective for our queries has been trying to find the signal amongst all the noise in the data. That has worked very well. We had clean business questions. The data size was smaller and the data set was more complete and we often knew what we were looking for. Known knowns and known unknowns. (See video above.)
- With big data, it is so much more important to be magnificent at knowing what to ignore. You must know how to separate out all the noise in the disparate huge datasets to even have a fighting chance to start to look for the signal.
- It is amazing but true. If you are not magnificent at knowing what to ignore, you'll never get a chance to pay attention to the stuff to which you should be paying attention.
- Your business savvy. Your analytical gut instinct. Tuning your algorithms to first ignore and then hunt for insights. That is what will have a material impact.
- http://www.kaushik.net/avinash/big-data-imperative-driving-big-action/
Wanted: Social, Mobile and Gaming Guru | News - Advertising Age
Sorry folks, I’m taken :-)