Thursday, April 5, 2012

I’m surprised that neither Stephen Wolfram nor Nick Felton has yet tackled the “change in my pocket” analysis.

What’s a Pound of Change Worth?

Friday, March 30, 2012

I’ve talked before about issues with using social media data to predict election outcomes. Yesterday Mashable’s 78th infographic of the day looked at a new wrench in the gears, spam:

The same techniques used by social spammers advertising free iPads and Viagra are now being used to spread bogus political messages across social media, blogs and news sites.

Yet another instance that should bring home the point that social media mention volume isn’t everything. A low-noise data source (likely coming from a tool with robust filtering capabilities) is the only way to reduce the impact of this nonsense on data quality.

Wednesday, March 28, 2012

This is the blog post-equivalent of arriving on the platform just as the 3 train arrives. Two items on NYC Subway data came on my radar this morning:

  • The infographic above uses ridership data to reveal the busiest and calmest stations.
  • The MTA is going to open up real-time data on trains to developers. I have an app on my phone that does some whiz-bang things with augmented reality and planning trips, but it has a kludgy system for notifying me of delays and can’t tell me when the next train will actually arrive.
Saturday, March 17, 2012
To that end, if the ongoing competition in computer security between those uncovering vulnerabilities and those patching vulnerabilities is any indication, these bots might be the initial glimmerings of a larger emerging competition between “truth black hats” — discovering and leveraging social exploits in groups online and “truth white hats” — developing the active infrastructure to “patch” these cognitive weaknesses in the same communities. Whether the strategic advantage in this space of “social security” (sorry) accrues to the astroturfer or to those attempting to block those efforts remains to be seen.

I’m Not a Real Activist, But I Play One on the Internet | Truthiness in Digital Media

Here’s a Big Problem for practitioners of social listening to solve: what happens when the “people” responsible for Consumer-Generated Media aren’t actually people? Whether you’re taking a sample or analyzing in aggregate, the pool is contaminated.

Friday, March 16, 2012 Thursday, March 15, 2012

I didn’t have a chance to pick an NCAA bracket this year. I’m not too upset, as it means that my winning streak is intact (I won my office pool several years ago with a bracket titled “I actually hate Duke”). While they don’t account for the psychology of an office pool, I take a hard look at predictions from FiveThirtyEight’s Nate Silver and others before I complete my bracket.

This time of year is also exciting to mathematically-inclined sports fans because it means that MIT’s Sloan School of Management hosts its Sports Analytics Conference.

Wednesday, March 14, 2012
Numbers can tell us a lot about technology, but only if we know them. Here are a few we don’t.

The Numbers We Don’t Know

Tuesday, March 13, 2012
So if I were a GM, there are a number of data analysis projects that I think would be far more important than measuring fielding efficiency. I’d probably rather have a model to optimize farm system progression or predict deterioration curves for aging veterans. I’m willing to bet that with a combination of lifestyle, demographic, mechanical, and psychographic data, I could build a pretty good model of age deterioration that would significantly out-perform most GMs’ mental math. Double that for farm system progression, where I suspect many organizations are markedly inefficient. An analysis that tackled either of these issues would, I’m guessing, be far, far more impactful than fielding efficiency for the organization. These problems might not yield sexy visualizations but they would yield true competitive advantage.

Finding the Right Problem to Tackle: When Web Analytics Technologies Chase Problems - SemAngel

I recognize that the point of this post is measuring what matters in the digital space, but this section totally reads like the treatment for Moneyball 2. Somewhere Jonah Hill is getting ready for his second Oscar nomination.

Monday, March 12, 2012
So for 2012, we can expect campaigns to make use of aggregated structured data from their web sites, apps, records of volunteers canvassing and other traditional collection methods. They will also be collecting and analyzing unstructured data from interviews conducted with voters, social media and other sources to get a sense of how the public feels about issues. At the same time, they will try to get a more complete picture of the voter by merging offline and online identities.

Vote for me: How data will change the 2012 elections — Cloud Computing News

I find it interesting how the tone here is so much more nonchalant than it was for the New York Times piece on Target a little while ago.

The video associated with this post is great and well worth your 15 minutes. But two bullet points deserve to be shared in their entirety:

  • 5. "Data quality sucks, just get over it."
  • That is the title of my post from June 2006. And look how far we've come. : )
  • The core thrust of my post was that data on the web will never get to 95% clean and it will have big holes and it will be sparse in some areas. We should aim to collect, process and store data as cleanly as humanly possible, but after that we should move on to using the data, because we will still have more data about the web than what God's blessed any other channel with. Let's not become the type of people who continue to waste time on quality beyond the point of diminishing returns. Let's not become persistent javascript hackers and sprop variable tweakers at the cost of delivering value from data now.
  • Multiply all of that a million times when it comes to big data. We will have dirty data. We will have no idea what to do with videos or spoken text or (omg!) social media overload. We will be missing primary keys. We will suffer from a lack of clean meta data (or sometimes any meta data!). We will realize the shallow limits of sentiment analysis. We will cry from the pain of the painful business process fixes that usually result in good data.
  • And yet, we are standing on a mountain of gold.
  • Do the best you can in terms of collecting, processing, and storing data of the cleanest possible quality. Know when to shift to data analysis. Start making decisions. Make small ones at first. (Remember, even they will be revolutionary, as these datasets have never come together!) Make bigger ones over time, as you understand the limitations of what you are dealing with.
  • Here's the kiss of death: Big data implementation projects where the first touch of an Analyst will come 18 months after the project was first conceived. You see, the world would have changed so dramatically in 18 months that nothing you possibly spec'ed for is relevant any more.
  • Think smart. Move fast. Slowly become Godlike over time.
  • 6. Eliminating noise is even more important than finding a signal.
  • This might be a little controversial. But stay with me.
  • Thus far in the history of data analysis the objective for our queries has been trying to find the signal amongst all the noise in the data. That has worked very well. We had clean business questions. The data size was smaller and the data set was more complete and we often knew what we were looking for. Known knowns and known unknowns. (See video above.)
  • With big data, it is so much more important to be magnificent at knowing what to ignore. You must know how to separate out all the noise in the disparate huge datasets to even have a fighting chance to start to look for the signal.
  • It is amazing but true. If you are not magnificent at knowing what to ignore, you'll never get a chance to pay attention to the stuff to which you should be paying attention.
  • Your business savvy. Your analytical gut instinct. Tuning your algorithms to first ignore and then hunt for insights. That is what will have a material impact.
Sunday, March 11, 2012
For one thing, digital skills are no longer a plus but expected. Mobile and social media are the two areas most in demand. Midlevel analytics jobs are a sweet spot, and media agencies need strategists focused on loyalty and content marketing.

Wanted: Social, Mobile and Gaming Guru | News - Advertising Age

Sorry folks, I’m taken :-)