Digital Conversations

My last project of 2014 was easily my most enjoyable: an analysis of my digital conversations from the past decade.

First, let me praise Google and Facebook for making our data available to us. Their willingness to share control with users sets a great precedent for data portability. Google in particular, through its Data Liberation Front, is leading the web in this effort. Way to be.

For the project, I collected all of my digital conversations from Facebook, Google Voice, and Google Hangouts. These services automatically keep your data, from the first time you use them until the end of time (?). Having joined Facebook in high school, and having adopted Google Voice as my primary phone provider years ago, I had LOTS of data to play with. After normalizing all of it, I had over 100,000 datapoints available for parsing.
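Each service exports its data in a different shape, so normalizing means mapping every record onto one small common format. Here is a minimal Python sketch of that step, assuming hypothetical field names (`author`, `sent_at`, `sent_at_us`, and so on) and timestamp encodings for each export. The real Facebook and Google export files look different, and this is not the project's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Message:
    """One normalized datapoint: which service, who sent it, when, and what it said."""
    source: str
    sender: str
    time: datetime  # always timezone-aware, in UTC
    content: str


def _to_utc(dt: datetime) -> datetime:
    # Assumption: a timestamp with no offset is already UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


# Each adapter maps one export's (hypothetical) record layout onto Message.
def from_facebook(raw: dict) -> Message:
    # Hypothetical: ISO 8601 timestamp string
    when = _to_utc(datetime.fromisoformat(raw["timestamp"]))
    return Message("facebook", raw["author"], when, raw["body"])


def from_voice(raw: dict) -> Message:
    # Hypothetical: Unix epoch seconds
    when = datetime.fromtimestamp(int(raw["sent_at"]), tz=timezone.utc)
    return Message("voice", raw["from"], when, raw["text"])


def from_hangouts(raw: dict) -> Message:
    # Hypothetical: Unix epoch microseconds
    when = datetime.fromtimestamp(int(raw["sent_at_us"]) / 1_000_000, tz=timezone.utc)
    return Message("hangouts", raw["sender"], when, raw["text"])


def normalize(facebook: Iterable[dict], voice: Iterable[dict],
              hangouts: Iterable[dict]) -> List[Message]:
    """Merge all sources into one chronologically sorted list, dropping blank messages."""
    messages = (
        [from_facebook(r) for r in facebook]
        + [from_voice(r) for r in voice]
        + [from_hangouts(r) for r in hangouts]
    )
    messages = [m for m in messages if m.content.strip()]
    return sorted(messages, key=lambda m: m.time)
```

Putting every service on a single UTC timeline is what makes cross-service graphs possible; treating offset-less timestamps as UTC is itself an assumption worth checking against each real export.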

≫ I still have 2GB worth of Gmail data to parse. Once it's parsed, it should easily eclipse the rest. But that's a project for a rainy (or snowy) day.

With the data normalized, the fun could begin. Though each datapoint was small, containing just the sender, time, and content, there was still a lot to play with. Analyzing, organizing, and filtering the data produced some really gorgeous graphs. One of the coolest analyses, "tone", used sentiment data from natural language processing (NLP) to compare the participants of each conversation.
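The NLP tooling behind "tone" isn't named here, so the sketch below swaps in a deliberately tiny word-list scorer (the `POSITIVE` and `NEGATIVE` sets, and the helper names, are hypothetical) just to show the shape of a tone comparison: score each message, average the scores per participant, then compare participants.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Toy lexicon: a stand-in for a real NLP sentiment model.
POSITIVE = {"love", "great", "awesome", "happy", "thanks", "fun"}
NEGATIVE = {"hate", "awful", "sad", "angry", "sorry", "ugh"}


def score(text: str) -> float:
    """Return (pos - neg) / (pos + neg) in [-1, 1], or 0.0 if no lexicon words appear."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def tone_by_sender(messages: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Average message score per sender, from (sender, content) pairs."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for sender, content in messages:
        scores[sender].append(score(content))
    return {sender: mean(vals) for sender, vals in scores.items()}


def tone_gap(tones: Dict[str, float], a: str, b: str) -> float:
    """Positive when participant a is, on average, more upbeat than participant b."""
    return tones[a] - tones[b]
```

A real sentiment model would replace `score`; the per-participant aggregation and comparison would stay the same.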

Once I had the data graphed, I was blown away by the results. Being able to rewind years with a mouse hover was breathtaking. Memories I had long forgotten were suddenly back at the front of my mind.

That power to reminisce was inspiring, but it was also embarrassing. High school drama, drunken texts, and other conversations should be lived once, and only once. To protect my own ego, I manually pruned the data you see in the final product. It's crazy to me that in the Age of Big Data, this sort of record exists for nearly everyone. In the near future, these same datasets could easily and instantly destroy careers, end relationships, or upend political races.

Though some of the data was uncomfortable, the entire experience was beautiful. Getting to explore my past, analyze the data, and play with [canvasjs] was a blast. I'd like to add more analyses, include Gmail, and release the source code to the public soon. Stay tuned!