Here is a map of the 2010 census data with colored pixels representing the race of every person in the US. With over 300 million dots, it makes the racial divides that exist in most cities strikingly visible.
(via Flowing Data)
The European Journalism Centre and the Open Knowledge Foundation have joined forces to create the Data Journalism Handbook, a “free, open source reference book for anyone interested in the emerging field of data journalism.” The central tenet of the handbook is exploring “how data can be used to create deeper insights into what is happening around us and how it might affect us”.
Though the handbook is written for journalists, it provides useful advice for anyone who wants to understand data and effectively communicate results to others. This could include scientists, teachers, activists, or just anyone who wants to better understand the world.
It covers such important topics as:
As I’ve talked about before, we are entering the era of big data, with the internet providing access to a growing amount of data on a wide range of topics. There’s a lot to be learned from it all, if only we know how to look.
There has been a lot in the news recently about the US government’s monitoring of phone calls and internet data. The National Security Agency is collecting millions of phone records and tapping directly into the servers of some of the biggest internet companies to extract emails, chat logs, and a variety of other information, in what is likely the biggest surveillance program in history.
This program is one example of a worldwide 21st-century phenomenon: the creation, collection, and processing of massive amounts of data. The existence of all this data is a very new thing. In the wrong hands and with the wrong intentions it can be used to control and suppress. But the flip side of that coin is that in the right hands, big data can be used to expand our comprehension of the world and open up new frontiers of knowledge.
In his TED talk (called ‘The Beauty of Data Visualization’, found here), information designer David McCandless talks about a phrase that has arisen to describe this situation: ‘Data is the new oil’. There is a lot to this analogy. Data, like oil, can be used to power incredible machines of advancement and technological progress. But because it is associated with such power, it becomes a very valuable resource, and there are bound to be struggles over who collects and controls it. McCandless actually prefers a different phrase: ‘Data is the new soil’. He likes to think of this wealth of data as a fertile landscape from which beautiful and useful things emerge, if only they are cultivated in the right way.
As an example of the power of data to create a more balanced picture of the world, take a look at this TED video, in which Hans Rosling urges people to ‘let the data set change your mindset’ (Gapminder, his website, aims to do the same). In the video and on the website, Rosling attempts to eliminate outdated ideas of enormous discrepancies between the Western world and the rest of the world by showing just how far developing countries have come in the last fifty years. The use of data in this way is really about establishing a fact-based worldview – overcoming prejudice and replacing obsolete information.
This shows just a little bit of what is possible with good data. But where does this data come from, and how do you get it? Not everyone has the resources of the US government to collect whatever data they want. Or do they?
The mission of the Open Knowledge Foundation is to open up information around the world and ‘see it used and useful’. As part of the open government movement, the open data movement is largely about attempting to balance out the information flow between citizens and the state by pushing the government to open up more of its data to the public. There are a variety of resources that have been created for finding this information: among them Data.gov (US government) and datacatalogs.org (a list of data catalogs from around the world). I don’t know much about accessing and using this information, but it’s something I plan to learn more about.
As always, there are tenuous lines that must be walked between the usefulness of making data public and the necessity of keeping private lives private. I don’t think there are easy answers here. The boundaries between one side and the other are likely going to shift back and forth for quite some time.
Besides the question of when it is all right to collect data in the first place, the possession of this data opens up other questions. What do you do with it all? How do you process it efficiently? How do you visualize it effectively? Big data is a big topic – its collection, analysis, and visualization – and it’s something I hope to look into more.
To see how much creativity can go into simply taking a little bit of data and putting some life into it, and the enormous variety of ways to do this, check out these data visualization projects.