One of the core tenets of data science is transparency: data collection methods should be open to scrutiny, and the results of data collection should be shared with the general public.
In 2009, President Obama appointed Vivek Kundra as the United States’ first Federal Chief Information Officer. The position was established to oversee the creation of a new website, data.gov. The site currently hosts 186,000 data sets, ranging from crime reporting to college affordability statistics. Most are generated by the federal government, although state and local government agencies are starting to post their data as well. Data.gov is developed on GitHub and powered by the open-source applications CKAN and WordPress, and it represents a new level of governmental transparency in the world of data science.
Data.gov says, “Open government data powers software applications that help people make informed decisions – from choosing financial aid options for college to finding the safest consumer products and vehicles.”
Although the federal government has yet to pass any binding open-data laws, the 2013 Federal Open Data Policy provides recommendations for how information should be shared on data.gov, and it stipulates that newly generated government data should be available in open, machine-readable formats. The uploaded data is open to any person or organization, and data.gov is even transparent about which corporations are using its data sets: Foursquare is listed as using data from the U.S. Global Positioning System and the U.S. Census TIGER database, and LinkedIn as using data from the Departments of Labor and Education.
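Machine-readable formats matter because they let anyone with a few lines of code, not just a large engineering team, search and reuse the catalog. The sketch below illustrates the idea under some assumptions: that data.gov's catalog exposes the standard CKAN `package_search` action at the URL shown, and that responses follow the usual CKAN shape (`result.results`, each with a `title` and a list of `resources`). The helper names `build_search_url`, `summarize`, and `search` are hypothetical and exist only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the data.gov catalog serves the standard CKAN Action API here.
CATALOG_API = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Build a CKAN package_search URL for a free-text query."""
    return CATALOG_API + "?" + urlencode({"q": query, "rows": rows})


def summarize(payload):
    """Return (title, sorted unique formats) pairs from a package_search response."""
    results = payload.get("result", {}).get("results", [])
    summary = []
    for dataset in results:
        # Collect the distinct, non-empty file formats offered for this data set.
        formats = sorted(
            {res.get("format") for res in dataset.get("resources", []) if res.get("format")}
        )
        summary.append((dataset.get("title", ""), formats))
    return summary


def search(query, rows=5):
    """Query the catalog over the network and summarize the matching data sets."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        return summarize(json.load(response))
```

Calling `search("college affordability")` would, if the endpoint behaves as assumed, return a short list of data set titles alongside the formats (such as CSV or JSON) in which each is published. Keeping `summarize` separate from the network call means the parsing logic can be checked against a saved response without contacting the server.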
Because for-profit corporations already have the resources to extract information from data.gov, it becomes increasingly important that private citizens and non-profit organizations also take advantage of the United States Government’s newfound commitment to openly available data. Luckily, a few civically minded organizations have begun to use data.gov as well. For example, the College Affordability and Transparency Center, which provides tools for comparing the cost of college tuition, uses data.gov’s “National Center for Education Statistics, Integrated Postsecondary Education Data System” data set to help citizens make informed decisions about the costs and benefits of higher education.
Data.gov is a huge step forward in data transparency, but available data sets are only half of the answer to the federal government’s desire to “help people make informed decisions.” Large corporations already have the resources to interpret data, so increased transparency must be matched by greater opportunity for average citizens and non-profit organizations to use, understand, and interpret the data being presented.