How to be a data journalist

Data journalism is huge. I don't mean 'huge' as in fashionable - although it has become that in recent months - but 'huge' as in 'incomprehensibly enormous'. It represents the convergence of a number of fields which are significant in their own right - from investigative research and statistics to design and programming. The idea of combining those skills to tell important stories is powerful - but also intimidating. Who can do all that? The reality is that almost no one is doing all of that, but there are enough different parts of the puzzle for people to easily get involved in, and go from there. 1. 'Finding data' can involve anything from having expert knowledge and contacts to being able to use computer-assisted reporting skills or, for some, specific technical skills such as MySQL or Python to gather the data for you. 2. 3. 4. Tools such as ManyEyes for visualisation, and Yahoo! How to begin? So where does a budding data journalist start? Play around. And you know what?

http://www.theguardian.com/news/datablog/2010/oct/01/data-journalism-how-to-guide

Data journalism training – some reflections I recently spent 2 days teaching the basics of data journalism to trainee journalists on a broadsheet newspaper. It’s a pretty intensive course that follows a path I’ve explored here previously – from finding data and interrogating it to visualizing it and mashing – and I wanted to record the results. My approach was both practical and conceptual.

Big Data Technology Evaluation Checklist Anyone who’s been following the rapid-fire technology developments in the world that is becoming known as “big data” sees a new capability, product, or company founded literally every week. The ambition of all of these players, established and newcomer, is tremendous, because the potential value to business is enormous. Each new arrival is aimed at addressing the pain that enterprises are experiencing around unrelenting growth in the velocity, volume, and variety of the data their operations generate. What’s being lost, however, in some of this frothy marketing activity, is that it’s still early for big data technologies. There are vexing problems slowing the growth and the practical implementation of big data technologies. For the technologies to succeed at scale, there are several fundamental capabilities they should contain, including stream processing, parallelization, indexing, data evaluation environments and visualization.

Data science We’ve all heard it: according to Hal Varian, statistics is the next sexy job. Five years ago, in What is Web 2.0, Tim O’Reilly said that “data is the next Intel Inside.” But what does that statement mean? Why do we suddenly care about statistics and about data? In this post, I examine the many sides of data science — the technologies, the companies and the unique skill sets.

Voices: News organizations must become hubs of trusted data in a market seeking (and valuing) trust Editor’s Note: American readers may know Geoff McGhee for his video project Journalism in the Age of Data, released to acclaim last fall. Here he teams up with two European colleagues — Mirko Lorenz, a German information architect and journalist, and Nicolas Kayser-Bril, head data journalist at OWNI in France — to argue that news organizations should restructure themselves as data generators, gatherers, and analyzers. They believe that selling trusted data should be the foundation of journalism’s new business model. Give their argument a look. Journalists and media companies in general have had to answer a fundamental question ever since their traditional business model collapsed: What are we?

How to: get to grips with data journalism [Image: a graph showing the number of IEDs cleared from the Afghanistan War Logs] Only a couple of years ago, the idea that journalists would need to know how to use a spreadsheet would have been laughed out of the newsroom. Now those benighted days are way behind us and extracting stories out of data is part of every journalist's toolkit of skills. Some people say the answer is to become a sort of super hacker, write code and immerse yourself in SQL.

Growing Your Own Data Scientists CIOs and CTOs must learn to address a challenge, involving the divide between the people who know about the vast amount of new sources of data emanating from machines and other devices (“big data”) and the questions in the enterprise whose answers can be monetized. One group of people knows about the technology for analyzing data (they’re usually in IT). The other group understands the pernicious questions that would lead to an answer that is worth money to an organization (they’re usually on the business side). The role of the data scientist is a hybrid role that can solve this problem. While the definition of the role is compelling, it’s a lot easier to define the role than it is to hire someone to fill it, and even when you do, communication problems may persist. See these articles on Forbes.com for definitions of a data scientist from leading experts in the field:

Narcolepsy - Introduction Description An in-depth report on the causes, diagnosis, and treatment of narcolepsy. Highlights Overview All people with narcolepsy experience excessive sleepiness during the day.

The growing importance of data journalism One of the themes from News Foo that continues to resonate with me is the importance of data journalism. That skillset has received renewed attention this winter after Tim Berners-Lee called analyzing data the future of journalism. When you look at data journalism and the big picture, as USA Today’s Anthony DeBarros did at his blog in November, it’s clear the recent suite of technologies is part of a continuum of technologically enhanced storytelling that traces back to computer-assisted reporting (CAR). As DeBarros pointed out, the message of CAR “was about finding stories and using simple tools to do it: spreadsheets, databases, maps, stats,” like Microsoft Access, Excel, SPSS, and SQL Server. That’s just as true today, even if data journalists now have powerful new tools: ScraperWiki and Needlebase for scraping data from the web, and Perl, Ruby, Python, MySQL and Django for scripting and storing it.
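
To make the "simple tools" point concrete, here is a minimal Python sketch of the CAR pattern described above: load a small table, put it in a database, and ask one question of it. Everything in it is an assumption made for illustration: the council names, the spending figures, and the helper names `load_rows` and `spending_change` are hypothetical and do not come from any of the articles. It uses only the standard-library `csv` and `sqlite3` modules.

```python
import csv
import io
import sqlite3

# Hypothetical sample data, invented for this sketch; a real story would
# start from a published dataset instead.
RAW_CSV = """council,year,spending_gbp
Northtown,2009,1200000
Northtown,2010,1500000
Southby,2009,900000
Southby,2010,850000
"""


def load_rows(text):
    """Parse CSV text into (council, year, amount) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(r["council"], int(r["year"]), int(r["spending_gbp"])) for r in reader]


def spending_change(rows):
    """Return (council, change) pairs for 2009 to 2010, largest rise first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spending (council TEXT, year INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)
    # Self-join the table so each council's two years sit on one row.
    query = """
        SELECT a.council, b.amount - a.amount AS diff
        FROM spending AS a
        JOIN spending AS b ON a.council = b.council
        WHERE a.year = 2009 AND b.year = 2010
        ORDER BY diff DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for council, diff in spending_change(load_rows(RAW_CSV)):
        print(f"{council}: {diff:+d}")
```

The same question could be answered in a spreadsheet; the database version is shown only because it scales to files too large to eyeball.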

22 free tools for data visualization and analysis You may not think you've got much in common with an investigative journalist or an academic medical researcher. But if you're trying to extract useful information from an ever-increasing inflow of data, you'll likely find visualization useful -- whether it's to show patterns or trends with graphics instead of mountains of text, or to try to explain complex issues to a nontechnical audience. There are many tools around to help turn data into graphics, but they can carry hefty price tags. The cost can make sense for professionals whose primary job is to find meaning in mountains of information, but you might not be able to justify such an expense if you or your users only need a graphics application from time to time, or if your budget for new tools is somewhat limited. If one of the higher-priced options is out of your reach, there are a surprising number of highly robust tools for data visualization and analysis that are available at no charge.
