https://code.google.com/archive/p/google-refine/
Related: okosVisiNav VisiNav is a system to search and navigate web data, collected from a multitude of sources. In summary, the system demonstrates how to combine data from multiple sources into a single unified view, how to search and navigate the aggregated dataset, and how to re-use query results from web data in external applications. On a conceptual level, VisiNav deals with objects. Objects can have attributes and links to other objects. For example, there are objects of type Person. A Person has a name; a Person knows another Person; a Person makes Documents; a Person is based near a Location.
Chapter 1. Using Google Refine to Clean Messy Data Google Refine (the program formerly known as Freebase Gridworks) is described by its creators as a “power tool for working with messy data” but could very well be advertised as “remedy for eye fatigue, migraines, depression, and other symptoms of prolonged data-cleaning.” Even journalists with little database expertise should be using Refine to organize and analyze data; it doesn't require much more technical skill than clicking through a webpage. For skilled programmers, and journalists well-versed in Access and Excel, Refine can greatly reduce the time spent doing the most tedious part of data-management. Other reasons why you should try Google Refine: It’s free.It works in any browser and uses a point-and-click interface similar to Google Docs.Despite the Google moniker, it works offline.
Data Wrangler UPDATE: The Stanford/Berkeley Wrangler research project is complete, and the software is no longer actively supported. Instead, we have started a commercial venture, Trifacta. For the most recent version of the tool, see the free Trifacta Wrangler. Working with data in protovis For the past year or so I have been dabbling with protovis. I don’t have a heavy CS background but protovis is supposedly easy to pick up for people like me, who are vaguely aware that computers can make calculations but who need to check the manual for the most mundane programming instructions. I found was while it’s reasonnably easy to modify the most basic examples to make stuff happen, it is much harder to understand or adapt the more complex ones, let alone to create a fairly complex visualization.
Weka 3 - Data Mining with Open Source Machine Learning Software in Java Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization. Found only on the islands of New Zealand, the Weka is a flightless bird with an inquisitive nature. The name is pronounced like this, and the bird sounds like this.
Disco Hyperdata Browser The Disco - Hyperdata Browser is a simple browser for navigating the Semantic Web as an unbound set of data sources. The browser renders all information, that it can find on the Semantic Web about a specific resource, as an HTML page. This resource description contains hyperlinks that allow you to navigate between resources. While you move from resource to resource, the browser dynamically retrieves information by dereferencing HTTP URIs and by following rdfs:seeAlso links. News Chapter 2: Reading Data from Flash Sites Flash applications often disallow the direct copying of data from them. But we can instead use the raw data files sent to the web browser. Adobe Flash can make data difficult to extract.
Protovis Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction. Protovis is free and open-source, provided under the BSD License. It uses JavaScript and SVG for web-native visualizations; no plugin required (though you will need a modern web browser)!
Mike Bostock December 27, 2014Mapping Every Path to the N.F.L. Playoffs December 20, 2014How Each Team Can Make the N.F.L. PyNGL and PyNIO Introduction This is the place to start if you are new to PyNGL and PyNIO. PyNGL (pronounced "pingle") is a Python language module used to visualize scientific data, with an emphasis on high quality 2D visualizations. A working knowledge of Python is assumed. The Wolfram Education Portal Is Here! January 18, 2012 — Wolfram Blog Team Teachers, are you looking for a new way to integrate technology into your classroom? How about through a dynamic textbook or pre-generated lesson plans? Chapter 4: Scraping Data from HTML Web-scraping is essentially the task of finding out what input a website expects and understanding the format of its response. For example, Recovery.gov takes a user's zip code as input before returning a page showing federal stimulus contracts and grants in the area. This tutorial will teach you how to identify the inputs for a website and how to design a program that automatically sends requests and downloads the resulting web pages. Pfizer disclosed its doctor payments in March as part of a $2.3 billion settlement - the largest health care fraud settlement in U.S. history - of allegations that it illegally promoted its drugs for unapproved uses. Of the disclosing companies so far, Pfizer's disclosures are the most detailed and its site is well-designed for users looking up individual doctors. However, its doctor list is not downloadable, or easily aggregated.
Google Refine is a great tool for cleaning up (standardizing) large amounts of data at a time. by danielhall66 Jan 21