Big Data Sets you can use with R
by Joseph Rickert

The world may indeed be awash with data; however, it is not always easy to find a suitable data set when you need one. As the number of people becoming involved with R and data science increases, so does the need for interesting data sets for creating examples, showcasing machine learning algorithms, and developing statistical analyses.

The Revolution Analytics collection contains some of the data sets we use at Revolution to show off the Parallel External Memory Algorithms in our RevoScaleR package. The Airlines data set that was used in the 2009 American Statistical Association challenge has become the “iris” data set for big data.

> rxGetInfoXdf(working.file,getVarInfo=TRUE)
File name: C:\DATA\Airlines_87_08\BigAir3.xdf
Number of observations: 123534969
Number of variables: 31
Number of blocks: 833
Variable information:
Var 1: Year, Type: integer, Low/High: (1987, 2008)
Var 2: Month
       12 factor levels: January February March April May ...
