R and Data Mining

Summer 2010 — R: ggplot2 Intro

Intro

When it comes to producing graphics in R, there are basically three options for your average user. The first is base graphics. I've written up a pretty comprehensive description for use of base graphics here, and don't intend to extend beyond that. The other two options both make creating plots of multivariate data easier. The website for ggplot2 is here:

Basics

ggplot2 is meant to be an implementation of the Grammar of Graphics, hence gg-plot. Plots convey information through various aspects of their aesthetics:

    x position
    y position
    size of elements
    shape of elements
    color of elements

The elements in a plot are geometric shapes, like:

    points
    lines
    line segments
    bars
    text

Some of these geometries have their own particular aesthetics:

    points: point shape, point size
    lines: line type, line weight
    bars: y minimum, y maximum, fill color, outline color
    text: label value

The values represented in the plot are the product of various statistics.
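As a rough illustration of how aesthetics, geometries and statistics combine, here is a minimal sketch. It assumes the ggplot2 package is installed; the built-in mtcars data set and the choice of columns are only for illustration and do not come from the original write-up.

```r
library(ggplot2)

# Aesthetics: wt -> x position, mpg -> y position, cyl -> color.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  # Geometry: points, with the point size set to a constant.
  geom_point(size = 3) +
  # Statistic: a linear fit per color group, drawn as lines without a band.
  geom_smooth(method = "lm", se = FALSE)

print(p)
```

Each `+` adds one layer, so the plot is built up from a data-to-aesthetic mapping plus one or more geometries.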

Visualizing Tables with plot.table

The plot.table function in the Systematic Investor Toolbox is a flexible table drawing routine. plot.table has a simple interface and takes the following parameters:

    plot.matrix: matrix with the data you want to plot
    smain: text to draw in the (top, left) cell; the default value is a blank string
    highlight: either TRUE/FALSE to indicate whether to color each cell based on its numeric value, or a matrix with colors for each cell
    colorbar: TRUE/FALSE flag to indicate whether to draw a colorbar

Here are a few examples of how you can use the plot.table function to create summary reports.

First, let’s load Systematic Investor Toolbox:

To create basic plot.table:

To create plot.table with colorbar:

Next, I want to show a more practical example of the plot.table function. I will show more examples of plot.table in future posts. To view the complete source code for this example, please have a look at the plot.table.test() function in plot.table.r at github.
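The parameters described above might be used roughly as follows. This is only a sketch: the matrix contents and the "Metric" label are made up, and the call assumes the Systematic Investor Toolbox has already been loaded so that its plot.table function is available. The guard lets the snippet run without the toolbox.

```r
set.seed(1)

# Made-up numeric matrix with row and column names to use as table content.
temp <- matrix(round(rnorm(12, mean = 5, sd = 2), 1), nrow = 4,
               dimnames = list(paste("Row", 1:4), paste("Col", 1:3)))

# plot.table is assumed to come from the Systematic Investor Toolbox,
# which has to be loaded separately; otherwise just print the matrix.
if (exists("plot.table", mode = "function")) {
  plot.table(temp, smain = "Metric", highlight = TRUE, colorbar = TRUE)
} else {
  print(temp)
}
```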

Learning R

library(stringr)

[1] "1 Introduction"
[3] "Climate projections of the Intergovernmental Panel on Climate Change (IPCC) forecast a general increase of seasonal temperatures in the present century across the temperate zone, aggravated by decreasing amounts of summer rainfall in certain regions at lower latitudes (Christensen et al. 2007)."
[5] "In this study, we aim to (1) identify the limiting macroclimatic factors and to (2) predict the future boundaries of beech (Fagus sylvatica L.) and sessile oak (Quercus petraea (Mattuschka) Liebl.) forests in a region highly vulnerable to climatic extremes."
[7] "Beech and sessile oak forests of Hungary are to a large extent “trailing edge” populations (Hampe and Petit 2005), which should be preferably modelled using specific modelling strategies (Thuiller et al. 2008)."

extr1 <- unlist(str_extract_all(txt, pattern = "\\(.*?\\)"))
extr2 <- extr1[grep("[0-9]{4}", extr1)]
(str_extract(extr2, "[A-Z].*[0-9]"))

[1] "Christensen et al. 2007"
[2] "Fischlin et al. 2007"
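The same three-step idea (grab every parenthesised chunk, keep those containing a four-digit year, trim to the citation itself) can be sketched in base R without stringr. The sample sentences below are shortened, made-up stand-ins for the text vector, and `regmatches` is used here in place of `str_extract_all` and `str_extract`.

```r
# Shortened, made-up sample text standing in for the real document.
txt <- c(
  "Temperatures are projected to rise (Christensen et al. 2007).",
  "These are trailing edge populations (Hampe and Petit 2005), modelled with specific strategies (Thuiller et al. 2008).",
  "Beech (Fagus sylvatica L.) is sensitive to drought."
)

# Step 1: every parenthesised chunk; non-greedy so each "(...)" matches separately.
extr1 <- unlist(regmatches(txt, gregexpr("\\(.*?\\)", txt, perl = TRUE)))

# Step 2: keep only the chunks that contain a four-digit year.
extr2 <- grep("[0-9]{4}", extr1, value = TRUE)

# Step 3: trim from the first capital letter to the last digit,
# which drops the surrounding parentheses.
cites <- regmatches(extr2, regexpr("[A-Z].*[0-9]", extr2))
print(cites)
```

The species name in the third sentence is dropped in step 2 because it contains no year.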

R Reference Card

Polygon Overlay Analysis

Project Requirement: Polygon overlay operations determine the spatial coincidence (if any) of two polygon data layers, or between a polygon and a point layer, usually creating a new data layer in the process.

Three useful (and widely used) polygon overlay operations are:

    Intersection (logical AND): The common or shared area between two overlapping polygons.
    Union (logical OR): The combined areas of two possibly overlapping polygons.
    Point-in-Polygon (logical AND): Between a point and polygon layer, the subset of points located within the polygon boundary.

Here, we demonstrate overlay operations using a collection of point and polygon species range data sets collected in South America, and methods from the PBSmapping package.

1) What is the area of each Species Range?

Input Data / Format:

    Point File: Mammalian Species Sightings (ESRI Point Shape File) from NatureServe data set.
    Base Map: DIVA-GIS Global Administrative Boundaries.
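To make the area and point-in-polygon ideas concrete, here is a small base R sketch. It does not use PBSmapping: the area comes from the shoelace formula and the point test from ray casting, both written by hand. The rectangular "range" and the three "sightings" are invented coordinates, and the function names are hypothetical.

```r
# Shoelace formula: area of a simple polygon from its vertices in order.
poly_area <- function(x, y) {
  nxt <- c(seq_along(x)[-1], 1)          # index of the next vertex, wrapping around
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

# Ray casting: a point is inside if a horizontal ray to its right
# crosses the polygon boundary an odd number of times.
point_in_poly <- function(px, py, x, y) {
  prev <- c(length(x), seq_len(length(x) - 1))   # index of the previous vertex
  inside <- FALSE
  for (i in seq_along(x)) {
    j <- prev[i]
    if ((y[i] > py) != (y[j] > py)) {            # edge straddles the ray's height
      x_cross <- x[i] + (py - y[i]) * (x[j] - x[i]) / (y[j] - y[i])
      if (px < x_cross) inside <- !inside
    }
  }
  inside
}

# Invented data: one rectangular species range and three sightings.
range_x <- c(0, 4, 4, 0)
range_y <- c(0, 0, 3, 3)
sightings <- data.frame(x = c(1, 5, 2), y = c(1, 1, 2.5))

range_area <- poly_area(range_x, range_y)
sightings$in_range <- mapply(point_in_poly, sightings$x, sightings$y,
                             MoreArgs = list(x = range_x, y = range_y))
print(range_area)
print(sightings)
```

Real range data would need projected coordinates for the area to be meaningful, and a GIS package handles holes and multi-part polygons that this sketch ignores.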

developers:projects:gsoc2012:ropensci

Summary: Dynamic access and visualization of scientific data repositories

Description: rOpenSci is a collaborative effort to develop R-based tools for facilitating Open Science. Projects in rOpenSci fall into two categories: those for working with the scientific literature, and those for working directly with the databases. Visit the active development hub of each project on github, where you can see and download source code, see updates, and follow or join the developer discussions of issues. Most of the packages work through an API provided by the resource (database, paper archive) to access data and bring it within reach of R’s powerful manipulation. See a complete list of our R packages currently in development.

The student could choose to work on a package for a particular data repository of interest, or develop tools for visualization and exploration that could function across the existing packages.

Cookbook for R

Model visualisation. had.co.nz

This page lists my published software for model visualisation. This work forms the basis for the third chapter of my thesis.

classifly: Explore classification boundaries in high dimensions. Given p-dimensional training data containing d groups (the design space), a classification algorithm (classifier) predicts which group new data belongs to.

clusterfly: Explore clustering results in high dimensions. Typically, there is somewhat of a divide between statistics and visualisation software. There are also some custom methods for certain types of clustering, mostly inspired by the work of Dr Dianne Cook: Self organising maps (aka Kohonen neural networks).

meifly: Models explored interactively. Meifly is a tool that uses R and GGobi to explore ensembles of linear models, where we look at all possible main effects models for a given dataset (or a large subset of these models).

Installation

Please make sure you have a current version of R and rggobi installed, then use the following R code:
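The phrase "all possible main effects models" can be illustrated with a short base R sketch. This is not meifly's own interface: it simply enumerates every non-empty subset of a few predictors and fits one linear model per subset. The mtcars data set, the response and the three predictors are arbitrary choices for illustration.

```r
# Illustrative choice of response and predictors from the built-in mtcars data.
predictors <- c("wt", "hp", "disp")

# Every non-empty subset of the predictors (3 + 3 + 1 = 7 subsets).
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# One main-effects linear model per subset.
models <- lapply(subsets, function(vars) {
  lm(reformulate(vars, response = "mpg"), data = mtcars)
})

# Summarise the ensemble: which terms each model uses and how well it fits.
ensemble <- data.frame(
  terms     = vapply(subsets, paste, character(1), collapse = " + "),
  r_squared = vapply(models, function(m) summary(m)$r.squared, numeric(1))
)
print(ensemble)
```

With p predictors there are 2^p - 1 such models, which is why a large subset is sometimes explored instead of the full ensemble.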

Quick-R: Home Page

R Programming

Welcome to the R programming Wikibook

This book is designed to be a practical guide to the R programming language[1]. R is free software designed for statistical computing.

How can you share your R experience?

    Explain the syntax of a command.
    Compare the different ways of performing each task using R.
    Try to make unique examples based on fake data (i.e. simulated data sets).
    As with any Wikibook, please feel free to make corrections, expand explanations, and make additions where necessary.

Prerequisites

We assume that readers have a background in statistics. We also assume that readers are familiar with computers and that they know how to use software with a command-line interface.

See also

    Larry Wasserman's book All of Statistics[6]
    The Statistics and the Econometric Theory wikibooks.
    The Econometrics and Statistics pages on Wikipedia.