A Community Site for R – Sponsored by Revolution Analytics

The Workspace
The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, functions). At the end of an R session, the user can save an image of the current workspace that is automatically reloaded the next time R is started. Commands are entered interactively at the R user prompt. Up and down arrow keys scroll through your command history. You will probably want to keep different projects in different physical directories.
IMPORTANT NOTE FOR WINDOWS USERS: R gets confused if you use a path in your code like
    c:\mydocuments\myfile.txt
This is because R sees "\" as an escape character.
    getwd()                     # print the current working directory - cwd
    ls()                        # list the objects in the current workspace
    setwd(mydirectory)          # change to mydirectory
    setwd("c:/docs/mydir")      # note / instead of \ in windows
    setwd("/usr/rob/mydir")     # on linux
    # save your command history
    savehistory(file="myfile")  # default is ".Rhistory"
    q()                         # quit R
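The Windows path note above can be illustrated with a short sketch. The path below is a hypothetical placeholder taken from the note, not a file that needs to exist; the point is only how R reads the string.

```r
# Inside an R string a single backslash starts an escape sequence,
# so a literal backslash has to be doubled.
win_path_escaped <- "c:\\mydocuments\\myfile.txt"   # hypothetical path
win_path_forward <- "c:/mydocuments/myfile.txt"     # same path with forward slashes

# file.path() joins its arguments with "/" on every platform.
built_path <- file.path("c:", "mydocuments", "myfile.txt")

# Converting an escaped Windows path to the forward-slash form.
converted <- gsub("\\\\", "/", win_path_escaped)

print(built_path)
print(converted)
```

Building paths with file.path() avoids typing separators by hand, which sidesteps the escaping question entirely.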

Built-in Functions
Almost everything in R is done through functions. Here I'm only referring to numeric and character functions that are commonly used in creating or recoding variables.
Numeric Functions
Character Functions
Statistical Probability Functions
The following table describes functions related to probability distributions.
Other Statistical Functions
Other useful statistical functions are provided in the following table.
Other Useful Functions
Note that while the examples on this page apply functions to individual variables, many can be applied to vectors and matrices as well.
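As a rough illustration of the closing note, here is a small sketch applying a few base R functions from each category to vectors and a matrix. The particular functions and values are chosen for illustration only and are not taken from the tables the page refers to.

```r
# Numeric functions work element by element on a vector.
x <- c(1.234, 4, 9.876)
roots   <- sqrt(x)
rounded <- round(x, 1)

# Character functions are vectorized in the same way.
words   <- c("alpha", "beta")
upper   <- toupper(words)
prefix  <- substr(words, 1, 3)
lengths <- nchar(words)

# Probability functions: cumulative probability and quantile of the
# standard normal distribution.
p_half <- pnorm(0)
q_975  <- qnorm(0.975)

# A matrix goes in, a matrix of the same shape comes out.
m     <- matrix(1:4, nrow = 2)
log_m <- log(m)

print(rounded)
print(upper)
print(log_m)
```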

Impatient R
Translations: français: Translated by Kate Bondareva. Serbo-Croatian: Translated by Jovana Milutinovich from Geeks Education.
Preface
This is a tutorial (previously known as “Some hints for the R beginner”) for beginning to learn the R programming language. You are probably impatient to learn R; most people are. This page has several sections, which fall into four categories: General, Objects, Actions, Help.
General: Introduction; Blank screen syndrome; Misconceptions because of a previous language; Helpful computer environments; R vocabulary; Epilogue
Objects: Key objects; Reading data into R; Seeing objects; Saving objects; Magic functions, magic objects; Some file types; Packages
Actions: What happens at R startup; Key actions; Errors and such; Graphics; Vectorization; Make mistakes on purpose
Help
Introduction
I asked R users what their biggest stumbling blocks were in learning R.
    > search()

An Introduction to R
This is an introduction to R (“GNU S”), a language and environment for statistical computing and graphics. R is similar to the award-winning S system, which was developed at Bell Laboratories by John Chambers et al. This manual provides information on data types, programming elements, statistical modelling and graphics. This manual is for R, version 3.1.0 (2014-04-10). Copyright © 1990 W. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Preface
This introduction to R is derived from an original set of notes describing the S and S-PLUS environments written in 1990–2 by Bill Venables and David M. We would like to extend warm thanks to Bill Venables (and David Smith) for granting permission to distribute this modified version of the notes in this way, and for being a supporter of R from way back. Comments and corrections are always welcome.
1.1 The R environment

"R" you ready? | My advances in R – a learner’s diary The Endeavour | John D. Cook I help people make decisions in the face of uncertainty. Sounds interesting. I’m a data scientist. I study machine learning. I’m into big data. Even though each of these descriptions makes a different impression, they’re all essentially the same thing. There are distinctions. “Decision-making under uncertainty” emphasizes that you never have complete data, and yet you need to make decisions anyway. “Data science” stresses that there is more to the process of making inferences than what falls under the traditional heading of “statistics.” Despite the hype around the term data science, it’s growing on me. Machine learning, like decision theory, emphasizes the ultimate goal of doing something with data rather than creating an accurate model of the process that generates the data. “Big data” is a big can of worms. The term “statistics” literally means the mathematics of the interests of states, as in governments, because these were the first applications of statistics.

R FAQ: How does R handle missing values?
Version info: Code for this page was tested in R Under development (unstable) (2012-02-22 r58461) On: 2012-03-28 With: knitr 0.4
Like other statistical software packages, R is capable of handling missing values. However, to those accustomed to working with missing values in other packages, the way in which R handles missing values may require a shift in thinking.
Very basics
Missing data in R appears as NA.
    x1 <- c(1, 4, 3, NA, 7)
    x2 <- c("a", "B", NA, "NA")
NA is one of the few non-numbers that we could include in x1 without generating an error (the other exceptions are letters representing numbers or numeric ideas like infinity).
    is.na(x1)
    is.na(x2)
We can see that R distinguishes between the NA and "NA" in x2: NA is seen as a missing value, "NA" is not.
Differences from other packages
NA cannot be used in comparisons: In other packages, a "missing" value is assigned an extreme numeric value, either very high or very low.
    x1 < 0
    x1 == NA
    mean(x1)
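The comparison behaviour described above can be checked directly. This sketch reuses the x1 and x2 vectors from the excerpt; the variable names holding the results are added here for illustration.

```r
x1 <- c(1, 4, 3, NA, 7)
x2 <- c("a", "B", NA, "NA")

# Comparing with NA never yields TRUE or FALSE, only NA.
eq_na <- x1 == NA

# is.na() is the reliable way to find missing values.
missing_x1 <- is.na(x1)
missing_x2 <- is.na(x2)        # the string "NA" is not missing

# Summaries propagate NA unless told to drop missing values.
mean_with_na    <- mean(x1)
mean_without_na <- mean(x1, na.rm = TRUE)

# Count and remove missing values.
n_missing <- sum(is.na(x1))
complete  <- x1[!is.na(x1)]

print(eq_na)
print(missing_x2)
print(mean_without_na)
```

Passing na.rm = TRUE is usually preferable to filtering by hand, since it leaves the original vector untouched.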

The R programming language for programmers coming from other programming languages
Contents: Introduction, Assignment and underscore, Variable name gotchas, Vectors, Sequences, Types, Boolean operators, Lists, Matrices, Missing values and NaNs, Comments, Functions, Scope, Misc., Other resources
Introduction
I have written software professionally in perhaps a dozen programming languages, and the hardest language for me to learn has been R. R is more than a programming language. This document is a work in progress.
Assignment and underscore
The assignment operator in R is <- as in e <- m*c^2. It is also possible, though uncommon, to reverse the arrow and put the receiving variable on the right, as in m*c^2 -> e. It is sometimes possible to use = for assignment, though I don't understand when this is and is not allowed. However, when supplying default function arguments or calling functions with named arguments, you must use the = operator and cannot use the arrow. At some time in the past R, or its ancestor S, used underscore as assignment.
Vectors
Sequences
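A small sketch of the assignment forms described above. The function f and the numeric values are hypothetical, chosen only to make the behaviour visible; in particular, the last call shows what happens if the arrow is used where a named argument was intended.

```r
m <- 2
c_light <- 3                    # illustrative value, named to avoid masking c()

e1 <- m * c_light^2             # usual left arrow
m * c_light^2 -> e2             # right arrow, receiving variable on the right
e3 = m * c_light^2              # = also works for top-level assignment

# Default and named arguments use =.
f <- function(x, power = 2) x^power
named_result <- f(3, power = 3)

# With an arrow inside the call, R assigns a variable called `power`
# in the calling environment and passes its value as the FIRST argument,
# so the function still uses its default power of 2.
arrow_result <- f(power <- 3)

print(c(e1, e2, e3))
print(named_result)
print(arrow_result)
```

The last call does not raise an error, which is why the mistake is easy to miss.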

Multiple Regression
R provides comprehensive support for multiple linear regression. The topics below are provided in order of increasing complexity.
Fitting the Model
    # Multiple Linear Regression Example
    fit <- lm(y ~ x1 + x2 + x3, data=mydata)
    summary(fit)                # show results
    # Other useful functions
    coefficients(fit)           # model coefficients
    confint(fit, level=0.95)    # CIs for model parameters
    fitted(fit)                 # predicted values
    residuals(fit)              # residuals
    anova(fit)                  # anova table
    vcov(fit)                   # covariance matrix for model parameters
    influence(fit)              # regression diagnostics
Diagnostic Plots
Diagnostic plots provide checks for heteroscedasticity, normality, and influential observations.
    # diagnostic plots
    layout(matrix(c(1,2,3,4),2,2))  # optional 4 graphs/page
    plot(fit)
For a more comprehensive evaluation of model fit see regression diagnostics.
Comparing Models
You can compare nested models with the anova( ) function.
Cross Validation
You can assess R² shrinkage via K-fold cross-validation.
Variable Selection
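The excerpt uses placeholder names (y, x1, x2, x3, mydata). As a runnable sketch, the same calls can be tried on the built-in mtcars data set; the choice of mpg as the response and wt, hp and qsec as predictors is an assumption made here purely for illustration.

```r
# Full model: three predictors (illustrative choice of variables).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
print(summary(fit))

coefs <- coefficients(fit)            # intercept plus three slopes
cis   <- confint(fit, level = 0.95)   # one row per coefficient
res   <- residuals(fit)               # one residual per observation

# Nested model: drop qsec, then compare the two fits with anova().
fit_small  <- lm(mpg ~ wt + hp, data = mtcars)
comparison <- anova(fit_small, fit)
print(comparison)
```

The anova() comparison is only meaningful when one model's terms are a subset of the other's, as they are here.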

Data Sorcery with Clojure
Graphical Parameters
You can customize many features of your graphs (fonts, colors, axes, titles) through graphic options. One way is to specify these options through the par( ) function. If you set parameter values here, the changes will be in effect for the rest of the session or until you change them again. The format is par(optionname=value, optionname=value, ...)
    # Set a graphical parameter using par()
    par()                           # view current settings
    opar <- par(no.readonly=TRUE)   # copy the settable settings (a plain par() copy triggers warnings on restore)
    par(col.lab="red")              # red x and y labels
    hist(mtcars$mpg)                # create a plot with these new settings
    par(opar)                       # restore original settings
A second way to specify graphical parameters is by providing the optionname=value pairs directly to a high level plotting function.
    # Set a graphical parameter within the plotting function
    hist(mtcars$mpg, col.lab="red")
See the help for a specific high level plotting function (e.g. plot, hist, boxplot) to determine which graphical parameters can be set this way.
Text and Symbol Size
Lines
Colors
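A related idiom, sketched here as an alternative rather than as the page's own method: when par() sets a parameter it returns the previous value, so that return value can be kept and passed back to restore just what was changed. The null PDF device is an assumption added so the sketch runs without opening a window or writing a file.

```r
pdf(NULL)                        # off-screen device; no file is written

old <- par(col.lab = "red")      # par() returns the previous value(s)
lab_during <- par("col.lab")     # "red" while the setting is active
hist(mtcars$mpg)                 # plot drawn with red axis labels

par(old)                         # restore only the parameter that was changed
lab_after <- par("col.lab")

invisible(dev.off())             # close the off-screen device

print(lab_during)
print(lab_after)
```

Inside a function, the same restore step is often registered with on.exit(par(old)) so it runs even if plotting fails.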