Memory
One of the most vexing issues in R is memory. For anyone who works with large datasets, even with 64-bit R running and lots of RAM (e.g., 18 GB), memory can still confound, frustrate, and stymie even experienced R users. I am putting this page together for two purposes. First, it is for myself: I am sick and tired of forgetting memory issues in R, and so this is a repository for all I learn. However, this is a work in progress! 1) Read R> ?" 2) As I said elsewhere, 64-bit computing and a 64-bit version of R are indispensable for working with large datasets (you're capped at ~3.5 GB of RAM with 32-bit computing). How to avoid this problem? If you're unwilling to do any of the above, the final option is to read in only the part of the matrix you need, work with that portion of it, and then remove it from memory. 3) It is helpful to constantly keep an eye on the Unix top command (not sure what the equivalent is in windoze) to check how much RAM your R session is taking.
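The "read in only the part you need, work with it, then remove it" approach might look roughly like the sketch below. It is only an illustration under assumptions: the temporary CSV file stands in for a large matrix on disk, and the row range (rows 4 to 6) is arbitrary.

```r
# Stand-in for a large matrix stored on disk (assumption: plain CSV, no header).
tmp <- tempfile(fileext = ".csv")
m <- matrix(1:50, nrow = 10, ncol = 5)
write.table(m, tmp, sep = ",", row.names = FALSE, col.names = FALSE)
rm(m)

# Read only rows 4-6: skip the first 3 lines, then read 3 lines.
part <- as.matrix(read.table(tmp, sep = ",", skip = 3, nrows = 3))

# Work with that portion only.
row_sums <- rowSums(part)

# Check how much memory the object uses, then drop it and
# let the garbage collector reclaim the space.
print(object.size(part), units = "Kb")
rm(part)
invisible(gc())

unlink(tmp)
```

Only the small result (`row_sums`) stays in the workspace; the portion of the matrix itself is gone after `rm()` and `gc()`.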
The Endeavour | John D. Cook
I help people make decisions in the face of uncertainty. Sounds interesting. I’m a data scientist. Not sure what that means, but it sounds cool. I study machine learning. Hmm. I’m into big data. Even though each of these descriptions makes a different impression, they’re all essentially the same thing. There are distinctions. “Decision-making under uncertainty” emphasizes that you never have complete data, and yet you need to make decisions anyway. “Data science” stresses that there is more to the process of making inferences than what falls under the traditional heading of “statistics.” Despite the hype around the term data science, it’s growing on me. Machine learning, like decision theory, emphasizes the ultimate goal of doing something with data rather than creating an accurate model of the process that generates the data. “Big data” is a big can of worms. Bayesian statistics is much older than what is now sometimes called “classical” statistics.
Tutorials - Data-gov Wiki
Learn how to build your own linked data demos! This page describes the techniques and technologies used on this wiki, as well as providing links to a number of other tutorials and presentations on these technologies at other sites.
The R programming language for programmers coming from other programming languages
Introduction: I have written software professionally in perhaps a dozen programming languages, and the hardest language for me to learn has been R. R is more than a programming language. This document is a work in progress. Assignment and underscore: The assignment operator in R is <- as in e <- m*c^2. It is also possible, though uncommon, to reverse the arrow and put the receiving variable on the right, as in m*c^2 -> e. It is sometimes possible to use = for assignment, though I don't understand when this is and is not allowed. However, when supplying default function arguments or calling functions with named arguments, you must use the = operator and cannot use the arrow. At some time in the past, R, or its ancestor S, used underscore as assignment.
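The assignment forms described in this excerpt can be tried side by side. This is a minimal sketch; the variable names and the `power` function are made up for illustration and do not come from the original article.

```r
mass <- 2
speed <- 3

e  <- mass * speed^2      # usual leftward arrow
mass * speed^2 -> e2      # rightward arrow, uncommon
e3 = mass * speed^2       # = also works at the top level

# Default arguments and named arguments use =, not the arrow.
power <- function(base, exponent = 2) base^exponent
nine  <- power(3)                       # uses the default exponent
eight <- power(base = 2, exponent = 3)  # named arguments

# An arrow inside a call does not name the argument. It creates a
# variable called 'exponent' in the calling environment and passes
# the value 3 positionally, so it ends up as 'base'.
result <- power(exponent <- 3)
```

After the last line, `result` is `3^2` rather than something raised to the third power, and a stray variable `exponent` exists in the workspace, which is why `=` is the form to use for arguments.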
How to calculate polygon centroids in R (for non-contiguous shapes)
I've spent a little while figuring out the answer to this question. It's not immediately obvious from a Google search, so I thought it might be useful to post the answer on here. There is also an additional question about non-contiguous polygons. Instant easy answer: use the command: centroids <- getSpPPolygonsLabptSlots(polys) (This was found in the class description of the SpatialPolygonsDataFrame R data class for the overarching spatial package in R, sp) This seems to do exactly the same thing as cents <- SpatialPointsDataFrame(coords=cents, data=sids@data, proj4string=CRS("+proj=longlat +ellps=clrk66")) in the following code, which should be replicable on any R installation (try it!) Where cents (blue) and centroids (red) are identical centroids (this plot should appear after you've run the code): So far so good. So this question is three things:
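For background on what a polygon centroid is, and why non-contiguous shapes need care, here is a minimal base-R sketch. It is not the sp implementation and does not use the sp package: `polygon_centroid` is a hypothetical helper based on the standard shoelace formula, and taking an area-weighted mean of the parts is just one possible convention for a multi-part shape.

```r
# Centroid and area of a simple polygon from its vertices (shoelace formula).
polygon_centroid <- function(x, y) {
  n <- length(x)
  # Close the ring if the last vertex does not repeat the first.
  if (x[1] != x[n] || y[1] != y[n]) {
    x <- c(x, x[1])
    y <- c(y, y[1])
    n <- n + 1
  }
  i <- 1:(n - 1)
  j <- 2:n
  cross <- x[i] * y[j] - x[j] * y[i]
  area <- sum(cross) / 2          # signed area
  cx <- sum((x[i] + x[j]) * cross) / (6 * area)
  cy <- sum((y[i] + y[j]) * cross) / (6 * area)
  c(x = cx, y = cy, area = abs(area))
}

# Two separate squares treated as one non-contiguous shape.
a <- polygon_centroid(c(0, 2, 2, 0), c(0, 0, 2, 2))      # 2 x 2 square
b <- polygon_centroid(c(10, 11, 11, 10), c(0, 0, 1, 1))  # 1 x 1 square

# Area-weighted mean of the part centroids.
parts <- rbind(a, b)
combined <- colSums(parts[, c("x", "y")] * parts[, "area"]) / sum(parts[, "area"])
```

In this example the combined centroid falls in the gap between the two squares, inside neither of them, which is the kind of result that makes centroids of non-contiguous shapes worth checking on a plot.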
Data Sorcery with Clojure