
VassarStats: Statistical Computation Web Site

Related: Epidemiology & Biostatistics, Statistics

Sample Size Calculator - Confidence Level, Confidence Interval, Sample Size, Population Size, Relevant Population - Creative Research Systems. This Sample Size Calculator is presented as a public service of Creative Research Systems survey software. You can use it to determine how many people you need to interview in order to get results that reflect the target population as precisely as needed, or to find the level of precision you have in an existing sample. Before using the sample size calculator, there are two terms you need to know: confidence interval and confidence level. The confidence interval (also called margin of error) is the plus-or-minus figure usually reported in newspaper or television opinion poll results. The confidence level tells you how sure you can be that the true figure lies within that interval. Three factors affect the width of a confidence interval: sample size, percentage, and population size.
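The calculation behind calculators like this one is typically Cochran's formula with a finite-population correction. A minimal sketch in Python (the function name and defaults are illustrative, not Creative Research Systems' actual code):

```python
import math

def sample_size(confidence_z, margin_of_error, population=None, p=0.5):
    """Cochran's formula; p=0.5 is the most conservative assumed proportion."""
    n0 = (confidence_z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # finite-population correction
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# 95% confidence (z = 1.96), +/-5% margin of error, population of 20,000
print(sample_size(1.96, 0.05, population=20000))  # → 377
```

Note how the finite-population correction only matters when the sample is a non-trivial fraction of the population; for very large populations the answer converges to 385.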

Interactive Statistical Calculation Pages. Sample Size Calculator by Raosoft, Inc. If 50% of all the people in a population of 20,000 people drink coffee in the morning, and if you were to repeat the survey of 377 people ("Did you drink coffee this morning?") many times, then 95% of the time your survey would find that between 45% and 55% of the people in your sample answered "Yes". The remaining 5% of the time, or for 1 in 20 survey questions, you would expect the survey response to be more than the margin of error away from the true answer. When you survey a sample of the population, you don't know that you've found the correct answer, but you do know that there's a 95% chance that you're within the margin of error of the correct answer. Try changing your sample size and watch what happens to the alternate scenarios. That tells you what happens if you don't use the recommended sample size, and how the margin of error and confidence level (that 95%) are related. To learn more if you're a beginner, read Basic Statistics: A Modern Approach and The Cartoon Guide to Statistics.
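The coverage claim above (about 95% of repeated surveys landing within the margin of error) can be checked by simulation. A sketch using Python's standard library; the parameters mirror the coffee example and the variable names are illustrative:

```python
import random

random.seed(42)
TRUE_P, N, TRIALS, MOE = 0.5, 377, 10_000, 0.05

within = 0
for _ in range(TRIALS):
    # simulate one survey of N respondents from a 50%-"Yes" population
    yes = sum(random.random() < TRUE_P for _ in range(N))
    if abs(yes / N - TRUE_P) <= MOE:
        within += 1

print(f"{within / TRIALS:.1%} of simulated surveys fell within the margin of error")
```

The observed proportion should come out close to 95%, matching the stated confidence level.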

Downloadable Sample SPSS Data Files

Data Quality:
- Ensure that required fields contain data.
- Ensure that the required homicide (09A, 09B, 09C) offense segment data fields are complete.
- Ensure that the required homicide (09A, 09B, 09C) victim segment data fields are complete.
- Ensure that offenses coded as occurring at midnight are correct.
- Ensure that victim variables are reported where required and are correct when reported but not required.

Standardizing the Display of IBR Data: An Examination of NIBRS Elements:
- Time of Juvenile Firearm Violence
- Time of Day of Personal Robberies by Type of Location
- Incidents on School Property by Hour
- Temporal Distribution of Sexual Assault Within Victim Age Categories
- Location of Juvenile and Adult Property Crime Victimizations
- Robberies by Location
- Frequency Distribution for Victim-Offender Relationship by Offender and Older Age Groups and Location

Analysis Examples:
- FBI's Analysis of Robbery
- FBI's Analysis of Motor Vehicle Theft Using Survival Model

Data & Documentation | YRBSS | Adolescent and School Health | CDC. Youth Risk Behavior Survey (YRBS) data are available in two file formats: Access® and ASCII. New Sexual Minority Data are now available. Combined YRBS Datasets and Documentation: the combined YRBS dataset includes national, state, and large urban school district data from selected surveys from 1991-2015, distributed as zipped data files (National, States A-M, States N-Z). National YRBS Datasets and Documentation: data with SPSS syntax (.sps).

The R Trader » Blog Archive » BERT: a newcomer in the R Excel connection. A few months ago a reader pointed me to this new way of connecting R and Excel. I don't know how long it has been around, but I had never come across it and I've never seen any blog post or article about it. So I decided to write a post, as the tool is really worth it, and before anyone asks, I'm not related to the company in any way. BERT stands for Basic Excel R Toolkit. It's free (licensed under the GPL v2) and it has been developed by Structured Data LLC. At the time of writing the current version of BERT is 1.07. In this post I'm not going to show you how R and Excel interact via BERT. How do I use BERT? My trading signals are generated using a long list of R files, but I need the flexibility of Excel to display results quickly and efficiently. The workflow:
- Use XML to build user-defined menus and buttons in an Excel file.
- Those menus and buttons are essentially calls to VBA functions.
- Those VBA functions are wrappers around R functions defined using BERT.
Prerequisite. Step-by-step guide. You're done!

Measuring Association in Case-Control Studies All the examples above were for cohort studies or clinical trials in which we compared either cumulative incidence or incidence rates among two or more exposure groups. However, in a true case-control study we don't measure and compare incidence. There is no "follow-up" period in case-control studies. In the module on Overview of Analytic Studies we considered a rare disease in a source population that looked like this: This view of the population is hypothetical because it shows us the exposure status of all subjects in the population. Another way of looking at this association is to consider that the "Diseased" column tells us the relative exposure status in people who developed the outcome (7/6 = 1.16667), and the "Total" column tells us the relative exposure status of the entire source population (1,007/5,640 = 0.1785). The Odds Ratio The relative exposure distributions (7/6) and (10/56) are really odds, i.e. the odds of exposure among cases and non-diseased controls.
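The odds ratio described above can be computed directly from the 2x2 table counts. A minimal sketch (the function name is illustrative; the counts are the ones from the passage):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a, b: exposed and unexposed cases (diseased)
    c, d: exposed and unexposed controls (non-diseased)
    """
    return (a / b) / (c / d)

# odds of exposure: 7/6 among cases, 10/56 among controls
print(round(odds_ratio(7, 6, 10, 56), 2))  # → 6.53
```

An odds ratio of about 6.5 says the odds of prior exposure were roughly six and a half times higher among cases than among controls.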

Introduction to Principal Component Analysis (PCA) - Laura Diane Hamilton. Principal Component Analysis (PCA) is a dimensionality-reduction technique that is often used to transform a high-dimensional dataset into a lower-dimensional subspace prior to running a machine learning algorithm on the data. When should you use PCA? It is often helpful to use a dimensionality-reduction technique such as PCA prior to performing machine learning because reducing the dimensionality of the dataset reduces the size of the space on which k-nearest-neighbors (kNN) must calculate distance, which improves the performance of kNN. What does PCA do? Principal Component Analysis does just what it advertises; it finds the principal components of the dataset. Can you ELI5? Let's say your original dataset has two variables, x1 and x2. Now, we want to identify the first principal component, the one that explains the highest amount of variance. Let's say we just wanted to project the data onto the first principal component only. You can think of this sort of like a shadow.
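The two-variable example can be sketched in a few lines of NumPy: center the data, take the SVD, and project onto the first principal direction. The synthetic data and variable names here are illustrative, not from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
# two correlated variables: x2 is roughly 2*x1 plus a little noise
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.3, size=200)
X = np.column_stack([x1, x2])

# center the data, then find principal directions via SVD
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = S**2 / np.sum(S**2)   # fraction of variance per component
projected = Xc @ Vt[0]            # the data's "shadow" on the first PC

print(f"PC1 explains {explained[0]:.0%} of the variance")
```

Because x2 is nearly a linear function of x1, almost all the variance lands on the first component, which is exactly the situation where dropping the second component loses little information.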

THE DECISION TREE FOR STATISTICS. The material used in this guide is based upon "A Guide for Selecting Statistical Techniques for Analyzing Social Science Data," Second Edition, produced at the Institute for Social Research, The University of Michigan, under the authorship of Frank M. Andrews, Laura Klem, Terrence N. Davidson, Patrick O'Malley, and Willard L. Rodgers, copyright 1981 by The University of Michigan, All Rights Reserved. The Decision Tree helps you select statistics or statistical techniques appropriate for the purpose and conditions of a particular analysis, and then select the MicrOsiris commands which produce them or find the corresponding SPSS and SAS commands. Start with the first question on the next screen and choose one of the alternatives presented there by selecting the appropriate link. The "Statistics Programs" button provides a table of all statistics mentioned which can be produced by MicrOsiris, SPSS, or SAS, and the corresponding commands for them. Glossary. References.

Do Faster Data Manipulation using These 7 R Packages. Introduction: data manipulation is an inevitable phase of predictive modeling. A robust predictive model can't be built using machine learning algorithms alone; it requires understanding the business problem and the underlying data, performing the required data manipulations, and then extracting business insights. Among these phases of model building, most of the time is usually spent in understanding the underlying data and performing the required manipulations. That is also the focus of this article: packages to perform faster data manipulation in R. What is data manipulation? If you are still confused by this term, let me explain it to you. The data collection process can have many loopholes. At times, this stage is also known as data wrangling or data cleaning. Different ways to manipulate / treat data: there is no right or wrong way to manipulate data, as long as you understand the data and have taken the necessary actions by the end of the exercise.

The Central Limit Theorem. To understand the wildness of samples, we would choose thousands of samples, calculate an x-bar for each, and display the x-bars in a histogram. This histogram represents a sampling distribution, and when we look at it we see something truly amazing. Sampling distributions tend to be far less variable or wild than the populations they are drawn from (see Fig. 1A, 1B, 1C and 1D). They also have essentially the same mean as the population. Sampling distributions drawn from a uniformly distributed population start to look like normal distributions even with a sample size as small as 2 (see Fig. 1B). This may not seem earth-shattering, but it's really quite profound. The situation is similar to hiring Mary Jane, who has a master's degree in computer science, versus Jim Bob, who says he can compute. The central limit theorem tells us that a sampling distribution always has significantly less wildness or variability, as measured by standard deviation, than the population it's drawn from.
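The shrinking "wildness" of x-bars can be demonstrated with a quick simulation using only Python's standard library. A sketch under illustrative assumptions (a uniform population on [0, 100]; the sample sizes and draw counts are arbitrary):

```python
import random
import statistics

random.seed(1)
# a uniformly distributed, "wild" population
population = [random.uniform(0, 100) for _ in range(100_000)]
pop_sd = statistics.pstdev(population)

def xbar_sd(n, draws=2000):
    """Std. dev. of the sampling distribution of x-bar for samples of size n."""
    xbars = [statistics.mean(random.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(xbars)

sds = {n: xbar_sd(n) for n in (2, 10, 40)}
for n, sd in sds.items():
    print(f"n={n:>2}: sd of x-bars = {sd:5.2f}  "
          f"(population sd = {pop_sd:.2f}, sigma/sqrt(n) = {pop_sd / n**0.5:.2f})")
```

Even at n = 2 the x-bars are noticeably tamer than the raw population, and the standard deviation of the sampling distribution tracks sigma divided by the square root of n, just as the theorem predicts.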

Data Preprocessing Tools. Advanced Macintosh data recovery software and Macintosh file retrieval tool for deleted or formatted Apple Macintosh hard drives. Mac Recovery Software is an advanced Mac file recovery application that recovers data from formatted, deleted or corrupted Mac partitions or external Mac hard drives. Best Mac Data Recovery Tools is a risk-free Mac data recovery utility that recovers important data lost to accidental formatting, viruses, file/directory deletion, or even sabotage. The software fixes damaged Mac hard disks and restores Mac files within minutes. Platform: Windows. Publisher: macdatarecovery.net. Size: 1730 KB. ADRC Data Recovery Tools v1.0 contains a collection of DIY data recovery tools that supports a wide variety of drives (fixed or removable) and file systems (FAT12, FAT16, FAT32 and NTFS) for Windows 95/98, Windows ME, Windows NT, Windows 2000, Windows XP and Windows 2003 Server. The software incorporates an extremely simple GUI with novice users in mind.

Deriving Z-Test Formulas: 1-Sample, 1-Sided | Power and Sample Size Knowledge Base | HyLown. Setup: we will derive the formulas for three situations: Normal, Binomial, and Poisson data. Critical Value and Accept/Reject Regions: first, let's determine the critical value. So, would we rather have a large or small critical value, and how do we decide? Power: we can derive the power formula in a manner very similar to the way we derived the critical value above. Sample Size: a formula for sample size can be obtained by algebraically solving for $n$ in the power formula. Normal Data -- Testing a Mean: suppose the data are $Y_1, Y_2, \dots, Y_n \overset{iid}\sim N(\mu,\sigma^2)$. Binomial Data -- Testing a Proportion: suppose the data $Y_1, Y_2, \dots, Y_n$ represent $n$ independent binary outcomes, each with success probability $p$. Poisson Data -- Testing a Rate: suppose the data $Y_1, Y_2, \dots, Y_n$ represent $n$ independent Poisson random variables, each with rate $\lambda$.

Plot Digitizer
