An Excel TUTORIAL for Introductory Statistics Applications

Prof. Vladimir A. Dobrushkin

Heath Loder

This tutorial contains many R scripts. You, as the user, are free to use all codes for your needs, and have the right to distribute this tutorial and refer to this tutorial as long as this tutorial is accredited appropriately. Any comments and/or contributions for this tutorial are welcome; you can send your remarks to

Email Vladimir Dobrushkin

Preface

The purpose of this tutorial is to introduce students in introductory statistics courses to the programming language R. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.

R is a free software environment for statistical computing and graphics. R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS. It is a GNU project which is similar to the S language and environment that was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software. We will start by showing you how to download the program, and the recommended interface for computing with R is Rstudio. R is an incredibly powerful software, with a slight focus on statistical computing.

Download R from the site:

https://cran.r-project.org/mirrors.html

Follow the link given or search CRAN within Google, and select the download for your appropriate operating system.

R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

This tutorial contains many R scripts. You, as the user, are free to use all codes for your needs, and have the right to distribute this tutorial and refer to this tutorial as long as this tutorial is accredited appropriately. Any comments and/or contributions for this tutorial are welcome; you can send your remarks to Vladimir_Dobrushkin@brown.edu