R is an environment for statistical computing. This is a cheatsheet and a collection of additional notes, mostly collected during university courses or home exercises.

Typing `help(function)` or `example(function)` into the R console gives us quite useful documentation.

Basic vectors are initialized with the `c()` function. It accepts values or other vectors as arguments and merges them into a single vector.
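As a small illustration of the merging behaviour of `c()`, the sketch below combines scalars and existing vectors; the variable names are made up for the example.

```r
# Combine scalars and existing vectors into one flat vector.
a <- c(1, 2, 3)
b <- c(a, 4, c(5, 6))   # nested vectors are flattened
print(b)
print(length(b))
```

The result has no nesting: `b` is a plain vector with six elements.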

Repeated sequences can be generated with the `rep(values, counts)` function. For example, `rep(1:3, c(2, 2, 3))` will return `1, 1, 2, 2, 3, 3, 3`.

We can generate closed numeric sequences with the `seq(from, to, step)` function or with the `from:to` shorthand.
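The `rep`, `seq` and `from:to` forms described above can be tried side by side. This is only a sketch with arbitrary example values; note that the step argument of `seq` is named `by` when passed by name.

```r
r <- rep(1:3, c(2, 2, 3))   # each value repeated by its own count
s <- seq(0, 1, 0.25)        # third positional argument is the step (named `by`)
k <- 3:7                    # shorthand for a sequence with step 1
print(r)
print(s)
print(k)
```

Both ends of the range are included as long as the step lands on the upper bound exactly.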

Matrix objects can be generated by using the `matrix(values, rows, cols)` function or by setting the shape of a vector with `dim(x) <- c(rows, cols)`.
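The two ways of building a matrix can be compared directly. The sketch below uses arbitrary values; both approaches fill the matrix column by column.

```r
values <- 1:6
m1 <- matrix(values, 2, 3)   # 2 rows, 3 columns, filled column by column
m2 <- values
dim(m2) <- c(2, 3)           # reshape the vector in place
print(m1)
print(identical(m1, m2))
```

If row-wise filling is needed, `matrix` also accepts `byrow = TRUE`.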

R's built-in functions for summary statistics use the formulas for sample data. When only grouped data is available, we can fake the raw data with a few `c(rep(mark1, af1), rep(mark2, af2))` calls, repeating each value (`mark`) as many times as its absolute frequency (`af`).
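A minimal sketch of faking raw data from a frequency table follows. The marks and frequencies are hypothetical, and a single vectorised `rep` call is used as an equivalent of the chained `c(rep(...), rep(...))` form.

```r
marks <- c(10, 20, 30)     # hypothetical values (class marks)
freqs <- c(2, 3, 5)        # hypothetical absolute frequencies
raw <- rep(marks, freqs)   # same as c(rep(10, 2), rep(20, 3), rep(30, 5))
print(raw)
print(mean(raw))
```

Once `raw` exists, any built-in statistic (`mean`, `median`, `var`, ...) can be applied to it as if it were the original data.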

Alternatively, we can apply a correction to the result of the built-in functions to get the population (empirical) value. In common cases (such as variance or covariance) this amounts to replacing the `n - 1` denominator with `n`, that is, multiplying the result by `(n - 1) / n`.

```
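# Rescale a sample statistic (n - 1 denominator) to its population form (n denominator).
# Extra arguments are passed on to fn, e.g. empirical(cov, x, y).
empirical <- function(fn, x, ...) {
  n <- length(x)
  fn(x, ...) * (n - 1) / n
}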
```
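To check that the correction behaves as described, the sketch below applies the same `(n - 1) / n` rescaling inline and compares it with the population variance computed by hand. The data is made up for the example.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)               # hypothetical data
n <- length(x)
sample_var <- var(x)                         # built-in, n - 1 denominator
population_var <- sample_var * (n - 1) / n   # rescaled to the n denominator
by_hand <- mean((x - mean(x))^2)             # population variance from its definition
print(c(sample_var, population_var, by_hand))
```

The last two values agree, while the built-in `var(x)` is slightly larger.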

We can redirect console output to a file with `sink(path)` and return the output to the console with `sink(NULL)`. Writing both to a file and to the console can be done by passing `split=TRUE` to `sink`.
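The redirect and restore cycle can be sketched as follows. The temporary file path is only a stand-in for a real output location.

```r
path <- tempfile(fileext = ".txt")   # hypothetical output location
sink(path)                           # from here on, printed output goes to the file
print(summary(c(1, 2, 3)))
sink(NULL)                           # restore console output
written <- readLines(path)
print(written)
```

Replacing the first call with `sink(path, split = TRUE)` would show the summary on the console as well.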

Outputting to images can be started by calling the `png(path, width=w, height=h)` or `jpeg(path, width=w, height=h)` functions. Afterwards, all plots are written to the file until the device is closed with `dev.off()`.
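A minimal sketch of writing a plot to a PNG file is shown below. The file name and dimensions are arbitrary, and the `capabilities("png")` guard is an addition for installs without PNG support rather than a required part of the workflow.

```r
path <- file.path(tempdir(), "squares.png")   # hypothetical output path
if (capabilities("png")) {                    # some headless builds lack PNG support
  png(path, width = 640, height = 480)        # size in pixels
  plot(1:10, (1:10)^2, main = "Squares")
  dev.off()                                   # close the device and finish the file
}
```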

R has powerful plotting functions to represent different types of data. These functions generally accept the `main`, `xlab` and `ylab` parameters for the title and the axis labels. Simple ones are `pie(x)`, `barplot(x)`, `plot(x, y)`, `plot.stepfun(x, pch=16)` and `boxplot(x)`.
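A few of the plotting calls listed above, with labels set through `main`, `xlab` and `ylab`, might look like this. All data here is invented for the example.

```r
counts <- c(apples = 3, pears = 5, plums = 2)   # hypothetical data
mids <- barplot(counts, main = "Fruit counts", xlab = "Fruit", ylab = "Count")
pie(counts, main = "Fruit share")
plot(1:5, c(2, 4, 5, 4, 5), main = "Scatter", xlab = "x", ylab = "y")
stats <- boxplot(c(1, 2, 2, 3, 4, 10), main = "Spread", ylab = "Value")
```

`barplot` returns the bar midpoints and `boxplot` returns the computed statistics, which is handy when annotating the plot afterwards.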

Histograms can be plotted with the `hist(x)` function. With `freq=FALSE` the function plots densities instead of counts, so the total area of the bars is 1. We can specify the bin boundaries with the `breaks` parameter, which is also the second positional argument: `hist(x, c(0, k1, k2, n))`.
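The effect of custom breaks and `freq=FALSE` can be inspected through the object that `hist` returns. The data and break points below are hypothetical.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 9)                  # hypothetical data
h <- hist(x, breaks = c(0, 2, 4, 10), freq = FALSE)   # unequal bin widths
print(h$counts)                                       # observations per bin
print(h$density)                                      # bar heights
area <- sum(h$density * diff(h$breaks))               # total area of the bars
print(area)
```

With unequal bin widths the density scale keeps the bars comparable, which raw counts would not.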

Calculating quantiles for raw data can be done in many ways. Tukey's five numbers (minimum, lower hinge, median, upper hinge, maximum, where the hinges approximate the 1st and 3rd quartiles) are returned by `fivenum(x)`, and specific percentiles by `quantile(x, probs=c(p1, p2))`. The quantiles of grouped data can be calculated by faking the raw data (which gives a biased quantile) or by implementing the formula as a function.
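For raw data, the two built-in functions can be compared on a small sample. The values below are arbitrary, and `quantile` is used with its default interpolation method.

```r
x <- c(1, 3, 5, 7, 9, 11, 13)              # hypothetical raw data
five <- fivenum(x)                         # min, lower hinge, median, upper hinge, max
q <- quantile(x, probs = c(0.1, 0.9))      # 10th and 90th percentiles
print(five)
print(q)
```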

The built-in `cov(x, y)` function is handy to calculate the covariance of sampled data. We can calculate different correlation coefficients with the `cor(x, y, method="method")` function, where the method can be one of `"pearson"`, `"spearman"` or `"kendall"`.
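As a quick sketch, the covariance and the three correlation coefficients can be computed on the same pair of vectors. The data is made up for the example.

```r
x <- c(1, 2, 3, 4, 5)                     # hypothetical paired data
y <- c(2, 4, 5, 4, 5)
sample_cov <- cov(x, y)                   # n - 1 denominator
r_pearson  <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")
r_kendall  <- cor(x, y, method = "kendall")
print(c(sample_cov, r_pearson, r_spearman, r_kendall))
```

Pearson measures linear association, while Spearman and Kendall work on ranks, so the three values generally differ.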

R provides a very simple way to fit linear regression models through the `lm` function, called as `lm(y ~ x)` or `lm(y ~ x1 + x2)`. Afterwards, the `summary(model)` function gives us values like the intercept, slope or R squared.

We can plot the model by calling `abline(model)` after a `plot` call.
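Fitting, summarising and plotting a simple regression can be sketched in a few lines. The data below is hypothetical and chosen to lie close to a straight line.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 8.8, 11.0)         # hypothetical, roughly y = 1 + 2 * x
model <- lm(y ~ x)
fit <- summary(model)
print(fit$coefficients[, "Estimate"])    # intercept and slope
print(fit$r.squared)
plot(x, y, main = "Linear fit")
abline(model)                            # draws the fitted line on the current plot
```

`abline` only adds to an existing plot, which is why the `plot` call has to come first.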

The package `gtools` provides simple helper functions for combinatorics: `permutations(length(x), n, x)` and `combinations(n, k, x)`. Base R's `choose(n, k)` provides a quick way to calculate binomial coefficients.
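The sketch below shows `choose` together with the two `gtools` helpers. It assumes `gtools` may not be installed, so that part is guarded with `requireNamespace`; the element vector is made up for the example.

```r
n_pairs <- choose(5, 2)                              # number of 2-element subsets of 5 items
print(n_pairs)

x <- c("a", "b", "c")                                # hypothetical elements
if (requireNamespace("gtools", quietly = TRUE)) {    # gtools is an optional package
  perms <- gtools::permutations(length(x), 2, x)     # ordered pairs, one per row
  combs <- gtools::combinations(length(x), 2, x)     # unordered pairs, one per row
  print(perms)
  print(combs)
}
```

Each row of the returned matrices is one arrangement, so `nrow` gives the count directly.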