# Notes on the R language

R is an environment for statistical computing. This is a cheatsheet and a collection of additional notes, mostly collected during university courses or home exercises.

Typing `help(name)` or `example(name)` into the R console, where `name` is the name of a function, shows its documentation or runs the examples from its help page.

### Vector objects

Basic vectors are created with the `c()` function. It accepts single values or other vectors as arguments and combines them into one flat vector.

Repeated sequences can be generated with the `rep(values, counts)` function. For example, `rep(1:3, c(2, 2, 3))` returns `1, 1, 2, 2, 3, 3, 3`.

We can generate closed numeric sequences with the `seq(from, to, step)` function or with the `from:to` shorthand.
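The three constructors above can be tried side by side. This is a minimal sketch; the variable names are only illustrative.

```r
# Combine single values and vectors into one flat vector
v <- c(1, 2, c(3, 4))          # 1 2 3 4

# Repeat each element of 1:3 by the matching count
r <- rep(1:3, c(2, 2, 3))      # 1 1 2 2 3 3 3

# Closed sequences: both endpoints are included
s1 <- seq(0, 1, 0.25)          # 0.00 0.25 0.50 0.75 1.00
s2 <- 2:5                      # 2 3 4 5
```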

### Matrix objects

Matrix objects can be generated by using the `matrix(values, rows, cols)` function or by setting the shape of a vector with `dim(x) <- c(rows, cols)`.
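A short sketch of both approaches. Note that `matrix()` fills the values column by column unless `byrow = TRUE` is given.

```r
# matrix() fills column by column by default
m <- matrix(1:6, 2, 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Setting dim() on an existing vector gives the same layout
x <- 1:6
dim(x) <- c(2, 3)

all(m == x)   # TRUE
```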

### Empirical data

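R's built-in summary statistics (such as `var`, `sd` or `cov`) use the formulas for sampled data, with an `n - 1` denominator. When only a frequency table is available, we can reconstruct the raw data with calls like `c(rep(mark1, af1), rep(mark2, af2))`, where each mark is repeated by its absolute frequency.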

Alternatively, we can apply a correction to the result of the built-in function to get the empirical (population) value. In common cases (such as variance or covariance) this amounts to replacing the `n - 1` denominator with `n`.

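```r
# Rescale a sample statistic from the n - 1 denominator to n.
# Works for single-argument functions such as var(); it is not
# valid for sd(), which would need the square root of the factor.
empirical <- function(fn, x) {
  fn(x) * (length(x) - 1) / length(x)
}
```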
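As a usage sketch, the correction can be checked against the definition of the empirical variance. The frequency table below is made up for illustration, and the helper is repeated so the snippet runs on its own.

```r
empirical <- function(fn, x) {
  fn(x) * (length(x) - 1) / length(x)
}

# Hypothetical frequency table: value 1 twice, 2 three times, 3 once
x <- c(rep(1, 2), rep(2, 3), rep(3, 1))

var(x)               # sample variance, n - 1 denominator
empirical(var, x)    # empirical variance, n denominator

# Cross-check against the definition
mean((x - mean(x))^2)
```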

### I/O

We can redirect console output to a file with `sink(path)` and restore it to the console with `sink(NULL)`. Passing `split=TRUE` to `sink` writes to the file and the console at the same time.

Output to images is started by calling the `png(path, width=w, height=h)` or `jpeg(path, width=w, height=h)` functions. All subsequent plots are written to that file until the device is closed with `dev.off()`.
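A minimal sketch of both kinds of redirection. The paths come from `tempfile()` purely for illustration, and the PNG part assumes the R build has PNG support.

```r
# Temporary paths, used here only for illustration
txt <- tempfile(fileext = ".txt")
img <- tempfile(fileext = ".png")

sink(txt)                    # redirect console output to the file
print(summary(c(1, 2, 3)))
sink(NULL)                   # restore console output

png(img, width = 400, height = 300)   # open a PNG graphics device
plot(1:10, main = "Example")
invisible(dev.off())         # close the device so the file is written

file.exists(c(txt, img))
```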

### Charts, plots

R has plotting functions for many types of data. These functions generally accept the `main`, `xlab` and `ylab` parameters for the title and the axis labels. Common ones include `pie(x)`, `barplot(x)`, `plot(x, y)`, `plot.stepfun(x, pch=16)` and `boxplot(x)`.

Histograms can be plotted with the `hist(x)` function. With `freq=FALSE` the function plots densities instead of counts, so the total area of the bars is 1. The bins are controlled by the `breaks` parameter, which is also the second positional argument, so custom break points can be passed directly: `hist(x, c(0, k1, k2, n))`
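To see what the histogram computes without drawing anything, `plot = FALSE` returns the underlying object. The data and break points below are made up for illustration.

```r
x <- c(1, 2, 2, 3, 3, 3, 7, 8, 9, 10)   # made-up data

# plot = FALSE returns the computed histogram without drawing it
h <- hist(x, breaks = c(0, 2, 5, 10), plot = FALSE)

h$counts                           # 3 3 4: observations per bin
h$density                          # counts / (n * bin width)
sum(h$density * diff(h$breaks))    # total area of the bars
```

Calling `hist(x, breaks = c(0, 2, 5, 10), freq = FALSE)` draws the same bins with the density scale on the y axis.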

### Quantiles, percentiles

Quantiles of raw data can be calculated in several ways. `fivenum(x)` returns Tukey's five numbers (min, lower hinge, median, upper hinge, max; the hinges approximate the 1st and 3rd quartiles), and `quantile(x, probs=c(p1, p2))` returns specific percentiles. The quantile of grouped data can be calculated by faking the raw data (which gives a biased quantile) or by implementing its formula as a function.
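A small sketch of these calls. The data, class midpoints and frequencies are made up for illustration.

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12)   # made-up raw data

fivenum(x)                          # 2.0 4.0 6.0 9.5 12.0
quantile(x, probs = c(0.1, 0.9))    # 10th and 90th percentiles

# Grouped data: hypothetical class midpoints and absolute frequencies
marks <- c(5, 15, 25)
freqs <- c(3, 5, 2)
raw <- rep(marks, freqs)            # fake the raw observations
median(raw)                         # 15
```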

### Measures of correlation

The built-in `cov(x, y)` function calculates the covariance of sampled data. We can calculate different correlation coefficients with the `cor(x, y, method="method")` function, where the method can be one of `"pearson"`, `"spearman"` or `"kendall"`.
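A short sketch with made-up paired observations, including the `n - 1` to `n` correction for the covariance:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)     # made-up paired observations
n <- length(x)

cov(x, y)                 # 1.5: sample covariance (n - 1 denominator)
cov(x, y) * (n - 1) / n   # 1.2: empirical covariance (n denominator)

cor(x, y, method = "pearson")    # linear correlation (the default)
cor(x, y, method = "spearman")   # correlation of the ranks
cor(x, y, method = "kendall")    # based on concordant and discordant pairs
```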

### Linear Regression

R fits linear regression models with the `lm` function, which takes a formula such as `lm(y ~ x)` or `lm(y ~ x1 + x2)`. Afterwards, `summary(model)` reports values like the intercept, the slope and R squared.

We can plot the model by calling `abline(model)` after a `plot` call.
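A minimal sketch with made-up data that roughly follows `y = 2x`. The use of `coef` and `predict` here is an addition to the calls mentioned above.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)   # made-up data, roughly y = 2x

model <- lm(y ~ x)
coef(model)                  # intercept 0.05, slope 1.99
summary(model)$r.squared     # R squared

# Predict y for new x values
predict(model, data.frame(x = c(6, 7)))
```

To draw the fit, call `plot(x, y)` followed by `abline(model)`.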

### Combinatorics

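The package `gtools` provides simple helper functions for combinatorics: `permutations(length(x), n, x)` lists the ordered selections of `n` elements from `x`, and `combinations(n, k, x)` lists the `k`-element subsets of the `n` elements of `x`. In base R, `choose(n, k)` provides a quick way to calculate binomial coefficients.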
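A short sketch of these helpers. The base R function `combn` is shown as an alternative that is not mentioned above, and the `gtools` calls only run if the package is installed.

```r
choose(5, 2)      # 10: number of 2-element subsets of 5 items
factorial(3)      # 6: number of orderings of 3 items

x <- c("a", "b", "c")
combs <- t(combn(x, 2))   # base R: all 2-element combinations, one per row
nrow(combs)               # 3

# The gtools calls, run only if the package is installed
if (requireNamespace("gtools", quietly = TRUE)) {
  print(gtools::combinations(length(x), 2, x))   # 3 rows
  print(gtools::permutations(length(x), 2, x))   # 6 rows
}
```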