There’s a lot of terminology used in R, some of which is widely used in programming and bioinformatics and some of which is more specific to R. In this document we provide a glossary of commonly used terms and their meaning.
Argument - something that is given as input to a function. For example, if you run the code “mean(x)”, mean is the function and x is the argument that is given to that function. Different functions will take a different number of arguments. The function “mean” takes a single argument while the function “seq” would usually take 3 arguments. If you’re not sure how many arguments a function takes, type ? followed by the function name into the bottom left window in RStudio to access the help pages.
Character data - what our brain would recognise as letters, words and sentences. Data like sample names, countries, reagents and cell lines are character data. You can recognise this sort of data in R as it has quotes around it. To create a new variable containing character data, you’ll need to put quotes around the data, for example to create a vector called cell_line with two cell lines in it, you’d use:
Command - another name for a function.
Comment - a bit of text in your code that is not run. Comments follow the “#” character and are usually used to write a description of what the line of code does. For example:
The line on the left hand side of the “#” is the code that will be run. The line on the right hand side of the “#” is a description of that code and will not be run.
Conditional statement - checks to see if a particular critera is met. If it is, a bit more code will be run. If it is not, the code won’t be run. Setting up a conditional statement is covered in section 6 of the Introduction to R tutorial, but in short you have a criteria specified between “(” and “)” and if that criteria is met, you have some code between “{” and “}” that is run. For example:
Here, we create a variable called “a”. The conditional statement checks if the value in a is greater than 5 and if it is prints a statement to the screen.
Curly brackets {} - specify the beginning and end of a segment of code, usually code associated with a function, loop or conditional statement. For example:
a <- c(1,2,3,4,5) #This line creates a vector containing the numbers 1 to 5
for (eachNumber in a) { #This { represents the start of the code that will be run in the for loop
if (eachNumber > 1) { #This { represents the start of the code that will be run if the condition in the conditional statement is matched
print(eachNumber)}} #These } represent the end of the code in the conditional statement and for loop. The first } closes the most recently opened { (in this case the conditional statement)
Here, we create a vector containing 5 numbers. We then loop through that vector and print the number to the screen if it is greater than 1. The code that runs in the for loop starts with the “{” at the end of the for loop line. The code that will run if the conditional statement is matched starts with the “{” at the end of the “if” line. The first “}” closes the code that started with the most recent “{”. So here, the first “}” following “print(eachNumber)” closes the conditional statement code and the second “}” closes the for loop code. The “}”’s immediately follow one another here but they don’t need to.
Data frame - a common way of working with tables in R. Similar to how you’d have your data in an excel spreadsheet. Consists of columns and rows. Each column can only contain one type of data, e.g. character data or numerical data. Different columns can contain different types of data.
Data table - a common way of working with tables in R. Often easier to work with than data frames. Similar to how you’d have your data in an excel spreadsheet. Consists of columns and rows. Each column can only contain one type of data, e.g. character data or numerical data. Different columns can contain different types of data.
Directory - the computer science name for a folder on your computer, for example the Documents directory is just the Documents folder on your computer. The directory you are currently in is called the working directory. Its important to know what the current working directory is if you want to open or save anything with R as you either need to be in the same directory as the file you want to open/save or tell R where the file is relative to your working directory. You can change directory in RStudio using the command “setwd” which sets the working directory.
For loop - used to run through a set of values and do something with each value in turn. How to use for loops is covered in section 6 of the Introduction to R tutorial, but in short you specify what values to run through using code between “(” and “)” and then specify what to do with each of these values using code between “{” and “}”.
Function - code that takes some input, does something with that input and produces some output. R contains a large number of in-built functions, for example the function mean() calculates the mean of a set of numbers. The values that you give to the function between the “(” and “)” are the arguments for the function and different functions take different numbers of arguments. You can also write your own functions to carry out useful tasks for your data and install libraries that contain a set of functions for working with a particular type of data (e.g. RNASeq data or phylogenetic trees).
Help pages - where you can go to get information on a function if you’re not sure how to run it. The help page for a function gives you information on what input to give it, what the function does and what output it gives you. You can get to the help page for a function by typing ? followed by the function’s name in the bottom left window of RStudio or by typing the function’s name into the search bar in the “Help” tab in the bottom right window of RStudio.
Library - a set of functions that are useful for analysing a particular type of data. For our purposes, interchangeable with package. To use the functions within a library, you first need to install the library using “install.packages(PackageName)” and then load the library using “library(PackageName)”. You only need to install the library once per computer, but you need to load the library once in each RStudio session in which you want to use it.
Logical data - data that consists of true or false. For example, you could have logical data describing whether samples within an experiment match a particular condition. Used by conditional statements to determine whether to proceed or not.
Numeric data - what our brain would see as numbers. Includes whole numbers, decimals and logarithms. R will be able to carry out mathematical operations with this sort of data. To create an object with numerical data, you don’t put quotes around the numbers so to create a vector containing 5 numbers, you’d use:
Package - contains a set of functions useful for analysing a particular type of data. For our purposes, interchangeable with library. To use the functions within a package, you first need to install the package using “install.packages(PackageName)” and then load the package using “library(PackageName)”. You only need to install the package once per computer, but you need to load the package once in each RStudio session in which you want to use it.
Square brackets [] - used to refer to columns, rows or cells in data frames and data tables, to positions within a vector and to objects within a list. The numbers within the square brackets determines the numbers of the row, column, cell, position or object that is being referred to. For data frames and data tables, values are referred to as [row_number, column_number]. List positions are referred to in double square brackes. Examples of square bracket usage:
dataFrameA = data.frame(Numbers = c(1,2,3,4,5), Letters = c("A","B","C","D","E")) #Creates a data frame containing 5 numbers and 5 letters
dataFrameA[,1] #Print the values in column 1 of dataFrameA
dataFrameA[1,] #Print the values in row 1 of dataFrameA
dataFrameA[1,1] #Print the value in column 1 of row 1 of dataFrameA
dataFrameA[c(1,2),] #Print the values in rows 1 and 2 of dataFrameA
vectorA <- c(1,2,3,4,5) #Creates a vector containing 5 numbers
vectorA[1] #Print the first value in vectorA
vectorA[1:4] #Print the first four values in vectorA
listA <- list(1,2,3,4,5) #Creates a list containing 5 numbers
listA[[1]] #Print the first object in listA
listA[[1:4]] #Print the first four objects in listA
Tab complete - a useful trick to reduce the amount that you have to type. If you start to type a function, argument to a function or file name and press tab, R will show you the possibilities that start with what you’ve typed. You can then select the one that you want using your mouse, the arrow keys or, if its the first one in the list, by pressing tab again.
Variable - an object that is assigned one or more values. For example, all of the following commands assign one or more values to a variable:
numberA <- 1 #Create a variable containing the number 1
vectorA <- c(1,2,3,4,5) #Create a vector variable containing 5 numbers
dataFrameA <- data.frame(Numbers = c(1,2,3,4,5), Letters = c("A","B","C","D","E")) #Create a data frame variable containing 2 columns and 5 rows
library(data.table)
dataTableA <- data.table(Numbers = c(1,2,3,4,5), Letters = c("A","B","C","D","E")) #Create a data table variable containing 2 columns and 5 rows
listA <- list(1,2,3,4,5) #Create a list variable containing 5 numbers
Vector - a set of values of a particular data type, created using c().