Class06: R Functions
Background
Functions are at the heart of using R. Everything we do involves calling functions (from data input and analysis to results output).
All functions in R have at least three things:
- A name: the thing we use to call the function
- Input arguments: one or more comma-separated inputs
- The body: lines of code between curly brackets { } that do the work of the function
A first function
Let’s write a function that adds some numbers:
add <- function(x) {
  x + 1
}
Let’s try it out:
add(100)
[1] 101
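The three parts listed in the Background can be inspected on any function object. Here is a minimal sketch using the base R helpers formals() and body(); the variable name arg_names is only for illustration.

```r
# The same one-argument function used in the text
add <- function(x) {
  x + 1
}

# The name: "add" is just a variable that holds the function object
is.function(add)

# The arguments: formals() returns the argument list, names() the argument names
arg_names <- names(formals(add))
print(arg_names)

# The body: the unevaluated code between the curly brackets
print(body(add))
```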
Will this work?
add(c(100, 200, 300))
[1] 101 201 301
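The vector call works because + in R operates element by element, and the single value 1 is recycled across the whole input. A small sketch of that behaviour (the names single and many are made up for this example):

```r
# The same one-argument function used in the text
add <- function(x) {
  x + 1
}

single <- add(100)               # one number in, one number out
many <- add(c(100, 200, 300))    # three numbers in, three numbers out

# The result always has the same length as the input
length(single)
length(many)
```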
Modify to be more useful and add more than just 1
add <- function(x, y=1) {
  x + y
}
add(100, 10)
[1] 110
Will this still work?
add(100)
[1] 101
Yes, because y has a default value of 1. Without that default, add(100) would fail because y would be missing; any argument without a default must be supplied in the call.
How to pass arguments:
plot(1:10, col="blue", type="b")
log(10, base=10)
[1] 1
N.B. Input arguments can be either required or optional. The latter have a fall-back default that is specified in the function code with an equals sign.
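To see the difference between required and optional arguments, here is a rough sketch. The function add_strict() is a hypothetical variant without a default for y, written only for this comparison; tryCatch() is used so the error is captured as text instead of stopping the script.

```r
# Optional second argument: y falls back to 1 when it is not supplied
add <- function(x, y = 1) {
  x + y
}

add(100)               # uses the default y = 1
add(100, y = 10)       # overrides the default
add(y = 10, x = 100)   # named arguments can be given in any order

# Hypothetical variant where y is required (no default)
add_strict <- function(x, y) {
  x + y
}

# Calling it without y raises an error; capture the message instead of stopping
result <- tryCatch(
  add_strict(100),
  error = function(e) conditionMessage(e)
)
print(result)
```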
#add(100, 200, 300)  # error: unused argument (300)
A second function
All functions in R look like this:
name <- function(arg) {
  body
}
The sample() function in R randomly picks items from the input vector. By default it samples without replacement, so each item can be picked at most once.
sample(1:10, size=4)
[1] 5 6 7 2
Q. Return 12 numbers picked randomly from the input 1:10
sample(1:10, size=12, replace=TRUE)
 [1]  4  6  7  1  9  7  2  6  3  2 10 10
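The replace=TRUE argument is what makes the answer above possible: asking for 12 items from only 10 values fails when sampling without replacement. A quick sketch, with tryCatch() capturing the error message (the names too_many and with_repl are only for illustration):

```r
# Without replacement we cannot draw more items than the input holds
too_many <- tryCatch(
  sample(1:10, size = 12),
  error = function(e) conditionMessage(e)
)
print(too_many)

# With replacement, values can be picked more than once
with_repl <- sample(1:10, size = 12, replace = TRUE)
length(with_repl)

# 12 draws from 10 possible values must contain at least one repeat
anyDuplicated(with_repl) > 0
```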
Q. Write the code to generate a random 12 nucleotide long DNA sequence
sample(c("a", "c", "g", "t"), size=12, replace=TRUE)
 [1] "t" "t" "c" "c" "c" "g" "g" "t" "t" "g" "g" "t"
Another way to write the code:
bases <- c("A", "T", "G", "C")
sample(bases, size=12, replace=TRUE)
 [1] "G" "A" "T" "C" "C" "G" "G" "C" "T" "A" "A" "T"
Q. Write a first version of a function called generate_dna() that generates a random DNA sequence of user-specified length n
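A first attempt as a plain snippet, with n picked at random (this stores a vector in generate_dna, not yet a function):
n <- sample(c(4:30), size=1)
generate_dna <- sample(bases, size = n, replace = TRUE)
generate_dna
 [1] "G" "G" "T" "A" "T" "T" "T" "T" "A" "T" "G" "T" "A" "G" "T" "A" "G" "G" "C"
[20] "A" "G" "C" "T" "G"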
generate_dna <- function(n=6) {
  bases <- c("A", "T", "G", "C")
  sample(bases, size=n, replace=TRUE)
}
generate_dna(100)
 [1] "A" "A" "G" "G" "T" "G" "T" "T" "C" "T" "T" "A" "G" "C" "C" "T" "A" "T"
[19] "C" "G" "A" "T" "A" "C" "A" "G" "T" "C" "T" "C" "C" "G" "A" "G" "G" "G"
[37] "T" "A" "A" "A" "G" "G" "G" "A" "A" "G" "G" "A" "G" "T" "T" "G" "C" "G"
[55] "A" "A" "A" "A" "A" "C" "A" "A" "A" "T" "C" "T" "C" "C" "G" "T" "C" "G"
[73] "A" "G" "T" "A" "T" "A" "G" "C" "A" "A" "A" "G" "A" "T" "C" "T" "C" "A"
[91] "A" "G" "A" "G" "G" "A" "C" "T" "G" "T"
Q. Modify your function to return a FASTA-like sequence, so rather than [10] "G" "C" "A" "A" "T" we want "GCAAT"
generate_dna <- function(n=6) {
  bases <- c("A", "T", "G", "C")
  sequence <- sample(bases, size=n, replace=TRUE)
  sequence <- paste(sequence, collapse="")
  return(sequence)
}
generate_dna(10)
[1] "GGAATGAAGT"
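A quick way to check that the collapsed output is what we want is to look at its length, its number of characters, and the letters it contains. This is only a sketch; the seed value is arbitrary and is there just so the run is repeatable.

```r
# Same idea as the function in the text: sample bases, then collapse to one string
generate_dna <- function(n = 6) {
  bases <- c("A", "T", "G", "C")
  sequence <- sample(bases, size = n, replace = TRUE)
  paste(sequence, collapse = "")
}

set.seed(1)  # arbitrary seed, only for repeatability
s <- generate_dna(10)

length(s)               # one element: a single string, not a vector of letters
nchar(s)                # number of characters in that string
grepl("^[ACGT]+$", s)   # TRUE when the string contains only A, C, G and T

# strsplit() goes back to the one-letter-per-element form
letters_again <- strsplit(s, "")[[1]]
length(letters_again)
```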
Q. Give the user the option to return either a FASTA-format sequence or the standard multi-element vector
generate_dna <- function(n=6, fasta=TRUE) {
  bases <- c("A", "T", "G", "C")
  sequence <- sample(bases, size=n, replace=TRUE)
  if(fasta) {
    sequence <- paste(sequence, collapse="")
    cat("Hello...")
  } else {
    cat("is it me you're looking for...")
  }
  return(sequence)
}
generate_dna(10)
Hello...
[1] "TCCATTCATC"
generate_dna(10, fasta = FALSE)
is it me you're looking for...
 [1] "T" "A" "T" "A" "A" "A" "A" "A" "C" "C"
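A TRUE/FALSE switch is one way to offer the two output styles. Another possible design, sketched here under a made-up name (generate_dna_fmt() and its format argument are not part of the class code), is a named option checked with base R's match.arg(), which also rejects misspelled choices.

```r
# Hypothetical variant: choose the output style by name instead of TRUE/FALSE
generate_dna_fmt <- function(n = 6, format = c("fasta", "vector")) {
  # match.arg() picks the first choice by default and errors on unknown values
  format <- match.arg(format)
  bases <- c("A", "T", "G", "C")
  sequence <- sample(bases, size = n, replace = TRUE)
  if (format == "fasta") {
    sequence <- paste(sequence, collapse = "")
  }
  sequence
}

as_string <- generate_dna_fmt(10)                     # single collapsed string
as_vector <- generate_dna_fmt(10, format = "vector")  # one letter per element

# An unknown format is caught early instead of silently falling through
bad <- tryCatch(generate_dna_fmt(10, format = "genbank"), error = function(e) "rejected")
print(bad)
```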
A new cool function
Q. Write a function called generate_protein() that generates a protein sequence of user-specified length in FASTA format
generate_protein <- function(n) {
  aa <- sample(c("A","R","N","D","C","Q","E","G","H",
                 "I","L","K","M","F","P","S","T","W","Y","V"), size=n, replace=TRUE)
  protein <- paste(aa, collapse="")
  return(protein)
}
generate_protein(10)
[1] "FIAMGCLFYV"
Q. Use your new generate_protein() function to generate sequences between 6 and 12 amino acids in length and check whether any of these are unique, i.e. not found in nature (no match in the NR database at NCBI)
generate_protein(6)
[1] "HEQGMM"
generate_protein(7)
[1] "SITVSNT"
generate_protein(8)
[1] "YCEVTDGP"
generate_protein(9)
[1] "LCNQFYICR"
generate_protein(10)
[1] "CWRLAFNYHC"
generate_protein(11)
[1] "QYIVKKTGKVK"
generate_protein(12)
[1] "VIVDKYNASMMS"
Or we could do a for() loop:
for(i in 6:12) {
  cat(">", i, "\n", sep="")
  cat(generate_protein(i), "\n")
}
>6
ARDDHM
>7
WQYMSYN
>8
EWFNKAVL
>9
SDSNMGRAW
>10
TDCQFDCMLQ
>11
FSNFRTQIRKF
>12
MFHEWVWGKYYA
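The same output can also be built without an explicit loop. This is a sketch of one alternative using vapply(), which calls generate_protein() once per length and checks that each result is a single string; the names lengths_wanted, seqs and fasta_lines are made up for this example.

```r
# Same generator as in the text, written compactly
generate_protein <- function(n) {
  aa <- c("A","R","N","D","C","Q","E","G","H","I",
          "L","K","M","F","P","S","T","W","Y","V")
  paste(sample(aa, size = n, replace = TRUE), collapse = "")
}

lengths_wanted <- 6:12

# One sequence per requested length; character(1) says each call returns one string
seqs <- vapply(lengths_wanted, generate_protein, character(1))

# Build FASTA-style records: a ">" header line followed by the sequence
fasta_lines <- paste0(">", lengths_wanted, "\n", seqs)
cat(fasta_lines, sep = "\n")
```

Keeping the sequences in a vector (seqs) rather than only printing them means they can be reused later, for example to check their lengths with nchar(seqs).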