- Defining functions: Tying related commands into bundles
- Interfaces: Controlling what the function can see and do
- Inputs
- Environment
- Outputs and side effects
Data structures tie related values into one object
Functions tie related commands into one object
In both cases: easier to understand, easier to work with, easier to build into larger things
cube <- function(x) x ^ 3
cube
## function(x) x ^ 3
cube(3)
## [1] 27
cube(1:10)
## [1] 1 8 27 64 125 216 343 512 729 1000
cube(matrix(1:8, 2, 4))
##      [,1] [,2] [,3] [,4]
## [1,]    1   27  125  343
## [2,]    8   64  216  512
matrix(cube(1:8), 2, 4)
##      [,1] [,2] [,3] [,4]
## [1,]    1   27  125  343
## [2,]    8   64  216  512
# cube(array(1:24, c(2, 3, 4))) # cube each element in an array
mode(cube)
## [1] "function"
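Because a function is an ordinary R object, it can be passed to other functions or stored in a list. A minimal sketch, reusing cube from above (the names applied, powers and same are illustrative only):

```r
cube <- function(x) x ^ 3

# A function can be handed to another function as an argument
applied <- sapply(1:4, cube)
applied

# Functions can be stored in a list and looked up by name
powers <- list(square = function(x) x ^ 2, cube = cube)
powers$cube(2)

# Cubing a matrix elementwise matches cubing first and reshaping afterwards
same <- identical(cube(matrix(1:8, 2, 4)), matrix(cube(1:8), 2, 4))
same
```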
\[ \psi(x) = \left\{ \begin{array}{cl} x^2 & \mathrm{if}~|x|\leq 1\\ 2|x|-1 &\mathrm{if}~ |x| > 1\end{array}\right. \]
if (x^2 < 1) { x^2 } else if (x >= 1) { 2*x-1 } else { -2*x-1 }
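The if/else chain above handles one number at a time, since if() expects a single TRUE or FALSE. As a sketch, it can be wrapped in a function and applied to each element of a vector with sapply() (the names psi.scalar and psi.values are made up here):

```r
# Scalar version of psi: x must be a single number
psi.scalar <- function(x) {
  if (x^2 < 1) {
    x^2
  } else if (x >= 1) {
    2*x - 1
  } else {
    -2*x - 1
  }
}

z <- c(-0.5, -5, 0.9, 9)
psi.values <- sapply(z, psi.scalar)  # apply to each element in turn
psi.values
```

Looping with sapply() works but is clumsy; a vectorized body built on ifelse() avoids the loop altogether.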
Call function() to create your own function. Document your function with comments
# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x)
# Outputs: vector with x^2 for small entries, 2|x|-1 for large ones
psi.1 <- function(x) {
  psi <- ifelse(x^2 > 1, 2*abs(x)-1, x^2)
  return(psi)
}
Our functions get used just like the built-in ones:
z <- c(-0.5,-5,0.9,9)
psi.1(z)
## [1] 0.25 9.00 0.81 17.00
psi.1
## function(x) {
##   psi <- ifelse(x^2 > 1, 2*abs(x)-1, x^2)
##   return(psi)
## }
x <- seq(from = -5, to = 5, by = 0.01)
plot(x, psi.1(x), type="l", col="red")
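The structure of a function has three basic parts:
- Inputs (or arguments): what goes inside the parentheses of function()
- Body: the code that is executed, inside the braces { }
- Output (or return value): what the function hands back, usually via return()
R doesn't let your function have multiple outputs, but you can return a list
The body of psi.1() calls ifelse(), abs(), and the operators ^ and >, and could also call other functions we've written
return() says what the output is; alternatively, the function returns its last evaluation
With no explicit return() statement, the default is to return the value of the last expression evaluated. So the following is equivalent to what we had before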
psi.1 <- function(x) {
  psi <- ifelse(x^2 > 1, 2*abs(x)-1, x^2)
  psi
}
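An explicit return() also ends the function immediately, which the last-evaluation rule alone cannot do. A small sketch (safe.log is a hypothetical name, not part of the examples above):

```r
safe.log <- function(x) {
  if (any(x <= 0)) {
    return(NA)   # leave early; nothing below is run
  }
  log(x)         # value of the last expression is returned
}

safe.log(c(1, exp(1)))
safe.log(c(1, -1))
```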
# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x), scale for crossover (c)
# Outputs: vector with x^2 for small entries, 2c|x|-c^2 for large ones
psi.2 <- function(x, c) {
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(psi)
}
identical(psi.1(z), psi.2(z,c=1))
## [1] TRUE
Our function can also specify default values for the inputs (if the user doesn't specify an input in the function call, then the default value is used)
# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x), scale for crossover (c)
# Outputs: vector with x^2 for small entries, 2c|x|-c^2 for large ones
psi.2 <- function(x, c = 1) {
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(psi)
}
identical(psi.2(z,c=1), psi.2(z))
## [1] TRUE
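To check which inputs have defaults without reading the source, the base R function formals() returns a function's argument list. A sketch using psi.2 as defined above:

```r
psi.2 <- function(x, c = 1) {
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(psi)
}

names(formals(psi.2))   # the argument names, in order
formals(psi.2)$c        # the default value for c
psi.2(3)                # c falls back to its default of 1
psi.2(3, c = 2)         # the default is overridden
```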
While named inputs can go in any order, unnamed inputs must go in the proper order (as they are specified in the function's definition). Getting the order wrong can silently give a wrong answer, as in the second call below; sometimes the code will even throw an error:
psi.2(z, 1)
## [1] 0.25 9.00 0.81 17.00
psi.2(1, z)
## [1] -1.25 1.00 0.99 1.00
When calling a function with multiple arguments, use input names for safety, unless you're absolutely certain of the right order for (some) inputs
Named arguments can go in any order when explicitly tagged:
identical(psi.2(x=z,c=2), psi.2(c=2,x=z))
## [1] TRUE
Problem: Odd behavior when arguments aren't as we expect
psi.2(x=z,c=c(1,1,1,10))
## [1] 0.25 9.00 0.81 81.00
psi.2(x=z,c=-1)
## [1] 0.25 -11.00 0.81 -19.00
Solution: Put little sanity checks into the code
# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x), scale for crossover (c)
# Outputs: vector with x^2 for small entries, 2c|x|-c^2 for large ones
psi.3 <- function(x,c=1) {
  # Scale should be a single positive number
  stopifnot(length(c) == 1,c>0)
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(psi)
}
Arguments to stopifnot() are a series of expressions which should all be TRUE; execution halts, with error message, at first FALSE (try it!)
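To try the checks without stopping a script, the error can be caught with tryCatch(). A sketch, assuming psi.3 and z as defined above; msg.sign and msg.length are illustrative names that hold the error messages:

```r
psi.3 <- function(x, c = 1) {
  # Scale should be a single positive number
  stopifnot(length(c) == 1, c > 0)
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(psi)
}
z <- c(-0.5, -5, 0.9, 9)

# tryCatch() turns the error into a value, so the script keeps running
msg.sign <- tryCatch(psi.3(z, c = -1),
                     error = function(e) conditionMessage(e))
msg.length <- tryCatch(psi.3(z, c = c(1, 1, 1, 10)),
                       error = function(e) conditionMessage(e))
msg.sign      # names the check that failed for the negative scale
msg.length    # names the check that failed for the vector of scales

psi.3(z, c = 1)   # valid input passes through the checks unchanged
```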
When creating a function in R, though you cannot return more than one output, you can return a list. This (by definition) can contain an arbitrary number of arbitrary objects
# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x), scale for crossover (c)
# Outputs: vector with x^2 for small entries, 2c|x|-c^2 for large ones
psi.4 <- function(x,c=1) {
  # Scale should be a single positive number
  stopifnot(length(c) == 1,c>0)
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(list(psi = psi, x = x, c = c))
}
psi.4(z)
## $psi
## [1] 0.25 9.00 0.81 17.00
##
## $x
## [1] -0.5 -5.0 0.9 9.0
##
## $c
## [1] 1
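Once the outputs are bundled in a list, the caller pulls out the pieces by name. A short sketch, assuming psi.4 as defined above (out is an illustrative name):

```r
psi.4 <- function(x, c = 1) {
  stopifnot(length(c) == 1, c > 0)
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(list(psi = psi, x = x, c = c))
}

out <- psi.4(c(-0.5, -5, 0.9, 9))
names(out)    # which pieces were returned
out$psi       # extract by name with $
out[["c"]]    # extract by name with [[ ]]
```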
A side effect of a function is something that happens as a result of running the function's body, but is not returned. Examples: printing to the console (as in the function below), drawing a plot, saving a file
# "Robust" loss function, for outlier-resistant regression
# Inputs: vector of numbers (x), scale for crossover (c)
# Outputs: vector with x^2 for small entries, 2c|x|-c^2 for large ones
psi.5 <- function(x,c = 1) {
  # Scale should be a single positive number
  stopifnot(length(c) == 1,c>0)
  cat(paste0("x = ", x, ", c = ", c, "\n"))
  psi <- ifelse(x^2 > c^2, 2*c*abs(x)-c^2, x^2)
  return(list(psi = psi, x = x, c = c))
}
psi.5(z)
## x = -0.5, c = 1
## x = -5, c = 1
## x = 0.9, c = 1
## x = 9, c = 1
## $psi
## [1] 0.25 9.00 0.81 17.00
##
## $x
## [1] -0.5 -5.0 0.9 9.0
##
## $c
## [1] 1
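Printed text and the return value travel by different routes, which capture.output() (from the utils package) makes visible. A sketch with a made-up function noisy.square; printed and result are illustrative names:

```r
noisy.square <- function(x) {
  cat("squaring", length(x), "values\n")  # side effect: prints to the console
  x^2                                     # return value
}

# capture.output() collects what was printed; the assignment keeps the value
printed <- capture.output(result <- noisy.square(1:3))
printed   # the text that cat() wrote
result    # the value the function returned
```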
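x <- 7
y <- c("A","C","G","T","U")
# Inside adder(), the argument y hides the global y, and the assignment
# to x creates a local x without touching the global x
adder <- function(y) { x <- x+y; return(x) }
adder(1)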
## [1] 8
x
## [1] 7
y
## [1] "A" "C" "G" "T" "U"
circle.area <- function(r) { return(pi*r^2) }
circle.area(c(1,2,3))
## [1] 3.141593 12.566371 28.274334
truepi <- pi
pi <- 3
circle.area(c(1,2,3))
## [1] 3 12 27
pi <- truepi # Restore sanity
circle.area(c(1,2,3))
## [1] 3.141593 12.566371 28.274334
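One way to guard against a masked constant is to name the package it comes from. A sketch, assuming the circle.area() function above; circle.area.safe, area.masked and area.safe are made-up names, and base::pi refers to the constant shipped with base R:

```r
circle.area <- function(r) { return(pi*r^2) }
circle.area.safe <- function(r) { return(base::pi*r^2) }

pi <- 3                               # mask the built-in constant
area.masked <- circle.area(1)         # picks up the masked value
area.safe <- circle.area.safe(1)      # still uses the built-in constant
rm(pi)                                # remove the mask; built-in pi is visible again

area.masked
area.safe
```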
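Names that are not defined inside a function are looked up outside it, which includes built-in constants such as pi, letters, month.name, etc.
Not all side effects are desirable. One particularly bad side effect is if the function's body changes the value of some variable outside of the function's environment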
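A sketch of that bad side effect, using the <<- operator, which assigns to a variable in an enclosing environment. The names counter, bump.bad and bump.good are invented for illustration:

```r
counter <- 0

# Bad: reaches outside its own environment with <<-
bump.bad <- function() {
  counter <<- counter + 1
  invisible(NULL)
}

# Better: take the value as an input and return the new value
bump.good <- function(count) {
  return(count + 1)
}

bump.bad()
after.bad <- counter             # changed behind the caller's back
counter <- bump.good(counter)    # the change is visible at the call site
after.good <- counter
c(after.bad, after.good)
```

With bump.good(), everything the function does to counter is visible in the line that calls it, which keeps the interface under control.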
You can write top-level code, right away, for your function's design:
# Not actual code
big.job = function(lots.of.arguments) {
  first.result = first.step(some.of.the.args)
  second.result = second.step(first.result, more.of.the.args)
  final.result = third.step(second.result, rest.of.the.args)
  return(final.result)
}
After you write down your design, go ahead and write the sub-functions (here first.step(), second.step(), third.step()). The process may be iterative, in that you may write these sub-functions, then go back and change the design a bit, etc.
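As a concrete sketch of this strategy, here is a small top-level function whose sub-functions were filled in afterwards. All names (summarize.sample, drop.missing, center, spread, trim.at) are invented for illustration:

```r
# Top-level design, written first
summarize.sample <- function(x, trim.at = 3) {
  cleaned <- drop.missing(x)
  centered <- center(cleaned)
  result <- spread(centered, trim.at)
  return(result)
}

# Sub-functions, written afterwards
drop.missing <- function(x) {
  return(x[!is.na(x)])
}

center <- function(x) {
  return(x - mean(x))
}

spread <- function(x, trim.at) {
  # Standard deviation after clipping values beyond +/- trim.at
  clipped <- pmin(pmax(x, -trim.at), trim.at)
  return(sd(clipped))
}

summarize.sample(c(1, 2, NA, 3))
```

The top-level function can be written before the helpers exist because R only looks up drop.missing(), center() and spread() when summarize.sample() is actually called.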
With practice, this design strategy should become natural