Understanding apply(), lapply(), sapply(), tapply() Functions in R with Examples

R is one of the most widely used programming languages for statistical computing and for developing statistical software. It is licensed under the GNU General Public License. R is normally driven through a command-line interface, and it also integrates with graphical front ends such as RStudio and Jupyter Notebook. In this article, you will learn some important R functions: apply(), lapply(), sapply(), tapply(), and mapply().

What is the R Programming Language?

R came into existence in 1993. It was designed by Ross Ihaka and Robert Gentleman as a modern implementation of the S programming language, which was also developed for statistical computing. The name R comes from the first letter of Ross's and Robert's names.

Because R is designed specifically for statistics and graphics, it incorporates a wide range of statistical and graphical methods, including classification, clustering, linear and non-linear modeling, and many other techniques. A significant perk of R is that it offers stronger object-oriented facilities than most other statistical programming languages.

Let us see how code is executed in R. The R command prompt runs a single line or a block of code. Suppose a user types 2+2 at the prompt. R displays the result 4, as shown below:

>2+2
[1] 4

In R, even a single number is a vector of length one. Therefore, 2+2 adds two vectors, each holding the single value 2. The output shows [1] before the actual result, 4. The [1] is the index of the first element printed on that line; here the result is a one-element vector, so its only element has index 1.
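The meaning of the bracketed prefix is easier to see with a longer vector. The following is a minimal sketch; the variable names x and y are illustrative only.

```r
# A single number is a vector of length one.
x <- 2 + 2
length(x)      # 1

# For a longer vector, the bracketed prefix on each printed line is the
# index of the first element shown on that line.
y <- 101:130
print(y)
y[1]           # 101
```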

Data Structures in R

R supports several data structures: arrays, matrices, vectors, lists, and data frames.

  1. Array:

An array in R holds data of a single type in one or more dimensions; it is typically used when there are more than two. For example, an array with dimensions (3, 5, 4) consists of 4 matrices, each with 3 rows and 5 columns. You create an array with the array() function.

The array() function produces an array, which is a vector with a dimension attribute. The syntax of the array() function is given below:

array(data, dim = c(nrow, ncol, nmat), dimnames = names)

Here, nrow is the number of rows, ncol is the number of columns, and nmat is the number of matrices.

Let us look at an example of creating an array in R. The code below creates an array made up of two 3×3 matrices. The nine supplied values are recycled to fill both matrices.

vector1 <- c(3, 6, 7)
vector2 <- c(1, 4, 5, 2, 9, 8)
res <- array(c(vector1, vector2), dim = c(3, 3, 2))
print(res)

Output:

, , 1

     [,1] [,2] [,3]
[1,]    3    1    2
[2,]    6    4    9
[3,]    7    5    8

, , 2

     [,1] [,2] [,3]
[1,]    3    1    2
[2,]    6    4    9
[3,]    7    5    8
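
Individual elements and whole matrices can be pulled out of an array by index. The sketch below rebuilds the same array and shows a few common lookups.

```r
vector1 <- c(3, 6, 7)
vector2 <- c(1, 4, 5, 2, 9, 8)
res <- array(c(vector1, vector2), dim = c(3, 3, 2))

dim(res)        # 3 3 2
res[2, 3, 1]    # row 2, column 3 of the first matrix: 9
res[, , 2]      # the whole second matrix
```
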

  2. Matrices:

In R, a matrix arranges data in a two-dimensional rectangular layout. All elements of a matrix have the same data type. Just as array() creates an array, the matrix() function creates a matrix.

Matrices are most commonly used for mathematical calculations. They can also hold character or logical values, although such matrices are less common. The syntax of the matrix() function is as follows:

matrix(data, nrow, ncol, byrow, dimnames)

Here, data is a vector containing the matrix elements. The nrow argument gives the number of rows, and ncol gives the number of columns. The last argument, dimnames, supplies the names of the rows and columns. The byrow argument is a logical value: if byrow is TRUE, the matrix is filled row by row; otherwise (the default) it is filled column by column.

The example below shows how to create matrices.

A <- matrix(c(4:12), nrow = 3, byrow = TRUE)
print(A)
B <- matrix(c(4:12), nrow = 3, byrow = FALSE)
print(B)
rnames = c("row1", "row2", "row3")
cnames = c("col1", "col2", "col3")
Z <- matrix(c(4:12), nrow = 3, byrow = TRUE, dimnames = list(rnames, cnames))
print(Z)

Output:

     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    7    8    9
[3,]   10   11   12
     [,1] [,2] [,3]
[1,]    4    7   10
[2,]    5    8   11
[3,]    6    9   12
     col1 col2 col3
row1    4    5    6
row2    7    8    9
row3   10   11   12

  3. Vectors:

The vector is the most fundamental data structure in R. There are six atomic vector types: logical, integer, double, complex, character, and raw. A vector can hold a single element or multiple elements. Let us see examples of both.


Single-Element:

print("abc");
Output: [1] "abc"
print(5)
Output: [1] 5

Multi-Elements:

v <- 4:10
print(v)
Output: 
[1] 4 5 6 7 8 9 10

  4. List:

A list can hold values of different types together, such as integers, strings, and vectors. A list can also contain another list. One notable feature of a list is that it can hold a matrix or even a function. We create a list in R using the list() function.

The list() function takes the elements as its arguments, optionally with a name for each. The names of an existing list can be set with the following syntax:

names(x) <- value

Here, x is a list, and value is a character vector holding one name for each element of x. Let us see an example of a list in R.

lstdata <- list("Black", "Blue", c(11, 12, 13), FALSE, 44.55, 456.98)
print(lstdata)

Output:

[[1]]
[1] "Black"

[[2]]
[1] "Blue"

[[3]]
[1] 11 12 13

[[4]]
[1] FALSE

[[5]]
[1] 44.55

[[6]]
[1] 456.98
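
Once a list exists, its elements can be extracted and renamed. The following is a small sketch; the list person and its element names are hypothetical and chosen only for illustration.

```r
# Hypothetical named list for illustration.
person <- list(name = "Sam", scores = c(11, 12, 13), active = FALSE)

person[["name"]]        # extract one element: "Sam"
person$scores[2]        # second score: 12
length(person)          # 3

# Single brackets return a sub-list, not the element itself.
class(person["name"])   # "list"

# names(x) <- value renames the elements.
names(person) <- c("first_name", "marks", "is_active")
names(person)
```
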

  5. Data Frames:

The data frame is another data structure in R. It stores data in tabular form. In other words, it resembles a two-dimensional matrix in which each column can have a different type. The data.frame() function is used to create a data frame.

Let us look at an example to understand how a data frame is created in R.

First_Name = c("Sam", "John", "Steve")
Programming_language = c("R", "C", "C++")
Age = c(18, 23, 21)
dfs = data.frame(First_Name, Programming_language, Age)
print(dfs)

Output:

  First_Name Programming_language Age
1        Sam                    R  18
2       John                    C  23
3      Steve                  C++  21
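
Columns and rows of a data frame can be inspected and filtered directly. The sketch below rebuilds the same data frame and shows a few typical operations.

```r
First_Name <- c("Sam", "John", "Steve")
Programming_language <- c("R", "C", "C++")
Age <- c(18, 23, 21)
dfs <- data.frame(First_Name, Programming_language, Age)

str(dfs)                # structure: one line per column with its type
dfs$Age                 # one column as a vector
dfs[dfs$Age > 20, ]     # rows where Age exceeds 20
mean(dfs$Age)           # average age
```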

Why use R apply() Family Function?

In programming, we usually use a for loop for iteration. However, loops can have unwanted side effects. The loop variable is an ordinary object, and it remains in the workspace after the loop finishes. Some people may want such objects, while others treat them as clutter. Let us see a side effect of the for loop with an example.

> song <- 'Shining in the shade...'
> for (song in 1:5) print('...Come on heal me!')

What is the value of song after running this code? You might expect the string 'Shining in the shade...'. In fact, the value is 5, because the loop reused song as its loop variable and its last iteration assigned 5 to it.

Output: 

> song
[1] 5

On each iteration, the variable song takes the next value from the vector 1:5, overwriting the original string.

R offers another robust way of looping that avoids these issues: the apply family. The apply family is a collection of functions that do not leave loop variables behind the way a for loop does. Base R includes several functions in this family, among them apply(), lapply(), sapply(), vapply(), tapply(), mapply(), and rapply().
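For comparison, here is a minimal sketch of the same repetition written with sapply(). The parameter name i and the result name lines are illustrative choices.

```r
song <- 'Shining in the shade...'

# The anonymous function's parameter i is local to the function,
# so nothing in the workspace is overwritten.
lines <- sapply(1:5, function(i) '...Come on heal me!')

song            # still 'Shining in the shade...'
length(lines)   # 5
```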

Features of apply() Family Function

Before diving into the details, here are some essential features of the apply() family of functions.

  1. Every function of the apply() family takes at least two arguments. The first argument is usually the object to operate on, and the function to apply follows it. R treats functions as ordinary values, so a function can be passed as an argument to any of the apply() functions.
  2. One of the primary benefits of the apply() family functions is that they leave no side effects, such as leftover loop variables, in the workspace.
  3. apply() functions use the dots argument (...) to pass additional arguments on to the applied function.
  4. Every apply() function returns a value after executing the code. Use these functions when you need that returned value. For example, if you just want to show results on the command prompt using the print() function, an apply() function is not the right tool.
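
The dots argument from point 3 can be illustrated with a short sketch. The list readings is hypothetical data made up for this example.

```r
# Hypothetical data with a missing value in one element.
readings <- list(a = c(1, 2, 3), b = c(4, NA, 6))

# Without extra arguments, mean() returns NA for element b.
sapply(readings, mean)

# na.rm = TRUE is not an argument of sapply(); it travels through
# the dots and is handed to mean() on every call.
sapply(readings, mean, na.rm = TRUE)
```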

Advantages of apply() Function Over Traditional Loop

Code written with apply() functions is often more concise than a traditional loop and, as the example below suggests, can also run faster. Many packages are available for R. One of them is gamclass, which includes the Fatality Analysis Reporting System (FARS) dataset. The FARS dataset has 17 variables and 15118 observations.

We shall compare the time required to run the same computation with a for loop and with the apply() function. Suppose we need the mean of the weights in the dataset. We will write the code both ways.

Using the for loop:

library("gamclass")
data(FARS)
mean_weight <- NULL
total_weight <- NULL
# Add the weights one element at a time.
for(j in 1:length(FARS$weight)){
  total_weight <- sum(total_weight, FARS$weight[j])
}
mean_weight <- total_weight / length(FARS$weight)
mean_weight

Using apply() function:

apply(FARS[3], 2, mean)

Using the apply() function requires fewer lines of code than the for loop. To see how long each approach takes to calculate the mean of the weights, we will use an R package called profvis.

The primary goal of the profvis package is to offer a graphical representation of the time and memory consumed by each line of code. Hence, profvis lets us compare the time needed to calculate the mean of the weights using the for loop and using the apply() function.

Using the for loop:

library(profvis)
profvis({
mean_weight <- NULL
total_weight <- NULL
for(j in 1:length(FARS$weight)){
  total_weight <- sum(total_weight, FARS$weight[j])
}
mean_weight <- total_weight / length(FARS$weight)
mean_weight
})

In the output of the above code, profvis displays the time spent on each line of the for loop.

for(j in 1:length(FARS$weight)){

This line of code takes 30 ms to execute.

total_weight <- sum(total_weight, FARS$weight[j])

This line of code takes 1600 ms to execute. Taken together, the two lines of the loop need well over 1600 ms.

Using the apply() function:

profvis({
apply(FARS[3], 2, mean)
})

In the above code, the line

apply(FARS[3], 2, mean)

takes only 20 ms to execute.


In this comparison, the apply() version ran far faster than the for loop, although exact timings depend on the machine and the data. The apply() function is also straightforward to use, as it requires fewer lines of code. Let us now look at the different apply() functions and how they are used.
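A similar comparison can be tried without installing gamclass or profvis. The sketch below is only an approximation of the experiment above: it assumes synthetic weights generated with runif() in place of the FARS data and uses base R's system.time() in place of profvis, so the timings it reports will differ from the figures quoted earlier.

```r
set.seed(1)                                      # reproducible synthetic data
w <- data.frame(weight = runif(1e5, 50, 100))    # hypothetical weights

# Element-by-element accumulation in a for loop.
loop_time <- system.time({
  total <- 0
  for (j in seq_along(w$weight)) total <- total + w$weight[j]
  loop_mean <- total / length(w$weight)
})

# The same mean via apply() over the single column.
apply_time <- system.time(apply_mean <- apply(w["weight"], 2, mean))

loop_time["elapsed"]
apply_time["elapsed"]
all.equal(loop_mean, unname(apply_mean))         # both approaches agree
```

Whatever the timings turn out to be on a given machine, the check on the last line confirms that the two approaches compute the same mean.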

apply() Function

We have studied five data structures in R. The apply() function works on two of them: the matrix (or, more generally, the array) and the data frame, which is converted to a matrix first. The output of apply() is a vector, an array, or a list, depending on what the applied function returns.

The apply() function is the most basic member of the family. It removes the need for explicit loop constructs. Let us see which arguments apply() takes. Below is the syntax:

apply(X, MARGIN, FUN)

Let us discuss each argument in detail. The first parameter, X, is a matrix or an array. Next, MARGIN takes the value 1 or 2 (or both), indicating the dimension over which the function is applied.

For MARGIN=1, the function is applied to rows, whereas for MARGIN=2 it is applied to columns. If you want to apply the function over both rows and columns, that is, to every element, define MARGIN = c(1,2). The last parameter, FUN, is the function to be applied.

R has several built-in functions, like sum, mean, median, max, and min, that can be used as FUN. You can also use user-defined functions. We shall now see an example that uses apply() to sum the values in each column of a matrix.

m1 <- matrix(1:10, nrow = 5, ncol = 6)
m1
b_m1 <- apply(m1, 2, sum)
b_m1

The first two lines create and display the matrix m1. Hence, the output will be:

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    6    1    6    1    6
[2,]    2    7    2    7    2    7
[3,]    3    8    3    8    3    8
[4,]    4    9    4    9    4    9
[5,]    5   10    5   10    5   10

We then summed each column, and the output is as follows:

Output: 

[1] 15 40 15 40 15 40
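
The other MARGIN values and a user-defined function work the same way. The following sketch reuses the matrix m1; the anonymous functions are illustrative.

```r
m1 <- matrix(1:10, nrow = 5, ncol = 6)

# MARGIN = 1: one result per row.
apply(m1, 1, sum)                                  # 21 27 33 39 45

# MARGIN = 2 with a user-defined function: spread of each column.
apply(m1, 2, function(col) max(col) - min(col))    # 4 4 4 4 4 4

# MARGIN = c(1, 2): the function is applied to every element.
apply(m1, c(1, 2), function(v) v * 2)
```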

lapply() Function

Another member of the apply() family is the lapply() function. It takes a list as input and produces a list as output. The output list has the same length as the input. The l in lapply() stands for list. The lapply() function also accepts a data frame or a vector as input and still produces a list. Below is the syntax of the lapply() function:

lapply(X, FUN)

The lapply() function takes two arguments. Here, X is a list, vector, or data frame, and FUN is the function to be applied to each element of X.

Difference between apply() and lapply() Functions:

  1. The apply() function takes a matrix (or array) or a data frame as input. On the other hand, the lapply() function takes a list, a vector, or a data frame as input.
  2. The output of apply() is typically a vector, whereas the output of lapply() is always a list.
  3. The lapply() function does not include the MARGIN argument.

Let us take an example to understand how the lapply() function works. In this example, we convert the uppercase strings of a character vector to lowercase.

names <- c("JOHN", "STEVE", "STEPHEN", "OLIVER")
names_lowercase <- lapply(names, tolower)
str(names_lowercase)

Result:

## List of 4
##  $ : chr "john"
##  $ : chr "steve"
##  $ : chr "stephen"
##  $ : chr "oliver"

You can convert a list into a vector using the unlist() function. Let us see how the unlist() function works.

names_lowercase <- unlist(lapply(names, tolower))
str(names_lowercase)

Output:

## chr [1:4] "john" "steve" "stephen" "oliver"
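
lapply() is most useful when the results cannot be flattened this easily. The sketch below uses a hypothetical list named scores, whose elements differ in length, together with an anonymous function that returns two values per element.

```r
# Hypothetical list whose elements have different lengths.
scores <- list(math = c(90, 80), art = c(70, 75, 80), music = 60)

# The anonymous function returns a count and a mean for each element.
summary_list <- lapply(scores, function(s) c(n = length(s), mean = mean(s)))

summary_list$art         # n = 3, mean = 75
length(summary_list)     # 3, same as the input
```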

sapply() Function

Another function from the apply() family is the sapply() function. It takes a vector, list, or data frame as input and, where possible, simplifies the output to a vector or matrix. When the results cannot be simplified, sapply() returns a list of the same length as the input. The lapply() and sapply() functions are otherwise the same; the only difference is that sapply() tries to simplify its result.

Below is the sapply() function’s syntax:

sapply(X, FUN)

The sapply() function takes two input parameters, X and FUN. The parameter X is a vector, list, or data frame, and FUN is the function to be applied to each element of X.

Difference between the apply() and sapply() Functions:

  1. The apply() function takes a matrix or a data frame as input, whereas the sapply() function takes a data frame, vector, or list as input. The lapply() function accepts the same inputs as the sapply() function.
  2. The apply() function typically presents its result as a vector. The sapply() function produces a vector or matrix when it can and a list otherwise. The lapply() function always produces a list.
  3. Like the lapply() function, the sapply() function does not have MARGIN in its arguments.

The following code shows an example of the sapply() function. It uses the built-in cars dataset, whose columns are speed and dist (stopping distance), and calculates the minimum of each.

d <- cars
lmn_car <- lapply(d, min)
smn_car <- sapply(d, min)
lmn_car

Output: The output below is generated by the lapply() function and displays the minimum speed and stopping distance.

## $speed
## [1] 4
## $dist
## [1] 2

smn_car

Output: The result below is produced by the sapply() function and shows the same minimum speed and stopping distance.

## speed  dist
##     4     2

Let us see a different example, which displays the maximum speed and stopping distance of the cars.

lmxcar <- lapply(d, max)
smxcar <- sapply(d, max)
lmxcar

Output: Here, the result is obtained from the lapply() function, showing the maximum speed and distance.

## $speed
## [1] 25
## $dist
## [1] 120

smxcar

The output of the above line is obtained using the sapply() function.

Output:

## speed  dist
##    25   120

One of the significant advantages of lapply() and sapply() is that they accept user-defined functions. We will now see how a user-defined function can be used with lapply() and sapply().

In the current example, we define an avgr function, which computes the average of the minimum and maximum of a vector.

avgr <- function(x){
(min(x) + max(x))/2}
car <- sapply(d, avgr)
car

Output:

## speed  dist
##  14.5  61.0
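
The shape of the sapply() result depends on what the applied function returns. The sketch below uses R's built-in cars dataset to show the matrix case and a small hypothetical list to show the case where no simplification is possible.

```r
d <- cars

# range() returns two values per column, so sapply() builds a matrix
# with one column per input column.
rng <- sapply(d, range)
rng
dim(rng)          # 2 2

# When the results differ in length, sapply() cannot simplify them
# and returns a list instead.
uneven <- sapply(list(a = 1:2, b = 1:5), function(v) v[v > 1])
class(uneven)     # "list"
```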

Difference between apply(), sapply(), and lapply() Functions

The following table depicts overall differences between the above three functions, apply(), lapply(), and sapply().

Function | Arguments | Objective | Input | Output
apply() | X, MARGIN, and FUN | Applies a function to the rows or columns of a matrix | Matrix, array, or data frame | Vector, array, or list
lapply() | X and FUN (no MARGIN) | Applies a function to all elements of the input | List, vector, or data frame | Always a list
sapply() | X and FUN (no MARGIN) | Applies a function to all elements of the input | List, vector, or data frame | Vector or matrix (a list if the result cannot be simplified)

tapply() Function

The tapply() function computes a summary measure, such as the mean, median, max, min, or sum, for each group of a vector. The groups are defined by a factor variable. In other words, tapply() splits a vector into subsets and applies a function to each subset. Below is the syntax of the tapply() function:

tapply(X, INDEX, FUN=NULL)

Here, X is an object, typically a vector, INDEX is a factor or a list of factors, and FUN is the function applied to each group of X.

We shall understand the tapply() function through an example that uses the iris dataset. Data scientists and researchers often group data by specific characteristics, like ID, country, or city. The iris dataset is widely used in the machine learning domain.

The dataset covers three iris species: Setosa, Versicolor, and Virginica. It records the length and width of the sepals and petals of each flower, and it is often used to predict the species from these measurements. Suppose we need the median sepal width of every species. You can use the tapply() function to calculate it.

data(iris)
tapply(iris$Sepal.Width, iris$Species, median)

Output:

##     setosa versicolor  virginica
##        3.4        2.8        3.0
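
The grouping is easier to follow on a tiny dataset. The sketch below uses made-up sales figures and a hypothetical city factor; the groups appear in the alphabetical order of the factor levels.

```r
# Hypothetical sales figures grouped by city.
sales <- c(10, 20, 30, 40, 50, 60)
city  <- factor(c("Paris", "Rome", "Paris", "Rome", "Oslo", "Oslo"))

group_mean <- tapply(sales, city, mean)
group_mean                   # Oslo 55, Paris 20, Rome 30

tapply(sales, city, max)     # Oslo 60, Paris 30, Rome 40
```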

mapply() Function

The mapply() function in the apply() family is similar to the sapply() function. It also generates a vector as the output where possible. The m in mapply() stands for multivariate, because the function can be used with multiple vector or list arguments. In other words, mapply() iterates over multiple objects in parallel. FUN is applied to the first elements of each argument, then to the second elements, and so on.

The following example makes the mapply() function clearer.

mapply(function(a, b) {a^b}, a = c(3, 4), b = c(2, 3))

Output:

[1] 9 64

Here, the function is the first argument passed to mapply(). The function has two parameters, a and b. The second argument is a = c(3, 4), and the third argument is b = c(2, 3). Hence, a and b each supply two values, and the function is called two times. The first call uses a=3 and b=2, giving 9. The second call uses a=4 and b=3, giving 64.
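Two further variations are sketched below. The helper power_plus and its offset parameter are hypothetical and exist only for this illustration; MoreArgs is the mapply() argument for values that stay fixed across all calls.

```r
# rep() is called as rep(1, 3), rep(2, 2), and rep(3, 1). The results
# differ in length, so mapply() returns a list.
reps <- mapply(rep, 1:3, 3:1)
reps

# power_plus is a hypothetical helper; offset is the same for every call.
power_plus <- function(a, b, offset) a^b + offset
shifted <- mapply(power_plus, a = c(3, 4), b = c(2, 3),
                  MoreArgs = list(offset = 1))
shifted    # 10 65
```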

Conclusion

R was developed specifically for statistical computing and graphics. RStudio is a popular interface designed for the R language. There are five primary data structures in R: vector, list, matrix, data frame, and array. We have looked at each of them with its syntax.

The apply() family includes several functions. This article covered apply(), lapply(), sapply(), tapply(), and mapply(), each explained with an example.

We also went through the differences between apply() and lapply(), and between apply() and sapply(). A table comparing apply(), lapply(), and sapply() summarizes these differences for easy reference.
