MoreParallelR::parallel.apply provides a convenient way to
parallelize the apply function over arrays. It first splits
the array along the dimensions specified by MARGIN into a
list of smaller arrays and then calls the mclapply function
to process those pieces in parallel.
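The split-then-mclapply idea can be sketched with base R and the parallel package alone. This is a minimal illustration, not the package's implementation. The name split_apply_sketch is hypothetical, and the sketch assumes FUN returns a single value per slice. Like base apply, it passes FUN an array when two or more non-MARGIN dimensions remain and a plain vector when only one remains.

```r
library(parallel)

# Hypothetical helper, NOT MoreParallelR's code: split X along MARGIN,
# run FUN on each piece with mclapply, then reshape the results.
# Assumes FUN returns exactly one value per piece.
split_apply_sketch <- function(X, MARGIN, FUN, ..., cores = 1) {
  d <- dim(X)
  rest <- setdiff(seq_along(d), MARGIN)

  # Move the MARGIN dimensions last so each piece is one contiguous block
  Xp <- aperm(X, c(rest, MARGIN))
  slice.len <- prod(d[rest])

  # One column per combination of MARGIN indices
  flat <- matrix(Xp, nrow = slice.len)
  pieces <- lapply(seq_len(ncol(flat)), function(i) {
    if (length(rest) > 1) array(flat[, i], dim = d[rest]) else flat[, i]
  })

  # mclapply forks worker processes (mc.cores > 1 is not supported on Windows)
  res <- mclapply(pieces, FUN, ..., mc.cores = cores)

  # Lay the scalar results out with the MARGIN dimensions
  array(unlist(res), dim = d[MARGIN])
}
```

For example, `split_apply_sketch(X, c(2, 4), sum, cores = 2)` should agree with `apply(X, c(2, 4), sum)` for a 4-D array X. That is because the slices are taken in the same order apply uses internally.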
parallel.apply(X, MARGIN, FUN, ..., verbose = F, cores = 1, progress.bar = F)
| Argument | Description |
|---|---|
| X | An array, including a matrix. |
| MARGIN | A vector giving the subscripts over which the function will be applied. |
| FUN | The function to be applied. |
| ... | Optional arguments to FUN. |
| verbose | Whether to print progress information. |
| cores | The number of cores for parallelization. |
| progress.bar | Whether to show a progress bar. This requires the package |
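The `...` argument forwards extra arguments to FUN, the same way base apply does. The sketch below uses base apply to show the pattern. By analogy, and assuming the usage line above, the parallel call would look like `parallel.apply(m, 1, quantile, probs = 0.9, names = FALSE, cores = 2)`.

```r
# A small matrix for illustration
m <- matrix(1:12, nrow = 3)

# probs and names are not arguments of apply; they are passed on to quantile()
q90 <- apply(m, 1, quantile, probs = 0.9, names = FALSE)
q90  # 9.1 10.1 11.1
```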
Returns an array whose dimensions are those specified by MARGIN.
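Parallelization pays off most when each call to FUN is
expensive. Splitting the array and dispatching the pieces to
worker processes has a fixed overhead, so the speed-up grows
with the amount of work FUN does per call. In other words,
this solution works best when FUN carries a heavy workload.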
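One way to check the overhead argument on your own machine is to time a cheap and a costly function over the same chunks, serially and in parallel. The sketch below calls mclapply directly as a stand-in for parallel.apply. The workloads, chunk sizes, and helper names are all hypothetical, and actual timings depend on the machine and core count, so no expected numbers are given.

```r
library(parallel)

# Hypothetical workloads for illustration: one cheap, one deliberately costly
cheap <- function(x) sum(x)
costly <- function(x) {
  # Repeat the same transformation to lengthen each call
  for (k in 1:50) x <- cos(sin(x))
  var(x)
}

# Sixteen independent chunks standing in for the slices of an array
set.seed(1)
pieces <- lapply(1:16, function(i) runif(1e4))

# Elapsed seconds to run f over all chunks with the given number of cores
time_it <- function(f, cores) {
  system.time(mclapply(pieces, f, mc.cores = cores))[["elapsed"]]
}

# Serial versus parallel wall-clock time for one workload
compare_times <- function(f, cores = 2) {
  c(serial = time_it(f, 1), parallel = time_it(f, cores))
}
```

Comparing `compare_times(cheap)` with `compare_times(costly)` should typically show that only the costly workload gains noticeably from extra cores. For the cheap one, forking and collecting results can take as long as the work itself.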
This idea was originally inspired by my advisor, Prof. Guido Cervone, during a casual conversation.
This function differs from
plyr::laply
in that it returns an array whose dimensions are those
specified by MARGIN.
Check whether your FUN behaves differently for a vector, a
matrix, or an array. When applied over a matrix or an array,
lapply and plyr::laply coerce each high-dimensional piece to
a vector, whereas parallel.apply passes the data to FUN as
is. As a result, this function and apply may return
different results.
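As a concrete case of shape sensitivity, var() returns a scalar for a vector but a covariance matrix for a matrix. The sketch below uses base apply, which also hands FUN an array when two or more dimensions remain after removing MARGIN. It shows how a shape-sensitive FUN changes the result and how coercing inside FUN avoids that. The array sizes are arbitrary choices for illustration. This is also why the example below calls as.vector() before var().

```r
# A small 4-D array with arbitrary sizes
set.seed(1)
X <- array(runif(2 * 3 * 4 * 5), dim = c(2, 3, 4, 5))

# With MARGIN = c(2, 4), each slice handed to FUN is a 2 x 4 array
slice.ndims <- apply(X, c(2, 4), function(v) length(dim(v)))  # all 2

# var() on a matrix gives a 4 x 4 covariance matrix, not a scalar
m <- X[, 1, , 1]
dim(var(m))  # 4 4

# So applying var directly yields 16 values per slice instead of one
bad <- apply(X, c(2, 4), var)
dim(bad)  # 16 3 5

# Coercing inside FUN makes the result independent of the slice's shape
safe.fun <- function(v) var(as.vector(v))
res <- apply(X, c(2, 4), safe.fun)
dim(res)  # 3 5
```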
```r
# This example shows how to run parallel.apply on a synthetic
# array and how its performance compares to a serial run.
#
library(profvis)
profvis({
  library(MoreParallelR)
  library(magrittr)

  # Generate synthesized data
  dims <- c(80, 90, 100, 15)
  X <- dims %>% prod() %>% runif(min = 1, max = 10) %>% array(dim = dims)
  MARGIN <- c(2, 4)
  cores <- 4

  FUN <- function(v) {
    library(magrittr)
    # A costly function; as.vector() makes the result independent of v's shape
    ret <- v %>% as.vector() %>% sin() %>% cos() %>% var()
    return(ret)
  }

  # Run the parallel code
  X.new.par <- parallel.apply(X, MARGIN, FUN, cores = cores)

  # Run the serial code
  X.new.sq <- apply(X, MARGIN, FUN)

  # Compare results
  identical(X.new.par, X.new.sq)
})
```