MoreParallelR::index.apply makes parallel processing over multiple arrays convenient. Parallelization can become difficult when you need access to the index (position) of each slice during the computation. This function is designed specifically for that situation.
index.apply(X, MARGIN, FUN, ..., verbose = F, debug = F, cores = 1, progress.bar = F)
| Argument | Description |
|---|---|
| X | A list of arrays or matrices. |
| MARGIN | A vector giving the subscripts which the function will be applied over. Slices from all elements in the input list are taken along these dimensions, so the MARGIN dimensions must be identical across elements. |
| FUN | The function to be applied. Its first argument should take a member of the list that is constructed internally by this function. Use debug = T to inspect that list. |
| ... | Additional arguments to FUN. |
| verbose | Whether to print progress information. |
| debug | Set this to TRUE to return the internally constructed list. This is helpful when you are designing your function. |
| cores | The number of cores for parallelization. |
| progress.bar | Whether to show a progress bar. This requires an additional package to be installed. |
An array (or, with debug = T, the internally constructed list).
index.apply expects a list of arrays from the user. The MARGIN dimensions must be identical across all arrays because those are the dimensions being partitioned. Every member of the input list is partitioned, and the slices are collected into a new list with one element per combination of MARGIN indices. Within each element, the first member is the corresponding index for the MARGIN dimensions and the subsequent members are the matching slices of the input arrays.
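To make the partitioning concrete, here is a minimal base-R sketch of how such a list could be built and how per-element results fold back into an array. The helper `build_index_list()` is hypothetical and only illustrates the structure described above; it is not MoreParallelR's actual implementation.

```r
# Hypothetical helper: a base-R illustration of the partitioning step,
# not MoreParallelR's actual code.
build_index_list <- function(X, MARGIN) {
  # The MARGIN extents must be identical across all input arrays.
  margin_dims <- dim(X[[1]])[MARGIN]
  for (a in X) {
    stopifnot(identical(dim(a)[MARGIN], margin_dims))
  }
  # One row per combination of MARGIN positions; the first MARGIN
  # dimension varies fastest, matching R's column-major layout.
  grid <- as.matrix(expand.grid(lapply(margin_dims, seq_len)))
  lapply(seq_len(nrow(grid)), function(k) {
    idx <- unname(grid[k, ])
    slices <- lapply(X, function(a) {
      # Fix each MARGIN dimension at its position and keep every
      # element (TRUE) along the remaining dimensions.
      subs <- rep(list(TRUE), length(dim(a)))
      subs[MARGIN] <- as.list(idx)
      do.call(`[`, c(list(a), subs))
    })
    names(slices) <- paste0("array", seq_along(X))
    c(list(index = idx), slices)
  })
}

a1 <- array(1:1000, dim = c(10, 5, 20))
a2 <- array(1:2000, dim = c(10, 10, 20))
internal <- build_index_list(list(a1, a2), MARGIN = c(1, 3))
length(internal)       # 200 elements: 10 x 20 MARGIN positions
names(internal[[1]])   # "index" "array1" "array2"

# Applying a function to every element and refolding with the MARGIN
# extents yields a 10 x 20 array, because the element order is column-major.
res <- vapply(internal, function(l) mean(l[[2]]) + mean(l[[3]]) + 5.5, numeric(1))
d.sketch <- array(res, dim = c(10, 20))
```

Ordering the combinations with the first MARGIN dimension varying fastest is what lets a plain `array(res, dim = ...)` restore each result to its position without any extra bookkeeping.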
The main difficulty is usually writing FUN. It should be a function that takes one list containing the index and the partitioned arrays, plus any extra arguments. To examine the partitioned list before writing FUN, run index.apply with debug = T first.
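The point of carrying the index is that FUN can use the position of a slice, not just its values. The sketch below feeds a hand-built element to such a FUN. It assumes the "index" member is an integer vector of MARGIN positions (here row 2, layer 7); confirm the real layout with debug = T. The name `pos_weighted` and its weighting rule are made up for illustration.

```r
# A hand-built element shaped like one member of the internal list.
# ASSUMPTION: "index" holds the MARGIN positions as an integer vector.
l <- list(index = c(2L, 7L),
          array1 = as.numeric(1:5),
          array2 = as.numeric(1:10))

# Hypothetical FUN that needs the position, not just the slices:
# each slice mean is weighted by its MARGIN coordinate.
pos_weighted <- function(l, c) {
  i <- l[[1]][1]  # position along the first MARGIN dimension
  j <- l[[1]][2]  # position along the second MARGIN dimension
  i * mean(l[[2]]) + j * mean(l[[3]]) + c
}

pos_weighted(l, c = 0.5)  # 2 * 3 + 7 * 5.5 + 0.5 = 45
```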
Parallelization gives a larger speedup when each call to FUN takes longer to run. In other words, this solution works best when FUN carries a heavy workload; for very cheap functions, the overhead of distributing slices to the cores can outweigh the gain.
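The effect of workload size can be seen without index.apply at all. This sketch uses base R's `parallel::mclapply` directly on a deliberately slow function (`Sys.sleep()` stands in for real work); whether index.apply uses mclapply internally is not stated here. Forking with `mc.cores > 1` is not supported on Windows.

```r
library(parallel)

# A deliberately slow function: Sys.sleep() stands in for heavy work.
slow_fun <- function(x) {
  Sys.sleep(0.1)
  sqrt(x)
}

inputs <- as.list(1:8)

# Sequential baseline.
t_seq <- system.time(res_seq <- lapply(inputs, slow_fun))["elapsed"]

# The same work split across two forked worker processes.
t_par <- system.time(res_par <- mclapply(inputs, slow_fun, mc.cores = 2))["elapsed"]

# The results match; with this much work per call, the parallel run
# should finish in noticeably less elapsed time.
stopifnot(identical(res_seq, res_par))
print(c(sequential = t_seq, parallel = t_par))
```

Replacing `Sys.sleep(0.1)` with nothing makes each call nearly free, and the forking overhead then tends to dominate.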
```r
library(MoreParallelR)

# Imagine that you have 2 arrays with different dimensions.
a1 <- array(1:1000, dim = c(10, 5, 20))
a2 <- array(1:2000, dim = c(10, 10, 20))

# You have a constant which will be involved
# during the calculation.
c <- 5.5

# You have a function that you would like to iterate
# on the first and third dimensions of the arrays.
foo <- function(x, y, c) {
  return(mean(x) + mean(y) + c)
}

# To write the sequential version, we need to preallocate
# a new array and the required memory.
d <- array(NA, dim = c(10, 20))

# We need a nested for loop because we want to apply the
# function to two arrays while iterating over the first
# and the third dimensions.
for (i in 1:10) {
  for (j in 1:20) {
    d[i, j] <- foo(a1[i, , j], a2[i, , j], c)
  }
}

# To use the index.apply function, we need to put
# our arrays in a list.
X <- list(a1, a2)

# Define our margin.
MARGIN <- c(1, 3)

# Define the function. Its first argument should be an element
# of the internal list, which you can inspect by running
# the following code with `debug = T`.
internal.list <- index.apply(X, MARGIN, 4, NULL, debug = T)

# The returned value is a list of length 200, because the
# iteration is carried out on the first (length 10) and
# third (length 20) dimensions. Each element of this list
# will be fed into the function as the first argument,
# so look at an element in order to design your function.
length(internal.list)
#> [1] 200

# Each element is a list with the iteration index as the
# first member, followed by the arrays/matrices sliced at
# these indices.
length(internal.list[[1]])
#> [1] 3
names(internal.list[[1]])
#> [1] "index"  "array1" "array2"

# Therefore, we design our function accordingly.
FUN <- function(l, c) {
  return(mean(l[[2]]) + mean(l[[3]]) + c)
}

# Run the same calculation with index.apply.
d.new <- index.apply(X, MARGIN, FUN = FUN, c = c, cores = 2)

# Check that both approaches agree.
identical(d, d.new)
#> [1] TRUE
```
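Because the FUN in the example above is a sum of separate per-array reductions and never uses the index, base R's `apply()` can reproduce the result without index.apply, which makes a handy sanity check. index.apply earns its keep when FUN combines slices jointly or needs the slice position. This sketch recreates the inputs so it runs on its own, and uses `const` instead of `c` to avoid masking the base function name.

```r
# Same inputs as the example, recreated so this block runs alone.
a1 <- array(1:1000, dim = c(10, 5, 20))
a2 <- array(1:2000, dim = c(10, 10, 20))
const <- 5.5

# apply() reduces each array over its non-MARGIN dimension,
# returning a 10 x 20 matrix of slice means per array.
m1 <- apply(a1, c(1, 3), mean)
m2 <- apply(a2, c(1, 3), mean)
d.base <- m1 + m2 + const

# Reference result from the nested loop used in the example.
d <- array(NA, dim = c(10, 20))
for (i in 1:10) {
  for (j in 1:20) {
    d[i, j] <- mean(a1[i, , j]) + mean(a2[i, , j]) + const
  }
}

all.equal(d, d.base)
```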