MoreParallelR::index.apply makes parallel processing over multiple arrays convenient. Parallelization becomes awkward when the computation needs to know the index (position) of each slice being processed, and this function is designed specifically for that situation.

index.apply(X, MARGIN, FUN, ..., verbose = F, debug = F, cores = 1,
  progress.bar = F)

Arguments

X

A list of arrays or matrices.

MARGIN

A vector giving the subscripts which the function will be applied over. Slices along these dimensions are taken from every element of the input list X, so their extents must match across all arrays.

FUN

The function to be applied. Its first argument receives one member of the list that index.apply constructs internally. Use debug = T to examine this internally constructed list.

...

Additional arguments to FUN.

verbose

Whether to print progress information.

debug

Set this to TRUE to return the internally constructed list. This is helpful when you are designing your function.

cores

The number of cores for parallelization.

progress.bar

Whether to show a progress bar. This requires the package pbmcapply.

Value

An array.
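The Examples section suggests that, when FUN returns a single number, the results are arranged into an array whose dimensions are the extents of the MARGIN dimensions. The base-R sketch below illustrates that mapping under this assumption; it does not call index.apply, and the stored values are dummies that encode their own position. In this sketch, results for consecutive MARGIN combinations fill the array in column-major order, with the first MARGIN dimension varying fastest.

```r
# Sketch (assumption): FUN returns one number per MARGIN combination.
margin.dim <- c(10, 20)  # extents of the MARGIN dimensions, e.g. MARGIN = c(1, 3) in the example

# One result per combination, enumerated in column-major order.
results <- lapply(seq_len(prod(margin.dim)), function(k) {
  idx <- arrayInd(k, margin.dim)[1, ]  # position c(i, j) of the k-th combination
  idx[1] * 100 + idx[2]                # dummy value that encodes its own position
})

# Reassemble the flat list of results into an array indexed by MARGIN.
out <- array(unlist(results), dim = margin.dim)
out[3, 7]
#> [1] 307
```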

Details

index.apply expects a list of arrays whose MARGIN dimensions are identical, because these are the dimensions along which every array is partitioned. The slices are regrouped into a new list with one member per combination of MARGIN indices. Each member is itself a list: its first element is the index of that combination along the MARGIN dimensions, and the remaining elements are the corresponding slices of the input arrays, in the order they appear in X.
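To make the partitioning concrete, here is a base-R sketch that builds a list with the same shape as the one reported by debug = T in the Examples section (200 members, each with elements named index, array1 and array2). The helper build_slices is hypothetical and not part of MoreParallelR, and the form of the index element (a vector of MARGIN positions) is an assumption, so check the real list with debug = T. The sketch relies on base::asplit(), available since R 3.6.0.

```r
# Hypothetical helper: mimics the internally constructed list described above.
# Assumption: index is stored as a plain vector of MARGIN positions.
build_slices <- function(X, MARGIN) {
  margin.dim <- dim(X[[1]])[MARGIN]
  # The MARGIN extents must agree across all arrays.
  for (a in X) stopifnot(identical(dim(a)[MARGIN], margin.dim))
  # asplit() returns one slice per MARGIN combination,
  # with the first MARGIN dimension varying fastest.
  slices <- lapply(X, function(a) asplit(a, MARGIN))
  lapply(seq_len(prod(margin.dim)), function(k) {
    member <- c(list(index = arrayInd(k, margin.dim)[1, ]),
                lapply(slices, function(s) as.vector(s[[k]])))  # drop dim attribute
    names(member)[-1] <- paste0("array", seq_along(X))
    member
  })
}

a1 <- array(1:1000, dim = c(10, 5, 20))
a2 <- array(1:2000, dim = c(10, 10, 20))
l <- build_slices(list(a1, a2), MARGIN = c(1, 3))
length(l)
#> [1] 200
```

Each member can then be handed to FUN independently, which is what makes the slices easy to distribute across cores.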

The main difficulty is usually writing FUN. It must accept one member of the partitioned list as its first argument, followed by any extra arguments passed to index.apply through the ... argument. To inspect these members before writing FUN, run index.apply with debug = T.
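Because each member carries its index, FUN can use the position of the slice it receives, which is the situation this function targets. The sketch below is hypothetical: weighted.fun and the weight matrix w are illustrative names, and it assumes the index element holds the MARGIN positions in order (i, j). Extra arguments such as w reach FUN through .... It is exercised on a hand-built member so that it runs without index.apply.

```r
# Hypothetical FUN that uses the position of the slice.
# Assumption: l[[1]] holds the MARGIN positions (i, j); unlist/as.integer
# tolerate it being a vector, a list, or a one-row matrix.
weighted.fun <- function(l, w) {
  idx <- as.integer(unlist(l[[1]]))
  w[idx[1], idx[2]] * (mean(l[[2]]) + mean(l[[3]]))
}

# A hand-built member shaped like the one reported by debug = T.
member <- list(index = c(2, 3), array1 = 1:4, array2 = c(10, 20))
w <- matrix(1:6, nrow = 2)  # w[2, 3] is 6
weighted.fun(member, w)
#> [1] 105
```

With the arrays from the Examples section, this could be run as index.apply(X, MARGIN, FUN = weighted.fun, w = w, cores = 2), where w would need one row per position along the first dimension and one column per position along the third.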

Parallelization pays off most when each call to FUN is expensive. When FUN is cheap, the overhead of distributing the slices across cores can outweigh the time saved, so this function works best when FUN carries a heavy workload.
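The overhead argument can be checked with a small experiment. The sketch below uses parallel::mclapply as a stand-in for whatever backend index.apply uses (an assumption), with Sys.sleep simulating an expensive FUN. Timings depend on the machine, so no output is shown; on two cores the heavy case should take roughly half as long in parallel, while the cheap case gains nothing and may even be slower because of the cost of forking. Forking is not available on Windows, so the sketch falls back to one core there.

```r
library(parallel)

cheap.fun <- function(x) mean(x)
heavy.fun <- function(x) { Sys.sleep(0.05); mean(x) }  # simulated expensive work

slices <- lapply(1:20, function(i) as.numeric(i:(i + 99)))
# mclapply forks processes, which Windows does not support for mc.cores > 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Elapsed wall-clock time of applying f to every slice.
elapsed <- function(f, n.cores) {
  system.time(mclapply(slices, f, mc.cores = n.cores))[["elapsed"]]
}

timings <- c(
  cheap.serial   = elapsed(cheap.fun, 1L),
  cheap.parallel = elapsed(cheap.fun, cores),
  heavy.serial   = elapsed(heavy.fun, 1L),
  heavy.parallel = elapsed(heavy.fun, cores)
)
print(round(timings, 3))
```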

Examples

# Imagine that you have 2 arrays with different dimensions.
a1 <- array(1:1000, dim = c(10, 5, 20))
a2 <- array(1:2000, dim = c(10, 10, 20))

# You have a constant which will be involved during the calculation.
const <- 5.5

# You have a function that you would like to iterate on the first
# and third dimensions of the arrays.
foo <- function(x, y, const) {
  return(mean(x) + mean(y) + const)
}

# To write the sequential version, we need to preallocate
# a new array and the required memory.
d <- array(NA, dim = c(10, 20))

# A nested for loop is needed because the function is applied to two
# arrays while iterating over their first and third dimensions.
for (i in 1:10) {
  for (j in 1:20) {
    d[i, j] <- foo(a1[i, , j], a2[i, , j], const)
  }
}

# To use index.apply, put the arrays in a list.
X <- list(a1, a2)

# Define the margin.
MARGIN <- c(1, 3)

# The first argument of FUN is an element of the internally constructed
# list, which you can get by running index.apply with `debug = T`.
# FUN is not needed in this call, so a placeholder is passed.
internal.list <- index.apply(X, MARGIN, FUN = NULL, debug = T)

# The returned value is a list of length 200, because the iteration runs
# over the first (length 10) and third (length 20) dimensions. Each element
# of this list is fed into FUN as its first argument, so take a look at
# an element in order to design your function accordingly.
length(internal.list)
#> [1] 200

# Each element is a list with the iteration index as the first member,
# followed by the arrays/matrices sliced at that index.
length(internal.list[[1]])
#> [1] 3
names(internal.list[[1]])
#> [1] "index" "array1" "array2"

# Therefore, we design our function accordingly.
FUN <- function(l, const) {
  return(mean(l[[2]]) + mean(l[[3]]) + const)
}

# Run the same calculation with index.apply.
d.new <- index.apply(X, MARGIN, FUN = FUN, const = const, cores = 2)

# Check that both versions agree.
identical(d, d.new)
#> [1] TRUE