Open Source Your Knowledge, Become a Contributor

Technology knowledge has to be shared and made accessible for free. Join the movement.

Reductions

Reductions are among the most useful MPI operations you can use. A reduction applies a simple operation across the buffers of all the processes in a communicator and combines them into a single result. The operation can be either user-specified (we will not cover this here) or taken from the list of predefined operations. Usually, the predefined operations are largely sufficient for any application. Let's take, once again, a very simple (and very inefficient) example. Consider a system where you have N processes. The goal of the game is to compute the dot product of two N-vectors in parallel. Now the dot product of two vectors u and v, for those who forgot, is the following operation:

$$u \cdot v = u_1 v_1 + u_2 v_2 + \dots + u_N v_N$$

As you can imagine, this is highly parallelizable. If you have N processes, each process i can compute the intermediate value $u_i \times v_i$. Then, the program needs to find a way to sum all of these values. This is where the reduction comes into play. We can ask MPI to sum all those values and either store the result on only one process (for instance process 0) or redistribute it to every process. Here is how we would do it in C++:

Reduction example
    #include <cmath>
    #include <iostream>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // The initial values: u_i = i^2, v_i = log(i + 1)
        float u_i = static_cast<float>(rank * rank);
        float v_i = std::log(rank + 1.0f);

        // Computing the intermediate value
        float tmp = u_i * v_i;

        // Reducing on process 0: only rank 0 receives a valid value in result
        float result = 0.0f;
        MPI_Reduce(&tmp, &result, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            std::cout << "The reduced value is " << result << std::endl;

            // Checking the result against a serial computation
            float validation = 0.0f;
            for (int i = 0; i < size; ++i)
                validation += i * i * std::log(i + 1.0f);
            std::cout << "Validation gives the value : " << validation << std::endl;
        }

        MPI_Finalize();
        return 0;
    }

As with broadcasting, MPI provides optimised implementations of reductions that will be much better than any naive solution you could come up with using point-to-point communications.

Operations

In the example, you can see that we indicate we want to sum all the values by using the MPI_SUM flag. There are other predefined flags that allow you to do diverse operations on your data. You can find a precise list of these predefined flags if you look at the manual of MPI_Reduce. This list is reproduced below:

| Operator   | Operation              |
|------------|------------------------|
| MPI_MAX    | maximum value          |
| MPI_MIN    | minimum value          |
| MPI_SUM    | sum                    |
| MPI_PROD   | product                |
| MPI_LAND   | logical and            |
| MPI_BAND   | bit-wise and           |
| MPI_LOR    | logical or             |
| MPI_BOR    | bit-wise or            |
| MPI_LXOR   | logical xor            |
| MPI_BXOR   | bit-wise xor           |
| MPI_MAXLOC | max value and location |
| MPI_MINLOC | min value and location |

In what follows, we will only concentrate on the first four. The logical and bit-wise operators are straightforward to use. As for the last two, they are really useful but require datatypes we have not seen (and won't see), so we will not cover them. You can find more information and examples on MPI_MAXLOC and MPI_MINLOC in the manual of MPI_Reduce.

Operations on arrays

Now, everything should be pretty obvious when each process is dealing with a single value, as shown in the example. But what happens when processes have buffers? How does the sum behave, for instance? Will it give us a single scalar at the end, or will it sum the various buffers together?

The answer is: the operations are computed element-wise. So if you apply MPI_MAX, MPI_MIN or MPI_SUM on a buffer with five elements in a five-process program, you will get five values as a result:

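| Process | buf[0] | buf[1] | buf[2] | buf[3] | buf[4] |
|---------|--------|--------|--------|--------|--------|
| 0       | 4.0    | 12.0   | -1.0   | 7.2    | -23.0  |
| 1       | 0.0    | 0.0    | 0.0    | 0.0    | 0.0    |
| 2       | -1.0   | 2.0    | -3.0   | 4.0    | -5.0   |
| 3       | 7.3    | -5.0   | -12.0  | -3.2   | 1.23   |
| 4       | -1.0   | -1.0   | -1.0   | -1.0   | -1.0   |
| MPI_MIN | -1.0   | -5.0   | -12.0  | -3.2   | -23.0  |
| MPI_MAX | 7.3    | 12.0   | 0.0    | 7.2    | 1.23   |
| MPI_SUM | 9.3    | 8.0    | -17.0  | 7.0    | -27.77 |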

Reducing on all processes

There are multiple flavours of reduction. The example above shows us MPI_Reduce, in which the result of the reduction is stored on only one process (in this case process 0). In our case, the reception buffer (result) is only valid for process 0. The other processes will not have a valid value stored in result. Sometimes, you might want to have the result of the reduction stored on all processes, in which case MPI_Reduce is not suited. In such a case, you can use MPI_Allreduce to store the result on every process. So, if we had used MPI_Allreduce instead of MPI_Reduce in the example, all processes would have a valid value in result and could use this value after the communication.

These two are not the only ways of doing reductions. You can find more information by looking at the MPI standard and the MPI implementation APIs.

Buffer matters

You might wonder why, in the example, we bother using a different variable for storing the intermediate value, and the final result. After all, we could directly use :

MPI_Reduce(&tmp, &tmp, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

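Do NOT do this. The MPI standard does not allow the same buffer to be passed as both the send and the receive argument of a reduction; depending on the implementation, such a call may fail with an error or produce garbage in your result buffer. It is possible in some cases to reduce into a single buffer (by passing the special value MPI_IN_PLACE as the send buffer on the root process), but to avoid bugs and improve the semantics and readability of your code, it is strongly suggested you use different buffers for sending and receiving data.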

Let's try to apply this in two exercises now!
