Open Source Your Knowledge, Become a Contributor

Technology knowledge has to be shared and made accessible for free. Join the movement.

Create Content

Race conditions

Looking back at the previous exercise you could wonder what forces us to call MPI_Wait or MPI_Test. After all, if we are sure the process has been idling long enough, the receiving buffer should be filled with the data eventually. Not quite in fact. The problem lies between what the standard does in theory, and how it is implemented in practice. To convince you, here is a very simple example. Try to guess what the result should be in theory, then click the run button to see what happens (No need to modify anything from the source code, just watch the behaviour).

No waiting, no testing
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
#include <iostream>
#include <unistd.h>
#include <mpi.h>
int main(int argc, char **argv) {
MPI_Init(&argc, &argv);
MPI_Request request;
MPI_Status status;
int request_complete = 0;
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
constexpr int buffer_count = 10;
int buffer[buffer_count];
// Rank 0 sends, rank 1 receives
if (rank == 0) {
// Filling the buffer
std::cout << "Process 0 is sending : ";
for (int i=0; i < buffer_count; ++i) {
buffer[i] = i*i;
std::cout << buffer[i] << " ";
}
std::cout << std::endl;
// Sending the data and waiting for 5 seconds
MPI_Isend(buffer, buffer_count, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
sleep(5);
}
else {
// Resetting the buffer
for (int i=0; i < buffer_count; ++i)
buffer[i] = 0;
// Receiving and sleeping for 5 seconds
MPI_Irecv(buffer, buffer_count, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
sleep(5);
// Printing the buffer received by process 1
int ite = 0;
std::cout << "Process 1 received : ";
for (int i=0; i < buffer_count; ++i)
std::cout << buffer[i] << " ";
std::cout << std::endl;
}
MPI_Finalize();
}
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

As you can see, the result is not quite expected. Why does that happen ? Well it is mainly an implementation question, and the technicallity of it might be too complicated for this course, so let's just state things in a simple way : you HAVE to call MPI_Wait at some point, or MPI_Test until the request is processed. Such a limitation might be annoying, but that's because we have not introduced the biggest of all plagues in parallel computing : race conditions.

A race condition is when two process try to access some data at the same time. For instance (cf figure below), if I have two processes P0 and P1. P0 sends a non-blocking message to P1 then starts working again on the same buffer it has just sent. P1 only waits for the reception. Well, if the request is not processed before P1 has actually received the data, it is possible that P0 has already modified its buffer and the data received by P1 might be invalid. The name race condition is now really clear : Both processing are racing for the data. If P0 is faster then the data P1 will receive might not be what you wanted to send it. On the other hand, if P1 is faster, then the data received will be correct. However, since both process are running in parallel, you have absolutely no guarantee on the outcome. This is a very serious problem in every parallel program. In MPI we are spared most of the race conditions problems (compared to shared-memory parallelism where these problems are omnipresent), but non-blocking communications are the exception to this.

Race conditions

One additional thing that is pictured on the image is that it does not matter if your MPI_Irecv is actually called after MPI_Isend, the request will be processed internally so even if both of your processes are ready and willing to transmit the data, you never choose when the non-blockin communication is going to happen. Also note that what is depicted is really schematical to help you understand what is happening in the worst case.

It is imperative to understand the semantic behind non-blocking communications in MPI. A non-blocking communication is an agreement both of your process are making, to avoid reading or writing the communication buffers before the request is fulfilled. The only way to make sure the request has been completed is to use MPI_Wait or MPI_Test. Only after can your processes use the communication buffers. Hence, a program using non-blocking sends or receives will always have to call MPI_Wait eventually before reading or writing to the comm-buffer.

That also encourages your processes to separate the communication buffers from the data structures of your program.

Take-away point :

MPI_Isend and MPI_Irecv MUST be followed at some point by MPI_Test and MPI_Wait. The process sending should never write in the buffer until the request has been completed. On the other hand, the process receiving should never read in the buffer before the request has been completed. And the only way to know if a request is completed, is to call MPI_Wait and MPI_Test.

Open Source Your Knowledge: become a Contributor and help others learn. Create New Content