
Probing incoming communications

For very simple examples like the ones we have seen previously, with very few processes and small communications, everything looks pretty simple and straightforward. In real-life applications, on the other hand, the amount of data communicated can be really big, and in such cases optimising the size of the messages sent has a real influence on the performance of the system. As a general rule of thumb, there are two points you should always follow when it comes to MPI communications :

  • Try to group as much data as possible in one communication. Sending N communications of M bytes will almost always be more costly than one communication of N×M bytes.
  • Try to send the exact amount of data you are storing in your buffer and no more.
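
A rough way to see why the first rule holds is the usual latency-bandwidth cost model. This is only a sketch under an assumption: the symbols α (fixed cost per message) and β (cost per byte) are introduced here for illustration, and real networks are more complicated.

```latex
T_{N\ \text{messages}} = N(\alpha + \beta M) = N\alpha + \beta N M
\qquad \text{vs.} \qquad
T_{1\ \text{message}} = \alpha + \beta N M
```

Under this model both strategies move the same N×M bytes, but the split version pays the per-message cost α a further N−1 times.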

The first point will be detailed later and is mainly a matter of communication overhead and latency. The second point might seem obvious, but it sometimes happens that, before a transfer, the receiving process has no idea how much data is being sent to it. In such a case, there are three solutions :

  • Sending everything in two communications : first sending the quantity of data you are going to send (1 int), then another communication with the actual buffer of N elements. This tends to go against rule #1.
  • Sending a nominal amount of data : always sending the maximum amount in the buffer. That goes against rule #2.
  • Probing the message. Basically asking MPI to give you the size of the message.

Probing is actually very useful and serves many purposes, such as getting the count of elements we are about to receive, the rank of the sender and the tag of the message, or finding out whether we are receiving anything at all.

There are two functions used to probe : MPI_Probe and MPI_Iprobe. As you can guess by now, the first one is a blocking call, while the second one is not. MPI_Probe only gives you the MPI_Status associated with the next incoming message matching a given source and tag. As before, it is possible to use MPI_ANY_SOURCE and MPI_ANY_TAG if you want to probe for a message with any tag or from any source. The resulting MPI_Status object can then be used in conjunction with other functions to get more information. Let's see a quick example of how that works :

Probing example
#include <iostream>
#include <cstdlib>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // Process 0 is sending a random number (between 10 and 25) of integers to process 1
        int n_items = rand() % 16 + 10; // BAD way of doing random.
        std::cout << "Process 0, random count gives us " << n_items << " ints to send." << std::endl;

        // Allocation and initialisation of the buffer
        int *send_buf = new int[n_items];
        for (int i=0; i < n_items; ++i)
            send_buf[i] = i*i;

        std::cout << "Process 0, sending : ";
        for (int i=0; i < n_items; ++i)
            std::cout << send_buf[i] << " ";
        std::cout << std::endl;

        // Blocking send
        MPI_Send(send_buf, n_items, MPI_INT, 1, 0, MPI_COMM_WORLD);

        // Deallocation
        delete [] send_buf;
    }
    else if (rank == 1) {
        // Only process 1 probes, so extra processes do not wait forever
        MPI_Status status;
        MPI_Probe(0, 0, MPI_COMM_WORLD, &status);

        // From the probed status we get the number of elements to receive
        int n_items;
        MPI_Get_count(&status, MPI_INT, &n_items);
        std::cout << "Process 1, probing tells us message will have " << n_items << " ints." << std::endl;

        // Allocating and receiving
        int *recv_buf = new int[n_items];
        MPI_Recv(recv_buf, n_items, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        std::cout << "Process 1, buffer received : ";
        for (int i=0; i < n_items; ++i)
            std::cout << recv_buf[i] << " ";
        std::cout << std::endl;

        // Deallocation
        delete [] recv_buf;
    }

    MPI_Finalize();
    return 0;
}

In the example, you can see that we probe the arrival of a message on process 1. The probe is blocking, so process 1 waits for a message from process 0. As soon as the message arrives, execution resumes. But unlike a blocking receive, no data is actually transferred to the process : the probe only tells us that a matching message is available to be received. Then, we use MPI_Get_count on the status returned by the probe to retrieve the information we want : the number of elements in the message. This way, we can allocate only the right number of elements and receive exactly that amount.

Note that, in terms of performance, this example is really a toy problem. In a real situation it would not be wise to reallocate the communication buffer for each call. Although every code should be tailored to the situation it is handling, one good rule to avoid superfluous overhead is to allocate one or two communication buffers once (typically, either one universal buffer, or one for sending and one for receiving). This way, allocation is done only once, but each communication can still be probed, so only the data actually used needs to be sent.

Memory layout warning

If you choose this last solution (buffers allocated once and only partially filled), you have to be extremely careful about the memory layout of your buffers. Suppose, for instance, that your program handles and communicates 3-vectors (vectors with three components). If you allocate a buffer in C++, the layout you choose might be bug-prone. Consider the following two pieces of code :

Version 1

float vectors[100][3]; // A 100 vector table

int n_to_send = 10;
fill_in_vectors(vectors, n_to_send);

MPI_Send(vectors, n_to_send*3, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);

Version 2

float vectors[3][100]; // Same but inverted layout

int n_to_send = 10;
fill_in_vectors(vectors, n_to_send);

MPI_Send(vectors, n_to_send*3, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);

Take a few minutes and try to understand what will happen in both cases.

Which code is actually sending 10 vectors to process 1 ?

Funny trivia : If you are using Fortran, then the answer to this question is actually the opposite !
