I am learning MPI and just came across this problem. The answer there suggested the following code:
```cpp
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    int rank;
    int buf = -1; /* initialized so non-root ranks print a defined value before the broadcast */
    const int root = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == root) {
        buf = 777;
    }
    printf("[%d]: Before Bcast, buf is %d\n", rank, buf);
    /* everyone calls bcast, data is taken from root and ends up in everyone's buf */
    MPI_Bcast(&buf, 1, MPI_INT, root, MPI_COMM_WORLD);
    printf("[%d]: After Bcast, buf is %d\n", rank, buf);
    MPI_Finalize();
    return 0;
}
```
However, when I compile it (assuming it is saved as `main.cpp`) with `mpic++ -O3 -Wall -Wextra -std=c++11 -march=native -fopenmp -Wpedantic -DOMPI_SKIP_MPICXX main.cpp -o main` and launch it with `mpirun -np 3 main`, it hangs.
With `mpirun -np 2 main`, on the other hand, it runs fine and terminates. As far as I can tell, the program hangs whenever more than 2 processes are launched.
What might be causing this? I am using Open MPI 4.1.5 on an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz, which shows 12 processors in `/proc/cpuinfo`. Those are logical processors: the chip has 6 physical cores with 2 hardware threads each.