
Friday, December 16, 2016

OpenMPI working nameserver publish / lookup example

This blog post walks through a simple client / server example using OpenMPI and ompi-server: publish a named server, look it up from a client, then connect and exchange data between the server and client.

If you need a refresher on OpenMPI first, this is a good start.


The Gist is located here. 


I got this working on:

 uname -a
Linux hellion 3.19.0-77-generic #85-Ubuntu SMP Fri Dec 2 03:43:54 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

mpicc --version
gcc (Ubuntu 4.9.2-10ubuntu13) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This post assumes you have OpenMPI installed completely and working properly.


This builds on the pseudo-examples left here and mainly here: http://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node106.htm#Node108

You compile the client and server.

The client is here:

https://gist.github.com/DaemonDave/21fea476847d94326ec6c9664c15fb87#file-name-client-c

mpicc -o client name-client.c

The server is here:

https://gist.github.com/DaemonDave/21fea476847d94326ec6c9664c15fb87#file-name-server-c

mpicc -o server name-server.c
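Before running anything, it helps to see what the server side has to do. The gist has the real name-server.c; the following is only a minimal sketch of the publish / accept / receive pattern it uses, under the assumption that the service name is "name-test" (the actual name in the gist may differ):

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal publishing server sketch: open a port, publish it under a
 * service name with the ompi-server, accept one client, and print the
 * doubles it sends. The service name "name-test" and the "tag 1 means
 * done" convention are illustrative assumptions, not the gist's code. */
int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm client;
    double data;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port_name);                  /* get a port string */
    MPI_Publish_name("name-test", MPI_INFO_NULL, port_name);  /* register with ompi-server */
    printf("server available at %s\n", port_name);

    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    for (;;) {
        MPI_Recv(&data, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 client, &status);
        if (status.MPI_TAG == 1) break;                       /* client says it is done */
        printf(" we got a client's data: %f\n", data);
    }
    MPI_Comm_disconnect(&client);
    MPI_Unpublish_name("name-test", MPI_INFO_NULL, port_name);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}
```

These are all MPI-2 dynamic process calls (MPI_Open_port, MPI_Publish_name, MPI_Comm_accept), the same ones described in the MPI report linked above; the ompi-server is what makes MPI_Publish_name visible across separate mpirun jobs.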


You run ompi-server first, to establish the name server. But you launch it under mpirun rather than running it directly.

/*!
 * \brief How to execute the MPI name server:
 *
 *   mpirun -np 1 ompi-server --no-daemonize -r + &
 *
 * Success looks like this:
 *
 *   "server available at 3653042176.0;tcp://192.168.10.191:52434+3653042177.0;tcp://192.168.10.191:48880:300"
 */


Once the ompi-server is running, you can refer to it by file, but you can also refer to it by its PID if it is local. To get things running with a minimum of overlapping errors, I start with a simple local setup.

You find the PID of the ompi-server:

ps -ef | grep ompi

dave      1634  1458  0 Dec15 ?        00:03:41 compiz
dave     16050  1954  0 10:26 pts/18   00:00:00 mpirun -n 1 ompi-server --no-daemonize -r +
dave     16051 16050  0 10:26 pts/18   00:00:00 ompi-server --no-daemonize -r +

You run the server like this:
mpirun -np 1  --ompi-server pid:16050  ./server
Success looks like this:
server available at 3500802048.0;tcp://192.168.10.191:50129+3500802049.0;tcp://192.168.10.191:53693:300

Now you mpirun the client and it looks like this:
  mpirun -np 1 --ompi-server pid:14341 ./client
looking up  server ...
The server responds like this:

 we got a client's data: 25.500000
 we got a client's data: 26.500000
 we got a client's data: 27.500000
^Cmpirun: killing job...
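The client side of that exchange is the mirror image: look the name up, connect, send. Again, this is only a sketch of the pattern in name-client.c, assuming the same illustrative service name "name-test" and done-tag convention as the server sketch above:

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal lookup client sketch: resolve the published service name via
 * the ompi-server, connect, and send a few doubles. The service name
 * "name-test" must match what the server published; both it and the
 * starting value 25.5 are assumptions chosen to match the output above. */
int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm server;
    double data = 25.5;
    int i;

    MPI_Init(&argc, &argv);
    printf("looking up  server ...\n");
    MPI_Lookup_name("name-test", MPI_INFO_NULL, port_name);  /* ask the ompi-server */
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);

    for (i = 0; i < 3; i++) {
        MPI_Send(&data, 1, MPI_DOUBLE, 0, 0 /* data tag */, server);
        data += 1.0;   /* 25.5, 26.5, 27.5, as in the server output above */
    }
    MPI_Send(&data, 0, MPI_DOUBLE, 0, 1 /* done tag */, server);
    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}
```

Note that MPI_Lookup_name only succeeds because both jobs were launched with --ompi-server pointing at the same name server; without it, the two mpirun jobs have no way to find each other.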

You can also tell mpirun where the ompi-server is via a config file, instead of by PID, like this:

 mpirun -np 1 --ompi-server file:./nameserver.cfg ./server
server available at 2886402048.0;tcp://192.168.10.191:51344+2886402049.0;tcp://192.168.10.191:54857:300
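For the file-based variant, the file you pass to ompi-server's -r option is where it writes its contact URI, and mpirun's --ompi-server file: reads that URI back. So nameserver.cfg is just a one-line text file; assuming the run shown earlier, it would contain something like:

```
3653042176.0;tcp://192.168.10.191:52434+3653042177.0;tcp://192.168.10.191:48880:300
```

The exact addresses and job IDs will differ on every run; the point is only that the same string the name server prints is what the client and server jobs read from the file.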



2 comments:

  1. I am trying to run this example, and the client and server both crash. After the server crashes I see the following output: https://imgur.com/a/ij2lv

    I am not sure where ORTE_ERROR_LOG can be found. Or what the "Data unpack" error message means.

    Any thoughts on what is going on here?

    1. I launch ompi-server:
    /usr/bin/ompi-server --no-daemonize -r mpiuri

    2. I launch server code:
    /usr/bin/mpirun -np 1 --ompi-server file:mpiuri ./out_name_server

    3. I launch client code:
    /usr/bin/mpirun -np 1 --ompi-server file:mpiuri ./out_name_client

    All 3 commands are issued in separate terminal windows, in the order indicated above. After launching the client application, I see the server application crashes as well as the client application.

    Any ideas on what is going on? After the server crashes, there is a message that says "An error occurred in MPI_Comm_connect"

    Any help or thoughts are much appreciated! I am lost!
    Thanks,
    Matt Overlin

  2. Hi Matt;

    Well, to be clear, I am not an MPI expert per se. I am learning just like you; you're just a few lessons behind me.

    Looking at the data errors, the data overrun at dpm.c, in my experience that probably means you have a library mismatch in the software build or in a library it depends on. Without full compile output it's hard to guess. But guess I will.

    So it compiled OK and it started OK, so why would that be?
    It found all the right libraries and headers, so the compiler knew all the right symbols to link together correctly. Then it crashed with MPI_ERROR_UNKNOWN, which in general (as a coder) means this is an unforeseen case that the dev didn't spend a lot of time on.

    So how could that be?

    Either you have a bug in your code that is causing a segfault because it is sending the wrong data (most often a NULL pointer), or the symbols are linked but there is a data struct or function call difference beyond the expected version. Makes sense?

    So what else could go wrong?

    Well, when all the variable declarations are correct, take your code out and test it in isolation. That eliminates your code as the problem.

    Then try running the MPI processes without your code. That isolates MPI as the problem. If it doesn't work there, you have two options: the configuration is wrong or your libraries are wrong. Another source of segfaults is uninitialized data; mine works exactly as built, so if you added anything else, that's also suspect.

    Are you running on Xen or another virtual machine? Can that system allow what MPI expects? You haven't excluded other error sources.

    Since I don't know whether you cross-compiled, Canadian cross-compiled, are running it on an Arduino, or what your system is to start from, that's the best I can offer, I'm afraid.

    What I recommend is start by isolating one component and prove it works exclusive of the others. Then add one more. Repeat until happy.

    Lots of my time is spent debugging.
    Best of Luck!
