Synchronous Message Passing IPC

The main concept of resource managers is that they use a fixed set of well-defined messages. The messages are implemented on top of synchronous message-passing inter-process communications. Certain system functions, such as device drivers and filesystems, fit very well within the resource manager framework, and should definitely be left as resource managers (or implemented as bona-fide device drivers in other operating systems).

It's the large number of "devices" and servers that fall into the gray area that bear discussing. The example I'll use here is a caller ID (CLID) server, which needs to integrate with a reverse directory number (DN) telephone database lookup server, and several clients (some to generate CLID events, others to receive them, etc.).

Neither of these servers fits the concept of a "driver" in the traditional sense. But they do fit the client/server model perfectly.

Standard practice would dictate that the CLID server and the reverse DN server be implemented as resource managers, and that's exactly what I did. When I ported them to FreeBSD, though, I was faced with a decision. How would I port them? After asking some knowledgeable people on the net, I was told to use "UNIX domain sockets", or "RPC", or "TCP/IP sockets", etc.

In this article, I'm going to present the original client and server code (would you believe QNX 4 code? It was never ported to Neutrino, but the concepts are similar enough) and the FreeBSD versions using TCP/IP sockets. Why TCP/IP sockets? Because the Internet is everywhere. Why not make something that can talk to a heteroendian server on a machine halfway around the world, and have it "just work" — as if it were on your desktop?

The Caller ID Server

The heart of the QNX 4 caller ID server is the name registration and the main message-processing loop. Under Neutrino, the name registration is done radically differently (via resmgr_attach), but you'll get the idea.

Under QNX 4, you could register a "name" in a non-filesystem-visible location (which is what the CLID server does), or you could register the name right in the filesystem. The latter was a little bit more difficult. Under Neutrino, you can easily register your name in a filesystem-visible location — that's what resmgr_attach does.
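
To give you a taste, a minimal Neutrino registration via resmgr_attach would look something like this (error handling elided; /dev/clid is a pathname I made up for illustration):

#include <sys/stat.h>
#include <sys/iofunc.h>
#include <sys/dispatch.h>

static resmgr_connect_funcs_t  connect_funcs;
static resmgr_io_funcs_t       io_funcs;
static iofunc_attr_t           attr;

void
registerName (void)
{
  dispatch_t  *dpp = dispatch_create ();

  // start with the default POSIX-layer handlers
  iofunc_func_init (_RESMGR_CONNECT_NFUNCS, &connect_funcs,
                    _RESMGR_IO_NFUNCS, &io_funcs);
  iofunc_attr_init (&attr, S_IFNAM | 0666, NULL, NULL);
  // this is the registration itself -- from here on, clients
  // can find the server with a plain open() of /dev/clid
  resmgr_attach (dpp, NULL, "/dev/clid", _FTYPE_ANY, 0,
                 &connect_funcs, &io_funcs, &attr);
  // (the receive loop would then run via dispatch_block()
  // and dispatch_handler())
}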

Server Name Registration

QNX 4 name registration looks like this:

void
registerName ()
{
  if ((nameID = qnx_name_attach (0, optn)) == -1) {
    fprintf (stderr, "%s:  can't register \"%s\", errno %d\n",
             progname, optn, errno);
    perror (NULL);
    exit (1);
  }
}

The optn variable is the name to register; the default is /PARSE/CLID. This registers the name in the non-filesystem-visible location.

Under FreeBSD, though, we need to register in a different manner. The purpose of registration is so that unrelated clients can find the server.

I didn't like the RPC mechanism, because it relied on yet-another-server to handle name resolution (the whole portmapper concept). I figured for this level of "small utility" I could just get away with storing the name myself. So, I chose to add two entries in /etc/services on my machines (one is for the CLID server, the other is for the reverse DN server):

dbphone_server  50000/tcp       # Database lookup for phoneDB
clid_server     50001/tcp       # CLID server

If these ports conflict with ones already in use on your system, change them. Unless you want to go to the trouble of creating some kind of "port registration" service, or of dealing with the portmapper, hand-picking port numbers is the level you'll have to work at.

Therefore, the code for registerName under FreeBSD is as follows:

void
registerName (void)
{
  struct servent      *sp;
  struct sockaddr_in  saddr;
  // 1) find service name
  sp = getservbyname ("clid_server", "tcp");
  if (sp == NULL) {
    fprintf (stderr, "%s: unknown service\n", progname);
    exit (EXIT_FAILURE);
  }
  memset (&saddr, 0, sizeof (saddr));
  saddr.sin_family = AF_INET;
  saddr.sin_addr.s_addr = htonl (INADDR_ANY);
  saddr.sin_port = sp -> s_port;
  // 2) create a socket
  socket_fd = socket (AF_INET, SOCK_STREAM, 0);
  if (socket_fd == -1) {
    fprintf (stderr, "%s:  can't socket, errno %d (%s)\n",
             progname, errno, strerror (errno));
    exit (EXIT_FAILURE);
  }
  // 3) bind the socket to the port
  if (bind (socket_fd, (struct sockaddr *) &saddr,
      sizeof (saddr)) < 0) {
    fprintf (stderr, "%s:  can't bind, errno %d (%s)\n",
             progname, errno, strerror (errno));
    exit (EXIT_FAILURE);
  }
  // 4) tell socket manager we're ready to handle requests
  if (listen (socket_fd, 5) == -1) {
    fprintf (stderr, "%s:  can't listen, errno %d (%s)\n",
             progname, errno, strerror (errno));
    exit (EXIT_FAILURE);
  }
}

As you can see, the steps, while "foreign" to the concept of QNX 4 and Neutrino's name registration, are fairly straightforward:

  1. The getservbyname function converts a service name (in this case clid_server) and a type of service (in this case TCP) to a port number (amongst other things).
  2. Next, we create a socket. A socket is basically a rendezvous point — a place to listen for messages. The socket we've created doesn't yet have an address or a port associated with it.
  3. Then we bind the socket to the address and the port. Note that just above step 2 we specified an address of INADDR_ANY, which means we'll accept connections arriving on any of the machine's network interfaces, and we specified the port number that came back from the getservbyname function.
  4. Finally, we call listen to tell the socket manager that we're ready to handle requests on the socket given by socket_fd, with a backlog of up to 5 pending connections — that's a limit on connections waiting to be accept'ed, not on concurrent clients. Why 5? Dunno, it's just a number. You'll see later that we don't do the traditional fork for servers...

Server Main Loop

Under QNX 4, the main loop is accomplished as follows:

void
serve ()
{
  for (;;) {
    replyTID = Receive (0, &CLID_Msg,
               sizeof (CLID_ServerIPC));
    switch (CLID_Msg.type) {
    case _SYSMSG:
      break;
    case 0x8001:
      shouldReply = 1;
      if (CLID_Msg.func >= NSupported) {
        CLID_Msg.func = 0;    /* errorRoutine */
      }
      CLID_Msg.rval = (*serverJT [CLID_Msg.func]) ();
      if (shouldReply) {
        Reply (replyTID, &CLID_Msg,
               sizeof (CLID_ServerIPC));
      }
      break;
    }
  }
}

The key concepts are that the QNX 4 server is listening for messages from anyone (the 0 parameter passed to Receive indicates this). The Receive call blocks until a message arrives, and then the data for the message is copied into CLID_Msg which is then used in the server. When Receive unblocks, it tells us who the message is from via its return code (we stash that into replyTID).
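
Incidentally, the CLID_ServerIPC structure itself isn't shown in this article. Judging from the fields the code touches (type, subtype, func, rval, and rawData), it would be something along these lines — the exact field sizes here are my assumptions:

// a guess at the message layout, based on the fields the
// client and server touch; the sizes are assumptions
typedef struct {
  unsigned short  type;       // message type; 0x8001 in this scheme
  unsigned short  subtype;    // always set to 0 here
  int             func;       // index into the server's jump table
  int             rval;       // server's return value to the client
  char            rawData [256];  // payload; holds at least MaxVersion bytes
} CLID_ServerIPC;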

Under Neutrino, this process is almost identical, except that one extra step is required: we need to create a channel. A channel exists so that multiple threads in a Neutrino process can coordinate on which rendezvous point they service — although in reality, most resource managers only ever have the one channel. Receive is replaced with MsgReceive, which is passed the channel ID. Apart from that, the flow is identical.
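
A minimal sketch of what the Neutrino receive loop might look like (error handling elided — and remember, this server was never actually ported, so treat it as illustrative):

#include <sys/neutrino.h>

void
serve (void)
{
  int chid, rcvid;

  // the extra step: create a channel to receive messages on
  chid = ChannelCreate (0);
  for (;;) {
    // rcvid identifies the sender, just as replyTID did
    rcvid = MsgReceive (chid, &CLID_Msg,
                        sizeof (CLID_ServerIPC), NULL);
    // ... dispatch through serverJT as in the QNX 4 version ...
    MsgReply (rcvid, EOK, &CLID_Msg,
              sizeof (CLID_ServerIPC));
  }
}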

However, under FreeBSD with TCP/IP sockets, it's a little bit trickier:

void
serve (void)
{
  int   client_fd;
  int   n;
  fd_set  master_fds;
  fd_set  read_fds;
  int   num_fds;
  // 1) initialize the select FDs
  FD_ZERO (&master_fds);
  FD_SET (socket_fd, &master_fds);
  // 2) limit the search length
  num_fds = socket_fd + 1;
  // 3) enter the main server loop
  for (;;) {
    // 4) refresh the select FD mask, and wait
    memcpy (&read_fds, &master_fds, sizeof (read_fds));
    select (num_fds, &read_fds, NULL, NULL, NULL);
    // 5) something happened, was it the socket?
    if (FD_ISSET (socket_fd, &read_fds)) {
      // 6) it's ready for an accept() operation
      if ((client_fd = accept (socket_fd, 0, 0)) == -1) {
        fprintf (stderr, "%s:  accept failed, errno %d (%s)\n",
             progname, errno, strerror (errno));
        exit (EXIT_FAILURE);
      }
      // 7) add this client FD into the set
      FD_SET (client_fd, &master_fds);
      if (client_fd >= num_fds) {
        num_fds = client_fd + 1;
      }
    // 8) it wasn't the socket, must have been a client
    } else {
      // 9) process all clients
      for (replyTID = 0; replyTID < num_fds; replyTID++) {
        if (FD_ISSET (replyTID, &read_fds)) {
          // 10) read data from the client
          if ((n = read (replyTID, &CLID_Msg,
                   sizeof (CLID_ServerIPC))) == -1) {
            fprintf (stderr, "%s:  can't read fd %d errno %d\n",
                 progname, replyTID, errno);
            exit (EXIT_FAILURE);
          }
          // 11) handle a client termination here
          if (n == 0) {
            FD_CLR (replyTID, &master_fds);
            close (replyTID);
            cancelCopy ();
            // recalculate highest fd in use
            num_fds = 0;
            for (n = 0; n < FD_SETSIZE; n++) {
              if (FD_ISSET (n, &master_fds)) {
                num_fds = n + 1;
              }
            }
          // 12) handle a normal client message here
          } else {
            shouldReply = 1;
            if (CLID_Msg.func >= NSupported) {
              CLID_Msg.func = 0; // errorRoutine
            }
            CLID_Msg.rval =
              (*serverJT [CLID_Msg.func]) ();
            if (shouldReply) {
              // 13) "Reply" to the client
              write (replyTID, &CLID_Msg,
                   sizeof (CLID_ServerIPC));
            }
          }
        }
      }
    }
  }
}

Let's take a look at the steps:

  1. We keep two select FD sets — the master set, which indicates which descriptors we want to hear from, and a second set that receives the result of calling select — it tells us which descriptors are actually ready to be read from.
  2. I keep track of the highest FD that we've used, so that select (and our later scan of the FD set) doesn't have to examine all FD_SETSIZE possible descriptors.
  3. Just like in QNX 4, we have a main server loop that never exits.
  4. Here we copy the master FD set into the "tell me who's ready to be read from" FD set, and call select. select will return when it's detected that at least one of the file descriptors in the read_fds set is ready to be read (or, in the case of the socket itself, indicates if a new client is waiting on the socket and we can call accept to handle the new client).
  5. Determine who caused select to unblock.
  6. It was the socket, and we should now go ahead and accept the connection from the new client.
  7. Since we now have an active client, we need to update the master list of FDs that need to be checked.
  8. Here we're dealing with a client, not the socket.
  9. Walk through all the clients...
  10. We use read here instead of QNX 4's Receive or Neutrino's MsgReceive to perform the "message transfer" from the client to the server.
  11. This is an interesting wrinkle. When read returns zero bytes, it means that the client has shut down the connection. We didn't have this in QNX 4 (we did it a different way), so I adapted code from a different section of the QNX 4 server for this case. From the point of view of our server, we need to remove this particular FD from the list of FDs that we select on, and close the file descriptor. For efficiency, we recalculate the highest "in use" file descriptor at this time.
  12. This is effectively the same code as what was present under QNX 4. We removed some of the additional "message type" checks.
  13. This step completes the "message passing" interaction with the client. If the shouldReply flag is set, we "reply" to the client by doing a write of the data.

On further reflection, it might be more efficient to walk through all the FDs and pick out the socket FD and handle it specially, rather than look for the socket FD and then walk through the rest of the FDs — we're going to go right by the socket FD anyway.
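
In other words, something like this (a sketch of the restructured loop only):

// one pass over the descriptors; the listening socket is just
// another FD that happens to mean "accept" when it's ready
int fd;
for (fd = 0; fd < num_fds; fd++) {
  if (!FD_ISSET (fd, &read_fds)) {
    continue;
  }
  if (fd == socket_fd) {
    // new connection: accept and add to master_fds
  } else {
    // existing client: read, dispatch, reply as before
  }
}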

select vs fork

Traditionally, socket servers are written with a fork, such that when a new connection is accept'ed, a fork()'ed process runs with it. I felt this was too heavy-handed, and introduced the potential for synchronization issues (locking of shared resources, etc.), so I decided not to do that. Thus, the only way (short of polling!) to find out which client was ready to talk with the server was to use our friend select.

That said, using fork has its place. What it buys you is that you can forget about the read and write blocking — they're in their own thread of execution, which is dedicated to that client, and that client only.
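
For comparison, the traditional fork-per-connection accept loop goes something like this — just a sketch; a real server would also need to reap dead children via waitpid or a SIGCHLD handler, and serveOneClient is a hypothetical per-client loop:

for (;;) {
  int client_fd = accept (socket_fd, NULL, NULL);
  if (client_fd == -1) {
    continue;   // transient failure, keep accepting
  }
  if (fork () == 0) {
    // child: dedicated to this one client, so plain
    // blocking read and write calls are fine here
    close (socket_fd);            // child never accepts
    serveOneClient (client_fd);   // hypothetical per-client loop
    exit (EXIT_SUCCESS);
  }
  close (client_fd);    // parent: hand off, go back to accept
}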

The Caller ID Client Library

On the other side of the pond, we have the client to consider.

The QNX 4 caller ID client library consists of something to attach to the server, and then various API functions that do the IPC.

To attach to the server:

int
CLID_Attach (char *serverName)
{
    if (serverName == NULL) {
        sprintf (CLID_serverName, "/PARSE/CLID");
    } else {
        strcpy (CLID_serverName, serverName);
    }
    CLID_TID = qnx_name_locate (0, CLID_serverName,
                    sizeof (CLID_ServerIPC), NULL);
    if (CLID_TID != -1) {
        CLID_IPC (CLID_MsgAttach);
        return (CLID_Ok);
    }
    return (CLID_NoServer);
}

Nothing special here (except the defaulting of the server name and storing it in a global variable). The "heart" of the "finding the server" operation is the QNX 4 call qnx_name_locate which takes some flags and the name of the thing to find (in this case, it finds it in the non-filesystem-visible namespace, and the object is called /PARSE/CLID). Under Neutrino, a standard open is used.
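
Under Neutrino, that attach might look something like this (assuming the server registered itself at /dev/clid, as in the earlier resmgr_attach sketch — again, this port never actually happened):

#include <fcntl.h>

int
CLID_Attach (char *serverName)
{
    // on Neutrino, the fd returned by open() is also a
    // connection ID, so it can be passed straight to MsgSend
    CLID_TID = open (serverName ? serverName : "/dev/clid",
                     O_RDWR);
    if (CLID_TID == -1) {
        return (CLID_NoServer);
    }
    CLID_IPC (CLID_MsgAttach);
    return (CLID_Ok);
}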

Now, under FreeBSD, to attach to the server, the following slightly more complicated code is used:

int
CLID_Attach (char *serverName)
{
  char  *host, *port;
  char  scratch [sizeof (CLID_serverName)];  // build area for host:port
  if (serverName == NULL) {
    CLID_TID = tcpip_connect_host_port ("localhost",
               "clid_server");
    strcpy (CLID_serverName, "localhost:clid_server");
  } else {
    strcpy (CLID_serverName, serverName);// scratch
    if (*CLID_serverName == ':') {
      host = "localhost";
      port = strtok (CLID_serverName, ":");
    } else {
      host = strtok (CLID_serverName, ":");
      port = strtok (NULL, ":");
    }
    if (!port) {
      port = "clid_server";
    }
    CLID_TID = tcpip_connect_host_port (host, port);
    // store the name in canonical host:port form; host and
    // port point into CLID_serverName itself, so assemble
    // the result in a scratch buffer first
    snprintf (scratch, sizeof (scratch), "%s:%s", host, port);
    strcpy (CLID_serverName, scratch);
  }
  if (CLID_TID > TCPIP_FAIL) {
    CLID_IPC (CLID_MsgAttach);
    return (CLID_Ok);
  }
  return (CLID_NoServer);
}

The code is fairly straightforward, with a few notes:

  1. The serverName argument is of the form host:port; either part can be omitted (":1234" defaults the host to localhost, and a bare "host" defaults the port to clid_server). Passing NULL gets you localhost:clid_server.
  2. Since strtok modifies the string it parses, serverName is first copied into CLID_serverName and used as scratch space; once connected, the name is stored back in canonical host:port form.
  3. The port can be given as either a service name (looked up in /etc/services) or a number — tcpip_connect_host_port, shown later, handles both.

Finally, to do the actual IPC (shown here with a typical blocking call, one that fetches the version number from the server):

int
CLID_GetVersionNumber (char *versionString)
{
    checkAttach ();
    CLID_IPC (CLID_MsgGetVersionNumber);
    if (CLID_IPCData.rval == CLID_Ok) {
        strncpy (versionString, CLID_IPCData.rawData,
                 MaxVersion);
    }
    return (CLID_IPCData.rval);
}
void
CLID_IPC (int IPCMessage)
{
    if (CLID_TID == 0 || CLID_TID == -1) {
        CLID_IPCData.rval = CLID_NoServer;
        return;
    }
    CLID_IPCData.func = IPCMessage;
    CLID_IPCData.type = 0x8001;
    CLID_IPCData.subtype = 0;
    if (Send (CLID_TID, &CLID_IPCData,
              &CLID_IPCData, sizeof (CLID_IPCData),
              sizeof (CLID_IPCData))) {
        CLID_IPCData.rval = CLID_IPCError;
        return;
    }
}

The CLID_GetVersionNumber function does a bit of housekeeping (calling checkAttach to verify that it has a connection) and then calls CLID_IPC, which is the heart of the IPC service.

CLID_IPC does a bit of error checking itself, marshals the data into the appropriate elements of the data structure, and then calls the QNX 4 operating system primitive Send to transfer the data from the client to the server and back.

Under FreeBSD, the code is very similar:

void
CLID_IPC (int IPCMessage)
{
    if (CLID_TID == 0 || CLID_TID == -1) {
        CLID_IPCData.rval = CLID_NoServer;
        return;
    }
    CLID_IPCData.func = IPCMessage;
    CLID_IPCData.type = 0x8001;
    CLID_IPCData.subtype = 0;
    if (write (CLID_TID, &CLID_IPCData,
               sizeof (CLID_IPCData))
        != sizeof (CLID_IPCData)) {
        CLID_IPCData.rval = CLID_IPCError;
        return;
    }
    if (read (CLID_TID, &CLID_IPCData,
              sizeof (CLID_IPCData)) < 0) {
        CLID_IPCData.rval = CLID_IPCError;
        return;
    }
}

(Note that the CLID_GetVersionNumber function is identical in both operating systems.)

We perform some sanity checking, marshal the data, and then perform the "message passing IPC" by calling write to move the data from the client to the server, and then calling read, which not only gets the reply from the server back to the client, but also blocks the client until that reply is ready. The blocking read is what provides the "synchronous" part of our synchronous IPC model.

You'll note that this code (foolishly) assumes that the machines are of the same endianness. Some habits are hard to break. Judicious use of htonl() and friends will fix that right up.
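
For instance, assuming the field sizes from the structure sketch earlier, the client side of CLID_IPC could convert the fixed-size fields on their way to and from the wire:

// host-to-network before the write...
CLID_IPCData.type    = htons (CLID_IPCData.type);
CLID_IPCData.subtype = htons (CLID_IPCData.subtype);
CLID_IPCData.func    = htonl (CLID_IPCData.func);
write (CLID_TID, &CLID_IPCData, sizeof (CLID_IPCData));

// ...and network-to-host after the read
read (CLID_TID, &CLID_IPCData, sizeof (CLID_IPCData));
CLID_IPCData.rval = ntohl (CLID_IPCData.rval);
// rawData needs message-specific treatment -- a version
// string travels fine as-is, but binary payloads would not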

The Library

I put tcpip_connect_host_port into its own library (because I use it for things other than Caller ID or synchronous IPC simulation — I use it for "normal" TCP/IP communications, like NNTP transport access).

Here's the code:

int
tcpip_connect_host_port (char *host, char *port)
{
  int                 sock;
  u_long            **addrlist;
  struct sockaddr_in  sadr;
  struct servent     *service_entry;
  // 1) figure out the address
  if ((addrlist = name_to_address (host))
    == (u_long **)NULL) {
    return(TCPIP_NOHOST);
  }
  memset (&sadr, 0, sizeof (sadr));
  /* Only internet for now */
  sadr.sin_family = (u_short)AF_INET;
  if (!port) {
    return (TCPIP_NOPORT);
  }
  // 2) figure out the port
  if (isdigit (*port)) {
    sadr.sin_port = htons (atoi (port));
  } else {
    // lookup port
    service_entry = getservbyname (port, "tcp");
    if (!service_entry) {
      return (TCPIP_NOPORT);
    }
    sadr.sin_port = service_entry -> s_port;
  }
  // 3) try and create a socket to the address/port
  for(; *addrlist != (u_long *)NULL; addrlist++) {
    bcopy((caddr_t)*addrlist, (caddr_t)&sadr.sin_addr,
          sizeof(sadr.sin_addr));
    if ((sock = socket(AF_INET, SOCK_STREAM,
         IPPROTO_TCP)) < 0) {
      return(TCPIP_FAIL);
    }
    if (connect(sock, (struct sockaddr *)&sadr,
        sizeof(sadr)) < 0) {
      int e_save = errno;
      fprintf(stderr, "%s: %s:%s [%s:%d]: errno %d\n",
          progname, host, port,
          inet_ntoa(sadr.sin_addr),
          ntohs (sadr.sin_port), errno);
      close(sock);  /* dump descriptor */
      errno = e_save;
    } else {
      return(sock);
    }
  }
  return(TCPIP_FAIL);
}

Fairly basic TCP/IP code; the main trick is in steps one and two where we convert the hostname into a 32-bit IPv4 address and a port number.

And name_to_address is in the same library as well:

static u_long **
name_to_address(char *host)
{
    static  u_long  *host_addresses[2];
    static  u_long  haddr;
    if (host == (char *)NULL) {
        return((u_long **)NULL);
    }
    host_addresses[0] = &haddr;
    host_addresses[1] = (u_long *)NULL;
    // Check if an ASCII internet address
    if (*host == '[' || isdigit(*host)) {
        char    namebuf[128];
        register char   *cp = namebuf;
        // strip brackets [] or anything else we don't want.
        while(*host != '\0' && cp < &namebuf[sizeof(namebuf) - 1]) {
            if (isdigit(*host) || *host == '.') {
                *cp++ = *host++;    /* copy */
            } else {
                host++;         /* skip */
            }
        }
        *cp = '\0';
        haddr = inet_addr(namebuf);
        return(&host_addresses[0]);
    } else {
        struct hostent  *hstp = gethostbyname(host);
        if (hstp == NULL) {
            return((u_long **)NULL);    /* no such host */
        }
        return((u_long **)hstp->h_addr_list);
    }
}

This code gets one or more IPv4 addresses for a given host name and returns the list.
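
As an aside, on a current system the whole lookup-and-connect dance — including the name-vs-number handling for both host and port — collapses into getaddrinfo. A sketch of an equivalent tcpip_connect_host_port:

#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <unistd.h>

int
tcpip_connect_host_port (char *host, char *port)
{
    struct addrinfo hints, *res, *ai;
    int             sock;

    memset (&hints, 0, sizeof (hints));
    hints.ai_family = AF_INET;          /* IPv4 only, as before */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo (host, port, &hints, &res) != 0) {
        return (TCPIP_NOHOST);
    }
    // walk the returned list until something connects
    for (ai = res; ai != NULL; ai = ai -> ai_next) {
        sock = socket (ai -> ai_family, ai -> ai_socktype,
                       ai -> ai_protocol);
        if (sock < 0) {
            continue;
        }
        if (connect (sock, ai -> ai_addr, ai -> ai_addrlen) == 0) {
            freeaddrinfo (res);
            return (sock);
        }
        close (sock);
    }
    freeaddrinfo (res);
    return (TCPIP_FAIL);
}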