Device Drivers and Real-Time Systems

This article originally appeared in Dr. Dobb's Journal, October 1998

In this article, I'm going to examine two radically different device drivers and their implementation under the QNX 4 realtime operating system. The focus is on realtime device-driver issues that come up in the real world.

When writing drivers for realtime operating systems such as QNX 4, you will encounter a number of challenges. We'll examine two of these challenges: latencies and timing accuracies.

Latencies

What makes a realtime operating system "real time" is its ability to respond to events in a deterministic (and hopefully fast!) manner. The amount of time this response takes is called the latency. There are several types of latency. In this article, we'll be looking at interrupt latency (the amount of time that elapses from the hardware raising an interrupt to the execution of the first instruction of the ISR), and scheduling latency (the amount of time that elapses from a particular process being made ready to execute to the execution of that process).

While both latencies are important in realtime systems, the crucial one is the interrupt latency. This is because the source of the interrupt, the hardware, usually has no buffering. If you miss the interrupt, the data is gone (e.g., a network card getting data from the network). A long scheduling latency can, to a degree, be compensated for in the ISR. In that case, however, the ISR is more complex, because it has to effectively buffer up the data. In extreme cases, the ISR can get very complex, because it may have to respond to the ultimate source of the interrupt (e.g., a network card ISR may need to send back acknowledgements within a short amount of time, or the other end will time out; this means that the ISR must be intimately aware of the protocol, and perhaps have access to many of the data structures of the corresponding process). With a short scheduling latency, you can defer processing to the controlling process rather than doing it in the ISR.

Timing Accuracies

Often, the issue of "how accurate is the timing?" is neglected, and comes back to haunt the designer after the initial design has been completed. As we will see, the timing requirements need to be analyzed right up front, during the design phase.

The Drivers

The first driver communicates with a BSR TW-523 X-10 controller in order to provide access to various X-10 modules that I have around the house. These modules allow you to perform functions such as controlling lights and appliances, by using the existing 110 VAC wiring in your house.

The second driver is for a home-built sound card for my PC, which has a, shall we say, "unique" architecture.

The X-10 Driver

Let's start with the X-10 driver. The BSR TW-523 controller ("X-10 controller") presents a simple interface to the PC — it has four wires: common, TX, RX, and Zero Cross, and also a 110 VAC plug. The idea is that whenever the 110 VAC line changes polarity, the Zero Cross line will change state. Effectively, this presents the 110 VAC line as an optically isolated square wave. According to the X-10 protocol, data must be transmitted immediately after a zero crossing of the AC has been detected. This data is transmitted by asserting the TX pin for one millisecond (if transmitting a "one"), or doing nothing (if transmitting a "zero"). When the TX pin is asserted, the X-10 controller generates a 120 kHz carrier on the AC line. Other devices listening on the AC line synchronize their reception of this carrier to the zero crossing -- quite a clever design.
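A quick sanity check on the timing (assuming the usual 60 Hz North American line, which is consistent with the 120 Hz interrupt rate mentioned later in this article):

    60 Hz line   ->  120 zero crossings per second
    1 / 120 s    =   one transmit opportunity every 8.33 ms
    1 ms pulse   =   roughly the first 12% of each 8.33 ms half-cycle

So the driver has to react quickly to each zero crossing, but it only has to do so 120 times a second.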

There are two main software challenges in interfacing with this device: the response time required upon detection of a zero crossing, and the accuracy of the 1 millisecond pulse that needs to be generated.

When I built the hardware interface for the controller, I chose to use a standard RS-232 serial port. This was the easiest way I could think of to get interrupts from the zero crossing line. I then tied the DTR line to the TX pin, so that I could raise and lower it via software control. (We'll ignore the TW-523's RX pin in this article).

Now the design decisions. How much work should I do in the ISR versus at the process level? This is a common tradeoff — the ISR, while it runs with minimal latency after the hardware asserts the interrupt, is generally a much more "sensitive" environment. There are a number of reasons for this: ISRs generally have access to all of the I/O ports (on x86 processors) and can wreak havoc with other hardware devices; the amount of time spent inside the ISR has a direct, negative impact on process scheduling; and finally, since the ISR isn't a real "process", it is generally limited in the number of kernel calls that it can use. On the other hand, deferring processing until "process" time, while avoiding the pitfalls of the ISR, can lead to unacceptable latencies under some operating systems.

So, what to do?

Let's look at both methods.

Doing the Work in the ISR

The actual work that needs to be done in the ISR for this example looks very minimal. After all, when the interrupt hits, we jump into the ISR, look at a circular buffer (containing the data that some client process wants us to transmit), and, if there is a "one" to be sent, we assert DTR. That part is no problem. I'd be amazed if this took more than 5 lines of C. However, once we turn on DTR, we need to be able to turn it off 1 millisecond later. Depending upon what type of operating system you are using, this may range from a few lines of C to a few dozen lines of C — somehow you tell the O/S to schedule a process to run, and the process starts a 1 millisecond timer. When the timer fires, the process deasserts DTR. Under QNX, this is done by returning a non-zero value from the ISR itself. The kernel picks up the ISR's return value, and affects the scheduling queue.
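Under QNX 4, a skeleton of this approach might look something like the sketch below. I'm writing this from memory of the QNX 4 interfaces, so treat it as an illustration rather than the driver's actual code: the IRQ number, port addresses, and the txbits/txhead/txtail buffer are assumptions, and the UART initialization (enabling the modem-status interrupt, and so on) is omitted. The zero-cross line is assumed to be wired to one of the UART's modem-status inputs.

#include <conio.h>          /* inp(), outp() -- Watcom port I/O         */
#include <i86.h>            /* FP_SEG()                                 */
#include <sys/types.h>
#include <sys/proxy.h>      /* qnx_proxy_attach()                       */
#include <sys/irqinfo.h>    /* qnx_hint_attach()                        */

#define BASE    0x3f8           /* COM1 (an assumption)                     */
#define IIR     (BASE + 2)      /* interrupt ID register                    */
#define MCR     (BASE + 4)      /* modem control register; bit 0 is DTR     */
#define MSR     (BASE + 6)      /* modem status register; read to clear     */

volatile char   txbits [256];   /* bits queued by the client process        */
volatile int    txhead, txtail; /* circular buffer indices                  */
pid_t           proxy;          /* kicked when DTR must be dropped in 1 ms  */

pid_t far
zero_cross_isr (void)
{
    inp (IIR);                  /* identify and clear the interrupt source  */
    inp (MSR);

    if (txhead == txtail)       /* nothing queued?  nothing to send         */
        return (0);

    if (txbits [txtail]) {      /* a "one": assert DTR right now...         */
        outp (MCR, inp (MCR) | 0x01);
        txtail = (txtail + 1) % sizeof (txbits);
        return (proxy);         /* ...and wake the process to drop it later */
    }

    txtail = (txtail + 1) % sizeof (txbits);
    return (0);                 /* a "zero": DTR stays low, nothing to do   */
}

void
attach_interrupt (void)
{
    proxy = qnx_proxy_attach (0, 0, 0, -1);
    qnx_hint_attach (4, &zero_cross_isr, FP_SEG (&txhead));    /* IRQ 4 for COM1 */
}

Returning the proxy from the ISR is the QNX 4 way of "returning a non-zero value" to get a process scheduled; returning 0 tells the kernel there is nothing to wake up.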

Let's flip over to the other method, and then we'll look at the 1 millisecond issue.

Doing the Work in the Process

By doing the work at process time, rather than within the ISR, the only thing that's changed is when and where we do the circular buffer management.

Since this example is interrupt driven, we still need an ISR, and we still need to clear the source of the interrupt (on the serial chip I'm using, this involves two I/O port reads). Then, we need to tell the kernel to schedule a process as a result of the ISR.
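In sketch form, the ISR shrinks to almost nothing and the process does the buffer management after the kernel wakes it. As before, this is an illustration, not the original driver's code: proxy, IIR, and MSR are as in the earlier sketch, and next_bit() and assert_dtr_for_1ms() are hypothetical helpers.

#include <conio.h>          /* inp()                                    */
#include <sys/types.h>
#include <sys/kernel.h>     /* Receive()                                */

#define BASE    0x3f8
#define IIR     (BASE + 2)
#define MSR     (BASE + 6)

extern pid_t    proxy;                      /* attached with qnx_proxy_attach()      */
extern int      next_bit (void);            /* hypothetical: next queued bit         */
extern void     assert_dtr_for_1ms (void);  /* hypothetical: see the 1 ms discussion */

pid_t far
zero_cross_isr (void)
{
    inp (IIR);                      /* clear the source of the interrupt... */
    inp (MSR);
    return (proxy);                 /* ...and always kick the process       */
}

void
tx_loop (void)
{
    pid_t   who;

    for (;;) {
        who = Receive (proxy, 0, 0);    /* block until the ISR fires the proxy */
        if (who == -1)
            continue;                   /* interrupted by a signal; try again  */
        if (next_bit ())                /* a "one"?  assert DTR for 1 ms       */
            assert_dtr_for_1ms ();
    }
}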

Doing the work in the ISR directly is more efficient, because the ISR (having access to the circular buffer) can determine whether or not it needs to tell the kernel to schedule a process. The "process" method requires that the ISR schedule the process every time, since the ISR has no idea of what data should or should not be sent.

So why would I do it in the process? Simple — because I can. It is *MUCH* easier to debug things in the process: I can use source-level debugging and profiling tools (though my personal favourite is the printf() debugger). Also, in this case, I'm only getting interrupted at 120 Hz. This low interrupt rate is not an issue.

The real issue is, "How long does it take to get there?" Under QNX 4, running on a Pentium 100 MHz processor, it takes 1.8 µs to run the first line of the ISR, and another 4.7 µs after the ISR has exited to run the first line of the "process". These numbers were obtained directly from QNX Software Systems, and are stated as being "typical".

So, can I afford 6.5 µs of delay? Well, where's the 110VAC sine wave, 6.5 µs after the zero crossing? Only 417 mV higher (or lower), or about 0.12% of the range. Probably not significant.
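For the curious, here's the arithmetic behind those numbers, assuming a 60 Hz line with a peak of roughly 170 V (which is what the quoted 0.12%-of-range figure implies):

    v(t)       =  170 V x sin (2 x pi x 60 Hz x t)
    v(6.5 µs)  =  170 V x sin (0.00245 radians)  =  about 0.417 V
    0.417 V / 340 V peak-to-peak  =  about 0.12%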

What do you mean by "hard" and "soft"?

I quoted two numbers: the 1.8 µs ISR latency, and the 4.7 µs "scheduling latency". While both numbers are equal in importance, one number is a little more equal than the other :-) The ISR latency will ONLY be affected by one of two things: a process or ISR that has interrupts disabled, or a higher (in terms of hardware priority) ISR. Since most realtime architectures (and programmers, for that matter) try to disable interrupts for the smallest possible amount of time, and keep their ISRs as short as possible, statistically speaking we should have a good success rate with this 1.8 µs number.

Now, what about the 4.7 µs number? This number is, first of all, measured AFTER the ISR has completed execution, and secondly, is affected by the priority of the process. The ultimate decision as to whether this is good or bad rests with the system designer — whoever decided at what priority things should run. If you NEED to attain the 4.7 µs number, then the process must run at a higher priority than other processes, period.

The 1 millisecond issue

Regardless of where the DTR pin was asserted (in the ISR or in the process), most operating systems require you to do timing functions within a process. (We certainly don't want to spend one millisecond in an ISR!)

There are, of course, some design issues associated with this. Since the kernel receives periodic interrupts from some hardware clock, and indeed bases all of its timing on those interrupts, you cannot delay for a period of time whose granularity is finer than the base clock tick. For example, if the kernel gets periodic interrupts every ten milliseconds, you certainly CANNOT reliably delay for anything less than ten milliseconds. However, it's not as simple as boosting the hardware clock's rate. Even if we boosted the rate so that the tick is, let's say, one millisecond, the next issue that hits us is the fact that the hardware clock is asynchronous to the process. If the hardware clock has just interrupted the kernel, and we now tell the kernel that we want to sleep for one millisecond, we'll get pretty close to a one millisecond delay. However, if the hardware clock is ABOUT TO interrupt the kernel, and we schedule our one millisecond delay, we will be woken up much too early. The best that you can do in this case is to boost the hardware clock rate so that the "jitter" (the amount of variability in the delay time) is "acceptable". Even if we shortened the tick to 100 µs, we would still only be able to reliably sleep for somewhere between just over 900 µs and the full 1000 µs (1 millisecond). And, of course, we can't just boost the hardware clock to an arbitrary rate, like a 1 µs tick, because the kernel wouldn't be able to handle the interrupts at that rate.
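If you want to see this jitter for yourself, a small measurement loop makes it obvious. This is a present-day POSIX sketch rather than QNX 4 code; clock_gettime() and nanosleep() stand in for the delay() call used elsewhere in this article:

#include <stdio.h>
#include <time.h>

static long long
now_ns (void)
{
    struct timespec ts;

    clock_gettime (CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int
main (void)
{
    struct timespec req = { 0, 1000000 };       /* ask for a 1 ms sleep */
    long long       t0, elapsed, min = -1, max = 0;
    int             i;

    for (i = 0; i < 1000; i++) {
        t0 = now_ns ();
        nanosleep (&req, NULL);
        elapsed = now_ns () - t0;
        if (min < 0 || elapsed < min)  min = elapsed;
        if (elapsed > max)             max = elapsed;
    }
    printf ("requested 1 ms, got between %lld us and %lld us\n",
            min / 1000, max / 1000);
    return (0);
}

On a kernel that rounds sleeps to a tick boundary, the spread between the shortest and longest sleeps approaches a full tick; on one with high-resolution timers, the spread is much smaller.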

As a side note, I have personally found a hardware clock tick of 500 µs to work just fine for the X-10 application — as it turns out, the pulse timing isn't THAT sensitive.

In this case, however, there is a slightly more elegant solution. Since what we want is a time source that is SYNCHRONOUS with the assertion of the DTR pin, why not use the serial port chip's TX pin instead? You could program the serial port for 9 kbaud, and send it one byte with all of the bits set the same as the start bit (that is, a byte of zeros). The serial port chip will then hold the line in one state for the start bit plus the 8 data bits (9 bit times in all), which, at 9 kbaud, is extremely close to 1 millisecond! (The stop bit then returns the line to idle, ending the pulse.) Tying the serial port's TX pin to the TW-523's TX pin means that the hardware has effectively generated a nice, clean 1 millisecond pulse for you. (Of course, this occurred to me AFTER I built the hardware and got it running.)
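Here is roughly what that trick looks like on a standard 8250/16550-style UART. The COM1 base address and the divisor of 13 (115200 / 13 is about 8861 baud, the closest a stock UART gets to "9 kbaud", giving a pulse of about 1.02 ms) are my assumptions; the original article doesn't spell out the exact setup:

#include <conio.h>              /* inp(), outp()                            */

#define BASE    0x3f8           /* COM1 (an assumption)                     */
#define THR     (BASE + 0)      /* transmit holding register                */
#define DLL     (BASE + 0)      /* divisor latch low (when DLAB is set)     */
#define DLM     (BASE + 1)      /* divisor latch high                       */
#define LCR     (BASE + 3)      /* line control register                    */

void
send_1ms_pulse (void)
{
    outp (LCR, 0x80);           /* DLAB on: expose the divisor latch        */
    outp (DLL, 13);             /* 115200 / 13 = ~8861 baud                 */
    outp (DLM, 0);
    outp (LCR, 0x03);           /* DLAB off; 8 data bits, no parity, 1 stop */

    outp (THR, 0x00);           /* start bit + 8 zero bits = 9 bit times of */
                                /* "space" on TX; the stop bit ends it      */
}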

Can you rely on this as an "external" synchronous timing source? Certainly, but it depends on your willingness to modify the hardware so that the TX pin is looped back to a modem-status pin that can generate an interrupt, such as CD.

The Audio Driver

Let's look at something completely different, to illustrate some other timing issues. About 6 years ago, when sound cards weren't very good (and were somewhat pricey), I managed to wangle some samples of digital-audio-quality A/D and D/A parts. These parts worked with a serial data stream, so I designed an IBM PC/AT-compatible ISA card with 4 FIFO chips and some serial/parallel and parallel/serial conversion logic on it. I wasn't quite sure how to work with the hardware interrupt system, so I left it off for what I thought was the initial test. To my surprise, the board worked! (And the interrupt circuitry has stayed off of the board.) So what does this have to do with realtime?

Let's examine how a FIFO chip works. A FIFO chip has two "sides". In my case (for the D/A portion), one side is connected to the ISA bus (the writer side), and the other side is connected to the parallel/serial conversion logic (the reader side). The reader side is driven by a steady 44.1 kHz clock — that's the sampling rate that the card and D/A converter operate at. This means that the parallel/serial conversion logic is reading data out of the FIFO at a fixed rate (44.1 kHz). Since the FIFOs are 512 bytes deep (and there are two of them, making, effectively, a single 512 word FIFO), the FIFO will go from full to empty in 512/44100 seconds (11.6 milliseconds).

I realized that I didn't *need* interrupts, and could get away with just polling the FIFO's "FULL/EMPTY" flag!

All I had to do was fill the FIFO completely, and then I had 11.6 milliseconds where I could do whatever other processing was required. Here's what I mean. This is the main polling loop in my audio driver:

while (!done) {
    // reading the port returns the board's status; S_FIFOFull means no room
    // if the FIFO is full, go to sleep for one tick (delay() takes milliseconds)
    if (inpw (FIFOport) & S_FIFOFull) {
        delay (1);
    } else {
        // FIFO is not full, write the next data word from the current buffer
        outpw (FIFOport, buf [bufptr++]);
        if (bufptr >= BlockSize) {
            done = 1;
        }
    }
}

The decision to call delay() with a count of 1 (which sleeps for 1 millisecond), as opposed to calling it with a number closer to 11.6, was made for two reasons. First of all, I didn't feel comfortable with sleeping until the FIFO was almost empty — what if something caused me to oversleep? Then there would be a "click" in the audio stream as the parallel/serial logic sucked zeros out of the FIFO! Also, and more importantly, I didn't want to be hogging the CPU at a high priority for the entire time that it took to fill the FIFO from a near-empty state. It's much better to fill it in tiny bursts, as this allows other, lower-priority processes to run more often. This second point may appear moot -- except for the fact that at one point I contemplated buying 16 kbyte FIFOs instead of the "wimpy" 512 byte FIFOs I had, until I found out that they were about $50 EACH.
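To put some rough numbers on that (the figure of about 1 µs per 16-bit ISA port write is my assumption, not a measurement):

    44,100 words/s drain rate       ->  one word every 22.7 µs
    during each 1 ms delay()        ->  about 44 words drain
    topping up 44 words             ->  roughly 44 µs of port I/O per wakeup
    refilling 16,384 words at once  ->  on the order of 16 ms of solid,
                                        high-priority port I/O

Filling in small bursts keeps each high-priority burst tiny, no matter how deep the FIFO is.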

Another consequence of not using an interrupt is that I'm avoiding the context switch of entering an ISR and scheduling a process.

Note that the code presented above is just an extract; the actual code has multiple buffers for "buf", which are fetched from disk during the "idle" time when the FIFO is full.

Real Realtime Timing

I hope that this discussion has enlightened you about some of the issues that arise during the design of a driver that has to deal with realtime devices. The key things to keep in mind are: how fine your kernel's clock granularity is; how fast the context switch times are (both into the ISR, the "interrupt latency" number, and from the ISR to the process, the "scheduling latency" number); and, finally, whether there are any good tricks you can play in the hardware to offload the software's timing burden.

About the Author

Robert Krten is an independent consultant specializing in realtime systems design and development work. He has written three books on the QNX Realtime Operating System, as well as several articles. He is the president of "PARSE Software Devices", a consulting company specializing in QNX and realtime projects.