Jack Ganssle
Digi-Key
Part 1 - Overview and features
Internal WDTs
Watchdogs can be divided into two general categories: those that are on board the processor chip, and external devices added by the hardware designer. Most microcontrollers have an internal watchdog, though their efficacy varies widely.
An example is Maxim’s (née Dallas’) DS80C320/DS80C323, an 8031 variant that has been around for quite a while. This part has two really nice features in the watchdog. First, one can program it to generate an interrupt, but 512 cycles later it will reset the CPU, so debugging breadcrumbs are easy to save. Second, access to the WDT registers can be restricted; one has to execute two particular move instructions back-to-back, and then there is just a three-cycle window in which a WDT register access can take place. This hugely reduces the chances that rogue code will disable the protection mechanisms. However, one wonders what happens if an interrupt occurs between these instructions. Presumably the window would expire and the WDT access would not occur, making it impossible to enable that feature. Clearly, the software engineer must disable interrupts when executing this sequence.
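The interrupt concern above can be sketched in C. This is a host-side model, not code for the actual part: the two "registers" are ordinary variables, the unlock values are placeholders, and the helper names (irq_save_and_disable, irq_restore, wdt_protected_write) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins for hardware. On a real part these would be the
   timed-access and watchdog control registers and the CPU's interrupt
   mask; every name and value here is a placeholder. */
static volatile uint8_t TIMED_ACCESS_REG;
static volatile uint8_t WDT_CONTROL_REG;
static volatile bool    irq_enabled = true;

#define UNLOCK_FIRST   0xAAu  /* placeholder unlock values; take the */
#define UNLOCK_SECOND  0x55u  /* real ones from the data sheet       */

static bool irq_save_and_disable(void)
{
    bool was_enabled = irq_enabled;
    irq_enabled = false;
    return was_enabled;
}

static void irq_restore(bool was_enabled)
{
    irq_enabled = was_enabled;
}

/* Write the watchdog control register so that no interrupt can split
   the unlock sequence from the protected write. */
static void wdt_protected_write(uint8_t value)
{
    bool was_enabled = irq_save_and_disable();

    assert(!irq_enabled);              /* nothing may preempt from here on */
    TIMED_ACCESS_REG = UNLOCK_FIRST;
    TIMED_ACCESS_REG = UNLOCK_SECOND;
    WDT_CONTROL_REG  = value;          /* must land inside the access window */

    irq_restore(was_enabled);          /* restore, do not blindly re-enable */
}
```

Note that a C compiler gives no guarantee about how many cycles separate these writes, so on real hardware the three writes would probably be written in assembly or with whatever mechanism the vendor's toolchain provides.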
Freescale’s MCF520x series are rather different. To tickle the watchdog, one must issue two writes to the watchdog service register, but any number of instructions may occur between these. This could defeat reliable operation if the CPU has crashed and is running random code. On the upside, the reset status register does log whether the prior reset was due to an external hardware signal or to a WDT timeout, a useful way to log errors after rebooting. One may program the watchdog to generate either a reset or an interrupt; the latter is a very bad idea. If the stack pointer were to go odd (due to a bug or rogue code), the system would go into a double bus fault. An interrupt will not restore the CPU to normal operation; only a reset will.
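Logging the reset cause after a reboot might look like the sketch below. The bit positions and all names (RSR_EXTERNAL, RSR_WATCHDOG, reset_log_t, record_reset) are assumptions made for illustration; a real reset status register is laid out differently on every part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, not taken from any data sheet. */
#define RSR_EXTERNAL  (1u << 0)
#define RSR_WATCHDOG  (1u << 1)

typedef enum { RESET_UNKNOWN, RESET_EXTERNAL, RESET_WATCHDOG } reset_cause_t;

/* Log kept somewhere that survives a reset (for example RAM the startup
   code does not clear, or nonvolatile memory). */
typedef struct {
    uint32_t external_resets;
    uint32_t watchdog_resets;
} reset_log_t;

static reset_cause_t decode_reset_cause(uint8_t rsr)
{
    if (rsr & RSR_WATCHDOG) return RESET_WATCHDOG;  /* report a crash first */
    if (rsr & RSR_EXTERNAL) return RESET_EXTERNAL;
    return RESET_UNKNOWN;
}

/* Call once, early in startup, with the raw register value. */
static reset_cause_t record_reset(reset_log_t *log, uint8_t rsr)
{
    assert(log != NULL);
    reset_cause_t cause = decode_reset_cause(rsr);
    if (cause == RESET_WATCHDOG) log->watchdog_resets++;
    if (cause == RESET_EXTERNAL) log->external_resets++;
    return cause;
}
```

A rising watchdog_resets count in the field is then a direct measure of how often the code is crashing.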
STMicroelectronics' new series of STM32F4 Cortex-M4 CPUs has two independent watchdogs. One runs from its own internal RC oscillator. That means that all kinds of things can collapse in the CPU and the WDT will still fire. There is also a “window watchdog” (WWDT) which requires the code to tickle it frequently, but not too often. This is a very effective way to ensure that crashed code that randomly writes to the protection mechanism does not produce a valid WDT tickle. The WWDT can also generate an interrupt shortly before reset is asserted.
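The "frequently, but not too often" rule can be illustrated with a small software model. This is only a sketch of the windowing idea, not the STM32F4 register interface; the type wwdt_model_t, its functions, and the tick values are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t window_open;   /* ticks after a service before a tickle is legal */
    uint32_t timeout;       /* ticks after a service at which the WDT fires   */
    uint32_t last_service;  /* tick of the last accepted tickle               */
    bool     reset_pending; /* latched once a reset has been triggered        */
} wwdt_model_t;

static void wwdt_init(wwdt_model_t *w, uint32_t window_open,
                      uint32_t timeout, uint32_t now)
{
    assert(window_open < timeout);
    w->window_open   = window_open;
    w->timeout       = timeout;
    w->last_service  = now;
    w->reset_pending = false;
}

/* Advance time: the watchdog fires if no valid tickle arrived in time. */
static void wwdt_poll(wwdt_model_t *w, uint32_t now)
{
    if (now - w->last_service >= w->timeout)
        w->reset_pending = true;
}

/* A tickle is accepted only inside the window. One that comes too early
   or too late is treated as evidence of a crash and forces a reset. */
static bool wwdt_tickle(wwdt_model_t *w, uint32_t now)
{
    uint32_t elapsed = now - w->last_service;
    if (w->reset_pending || elapsed < w->window_open || elapsed >= w->timeout) {
        w->reset_pending = true;
        return false;
    }
    w->last_service = now;
    return true;
}
```

With a window of 50 to 100 ticks, a tickle 60 ticks after the last service is accepted, while a second tickle only 10 ticks later triggers the reset, which is exactly what catches runaway code hammering the service register.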
Intriguingly, some of these parts also include an “analog watchdog” which fires an interrupt if an input to an A/D exceeds a programmed limit. One could monitor the power supply and detect brownouts. In a system that controls dangerous hardware, this early-warning could be used to put the system into a safe state before the power goes out of operating limits.
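As a rough sketch of the early-warning idea, the handler below latches a request for a safe state when a sample falls outside programmed limits. The limits, the names (analog_limits_t, analog_watchdog_event, safe_state_requested), and the use of both a low and a high limit are assumptions for illustration; on real hardware the comparison is done by the A/D peripheral and only the interrupt reaches the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t low_limit;    /* A/D counts; hypothetical values */
    uint16_t high_limit;
} analog_limits_t;

/* Latched by the handler, acted on by the main code. */
static volatile bool safe_state_requested = false;

static bool out_of_limits(const analog_limits_t *lim, uint16_t sample)
{
    assert(lim->low_limit <= lim->high_limit);
    return sample < lim->low_limit || sample > lim->high_limit;
}

/* Models the interrupt handler: do the minimum here and let the main
   code drive the hardware to its safe state. */
static void analog_watchdog_event(const analog_limits_t *lim, uint16_t sample)
{
    if (out_of_limits(lim, sample))
        safe_state_requested = true;   /* stays set until handled */
}
```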
Many of Microchip’s PIC24F series have WWDTs, as do some of NXP’s parts such as the LPC18xx and LPC43xx series. NXP’s parts can be configured so that, once enabled, it is impossible for the software to turn the WDT off, which offers more protection from code that is running amok.
None of these processors signal the outside world that a timeout took place. The designer may have to assert a parallel I/O bit to reset external hardware if the software cannot guarantee a proper re-initialization.
External WDTs
Few microprocessors (as opposed to microcontrollers) have an internal watchdog timer, and in many cases internal WDTs do not provide the reliability needed for a particular application. In these cases the design should be augmented with external hardware that monitors system operation and issues a reset if needed.
In a system that uses two or more CPUs, it is reasonable to have each processor monitor the other’s operation.
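One possible scheme for this, offered only as a sketch, is a heartbeat counter: each processor increments a value the other can read, and each periodically checks that its peer's value has changed. The structure peer_monitor_t, the function peer_check, and the missed-check tolerance are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t last_seen;    /* peer's heartbeat value at the previous check */
    uint32_t missed;       /* consecutive checks with no change            */
    uint32_t max_missed;   /* tolerance before the peer is reset           */
} peer_monitor_t;

/* Call periodically with the peer's current heartbeat value.
   Returns true when the peer appears dead and should be reset. */
static bool peer_check(peer_monitor_t *m, uint32_t heartbeat)
{
    assert(m->max_missed > 0);
    if (heartbeat != m->last_seen) {
        m->last_seen = heartbeat;   /* peer is alive */
        m->missed = 0;
        return false;
    }
    m->missed++;
    return m->missed >= m->max_missed;
}
```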
There are a number of WDT chips available. In general, their operation is not controlled by software, so the crashed program cannot disable their functionality. They also assert reset during power-up, eliminating the need for separate power-on reset components.
One external WWDT is Maxim’s MAX6751, whose timeouts are set by capacitors, as shown in Figure 1.
Figure 1. | The MAX6751’s timeouts are set by a pair of capacitors (Courtesy Maxim). |
Texas Instruments’ TPS3126 is a similar part without the window capability. The family is available for a variety of supply voltages and delay times, is inexpensive, and comes in SOT-23 packages. Figure 2 outlines the configuration.
Figure 2. | TI’s TPS3126 monitors power as well as providing a WDT (Courtesy Texas Instruments). |
TI also has a family of parts, including the TPS386000, which monitor four separate power rails and include a WDT with a fixed delay. One of the voltage monitors can handle negative supplies. If any rail goes out of tolerance, its individual “RESET” output is activated. Being open drain, the outputs can be wire-ORed together. Alternatively, one could connect them to input pins so the CPU can tell which supply is low and take appropriate action.
Analog Devices' ADM699 is a simple WDT which also monitors one supply. Figure 3 shows its implementation.
Figure 3. | The ADM699 has a very clean and simple design (Courtesy Analog Devices). |
Some microprocessors now are very particular about the reset input, tolerating only relatively rapid slew rates and signal levels. An open drain drive can meet those requirements only by using a very low-value pull-up resistor, which increases power consumption. Analog Devices has several components, like the ADM6316, that use a push-pull output to meet these stringent requirements. Figure 4 shows the part’s block diagram.
Figure 4. | The ADM6316 has push-pull drivers (Courtesy Analog Devices). |
Software considerations
Even the best watchdog circuitry is but a poor safety mechanism if the code is not properly constructed. Alas, in most systems, developers sprinkle watchdog tickles throughout the code without thinking through the design.
The most important consideration is to ensure that all of the code is running correctly, not just part of it. Therefore, never put a WDT tickle in an interrupt service routine, and never devote an RTOS task to this activity. If the main code crashes, the interrupts, and even the RTOS’s scheduler, may continue to run, so the watchdog never times out.
In a single-threaded design, use a state machine-like architecture. Example code is shown in Figure 5.
Figure 5. Code to handle a non-multitasking WDT.
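#include <stdint.h>

/* Target-specific hook, assumed to exist: services the watchdog. */
extern void kick_dog(void);

static volatile uint16_t state;

/* Stop here; with no more tickles the WDT resets the CPU. */
static void halt(void) { for (;;) { } }

static void wdt_a(void) {
    if (state != 0x5555) halt();
    state += 0x1111;              /* 0x5555 becomes 0x6666 */
}

static void wdt_b(void) {
    if (state != 0x8888) halt();
    kick_dog();
    state = 0;
}

int main(void) {
    for (;;) {
        state = 0x5555;
        wdt_a();
        /* ... rest of the main loop ... */
        state += 0x2222;          /* 0x6666 becomes 0x8888 */
        wdt_b();
    }
}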
Here the main loop starts by setting variable “state” to 0x5555. It calls wdt_a() which checks to see if the value is correct; if not, it halts and the WDT resets the system. Otherwise the value in “state” is changed by adding an offset. Note that the watchdog has not been tickled.
At the end of the loop (we have executed all of the code), “state” is adjusted once again and wdt_b() gets called. Now, if “state” has not properly cycled through all of the changes to it, indicating we did not run the entire loop, the code halts and the CPU gets reset. Otherwise the watchdog is tickled and “state” is set to zero. Note that if the code crashes and wanders into wdt_b(), “state” will not be correct and a reset will happen.
In a multitasking application, each task increments its own counter in a shared data structure every time it starts. A low-priority task occasionally examines the data structure and checks that the counts are reasonable: a task that runs often should show a large number, while a slow task should show a small one. If everything is OK, the monitoring task tickles the WDT, zeroes the counts, and returns. Otherwise it halts and initiates a reset.
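A minimal sketch of that check-in scheme follows. The number of tasks, the per-task bounds, and the names (task_checkin_t, task_checkin, monitor_check) are invented for illustration. The function returns a verdict rather than tickling the watchdog itself, so the monitoring task would tickle on true and halt on false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TASKS 3

typedef struct {
    volatile uint32_t count;  /* incremented by the task each time it starts */
    uint32_t min_expected;    /* hypothetical bounds per monitoring period   */
    uint32_t max_expected;
} task_checkin_t;

static task_checkin_t checkins[NUM_TASKS] = {
    { 0, 50, 200 },   /* fast task: expect a large count */
    { 0,  5,  20 },
    { 0,  1,   3 },   /* slow task: expect a small count */
};

/* Each task calls this first thing, every time it starts. */
static void task_checkin(size_t id)
{
    assert(id < NUM_TASKS);
    checkins[id].count++;
}

/* Called from the low-priority monitoring task. Returns true if every
   count is plausible, in which case the counts are zeroed for the next
   period. A real RTOS port would guard this with a critical section so
   a check-in is not lost while the counts are being cleared. */
static bool monitor_check(void)
{
    for (size_t i = 0; i < NUM_TASKS; i++) {
        uint32_t n = checkins[i].count;
        if (n < checkins[i].min_expected || n > checkins[i].max_expected)
            return false;     /* some task ran too rarely or too often */
    }
    for (size_t i = 0; i < NUM_TASKS; i++)
        checkins[i].count = 0;
    return true;
}
```

Checking an upper bound as well as a lower one catches a task stuck in a tight restart loop, not just one that has stopped running.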
If there are exceptions that the code cannot handle or recover from (e.g., a divide by zero or a malloc() failure), write a handler that disables interrupts and halts. The watchdog will bring the system back to life.
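Such a handler could be as small as the sketch below. It is a host-side model: the interrupt mask is an ordinary variable standing in for the CPU's disable-interrupts instruction, and the fault codes and function names (prepare_halt, fatal_error, checked_malloc) are hypothetical. Saving a fault code is an optional extra, useful only if it is stored where it survives the reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define FAULT_NONE            0u
#define FAULT_OUT_OF_MEMORY   1u
#define FAULT_DIVIDE_BY_ZERO  2u

static volatile bool     irq_enabled = true;   /* stand-in for the CPU interrupt mask */
static volatile uint32_t fault_code  = FAULT_NONE;  /* breadcrumb for later diagnosis */

/* Everything the handler does before it stops. */
static void prepare_halt(uint32_t code)
{
    assert(code != FAULT_NONE);
    irq_enabled = false;   /* on target: the disable-interrupts instruction */
    fault_code  = code;
}

/* Never returns. With interrupts off and no more tickles, the watchdog
   times out and resets the system. */
static void fatal_error(uint32_t code)
{
    prepare_halt(code);
    for (;;) { }
}

/* Example caller: an allocation wrapper that never returns NULL. */
static void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        fatal_error(FAULT_OUT_OF_MEMORY);
    return p;
}
```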
Conclusion
The watchdog timer is the last line of defense against crashed code, and as such, must be well designed and implemented. Today many microcontrollers include WDTs that are very sophisticated and resilient to spoofing by a program that has run amok. Alternatively, use an external WDT; probably the safest bet is one that supports windowing. Also, structure the code carefully, so that errant software does not fall into the tickle routine and keep the timer from resetting the CPU.