2011-6-21 10:05
Building predictably reliable systems is the point of engineering. However, most firmware engineers ignore the role of determinism in real-time systems. Few can answer questions like "how can you guarantee that the system won't fail when stressed?"

Today's hardware is often cursed with all sorts of nifty speed-enhancers like cache, pipelines, and speculative execution. All of these contribute to execution-time uncertainty. The system's performance can vary wildly depending on a lot of hard-to-predict events. An interrupt may occur at any time, and servicing it will displace at least part of the cache. Resuming the interrupted execution flow means rereading instructions from L2 or memory, which can take a surprisingly long time. A system that is running fine but close to the edge may suddenly fail to meet its hard real-time deadlines.

Can you really guarantee the highest-priority task will complete on time? What if there's a perfect storm of interrupts? Or of bus activity (DMA, or having to yield the bus to another master)? In big systems a task may depend in very complex ways on externalities (other computers, systems, I/O) that aren't ready in time.

Preemptive multitasking is itself inherently non-deterministic, though techniques like rate-monotonic analysis (RMA) can mitigate the problem. But RMA requires more analysis than most developers will ever do.

Even extremely simple systems that have none of these speed-enhancing features can suffer from serious timing problems. A little bit of C code that looks quite deterministic probably makes calls to the black hole that is the runtime library, which is generally uncharacterized (in the time domain) by the vendor. Does that call take a microsecond or a week? No one knows.

It's my belief that too many systems "work" due only to divine intervention. Developers chase down the usual procedural bugs and then breathe a sigh of relief that, once again, a miracle has occurred.
But all too often that gift from heaven is merely a reprieve, an indulgence, with damnation still possible or even likely when the system experiences unexpected stresses. Or when luck runs out and interrupts bunch up.

Unlike most other engineered systems, our real-time devices don't have fuses that blow when something goes wrong. Instead of shutting down in a controlled way or falling back to a less-capable mode, firmware collapses completely and unpredictably.

What do you do to convince yourself (at least) that the system will be reliable in the time domain?
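
One modest place to start is the kind of schedulability check that rate-monotonic analysis leads to. The sketch below is only an illustration, not a recipe: it applies the classic response-time iteration to a set of fixed-priority periodic tasks, and it assumes a great deal that real systems rarely grant. The tasks are independent, never block each other, have deadlines equal to their periods, are listed in rate-monotonic order (shortest period first), and context switches cost nothing. Above all it assumes the worst-case execution times are actually known, which is exactly what caches and uncharacterized library calls make hard. The names `task_t`, `response_time` and `all_deadlines_met` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task description. Both fields use the same time unit
   (for example, timer ticks). The WCET is assumed to have been measured
   or bounded somehow; this code cannot discover it for you. */
typedef struct {
    uint32_t wcet;    /* worst-case execution time */
    uint32_t period;  /* release period, also used as the deadline */
} task_t;

/* Worst-case response time of tasks[i], where tasks[0..i-1] are the
   higher-priority tasks (array sorted by ascending period).
   Returns 0 if the task can miss its deadline or the input is invalid. */
uint32_t response_time(const task_t *tasks, size_t i)
{
    /* Reject zero periods and zero execution times up front. */
    for (size_t j = 0; j <= i; j++) {
        if (tasks[j].period == 0 || tasks[j].wcet == 0)
            return 0;
    }

    uint64_t r = tasks[i].wcet;
    for (;;) {
        uint64_t next = tasks[i].wcet;
        for (size_t j = 0; j < i; j++) {
            /* ceil(r / period_j): how many times task j can be released,
               and therefore preempt us, within a window of length r. */
            uint64_t releases = (r + tasks[j].period - 1) / tasks[j].period;
            next += releases * tasks[j].wcet;
        }
        if (next > tasks[i].period)
            return 0;               /* response time exceeds the deadline */
        if (next == r)
            return (uint32_t)r;     /* fixed point reached */
        r = next;                   /* grows each pass, so the loop ends */
    }
}

/* Returns 1 if every task meets its deadline under the assumptions above,
   0 otherwise. */
int all_deadlines_met(const task_t *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (response_time(tasks, i) == 0)
            return 0;
    }
    return 1;
}
```

As a usage example, three tasks with (WCET, period) of (1, 4), (2, 6) and (3, 12) pass the check, with the lowest-priority task finishing 10 ticks after release in the worst case. Change the first two to (2, 4) and (3, 6) and the second task can no longer meet its deadline. A passing result says nothing about interrupt storms, bus contention or optimistic WCET figures, so treat it as a lower bar to clear rather than a guarantee.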