Estimating likely power consumption of a piece of silicon before building it is necessarily an approximate endeavor -- not quite divining the future by poking through chicken bones, but certainly laden with assumptions and approximations. Unfortunately, early estimates are critical to the economic and timely development of world-class designs. You can't just iterate through design, manufacture, and measurement as many times as it takes to get it right.
The gold standard of pre-silicon estimation starts from a full physical gate-level implementation -- synthesized to final gates and placed and routed. Using this representation, with detailed power models for the gates, extracted parasitics, and activity files from gate-level simulation, power estimates are claimed to fall within 5% of silicon measurements (with multiple caveats). However, a full implementation cycle takes time -- enough that you will not complete more than one per day for a ~1M-gate block -- and it is difficult to correlate power problems back to the RTL design. So, while an improvement over waiting for silicon, this method is still not well suited to rapid design-measure-debug iteration.
The electronic system level (ESL) may be the best place to estimate and optimize architecture for power, but either way you have to re-estimate at the register transfer level (RTL) to get the implementation architecture right. RTL estimation offers the fastest and most direct debug cycle, but with a penalty in accuracy relative to gate-level estimation. It still uses the same Liberty power models and the same simulation testbench as gate-level estimation, but takes a scientific wild-ass guess (SWAG) at what the design will map to in synthesis: gate selection, Vth mixes, cell drive strengths, data-path optimization, net capacitances, and more. Still, for many purposes at less aggressive nodes, this approach can provide useful guidance for major implementation decisions.
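To make the SWAG concrete, here is a purely illustrative sketch of the arithmetic underneath an RTL power estimate: switching power summed per net as ½·α·C·Vdd²·f, plus a leakage term. Every number, net name, and the flat leakage figure are assumptions for illustration; a real tool reads Liberty models and simulation activity files rather than hard-coded guesses.

```python
# Back-of-envelope RTL power estimate. All values are hypothetical.

VDD = 0.8      # supply voltage in volts (assumed)
FREQ = 1.0e9   # clock frequency in Hz (assumed)

# Per-net SWAG inputs: estimated load capacitance (farads) and toggle
# rate (average transitions per clock cycle) from RTL simulation.
nets = [
    {"name": "alu_out",  "cap_f": 5e-15, "toggle_rate": 0.30},
    {"name": "clk_leaf", "cap_f": 2e-15, "toggle_rate": 2.00},  # clock: two edges per cycle
    {"name": "ctrl_bus", "cap_f": 8e-15, "toggle_rate": 0.05},
]

def dynamic_power(nets, vdd, freq):
    """Switching power: sum of 0.5 * toggle_rate * C * Vdd^2 * f over all nets."""
    return sum(0.5 * n["toggle_rate"] * n["cap_f"] * vdd**2 * freq for n in nets)

# Static (leakage) power would come from per-cell Liberty leakage tables;
# here it is just a flat assumed figure for the block.
LEAKAGE_W = 1.2e-6

total_w = dynamic_power(nets, VDD, FREQ) + LEAKAGE_W
print(f"dynamic: {dynamic_power(nets, VDD, FREQ) * 1e6:.3f} uW, "
      f"total: {total_w * 1e6:.3f} uW")
```

The accuracy of such an estimate hinges entirely on how well the capacitance and toggle-rate guesses match what synthesis and place-and-route will eventually produce -- which is exactly the correlation problem discussed here.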
While basic estimates are often reasonably accurate in terms of overall power, they don't typically stand up well to close examination. If you want to understand, for example, static versus dynamic power components, contributions by module, or memories versus the clock tree, basic analysis can be significantly off. One way to get significant improvement is to calibrate the SWAG estimates against a fully implemented version of an earlier-generation or similar design. Unsurprisingly, mapping Vth mix, drive strengths, capacitance models, and clock trees by clock domain can significantly tighten up estimates at the detail level.
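In its simplest form, that calibration can be thought of as per-component correction factors taken from the prior design. The sketch below is a hypothetical illustration -- the component breakdown and all factors are invented, and real calibration is far richer (per clock domain, per Vth class, per module):

```python
# Hypothetical calibration of SWAG component estimates against a
# previously implemented, similar design.

# Raw RTL-level estimates per component (watts), before calibration.
raw = {"logic": 0.40, "clock_tree": 0.25, "memories": 0.30, "leakage": 0.05}

# Correction factors derived from the prior design:
# (gate-level measured power) / (RTL estimate) for each category.
calibration = {"logic": 0.90, "clock_tree": 1.35, "memories": 1.05, "leakage": 1.60}

# Apply the factors component by component.
calibrated = {k: raw[k] * calibration[k] for k in raw}
print(calibrated)
```

The point of the exercise is that while the raw total may already be close, the component split (here, clock tree and leakage) can be badly skewed until it is corrected against real implementation data.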
But what if you are working on a new design and you don't have prior examples to guide calibration? Or what if you want to optimize for power at the micro-architectural level -- for example, splitting high-fanout nets and pipelining? To usefully guide decisions in these cases, the estimation tool has to emulate a real implementation flow more closely. In turn, this means close correlation on cell selection module by module -- not just threshold mix, but also DesignWare selections. It also means close correlation on drive strengths, interconnect capacitances, and clock tree implementation, all of which require some form of physical prototyping. The trick here is to get a reasonable level of accuracy faster than a full implementation cycle allows, so you can run through multiple design-measure-debug cycles in one day. Some approximations can be made to help achieve this speed, but your vendor still needs to provide a credible case that they can achieve reasonable correlation with the real implementation flow.
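As one example of the micro-architectural trade-offs mentioned above, splitting a high-fanout net reduces the capacitive load each driver sees, at the cost of extra cells (and, if pipelined, latency). The sketch below uses assumed pin and wire capacitances -- real numbers depend on the library and on physical prototyping of the actual routes:

```python
# Hypothetical back-of-envelope for splitting a high-fanout net.
# Lower per-driver load permits smaller drivers and less dynamic
# power per transition; the capacitance figures below are assumed.

PIN_CAP_F = 1.5e-15           # input-pin capacitance per sink (assumed)
WIRE_CAP_PER_SINK_F = 0.8e-15 # crude wire-cap estimate per sink (assumed)

def net_load(fanout):
    """Total capacitive load on a single driver for a given fanout."""
    return fanout * (PIN_CAP_F + WIRE_CAP_PER_SINK_F)

before = net_load(64)  # one driver drives all 64 sinks
after = net_load(32)   # after splitting into two 32-sink nets
print(f"per-driver load: {before * 1e15:.1f} fF -> {after * 1e15:.1f} fF")
```

Whether the split is a net win depends on the power of the added buffers or registers versus the savings from downsized drivers -- exactly the kind of question RTL estimation has to answer credibly to be useful here.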
Getting to really useful RTL power estimation is hard work. We are constantly refining correlation at the component level (static versus dynamic), at the module level, and at the detailed architectural level. If you have additional ideas or feedback on the limitations in RTL power estimation, I'd be very interested to hear them.
Bernard Murphy
CTO
Atrenta Inc.