Are benchmarks the key to next gen DSP design?

2013-8-22 19:11

Until recently, most real-world signal processing in embedded designs employed fixed-point arithmetic because it offered significant speed and cost benefits due to reduced hardware complexity, as well as faster time to market and lower development costs.


Floating-point DSP was used only where capabilities such as wider dynamic range and greater precision were required: in military/aerospace designs, medical systems, industrial robotics, and the imaging systems used in factory automation.


Even there, developers had to weigh the lower hardware cost of fixed point against floating point, which required more expensive hardware but, over the long term, reduced the cost and complexity of the software needed to compensate for fixed point's limitations.
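To make that trade-off concrete, here is a minimal sketch in C (my own illustration, not from any particular DSP library) contrasting a hand-scaled Q15 fixed-point multiply with its floating-point equivalent. The explicit rescaling and saturation logic is exactly the software burden that floating point removes:

#include <stdint.h>

/* Q15 fixed-point multiply with saturation: the kind of scaling and
   overflow handling a developer must write (and verify) by hand when
   a design avoids floating-point hardware. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;   /* rescale Q30 -> Q15 */
    if (p >  32767) p =  32767;                    /* saturate on overflow */
    if (p < -32768) p = -32768;
    return (int16_t)p;
}

/* The floating-point equivalent needs no manual scaling at all. */
static float fp_mul(float a, float b) { return a * b; }

Multiply that bookkeeping across every accumulator, filter tap, and intermediate result in a signal chain and the long-term development cost of fixed point becomes clear.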


But a range of applications is opening up beyond traditional embedded DSP markets, driven in part by the demands of many consumer apps for more multimedia, more and better audio, and more sophisticated vision-based user interfaces.


At the same time, multicore processors have emerged with more power and flexibility than traditional single-core designs. These are not just homogeneous collections of identical general-purpose cores, but heterogeneous creations that mix general-purpose and DSP cores capable of handling more sophisticated floating-point implementations with the precision needed. And with competition and economies of scale at work, the cost of such powerful platforms is falling fast.


It is no longer a case of choosing between a lower-cost, faster fixed-point implementation and a more complex, expensive floating-point design. Rather, it is now an issue of software and algorithm development. While powerful tools and methodologies have emerged to make floating-point designs easier to create, the real problem has been coming up with benchmark measurements the developer can rely on, unclouded by uncertainties about compiler efficiency and about whether the benchmarks truly reflect a specific design.


Benchmarks are particularly important in both general-purpose CPU and DSP design, according to Luther Johnson, Principal Compiler Engineer in Microchip Technology's Development Systems Division, because they give both processor designers and users the ability to measure critical parameters and make trade-offs.


"A benchmark's ability to extract the key algorithms of an application and acquire information on the performance-sensitive aspects of that design is critical to many embedded designs, as it relieves developers of having to run either in a simulator or later when hardware prototypes are available," he says. "Running a much smaller benchmark snippet on a cycle-accurate simulator can give clues on how to improve performance earlier. "


The problem is the reliability of the benchmarks used. Just as Dhrystone is a problematic measuring stick for integer and string operations in traditional MPU designs, the bottleneck in floating-point algorithm development is the reliability and accuracy of the traditional benchmarks: Whetstone, Linpack, and the Livermore Loops.


The first two, said Johnson, are synthetic benchmarks, one written to measure the performance of linear algebra on supercomputers and the other to measure common operations on general-purpose computers. The third, the Livermore Loops, is natural, derived from what its developers considered common applications, in this case what researchers at the Lawrence Livermore National Laboratory needed. "Nothing has been available that truly reflects the application environments that embedded developers face," he said.
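For readers who have not met them, the first of the Livermore Loops, the so-called hydro fragment, gives the flavor of these "natural" kernels. The C rendering below is a sketch of that well-known loop, not code taken from any benchmark distribution:

/* Livermore Loops Kernel 1 ("hydro fragment"), extracted from LLNL
   production codes. Caller must supply z with at least n + 11
   elements. */
void hydro_fragment(int n, double q, double r, double t,
                    double *x, const double *y, const double *z)
{
    for (int k = 0; k < n; k++)
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11]);
}

Representative of supercomputer physics codes, yes; representative of an embedded audio or vision pipeline, much less so, which is Johnson's point.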


Never fear: the Embedded Microprocessor Benchmark Consortium (EEMBC) is here. Just as it did for traditional processor and multiprocessor evaluation metrics with its CoreMark benchmarking suite, it has now done for floating point with its just-released FPMark.


Where CoreMark takes a common set of operations typical of most embedded designs and builds a suite of measurement algorithms around them, FPMark does the same for floating point. According to EEMBC president Markus Levy, FPMark contains single-precision (32-bit) and double-precision (64-bit) workloads, a mixture of small to large data sets, algorithms common in many embedded designs (such as the Fast Fourier Transform, linear algebra, ArcTan, Fourier coefficients, Horner's method, and Black-Scholes), and complex algorithms such as neural network routines, ray tracers, and a modified version of the Livermore Loops.
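As one example of the kind of workload on that list, here is a sketch of Horner's method in both precisions; this is illustrative code of my own, not FPMark's actual implementation:

/* Horner's method: evaluate a polynomial, coefficients given highest
   degree first, using one multiply-add per term. */
float horner_f32(const float *coeff, int n, float x)
{
    float acc = coeff[0];
    for (int i = 1; i < n; i++)
        acc = acc * x + coeff[i];
    return acc;
}

/* The double-precision twin. FPMark scores 32- and 64-bit workloads
   separately, since their hardware cost can differ sharply. */
double horner_f64(const double *coeff, int n, double x)
{
    double acc = coeff[0];
    for (int i = 1; i < n; i++)
        acc = acc * x + coeff[i];
    return acc;
}

Running the same kernel at both precisions is precisely how a suite like this exposes the gap between a core's single- and double-precision floating-point units.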


These algorithms are old friends to most of us in embedded design, when they are not our enemies in a truculent or difficult project. It will be interesting to see how useful these benchmarks prove to embedded developers making use of floating-point operations. I look forward to hearing from you, in blogs and design articles, about your experiences.


 
