Digital Signal Processing Reference
will be exposed. In the next step, the movement of data between memories and other
storage devices will be modeled using program code or DMA transactions.
The firmware has runtime constraints when processing real-time data, especially
streaming data. The runtime cost can be estimated by profiling the source code.
Additional runtime is consumed by the subroutines added to handle finite data
precision, extra memory transactions, and task management. If the total runtime is
shorter than the interval between new data arrivals, the system implementation is
feasible. Otherwise, it is necessary to enhance hardware performance and select
lower-cost algorithms.
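The feasibility test described above can be sketched as a cycle budget check. This is a minimal illustration, not the book's procedure: the function names, the 20 MHz clock, and the cycle counts are all hypothetical values chosen for the example, and real figures would come from profiling.

```python
def cycles_available_per_sample(clock_hz, sample_rate_hz):
    """Clock cycles that elapse between two consecutive data arrivals."""
    return clock_hz / sample_rate_hz


def is_feasible(kernel_cycles, overhead_cycles, clock_hz, sample_rate_hz):
    """True if the total runtime per sample is shorter than the arrival interval.

    overhead_cycles stands for the extra cost of finite-precision handling,
    extra memory transactions, and task management.
    """
    total_cycles = kernel_cycles + overhead_cycles
    return total_cycles < cycles_available_per_sample(clock_hz, sample_rate_hz)


# Hypothetical numbers: 20 MHz clock, 8 kHz sample rate -> 2500 cycles per sample.
budget = cycles_available_per_sample(20_000_000, 8_000)
fits = is_feasible(2000, 300, 20_000_000, 8_000)       # 2300 cycles: feasible
too_slow = is_feasible(2000, 600, 20_000_000, 8_000)   # 2600 cycles: not feasible
```

When the check fails, the two remedies named in the text correspond to raising the budget (faster hardware) or lowering the cycle counts (cheaper algorithms).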
Finally, the assembly code is generated and its functionality is verified. The
assembly code can be released as firmware when its final runtime meets the
specification and the precision of the results is acceptable.
8.2 Benchmarking and Instruction Usage Profiling
Benchmarking is, by definition, a measurement of performance. In this
context, benchmarking is used to measure the performance of the instruction
set of an ASIP DSP. In general, benchmarking measures the runtime (the
number of clock cycles) of a defined task. Other measurements can also be
important, for example memory usage and power consumption. Memory usage can
be further divided into the program memory and the data memory required by
a benchmark. DSP benchmarking is usually conducted by running the assembly
benchmark code on the instruction set simulator of the ASIP.
DSP benchmarking can be further divided into the benchmarking of DSP
algorithm kernels and the benchmarking of DSP applications. By benchmarking
DSP kernels, the 10% of the code that takes 90% of the runtime, the essential
performance and cost can be estimated. When benchmarking kernel algorithms,
assembly code for kernels such as filters (FIR, IIR, and LMS (least mean square)
adaptive filters), transforms (FFT and DCT), matrix computations, and function
solvers (1/x, for example) is executed on the instruction set simulator of the
processor. Benchmarking an application runs the application code on the assembly
instruction set simulator of the processor.
The best way to evaluate a processor is to run applications, because all overheads
can be taken into account. However, the coding cost for an entire application can be
very high. Application benchmarking should therefore be conducted only on the most
cost-intensive part of the application.
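To make the idea of a kernel concrete, the following is a minimal sketch of a direct-form FIR filter that counts multiply-accumulate (MAC) operations. The MAC count is only a rough proxy for cost and is an assumption of this example. An actual benchmark measures clock cycles by running assembly code on the instruction set simulator. The function name and the test signal are hypothetical.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter; returns (outputs, mac_count)."""
    history = [0.0] * len(taps)   # delay line, newest sample first
    outputs = []
    mac_count = 0
    for x in samples:
        history = [x] + history[:-1]      # shift the new sample into the delay line
        acc = 0.0
        for h, s in zip(taps, history):
            acc += h * s                  # one multiply-accumulate per tap
            mac_count += 1
        outputs.append(acc)
    return outputs, mac_count


# 16-tap moving-average filter applied to 32 samples of a constant signal.
taps = [1.0 / 16] * 16
outputs, macs = fir_filter([1.0] * 32, taps)
macs_per_sample = macs // 32
```

Here every output sample costs one MAC per tap, so the kernel's inner loop dominates the work, which is why such kernels are benchmarked first.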
To benchmark a DSP instruction set, two kinds of cycle cost measurement are
frequently used. One is the cycle cost per algorithm per data sample, for example,
“30 clock cycles are used to process a data sample by a 16-tap FIR filter”. The other
is the data throughput of an application firmware per megahertz, for example,
“In one million clock cycles (1 MHz), up to 500 voice samples
can be processed in a 2048-tap acoustic echo canceller”. If the voice sampling rate
is 8 kHz (it usually is), a computing cost of 16 MHz is required in this case
(8000 samples per second divided by 500 samples per million clock cycles).
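The conversion between the two measurements can be written out as a short calculation. This is a sketch using the figures quoted in the text. The function names are hypothetical.

```python
def cycles_per_sample(samples_per_million_cycles):
    """Cycle cost per sample, given throughput per one million clock cycles."""
    return 1_000_000 / samples_per_million_cycles


def required_mhz(sample_rate_hz, samples_per_million_cycles):
    """Computing cost in MHz needed to keep up with the given sample rate."""
    return sample_rate_hz / samples_per_million_cycles


# Acoustic echo canceller example: 500 samples per million cycles at 8 kHz.
aec_cycles = cycles_per_sample(500)    # 2000 cycles per voice sample
aec_mhz = required_mhz(8_000, 500)     # 16 MHz
```

The same functions apply to any throughput figure stated per megahertz. Only the sample rate and the measured throughput change.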