Application Specific Instruction Set DSP Processors - Signal Processing Systems

Digital Signal Processing Reference

will be exposed. In the next step, moving data between memories and other storage

devices will be modeled using program code or DMA transactions.

The firmware has runtime constraints when processing real-time data, especially

streaming data. The runtime cost can be estimated by profiling the source code.

Extra runtime is consumed by the added subroutines that handle finite data

precision, extra memory transactions, and task management. If the total runtime is

less than the interval between new data arrivals, the system implementation is feasible. Otherwise,

hardware performance must be enhanced or lower-cost algorithms must be selected.
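
The feasibility test described above can be sketched as a small cycle-budget check. This is only an illustration under stated assumptions: the function name, the overhead categories as dictionary keys, and all cycle counts, the clock frequency, and the sample rate are hypothetical values, not figures from the text.

```python
def check_feasibility(kernel_cycles, overhead_cycles, clock_hz, sample_rate_hz):
    """Compare the total runtime per sample against the sample arrival interval.

    All quantities are expressed in clock cycles per sample.
    """
    # Cycles available before the next sample arrives.
    budget = clock_hz / sample_rate_hz
    # Profiled algorithm cost plus the extra firmware costs.
    total = kernel_cycles + sum(overhead_cycles.values())
    return total < budget, total, budget


# Hypothetical profiling results (cycles per sample).
overheads = {
    "finite_precision_handling": 200,
    "extra_memory_transactions": 150,
    "task_management": 100,
}

feasible, total, budget = check_feasibility(
    kernel_cycles=1500,
    overhead_cycles=overheads,
    clock_hz=100e6,       # assumed 100 MHz core clock
    sample_rate_hz=48e3,  # assumed 48 kHz input stream
)
print(f"total={total} cycles, budget={budget:.1f} cycles, feasible={feasible}")
```

If the check fails, the two remedies named in the text correspond to raising the budget (faster hardware) or lowering the total (cheaper algorithms).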

Finally, the assembly code is generated and its functionality is verified. The

assembly code can be released as firmware once its final runtime meets the

specification and the precision of the results is acceptable.

8.2 Benchmarking and Instruction Usage Profiling

Benchmarking is, by definition, a measurement of performance. In this

context, benchmarking is used to measure the performance of the instruction

set of an ASIP DSP. In general, a benchmark measures the runtime (the

number of clock cycles) of a defined task. Other measurements can also be

important, for example memory usage and power consumption. Memory usage can

be further divided into the program memory and the data memory required by

a benchmark. DSP benchmarking is usually conducted by running the assembly

benchmark code on the instruction set simulator of the ASIP.

DSP benchmarking can be further divided into the benchmarking of DSP

algorithm kernels and the benchmarking of DSP applications. By benchmarking

DSP kernels, the roughly 10% of the code that takes 90% of the runtime, the essential performance and cost

can be estimated. When benchmarking kernel algorithms, kernel assembly code

such as filters (FIR, IIR, and the LMS (least mean square) adaptive filter), transforms (FFT

and DCT), matrix computations, and function solvers (1/x, for example) is executed

on the instruction set simulator of the processor. Benchmarking an application

runs the application code on the instruction set simulator of the processor.

The best way to evaluate a processor is to run applications, because all overheads

are then taken into account. However, the coding cost for an entire application can be

very high, so application benchmarking should be conducted only on

the cost-intensive part of the application.
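
One way to locate the kernels worth benchmarking is to rank profiled routines by cycle count and keep the smallest set that covers about 90% of the runtime. The sketch below assumes a per-routine cycle profile is already available; the routine names and cycle counts are made up for illustration.

```python
def dominant_kernels(profile, coverage=0.90):
    """Return the routines that together account for `coverage` of total cycles.

    `profile` maps a routine name to its profiled cycle count.
    """
    total = sum(profile.values())
    selected = []
    accumulated = 0
    # Visit routines from most to least expensive.
    for name, cycles in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        accumulated += cycles
        if accumulated >= coverage * total:
            break
    return selected


# Hypothetical profile of an application (cycles per frame).
profile = {
    "fir_filter": 6000,
    "fft": 3200,
    "control": 300,
    "io": 250,
    "init": 150,
    "misc": 100,
}

print(dominant_kernels(profile))  # ['fir_filter', 'fft']
```

In this made-up profile, two of six routines cover 92% of the cycles, so they are the natural candidates for kernel benchmarking on the instruction set simulator.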

To benchmark a DSP instruction set, two kinds of cycle cost measurement are

frequently used. One is the cycle cost per algorithm per data sample, for example,

“30 clock cycles are used to process one data sample by a 16-tap FIR filter”. The other

is the data throughput of application firmware per

megahertz, for example, “In one million clock cycles (1 MHz), up to 500 voice samples

can be processed in a 2048-tap acoustic echo canceller”. That is 2000 cycles per sample, so if the voice sampling rate

is 8 kHz (it usually is), a computing capacity of 16 MHz is required in this case.
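
The arithmetic behind the echo canceller example can be written out as a short calculation. The function names are illustrative only; the input figures (500 samples per million cycles, 8 kHz sampling) are the ones quoted in the text.

```python
def cycles_per_sample(samples_per_million_cycles):
    """Convert a throughput-per-MHz figure into a cycle cost per sample."""
    return 1_000_000 / samples_per_million_cycles


def required_mhz(samples_per_million_cycles, sample_rate_hz):
    """Clock frequency (in MHz) needed to sustain the given sample rate."""
    cycles_per_second = cycles_per_sample(samples_per_million_cycles) * sample_rate_hz
    return cycles_per_second / 1e6


# Figures from the text: 500 voice samples per million cycles, 8 kHz sampling.
print(cycles_per_sample(500))   # 2000.0 cycles per sample
print(required_mhz(500, 8000))  # 16.0 MHz
```

The two measurements are thus interchangeable: a throughput per megahertz is the reciprocal of the cycle cost per sample, scaled by one million.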
