Signal Processing Architecture Trends
Ned Bingham
Since the 1960s, three distinct architectures have been used to accelerate computational tasks for DSP systems: Microprocessors, Field Programmable Gate Arrays (FPGA), and Coarse Grained Reconfigurable Arrays (CGRA). Each has variations that optimize for the problem domain through specialization, parallelism, and configurability.
Early DSP history was myopically focused on specialization in Microprocessor architectures, primarily due to limited area on die. The first single-chip DSP, the TMC 0280, was developed in 1978 with a dedicated multiply accumulate (MAC) unit, and dedicated complex operators are a mainstay of DSP architectures to this day. The TMS 32010 adopted the Harvard Architecture in 1982 to satisfy intensive IO bandwidth requirements, and numerous variations appeared shortly thereafter. In 1984, the DSP 32 added floating point arithmetic to deal with data scaling issues, and in 1987 the DSP 56001 found a better solution with saturating fixed-point arithmetic on a wide datapath. The DSP 32 also added register indirect addressing modes to compress memory addresses in the instruction words, and the DSP 56001 implemented circular buffers in memory to optimize delay computations.
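The specializations named above (a multiply accumulate unit, saturating fixed-point arithmetic on a wide datapath, and a circular buffer for the delay line) can be sketched together as a small FIR filter. This is only an illustration under stated assumptions: the Q15 number format, the `saturate_q15` helper, and the `CircularFir` class are hypothetical names chosen for the example and do not describe any particular chip's instruction set.

```python
Q15_MAX = 2**15 - 1   # largest signed 16-bit value
Q15_MIN = -2**15      # smallest signed 16-bit value


def saturate_q15(value):
    """Clamp an integer to the signed 16-bit range instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, value))


class CircularFir:
    """FIR filter whose delay line is a circular buffer (hypothetical helper)."""

    def __init__(self, coeffs_q15):
        self.coeffs = list(coeffs_q15)
        self.delay = [0] * len(self.coeffs)
        self.head = 0  # index of the most recent sample

    def step(self, sample_q15):
        n = len(self.delay)
        # Overwrite the oldest sample; nothing is shifted in memory.
        self.head = (self.head - 1) % n
        self.delay[self.head] = sample_q15
        acc = 0  # stands in for a wide accumulator; Python ints do not overflow
        for k, coeff in enumerate(self.coeffs):
            # One multiply accumulate per tap, with modulo addressing.
            acc += coeff * self.delay[(self.head + k) % n]
        # Q15 * Q15 yields Q30; shift back to Q15, then saturate once.
        return saturate_q15(acc >> 15)
```

With two taps of 0.5 each (16384 in Q15), feeding a constant full-scale input converges to that input. With two taps near 1.0, the wide accumulator exceeds the 16-bit range and the output clamps to the maximum rather than wrapping to a negative value.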
With shrinking process technology nodes yielding more transistors on die, DSP architectures shifted focus toward parallelism. The TMS320C20 had a pipelined datapath to target data parallelism in 1993. In 1996, the TMS320C8x added multiple cores to optimize task parallelism. Then in 1997, the DSP16xxx introduced a two-lane pipelined Very Long Instruction Word (VLIW) architecture.
In the 2000s, the DSP market saw a fundamental shift. First, Intel introduced DSP extensions for their general purpose processors targeting non-embedded applications in 1999. Second, Xilinx introduced FPGAs to the DSP market with the development of the Xilinx Virtex-II targeting embedded high-performance applications in 2001. While difficult to program, FPGAs are much more flexible, deliver orders of magnitude better performance and energy efficiency, and may be reconfigured in the field. As a result, specialized microprocessor DSP architectures were relegated to embedded low-performance problem domains. Since then, FPGA innovations have focused on application specific operator integration and network optimization, ease of use, embedded and non-embedded system integration, and run-time and partial reconfigurability.
While the dominance of FPGAs has demonstrated that array architectures are the right solution for the problem domain, CGRAs show the potential for significant improvements across the board. Historically, bit-parallel CGRAs have had extremely limited capacity due to routing resource requirements. Digit-serial CGRAs solve the capacity issues by reducing the width of the datapath. However, they also sacrifice configurability in the face of complex timing and control requirements. This has led to a variety of systolic array architectures that accelerate extremely specific computational tasks. Solving these configurability issues could open the door to a diverse set of new capabilities on mobile platforms.
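The trade-off between bit-parallel and digit-serial datapaths can be illustrated with a simple adder model. This is a minimal sketch, not a description of any specific CGRA: the function name `digit_serial_add` and its parameters are assumptions made for the example. A narrower digit needs fewer wires per operand, but the result takes several cycles and a carry must be held between them, which hints at the timing and control complexity mentioned above.

```python
def digit_serial_add(a, b, word_bits=16, digit_bits=4):
    """Add two unsigned words one digit per cycle, least significant first.

    Returns (sum modulo 2**word_bits, carry_out, cycles).
    Setting digit_bits == word_bits models a bit-parallel adder (one cycle).
    """
    if word_bits % digit_bits != 0:
        raise ValueError("word_bits must be a multiple of digit_bits")
    mask = (1 << digit_bits) - 1
    cycles = word_bits // digit_bits
    carry = 0   # state that must persist from one cycle to the next
    result = 0
    for cycle in range(cycles):
        shift = cycle * digit_bits
        digit_a = (a >> shift) & mask
        digit_b = (b >> shift) & mask
        partial = digit_a + digit_b + carry
        result |= (partial & mask) << shift   # keep this cycle's digit
        carry = partial >> digit_bits         # pass overflow to the next cycle
    return result, carry, cycles
```

For example, `digit_serial_add(0x1234, 0x0FCD)` produces the same sum as a 16-bit parallel adder but spends four cycles on a 4-bit datapath, while `digit_bits=16` finishes in a single cycle at four times the datapath width.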
Edward A Lee. Programmable DSP architectures I. ASSP, Volume 5, Issue 4. IEEE, 1988.
Edward A Lee. Programmable DSP architectures II. ASSP, Volume 6, Issue 1. IEEE, 1989.
Edwin J. Tan and Wendi B. Heinzelman. DSP architectures: past, present and futures. Computer Architecture News (SIGARCH), Volume 31, Issue 3, Pages 6-19. ACM, 2003.
Richard Wiggins and Larry Brantingham. Three-Chip System Synthesizes Human Speech. Electronics, Pages 109-116. 1978.
John So. TMS 320-a step forward in digital signal processing. Microprocessors and Microsystems, Volume 7, Issue 10, Pages 451-460. 1983.
R. Kershaw, et al. A programmable digital signal processor with 32b floating point arithmetic. International Solid-State Circuits Conference, Volume 28. IEEE, 1985.
Kevin Kloker. The Motorola DSP56000 digital signal processor. Micro, Volume 6, Issue 06, Pages 29-48. IEEE, 1986.
Jeff Bier. DSP16xxx Targets Communications Apps. Memory, Volume 60, Page 16. 1997.
Karl Guttag. TMS320C8x family architecture and future roadmap. Digital Signal Processing Technology, Volume 2750. SPIE, 1996.
Texas Instruments, Inc. TMS320C2x User's Guide. 1993.
Linley Gwennap. Merced Shows Innovative Design. Microprocessor Report, 13.13. 1999.
Xilinx. Fiscal Year 2001 Form 10-K Annual Report. US Securities and Exchange Commission, 2001.
Artur Podobas, et al. A survey on coarse-grained reconfigurable architectures from a performance perspective. Access, 8. IEEE, 2020.
Masudul Hassan Quraishi, et al. A survey of system architectures and techniques for FPGA virtualization. Transactions on Parallel and Distributed Systems, 32.9. IEEE, 2021.
Song Wu, et al. When FPGA-accelerator meets stream data processing in the edge. 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2019.
Valeria Cardellini, et al. Run-time Adaptation of Data Stream Processing Systems: The State of the Art. Computing Surveys. ACM, 2022.
Mark Wijtvliet, Luc Waeijen, and Henk Corporaal. Coarse grained reconfigurable architectures in the past 25 years: Overview and classification. International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS). IEEE, 2016.
Leibo Liu, et al. A survey of coarse-grained reconfigurable architecture and design: Taxonomy, challenges, and applications. Computing Surveys (CSUR), Volume 52, Issue 6, Pages 1-39. ACM, 2019.