Design of a Pipelined DSP Microprocessor MUN DSP2000

Cheng Li, Lu Xiao, Qiyao Yu, P. Gillard and R. Venkatesan
Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John's, NF, Canada A1B 3X5
E-mail: {licheng, xiao, qiyao, venky}@engr.mun.ca, paul@cs.mun.ca

Abstract

Programmable digital signal processing (DSP) microprocessors are processors designed for digital signal processing-intensive applications. In this paper, we present the design of a simplified DSP microprocessor with a restricted instruction set, MUN DSP2000, which consists of three major components: the control unit, the data-path and the system memory. A Harvard architecture, a pipelined data-path and data forwarding techniques are used to improve system performance and avoid hazards (data hazards, control hazards, etc.). The whole system is coded in VHDL and simulated using Synopsys CAD tools. The system performance is briefly analyzed based on the synthesis results.

Key words: DSP microprocessor, Pipeline, Multiply and Accumulate (MAC), VHDL

1. Introduction

Since commercial DSP microprocessors were first introduced in the early 1980s, DSP technology and DSP microprocessors have gained more and more significance in the current information world. Though DSP microprocessors and general-purpose microprocessors share a number of common features, they have important differences. DSP microprocessors are primarily designed for real-time, high-speed calculation applications [1]. Besides possessing many of the features of a general-purpose microprocessor, a DSP is also characterized by fast multiply-accumulate, a multiple-access memory architecture, specialized addressing modes and specialized execution control [2].

The MUN DSP2000 microprocessor is a 32-bit fixed-point prototype computational engine on which DSP applications can be built. There are 32 32-bit general-purpose registers within the CPU, which are used for most of the operations. Except that R0 is a read-only register that always stores 0 as its value, all the registers are the same. A load and store architecture is used for all instructions except the MAC operation.
Our target is to build a DSP microprocessor that can support basic DSP operations such as the DFT, the FFT, digital filters, and so on. In such DSP applications, the convolution operation is widely used. In a computer system, a DSP application such as a digital filter can be realized by multiplying the current signal value (multiply operand a) by the coefficient of the digital filter (multiply operand b) and then adding the product to the previous sum (add operand c). All of this work should be finished in one step, i.e., in a single system clock cycle. Thus an efficient MAC operation (multiply two operands and then add the product to a third operand) is a key requirement for a DSP processor.

In this project, we aim at building a DSP microprocessor that supports the basic function every DSP processor should support: the multiply and accumulate (MAC) operation. Besides this DSP feature, we also include some functions that a general-purpose processor should support. The instruction set we chose is a subset of the complete instruction set of a general-purpose processor. Restricted data forwarding is implemented to improve our design. Some highlighted features of MUN DSP2000 are:

- Pipelined data-path design
- MAC operation
- Separate instruction and data memory (Harvard memory structure)
- Fast data memory access (two data memory elements can be fetched in a single clock cycle)
- Support for various operations such as arithmetic, logic and control functions

2. MUN DSP2000 control unit design

2.1 Instruction set design

A restricted instruction set is included in the MUN DSP2000 design. It is a subset of that of the MIPS processor [2]. The instructions are all 32 bits long. A total of 16 instructions have been implemented in the design. According to their purposes, the instructions can be divided into five groups:

1. Arithmetic operations: ADD, ADDI, SUB, SUBI, MUL, MAC
2. Branch operations: BNEZ, JMP, JR
3. Logic operations: AND, OR, XOR
4. Memory operations: LW, SW, SWS
5. Other: NOP

A 32-bit instruction is encoded as follows. The most significant 6 bits (bit 31 to bit 26) represent the opcode. The next 5 bits (bit 25 to bit 21) represent the first register number, and the following two 5-bit fields represent the second register (bit 20 to bit 16) and the third register (bit 15 to bit 11) respectively. Not every instruction uses all three registers. The 16 least significant bits are used as the address offset. When a new instruction is fetched from the instruction memory, for example the arithmetic operation ADD R1, R2, R3, it is encoded in the instruction memory in the following format:

MSB                                        LSB
001011  00001  00010  00011  00000 000000
Opcode  R1     R2     R3

When this instruction reaches the controller and the register file, the controller interprets the opcode bits and finds that this is an add operation. The controller outputs the corresponding control signals, which set up the correct path for this operation. The register file outputs the values stored in the corresponding registers. These data are used as the sources for the specified operation.

2.2 System control design

The control unit is the key component for MUN DSP2000 to operate properly. It performs the instruction validation and instruction decoding tasks during the instruction decoding (ID) pipeline stage. System control is achieved by steering a series of multiplexers with the controller outputs. In a pipelined structure, the control signals generated by the control unit are propagated down the pipeline through pipeline registers, synchronously with the system clock.
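As an illustration of the field layout described in Section 2.1, the following Python sketch packs and unpacks a 32-bit instruction word. Only the bit positions and the ADD opcode (001011) are taken from the text; the names encode_r_type and decode are hypothetical helpers introduced here and are not part of the MUN DSP2000 design files.

```python
# Illustrative sketch of the 32-bit instruction layout from Section 2.1.
# Helper names are hypothetical; bit positions and the ADD opcode come from the text.

OPCODE_ADD = 0b001011


def encode_r_type(opcode: int, r1: int, r2: int, r3: int) -> int:
    """Pack a 6-bit opcode and three 5-bit register numbers into one word."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode must fit in 6 bits")
    for reg in (r1, r2, r3):
        if not 0 <= reg < 32:
            raise ValueError("register number must fit in 5 bits")
    return (opcode << 26) | (r1 << 21) | (r2 << 16) | (r3 << 11)


def decode(word: int) -> dict:
    """Split a 32-bit instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "r1": (word >> 21) & 0x1F,      # bits 25-21
        "r2": (word >> 16) & 0x1F,      # bits 20-16
        "r3": (word >> 11) & 0x1F,      # bits 15-11
        "offset": word & 0xFFFF,        # bits 15-0, used by offset-type instructions
    }


word = encode_r_type(OPCODE_ADD, 1, 2, 3)  # ADD R1, R2, R3
print(format(word, "032b"))                # 00101100001000100001100000000000
print(decode(word))
```

The printed bit string matches the example encoding given in the text. Note that the 16-bit offset field overlaps the third register field, so an instruction can use one or the other, in line with the remark that not every instruction uses all three registers.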
Besides the control signals described above, the control unit generates two other types of control signals. One is used for general control purposes, such as the memory read/write control signal, and the other is used to control the ALU operation.

3. MUN DSP2000 data-path design

In this paper, we look only at some key components of the data-path design, including the ALU, the multiplier and the memory. In Section 4, we discuss how all the components are put together to make a pipelined microprocessor.

3.1 ALU design

The arithmetic logic unit (ALU) is an essential part of a computer processor. It performs the arithmetic operations (addition and subtraction) and the logic operations (AND, OR, XOR, etc.). It has two 32-bit data inputs, A and B, and a carry input C. The three control inputs ALUOP0, ALUOP1 and ALUOP2 decide which operation is performed. The outputs include a 32-bit arithmetic/logic result and a 3-bit flag (v: overflow; c: carry; z: zero).

The most time-consuming ALU operations are addition and subtraction. In a ripple adder design, the adder propagates the carry from the lowest bit to the highest sequentially. Thus the most significant bit of the sum must wait for the sequential evaluation of the previous 31 1-bit adders, which is very slow. In theory we can anticipate the carry input without waiting for it to be generated by the previous 1-bit adder. This can be done by applying some calculations to the two operands and to the carry input of the least significant bit of the adder. The delay of this kind of adder is on the order of log2 N, where N is the number of bits in the operands (32 in our case), instead of N as in the usual ripple adder. Therefore, to increase the speed, a fast parallel adder, the Look-Ahead-Carry adder, is used in our design.

3.2 Multiplier design

Multiplication with accumulation is a typical operation in DSP applications. Therefore the hardware multiplier is an important characteristic of a DSP processor. The speed of the multiplier is one of the most important factors determining the overall system performance. Ideally the multiplier should be able to multiply two operands in a single system clock cycle. We can implement the multiplier as a single unit using a purely combinational logic circuit. However, this design is not efficient because the corresponding pipeline stage will have a much longer delay than the other stages. This makes the system clock cycle, which is determined by the worst-case delay, unnecessarily long. The synthesized result for the delay of a 16-bit multiplier is 49 ns, much longer than that of the other components. Therefore it is more attractive to split the multiplier into several pipeline stages. We use 16-4 bit multiplication modules and split the multiplier into 5 stages. The synthesis result shows that the delay of each stage within the multiplier is 19 ns. Although the total time is 5 x 19 = 95 ns, almost twice that of the single unit, the split is advantageous because the clock cycle can be reduced greatly when we integrate the multiplier pipeline into the system pipeline of the microprocessor.

3.3 MUN DSP2000 memory design

Typical DSP operations require many additions and multiplications. In our design, the MAC operation fetches two operands from the data memory, at the addresses held in the two source registers, and performs the multiplication and addition. To fetch the two operands in a single instruction cycle, we need to make two memory accesses simultaneously. The result of the multiplication and addition is stored back in the register holding the addition operand. By doing so, we can accumulate the results of many multiplication operations in this register through consecutive MAC instructions. One MAC instruction needs four register access operations (three reads and one write) besides two data memory reads in one instruction cycle. The other instructions access data memory at most once (e.g., LW, SW). There are two common methods to achieve multiple memory accesses per instruction cycle: the extended Harvard architecture and the modified von Neumann architecture [5].
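The accumulate-in-place behaviour of the MAC instruction described above can be sketched in Python. This is a behavioural model only: the operand order of mac(), the helper addi(), the register numbers used and the unsigned 32-bit wrap-around are assumptions made for illustration, and the two operand reads that the hardware performs simultaneously are simply carried out one after the other here.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit wrap-around of register values


class MacModel:
    """Toy model: register file with R0 fixed at 0, plus a word-addressed data memory."""

    def __init__(self, data_memory, num_regs=32):
        self.mem = list(data_memory)
        self.regs = [0] * num_regs

    def read_reg(self, idx):
        return 0 if idx == 0 else self.regs[idx]  # R0 always reads as 0

    def write_reg(self, idx, value):
        if idx != 0:                               # writes to R0 are ignored
            self.regs[idx] = value & MASK32

    def addi(self, rd, rs, imm):
        self.write_reg(rd, self.read_reg(rs) + imm)

    def mac(self, rc, ra, rb):
        """rc <- rc + mem[ra] * mem[rb]: three register reads, one write, two memory reads."""
        a = self.mem[self.read_reg(ra)]
        b = self.mem[self.read_reg(rb)]
        self.write_reg(rc, self.read_reg(rc) + a * b)


def dot_product(samples, coeffs):
    """Accumulate sum(samples[i] * coeffs[i]) in R3 through consecutive MACs."""
    cpu = MacModel(list(samples) + list(coeffs))
    cpu.addi(1, 0, 0)             # R1 points at the first sample
    cpu.addi(2, 0, len(samples))  # R2 points at the first coefficient
    for _ in range(len(samples)):
        cpu.mac(3, 1, 2)          # R3 holds the running sum
        cpu.addi(1, 1, 1)
        cpu.addi(2, 2, 1)
    return cpu.read_reg(3)


print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
```

Because the result is written back to the register that supplied the addition operand, no extra move or add instruction is needed between successive MACs; only the two address registers have to be advanced.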
Extended Harvard architecture

The normal Harvard architecture has two separate physical memory buses. This allows two simultaneous memory accesses: one for instruction memory and one for data memory. This is inadequate for MAC operations, which involve two operands in data memory. Some DSP Harvard architectures permit the instruction bus to be used for operand access as well. It is often necessary to fetch three words (the instruction plus two operands), and the Harvard architecture is inadequate to support this. Thus DSP Harvard architectures often include a cache memory which can be used to store instructions that will be reused, leaving both Harvard buses free for fetching operands. This extension is sometimes called an extended Harvard architecture.

Modified von Neumann architecture

The von Neumann architecture uses only a single memory bus for both data memory and instruction memory. This is economical and simple to use because instructions or data can be located anywhere throughout the available memory. But it does not permit multiple memory accesses. The modified von Neumann architecture allows multiple memory accesses per instruction cycle by using two separate clocks: one for instructions and the other for data memory. The clock for data memory is (n-1) times faster than that for the instruction cycle. Each instruction cycle is divided into n machine states, and a memory access can be made in each machine state. Consequently, a total of n memory accesses per instruction cycle are allowed.

The data memory architecture of MUN DSP2000 uses the modified Harvard architecture. The data memory is separate from the instruction memory, just as in a normal Harvard architecture. The data memory consists of a master memory and a slave memory. Both the master memory and the slave memory have their own address and data buses. The master memory is used as the common data memory. For all instructions except MAC, data memory accesses are directed to the master memory. The MAC instruction gets the first operand from the master memory and the second operand from the slave memory. Because they own dedicated address and data buses, the two data memories can be accessed simultaneously.
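To see why two data memories with dedicated buses help, the following Python sketch counts the cycles needed for the operand fetch of one MAC instruction. It assumes that each memory can serve at most one access per cycle; that assumption, and the names fetch_cycles and DualDataMemory, are hypothetical and are used only to illustrate the master/slave arrangement.

```python
from collections import Counter


def fetch_cycles(accessed_memories):
    """Cycles needed when every memory serves at most one access per cycle (assumed)."""
    counts = Counter(accessed_memories)
    return max(counts.values(), default=0)


class DualDataMemory:
    """Master and slave data memories, each with its own address space."""

    def __init__(self, size):
        self.master = [0] * size
        self.slave = [0] * size

    def read_pair(self, master_addr, slave_addr):
        # The two reads use separate address and data buses, so neither waits for the other.
        return self.master[master_addr], self.slave[slave_addr]


mem = DualDataMemory(16)
mem.master[5] = 7  # e.g. a signal sample
mem.slave[5] = 3   # e.g. a filter coefficient stored at the same address
print(mem.read_pair(5, 5))                 # (7, 3)
print(fetch_cycles(["master", "slave"]))   # 1
print(fetch_cycles(["master", "master"]))  # 2
```

With both operands in a single one-port memory the fetch would take two cycles, whereas splitting them across the master and slave memories brings it down to one. The example also shows the same address being used in both memories without any clash.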
One advantage of the two separate data memories is that the programmer can use addresses freely, since no address conflict can occur between the two memories.

4. Pipelining and data forwarding

The pipeline structure can be implemented by inserting registers between different stages. With these registers, both the control signals and the intermediate data in one stage are separated from adjacent stages. These registers are triggered by the system clock. Appropriate logic circuits can be added between the inputs and outputs of different pipeline registers to perform data forwarding. In the design of the MUN DSP2000 microprocessor, the data-path is comprised of a pipeline with five stages: Instruction fetch (IF), Instruction decoding (ID), Memory access (MEM), Execution (EX) and Write back (WB) [2, 6]. For the MAC operation, we need to get the memory data before we can proceed. Thus we move the MEM stage in front of the EX stage. Each stage of the pipeline operates independently and is synchronized with the system clock. Each instruction takes five clock cycles to complete, and a new instruction is fetched during each clock cycle. Figure 1 shows the pipeline design diagram of the MUN DSP2000 microprocessor.

5. Performance analysis

Based on the timing specification from the synthesis result of each system building component, as shown in Table 1, we get the execution time for all the instructions depending on the different routes. To implement the pipeline structure of the processor, we divide the total execution time equally among all the pipeline stages. According to the current timing figures, the speed of our DSP processor is around 10 MHz. The instruction cycle T is determined by the maximum of the worst-case delays among all pipeline stages.

Table 1

Component                  Maximal delay (ns)   Location
Program counter            1.3                  Stage 1
Instruction memory (1K)    13.23                Stage 1
[illegible]                1                    Stage 1, 2, 4
MUX2                       1.39                 Stage 1, 2, 3, 4
Register file              22                   Stage 2
Controller                 [?].4                Stage 2
Zero detector              3.11                 Stage 2
MUX4                       2.7                  Stage 2, 3, 4
OR gate                    0.8                  Stage 2
Data memory (1K)           13.23                Stage 3
Multiplier                 49                   Stage 4
ALU                        19                   Stage 4

The table indicates that the multiplier is the critical component for the maximum delay. The main reason for the low speed of our DSP microprocessor is that we put both the multiplier and the ALU into the same pipeline stage. Thus the delay of this pipeline stage limits the speed of the whole system. To improve this, we redesigned the multiplier and the ALU using an internal pipeline structure. The new design performs around 4 times faster than the previous one (around 19 ns for the multiplier). Based on the new design, the speed of our DSP microprocessor will reach around 2[?] MHz. All these results are based on the 0.35um CMOS technology we are currently using.
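The relation between stage delay, instruction cycle and clock rate can be made concrete with a small Python sketch. It uses only delays quoted in the text and in Table 1, and it rests on several assumptions: the largest single component listed for a stage stands in for that stage's delay, the multiplier and ALU act in series within stage 4 of the original design, and multiplexer, register and wiring delays are ignored. The resulting clock rates are therefore optimistic upper bounds, not the reported speeds.

```python
def instruction_cycle_ns(stage_delays_ns):
    """The instruction cycle T is the largest worst-case delay among the stages."""
    return max(stage_delays_ns)


def clock_mhz(cycle_ns):
    """Clock rate in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / cycle_ns


def pipeline_cycles(num_instructions, num_stages=5):
    """Cycles to complete a hazard-free instruction stream in the pipeline."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


# Rough per-stage delays in ns (assumed from Table 1, see the text above):
# instruction memory, register file, data memory, then stage 4.
before = [13.23, 22, 13.23, 49 + 19]  # multiplier and ALU in one stage
after = [13.23, 22, 13.23, 19]        # multiplier split into 19 ns stages

for name, delays in (("before", before), ("after", after)):
    t = instruction_cycle_ns(delays)
    print(name, t, "ns", round(clock_mhz(t), 1), "MHz upper bound")

print(pipeline_cycles(100))  # 104
```

Under these assumptions the combined multiplier and ALU stage dominates the original cycle time. Once the multiplier is pipelined, another stage becomes the limit, which is one reason further balancing of the stages is worthwhile.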
With more advanced CMOS technology such as 0.18um and 0.1um, MUN DSP2000 can achieve much higher speed and performance [6].

6. Conclusion and future work

The whole system is described and implemented in VHDL. Both the basic components and the complete pipelined microprocessor are simulated and tested using the Synopsys Design Compiler. Each system building component is synthesized using the Synopsys Design Analyzer and is proven to work properly. Timing specifications from the simulation and synthesis results are considered and used to decide the system structure and the pipeline design. Based on the design and testing results of our implementation, we conclude that MUN DSP2000 is a simple and practical prototype of a DSP processor. It performs normal DSP applications efficiently and reliably. Because of design time limitations, some attractive features of DSP processors were not included in our design at this time. In the future, data forwarding can be further improved and the number of pipeline stages can be further extended. The performance of the MUN DSP2000 microprocessor will then be further improved.

References

1. P. Lapsley et al., "DSP Processor Fundamentals", IEEE Press, 199[?].
2. J. L. Hennessy and D. A. Patterson, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann Publishers, 1994.
3. Synopsys, Design Compiler Reference Manual, V3.2, 1992 (online documentation).
4. Z. Navabi, VHDL: Analysis and Modeling of Digital Systems, McGraw-Hill, 1998.
5. Steve Heath, Microprocessor Architectures and Systems: RISC, CISC & DSP, Newnes, 1991.
6. Cheng Li, Lu Xiao, Qiyao Yu, MUN DSP2000 Processor Design, course project report for Advanced Digital System, Memorial University, Aug. 2000.

Figure 1. MUN DSP2000 system diagram (instruction memory, controller, register file, sign extend, data memory, forwarding unit, multiplier and ALU, separated by the IF/ID, ID/MEM, MEM/EX and EX/WB pipeline registers).