PIPE_DIV
Pipelined Divider with generic width
Rev. 1.2

Key Design Features

- Synthesizable, technology-independent VHDL core
- Function y = a / b
- Input values as signed or unsigned integers
- Output values as signed or unsigned integers
- Configurable numerator and denominator data width
- High-speed, fully pipelined architecture with a configurable number of register stages for area/speed trade-off
- Quotient, remainder and div_zero outputs
- One output result per clock cycle (i.e. pipelined operation)
- Capable of clock speeds of 400 MHz+ on even basic FPGA platforms

Block Diagram

Figure 1: Pipelined Divider Architecture

Applications

- Fundamental building block in digital processing functions
- Division of integers and fixed-point numbers [1]
- Implementation of the reciprocal function f(x) = 1/x

[1] For fixed-point numbers, the inputs must be pre-scaled by a power of 2. E.g. the division 0.2/0.3 could be done as 51/77 in 8-bit arithmetic.

General Description

PIPE_DIV (Figure 1) is a pipelined divider with configurable data width. The design is fully scalable and modular, permitting the user to specify large dividers without compromising maximum attainable clock speed.

The divider accepts input values as either signed or unsigned integers depending on the generic setting use_signed. An n-bit numerator and denominator will generate an n-bit result for the quotient and an n-bit result for the remainder. The output remainder always takes the sign of the numerator and is determined by the formula:

    Rmd = Num - (Quo × Den)

where Rmd, Num, Quo and Den represent the remainder, numerator, quotient and denominator respectively.

In the case of a divide by zero, the div_zero flag is asserted at the divider output and the maximum value possible is returned in the quotient. The remainder takes the value of the numerator. For example, if dw = 8, the division -3/0 will return a result of -128 for the quotient and -3 for the remainder. The division 3/0 would return a remainder of 3 and a quotient of 127 for signed arithmetic or 255 for unsigned.

Values are sampled on the rising clock edge of clk when en is high.

The number of register stages in the pipeline may be modified in order to trade off maximum speed against the total resources used. The overall pipeline latency of the divider is given by the formula:

    Latency = dw / reg_stages    (dw 2 = 0)

For example, a 24-bit divider with the number of register stages set to 3 will result in a circuit with 8 clock cycles of latency. In other words, the result of a division will take 8 clock cycles to appear at the output. Note that while the latency may change depending on the implementation, the throughput is always maintained at one output result per clock.

Pin-out Description

Pin name             I/O   Description              Active state
clk                  in    Synchronous clock        rising edge
en                   in    Clock enable             high
a_in [dw-1:0]        in    Input numerator data
b_in [dw-1:0]        in    Input denominator data
quotient [dw-1:0]    out   Output quotient data
remainder [dw-1:0]   out   Output remainder data
div_zero             out   Divide by zero flag      high

Generic Parameters

Generic name   Description                          Type      Valid range
dw             Input data width                     integer   ≥ 2
use_signed     Use signed or unsigned arithmetic    boolean   TRUE/FALSE
reg_stages     Number of pipeline register stages   integer   ≥ 1, ≤ dw

Functional Timing

Figure 2 demonstrates two sequential calculations of 10/-3 and -5/0. In this example, the parameters have been set to dw = 4, use_signed = true, reg_stages = 1. The result has a latency of 4 clock cycles.

Figure 2: Calculation of a/b

Source File Description

All source files are provided as text files coded in VHDL. The following table gives a brief description of each file.

Source file             Description
pipe_div_shiftsub.vhd   Shift-subtract block
pipe_div.vhd            Top-level block
pipe_div_bench.vhd      Top-level test bench

Functional Testing

An example VHDL testbench is provided for use in a suitable VHDL simulator. The compilation order of the source code is as follows:

1. pipe_div_shiftsub.vhd
2. pipe_div.vhd
3. pipe_div_bench.vhd

The VHDL testbench instantiates the divider component and the user may modify the generic parameters as required. The simulation must be run for at least 2 ms, during which time the divider will be driven with a randomized sequence of input values. The test terminates automatically.

The simulation generates two text files called pipe_div_in.txt and pipe_div_out.txt. These files respectively contain the input and output data samples captured at the interfaces during the test.

Figure 3 shows the results of the divider used to implement the function f(x) = 1/x with the generic parameter dw = 16. Results are shown for the first 100 samples.

Figure 3: Plot of test results for function: f(x) = 1/x

Synthesis

The source files required for synthesis and the design hierarchy are shown below:

pipe_div.vhd
    pipe_div_shiftsub.vhd

The VHDL core is designed to be technology independent. However, as a benchmark, synthesis results have been provided for the Xilinx Virtex 5 and the Altera Stratix III series of FPGA devices. The lowest and highest speed grade devices have been chosen in both cases for comparison.

Note that the generic parameter reg_stages will have a significant effect on the speed and area of the synthesized design. For the fastest possible design, reg_stages should be set to 1. For the smallest design, reg_stages should be set equal to the data width, dw. In addition, choosing unsigned logic will result in a design with a slightly smaller area.

Trial synthesis results are shown with the generic parameters set to: dw = 16, use_signed = true, reg_stages = 1. Resource usage is specified after Place and Route.

VIRTEX 5
Resource type                  Quantity used
Slice register                 801
Slice LUT                      704
Block RAM                      0
DSP48                          0
Clock frequency (worst case)   385 MHz
Clock frequency (best case)    495 MHz

Copyright 2011 www.zipcores.com
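The quotient, remainder and divide-by-zero rules given in the General Description can be checked against a small software model. The following is a minimal behavioural sketch in Python, not the VHDL implementation; the function name `divide` and its argument order are hypothetical. The handling of 0/0 is an assumption, and the signed overflow case (most negative value divided by -1) is not covered by the datasheet, so the sketch does not define it.

```python
def divide(a: int, b: int, dw: int = 8, use_signed: bool = True):
    """Behavioural sketch of one division.

    Returns (quotient, remainder, div_zero). Hypothetical helper, not part
    of the core's deliverables.
    """
    if b == 0:
        if use_signed:
            # Saturate in the direction of the numerator's sign, matching the
            # -3/0 -> -128 and 3/0 -> 127 examples for dw = 8.
            # Assumption: a numerator of 0 is treated as non-negative.
            quo = -(1 << (dw - 1)) if a < 0 else (1 << (dw - 1)) - 1
        else:
            quo = (1 << dw) - 1
        # The remainder takes the value of the numerator.
        return quo, a, True

    # Truncate towards zero so the remainder takes the sign of the numerator.
    quo = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quo = -quo
    rem = a - quo * b  # Rmd = Num - (Quo x Den)
    return quo, rem, False


if __name__ == "__main__":
    print(divide(-3, 0, dw=8, use_signed=True))
    print(divide(3, 0, dw=8, use_signed=False))
    print(divide(-7, 3))
```

A model like this could be compared against the samples captured in the testbench output file, provided the file format is known.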
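The latency relationship Latency = dw / reg_stages, together with the statement that throughput stays at one result per clock, can be illustrated with a simple delay-line model. This Python sketch models timing only, under the assumption that dw is an exact multiple of reg_stages; the names `latency` and `DelayLine` are hypothetical.

```python
from collections import deque


def latency(dw: int, reg_stages: int) -> int:
    """Pipeline latency in clock cycles: dw / reg_stages."""
    return dw // reg_stages


class DelayLine:
    """Timing-only model: one value enters and one value leaves per clock."""

    def __init__(self, dw: int, reg_stages: int):
        # Pre-fill with None so nothing valid appears before the latency elapses.
        self.pipe = deque([None] * latency(dw, reg_stages))

    def clock(self, value):
        """Advance one enabled clock cycle and return the value leaving the pipe."""
        self.pipe.append(value)
        return self.pipe.popleft()


if __name__ == "__main__":
    line = DelayLine(dw=24, reg_stages=3)
    for cycle in range(12):
        print(cycle, line.clock(f"result{cycle}"))
```

With dw = 24 and reg_stages = 3, the value presented on cycle 0 leaves the model on cycle 8, and a new value follows on every cycle after that.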
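The fixed-point footnote (0.2/0.3 computed as 51/77) can be worked through numerically. The Python sketch below scales both operands by 2^8 as in the footnote. Because the integer quotient of 51/77 is 0, it also pre-shifts the numerator by a further 8 bits to obtain fractional quotient bits; that extra shift, and the wider divider it implies, are assumptions of this sketch rather than something the datasheet specifies. The helper names are hypothetical.

```python
FRAC_BITS = 8  # power-of-2 pre-scale used in the footnote example


def to_fixed(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Scale a real value by 2**frac_bits and round to the nearest integer."""
    return round(x * (1 << frac_bits))


def fixed_divide(x: float, y: float, frac_bits: int = FRAC_BITS) -> float:
    """Approximate x / y using integer division only."""
    num = to_fixed(x, frac_bits)
    den = to_fixed(y, frac_bits)
    # Assumption: shift the numerator left so the integer quotient carries
    # frac_bits fractional bits (needs a divider wider than 8 bits).
    quo = (num << frac_bits) // den
    return quo / (1 << frac_bits)


if __name__ == "__main__":
    print(to_fixed(0.2), to_fixed(0.3))
    print(fixed_divide(0.2, 0.3))
```

The result is accurate to roughly the resolution set by the pre-scale, so more fractional bits give a closer approximation at the cost of a wider divider.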
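The source file table names a shift-subtract block but does not describe its internals. As a rough illustration only, the Python sketch below shows the textbook unsigned shift-subtract (restoring) division, which produces one quotient bit per iteration over dw iterations. It is an assumption that the core follows this scheme; the function name is hypothetical and signed handling is omitted.

```python
def shift_subtract_divide(num: int, den: int, dw: int):
    """Unsigned restoring division; returns (quotient, remainder).

    Textbook algorithm, assumed here for illustration only.
    """
    if not (0 <= num < (1 << dw)) or not (0 < den < (1 << dw)):
        raise ValueError("operands must fit in dw bits and den must be non-zero")

    rem = 0
    quo = 0
    for i in range(dw - 1, -1, -1):
        # Shift the next numerator bit (MSB first) into the partial remainder.
        rem = (rem << 1) | ((num >> i) & 1)
        quo <<= 1
        # Trial subtraction: keep it only if the result is non-negative.
        if rem >= den:
            rem -= den
            quo |= 1
    return quo, rem


if __name__ == "__main__":
    print(shift_subtract_divide(200, 7, dw=8))
```

Under this assumption, placing a register after every reg_stages iterations would give the dw / reg_stages latency quoted in the General Description.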