Analysis of Periodic Functions and HDL Synthesis Report - Prof. Marek A. Perkowski, Exams of Electrical and Electronics Engineering

The analysis of a periodic function with a given period and frequency, and provides the mathematical expressions for the sum of its sinusoidal and cosinusoidal components. Additionally, it includes an hdl synthesis report for a digital design, detailing the macro statistics, optimization results, and device utilization summary.

Typology: Exams

Pre 2010

Uploaded on 08/18/2009

FINAL REPORT on IMAGE MATCHING PROCESSOR on Xilinx FPGA
As Part of Project Work For ECE 573
By Satyanarayana Nekkalapu, Graduate Student, ID #: 947919900
Instructor: Dr. Marek Perkowski, Professor, ECE Department, Portland State University
Winter 2007

Goal:
The goal of this project is to design an image matching processor, describe it in HDL (Verilog), and synthesize, implement and test it on a Xilinx FPGA. The processor reads two images, one of which is a subset of the other, and locates the subset within the superset image.

1. Introduction and Approach:
Image matching is the fundamental basis for many problems in computer vision. In object recognition, images in the object library are compared with the image under test. In panorama mosaicking, global registration is performed to determine the relative projective mapping between different images. Other applications include location recognition in robot navigation, content-based image retrieval, 3D reconstruction, stereo image matching and motion tracking.

This project discusses one method of implementing an image matching processor. The hardware model converts the images to the frequency domain by performing an FFT (Fast Fourier Transform), which reduces the convolution to an element-by-element matrix product. The resulting product matrix goes through the IFFT (Inverse Fast Fourier Transform). The matrix obtained from the IFFT is scanned for the highest intensity, which indicates the point of highest correlation between the two images.

1.1 Introduction to Signals:
A signal can be defined as a set of information or data. In general, a signal can be a function of one or more variables.

Example:
Speech: Speech is an audio signal that varies with time. This is an example of a one-dimensional signal, as the signal is a function of only one variable, time.
The time, here, varies continuously. So speech can be represented as a function of time t, i.e. (speech signal) F = f(t).

Picture: A picture is a two-dimensional signal that depends on the spatial coordinates (x and y) of the picture plane. A picture is, in general, continuous in the x and y directions, and so is the intensity of the color at each point. So a picture can be represented as a function of the coordinates x and y, i.e. (picture) P = f(x, y).

1.1.1 Classification of signals:
There are a number of ways in which signals are classified, but in general they are classified into

As we let $\Delta$ approach zero, the approximation $\tilde{x}(t)$ becomes better and better, and in the limit it equals the original signal. Therefore,

$$x(t) = \lim_{\Delta \to 0} \sum_{k=-\infty}^{\infty} x(k\Delta)\,\delta_\Delta(t - k\Delta)\,\Delta$$

Also, as $\Delta \to 0$, the summation approaches an integral, and the pulse $\delta_\Delta$ approaches the unit impulse $\delta$:

$$x(t) = \int_{-\infty}^{\infty} x(s)\,\delta(t - s)\,ds \qquad \text{(Eq. 1)}$$

In other words, we can represent any signal as an infinite sum of shifted and scaled unit impulses.

1.1.4. Linear Systems
A system maps an input signal $x(t)$ to an output signal $y(t) = T[x(t)]$, where $T$ denotes the transform, a function from input signals to output signals. Systems come in a wide variety of types. One important class is known as linear systems. To see whether a system is linear, we need to test whether it obeys certain rules that all linear systems obey. The two basic tests of linearity are homogeneity and additivity.

1.1.4.1 Homogeneity.
If we increase the strength of the input to a linear system, say by doubling it, we predict that the output will also double. For example, if the current injected into a passive neural membrane is doubled, the resulting membrane potential fluctuations will double as well. This is called the scalar rule, or sometimes the homogeneity, of linear systems.

1.1.4.2 Additivity.
Suppose we measure how the membrane potential fluctuates over time in response to a complicated time-series of injected current $x_1(t)$.
Next, we present a second (different) complicated time-series $x_2(t)$. The second stimulus also generates fluctuations in the membrane potential, which we measure and write down. Then we present the sum of the two currents and see what happens. Since the system is linear, the measured membrane potential fluctuations will be just the sum of the fluctuations to each of the two currents presented separately.

1.1.4.3 Superposition.
Systems that satisfy both homogeneity and additivity are considered to be linear systems. These two rules, taken together, are often referred to as the principle of superposition. Mathematically, the principle of superposition is expressed as:

$$T[\alpha x_1 + \beta x_2] = \alpha\,T[x_1] + \beta\,T[x_2]$$

Homogeneity is the special case in which one of the signals is absent. Additivity is the special case in which $\alpha = \beta = 1$.

1.1.4.4 Shift-invariance.
Suppose that we inject a pulse of current and measure the membrane potential fluctuations. Then we stimulate again with a similar pulse at a different point in time, and again we measure the membrane potential fluctuations. If we haven't damaged the membrane with the first impulse, then we should expect the response to the second pulse to be the same as the response to the first pulse. The only difference between them will be that the second pulse has occurred later in time, that is, it is shifted in time. When the responses to identical stimuli presented at different times are the same, except for the corresponding shift in time, we have a special kind of linear system called a shift-invariant linear system. Just as not all systems are linear, not all linear systems are shift-invariant. In mathematical language, a system T is shift-invariant if and only if:

$$y(t) = T[x(t)] \quad \text{implies} \quad y(t - s) = T[x(t - s)]$$

1.1.5 Convolution:
To characterize a shift-invariant linear system, we need to measure only one thing: the way the system responds to a unit impulse. This response is called the impulse response function of the system.
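The homogeneity, additivity and shift-invariance tests described above, and the idea of measuring an impulse response, can be tried numerically on a small discrete-time system. The sketch below is only an illustration under stated assumptions: the 3-tap moving average, the squaring system used as a counter-example, the helper names and the test signals are all hypothetical and do not come from the report.

```python
# Hypothetical example system (not from the report): a causal 3-tap moving
# average, y[n] = (x[n] + x[n-1] + x[n-2]) / 3, with x[n] = 0 for n < 0.
def moving_average(x):
    padded = [0.0, 0.0] + list(x)
    return [(padded[n] + padded[n + 1] + padded[n + 2]) / 3.0
            for n in range(len(x))]


# Hypothetical non-linear system used as a counter-example: y[n] = x[n]^2.
def squarer(x):
    return [v * v for v in x]


def scale(x, a):
    return [a * v for v in x]


def add(x, y):
    return [u + v for u, v in zip(x, y)]


def delay(x, k):
    # Shift the signal k samples later in time, keeping the same length.
    return [0.0] * k + list(x[:len(x) - k])


def close(x, y, tol=1e-12):
    return all(abs(u - v) <= tol for u, v in zip(x, y))


# Arbitrary test signals, zero-padded at the end so that a delay of two
# samples does not push any non-zero sample out of the window.
x1 = [1.0, 4.0, -2.0, 0.5, 3.0, 0.0, 0.0, 0.0]
x2 = [0.0, -1.0, 2.0, 2.0, -3.0, 1.0, 0.0, 0.0]
alpha, beta = 2.0, -0.5

# Superposition: T[alpha*x1 + beta*x2] == alpha*T[x1] + beta*T[x2]
superposition_ok = close(
    moving_average(add(scale(x1, alpha), scale(x2, beta))),
    add(scale(moving_average(x1), alpha), scale(moving_average(x2), beta)),
)

# Shift-invariance: delaying the input only delays the output.
shift_ok = close(moving_average(delay(x1, 2)), delay(moving_average(x1), 2))

# The squarer fails homogeneity: doubling the input quadruples the output.
squarer_homogeneous = close(squarer(scale(x1, 2.0)), scale(squarer(x1), 2.0))

# Impulse response: the response of the system to a unit impulse.
unit_impulse = [1.0] + [0.0] * 7
impulse_response = moving_average(unit_impulse)

print("superposition holds:", superposition_ok)
print("shift-invariance holds:", shift_ok)
print("squarer is homogeneous:", squarer_homogeneous)
print("impulse response:", impulse_response)
```

The moving average passes both tests, so it is a shift-invariant linear system and its three equal taps are its impulse response; the squarer fails the homogeneity test and is therefore not linear.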
Once we've measured this function, we can (in principle) predict how the system will respond to any other possible stimulus.

Fig 1.8: Characterizing a linear system using its impulse response.

The way we use the impulse response function is illustrated in Fig. 1.8. We conceive of the input stimulus, in this case a sinusoid, as if it were the sum of a set of impulses (Eq. 1). We know the responses we would get if each impulse were presented separately (i.e., scaled and shifted copies of the impulse response). We simply add together all of the (scaled and shifted) impulse responses to predict how the system will respond to the complete stimulus.

1.1.5.1 Convolution integral.
Begin by using Eq. 1 to replace the input signal x(t) by its representation in terms of impulses, and then apply linearity:

$$y(t) = T[x(t)] = T\left[\int_{-\infty}^{\infty} x(s)\,\delta(t - s)\,ds\right] = \int_{-\infty}^{\infty} x(s)\,T[\delta(t - s)]\,ds$$

Now let h(t) be the response of T to the unshifted unit impulse, i.e.,

$$h(t) = T[\delta(t)]$$

Then, by using shift-invariance,

$$y(t) = \int_{-\infty}^{\infty} x(s)\,h(t - s)\,ds \qquad \text{(Eq. 2)}$$

Notice what this last equation means. For any shift-invariant linear system T, once we know its impulse response h(t) (that is, its response to a unit impulse), we can forget about T entirely and just add up scaled and shifted copies of h(t) to calculate the response of T to any input whatsoever. Thus any shift-invariant linear system is completely characterized by its impulse response h(t).

The way of combining two signals specified by Eq. 2 is known as convolution. It is such a widespread and useful formula that it has its own shorthand notation, $*$. For any two signals x and y there is another signal z obtained by convolving x with y:

$$z(t) = x * y = \int_{-\infty}^{\infty} x(s)\,y(t - s)\,ds$$

1.2. Time and Frequency Representation
The most common representation of signals and waveforms is in the time domain. However, many signal analysis techniques work in the frequency domain. The concept of the frequency domain representation of a signal can be quite difficult to understand when one is first introduced to it.

1.2.1. Time and Frequency Domains
The frequency domain is simply another way of representing a signal.
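To make the idea of "another way of representing a signal" concrete, the following sketch samples a sinusoid and evaluates its discrete Fourier sum directly. It is a minimal illustration, not the report's method: the helper name `dft`, the 32-sample window and the choice of 4 cycles per window are assumptions made only for this example.

```python
import cmath
import math


def dft(x):
    # Direct evaluation of X(k) = sum_m x(m) * exp(-j*2*pi*k*m/n).
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


N = 32       # number of samples (assumed)
CYCLES = 4   # whole cycles of the sinusoid inside the window (assumed)

# Time-domain representation: N samples of a sine wave.
signal = [math.sin(2 * math.pi * CYCLES * m / N) for m in range(N)]

# Frequency-domain representation: magnitude of each frequency bin.
magnitudes = [abs(v) for v in dft(signal)]

# Bins that carry any significant energy.
peaks = [k for k, mag in enumerate(magnitudes) if mag > 1e-6]

print("non-zero bins:", peaks)
print("peak magnitude:", round(magnitudes[CYCLES], 6))
```

In the time domain the signal needs all 32 samples to describe it; in the frequency domain the same information is concentrated in bin 4 and its mirror image, bin 28, each with magnitude N/2.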
For example, consider a simple sinusoid in Fig 1.9, where

1.4.2. Fast Fourier Transform
The fast Fourier transform (FFT) is a class of special algorithms which implement the discrete Fourier transform with considerable savings in computational time. It must be pointed out that the FFT is not a different transform from the DFT, but rather just a means of computing the DFT with a considerable reduction in the number of calculations required.

The number of complex multiplication and addition operations required by the simple forms of both the Discrete Fourier Transform (DFT) and the Inverse Discrete Fourier Transform (IDFT) is of order $N^2$, as there are N data points to calculate, each of which requires N complex arithmetic operations. For a length-n input vector x, the DFT is a length-n vector X with n elements:

$$X(k) = \sum_{m=0}^{n-1} x(m)\,e^{-j 2\pi k m / n}, \qquad k = 0, 1, \ldots, n-1$$

In computer science jargon, we may say that the direct forms have algorithmic complexity $O(N^2)$ and hence are not very efficient. If we could not do any better than this, the DFT would not be very useful for the majority of practical DSP applications. However, there are a number of different 'Fast Fourier Transform' (FFT) algorithms that enable the calculation of the Fourier transform of a signal much faster than direct evaluation of the DFT. As the name suggests, FFTs are algorithms for quick calculation of the discrete Fourier transform of a data vector. The FFT is a DFT algorithm which reduces the number of computations needed for N points from $O(N^2)$ to $O(N \log N)$, where log is the base-2 logarithm.

The 'Radix 2' algorithms are useful if N is a regular power of 2 ($N = 2^v$). If we assume that algorithmic complexity provides a direct measure of execution time and that the relevant logarithm base is 2, then, as shown in Fig. 2.1, the ratio of execution times for the DFT vs. the Radix 2 FFT (denoted as 'Speed Improvement Factor') increases tremendously with increasing N.

The term 'FFT' is actually slightly ambiguous, because there are several commonly used 'FFT' algorithms.
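The complexity argument above can be checked with a short sketch that evaluates the DFT directly and with a recursive radix-2 algorithm, then compares the results. This is an illustration under assumptions: the function names, the decimation-in-frequency form of the recursion, the test vector and the definition of the speed improvement factor as $N^2 / (N \log_2 N)$ are choices made here, not taken from the report's hardware.

```python
import cmath
import math


def dft_direct(x):
    # O(N^2): every one of the N outputs sums over all N inputs.
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def fft_radix2(x):
    # Recursive radix-2 decimation-in-frequency FFT, O(N log N).
    # Requires len(x) to be a power of two.
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    half = n // 2
    # Butterflies: sums feed the even outputs, twiddled differences the odd.
    sums = [x[i] + x[i + half] for i in range(half)]
    diffs = [(x[i] - x[i + half]) * cmath.exp(-2j * math.pi * i / n)
             for i in range(half)]
    out = [0j] * n
    out[0::2] = fft_radix2(sums)
    out[1::2] = fft_radix2(diffs)
    return out


def speed_improvement_factor(n):
    # Ratio of the operation-count models N^2 and N*log2(N) (an assumption
    # that complexity is a direct measure of execution time).
    return (n * n) / (n * math.log2(n))


samples = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 1.5]
max_error = max(abs(a - b)
                for a, b in zip(dft_direct(samples), fft_radix2(samples)))

print("largest difference between DFT and FFT:", max_error)
for n in (8, 64, 1024):
    print(n, "points -> speed improvement factor", speed_improvement_factor(n))
```

Both routines produce the same spectrum up to rounding error, and under this simple model the factor for a 1024-point transform is 1024 / 10 = 102.4.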
There are two different Radix 2 algorithms, the so-called 'Decimation in Time' (DIT) and 'Decimation in Frequency' (DIF) algorithms. Both of these rely on the recursive decomposition of an N-point transform into two (N/2)-point transforms.

Fig 2.1

The radix-2 decimation-in-frequency FFT is an important algorithm obtained by the divide-and-conquer approach. Fig. 2.2 below shows the first stage of the 8-point DIF algorithm.

Fig 2.2: Decimation in frequency of a length-N DFT into two length-N/2 DFTs

The decimation, however, causes shuffling of the data. The entire process involves $v = \log_2 N$ stages of decimation, where each stage involves N/2 butterflies of the type shown in Fig. 2.3.

Fig 2.3

Here $W_N = e^{-j 2\pi / N}$ is the twiddle factor. Consequently, the computation of an N-point DFT via this algorithm requires $(N/2) \log_2 N$ complex multiplications. For illustrative purposes, the eight-point decimation-in-frequency algorithm is shown in Fig. 2.4 below. We observe, as previously stated, that the output sequence occurs in bit-reversed order with respect to the input. Furthermore, if we abandon the requirement that the computations occur in place, it is also possible to have both the input and the output in normal order.

Fig 2.4

1.4.2. Multidimensional FFT:
An interesting property of the Fourier transform is its separability. When we extend the DFT to two dimensions, for an $N_1 \times N_2$ input we can rewrite it as

$$X(k_1, k_2) = \sum_{n_1=0}^{N_1-1} \left[ \sum_{n_2=0}^{N_2-1} x(n_1, n_2)\,W_{N_2}^{k_2 n_2} \right] W_{N_1}^{k_1 n_1}$$

Fig 3.0

2.1 Functional Specification:
In this stage the designer needs to understand the requirements of the system and specify its abstract functionality.

2.2 RTL Description/Simulation:
ASIC design descriptions are written by designers at different levels of abstraction. The most common hardware description languages used by designers are Verilog and VHDL. Both languages are equally capable of providing complex constructs to describe complex functionality. Behavioral modeling forms the highest level of abstraction.

2.2.1. Behavioral description
At the initial stage of the design process the designer provides a behavioral description of the intended functionality. A behavioral model does not care about the structure of the design, the combinational or sequential elements used in the design, the clock signal or the timing constraints involved. It captures the intended behavior of the design. The example given below describes the behavior of an adder that adds two four-digit inputs and returns an output. It is important to note that this description does not capture timing information.

2.2.2. RTL description
RTL stands for Register Transfer Level. In this model the entire design is split into registers, with information flowing between these registers at each clock cycle. An RTL description captures the change in the design at each clock cycle. All the registers are updated at the same time in a clock cycle. Typically an RTL description divides the design into registers and the logic blocks that join those registers together. RTL captures the data flow but fails to give a good description of the control flow.

Register Transfer Level description of a design.

2.2.3. Structural Description
A structural description consists of a network of instances of logic gates and registers described by a technology library, as shown in Fig 3.1. The technology library is provided by fabrication houses. It describes simple AND, OR and NOT cells as well as complicated cells with multiple functions. The description of a cell includes its geometry, delay and power characteristics. Structural modeling describes a circuit in the form of instances of cells and the interconnects between those cells.

Fig 3.1: Structural modeling.

2.2.4. Simulation:
Logic simulation is an essential part of digital circuit design. Logic simulation and verification are used to check the functionality described by a design description against the values expected at the output ports of a digital integrated circuit.
There are mainly three classes of logic simulators:
• compiled-code logic simulators,
• event-driven logic simulators, and
• compiled-code event-driven simulators.

2.3. Synthesis:
Synthesis is the process of converting a design expressed in register-transfer level Hardware Description Language (HDL) into a netlist of gates or logic primitives that can be mapped to the cells in the technology library. Synthesis involves three major steps:
• Transition from RTL description into gates and flip-flops,
• Optimization of logic, and
• Placement and routing of the optimized netlist.

Most of the intelligence resides in the optimization stage, but modern synthesis tools also apply many smart techniques while converting the RTL description into gates in order to reduce the number of gates in the design. The synthesis tool generates various report files.

2.3.1. Area report:
Shows the designer how much of the resources of the chip the design has consumed. The designer can tell whether the design is too big for a particular chip and a larger chip must be targeted, whether the design should go into a smaller chip, or whether the current chip will work fine. The designer can also get a relative size of the design to use in later stages of the design process.

2.3.2. Timing report:
Shows the timing of critical paths or specified paths of the design. The designer examines the timing of the critical paths closely because these paths ultimately determine how fast the design can run. If the longest path is a timing-critical part of the design and is not meeting the designer's speed requirements, then the designer may have to modify the HDL code or try new timing constraints to make the path meet timing.

The most important type of output data is the netlist for the design in the target technology. This output is a gate or macro level output in a format compatible with the place and route tools that are used to implement the design in the target chip.
For instance, most place and route tools for FPGA technologies take in an EDIF netlist as an input format. The primitives used in the netlist are those used in the synthesis library to describe the technology. The place and route tools understand what to do with these primitives in terms of how to place a primitive and how to route wires to them.

2.4. Place & Route:
Place-and-route (P&R) describes several processes in which the netlist elements are physically placed and mapped to the FPGA physical resources, to create a file that can be downloaded into the FPGA chip. Place and route tools are used to take the design netlist and implement the design in the target technology device. The place and route tools place each primitive from the netlist into an appropriate location on the target device and then route signals between the

    assign prop = a ^ b;
    assign s = prop ^ cin;
    assign cout = (a & b) | (cin & (a | b));
    endmodule

Technically, it is not necessary to declare single-bit wires. However, it is necessary to declare multi-bit busses. It is good practice to declare all signals. Some Verilog simulation and synthesis tools give errors that are difficult to decipher when a wire is not declared.

3.2.2. Precedence:
The order of precedence is important in an expression such as:

    assign cout = a&b | cin&(a|b);

The operator precedence from highest to lowest is much as you would expect in other languages. AND has precedence over OR.

3.2.3. Constants:
Constants may be specified in binary, octal, decimal, or hexadecimal.

3.2.4. Tristates:
It is possible to leave a bus floating rather than drive it to 0 or 1. This floating value is called 'z in Verilog. For example, a tri-state buffer produces a floating output when the enable is false.

    module tristate(a, en, y);
      input [3:0] a;
      input en;
      output [3:0] y;
      assign y = en ? a : 4'bz;
    endmodule

Floating inputs to gates cause undefined outputs, displayed as 'x in Verilog.
At startup, state nodes such as the internal nodes of flip-flops are also usually initialized to 'x, as we will see later.

3.2.5. Bit Swizzling:
The {} notation is used to concatenate busses. For example, the following 8x8 multiplier produces a 16-bit result, which is placed on the upper and lower 8-bit result busses.

    module mul(a, b, upper, lower);
      input [7:0] a, b;
      output [7:0] upper, lower;
      assign {upper, lower} = a*b;
    endmodule

3.3 Modeling with Always Blocks:
Assign statements are reevaluated every time any term on the right hand side changes. Therefore, they must describe combinational logic. Always blocks are reevaluated only when signals in the header change. Depending on the form, always blocks may imply sequential or combinational circuits.

3.3.1 Flip-Flops:
Flip-flops are described with an always @(posedge clk) statement:

    module flop(clk, d, q);
      input clk;
      input [3:0] d;
      output [3:0] q;
      reg [3:0] q;
      always @(posedge clk)
        q <= d;
    endmodule

The body of the always statement is only evaluated on the rising (positive) edge of the clock. At this time, the output q is copied from the input d. The <= is called a nonblocking assignment.

3.3.2 Latches:
Always blocks can also be used to model transparent latches, also known as D latches. When the clock is high, the latch is transparent and the data input flows to the output. When the clock is low, the latch goes opaque and the output remains constant.

    module latch(clk, d, q);
      input clk;
      input [3:0] d;
      output [3:0] q;
      reg [3:0] q;
      always @(clk or d)
        if (clk) q <= d;
    endmodule

The latch evaluates the always block any time either clk or d changes. If the clock is high, the output gets the input. Notice that even though q is a latch node, not a register node, it is still declared as reg because it is assigned inside an always block.

3.4. Combinational Logic:
Always blocks imply sequential logic when some of the inputs do not appear in the @ stimulus list or might not cause the output to change.
For example, in the flop module, d is not in the @ list, so the flop does not immediately respond to changes of d. In the latch, d is in the @ list, but changes in d are ignored unless clk is high. Always blocks can also be used to imply combinational logic if they are written in such a way that the output is always reevaluated given changes in any of the inputs. The following code shows how to define a bank of inverters with an always block.

    module inv(a, y);
      input [3:0] a;
      output [3:0] y;
      reg [3:0] y;
      always @(a)
        y <= ~a;
    endmodule

3.5. Memories:
Verilog has an array construct used to describe memories. The following module describes a 64 word x 16 bit RAM that is written when wrb is low and otherwise read.

    module ram(addr, wrb, din, dout);
      input [5:0] addr;
      input wrb;
      input [15:0] din;
      output [15:0] dout;
      reg [15:0] mem[63:0]; // the memory
      reg [15:0] dout;
      always @(addr or wrb or din)
        if (~wrb) mem[addr] <= din;
        else dout <= mem[addr];
    endmodule

    size_head = size(headin)
    size_pic = size_pic(1:2);
    size_head = size_head(1:2);
    % To achieve a linear convolution, must zero-pad the matrices
    % to a size 1 less than the sum of the two sizes.
    pad = size_pic + size_head - [1 1];
    % Initialize final output matrix.
    fin = zeros(pad);
    % A convolution involves flipping the matrix up-down and
    % left-right. We do not want this, so pre-flip the head
    % matrix.
    headin = flipud(fliplr(headin));
    % Compute the linear convolution of the head with the picture by
    % multiplying the fourier coefficients together and then taking
    % the inverse fourier transform of it.
    fft_pic = fft2(picin, pad(1), pad(2))
    fft_head = fft2(headin, pad(1), pad(2))
    convo = ifft2(fft_pic.*fft_head);
    size(convo)
    figure; image(convo); axis('square');
    % A high amount of correlation will be detected wherever there is
    % a large amount of brightness.
    % To normalize this: compute the
    % convolution of the picture squared (to make differences more
    % extreme) with a matrix of ones the same size of the head matrix,
    % then divide the previous convolution matrix with this one.
    temp = fft2(picin,pad(1),pad(2));
    temp2 = fft2(ones(size_head),pad(1),pad(2));
    norml = ifft2(temp.*temp2).^.5;
    fin = fin + convo./norml;
    %end
    % Find the largest value in the final matrix.
    high = max(max(fin));
    % Construct a new matrix, setting all values greater than a certain
    % threshold equal to white and everything else equal to black (on a
    % gray(2) colormap).
    %loc = (fin >= high - .05) + 1;
    size_fin = size(fin);
    loc = (fin(ceil(size_head(1)/2):size_fin(1)-ceil(size_head(2)/2), ...
               ceil(size_head(2)/2):size_fin(2)-ceil(size_head(2)/2)) >= high*th) + 1;
    figure; image(uint8(picin)); axis('square');
    figure; colormap(gray(2)); image(loc); axis('square');

4.1.2. RESULTS FROM MATLAB SIMULATION:

SIMULATION 1:
Window 1
From Window 1 above:
Figure 1: The image in which we need to search for a match.
Figure 2: The kernel image which is to be located in Figure 1.
Figure 3: The convolution of the images in Figure 1 and Figure 2. Dark red indicates the highest intensity.
Figure 4: The position where the kernel image is located in Figure 1.

SIMULATION 2:
Window 2
From Window 2:
Figure 1: The image in which we need to search for a match.
Figure 2: The kernel image which is to be located in Figure 1.
Figure 3: The convolution of the images in Figure 1 and Figure 2. Dark red indicates the highest intensity.
Figure 4: The position where the kernel image is located in Figure 1.

4.2. HW IMPLEMENTATION:

4.2.1 BLOCK DIAGRAM:

Fig 4.2: Xilinx CORE Generator Fast Fourier Transform dialog (page 1 of 3), showing the implementation options (Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Minimum Resources) and the transform length settings, including the run-time configurable transform length option.

4.2.3.1. Core symbol and port definition:

Core symbol: inputs XN_RE, XN_IM, START, UNLOAD, NFFT, NFFT_WE, FWD_INV, FWD_INV_WE, SCALE_SCH, SCALE_SCH_WE, SCLR, CE, CLK; outputs XK_RE, XK_IM, XN_INDEX, XK_INDEX, RFD, BUSY, DV, EDONE, DONE, BLK_EXP, OVFLO.

Port definitions. Each entry gives the Port Name (Port Width where listed, Direction) followed by the Description.

XN_RE (b_xn, Input): Input data bus: Real component (b_xn = 8 - 24) in two's complement format.

XN_IM (b_xn, Input): Input data bus: Imaginary component (b_xn = 8 - 24) in two's complement format.

START (Input): FFT start signal (Active High): START is asserted to begin the data loading and transform calculation (for the burst I/O architectures). For streaming I/O, START will begin data loading, which proceeds directly to transform calculation and then data unloading.

UNLOAD (Input): Result unloading (Active High): For the burst I/O architectures, UNLOAD will start the unloading of the results in normal order. The UNLOAD port is not necessary for the Pipelined, Streaming I/O architecture or for bit/digit reversed unloading.

NFFT (Input): Point size of the transform: NFFT can be the size of the transform or any smaller point size. For example, a 1024-point FFT can compute point sizes 1024, 512, 256, and so on. The value of NFFT is log2 (point size). This port is only used with run-time configurable transform length.

RFD (Output): Ready for data (Active High): RFD is High during the load operation.

BUSY (Output): Core activity indicator (Active High): This signal will go High while the core is computing the transform.
DV (Output): Data valid (Active High): This signal is High when valid data is presented at the output.

EDONE (Output): Early done strobe (Active High): EDONE goes High one clock cycle immediately prior to DONE going active.

DONE (Output): FFT complete strobe (Active High): DONE will transition High for one clock cycle when the transform calculation has completed.

BLK_EXP (Output): Block exponent: The number of bits scaled for every point in the data frame. Available only when block-floating point is used.

OVFLO (Output): Arithmetic overflow indicator (Active High): OVFLO will be High during result unloading if any value in the data frame overflowed. The OVFLO signal is reset at the beginning of a new frame of data. This port is optional and only available with scaled arithmetic.

NFFT_WE (Input): Write enable for NFFT (Active High): Asserting NFFT_WE will automatically cause the FFT core to stop all processes and to initialize the state of the core to the new point size on the NFFT port. This port is only used with run-time configurable transform length.

FWD_INV (Input): Control signal that indicates if a forward FFT or an inverse FFT is performed. When FWD_INV = 1, a forward transform is computed. If FWD_INV = 0, an inverse transform is performed.

FWD_INV_WE (Input): Write enable for FWD_INV (Active High).

SCALE_SCH (2 x ceil(NFFT/2) for Pipelined Streaming I/O and Radix-4 Burst I/O architectures, or 2 x NFFT for Radix-2 Minimum Resources, where NFFT is log2 (point size) or the number of stages; Input): Scaling schedule: For Burst I/O architectures, the scaling schedule is specified with two bits for each stage, starting at the two LSBs. The scaling can be specified as 3, 2, 1, or 0, which represents the number of bits to be shifted. An example scaling schedule for N = 1024, Radix-4 Burst I/O is [1 0 2 3 2].
For N = 128, Radix-2 or Radix-2 Lite, one possible scaling schedule is [1 1 1 1 0 1 2]. For the Pipelined Streaming I/O architecture, the scaling schedule is specified with two bits for every pair of Radix-2 stages, starting at the two LSBs. For example, a scaling schedule for N = 256 could be [2 2 2 3]. When N is not a power of 4, the maximum bit growth for the last stage is one bit. For instance, [0 2 2 2 2] or [1 2 2 2 2] are valid scaling schedules for N = 512, but [2 2 2 2 3] is invalid. The two MSBs of SCALE_SCH can only be 00 or 01. This port is only available with scaled arithmetic (not unscaled or block-floating point).

SCALE_SCH_WE (Input): Write enable for SCALE_SCH (Active High): This port is available only with scaled arithmetic.

SCLR (Input): Master synchronous reset (Active High): Optional port.

CE (Input): Clock enable (Active High): Optional port.

CLK (Input): Clock.

XK_RE (b_xk, Output): Output data bus: Real component in two's complement format. (For scaled arithmetic and block floating point arithmetic, b_xk = b_xn. For unscaled arithmetic, b_xk = b_xn + NFFT + 1.)

XK_IM (b_xk, Output): Output data bus: Imaginary component in two's complement format. (For scaled arithmetic and block floating point arithmetic, b_xk = b_xn. For unscaled arithmetic, b_xk = b_xn + NFFT + 1.)

XN_INDEX (log2 (point size), Output): Index of input data.

XK_INDEX (log2 (point size), Output): Index of output data.

The control unit generates the control signals to perform the 2D FFT using the 1D FFT. Initially the images are loaded into Image 1 RAM and Image 2 RAM. The Xilinx FFT core then reads the matrix in Image RAM 1 row by row, performs the FFT on each row and stores the result in Transformed Image RAM 1 column by column. After completing all rows, the Xilinx FFT core reads the matrix in Transformed Image RAM 1 row by row, performs the FFT and stores the result in Image RAM 1 column by column. Once it is done with Image 1, it starts with Image 2.

4.2.5. CONTROL UNIT:
The control unit is the heart of the design. It coordinates all the events by generating the corresponding control signals.
The main tasks to be coordinated by the control unit are:
• Compute the 2D FFT of IMAGE 1.
• Compute the 2D FFT of IMAGE 2.
• Compute the element-by-element matrix product of the two transformed images.
• Compute the 2D IFFT of the resulting product matrix.
• Compute the coordinates of the max value.

Since the design shares a common address/data bus, the control unit also acts as the bus arbiter. The control unit has 4 state machines, and all the above tasks are performed sequentially. The logic is designed such that control is passed from one state machine to the next, each of which generates the control signals needed to get its task done. The schematic shown in Fig 4.5 is obtained from the synthesized RTL viewer of the tool.

Fig 4.5

4.2.5.1 Pin details:

Output Ports:
Enb_A, Enb_B, Enb_C: The enable signals for the RAMs (active low).
Start: When asserted, the FFT/IFFT block starts reading the vector on which the transform is to be performed.
Sclr: Reset to the FFT.
Unload: When asserted, the FFT starts writing the transformed data to the RAM locations.
Row_Add_to_mem, Col_Add_to_mem: Address bus.
Rst: External reset (active high).

Input Ports:
CLK: Global clock signal.
Rfd: Output from the FFT; when asserted, it indicates that the FFT is ready to read data.
Sg[1:0]: Output from the multiplication unit, used as an acknowledgment for changing the state of the state machine.
Busy: Output from the FFT; indicates that the FFT is busy performing the transform.
FFTIINDEX/FFTKINDEX: Output from the FFT; indicates the index of the element that is being written or read.

Inout:
Re_data[7:0]: Real data bus.
Im_data[7:0]: Imaginary data bus.

4.2.5.2. State Diagram for Computing 2D FFT/IFFT:
The main difference between the FFT and the IFFT is the asserting/deasserting of the signal FWD_INV.
If FWD_INV = 1 the core computes the FFT.
If FWD_INV = 0 the core computes the IFFT.
The state diagram is shown in Fig 4.6.
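The sequence of tasks listed above for the control unit can be modelled in software as a rough cross-check. The sketch below is a simplified model under stated assumptions, not the hardware implementation: it uses a direct 1D DFT in place of the Xilinx core, an `inverse` flag standing in for the role of FWD_INV, a conjugate on the second spectrum to obtain correlation (the hardware instead reads the second operand from an inverted address, and the MATLAB model pre-flips the kernel), and no normalization step. The 8x8 scene, the 2x2 pattern and its position are made-up test data.

```python
import cmath
import math


def fft1d(x, inverse=False):
    # Direct 1D DFT; `inverse` plays the role of the FWD_INV control
    # (forward when False, inverse with 1/N scaling when True).
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
               for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def fft2d(image, inverse=False):
    # Row-column method: 1D transform of every row, then of every column.
    rows_done = [fft1d(row, inverse) for row in image]
    cols_done = [fft1d(col, inverse) for col in transpose(rows_done)]
    return transpose(cols_done)


N = 8
pattern = [[1.0, 2.0], [3.0, 4.0]]   # hypothetical kernel image
top, left = 3, 5                     # hypothetical location in the scene

scene = [[0.0] * N for _ in range(N)]      # image 1: the superset image
template = [[0.0] * N for _ in range(N)]   # image 2: zero-padded kernel
for i in range(2):
    for j in range(2):
        scene[top + i][left + j] = pattern[i][j]
        template[i][j] = pattern[i][j]

# Tasks 1 and 2: 2D FFT of both images.
spectrum1 = fft2d(scene)
spectrum2 = fft2d(template)

# Task 3: element-by-element product (conjugate gives correlation).
product = [[a * b.conjugate() for a, b in zip(r1, r2)]
           for r1, r2 in zip(spectrum1, spectrum2)]

# Task 4: 2D IFFT of the product matrix.
correlation = fft2d(product, inverse=True)

# Task 5: coordinates of the max value.
peak_value, peak_row, peak_col = max(
    (correlation[r][c].real, r, c) for r in range(N) for c in range(N))

print("max value:", round(peak_value, 6), "at", (peak_row, peak_col))
```

For this test data the peak lands at the row and column where the pattern was placed, with a value equal to the sum of the squared pattern entries (1 + 4 + 9 + 16 = 30).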
Fig 4.6

Description:
A brief explanation of the purpose of each state, the state transitions and the outputs that are generated is given below. All transitions are synchronous.

Idle: In this state the machine resets all the conditions, such as the enables, the FFT forward/inverse control and the scaling factor for the FFT. On the positive edge of the clock the machine moves to the next state, Startgen.

Startgen: As the name indicates, this state initiates the computation of the FFT by asserting a start signal. On the positive edge of the clock the machine moves to the next state, Fromram.

Fromram: This state generates the address and read signal to the RAM, and the data is fed to the FFT for transformation. The machine waits in this state until it receives a signal (e_done) from the FFT indicating that the computation of the transform is done and the core is ready to issue results.

Simulation Timing diagram:

4.2.6. MULTIPLIER:
Since the outputs from the FFT are complex numbers, a complex multiplier must be designed which performs the matrix product element by element (convolution). The complex multiplier is generated using Xilinx coregen. As the design shares a single data bus, the multiplier must get its operands sequentially. The schematic of the multiplier is shown in Fig 4.9.

The multiplication has three stages:
• Read operand 1 from an address in Ram1.
• Read operand 2 from the inverted address !(address) in Ram2.
• Perform the product and write it back to the address of operand 1 in Ram1.

The corresponding rd/wr signals and the address/!(address) are generated by the control unit, but the multiplier unit must acknowledge the events. The multiplier unit uses the signal Sg[1:0] as an acknowledgement to the control unit, based on which the state changes are made in the control unit.

Fig 4.9

Pin Details:

Input:
Clk: Global clock signal.
Rd: Signal for reading the computed result.
Wr: Signal for writing the operand values.
Set: active high reset.
Output:
Sg[1:0]: output signal to the control unit that acts as an acknowledgment, on which the control unit changes state.

Inout:
Im_data(7:0): imaginary data bus.
Re_data(7:0): real data bus.

4.2.6.1. Xilinx Core implementation:

Fig 4.10

The Xilinx core, which uses 4 real multipliers, is shown in fig 4.10. The core accepts both operands at the same time to perform the computation, but because the design has a single data bus, only one operand can be accessed at a time. A wrapper multiplier is therefore used to read the operands one by one and present both of them to the core together. The resulting block diagram is shown in fig 4.11.

Fig 4.11

4.2.6.2 State Diagram:

The state machine in the wrapper acknowledges the control unit as the operand data is written into the registers, generates a control signal to initiate the multiplication, and notifies the control unit when the operation has completed and the result is ready to be read. The state diagram is shown in fig 4.12. The state machine has 7 states, and each state is explained in brief.

Fig 4.12

5. Simulation:

5.1. Simulation Setup:

The simulation setup is shown in fig 4.15. The two major components are:
• The design under test (Image Matching Processor, DUT)
• Signal generator: generates a continuous clock signal and the reset signal.

All the internal signals can be monitored during dynamic simulation. The output from the DUT indicates the coordinates with the maximum correlation.

[Fig 4.15: block diagram. Blocks: Test Bench Module, Signal Generator, Image Matching Processor Module. Signals: CLK, RESET, max value coordinates.]

5.2 Results

The memory elements are printed dynamically during simulation.

IMAGE 1: (RAM A)
Real Array    Imaginary Array

IMAGE 2: (RAM C)

FFT2(Image 1) (RAM A)

FFT2(Image 2) (RAM C)

Convolved Image: (RAM A)

IFFT2(Convolved Image) (RAM A)

Result
Max value: 24
Coordinates (4,2)

Optimizing block <top2dfft> to meet ratio 100 (+ 5) of 6144 slices:
WARNING:Xst:2254 - Area constraint could not be met for block <top2dfft>, final ratio is 177.
FlipFlop cu/enb_cnt_m_2 has been replicated 1 time(s)

Final Macro Processing...

Processing Unit <top2dfft>:
    Found 3-bit shift register for signal <FFT/xn_im_2_0>.
Unit <top2dfft> processed.

======================================================================
Final Register Report

Macro Statistics
# Registers                 : 12489
 Flip-Flops                 : 12489
# Shift Registers           : 16
 3-bit shift register       : 16

======================================================================
*                         Partition Report                           *
======================================================================

Partition Implementation Status
-------------------------------

  No Partitions were found in this design.
-------------------------------

======================================================================
*                           Final Report                             *

Final Results
RTL Top Level Output File Name     : top2dfft.ngr
Top Level Output File Name         : top2dfft
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 2

Cell Usage :
# BELS                             : 15080
#      BUF                         : 1
#      GND                         : 6
#      INV                         : 36
#      LUT2                        : 607
#      LUT2_D                      : 2
#      LUT3                        : 626
#      LUT3_D                      : 2
#      LUT3_L                      : 1
#      LUT4                        : 7146
#      LUT4_D                      : 95
#      LUT4_L                      : 8
#      MULT_AND                    : 5
#      MUXCY                       : 6278
#      MUXF5                       : 111
#      MUXF6                       : 4
#      MUXF7                       : 2
#      VCC                         : 5
#      XORCY                       : 145
# FlipFlops/Latches                : 13248
#      FD                          : 46
#      FD_1                        : 2
#      FDC                         : 33
#      FDCE                        : 12370
#      FDE                         : 631
#      FDP                         : 5
#      FDPE                        : 69
#      FDR                         : 3
#      FDRE                        : 45
#      FDRS                        : 5
#      FDS                         : 1
#      FDSE                        : 5
#      LD                          : 27
#      LDE_1                       : 6
# RAMS                             : 3
#      RAMB16                      : 3
# Shift Registers                  : 123
#      SRL16                       : 16
#      SRL16E                      : 106
#      SRLC16E                     : 1
# Clock Buffers                    : 2
#      BUFGP                       : 2
# DSPs                             : 6
#      DSP48                       : 6
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 4vlx15sf363-12

 Number of Slices:                    10929  out of   6144   177% (*)
 Number of Slice Flip Flops:          13248  out of  12288   107% (*)
 Number of 4 input LUTs:               8646  out of  12288    70%
    Number used as logic:              8523
    Number used as Shift registers:     123
 Number of IOs:                           2
 Number of bonded IOBs:                   2  out of    240     0%
 Number of FIFO16/RAMB16s:                3  out of     10    30%
 Number of GCLKs:                         2  out of     32     6%
 Number of DSP48s:                        6  out of     32    18%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used

Timing Report:

Clock Information:
------------------
------------------------------------------------------------+--------------------------------+-------+
Clock Signal                                                | Clock buffer(FF name)          | Load  |
------------------------------------------------------------+--------------------------------+-------+
clk                                                         | BUFGP                          | 13319 |
cu/inc                                                      | NONE(cu/mulrow_2)              | 9     |
cu/col_ch_inv(cu/col_ch_inv1:O)                             | NONE(*)(cu/count_1)            | 10    |
cu/inc1                                                     | NONE(cu/co_rmax_3)             | 9     |
rst                                                         | BUFGP                          | 6     |
cu/row_add_to_mem_f_not0001(cu/row_add_to_mem_f_not000158:O)| NONE(*)(cu/row_add_to_mem_f_0) | 8     |
cu/enb_cnt_f_not0001(cu/enb_cnt_f_not0001:O)                | NONE(*)(cu/enb_cnt_f_3)        | 2     |
cu/enable_f_not0001(cu/enable_f_not0001175_f5:O)            | NONE(*)(cu/enable_f_5)         | 3     |
cu/NS_I_not0001(cu/NS_I_not0001115:O)                       | NONE(*)(cu/NS_I_1)             | 6     |
cu/fwd_inv_we_not0001(cu/fwd_inv_we_not0001:O)              | NONE(*)(cu/fwd_inv_we)         | 3     |
cu/enb_Ce_or0000(cu/enb_Ce_or00001:O)                       | NONE(*)(cu/enb_Ce)             | 1     |
cu/start_not0001(cu/start_not0001_f5:O)                     | NONE(*)(cu/start)              | 1     |
cu/unload_not0001(cu/unload_not000158:O)                    | NONE(*)(cu/unload)             | 1     |
cu/scale_sch_we_not0001(cu/scale_sch_we_not000130:O)        | NONE(*)(cu/scale_sch_we)       | 1     |
cu/flag1_not0001(cu/flag1_not00011:O)                       | NONE(*)(cu/flag1)              | 1     |
------------------------------------------------------------+--------------------------------+-------+
(*) These 11 clock signal(s) are generated by combinatorial logic,
and XST is not able to identify which are the primary clock signals.
Please use the CLOCK_SIGNAL constraint to specify the clock signal(s) generated by combinatorial logic.
INFO:Xst:2169 - HDL ADVISOR - Some clock signals were not automatically buffered by XST with BUFG/BUFR resources. Please use the buffer_type constraint in order to insert these buffers to the clock signals to help prevent skew problems.

Asynchronous Control Signals Information:
----------------------------------------
--------------------------------------------+---------------------------+-------+
Control Signal                              | Buffer(FF name)           | Load  |
--------------------------------------------+---------------------------+-------+