
Monday, 25 June 2012

Myths of Verilog Case Statement

case, casez, casex ...! It used to be very confusing for me to tell these three apart, and I used to wonder why the other two (casez and casex) were needed at all. I hope this article helps you understand them better.
 
The common practice is to use the casez statement in RTL coding, and the use of casex is strongly discouraged. Now we will discuss whether casex really is such a bad construct. Before we start, let me say that all three case statements are synthesizable.

If someone is asked to state the differences between the case, casez, and casex constructs in Verilog, the answer is usually the familiar one:

casez treats 'z' as a don't care.
casex treats 'z' & 'x' as don't cares.
case treats 'z' & 'x' literally.

Now let's go further and unearth the real differences between them.

A common misconception is that '?' means don't care, but it does not. It is just another representation of high impedance 'z'.

Ex:
case (sel)
  2'b00: mux_out = mux_in[0];
  2'b01: mux_out = mux_in[1];
  2'b1?: mux_out = mux_in[2];
  default: mux_out = mux_in[3];
endcase

In the above case statement, if the intention is to match case item 3 for either 10 or 11, then the code is plainly wrong, because '?' does not mean don't care here. The case item actually written is 2'b1z.
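
You can see this in simulation with a minimal, self-contained sketch (module and signal names are made up for illustration): with sel = 2'b11 the 2'b1? item does not match and the default executes, while sel = 2'b1z does match it.

module tb_question_mark;
  reg [1:0] sel;
  reg [7:0] mux_out;
  reg [7:0] mux_in [0:3];

  always @* begin
    case (sel)
      2'b00:   mux_out = mux_in[0];
      2'b01:   mux_out = mux_in[1];
      2'b1?:   mux_out = mux_in[2];  // literally 2'b1z, NOT "10 or 11"
      default: mux_out = mux_in[3];
    endcase
  end

  initial begin
    mux_in[0] = 8'd0; mux_in[1] = 8'd1; mux_in[2] = 8'd2; mux_in[3] = 8'd3;
    sel = 2'b11; #1 $display("sel=%b mux_out=%0d", sel, mux_out); // prints 3 (default)
    sel = 2'b1z; #1 $display("sel=%b mux_out=%0d", sel, mux_out); // prints 2 (matches 2'b1?)
  end
endmodule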

Case statement:
The case statement treats x & z literally. So a case expression containing x or z will only match a case item containing x or z at the corresponding bit positions. If no case item matches, the default item is executed. It is really pattern matching: any pattern formed from the symbol set {0,1,x,z} matches its exact clone only.
Ex:
case (sel)
  2'b00: y = a;
  2'b01: y = b;
  2'bx0: y = c;
  2'b1x: y = d;
  2'bz0: y = e;
  2'b1?: y = f;
  default: y = g;
endcase

Result:
sel    y    case item
00     a    00
11     g    default
xx     g    default
x0     c    x0
1z     f    1?
z1     g    default
Note that when sel is 11, y is not assigned f, but is assigned the default value g.

Casez statement:
The casez statement treats z as a don't care. It means just what it sounds like: don't care whether the bit is 0, 1, or even x, i.e., z (or ?) matches 0, 1, x, and z.

Ex:
casez (sel)
  2'b00: y = a;
  2'b01: y = b;
  2'bx0: y = c;
  2'b1x: y = d;
  2'bz0: y = e;
  2'b1?: y = f;
  default: y = g;
endcase

Result:
sel    y    case item
00     a    00
11     f    1?
xx     g    default
x0     c    x0 (would have matched z0 (item 5) if item 3 were absent)
1z     d    1x (would also have matched z0 (item 5) and 1? (item 6))
z1     b    01 (would also have matched 1? (item 6))

The fact that x matches z in casez gives the illusion that 'x is treated as a don't care in casez'. What actually happens is that z, the don't care, is matched to x. This becomes clearer once you note that 'x will not match 1 or 0 in casez' (but will match z).

The point raised at the beginning, that ? is not a don't care, deserves an explanation here: ? acts as a don't care only inside casez (and inside casex, where z is a don't care anyway); elsewhere it is nothing but z.

Casex statement:
The casex statement treats both x and z as don't cares: x matches 0, 1, z, or x, and z matches 0, 1, x, or z.

Ex:
casex (sel)
  2'b00: y = a;
  2'b01: y = b;
  2'bx0: y = c;
  2'b1x: y = d;
  2'bz0: y = e;
  2'b1?: y = f;
  default: y = g;
endcase

Result:
sel    y    case item
00     a    00
11     d    1x (would also have matched 1?)
xx     a    00 (every bit is a don't care, so every item would match)
x0     a    00 (would have matched every item except 01)
1z     c    x0 (would have matched every item except 00 and 01)
z1     b    01 (would also have matched 1x and 1?)
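
If you want to reproduce all three tables in one run, the following self-contained sketch (all names are made up) evaluates the same items under case, casez, and casex side by side:

module tb_case_compare;
  reg [1:0] sel;
  reg [7:0] y1, y2, y3;  // results of case, casez, casex respectively

  always @* begin
    case (sel)
      2'b00: y1 = "a"; 2'b01: y1 = "b"; 2'bx0: y1 = "c";
      2'b1x: y1 = "d"; 2'bz0: y1 = "e"; 2'b1?: y1 = "f";
      default: y1 = "g";
    endcase
    casez (sel)
      2'b00: y2 = "a"; 2'b01: y2 = "b"; 2'bx0: y2 = "c";
      2'b1x: y2 = "d"; 2'bz0: y2 = "e"; 2'b1?: y2 = "f";
      default: y2 = "g";
    endcase
    casex (sel)
      2'b00: y3 = "a"; 2'b01: y3 = "b"; 2'bx0: y3 = "c";
      2'b1x: y3 = "d"; 2'bz0: y3 = "e"; 2'b1?: y3 = "f";
      default: y3 = "g";
    endcase
  end

  // Drive one sel value and print the three results side by side.
  task try(input [1:0] v);
    begin
      sel = v; #1 $display("%b    %s     %s     %s", sel, y1, y2, y3);
    end
  endtask

  initial begin
    $display("sel  case  casez  casex");
    try(2'b00); try(2'b11); try(2'bxx);
    try(2'bx0); try(2'b1z); try(2'bz1);
  end
endmodule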

The summary of the discussion so far can be put as follows:

Statement   Expression contains   Item contains   Result
case        binary                binary          Match
case        binary                x               Don't match
case        binary                ? (z)           Don't match
case        x                     binary          Don't match
case        x                     x               Match
case        x                     ? (z)           Don't match
case        z (?)                 binary          Don't match
case        z (?)                 x               Don't match
case        z (?)                 ? (z)           Match
casez       binary                binary          Match
casez       binary                x               Don't match
casez       binary                ? (z)           Match
casez       x                     binary          Don't match
casez       x                     x               Match
casez       x                     ? (z)           Match
casez       z (?)                 binary          Match
casez       z (?)                 x               Match
casez       z (?)                 ? (z)           Match
casex       binary                binary          Match
casex       binary                x               Match
casex       binary                ? (z)           Match
casex       x                     binary          Match
casex       x                     x               Match
casex       x                     ? (z)           Match
casex       z (?)                 binary          Match
casex       z (?)                 x               Match
casex       z (?)                 ? (z)           Match

So far we have been discussing the simulation aspects; now we will discuss the synthesis aspects of these statements.

Ex:
case (sel)
  2'b00: mux_out = mux_in[0];
  2'b01: mux_out = mux_in[1];
  2'b1?: mux_out = mux_in[2];
  default: mux_out = mux_in[3];
endcase

The synthesized outputs for the different case statements of the same code are shown below:

[Synthesis schematics of the mux for the case, casez, and casex versions]

Now let case item 3 be changed to 2'b1x.

Ex:
case (sel)
  2'b00: mux_out = mux_in[0];
  2'b01: mux_out = mux_in[1];
  2'b1x: mux_out = mux_in[2];
  default: mux_out = mux_in[3];
endcase

The outputs of synthesis will look like:

[Synthesis schematics for the case, casez, and casex versions with the 2'b1x item]

Comparing the two sets of outputs leads to the following deductions:
1. The case statement simply ignores, for synthesis, any item containing x or z.
2. casez and casex produce the same output after synthesis, treating both x and z in case items as don't cares (which seems to go against the nature of casez?).


If both casez and casex give the same netlist, then why was I advised to use only casez in RTL design? The immediate answer would be: because of synthesis-simulation mismatch. But if I use casez, will there be no mismatches? The painful fact is that there will be mismatches.

Normally, in RTL meant for synthesis, no case expression is expected (or desired) to be x, as no such unknown state will be present on the chip. So we won't have any case item carrying x (as in the example below, where only 0, 1, and ? (z) are used).

Ex:
casez (sel)
  2'b00: mux_out = mux_in[0];
  2'b01: mux_out = mux_in[1];
  2'b1?: mux_out = mux_in[2];
  default: mux_out = mux_in[3];
endcase

Result:
sel    casez pre-synthesis   casez post-synthesis   casex pre-synthesis   casex post-synthesis
xx     mux_in[3]             x                      mux_in[0]             x
1x     mux_in[2]             mux_in[2]              mux_in[2]             mux_in[2]
0x     mux_in[3]             x                      mux_in[0]             x
zz     mux_in[0]             x                      mux_in[0]             x
1z     mux_in[2]             mux_in[2]              mux_in[2]             mux_in[2]
0z     mux_in[0]             x                      mux_in[0]             x

Note:
1. Binary combinations of the sel signal are not considered, as it is obvious that no mismatch can occur for them.
2. The post-synthesis results of casez and casex match because both have the same netlist.


If we observe the pre- and post-synthesis results for casez, there are certainly some mismatches; the same holds for casex. In some mismatch cases (when sel is xx or 0x) the pre-synthesis result of casez differs from that of casex, but in the matching cases (when sel is 1x or 1z) casez and casex have the same pre-synthesis results.
Another interesting and very important observation is that whenever there is a mismatch, the post-synthesis result becomes x. This observation is crucial: it leads to the conclusion that we can use either casez or casex, whichever we wish, as opposed to the common myth that only casez should be used. This is explained in detail below.

During RTL simulation, if sel becomes xx, casez executes the default statement (the intended behaviour) while casex executes case item 1 (not the intended behaviour): clearly a mismatch. In netlist simulation, however, both casez and casex results become x, making the RTL-simulation mismatch between casez and casex trivial.

In all the cases where the netlist simulation result is not x, the RTL simulation results of casez and casex match. From this behaviour we can conclude that casex can be used as freely as casez is used in designs.

The reason we prefer casez (or casex) to the case statement is that casez (casex) can represent don't cares.

Ex:
casez (sel)
  3'b000: y = a;
  3'b001: y = b;
  3'b01?: y = c;
  3'b1??: y = d;
endcase
 

If we implement the same logic using a case statement, it looks like this:

case (sel)
  3'b000: y = a;
  3'b001: y = b;
  3'b010, 3'b011: y = c;
  3'b100, 3'b101, 3'b110, 3'b111: y = d;
endcase

The number of values listed for a single case item grows as the case expression widens, which hurts readability and makes the coding tedious. Also remember that the synthesized netlist will differ from that of casez (casex) if the case items contain x or z (?).
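
For instance, here is a hedged sketch of an address decoder (all names are invented): with an 8-bit selector, one casez item with don't-care bits stands in for 128 explicit case items.

module addr_decode (
  input      [7:0] addr,
  output reg [1:0] dev
);
  localparam ROM = 2'd0, RAM = 2'd1, NONE = 2'd2;
  always @* begin
    casez (addr)
      8'b0000_0000: dev = ROM;  // a single exact address
      8'b1???_????: dev = RAM;  // one item covers all 128 addresses with addr[7]==1
      default:      dev = NONE;
    endcase
  end
endmodule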

Summary:
1. We can't put ? (meaning a z) or x in a case item of a case statement, as the synthesis tool won't consider that case item for synthesis.
2. casez and casex result in the same netlist.
3. casez is preferred to case only because it can represent don't cares.
4. The most important conclusion is that casex can be used just as freely as the casez statement.
5. If you visualize a design in terms of gates (muxes) and then derive the case items for that logic, they can be put inside either a casez or a casex statement (both will give the same results).

Sunday, 24 June 2012

SystemVerilog Data Types

SystemVerilog offers many improved data structures compared with Verilog. Some of these were created for designers but are also useful for testbenches. In this post of the SystemVerilog tutorial you will learn about the data structures most useful for verification.

SystemVerilog introduces new data types with the following benefits (a short sketch follows the list):
  1. Two-state: better performance, reduced memory usage.
  2. Queues, dynamic and associative arrays, and automatic storage: reduced memory usage, built-in support for searching and sorting.
  3. Unions and packed structures: multiple views of the same data.
  4. Classes and structures: support for abstract data structures.
  5. Strings: built-in string support.
  6. Enumerated types: code that is easier to write and understand.
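
As a quick, hedged illustration (all identifiers below are invented for the example), this sketch touches queues, dynamic arrays, associative arrays, and enumerated types:

module sv_types_demo;
  typedef enum {IDLE, BUSY, DONE} state_e;  // enumerated type
  int          q    [$];                    // queue
  bit [7:0]    dyn  [];                     // dynamic array of two-state bytes
  int unsigned hits [string];               // associative array keyed by string

  initial begin
    state_e s = IDLE;
    dyn = new[4];                           // allocate four elements
    foreach (dyn[i]) dyn[i] = i * 8'd10;
    q.push_back(10);
    q.push_back(20);
    hits["match"] = 1;
    s = DONE;
    $display("state=%s queue=%0d dyn[3]=%0d hits=%0d",
             s.name(), q.size(), dyn[3], hits["match"]);
  end
endmodule
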
Learn more about SystemVerilog data types:

Sunday, 10 June 2012

SystemVerilog Language Reference Manual (LRM)

IEEE Standard 1800™ SystemVerilog is the industry's unified hardware description and verification language (HDVL) standard. SystemVerilog is a significant evolution of the traditional Verilog hardware description language. Its use dramatically improves productivity in the development of large-gate-count, IP-based, and bus-intensive chips. SystemVerilog is targeted primarily at the chip implementation and verification flows, with powerful links to the system-level design flow. SystemVerilog has been adopted by hundreds of semiconductor design companies and is supported by more than 75 EDA, IP, and training solutions providers worldwide.

The IEEE Standard 1800™-2012 SystemVerilog LRM can be downloaded in PDF format, at no charge (through the IEEE-SA and industry support), from the link below.

DOWNLOAD SYSTEM VERILOG LRM PDF

Thursday, 31 May 2012

8-bit Micro Processor

This is an 8-bit microprocessor with 5 instructions, based on the 8080 architecture. The architecture is called SAP, for Simple-As-Possible computer. It is a very useful design that introduces most of the basic and fundamental ideas behind computer operation.

This design could be used in instruction classes for undergraduates or in specific VHDL classes. Since the processor is based on the 8080 architecture, it can be upgraded step by step to integrate further facilities, which is an exciting challenge for students. Further, they could think about building a complete system, i.e., integrating memory and I/O peripherals with the processor.

The design is proven for both ASIC and FPGA. It was implemented on a Xilinx Spartan-3E Starter Kit FPGA. Full documentation for the code and the resources used is attached within the project.

For project details please write to us on info@vlsiencyclopedia.com


Floating Point Adder and Multiplier

The FP Adder is a single-precision, IEEE-754 compliant, signed adder/subtractor. It includes both single-cycle and 6-stage pipelined designs. The design is fully synthesizable and has been tested on a Xilinx Virtex-II XC2V3000 FPGA, occupying 385 CLBs, with a theoretical maximum operating frequency of 6 MHz for the single-cycle design and 87 MHz for the pipelined design. The design was tested at 33 MHz.

The FP Multiplier is a single-precision, IEEE-754 compliant, signed multiplier. It includes both single-cycle and 4-stage pipelined designs. The design is fully synthesizable and has been tested on a Xilinx Virtex-II XC2V3000 FPGA, occupying 119 CLBs, with a theoretical maximum operating frequency of 8 MHz for the single-cycle design and 90 MHz for the pipelined design. The design was tested at 33 MHz.

Features

- IEEE-754 compliant (see the field-layout sketch after this list)
- 32 bits, single precision
- Works with normalized and un-normalized numbers
- Simple block design, good for learning FP arithmetic
- Adder: 385 CLBs; 87 MHz, 6-stage pipelined
- Multiplier: 119 CLBs; 90 MHz, 4-stage pipelined
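
For reference (a standard IEEE-754 fact, not something specific to this core), single precision packs a sign bit, 8 exponent bits with bias 127, and 23 fraction bits. In SystemVerilog the layout can be sketched as:

typedef struct packed {
  logic        sign;       // 1 = negative
  logic [7:0]  exponent;   // biased by 127
  logic [22:0] fraction;   // implicit leading 1 for normalized numbers
} fp32_t;                  // 32 bits total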

PROGRESSIVE CODING FOR WAVELET-BASED IMAGE COMPRESSION

Description of the Project:

This project describes the hardware design flow of a lifting-based 2-D Forward Discrete Wavelet Transform (FDWT) processor for JPEG 2000. In order to build a high-quality JPEG 2000 codec, an effective 2-D FDWT algorithm is applied to the input image file to obtain the decomposed image coefficients. The lifting scheme reduces the number of operations to almost one-half of those needed with a conventional convolution approach. Initially, the lifting-based 2-D FDWT algorithm was developed in MATLAB. The FDWT modules were simulated using the XPS (8.1i) design tools, and the final design was verified with MATLAB image processing tools.
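
As a hedged illustration of why lifting is cheap (the real-valued form of the LeGall 5/3 analysis filter used for JPEG 2000 lossless coding; the standard's integer version adds rounding terms), the transform factors into one predict and one update step per pair of samples:

d(i) = s(2i+1) - [s(2i) + s(2i+2)] / 2    (predict: detail coefficients)
a(i) = s(2i)   + [d(i-1) + d(i)] / 4      (update: approximation coefficients)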

A comparison with MATLAB simulation results was done to verify the proper functionality of the developed module. The motivation in designing the hardware modules of the FDWT was to reduce complexity, enhance performance, and make the design suitable for development on a reconfigurable FPGA-based platform for VLSI implementation. Decomposition results for a test image validate the design. The entire system runs at a 215 MHz clock frequency and reaches a speed suitable for several real-time applications. Simulation also shows that the lifting scheme has a lower memory requirement.

Introduction:
A majority of today's Internet bandwidth is estimated to be used for images and video. Recent multimedia applications for handheld and portable devices place a limit on the available wireless bandwidth, and the bandwidth remains limited even with new connection standards. JPEG image compression, in widespread use today, took several years to perfect. Wavelet-based techniques such as JPEG 2000 have a lot more to offer than conventional methods in terms of compression ratio, and wavelet implementations are still being developed and perfected. Flexible, energy-efficient hardware implementations that can handle multimedia functions such as image processing, coding, and decoding are critical, especially in hand-held portable multimedia wireless devices.

Background
Data compression is, of course, a powerful, enabling technology that plays a vital role in the information age. Among the various types of data commonly transferred over networks, image and video data comprise the bulk of the bit traffic; current estimates indicate that image data alone take up over 40% of the volume on the Internet. The explosive growth in demand for image and video data, coupled with delivery bottlenecks, has kept compression technology at a premium.
Among the several compression standards available, the JPEG image compression standard is in widespread use today. JPEG uses the Discrete Cosine Transform (DCT), applied to 8-by-8 blocks of image data. The newer JPEG 2000 standard is based on the Wavelet Transform (WT). The Wavelet Transform offers multi-resolution image analysis, which appears well matched to the low-level characteristics of human vision. The DCT is essentially unique, but the WT has many possible realizations. Wavelets provide us with a basis more suitable for representing images.

This is because wavelets can represent information at a variety of scales, with local contrast changes as well as larger-scale structures, and are thus a better fit for image data.

Aim of the project
The main aim of the project is to implement and verify the image compression technique and to investigate the possibility of hardware acceleration of the DWT for signal processing applications. A hardware design has to be provided to achieve high performance in comparison to a software implementation of the DWT. The goals of the project are to:

- Implement the design in a hardware description language (here, VHDL).
- Perform simulation using tools such as Xilinx ISE 8.1i.
- Check correctness and synthesize for a Spartan-3E FPGA kit.

The STFT represents a sort of compromise between the time- and frequency-based views of a signal. It provides some information about both when and at what frequencies a signal event occurs. However, you can only obtain this information with limited precision, and that precision is determined by the size of the window.

While the STFT compromise between time and frequency information can be useful, the drawback is that once you choose a particular size for the time window, that window is the same for all frequencies. Many signals require a more flexible approach—one where we can vary the window size to determine more accurately either time or frequency.

Problem with the Fourier Transform:
The fundamental idea behind wavelets is to analyze according to scale. Indeed, some researchers feel that using wavelets means adopting a whole new mindset or perspective in processing data. Wavelets are functions that satisfy certain mathematical requirements and are used in representing data or other functions. This idea is not new: approximation using superposition of functions has existed since the early 1800s, when Joseph Fourier discovered that he could superpose sines and cosines to represent other functions.

However, in wavelet analysis, the scale used to look at the data plays a special role. Wavelet algorithms process data at different scales or resolutions. Looking at a signal (or a function) through a large "window", we notice gross features; looking at the same signal through a small "window", we notice small features. The result of wavelet analysis is to see both the forest and the trees, so to speak.

This makes wavelets interesting and useful. For many decades scientists have wanted functions more appropriate than the sines and cosines, which form the basis of Fourier analysis, to approximate choppy signals. By their definition, these functions are non-local (they stretch out to infinity), and they therefore do a very poor job of approximating sharp spikes. But with wavelet analysis, we can use approximating functions that are contained neatly in finite domains. Wavelets are well suited for approximating data with sharp discontinuities.

The wavelet analysis procedure is to adopt a wavelet prototype function, called an analyzing wavelet or mother wavelet. Temporal analysis is performed with a contracted, high-frequency version of the prototype wavelet, while frequency analysis is performed with a dilated, low-frequency version of the same wavelet. Because the original signal or function can be represented in terms of a wavelet expansion (using coefficients in a linear combination of the wavelet functions), data operations can be performed using just the corresponding wavelet coefficients.
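
In the standard dyadic notation (added here for reference, not spelled out in the original post), the expansion reads:

x(t) = sum over j,k of c(j,k) * psi_jk(t),   where   psi_jk(t) = 2^(j/2) * psi(2^j * t - k)

so storing or transmitting the coefficients c(j,k) is equivalent to storing the signal itself.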

And if the wavelets best adapted to the data are selected and the coefficients below a threshold are truncated, the resulting data are sparsely represented. This sparse coding makes wavelets an excellent tool in the field of data compression. Other applied fields using wavelets include astronomy, acoustics, nuclear engineering, sub-band coding, signal and image processing, neurophysiology, music, magnetic resonance imaging, speech discrimination, optics, fractals, turbulence, earthquake prediction, radar, human vision, and pure mathematics applications such as solving partial differential equations.

Basically, the wavelet transform (WT) is used to analyze non-stationary signals, i.e., signals whose frequency response varies in time, for which the Fourier transform (FT) is not suitable. To overcome this limitation of the FT, the short-time Fourier transform (STFT) was proposed. There is only a minor difference between the STFT and the FT: in the STFT, the signal is divided into small segments (portions) that can be assumed to be stationary. For this purpose a window function "w" is chosen, whose width in time must match the segment of the signal over which it can still be considered stationary. With the STFT, one can obtain the time-frequency response of a signal simultaneously, which cannot be obtained with the FT.
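
In symbols (the textbook definition, added here for reference), the STFT of a signal x(t) with window w is:

STFT{x}(tau, f) = integral of x(t) * w(t - tau) * e^(-j*2*pi*f*t) dt

The fixed width of w is exactly what locks in the time-frequency resolution trade-off described above.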

Scaling
We've seen the interrelation of wavelets and quadrature mirror filters. The wavelet function ψ is determined by the high-pass filter, which also produces the details of the wavelet decomposition.
There is an additional function associated with some, but not all, wavelets: the so-called scaling function, which is very similar to the wavelet function. It is determined by the low-pass quadrature mirror filter. Just as iteratively up-sampling and convolving the high-pass filter produces a shape approximating the wavelet function, iteratively up-sampling and convolving the low-pass filter produces a shape approximating the scaling function. We have already alluded to the fact that wavelet analysis produces a time-scale view of a signal, and now we are talking about scaling and shifting wavelets.

What exactly do we mean by scale in this context?
Scaling a wavelet simply means stretching (or compressing) it. To go beyond colloquial descriptions such as “stretching,” we introduce the scale factor, often denoted by the letter a.

If we're talking about sinusoids, for example, the effect of the scale factor is easy to see: f(t) = sin(t) corresponds to scale a = 1, f(t) = sin(2t) to a = 1/2, and f(t) = sin(4t) to a = 1/4.
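
In the usual continuous-wavelet notation (again a standard formula, not from the original post), scaling by a and shifting by b give the analyzing functions:

psi_(a,b)(t) = (1 / sqrt(a)) * psi((t - b) / a)

where a small a compresses the wavelet for high-frequency (fine-scale) analysis and a large a stretches it for low-frequency (coarse-scale) analysis.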

One-Stage Decomposition
For many signals, the low-frequency content is the most important part: it is what gives the signal its identity. The high-frequency content, on the other hand, imparts flavor or nuance. Consider the human voice: if you remove the high-frequency components, the voice sounds different, but you can still tell what is being said. However, if you remove enough of the low-frequency components, you hear gibberish. In wavelet analysis, we often speak of approximations and details. The approximations are the high-scale, low-frequency components of the signal; the details are the low-scale, high-frequency components. The filtering process, at its most basic level, looks like this:

The original signal S passes through two complementary filters and emerges as two signals. Unfortunately, if we actually perform this operation on a real digital signal, we wind up with twice as much data as we started with. Suppose, for instance, that the original signal S consists of 1000 samples of data. Then the resulting signals will each have 1000 samples, for a total of 2000. These signals A and D are interesting, but we get 2000 values instead of the 1000 we had. There is a more subtle way to perform the decomposition using wavelets: by keeping only every other sample in each output (downsampling), each output carries 500 values and the total stays at 1000.
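
Here is a minimal sketch of one decomposition stage with downsampling (a Haar stage chosen for clarity, not the project's lifting-based 5/3 FDWT; all names are invented): N input samples yield N/2 approximation plus N/2 detail coefficients, so the total stays at N rather than doubling.

module haar_stage #(parameter N = 8) (
  input  signed [15:0] s [N],    // input samples
  output signed [15:0] a [N/2],  // approximation: low-pass, downsampled by 2
  output signed [15:0] d [N/2]   // detail: high-pass, downsampled by 2
);
  genvar i;
  generate
    for (i = 0; i < N/2; i = i + 1) begin : g
      assign a[i] = (s[2*i] + s[2*i+1]) >>> 1;  // average of each sample pair
      assign d[i] = (s[2*i] - s[2*i+1]) >>> 1;  // difference of each sample pair
    end
  endgenerate
endmodule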

RISC Processor

A small RISC CPU (written in VHDL) that is compatible with the 12-bit opcode PIC family. Single-cycle operation normally, two cycles when the program counter is modified. Clock speeds of over 40 MHz are possible when using the Xilinx Virtex optimizations.
Licensed under the LGPL.

The core has a single pipeline stage and runs from a single clock, so (ignoring program counter changes) a 40 MHz clock gives 40 MIPS of processing speed. Any instruction that modifies the program counter, for example a branch or skip, results in a pipeline stall, and this costs only one additional clock cycle.
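
As a back-of-the-envelope check (the 20% figure below is an assumption for illustration, not a measured value): if a fraction p of instructions modify the PC and each costs one extra cycle, the effective throughput is

MIPS = f_clk / (1 + p) = 40 MHz / (1 + 0.2) ≈ 33.3 MIPS

so even fairly branch-heavy code stays close to the one-instruction-per-cycle ideal.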

The chosen CPU architecture is not particularly FPGA friendly; for example, multiplexers are generally quite expensive. The maximum combinatorial path delay is also long, so to ease the place-and-route tool's job the core is written at a low level. It instantiates a number of library macros, for example a 4:1 mux. Two versions of these are given: one in generic VHDL and a second optimized for the Xilinx Virtex series (including Spartan devices).