
Showing posts with label FPGA. Show all posts

Wednesday, 17 April 2013

Xilinx enables C programmable FPGAs

The Vivado Design Suite 2013.1 includes a new IP-centric design environment that accelerates system integration, plus a set of libraries that accelerate C/C++ system-level design and high-level synthesis (HLS). The update provides a workflow that does not dictate how a design team works.

Users of Vivado HLS can access video processing functions integrated into an OpenCV environment for embedded vision running on the dual-core ARM processing system.

It delivers, says Xilinx, up to a 100X performance improvement for existing C/C++ algorithms through hardware acceleration.


Tuesday, 12 March 2013

FPGA design heads into cloud computing

From online payments and electronic banking transactions to organizing company documents and mission-critical supply chain management systems, cloud computing now plays an ever-present role in both consumer and enterprise applications. In general, cloud computing's "pay-as-you-go" elasticity, requiring little upfront investment, tends to be its main value proposition to IT departments, although security and service disruptions are potential risks.
But what does cloud computing mean for the FPGA design engineer? How can seemingly unlimited server resources help engineers in our daily work? This article examines the benefits and potential pitfalls of cloud computing in FPGA design from a practical, day-to-day viewpoint.


Sunday, 10 February 2013

FPGAs: An Alternative to Cloud Computing!

As complexity intensifies within sophisticated computing, so does the demand for more computing power. On top of that, the need to mine data from the burgeoning mountain of Internet search data has led to huge data centers that must be located close to water to feed their massive equipment cooling systems.

Weather modeling, for instance, continues to drill down into smaller geographical elements to fine-tune accuracy. And longer and more sophisticated encryption keys require greater compute power to crack them.

New tasks are also emerging in fields ranging from advertising to gene sequencing. Companies in the bio-sciences area gain competitive advantage based on the speed they’re able to sequence the genes held in DNA samples. Drug companies rely heavily on computer modeling to identify suitable candidate chemical formulas that may be useful in combating diseases.

In security, the focus has turned to deep packet inspection and application-aware monitoring. Companies now routinely deploy firewalls that are able to break into individual communication streams and identify traffic specific to, say, social networking sites, which can be used to help stop malicious attacks on corporate assets.

The Server Farm Approach

Traditionally, increases in processing requirements are handled with a brute-force approach: develop server farms that simply throw more microprocessor units at a problem. The heightened clamor for these server farms creates new problems, though, such as how to bring enough power into a server room and how to remove the generated heat. Space requirements are another problem, as is the complex management of the server farm needed to ensure optimal load balancing and guarantee a return on investment.

At some point, the rationale for these local server farms runs out as the physical and heat problems become too large. Enter the great savior to these problems, otherwise known as “The Cloud.” In this model, big companies will hire out operating time on huge computing clusters.

In a stroke, companies can make physical problems disappear by offloading this IT requirement onto specialised companies. However, it’s not without problems:

  • Depending on the data, there may be a requirement for high-bandwidth communications to and from the data centre.
  • A third party is added into the value chain, and it will charge for the service based on the computing time used.
  • Rather than solving the power problem, it’s simply been moved—exactly the same amount of computing needs to be undertaken, just in a different place.

This final issue, which raises fundamental problems that can’t be solved with traditional processor systems, splits into two parts:

  • Software: Despite advances in software programming tools, automatic optimization of algorithms for execution on multiple processors remains a distant goal. It's often easy to break a problem down into a number of parallel computations. However, it's much harder for the software programmer to exploit pipelining, where the output of one stage of operation is automatically passed to the next stage and acted upon. Instead, processors perform the same operation on a large array of data, pass it to memory, and then call it back from memory to perform the next operation. This creates a huge overhead in power consumption and execution time.
  • Hardware: Processor systems are designed to be general. A processor’s data path is typically 32 or 64 bits. The data often requires much smaller resolution, leading to large inefficiencies as gates are clocked unnecessarily. Frequently, it becomes possible to pack data to fill more of the available data width, although this is rarely optimal and adds its own overhead. In addition, the execution units of a processor aren’t optimised to the specific mathematical or data-manipulation functions being undertaken, which again leads to huge overheads. 
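The pipelining point above can be made concrete with a small sketch (not from the article; Python is used only to illustrate the data-movement difference, not as a hardware language):

```python
# Two-stage computation expressed two ways: stage-by-stage, where each stage
# writes a full intermediate array (as a processor round-tripping through
# memory does), and pipelined, where each sample flows straight from one
# stage to the next (as dedicated hardware does).

def scale(samples):
    for s in samples:
        yield s * 3          # stage 1: multiply

def offset(samples):
    for s in samples:
        yield s + 1          # stage 2: add

data = list(range(8))

# Stage-by-stage: a full intermediate array between operations.
stage1 = [s * 3 for s in data]
staged = [s + 1 for s in stage1]

# Pipelined: stages are chained, no intermediate array is materialized.
pipelined = list(offset(scale(data)))

assert staged == pipelined   # identical results, very different data movement
```

The results are identical; the cost difference lies entirely in how often the data crosses a memory boundary, which is exactly the overhead the article attributes to processor-based systems.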

The FPGA Approach

In the world of embedded products, a common computing-power approach is to develop dedicated hardware in an FPGA. These devices are programmed using silicon design techniques to implement processing functions much like a custom-designed chip. Many papers have been written on the relative improvements between processors, FPGAs, and dedicated hardware. Typical speed/power-consumption improvements range between a factor of 100 and 5000. 

Following in that vein, a recent study performed by Plextek explored how a single FPGA could accelerate a particular form of gene sequencing. Findings revealed a speed-up of just under a factor of 500. This can be viewed either as a significantly shorter time period or as an equipment reduction from 500 machines to a single PC. In reality, though, the savings will be a balance of the two.

Previously, these benefits were difficult to achieve for two reasons:

  • Interfacing: Dedicated engineering was required to develop FPGA systems that could easily access data sets. Once the data set changes, new interfacing requirements arise, which means a renewed engineering effort.
  • Design cycle time: The time it takes for an algorithm engineer to explain the requirements to a digital design engineer, who must then convert them into VHDL along with the necessary testbenches to verify the design, simply becomes too long for exploratory algorithm work.

Now both of these problems have largely been solved thanks to modern FPGA devices. The first issue is resolved by embedded processors in the FPGA, which allow for more flexible interfacing. It's even possible to access FPGA devices directly over Ethernet or the Internet. For example, Plextek developed FPGA implementations that don't have to go through interface modifications every time a requirement or data set changes.

To solve the second problem, companies such as Plextek have been working closely with major FPGA manufacturers to exploit new toolsets that can convert algorithmic descriptions defined in high-level mathematical languages (e.g., Matlab) into a form that easily converts into VHDL. As a result, significant time is saved from developing extensive testbenches to verify designs. Although not completely automatic, the design flow becomes much faster and much less prone to errors.

This doesn’t remove the need for a hardware designer, although it’s possible to develop methodologies to enable a hierarchical approach to algorithm exploration. The aim is to shorten the time between initial algorithm development and final solution.

Much of the time spent during algorithm exploration involves running a wide set of parameters through essentially the same algorithm. Plextek came up with a methodology that speeds up the process by providing a parameterised FPGA platform early in the process (see the figure). The approach requires the adoption of new high-level design tools, such as Altera’s DSP Builder or Xilinx’s System Generator.


A major portion of time involved in algorithm exploration revolves around running a wide set of parameters through essentially the same algorithm. Plextek’s methodology provides a parameterized FPGA platform early in the process, which saves a significant amount of time.

A key part of the process is jointly describing the algorithm parameters that are likely to change. After they’re defined, the hardware designer can deliver a platform with super-computing power to the scientist’s local machine, one that’s tailored to the algorithm being studied. At this point, the scientist can very quickly explore parameter changes to the algorithm, often being able to explore previously time-prohibitive ideas. As the algorithm matures, some features may need updating. Though modifications to the FPGA may be required, they can be implemented much faster.

A side benefit of this approach is that the final solution, when achieved, is in a hardware form that can be easily scaled across a business. In the past, algorithm exploration may have used a farm of 100 servers, but when rolled out across a business, the server requirements could increase 10- or 100-fold, even to thousands of machines. With FPGAs, equipment requirements shrink by orders of magnitude.

Ultimately, companies that adopt these methodologies will achieve significant cost and power-consumption savings, as well as speed up their algorithm development flows.


Sunday, 13 January 2013

Full Speed Ahead For FPGA

In the world of high-frequency trading, where speed matters most, technology that can gain a crucial split-second advantage over a rival is valued above all else.

And in what could be the next phase of HFT, firms are looking more and more to hardware solutions such as field-programmable gate arrays (FPGAs), which can offer speed gains over the software currently used by HFT firms.

FPGA technology, which allows an integrated circuit to be designed or configured after manufacturing, has been around for decades but has only been on the radar of HFT firms for a couple of years. New solutions are beginning to pop up, however, that may eventually make FPGAs more viable and the latest must-have tool in the high-speed arms race.

For instance, a risk calculation that can take 30 microseconds to perform by a software-based algorithm takes just three microseconds with FPGA.

Current HFT platforms are typically implemented using software on computers with high-performance network adapters. However, the downside of FPGA is that it is generally complicated and time consuming to set up, as well as to re-program, as the programmer has to translate an algorithm into the design of an electronic circuit and describe that design in specialized hardware description language.

The programming space on an FPGA is also limited, so programs currently can't be too big, although some tasks, such as "circuit breakers", are an ideal use for FPGA technology today.

It is these drawbacks, as well as the costs involved, that are at present holding back trading firms from taking up FPGAs in greater numbers. However, because of the speed gains on offer, considerable resources are being poured into FPGA development in a bid to make the technology more accessible, and some technology firms are now beginning to claim significant speed gains with their products.

Cheetah Solutions, a provider of hardware solutions for financial trading, is one firm that says it can now offer reconfigurable FPGA systems to trading firms. Its Cheetah Framework provides building blocks that can be configured in real time by a host server. An algorithm can execute entirely in an FPGA-enabled network card, with the server software taking only a supervisory role: monitoring the algo's performance and adapting the hardware algo on the fly.

“True low latency will only be achieved through total hardware solutions which guarantee deterministic low latency,” said Peter Fall, chief executive of Cheetah Solutions. “But if the market moves, you want to be able to tweak an algorithm or change it completely to take advantage of current conditions. Traditional FPGA programming may take weeks to make even a simple change whereas Cheetah Framework provides on-the-fly reconfigurability.”

Another technology firm claiming it can make automated trading strategies faster and more efficient is U.K.-based Celoxica, which recently debuted a new futures trading platform based on FPGA technology, built around a circuit on one small chip that can be programmed by the customer.

Celoxica says the platform is designed to accelerate the flow of market data into trading algorithms to make trading faster. It covers multiple trading strategies and asset classes including fixed income, commodities and foreign exchange.

“For futures trading, processing speed, determinism and throughput continue to play a crucial role in the success of principal trading firms and hedge funds trading on the global futures markets,” said Jean Marc Bouleier, chairman and chief executive at Celoxica. “Our clients and partners can increase focus on their trading strategies for CME, ICE, CFE, Liffe US, Liffe and Eurex.”

Last August, Fixnetix, a U.K. trading technology firm, said that it had signed up a handful of top-tier brokers to use its FPGA hardware chip, which executes deals and runs compliance and risk checks, suggesting that this niche technology is rapidly picking up speed.

Wednesday, 5 December 2012

Creating a simple FPGA Project with Xilinx ISE

Xilinx Virtex5 LX300 FPGA chip on the board

We would like to write this post for our friends who want to create a simple FPGA project with Xilinx ISE.

Software

Xilinx ISE is a software package containing a graphical IDE, design entry tools, a simulator, a synthesizer (XST) and implementation tools. A limited version of Xilinx ISE (WebPack) can be downloaded for free from the Xilinx website.

It is not mandatory to use Xilinx software for all tasks (for example, synthesis can be done with Synplify, simulation with ModelSim, etc.), but it is the easiest option to start with.

The information in this article applies to Xilinx ISE version 9.2.03i, but other versions (since 8.x) shouldn't be very different. If your version is older than 8.x, you should upgrade.

Creating a project

To create a project, start Project Navigator and select File->New Project. You will be asked for a project name and folder. Leave "top-level source type" as HDL.

Now we should choose a target device (we will use a Spartan-3A xc3s50a device as an example) as well as set up some other options:

A dialog of creating project in Xilinx ISE

The Project Navigator window contains a sidebar, which is on the left side by default. The upper part of this sidebar lists all project files, and the lower part lists tasks that are applicable for the file selected in the upper part.

Design Entry

Now, let's add a new source file to our project. We'll start from a simple 8-bit counter, which adds 1 to its value every clock cycle. This counter will have the following ports:

  • CLK - input clock signal;
  • CLR - input asynchronous clear signal (set counter value to 0);
  • DOUT - output counter value (8-bit bus).

We'll define our counter as a VHDL module. The VHDL language will be covered in more detail in later chapters.

To create a new source file, choose the "Create New Source" task and select the "VHDL module" source type. The name of our module will be counter.vhd. Then you will be asked which module to associate the testbench with; choose counter.

A dialog of creating a new source file in Xilinx ISE

Let's write the following code in counter.vhd:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity counter is
    Port ( CLK : in  STD_LOGIC;
           CLR : in  STD_LOGIC;
           DOUT : out  STD_LOGIC_VECTOR (7 downto 0));
end counter;

architecture Behavioral of counter is

signal val: std_logic_vector(7 downto 0);

begin

process (CLK,CLR) is
begin
    if CLR='1' then
        val<="00000000";
    elsif rising_edge(CLK) then
        val<=val+1;
    end if;
end process;

DOUT<=val;

end Behavioral;

ISE inserted the library and port declarations automatically; we only need to write the essential part of the VHDL description (inside the architecture block).

To check VHDL syntax, select "Synthesize - XST => Check Syntax" task for our module.

Simulation

In order to check that our code works as intended, we need to define input signals and check that output signals are correct. It can be done by creating a testbench.

To create a testbench for our counter, select "Create New Source" task, choose "VHDL Test Bench" module type and name it, for instance, counter_tb.vhd.

A VHDL testbench is written in VHDL, just like a hardware device description. The difference is that a testbench can use additional language constructs that aren't synthesizable and therefore cannot be used in real hardware (for example, wait statements for defining delays).

In order for the testbench file to be visible, choose "Behavioral Simulation" in the combo box in the upper part of the sidebar.

ISE automatically generates most of the testbench code; we only need to add our "stimulus":

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.all;
USE ieee.numeric_std.ALL;

ENTITY counter_tb_vhd IS
END counter_tb_vhd;

ARCHITECTURE behavior OF counter_tb_vhd IS

    -- Component Declaration for the Unit Under Test (UUT)
    COMPONENT counter
    PORT(
        CLK : IN std_logic;
        CLR : IN std_logic;         
        DOUT : OUT std_logic_vector(7 downto 0)
        );
    END COMPONENT;

    --Inputs
    SIGNAL CLK :  std_logic := '0';
    SIGNAL CLR :  std_logic := '0';

    --Outputs
    SIGNAL DOUT :  std_logic_vector(7 downto 0);

BEGIN

    -- Instantiate the Unit Under Test (UUT)
    uut: counter PORT MAP(
        CLK => CLK,
        CLR => CLR,
        DOUT => DOUT
    );
    -- Clock generation   
    process is
    begin
        CLK<='1';
        wait for 5 ns;
        CLK<='0';
        wait for 5 ns;
    end process;

    tb : PROCESS
    BEGIN

        CLR<='1';
        wait for 100 ns;

        CLR<='0';

        wait; -- will wait forever
    END PROCESS;

END;

We have added the clock generation process (which generates a 100 MHz clock, i.e. a 10 ns period) and the reset stimulus.

Now select a test bench source file and apply "Xilinx ISE Simulator => Simulate Behavioral Model" task. We should get something like this:

Xilinx ISE Simulator window

It can be seen that our counter works properly.
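As a quick sanity check outside the simulator, the counter's expected behavior is simple enough to capture in a few lines of ordinary software. The sketch below is not part of the ISE flow; it's just a reference model (in Python) you can compare the simulated waveform against:

```python
def counter_model(events):
    """Reference model of the 8-bit counter.

    `events` is a sequence of (clr, rising_edge) pairs; the returned list
    holds the DOUT value after each event.  CLR dominates, matching the
    asynchronous clear in the VHDL process.
    """
    val = 0
    out = []
    for clr, edge in events:
        if clr:
            val = 0                  # asynchronous clear
        elif edge:
            val = (val + 1) % 256    # 8-bit wraparound
        out.append(val)
    return out

# Mirror the testbench: assert CLR once, then clock three rising edges.
trace = counter_model([(1, 0), (0, 1), (0, 1), (0, 1)])  # -> [0, 1, 2, 3]
```

The model also makes the wraparound explicit: after 256 rising edges the value returns to 0, which is worth checking in the waveform if your design depends on it.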

Synthesis

The next step is to convert our VHDL code into a gate-level netlist (represented in the terms of the UNISIM component library, which contains basic primitives). This process is called "synthesis". By default Xilinx ISE uses built-in synthesizer XST (Xilinx Synthesis Technology).

In order to run synthesis, one should select "Synthesis/Implementation" in the combo box in the upper part of the sidebar, select a top-level module and apply the "Synthesize - XST" task. If the code is correct, there shouldn't be any problems during synthesis.

The synthesis report contains much useful information, including a maximum frequency estimate in the "timing summary" chapter. One should also pay attention to warnings, since they can indicate hidden problems.

After a successful synthesis one can run "View RTL Schematic" task (RTL stands for register transfer level) to view a gate-level schematic produced by a synthesizer:

RTL schematic view in Xilinx ISE

Notice that an RTL schematic in question contains only one primitive: a counter, which is directly an element from the UNISIM library.

Synthesizer output is stored in NGC format.

Implementation
Implementation design flow
  1. Translate - convert NGC netlist (represented in the terms of the UNISIM library) to NGD netlist (represented in the terms of the SIMPRIM library). The difference between these libraries is that UNISIM is intended for behavioral simulation, and SIMPRIM is a physically-oriented library (containing information about delays etc.) This conversion is performed by the program NGDBUILD and is rather straightforward. The main reason for it to be included is to convert netlist generated by different design entry methods (e.g. schematic entry, different synthesizers etc.) into one unified format.
  2. Map is the process of mapping the NGD netlist onto the specific resources of the particular device (logic cells, RAM modules, etc.). This operation is performed by the MAP program, with results stored in NCD format. For Virtex-5, MAP also does placement (see below).
  3. Place and route - as can be inferred from its name, this stage is responsible for the layout. It performs placement (logic resources distribution) and routing (connectivity resources distribution). Place and route is performed by a PAR program. For Virtex-5 devices, though, placement is performed by MAP program (and routing still by PAR program). The output of PAR is stored, again, in NCD format.
Implementation Constraints

Constraints are very important during the implementation. They define pin assignments, clocking requirements and other parameters influencing implementation. Constraints are stored in UCF format (user constraints file).

In order to add constraints, one needs to add a new source (using the "Create New Source" task) and choose the "Implementation constraints file" source type. A UCF file is a text file that can be edited directly by the user; however, simple constraints can also be defined through the graphical interface.

When a constraints file is selected in the upper part of the sidebar, the specific tasks become available. These include "Create Timing Constraints" and "Assign Package Pins".

For example, if we specify a frequency requirement on CLK as 100 MHz, the corresponding section of the constraints file will be:

NET "CLK" TNM_NET = CLK;
TIMESPEC TS_CLK = PERIOD "CLK" 100 MHz;
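Note that the 100 MHz specification is equivalent to a 10 ns period; UCF accepts the PERIOD either way. If you generate constraints for many clocks, a tiny script can keep the arithmetic honest. The helper below is hypothetical (not part of ISE), shown only to illustrate the equivalent period form:

```python
def ucf_period_constraint(net, freq_mhz):
    """Emit a UCF TNM_NET/TIMESPEC pair for a clock net.

    Hypothetical helper for illustration only.  It writes the PERIOD as a
    period in ns, which is equivalent to specifying the frequency in MHz.
    """
    period_ns = 1000.0 / freq_mhz    # 100 MHz -> 10 ns
    return (f'NET "{net}" TNM_NET = {net};\n'
            f'TIMESPEC TS_{net} = PERIOD "{net}" {period_ns:g} ns;')

print(ucf_period_constraint("CLK", 100))
```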

When timing requirements are specified in the constraints file, the implementation tools will strive to meet them (and report an error in case they can't be met).

Package pins constraints must also be set (according to the board layout).

MAP program converts UCF constraints to the PCF format which is later used by PAR.

There are also synthesis constraints stored in XCF files. They are used rarely and shouldn't be confused with implementation constraints.

Programming file generation

After placement and routing, a file should be generated that will be loaded into the FPGA device to program it. This task is performed by a BITGEN program.

The programming file has .bit extension.

The programming file is loaded to the FPGA using iMPACT.


Wednesday, 10 October 2012

Digital Logic in Analog Block - How To Test It?

Analog IP blocks these days have increasing amounts of digital control logic. With very small amounts of digital logic, it's possible to just draw gates on the schematic and run targeted tests that will hopefully catch any errors. But when you have several thousand digital gates, a new approach is needed, as I discovered in a recent discussion.

"2,000 gates is probably a good transition point where people switch from manually inserting gates in a schematic to synthesis," said Bob Melchiorre, director of field operations for digital implementation at Cadence. "Beyond 2,000 gates you have to start thinking about how you're going to test it, because you cannot guarantee that simulation or targeted testing will catch all the issues. Around 10,000 gates it starts getting completely unbearable, and you cannot write enough targeted tests to find all the faults in your digital logic."


Wednesday, 19 September 2012

Programming FPGAs with Python

For everyone who wants to get started with developing for FPGAs and doesn't want to waste time learning a new programming language, or who just wants to apply their existing knowledge of Python to program an FPGA, this tool is made for you!

The Python Hardware Processor is written in MyHDL, which makes it possible to run a very small subset of Python on an FPGA: it converts very simple Python code into either VHDL or Verilog, and the resulting hardware description can then be uploaded to the FPGA:

“Due to the Python nature of MyHDL and the Python Hardware Processor written with it, it allows you to write a program for the processor, to simulate the hardware processor, and to convert the processor to a valid hardware description (VHDL or Verilog), all inside a single Python environment.”

MyHDL surely won't replace VHDL or Verilog, but it could be a great tool for simulating and testing the behavior of your design, and it's certainly worth looking into if you want to jump into the world of FPGAs.

Feel free to discuss in the comments thread.


Thursday, 6 September 2012

Choosing FPGA or DSP for your Application


FPGA or DSP - The Two Solutions

The DSP is a specialised microprocessor, typically programmed in C, perhaps with assembly code for performance. It is well suited to extremely complex maths-intensive tasks with conditional processing. It is limited in performance by the clock rate and the number of useful operations it can do per clock. As an example, a TMS320C6201 has two multipliers and a 200 MHz clock, so it can achieve 400M multiplies per second.

In contrast, an FPGA is an uncommitted "sea of gates". The device is programmed by connecting the gates together to form multipliers, registers, adders and so forth. Using the Xilinx Core Generator, this can be done at a block-diagram level, and many blocks can be very high level, ranging from a single gate to an FIR or FFT. Performance is limited by the number of gates available and the clock rate. Recent FPGAs include dedicated multipliers for performing DSP tasks more efficiently. For example, a 1M-gate Virtex-II™ device has 40 multipliers that can operate at more than 100 MHz. In comparison with the DSP, this gives 4000M multiplies per second.
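The throughput comparison above reduces to simple arithmetic, shown here as a sketch using only the numbers quoted in the text:

```python
# The article's throughput comparison as arithmetic (all numbers taken
# from the text above).
dsp_mults_per_s = 2 * 200e6      # TMS320C6201: 2 multipliers x 200 MHz clock
fpga_mults_per_s = 40 * 100e6    # 1M-gate Virtex-II: 40 multipliers x 100 MHz

ratio = fpga_mults_per_s / dsp_mults_per_s   # raw advantage of the FPGA
```

This is, of course, a peak-rate comparison; sustained throughput depends on keeping all multipliers fed, which is exactly the I/O question discussed in the next section.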


Where They Excel


When sample rates grow above a few MHz, a DSP has to work very hard to transfer the data without any loss. This is because the processor must use shared resources such as memory buses, or even the processor core, which can be prevented from taking interrupts for some time. An FPGA, on the other hand, dedicates logic to receiving the data, so it can maintain high rates of I/O.

A DSP is optimised for use of external memory, so a large data set can be used in the processing. FPGAs have a limited amount of internal storage so need to operate on smaller data sets. However FPGA modules with external memory can be used to eliminate this restriction.

A DSP is designed to offer simple re-use of the processing units, for example a multiplier used for calculating an FIR can be re-used by another routine that calculates FFTs. This is much more difficult to achieve in an FPGA, but in general there will be more multipliers available in the FPGA. 

If a major context switch is required, the DSP can implement this by branching to a new part of the program. In contrast, an FPGA needs to build dedicated resources for each configuration. If the configurations are small, then several can exist in the FPGA at the same time. Larger configurations mean the FPGA needs to be reconfigured – a process which can take some time.

The DSP can take a standard C program and run it. This C code can have a high level of branching and decision making – for example, the protocol stacks of communications systems. This is difficult to implement within an FPGA.

Most signal processing systems start life as a block diagram of some sort. Actually translating the block diagram to the FPGA may well be simpler than converting it to C code for the DSP.

Making a Choice

There are a number of elements to the design of most signal processing systems, not least the expertise and background of the engineers working on the project. These all have an impact on the best choice of implementation. In addition, consider the resources available – in many cases, I/O modules have FPGAs on board. Using these with a DSP processor may provide an ideal split.


As a rough guideline, try answering these questions:

  1. What is the sampling rate of this part of the system? If it is more than a few MHz, FPGA is the natural choice.
  2. Is your system already coded in C? If so, a DSP may implement it directly. It may not be the highest performance solution, but it will be quick to develop.
  3. What is the data rate of the system? If it is more than perhaps 20-30 Mbytes/s, an FPGA will handle it better.
  4. How many conditional operations are there? If there are none, FPGA is perfect. If there are many, a software implementation may be better.
  5. Does your system use floating point? If so, this is a factor in favour of the programmable DSP. None of the Xilinx cores support floating point today, although you can construct your own.
  6. Are libraries available for what you want to do? Both DSP & FPGA offer libraries for basic building blocks like FIRs or FFTs. However, more complex components may not be available, and this could sway your decision to one approach or the other.

In reality, most systems are made up of many blocks. Some of those blocks are best implemented in FPGA, others in DSP. Lower sampling rates and increased complexity suit the DSP approach; higher sampling rates, especially combined with rigid, repetitive tasks, suit the FPGA.


Some Examples

Here are a few examples of signal processing blocks, along with how we would implement them:

  1. First decimation filter in a digital wireless receiver. Typically, this is a CIC filter, operating at a sample rate of 50-100MHz. A 5-stage CIC has 10 registers & 10 adds, giving an "add rate" of 500-1000MHz.
    At these rates any DSP processor would find it extremely difficult to do anything. However, the CIC has an extremely simple structure, and implementing it in an FPGA would be easy. A sample rate of 100MHz should be achievable, and even the smallest FPGA will have a lot of resource left for further processing.
  2. Communications Protocol Stack – ISDN, IEEE1394 etc; these are complex large pieces of C code, completely unsuitable for the FPGA. However the DSP will implement them easily. Not only that, a single code base can be maintained, allowing the code stack to be implemented on a DSP in one product, or a separate control processor in another; and bringing the opportunity to licence the code stack from a specialist supplier.
  3. Digital radio receiver – baseband processing. Some receiver types would require FFTs for signal acquisition, then matched filters once a signal is acquired. Both blocks can be easily implemented by either approach. However, there is a mode change – from signal acquisition to signal reception.
    It may well be that this is better suited to the DSP, as the FPGA would need to implement both blocks simultaneously. Note that the RF processing is better in an FPGA, so this is likely to be a mixed system.
    (Note – with today’s larger FPGAs, both modes of this system could be included in the FPGA at the same time.)
  4. Image processing. Here, most of the operations on an image are simple and very repetitive – best implemented in an FPGA. However, an imaging pipeline is often used to identify "blobs" or "Regions of Interest" in an object being inspected. These blobs can be of varying sizes, and subsequent processing tends to be more complex. The algorithms used are often adaptive, depending on what the blob turns out to be… so a DSP-based approach may be better for the back end of the imaging pipeline.
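The CIC structure from example 1 is simple enough to sketch in a few lines. The following is a minimal Python model of a 5-stage CIC decimator; the function name and the decimation factor R=8 are illustrative choices, not taken from the article. In hardware, each integrator and comb is just a register and an adder, which is why the structure maps so well to an FPGA:

```python
def cic_decimate(x, stages=5, R=8):
    """Minimal N-stage CIC decimator: integrators at the input rate,
    decimate by R, then combs (differential delay 1) at the output rate."""
    # Integrator section (runs at the high input sample rate)
    acc = [0] * stages
    integrated = []
    for sample in x:
        v = sample
        for i in range(stages):
            acc[i] += v
            v = acc[i]
        integrated.append(v)
    # Decimation by R
    decimated = integrated[::R]
    # Comb section (runs at the low output rate)
    prev = [0] * stages
    out = []
    for sample in decimated:
        v = sample
        for i in range(stages):
            v, prev[i] = v - prev[i], v  # backward difference
        out.append(v)
    return out
```

Feeding a constant input shows the expected DC gain of R^N (8^5 = 32768 here) once the filter's transient has passed; in a real design this growth is what sets the internal register widths.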

Summary

FPGA and DSP represent two very different approaches to signal processing, each good at different things. There are many high-sampling-rate applications that an FPGA handles easily and a DSP simply could not; equally, there are many complex software problems that an FPGA cannot address.

Feature Detection on FPGA


Autonomous landing and roving on the Moon require fast computation. Radiation-tolerant space processors are too slow to run the computer vision algorithms needed to land safely on the lunar surface. Field Programmable Gate Arrays (FPGAs) can parallelize computations that a processor must execute serially: an FPGA programmer programs the logic fabric directly, whereas a processor is restricted to a set of instructions that it fetches, decodes, and executes in order to run a program. The video above shows several blurs executed on an FPGA. Blurring is required for the Scale Invariant Feature Transform (SIFT) feature detection algorithm that is currently being developed.
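The blurs used in SIFT-style pipelines are Gaussian (or near-Gaussian) filters, and they are usually applied separably: one 1-D pass along the rows, one along the columns. The sketch below is an illustrative Python model (not code from the project described above) using a binomial kernel as a cheap Gaussian approximation; on an FPGA each 1-D pass becomes a short shift register feeding a small adder tree, which is exactly the kind of rigid, repetitive structure FPGAs excel at:

```python
def blur(img, kernel=(1, 4, 6, 4, 1)):
    """Separable binomial blur (a cheap Gaussian approximation):
    convolve each row, then each column, with the same 1-D kernel."""
    s = sum(kernel)
    k2 = len(kernel) // 2

    def conv1d(line):
        out = []
        for i in range(len(line)):
            acc = 0
            for j, w in enumerate(kernel):
                idx = min(max(i + j - k2, 0), len(line) - 1)  # clamp at edges
                acc += w * line[idx]
            out.append(acc // s)  # normalise with integer arithmetic
        return out

    rows = [conv1d(r) for r in img]                      # horizontal pass
    cols = [conv1d(list(c)) for c in zip(*rows)]         # vertical pass
    return [list(r) for r in zip(*cols)]                 # transpose back
```

Integer weights and a power-of-two-friendly normalisation are deliberate: they keep the hardware version free of multipliers and dividers.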

Wednesday, 29 August 2012

Altera’s 28-nm Stratix V FPGAs Contain the Industry’s Fastest Backplane-capable Transceivers

San Jose, Calif., July 31, 2012—Altera Corporation (Nasdaq: ALTR) today announced it is shipping in volume production the FPGA industry’s highest performance backplane-capable transceivers. Altera’s Stratix® V FPGAs are the industry’s only FPGAs to offer 14.1 Gbps transceiver bandwidth and are the only FPGAs capable of supporting the latest generation of the Fibre Channel protocol (16GFC). Developers of backplanes, switches, data centers, cloud computing applications, test and measurement systems and storage area networks can achieve significantly higher data rate speeds as well as rapid storage and retrieval of information by leveraging Altera’s latest generation 28-nm high-performance FPGA. For OTN (optical transport network) applications, Stratix V FPGAs allow carriers to scale quickly to support the tremendous growth of traffic on their networks.

Altera started shipping engineering samples of 28-nm FPGAs featuring integrated 14.1 Gbps transceivers over one year ago. These high-performance devices are the latest in Altera’s 28-nm FPGA portfolio to ship in volume production. The transceivers in Stratix V GX and Stratix V GS FPGAs deliver high system bandwidth (up to 66 lanes operating up to 14.1 Gbps) at the lowest power consumption (under 200 mW per channel). Transceivers in Altera’s FPGAs are equipped with advanced equalization circuit blocks, including low-power CTLE (continuous time linear equalization), DFE (decision feedback equalization), and a variety of other signal conditioning features for optimal signal integrity to support backplane, optical module, and chip-to-chip applications. This advanced signal conditioning circuitry enables direct drive of 10GBASE-KR backplanes using Stratix V FPGAs.

“Developers of next-generation protocols need to leverage the latest test equipment that integrates the latest technologies,” said Michael Romm, vice president of product development, at LeCroy Protocol Solutions Group, a leading manufacturer of test and measurement equipment. “Altera’s latest family of 28-nm FPGAs gives us the capability to build the most sophisticated and advanced test equipment so our customers can rapidly develop and bring to market their next-generation systems.”


Published by Altera, click here to read the whole article.

Monday, 2 January 2012

The glitch that stole the FPGA's energy efficiency

Field-programmable gate arrays (FPGAs) are notorious for high power consumption. They are hard to power down in the same way as custom logic - so they have considerable static power consumption - and their greater flexibility means they use many more gates to do the same job.

However, a good proportion of an FPGA's power consumption is avoidable. A 2007 study carried out by researchers at the University of British Columbia and published in IEEE Transactions on VLSI Systems found that up to three quarters of the dynamic power consumption could be ascribed to glitches rather than actual functional state transitions for some types of circuit.

The heart of the problem lies with timing: early-arriving signals can drive outputs to the wrong state before the situation is 'corrected' by later-arriving signals, and before the final state is sampled at the next clock transition. Given the large die size of FPGAs relative to custom logic, it is not hard to see why the delays between signals can be so large.
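The effect is easy to demonstrate with a toy model (this is an illustration of the mechanism, not a tool from the cited research). Consider a 2-input XOR whose inputs both switch 0→1 within one clock cycle: if the two edges arrive together the output never moves, but if one path is slower the output glitches high and back low, burning two unnecessary transitions.

```python
from itertools import groupby

def count_transitions(events, init):
    """Count output toggles of a 2-input XOR gate.
    `events` is a list of (time, input_index, new_value);
    `init` is the initial (a, b) input pair."""
    a, b = init
    out = a ^ b
    toggles = 0
    # Group events by arrival time so simultaneous edges are applied together
    for _, group in groupby(sorted(events), key=lambda e: e[0]):
        for _, idx, val in group:
            if idx == 0:
                a = val
            else:
                b = val
        new_out = a ^ b
        toggles += (new_out != out)
        out = new_out
    return toggles

# Both inputs switch 0->1: aligned edges cause no output activity,
# skewed edges cause a glitch (two extra transitions).
aligned = count_transitions([(5, 0, 1), (5, 1, 1)], (0, 0))
skewed = count_transitions([(5, 0, 1), (9, 1, 1)], (0, 0))
```

Every one of those spurious transitions charges and discharges the routing capacitance downstream, which is why glitches can dominate dynamic power.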

UBC's Julien Lamoureux and colleagues recommended the use of delay elements to align signals in time to reduce these glitch events.

At the International Symposium on Low Power Electronic Design earlier this year, Warren Shum and Jason Anderson of the University of Toronto proposed an alternative: making use of the don't care conditions used in logic synthesis to also filter out potential glitches:

"This process is performed after placement and routing, using timing simulation data to guide the algorithm...Since the placement and routing are maintained, this optimization has zero cost in terms of area and delay, and can be executed after timing closure is completed."

The alterations are made in the LUTs iteratively to create new truth tables that will reduce the number of glitch transitions during operation, borrowing some concepts from asynchronous design where glitches are considered actively dangerous rather than inconvenient. On benchmark circuits, the technique reduced glitch power by around 14 per cent on average and up to half in some cases.
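The general idea of spending don't-care entries to calm a LUT down can be shown with a toy truth-table filler; this is only a sketch of the principle, not Shum and Anderson's actual algorithm, which operates on placed-and-routed LUTs guided by timing simulation. Here each don't-care is assigned the majority value of its Hamming-distance-1 neighbours, so fewer single-input changes toggle the output:

```python
def fill_dont_cares(tt):
    """tt maps input bit-tuples to 0, 1, or None (don't care).
    Assign each don't-care the majority value of its Hamming-distance-1
    neighbours, reducing the number of input edges that flip the output."""
    n = len(next(iter(tt)))
    filled = dict(tt)
    for inp, val in tt.items():
        if val is None:
            neigh = []
            for i in range(n):
                flipped = inp[:i] + (1 - inp[i],) + inp[i + 1:]
                v = tt.get(flipped)
                if v is not None:
                    neigh.append(v)
            # Majority vote among specified neighbours (default 0 if none)
            filled[inp] = max(set(neigh or [0]), key=neigh.count)
    return filled
```

Because only the LUT contents change, the placement, routing, area, and delay of the design are untouched, which matches the zero-cost claim in the quote above.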