
Monday, 5 November 2012

Intel's 335 Series SSD reviewed

SSDs have come a long way since Intel released its first, the X25-M, a little more than four years ago. That drive was a revelation, but it wasn't universally faster than the mechanical hard drives of the era. The X25-M was also horrendously expensive; it cost nearly $600 yet offered just 80GB of capacity, which works out to about $7.50 per gigabyte.

My, how things have changed.

Solid-state drives have gotten a lot faster in the last few years. They're already pushing up against the throughput ceiling of the 6Gbps Serial ATA interface, leaving mechanical hard drives in the dust. I can't remember the last time we saw an HDD score better than an SSD in one of our performance tests.

More importantly, SSDs have become a lot more affordable. Today, you can get 80GB by spending $100. The sweet spot in the market is the 240-256GB range, where SSDs can be had for around $200—less than a dollar per gigabyte. Rabid competition between drive makers deserves some credit for falling prices, particularly in recent years. Moore's Law is the real driving factor behind the trend, though. The X25-M's NAND chips were built using a 50-nm process, while the new Intel 335 Series uses flash fabricated on a much smaller 20-nm process.

Designed for enthusiasts and DIY system builders, the 335 Series is aimed squarely at the sweet spot in the market with a 240GB model priced at $184. That's just 77 cents per gig, a tenfold reduction in cost in just four years. The price is right, but what about the performance? We've run Intel's latest through our usual gauntlet of tests to see how it stacks up against the most popular SSDs around.

Die shrinkin'
Intel and Micron have been jointly manufacturing flash memory since 2006 under the name IM Flash Technologies. The pair started with 72-nm NAND flash before moving on to the 50-nm chips used in the X25-M. The next fabrication node was 34 nm, which produced the chips used in the second-generation X25-M and the Intel 510 Series. 25-nm NAND found its way into the third-gen X25-M, otherwise known as the 320 Series, in addition to the 330 and 520 Series. Now, the Intel 335 Series has become the first SSD to use IMFT's 20-nm MLC NAND.

Building NAND on finer fabrication nodes allows more transistors to be squeezed into the same unit area. In addition to accommodating more dies per wafer, this shrinkage can allow more capacity per die. The 34-nm NAND used in the Intel 510 Series offered 4GB per die, with each die measuring 172 mm². When IMFT moved to 25-nm production for the 320 Series, the per-die capacity doubled to 8GB, while the die size shrunk slightly to 167 mm².

Two 4GB 34-nm dies, one 8GB 25-nm die, and the new 8GB 20-nm die. Source: Intel

The Intel 335 Series' 20-nm NAND crams 8GB onto a die measuring just 118 mm². That's not the doubling of bit density we enjoyed in the last transition, but it still amounts to a 29% reduction in die size for the same capacity. Based on how those dies fit onto each wafer, Intel says 20-nm production increases the "gigabyte capacity" of its flash fabs by approximately 50%. IMFT has been mass-producing these chips since December of last year.

As NAND processes shrink, the individual cells holding 1s and 0s get closer together. Closer proximity can increase the interference between the cells, which can degrade both the performance and the endurance of the NAND. Intel's solution to this problem is a planar cell structure with a floating gate and a high-k/metal gate stack. This advanced cell design is purportedly the first of its kind in the flash industry, and Intel claims it delivers performance and reliability comparable to IMFT's 25-nm NAND. Indeed, Intel's performance and endurance specifications for the 335 Series 240GB exactly match those of its 25-nm sibling in the 330 Series.


Intel says the 335 Series 240GB can push sequential read and write speeds of 500 and 450MB/s, respectively. 4KB random read/write IOps are pegged at 42,000/52,000. Thanks to the lower power consumption of its 20-nm flash, the new drive should be able to hit those targets while consuming less power than its predecessor. The 335 Series is rated for power consumption of 275 mW at idle and 350 mW when active, less than half the 600/850 mW ratings of its 25-nm counterpart.

On the endurance front, Intel's new hotness can supposedly withstand 20GB of writes per day for three years, just like the 330 Series. As one might expect, the drive is covered by a three-year warranty. Intel reserves its five-year SSD warranties for the 320 and 520 Series, whose high-endurance NAND is cherry-picked off the standard 25-nm production line. I suspect it will take Intel some time to bin enough higher-grade, 20-nm NAND to fuel upgrades to those other models.

Our performance results will illustrate how the 335 Series compares to those other Intel SSDs. Expect the 320 Series to be much slower due to its 3Gbps Serial ATA interface. That drive's Intel flash controller can trace its roots back to the original X25-M, so the design is a little long in the tooth. The 520 Series, however, has a 6Gbps interface and higher performance specifications than the 335 Series, even though the two are based on the same SandForce controller silicon.


Intel’s 335 Series SSD

Intel has revved up its mainstream SSD line from the Series 330 to the Series 335, and the company sent over a 240GB model for evaluation (240GB is apparently the only capacity in this series at launch). The new drives feature 20nm NAND flash memory, compared with the 25nm chips in the older series, but Intel continues to use an LSI/SandForce SF-2281 controller with custom Intel firmware. The company uses the same controller in its Series 330 and Series 520 drives.

But what may be of most interest to consumers is that the Series 335 is significantly cheaper per gigabyte: Intel expects this 240GB drive to cost about the same as a 180GB Series 330. And while the product was officially embargoed until 8:30 a.m. on October 29, we saw it listed for sale online the evening of October 28 at prices between $184 and $225, including shipping.

Like its most recent predecessors, the Series 335 is outfitted with a SATA revision 3.0 (6Gbps) interface, and the drive comes housed inside a 2.5-inch enclosure that is 9.5mm thick. That thick profile renders it unsuitable for many current ultraportables; however, the stout of heart can easily remove the board from its enclosure and fit it inside a thinner case or install it directly into a vacant drive bay (although doing either will likely void Intel’s three-year warranty).

Here are some results of 10GB copy and read tests. Keeping in mind that our current test bed uses a 7200-rpm hard drive to feed and read data from our test subjects, the 335 performed very well. It wrote our 10GB mix of files and folders at 93.2MBps and read them at 57.9MBps; and it wrote our single 10GB file at 124.1MBps while reading it at 129.8MBps.


Intel 335 Series SSD Features and Specifications:

  • CAPACITY: 240GB
  • COMPONENTS:
    • Intel 20nm NAND Flash Memory
    • Multi-Level Cell (MLC)
  • FORM FACTOR: 2.5-inch
  • THICKNESS: 9.5mm
  • WEIGHT: Up to 78 grams
  • SATA 6Gbps BANDWIDTH PERFORMANCE (Iometer QD32):
    • SUSTAINED SEQ READ: 500 MB/s
    • SUSTAINED SEQ WRITE: 450 MB/s
  • READ & WRITE IOPS (Iometer QD32):
    • RANDOM 4KB READS: Up to 42,000 IOPS
    • RANDOM 4KB WRITES: Up to 52,000 IOPS
  • COMPATIBILITY:
    • Intel SSD Toolbox w/SSD Optimizer
    • Intel Data Migration Software
    • Intel Rapid Storage Technology
    • Intel 6 Series Express Chipsets (w/ SATA 6Gbps)
    • SATA Revision 3.0
    • ACS-2 (ATA/ATAPI Command Set 2)
    • Limited SMART ATA Feature Set
    • Native Command Queuing (NCQ) Command Set
    • Data Set Management Command Trim Attribute
  • POWER MANAGEMENT:
    • 5 V SATA Supply Rail
    • SATA Link Power Management (LPM)
  • POWER:
    • Active (MobileMark 2007 Workload): 350 mW (TYP)
    • Idle: 275 mW (TYP)
  • TEMPERATURE:
    • Operating: 0°C to 70°C
    • Non-Operating: -55°C to 95°C
  • CERTIFICATIONS & DECLARATION:
    • UL
    • CE
    • C-Tick
    • BSMI
    • KCC
    • Microsoft WHQL
    • VCCI
    • SATA-IO
  • PRODUCT ECOLOGICAL COMPLIANCE:
    • RoHS



Tuesday, 30 October 2012

IBM's New Chip Tech.

An IBM scientist holds bottles full of carbon nanotubes.

IBM has put the chip industry on notice by inventing a new technology that would replace silicon with a new material: carbon nanotubes.

IBM has found a new way to put what seems like an impossibly large number of transistors into an insanely small area only a few atoms wide. That's 10,000 times thinner than a strand of human hair and less than half the size of the leading silicon technology.

Or as IBM explains:

Carbon nanotubes are single atomic sheets of carbon rolled up into a tube. The carbon nanotube forms the core of a transistor device that will work in a fashion similar to the current silicon transistor, but will be better performing. They could be used to replace the transistors in chips that power our data-crunching servers, high performing computers and ultra fast smart phones.

Inventing the tech is one thing; being able to manufacture it at scale is another. And that's the real breakthrough IBM announced: it has put more than 10,000 of these "nano-sized tubes of carbon" onto a single chip using a standard fabrication method.

It will still be years, maybe even a decade, before carbon nanotubes could really replace silicon-based chips in our servers and our smartphones. But this breakthrough is important because the chip industry is reaching the point where it physically can't squeeze much more processing power onto existing forms of chips. Some have predicted that we'll soon reach the end of Moore's Law, which holds that the transistor density of chips doubles roughly every two years.

Chip transistors are already super tiny—or nanoscale.


This is what a nanotube looks like under a microscope.

Earlier this year Intel dumped $4.1 billion into two new techniques to help the chip industry continue to get more powerful at smaller scales. These two new technologies are not the same as what IBM is working on.

IBM's carbon-based method may represent a whole new beginning for Moore's Law, the industry maxim that chips keep getting cheaper, more powerful, and smaller.

Wednesday, 24 October 2012

Combinational Loop in Design

Combinational loops are logical structures that contain no synchronous feedback element. As we will see in this article, this kind of loop causes stability and reliability problems, because it violates synchronous design principles by feeding signals back without a register in the loop.

WHY AND HOW IS A COMBINATIONAL LOOP GENERATED?

Basically, a combinational loop is implemented in hardware (gates) when, in VHDL code describing combinational logic, a signal that appears on the left side of an assignment statement (that is, to the left of the <= symbol) also appears in the expression on the right side of the same assignment (right of <=). For example, each of the following lines of code generates a combinational loop, as long as it is written in a combinational process or as a concurrent signal assignment statement.

acc <= acc + data;

Z <= Z nand B;

cnt <= cnt + 1;

However, it's important to point out that if these same statements are written in a clocked process, each of them will generate the corresponding sequential logic. A signal assignment in a clocked process infers a register for the assigned signal, so the feedback path goes through that register and no combinational loop is generated. See the sketch below.
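As a quick sketch of the registered case, here is the accumulator statement from above placed inside a clocked process. The entity, port names, and the 8-bit width are ours, chosen only for illustration:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity acc_reg is
port(
    clk  : in  std_logic;
    data : in  unsigned(7 downto 0);
    acc  : out unsigned(7 downto 0));
end acc_reg;

architecture beh of acc_reg is

signal acc_r : unsigned(7 downto 0) := (others => '0');

begin
    acc <= acc_r;

process(clk)
begin
    if rising_edge(clk) then
        -- The assignment infers a register: the adder's output is
        -- sampled once per clock edge instead of racing around a
        -- purely combinational feedback path.
        acc_r <= acc_r + data;
    end if;
end process;

end beh;

Synthesized, acc_r becomes a flip-flop clocked by clk, which is exactly the synchronous feedback element the combinational version lacks.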

HARDWARE

The following figure shows a diagram of a combinational loop. 

As shown in the figure, the combinational logic output is fed back to the same combinational logic without any register in the loop. The logic between the first input and the last output can be made up of one or several levels of combinational logic. It can also have different signals coming in and out of that piece of logic, but at least one of the signals goes back (feedback) to the first logic level, as can be seen in the following figure.

This kind of logic circuit is usually not intended. Hence, when the synthesis tool detects a combinational loop, it generates a warning message.

Here is an example of VHDL code that generates a combinational loop when it is implemented.

library ieee;
use ieee.std_logic_1164.all;

entity lazo_comb is
port(
    a: in std_logic;
    z: out std_logic);
end lazo_comb;

architecture beh of lazo_comb is

signal y: std_logic;

begin
    z <= y;

process(a,y)
begin
    y <= y nand a;
end process;

end beh;

The synthesis tool, Synplify in this case, generates the following warning regarding the combinational loop. 

The warning message "found combinational loop at 'y'" means that the signal y is fed back to the input of the combinational logic without any register in the loop. This loop can easily be spotted in the RTL view of the synthesized system, as can be seen in the following figure.


SIMULATION

The simulation of this (very simple) system is shown in the following figure.

The ModelSim windows contain a lot of information that deserves detailed analysis. First of all, the top window plots the waveforms of the signals from the described system, whose main expression is in line 24 of the middle window. The bottom (Transcript) window reports an error, saying that the iteration limit was reached at 50 ns without the signals settling to stable values. In other words, the system has begun to oscillate and remains oscillating. The maximum number of iterations is configurable in ModelSim (Simulate -> Runtime Options), as it is in most simulators; by default it is set to 5000. Another important piece of information can be found at the bottom of the waveform window: there you can read that the delta count reached 5000, which is exactly the maximum number of iterations set in the runtime options, and even after that many deltas the system is not stable.

Why is this simple logic oscillating?
Analyzing the truth table of the NAND gate: while one of its inputs is stuck at '0', the output is always '1'. That is what happens in the simulation shown above. However, when the input (signal a in the simulation) changes to '1', the other input still being '1' drives the output to '0'; with the feedback input now '0', the output must go back to '1'; that '1' comes back around with the other input still at '1', so the output goes to '0' again, and so on. This is what is called an "unstable combinational loop". This kind of loop should NEVER be used in a real design.

Another point to bring out with this example is the importance of simulating a system. Had we configured the FPGA without any simulation, on the grounds that the synthesis tool only gave us a warning, we'd see an unstable output and spend some time (maybe a lot of time) trying to find out why the output is not stable. Conversely, with a simulation the problem would appear on the first run.

CODE STYLE

In designs with a large, very large, number of code lines it is very easy to make a mistake and generate a combinational loop unintentionally (as in the example above). So follow a certain order when writing the code, trying to maintain a clear flow of data, and take a close look at the warnings generated by the synthesis tool.

In case you deliberately want to implement a combinational loop, write a detailed description of the reason for doing so, and also leave a comment in the constraint file. This last point matters because the Static Timing Analysis (STA) tool usually increases the minimum period of the system when it finds a combinational loop, so you should tell the STA tool to ignore that particular path. The syntax for ignoring a path is set_false_path in the Quartus (Altera) software; in ISE (Xilinx) you use a TIG constraint, each with its respective syntax.


Intel's Haswell chips coming into your PC in first half of next year

Laptops and desktops with Intel's next-generation Core processor, code-named Haswell, will be available in the first half of next year, Intel CEO Paul Otellini said during a financial conference call on Tuesday.

The Haswell chip will succeed the current Core processors code-named Ivy Bridge, which became widely available in April. Intel has said that Haswell will deliver twice the performance of Ivy Bridge and, in some cases, will double the battery life of ultrabooks, a new category of thin and light laptops with battery life of roughly six to eight hours.

Intel shed some light on Haswell at its Intel Developer Forum trade show in September, saying its power consumption had been cut to the point where the chips could be used in tablets. Haswell chips will draw a minimum of 10 watts of power, while Ivy Bridge's lowest power draw is 17 watts. Intel has splintered future Haswell chips into two families: 10-watt chips for ultrabooks that double as tablets, and 15-watt and 17-watt chips designed for other ultrabooks and laptops.


Haswell will be "qualified for sale" in the first half of 2013, said Stacy Smith, chief financial officer at Intel, during the conference call. Chips go through a qualification process internally and externally, after which Intel can put the chip into production.

The Haswell chip could provide a spark to the ultrabook segment, which has stagnated in a slumping PC market. Worldwide PC shipments dropped between 8 percent and 9 percent during the third quarter, according to research firms IDC and Gartner. They said ultrabook sales were lower than expected due to high prices and soft demand for consumer products.


Many ultrabook models with Ivy Bridge processors are expected to ship in the coming weeks with the launch of Windows 8, which is Microsoft's first touch-centric OS. Otellini said more than 140 Core-based ultrabooks will be in the market, of which 40 will have touch capabilities. A few models -- between five and eight -- will be convertible ultrabooks that can also function as tablets. A majority of the ultrabooks will have prices either at or above US$699, with a few models perhaps priced lower, Otellini said.

The new graphics processor in Haswell will support 4K graphics, allowing for a resolution of 4096 by 3072 pixels. Ultrabooks with Haswell will also include wireless charging, NFC capabilities, voice interaction and more security features.

Otellini said Intel can't tell how the segment will perform in the coming quarter. A number of factors need to be considered, including Microsoft's Windows 8 and the launch of new ultrabooks, he said. Intel reported a profit and revenue decline in the third fiscal quarter of 2012.

"We saw a softening in the consumer segments" in the third fiscal quarter, Otellini said. "The surprise there was China, which was strong, [but] turned weak on us."

Tablets have changed the way people use computers, and Microsoft is bringing touch to mainstream PCs for the first time with Windows 8, Otellini said. PCs with Windows 8 are expected to ship later this month, and it's hard to predict what the response will be until people go out and play with the devices and the OS, Otellini said.

"I see the computing market in a period of transition," with an opportunity for breakthroughs in research and creativity, Otellini said. New usage models for laptops are emerging with detachable touchscreens, voice recognition and other features, and Intel is trying to tap into those opportunities, Otellini said.

The company has a history of overcoming slumps through research and innovation, Otellini said.


Sunday, 14 October 2012

Cyclic Redundancy Check - CRC


Error detection is an important part of communication systems when there is a chance of data getting corrupted. Whether it’s a piece of stored code or a data transmission, you can add a piece of redundant information to validate the data and protect it against corruption. Cyclic redundancy checking is a robust error-checking algorithm, which is commonly used to detect errors either in data transmission or data storage. In this multipart article we explain a few basic principles.

Modulo two arithmetic is simple single-bit binary arithmetic with all carries and borrows ignored; each digit is considered independently. This article shows how modulo two addition is equivalent to modulo two subtraction and can be performed using an exclusive OR operation, followed by a brief look at polynomial division, where the remainder forms the CRC checksum.

For example, we can add two binary numbers X and Y as follows:

10101001 (X) + 00111010 (Y) = 10010011 (Z)

From this example, modulo two addition is clearly equivalent to an exclusive OR operation. What is less obvious is that modulo two subtraction gives the same result as addition.

From the previous example let’s add X and Z:
10101001 (X) + 10010011 (Z) = 00111010 (Y)

In our previous example we saw that X + Y = Z, and therefore Y = Z - X; the example above shows that Z + X = Y as well. Hence modulo two addition is equivalent to modulo two subtraction, and both can be performed using an exclusive OR operation.

In integer division, dividing A by B results in a quotient Q and a remainder R. Polynomial division is similar, except that when A and B are polynomials, the remainder is a polynomial whose degree is less than that of B.

The key point here is that any change to the polynomial A causes a change to the remainder R. This behavior forms the basis of the cyclic redundancy checking.

If we consider a polynomial, whose coefficients are zeros and ones (modulo two), this polynomial can be easily represented by its coefficients as binary powers of two.

In terms of cyclic redundancy calculations, the polynomial A would be the binary message string or data and polynomial B would be the generator polynomial. The remainder R would be the cyclic redundancy checksum. If the data changed or became corrupt, then a different remainder would be calculated.

Although the algorithm for cyclic redundancy calculations looks complicated, it only involves shifting and exclusive OR operations. Using modulo two arithmetic, division is just a shift operation and subtraction is an exclusive OR operation.

Cyclic redundancy calculations can therefore be efficiently implemented in hardware, using a shift register modified with XOR gates. The shift register should have the same number of bits as the degree of the generator polynomial, with an XOR gate at each bit where the generator polynomial coefficient is one.
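As a minimal VHDL sketch of such a shift register, here is a serial divider for the degree-3 generator polynomial x³ + x² + 1 (1101) used in the worked example further below. The entity, port names, and control signals are ours, chosen only for illustration:

library ieee;
use ieee.std_logic_1164.all;

entity crc3_serial is
port(
    clk : in  std_logic;
    rst : in  std_logic;                      -- synchronous clear
    en  : in  std_logic;                      -- one message bit per cycle
    din : in  std_logic;                      -- dividend, MSB first
    crc : out std_logic_vector(2 downto 0));
end crc3_serial;

architecture rtl of crc3_serial is

signal r : std_logic_vector(2 downto 0);

begin
    crc <= r;

process(clk)
    variable fb : std_logic;
begin
    if rising_edge(clk) then
        if rst = '1' then
            r <= (others => '0');
        elsif en = '1' then
            -- When the bit shifted out of the top of the register is
            -- '1', subtract (XOR) the generator polynomial: an XOR
            -- gate sits exactly where g(x) has a '1' coefficient.
            fb   := r(2);
            r(2) <= r(1) xor fb;   -- x^2 coefficient of g(x) is 1
            r(1) <= r(0);          -- x^1 coefficient is 0: plain shift
            r(0) <= din xor fb;    -- x^0 coefficient of g(x) is 1
        end if;
    end if;
end process;

end rtl;

Shifting the augmented message (the data followed by three zeros, as described next) through din one bit per cycle leaves the remainder, the CRC checksum, in the register.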

Augmentation is a technique used to produce a null CRC result, while preserving both the original data and the CRC checksum. In communication systems using cyclic redundancy checking, it would be desirable to obtain a null CRC result for each transmission, as the simplified verification will help to speed up the data handling.

Traditionally, a null CRC result is generated by adding the cyclic redundancy checksum to the data, and calculating the CRC on the new data. While this simplifies the verification, it has the unfortunate side effect of changing the data. Any node receiving the data+CRC result will be able to verify that no corruption has occurred, but will be unable to extract the original data, because the checksum is not known. This can be overcome by transmitting the checksum along with the modified data, but any data-handling advantage gained in the verification process is offset by the additional steps needed to recover the original data.

Augmentation allows the data to be transmitted along with its checksum and still obtain a null CRC result. As explained before, obtaining a null CRC result by adding the checksum directly changes the data. Augmentation avoids this by shifting the data left, augmenting it with a number of zeros equal to the degree of the generator polynomial. When the CRC result for the shifted data is added, both the original data and the checksum are preserved.

In this example, our generator polynomial (x³ + x² + 1, or 1101) is of degree 3, so the data (0xD6B5) is shifted to the left by three places, that is, augmented with three zeros.
0xD6B5 = 1101011010110101 becomes 0x6B5A8 = 1101011010110101000.

Note that the original data is still present within the augmented data.

0x6B5A8 = 1101011010110101000
Data = D6B5 Augmentation = 000

Calculating the CRC result for the augmented data (0x6B5A8) using our generator polynomial (1101) gives a remainder of 101 (degree 2). If we add this to the augmented data, we get:

0x6B5A8 + 0b101 = 1101011010110101000 + 101
= 1101011010110101101
= 0x6B5AD

As discussed before, calculating the cyclic redundancy checksum for 0x6B5AD will result in a null checksum, simplifying the verification. What is less apparent is that the original data is still preserved intact.

0x6B5AD = 1101011010110101101
Data = D6B5 CRC = 101

The degree of the remainder or cyclic redundancy checksum is always less than the degree of the generator polynomial. By augmenting the data with a number of zeros equivalent to the degree of the generator polynomial, we ensure that the addition of the checksum does not affect the augmented data.

In any communications system using cyclic redundancy checking, the same generator polynomial will be used by both transmitting and receiving nodes to generate checksums and verify data. Since the receiving node knows the degree of the generator polynomial, it is a simple task for it to verify the transmission by calculating the checksum and testing for zero, and then to extract the data by discarding the last three bits.

Thus augmentation preserves the data, while allowing a null cyclic redundancy checksum for faster verification and data handling.


Saturday, 13 October 2012

VHDL- Delta Delay


VHDL allows the designer to describe systems at various levels of abstraction. As such, timing and delay information may not always be included in a VHDL description.

A delta (or delta cycle) is essentially an infinitesimal, but quantized, unit of time. The delta delay mechanism provides a minimum delay in a signal assignment statement so that the simulation cycle described earlier can operate correctly when signal assignment statements do not include explicitly specified delays (a short sketch follows the list below). That is:

1) all active processes can execute in the same simulation cycle

2) each active process will suspend at a wait statement

3) when all processes are suspended, simulation time is advanced by the minimum amount necessary so that some signals can take on their new values

4) processes then determine whether the new signal values satisfy the conditions to proceed from the wait statement at which they are suspended
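As a minimal sketch of delta delays in action, consider three concurrent signal assignments; the entity and signal names are ours, chosen only for illustration:

library ieee;
use ieee.std_logic_1164.all;

entity delta_demo is
end delta_demo;

architecture sim of delta_demo is

signal a, b, c : std_logic := '0';

begin
    a <= '1' after 10 ns;   -- explicit delay: event scheduled at 10 ns
    b <= a;                 -- no explicit delay: b updates one delta after a
    c <= b;                 -- c updates one delta after b
end sim;

On a waveform viewer all three transitions appear at 10 ns, but the simulator resolves them in successive delta cycles, so b and c always pick up consistent values without simulation time advancing.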
