Very Large Scale Integration (VLSI): DDR4

Showing posts with label DDR4. Show all posts

Thursday, 14 February 2013

How and why DDR4 timing is important

JEDEC's DDR4 DRAM standard is compatible with 3DIC architectures and is capable of data transfer rates up to 3.2 gigatransfers per second, Kristin Lewotsky notes in this article. "We've got a broad population of folks who really haven't had the time or the business need to learn about DDR4," says Perry Keller of Agilent Technologies. "What we hope to do is familiarize them with DDR4: What it is, why it exists, what it can bring to their products, and how to do something practical with it." EE Times

Sunday, 7 October 2012

DDR4 SDRAM Standards published by JEDEC

The PC industry hasn't seen an updated memory spec in a while, and it was long past due. That upgrade came last week, as the memory standards group JEDEC revealed that it had published a spec for DDR4 SDRAM, defining "features, functionalities, AC and DC characteristics, packages and ball/signal assignments," that builds on the DDR3 spec, first published in 2007. The DDR4 spec applies to SDRAM devices from 2 GB through 16 GB for x4, x8 and x16 buses. Here's a look at some of the particulars.

“The new standard will enable next generation systems to achieve greater performance, significantly increased packaging density and improved reliability, with lower power consumption,” Macri said.

Double Data Rate

First and foremost, DDR4 memory doubles the maximum transfer rate of DDR3. The new spec supports a per-pin data rate of up to 3.2 giga transfers per second (GT/s), twice that of its predecessor's eventual maximum of 1.6 GT/s (the ceiling was raised over time). And, DDR4's max could likewise go higher, as necessary, to accommodate faster components and bus speeds. So far, the only processor roadmap we've seen in support of DDR4 has been Intel's, with its Haswell server processor slated for 2014; consumer-platform support isn't expected until sometime in 2015.

Meanwhile, JEDEC member company Samsung announced in July that it had begun sampling the "industry's first" 16-GB DDR4 RDIMMs, and that it will also offer a 32-GB module; and Samsung, Micron and other companies already offer smaller-denomination DIMMs that comply with the spec.

Lower Power

The DDR4 spec defines memory that operates on 1.2V, compared with DDR3's 1.5V and 1.35V low-voltage spec. According to Samsung, its DDR4 RDIMMs consume about 40 percent less power than DDR3 memory modules operating at 1.35V. We're not sure what math they used to arrive at that finding, but in a world increasingly mindful of power consumption and rising energy costs, 1.2V is better than 1.35V.

More, Wider Memory

While DDR3 supported DIMM sizes between 512 MB and 8 GB in as many as eight banks, DDR4 quadruples memory top-end by doubling the module maximum to 16 GB (with a 2-GB minimum) in as many as 16 banks. That's math we can handle. What's more, DDR4 can arrange memory banks into as many as four groups, providing faster burst access to memory and separate read, write, activation and refresh operations for each group.

Incidentally, memory speeds of DDR4 will start at 1,600MHz and balloon to 3,200MHz. DDR3 mobiles are available mostly at frequencies between 800MHz and 1,600MHz, although the spec supports 1,866MHz and 2,133MHz memory, according to a comparison chart published by memory maker Micron.

DOWNLOAD

Get free daily email updates!

Tuesday, 25 September 2012

DesignWare DDR4 Memory Interface IP from Synopsys

Highlights:

Synopsys expands its industry-leading DesignWare® DDR Memory Interface IP family to include support for DDR4 SDRAMs
Backward compatibility with DDR3 and LPDDR2/3 mobile SDRAMs gives SoC designers flexibility as they transition from one SDRAM standard to the next
New DDR4 IP offers more features with up to 50 percent lower latency than the previous generation
DDR4 memory controller and PHY are connected by a standard DFI 3.1 interface to streamline connections to custom PHYs and controllers

Synopsys, Inc. (SNPS), a world leader in software and IP used in the design, verification and manufacture of electronic components and systems, today announced the expansion of its DesignWare DDR interface IP portfolio to include support for next-generation SDRAMs based on the emerging DDR4 standard. By supporting DDR4 as well as DDR3 and LPDDR2/3 in a single core, the DesignWare DDR solution enables designers to interface with either high-performance or low-power SDRAMs in the same system-on-chip (SoC), which is a key requirement of many SoCs such as applications processors for smartphones and tablets.

"Synopsys' support for DDR4 memory is an important contribution to building a robust DDR4 ecosystem," said Robert Feurle, vice president of DRAM marketing for Micron Technology, Inc. "DDR4 brings substantial power and performance benefits to the industry, and Micron is aggressively driving its introduction. By implementing their DesignWare DDR Interface IP with backward compatibility in mind, Synopsys is enabling chip developers to bridge the transition from today's DDR3-based SoCs to the upcoming DDR4 designs."

Synopsys' DesignWare DDR4 IP solution consists of the DDR4 multiPHY and Enhanced Universal DDR Memory Controller (uMCTL2) that connect through a commonly used DFI 3.1 interface. The new DDR4 IP supports all key DDR4 features planned for the upcoming JEDEC standard and, compared to the previous version, includes a 13 percent increase in raw bandwidth, up to a 50 percent reduction in overall latency and new low-power features that provide intelligent system monitoring and control to power down elements of the IP as determined by the system's traffic patterns. Real-time scheduling features in Synopsys' unique CAM-based DDR controller can optimize the scheduling of data read/write traffic from multiple hosts, maximizing performance and minimizing latency.

"While the initial target markets for DDR4 are networking, server, and compute platforms, engineers designing for digital TVs, set-top-boxes, multi-function printing, smartphone and tablet applications will also adopt DDR4 DRAM as prices drop and performance improves," said Desi Rhoden, executive vice president, Montage Technology, and JEDEC memory chairman. "Synopsys has leveraged their participation at JEDEC to develop DDR4-compatible products before the actual standard has been released, which is a key benefit of JEDEC membership."

"Synopsys' complete DDR interface IP portfolio includes support for LPDDR, LPDDR2, LPDDR3, DDR, DDR2, and DDR3," said John Koeter, vice president of marketing for IP and systems at Synopsys. "With this announcement, we are broadening our portfolio to include support for DDR4 while maintaining backward compatibility with existing JEDEC standard SDRAMs. As new DDR standards evolve, designers look for reliable solutions. Synopsys' track record of over 320 DDR IP design wins demonstrates that we offer a low-risk path to silicon success."

Availability for the DesignWare DDR4 multiPHY and uMCTL2 with support for DDR4 is planned for Q4 2012.

Get free daily email updates!

Thursday, 16 August 2012

Refreshing DDR SDRAM

Internally, computer memory is arranged as a matrix of "memory cells" in rows and columns, like the squares on a checkerboard, with each column being further divided by the I/O width of the memory chip. The entire organization of rows and columns is called a DRAM "array." For example, a 2Mx8 DRAM has roughly 2000 rows , 1000 columns, and 8 bits per column -- a total capacity of 16Mb, or 16 million bits.

Each memory cell is used to store a bit of data - stored as an electrical charge - which can be instantaneously retrieved by indicating the data's row and column location; however, DRAM is a volatile form of memory, which means that it must have power in order to retain data. When the power is turned off, data in RAM is lost.

DRAM is called "dynamic" RAM because it must be refreshed, or re-energized, hundreds of times each second in order to retain data. It has to be refreshed because its memory cells are designed around tiny capacitors that store electrical charges. These capacitors work like very tiny batteries and will gradually lose their stored charges if they are not re-energized. Also, the process of retrieving, or reading, data from the memory array tends to drain these charges, so the memory cells must be precharged before reading the data.

Refresh is the process of recharging, or re-energizing, the cells in a memory chip. Cells are refreshed one row at a time (usually one row per refresh cycle). The term "refresh rate" refers, not to the time it takes to refresh the memory, but to the total number of rows that it takes to refresh the entire DRAM array -- e.g. 2000 (2K) or 4000 (4K) rows. The term "refresh cycle" refers to the time it takes to refresh one row or, alternatively, to the time it takes to refresh the entire DRAM array. Refresh can be accomplished in many different ways, which is one of the reasons it can be a confusing topic.

Why are there different types of refresh? How are they different?
Refresh rate is determined by the total number of rows that have to be refreshed in a memory chip. Memory chips are designed for a particular type of refresh. For example, chips using 4K refresh will have about 4000 rows, which means that it will take about 4000 cycles to refresh the entire array. Chips using 2K refresh will have about 2000 rows, and chips with 1K refresh will have about 1000 rows. All of the chips in the chart below have the same total capacity* (16Mb, or 16 million cells), but different numbers of rows and columns depending on the type of refresh used.

	4K refresh	2K refresh	1K refresh
4Mx4	4000 rows / 1000 columns	2000 rows / 2000 columns
2Mx8	4000 rows / 500 columns	2000 rows / 1000 columns
1Mx16	4000 rows / 250 columns		1000 rows / 1000 columns

* Capacity = rows x columns x width. For example, a 4Mx4 chip is 4 Megabits "deep" and 4 bits "wide" (Total = 16Mb). If this chip uses 4K refresh, it will be organized into 4000 rows x 1000 columns x 4 bits per column (Total = 16Mb). If this chip uses 2K refresh, it will be internally organized into 2000 rows x 2000 columns x 4 bits per column (Total = 16Mb). The capacity is the same, but the organization and refresh are different.

The major refresh rates in use today are 1K, 2K, 4K, and 8K. The primary reason for these different types of refresh is decreased power consumption. Since column address circuitry requires more power than row address circuitry, using a type of refresh that selects fewer columns per row draws less current -- e.g. 4K versus 2K.

In addition to various refresh rates, there are several different refresh methods. The two most basic methods are distributed and burst. Distributed refresh charges one row at a time, in sequential order. Burst refresh charges a whole group of rows in one burst.

Normally, the refresh operation is initiated by the system's memory controller, but some chips are designed for "self refresh." This means that the DRAM chip has its own refresh circuitry and does not require intervention from the CPU or external refresh circuitry. Self refresh dramatically reduces power consumption and is often used in portable computers.

Two other refresh techniques are hidden and extended refresh. Both of these techniques rely on capacitors (memory cells) that discharge more slowly. Hidden refresh combines the refresh operation with read/write operations. Extended refresh extends the length of time it takes to refresh the entire memory array. The advantage of both is that they do not have to refresh as often.

Why does 4K refresh consume less power than 2K refresh?
It seems logical to think that 4K refresh would consume more power than 2K refresh because the number is larger, but that is not the case. The numbers do not refer to the size of the refresh area but to the number of rows that it takes to refresh the entire DRAM. 2K refresh charges about 2000 rows to refresh a DRAM chip; 4K refresh charges twice as many rows. The tradeoff is that while 4K refresh takes longer, it consumes less power.

In actuality, 4K refresh charges a smaller section of the total array per cycle than 2K refresh. Looking at the chart above, you can see a 4Mx4 chip with 4K refresh charges about 1000 columns for every row, but the same chip with 2K refresh charges about 2000 columns for every row. Remember that column address circuitry requires more power than row address circuitry, so a refresh that charges fewer columns per row draws less current. 4K refresh charges fewer columns per row than 2K refresh and therefore uses less power -- about 1.2x less power.

Is the performance any different for one type of refresh versus another?
Performance differences are miniscule, but a 2K version of one DRAM chip will perform slightly better than a 4K version. The tradeoff between the number of rows and columns in the internal organization affects what is known as the "page depth" of the DRAM chip, which can impact particular applications. On the other hand, a 4K chip consumes much less power. Deciding which type of refresh to use depends on the specific system and application. These specifications are usually detailed by the system manufacturers.

Where are the different types of refresh used?
Most memory chips today use 1K or 2K refresh and can be found in the majority of PCs. At first, 4K refresh was used in portables, workstations, and PC servers because it consumes less power and generates less heat, but 4K refresh is also increasing in desktop PCs. 8K refresh is fairly new and is exclusive to 64Mb chips at present (mostly high-end applications).

Is memory with different refresh rates interchangeable?
The memory controller in your system determines the type of refresh it can support. Some controllers have only enough drivers to support 2K refresh (2000 rows). Others have been designed to support both types of refresh (2K and 4K) using a technique called "redundant addressing." Some support only 4K refresh. It all depends on the system itself.

Friday, 13 July 2012

Difference between RDIMM and UDIMM

There are some differences between UDIMMs and RDIMMs that are important in choosing the best options for memory performance. First, let’s talk about the differences between them.

RDIMMs have a register on-board the DIMM (hence the name “registered” DIMM). The register/PLL is used to buffer the address and control lines and clocks only. Consequently, none of the data goes through the register /PLL on an RDIMM (PLL is Phase Locked Loop. On prior generations (DDR2), the Register - for buffer the address and control lines - and the PLL for generating extra copies of the clock were separate, but for DDR3 they are in a single part).

There is about a one clock cycle delay through the register which means that with only one DIMM per channel, UDIMMs will have slightly less latency (better bandwidth). But when you go to 2 DIMMs per memory channel, due to the high electrical loading on the address and control lines, the memory controller use something called a “2T” or “2N” timing for UDIMMs.

Consequently every command that normally takes a single clock cycle is stretched to two clock cycles to allow for settling time. Therefore, for two or more DIMMs per channel, RDIMMs will have lower latency and better bandwidth than UDIMMs.

Based on guidance from Intel and internal testing, RDIMMs have better bandwidth when using more than one DIMM per memory channel (recall that Nehalem has up to 3 memory channels per socket). But, based on results from Intel, for a single DIMM per channel, UDIMMs produce approximately 0.5% better memory bandwidth than RDIMMs for the same processor frequency and memory frequency (and rank). For two DIMMs per channel, RDIMMs are about 8.7% faster than UDIMMs.

For the same capacity, RDIMMs will be require about 0.5 to 1.0W per DIMM more power due to the Register/PLL power. The reduction in memory controller power to drive the DIMMs on the channel is small in comparison to the RDIMM Register/PLL power adder.

RDIMMs also provide an extra measure of RAS. They provide address/control parity detection at the Register/PLL such that if an address or control signal has an issue, the RDIMM will detect it and send a parity error signal back to the memory controller. It does not prevent data corruption on a write, but the system will know that it has occurred, whereas on UDIMMs, the same address/control issue would not be caught (at least not when the corruption occurs).

Another difference is that server UDIMMs support only x8 wide DRAMs, whereas RDIMMs can use x8 or x4 wide DRAMs. Using x4 DRAMs allows the system to correct all possible DRAM device errors (SDDC, or “Chip Kill”), which is not possible with x8 DRAMs unless channels are run in Lockstep mode (huge loss in bandwidth and capacity on Nehalem). So if SDDC is important, x4 RDIMMs are the way to go.

In addition, please note that UDIMMs are limited to 2 DIMMs per channel so RDIMMs must be used if greater than 2 DIMMs per channel (some of Dell’s servers will have 3 DIMMs per channel capability).
In summary the comparison between UDIMMs and RDIMMs is

Typically UDIMMs are a bit cheaper than RDIMMs
For one DIMM per memory channel UDIMMs have slightly better memory bandwidth than RDIMMs (0.5%)
For two DIMMs per memory channel RDIMMs have better memory bandwidth (8.7%) than UDIMMs
For the same capacity, RDIMMs will be require about 0.5 to 1.0W per DIMM than UDIMMs
RDIMMs also provide an extra measure of RAS
- Address / control signal parity detection
- RDIMMs can use x4 DRAMs so SDDC can correct all DRAM device errors even in independent channel mode
UDIMMs are currently limited to 1GB and 2GB DIMM sizes from Dell
UDIMMs are limited to two DIMMs per memory channel

DIMM Count and Memory Configurations

Recall that you are allowed up to 3 DIMMs per memory channel (i.e. 3 banks) per socket (a total of 9 DIMMs per socket). With Nehalem the actually memory speed depends upon the speed of the DIMM itself, the number of DIMMs in each channel, the CPU speed itself. Here are some simple rules for determining DIMM speed.

If you put only 1 DIMM in each memory channel you can run the DIMMs at 1333 MHz (maximum speed). This assumes that the processor supports 1333 MHz (currently, the 2.66 GHz, 2.80 GHz, and 2.93 GHz processors support 1333 MHz memory) and the memory is capable of 1333 MHz
As soon as you put one more DIMM in any memory channel (two DIMMs in that memory channel) on any socket, the speed of the memory drops to 1066 MHz (basically the memory runs at the fastest common speed for all DIMMs)
As soon as you put more than two DIMMs in any one memory channel, the speed of all the memory drops to 800 MHz

So as you add more DIMMs to any memory channel, the memory speed drops. This is due to the electrical loading of the DRAMs that reduces timing margin, not power constraints.
If you don’t completely fill all memory channels there is a reduction in the memory bandwidth performance. Think of these configurations as “unbalanced” configurations from a memory perspective.

Wednesday, 11 July 2012

Key Features of upcoming DDR4 memory

Can you believe DDR3 has been present in home PC systems for three years already? It still has another two years as king of the hill before DDR4 will be introduced, and the industry currently isn’t expecting volume shipments of DDR4 until 2013.

JEDEC isn’t due to confirm the DDR4 standard until next year, but following on from the MemCon Tokyo 2010, Japanese website PC Watch has combined the roadmaps of several memory companies on what they expect DDR4 to offer.

Following are the new features proposed for DDR4:

Three data width offerings: x4, x8 and x16
New JEDEC POD12 interface standard for DDR4 (1.2V)
Differential signaling for the clock and strobes
New termination scheme versus prior DDR versions: In DDR4, the DQ bus shifts termination to VDDQ, which should remain stable even if the VDD voltage is reduced over time.
Nominal and dynamic ODT: Improvements to the ODT protocol and a new Park Mode allow for a nominal termination and dynamic write termination without having to drive the ODT pin
Burst length of 8 and burst chop of 4
Data masking
DBI: to help reduce power consumption and improve data signal integrity, this feature informs the DRAM as to whether the true or inverted data should be stored
New CRC for data bus: Enabling error detection capability for data transfers – especially beneficial during write operations and in non-ECC memory applications.
New CA parity for command/address bus: Providing a low-cost method of verifying the integrity of command and address transfers over a link, for all operations.
DLL off mode supported

It looks like we should expect frequencies introduced at 2,133MHz and it will scale to over 4.2GHz with DDR4. 1,600MHz (10ns) could still well be the base spec for sever DIMMs that require reliability, but it’s expected that JEDEC will create new standard DDR3 frequency specifications all the way up to 2,133MHz, which is where DDR4 should jump off.

As the prefetch per clock should extend to 16 bits (up from 8 bits in DDR3), this means the internal cell frequency only has to scale the same as DDR2 and DDR3 in order to achieve the 4+GHz target.

The downside of frequency scaling is that voltage isn’t dropping fast enough and the power consumption is increasing relative to PC-133. DDR4 at 4.2GHz and 1.2V actually uses 4x the power of SDRAM at 133MHz at 3.3V. 1.1V and 1.05V are currently being discussed, which brings the power down to just over 3x, but it depends on the quality of future manufacturing nodes – an unknown factor.

While 4.2GHz at 1.2V might require 4x the power it’s also a 2.75x drop in voltage for a 32 fold increase in frequency: that seems like a very worthy trade off to us – put that against the evolution of power use in graphics cards for a comparison and it looks very favorable.

One area where this design might cause problems is enterprise computing. If you’re using a lot of DIMMs, considerably higher power, higher heat and higher cost aren’t exactly attractive. It’s unlikely that DDR4 4.2GHz will reach a server rack near you though: remember most servers today are only using 1,066MHz DDR3 whereas enthusiast PC memory now exceeds twice that.

Server technology will be slightly different and use high performance digital switches to add additional DIMM slots per channel (much like PCI-Express switches we expect, but with some form of error prevention), and we expect it to be used with the latest buffered LR-DIMM technology as well, although the underlying DDR4 topology will remain the same.

This is the same process the PCI bus went through in its transition to PCI-Express: replacing anything parallel nature with a serial approach. DDR4 will become a point-to-point bus and the parallelism is being left with the memory controller itself with multiple memory channels.

If we look at Intel’s upcoming LGA2011 socket that is anticipated to use a quad-channel memory interface and a single DIMM per channel, it’s now quite obvious that future CPUs using this socket stand a good chance of using DDR4, especially as LGA1366 has had a well defined three year lifespan. In the same timeframe DDR4 could see considerable market acceptance so it’s a smart move by Intel.

The big questions remain to be answered then: is it a consumer (cost) friendly process and how well does TSV cope with overclocking? We’ll have to wait for the first samples in 2011-2012 to find out.

Wednesday, 4 July 2012

Difference between DDR3 and DDR4 RAM

DDR4 is the next evolution in DRAM, bringing even higher performance and more robust control features while improving energy economy for enterprise, micro-server, tablet, and ultrathin client applications. The following table compares some of the key feature differences between DDR3 and DDR4.

Feature/Option	DDR3	DDR4	DDR4 Advantage
Voltage (core and I/O)	1.5V	1.2V	Reduces memory power demand
V_REF inputs	2 – DQs and CMD/ADDR	1 – CMD/ADDR	V_REFDQ now internal
Low voltage standard	Yes (DDR3L at 1.35V)	Anticipated (likely 1.05V)	Memory power reductions
Data rate (Mb/s)	800, 1066, 1333, 1600, 1866, 2133	1600, 1866, 2133, 2400, 2667, 3200	Migration to higher‐speed I/O
Densities	512Mb–8Gb	2Gb–16Gb	Better enablement for large-capacity memory subsystems
Internal banks	8	16	More banks
Bank groups (BG)	0	4	Faster burst accesses
^tCK – DLL enabled	300 MHz to 800 MHz	667 MHz to 1.6 GHz	Higher data rates
^tCK – DLL disabled	10 MHz to 125 MHz (optional)	Undefined to 125 MHz	DLL-off now fully supported
Read latency	AL + CL	AL + CL	Expanded values
Write latency	AL + CWL	AL + CWL	Expanded values
DQ driver (ALT)	40Ω	48Ω	Optimized for PtP (point-to-point) applications
DQ bus	SSTL15	POD12	Mitigate I/O noise and power
R_TT values (in Ω)	120, 60, 40, 30, 20	240, 120, 80, 60, 48, 40, 34	Support higher data rates
R_TT not allowed	READ bursts	Disables during READ bursts	Ease-of-use
ODT modes	Nominal, dynamic	Nominal, dynamic, park	Additional control mode; supports OTF value change
ODT control	ODT signaling required	ODT signaling not required	Ease of ODT control, allows non-ODT routing on PtP applications
Multipurpose register (MPR)	Four registers – 1 defined, 3 RFU	Four registers – 3 defined, 1 RFU	Provides additional specialty readout

Very Large Scale Integration (VLSI)

Featured post

Top 5 books to refer for a VHDL beginner