Very Large Scale Integration (VLSI)

Friday, 21 August 2015

Resistive Memory - ReRam

The memory tech that will eventually replace NAND flash, finally in market

What is ReRam?

ReRam is Resistive random-access memory (RRAM or ReRAM) is a type of non-volatile (NV) random-access (RAM) computer memory that works by changing the resistance across a dielectric solid-state material often referred to as a memristor. The biggest advantage of ReRAM technology is its good compatibility with CMOS technologies.

It is under development by a number of companies, and some have already patented their own versions of the technology. The memory operates by changing the resistance of special dielectric material called a memresistor (memory resistor) whose resistance varies depending on the applied voltage.

What makes ReRam?

From the viewpoint of the material choice, the advantage of ReRAM is evident. It is possible to fabricate MOM structures easily by using the oxides widely used in the current semiconductor technologies. Low-current ReRAM operation was reported in the CuOx-based MOM structure. The CuOx layer was grown by the thermal oxidation of the 0.18-μm Cu. NiO and CoO are being intensively studied as oxide materials for ReRAM, and these transition metal elements are also used in metal silicides employed as gate materials. Recently, the good scaling feasibility of ReRAM was demonstrated in an HfOx-based memory with a cell size of 30 nm. The devices in a 1-kbit array exhibited a high device yield (~100%) and robust cycling endurance (>106) with a pulse width of 40 ns. The memory cell consisted of a TiN/Ti/HfOx/TiN structure. Here, the Ti overlayer played the role of oxygen gettering for better ReRAM operation. The gettering effect has already been investigated in HfOx as a high-k material for the gate dielectric films in CMOS devices. The academic and technological knowledge about high-k materials will be very useful in the design of the stacking structure for a ReRAM device.

How ReRam Works?

RRAM is the result of a new kind of dielectric material which is not permanently damaged and fails when dielectric breakdown occurs; for a memresistor, the dielectric breakdown is temporary and reversible. When voltage is deliberately applied to a memresistor, microscopic conductive paths called filaments are created in the material. The filaments are caused by phenomena like metal migration or even physical defects. Filaments can be broken and reversed by applying different external voltages. It is this creation and destruction of filaments in large quantities that allows for storage of digital data. Materials that have memresistor characteristics include oxides of titanium and nickel, some electrolytes, semiconductor materials, and even a few organic compounds have been tested to have these characteristics.

The principal advantage of RRAM over other non-volatile technology is high switching speed. Because of the thinness of the memresistors, it has a great potential for high storage density, greater read and write speeds, lower power usage, and cheaper cost than flash memory. Flash memory cannot continue to scale because of the limits of the materials, so RRAM will soon replace flash memory.

Monday, 29 June 2015

Difference between simulation and emulation

A simulation is a system that behaves similar to something else, but is implemented in an entirely different way. It provides the basic behaviour of a system but may not necessarily abide by all of the rules of the system being simulated. It is there to give you an idea about how something works.

Think of a flight simulator as an example. It looks and feels like you are flying an airplane, but you are completely disconnected from the reality of flying the plane, and you can bend or break those rules as you see fit. E.g.; Fly an Airbus A380 upside down between London and Sydney without breaking it.

An emulation is a system that behaves exactly like something else, and abides by all of the rules of the system being emulated. It’s like duplicating every aspect of the original device’s behaviour. It is effectively a complete replication of another system, right down to being binary compatible with the emulated system's inputs and outputs, but operating in a different environment to the environment of the original emulated system. The rules are fixed, and cannot be changed or the system fails.

Today hardware emulation has become an very popular tool for verification because of following reasons:

In the past few years, the emulation user community has expanded exponentially by the addition of software developers to the traditional base of hardware designers and verification engineers.

Also, uses of hardware emulation have multiplied because of its versatility as a resource for debugging both the hardware and software of complex system-on-chip (SoC) designs. Hardware emulation is the only verification tool that can be deployed in more than one mode. In fact, it can be used in four main modes, some of which can be combined for added versatility. Because of this resourcefulness, hardware emulation can be used to achieve several verification objectives.

Following are the deployment modes for hardware emulator. These are characterized by type of stimulus applied to DUT:

In Circuit Emulation (ICE) : This was considered to be the traditional method when hardware emulation was deployed. In this case, the DUT is mapped inside the emulator and connected in in-circuit emulation (ICE) mode to the target system in place of a chip or processor for debug prior to silicon availability.
Transaction Based Acceleration (TBX) : Transaction-based emulation moves verification up a level of abstraction from the register transfer level (RTL), improving performance and debug productivity. It’s gaining popularity over the ICE mode because the physical target system is replaced by a virtual target system using a hardware verification language (HVL) such as
SystemVerilog, SystemC, or C++.
Simulation Testbench Acceleration : In this mode, an RTL testbench drives the DUT in the emulator via a programmable logic interface (PLI). In general, this is the slowest performance mode, but it has some advantages, such as the fact that it does not require changes to the testbench.
Embeded Software Acceleration : In this mode, the software code is executed on the DUT processor mapped inside the emulator. This is the fastest performance mode, making it the choice for processing billions of verification cycles necessary to boot an operating system.

It is possible to mix some of the above modes, such as processing embedded software together with a virtual testbench driving the DUT via verification IP or even in ICE mode.

Wednesday, 13 May 2015

IEEE Standards

IEEE 802 refers to a family of IEEE standards dealing with local area networks and metropolitan area networks.

More specifically, the IEEE 802 standards are restricted to networks carrying variable-size packets. By contrast, in cell relay networks data is transmitted in short, uniformly sized units called cells. Isochronous networks, where data is transmitted as a steady stream of octets, or groups of octets, at regular time intervals, are also out of the scope of this standard. The number 802 was simply the next free number IEEE could assign, though “802” is sometimes associated with the date the first meeting was held — February 1980.

The services and protocols specified in IEEE 802 map to the lower two layers (Data Link and Physical) of the seven-layer OSI networking reference model. In fact, IEEE 802 splits the OSI Data Link Layer into two sub-layers named Logical Link Control (LLC) and Media Access Control (MAC), so that the layers can be listed like this:

Data link layer
LLC Sublayer
MAC Sublayer
Physical layer

The IEEE 802 family of standards is maintained by the IEEE 802 LAN/MAN Standards Committee (LMSC). The most widely used standards are for the Ethernet family, Token Ring, Wireless LAN, Bridging and Virtual Bridged LANs. An individual Working Group provides the focus for each area.

IEEE developed a set of 802 network standards. They include:

IEEE 802.1: Standards related to network management.
IEEE 802.2: General standard for the data link layer in the OSI Reference Model. The IEEE divides this layer into two sub-layers -- the logical link control (LLC) layer and the media access control (MAC) layer. The MAC layer varies for different network types and is defined by standards IEEE 802.3 through IEEE 802.5.
IEEE 802.3: Defines the MAC layer for bus networks that use CSMA/CD. This is the basis of the Ethernet standard.
IEEE 802.4: Defines the MAC layer for bus networks that use a token-passing mechanism (token bus networks).
IEEE 802.5: Defines the MAC layer for token-ring networks.
IEEE 802.6: Standard for Metropolitan Area Networks (MANs).

Tuesday, 12 May 2015

Ethernet–Introduction

In today's business world, reliable and efficient access to information has become an important asset in the quest to achieve a competitive advantage. File cabinets and mountains of papers have given way to computers that store and manage information electronically.

Computer networking technologies are the glue that binds these elements together. Networking allows one computer to send information to and receive information from another. We can classify network technologies as belonging to one of two basic groups. Local area network (LAN) technologies connect many devices that are relatively close to each other, usually in the same building. The library terminals that display book information would connect over a local area network. Wide area network (WAN) technologies connect a smaller number of devices that can be many kilometers apart.

In comparison to WANs, LANs are faster and more reliable, but improvements in technology continue to blur the line of demarcation. Fiber optic cables have allowed LAN
technologies to connect devices tens of kilometers apart, while at the same time greatly
improving the speed and reliability of WANs.

Ethernet

Ethernet has been a relatively inexpensive, reasonably fast, and very popular LAN technology for several decades. Two individuals at Xerox PARC -- Bob Metcalfe and D.R. Boggs -- developed Ethernet beginning in 1972 and specifications based on this work appeared in IEEE 802.3 in 1980. Ethernet has since become the most popular and most widely deployed network technology in the world. Many of the issues involved with Ethernet are common to many network technologies, and understanding how Ethernet addressed these issues can provide a foundation that will improve your understanding of networking in general.

The Ethernet standard has grown to encompass new technologies as computer networking has matured. Specified in a standard, IEEE 802.3, an Ethernet LAN typically uses coaxial cable or special grades of twisted pair wires. Ethernet is also used in wireless LANs. Ethernet uses the CSMA/CD access method to handle simultaneous demands. The most commonly installed Ethernet systems are called 10BASE-T and provide transmission speeds up to 10 Mbps. Devices are connected to the cable and compete for access using a Carrier Sense Multiple Access with Collision Detection (CSMA/CD) protocol. Fast Ethernet or 100BASE-T provides transmission speeds up to 100 megabits per second and is typically used for LAN backbone systems, supporting workstations with 10BASE-T cards. Gigabit Ethernet provides an even higher level of backbone support at 1000 megabits per second (1 gigabit or 1 billion bits per second). 10-Gigabit Ethernet provides up to 10 billion bits per second.

Sunday, 15 March 2015

A Cache Memory

Today we feel to revise what we know about cache memory. A cache is a memory device that improves performance of the processor by transparently storing data such that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere.

Access to cache can result in either one of the following: cache miss or cache hit.Cache hit means that the requested data is contained in the cache and cache miss means data is not found there in cache.On cache hit processor takes data from cache itself for processing.On cache miss the data is fetched from the original memory location.Cache memories are volatile and small in storage size.Since the storage size is small the address decoding takes less time and hence caches are faster then normal physical memories(RAM's) in computers.

As I said the data is stored transparently in cache.This means that the user who is requesting data from the cache need not know whether data is stored in cache or system memory.It is handled by the processor.The word cache means "conceal" in French.

A simple cache contains three fields.

1. An index which is local to the cache.

2. A tag which is the index with reference to the main memory.This will let the processor know the location in main memory where an exact copy of data is stored.

3. Data, which is actual data needed by the processor.

When processor needs some data from the memory it first checks in cache.It sees all the tag fields in the cache to see whether same data is available in cache.If the tag is found then the corresponding data is taken.Otherwise a cache miss error is asserted and the main memory is accessed.Also the cache memory is updated with the recent memory access.This is called cache update on cache miss.

During a cache update if the cache is full, then it has to delete a row.This is decided on a cache replacement algorithm.Some algorithms are:

1. LRU - Least recently used data is replaced.

2. MRU - Most recently used data is replaced.

3. Random replacement - Simple, used in ARM processors.

4. Belady's Algorithm - discards the data which may not be used for the longest time in future.Not perfectly implementable in practice.

The average memory access time of a cache enabled system can be calculated using the hit and miss ratio of a cache.

Average memory access time = (Time_cache * Hit_counts ) + ( (Time_cache + Time_mm) * Miss_counts)

where,

Time_cache and Time_mm is the time needed to access a location for cache and main memory respectively.
Hit_counts and Miss_counts are the hit and miss probabilities.

There are two types cache writing: write back(copy-back) and write through.

When the data at a particular memory location is updated then this data must be written back to cache.If the data is updated only in the cache then it is called write back.If the updating of data happens both in cache and main memory then it is called write through.Write through keeps the cache and memory synchronized.In the write back operation since the cache data is not same as the main memory data it is marked as "dirty" data.These dirty data will be written back into main memory when the particular data is cleared from the cache.If a miss happens in a write-back cache it may sometimes require two memory accesses to service : one to first write the dirty location to memory and then another to read the new location from memory.

The main memory locations may be altered without proper updating in cache by peripherals using DMA or by a multi core processor.This results in a out of date data in cache.These type of data is called "stale" data.To solve these stale data problems we have to use cache coherence protocols between the cache managers to keep the data consistent.

All caches are CAM(content accessible memory).And for efficiency we have to scan all the memory contents in one cycle.This requires parallel hardware.Also higher the memory size the more is the memory access time.

Let us see now,how a cache is made.Say we have a 32 bit main memory in our system and the cache chip size is 4 Kb.Also say each line in cache stores 32 bytes so that there are totally 128 lines.Each line in cache have two fields. Address(4 bytes) and Data.The address is further divided into two fields- Tag(27 bits) and offset(5 bits for indexing a particular byte among the data).Remember that the tag contain the MSB 27 bits of the address here.These kind of caches are called Fully associative caches.Since the tag is 27 bits(relatively long) it takes more time to read data from Fully associative cache.Also more hardware circuit is required for parallel reading of tags from the CAM type cache.So they are expensive but more efficient.

Fully associative cache

Another type of cache architecture is known as direct mapped cache.In this the address is divided into three fields named tag(20 bits),index(7 bits used as an index the 128 lines in cache) and offset(5 bits).The problem with this type of cache is that the cache is less efficient since the main memory cannot be copied to any line in cache as in fully associative cache.This is because the addresses with the same index will be mapped to the same line in cache.But the cache access time is less here.In certain situations you may get a cache miss for almost every access.So they are cheap but less efficient.

Direct mapped Cache

Another type of cache is called set associative cache which has the advantages of both direct mapped and fully associative caches.These are again subdivided based on the number of bits in the index field.

2. 2 way set associative cache - In this type of cache we have two group of lines,each containing 64 lines.The cache has the same number of fields as direct mapped cache but tag has 21 bits and index has 6 bits here.

2. 4 way set associative cache - Here we have 4 groups each contains 32 lines.index has 5 bits and tag has 22 bits.

2-way and 4-way set associative caches

Sunday, 1 March 2015

SystemVerilog Associative Arrays

In previous post we learn in detail about SystemVerilog Dynamic arrays which is useful for dealing with contiguous collections of variables whose number changes dynamically. Now consider if the size of array is unknown then how much size will you allocate to array?

An Associative array is one to use when the size of the collection is unknown or the data space is sparse. So the associative arrays are mainly used to model the sparse memories. In the associative arrays the storage is allocated only when we use it not initially like in dynamic arrays. Associative arrays can be assigned only to another Associative array of a compatible type and with the same index type.

Another main difference between Associative array and normal arrays is in that in assoc arrays the arrays index can be any scalar value.

Properties of Associative arrays:

Dynamically allocated, non-contiguous elements
Accessed with integer, or string index, single dimension
Great for sparse arrays with wide ranging index
Array functions: exists, first, last, next, prev

Declaration Syntax:

data_type array_name [ index_type ];

Where,
data_type : data type of the array element. This can be any type that is allowed for Fixed Arrays
array_name : name of the array being declared.
index_type : data type to be used as index

Example : Associative array declaration

int array_name[*];//Wildcard index. can be indexed by any integral datatype.
int array_name [string];// String index
int array_name [some_Class];// Class index
int array_name [integer];// Integer index
typedef bit signed [4:1] Nibble;
int array_name [Nibble]; // Signed packed array

Accessing the Associative arrays
SystemVerilog provides various in-built methods to access, analyze and manipulate the associative arrays.

num() or size() returns the number of entries in the associative arrays.
delete() removes the entry from specified index.
exist() checks weather an element exists at specified index of the given associative array.
first() assigns to the given index variable the value of the smallest/first index in the associative array. Returns 0 if array is empty; else returns 1.
last() assigns to the given index variable the value of the largest/last index in the associative array. Returns 0 if array is empty; else returns 1.
next() finds the entry whose index is greater than the given index. If next entry exists then the index variable is assigned to the index of next entry and returns 1. Otherwise the index is unchanged and the function returns 0.
previous() finds the entry whose index is smaller than the given index. If previous entry exists then the index variable is assigned to the index of previous entry and returns 1. Otherwise the index is unchanged and the function returns 0.

Example : Associative Arrays in-built methods

// SystemVerilog Associative arrays
module associative_array ();
 
 integer associative_array [integer];
 
 integer i;
 
 initial begin
   // Add element array
   associative_array[100] = 101;
   $display ("value stored in 100 is %d", associative_array[100]);
   associative_array[1]   = 100;
   $display ("value stored in 1   is %d", associative_array[1]);
   associative_array[50]   = 99;
   $display ("value stored in 50  is %d", associative_array[50]);
   associative_array[250] = 22;
   $display ("value stored in 250 is %d", associative_array[250]);
   // Print the size of array
   $display ("size of array is %d", associative_array.num());
   // Check if index 2 exists
   $display ("index 2 exists   %d", associative_array.exists(2));
   // Check if index 100 exists
   $display ("index 100 exists %d", associative_array.exists(100));
   // Value stored in first index
   if (associative_array.first(i)) begin
     $display ("value at first index %d value %d", i, associative_array[i]);
   end
   // Value stored in last index
   if (associative_array.last(i)) begin
     $display ("value at last index  %d value %d", i,  associative_array[i]);
   end
   // Delete the first index
   associative_array.delete(100);
   $display ("Deleted index 100");
   // Value stored in first index
   if (associative_array.first(i)) begin
     $display ("value at first index %d value %d", i, associative_array[i]);
   end
    #1  $finish;
 end
 
 endmodule

Simulation Result:
value stored in 100 is 101
value stored in 1 is 100
value stored in 50 is 99
value stored in 250 is 22
size of array is 4
index 2 exists 0
index 100 exists 1
value at first index 1 value 100
value at last index 250 value 22
Deleted index 100
value at first index 1 value 100

Try simulation yourself here

Now as you know that we can use any data type as index of associative arrays, below are the things to keep in mind while using different index datatypes.

1. Integer or int index : While using integer in associative arrays, following rules need to be kept in mind.

A 4-state index containing X or Z is invalid.
Indices smaller than integer are sign extended to 32 bits.
The ordering is signed numerical.
Indices can be any integral expression.
Indices are signed.
Example: int array_name [ integer ];

2. String index : While using string in associative arrays, following rules need to be kept in mind.

An empty string "" index is valid.
The ordering is lexicographical (lesser to greater).
Indices can be strings or string literals of any length.
Example: int array_name [ string ];

3. Class index : While using class in associative arrays, following rules need to be kept in mind.

A null index is valid.
The ordering is deterministic but arbitrary.
Indices can be objects of that particular type or derived from that type.
Example: int array_name [ some_Class ];

4. Wild Character index : While using wild characters in associative arrays, following rules need to be kept in mind.

A 4-state Index containing X or Z is invalid.
Indices are unsigned.
Indexing expressions are self-determined; signed indices are not sign extended.
The ordering is numerical (smallest to largest).
A string literal index is auto-cast to a bit-vector of equivalent size.
The array can be indexed by any integral data type.
Example: int array_name [*];

Previous : SystemVerilog Dynamic Arrays

Next : SystemVerilog Queues