Very Large Scale Integration (VLSI)

Wednesday, 2 September 2015

Intel's Skylarke Processors for PCs, Tablets and Servers

Intel is launching a full portfolio of "Skylake" processors that company officials expect will combine with Microsoft's Windows 10 operating system to help jump start a stagnant global PC market.

Executives with the chip maker for more than a year have been talking about the 14-nanometer Skylake architecture and the advanced features that are contained within it, touching on everything from graphics and imaging to security, memory, performance and wireless connectivity. In early August, Intel rolled out two Skylake chips—the Core i7-6700K and i5-6600K desktop processors—for gaming machines, and later in the month officials gave out a few more details during the Intel Developer Forum (IDF).

While Intel Corp. is going to release its code-named “Skylake” processors a little later than expected, the company keeps the plan to introduce its new micro-architecture for virtually all segments of the market continuum this year. Intel will roll-out “Skylake” central processing units for tablets, 2-in-1s, personal computers and servers this year, the chip giant confirmed this week.

“When I look at the range of what Skylake’s able to deliver from the Core M level all up to the i7 and Xeon, it’s just going to be a fantastic product,” said Intel CEO Brian Krzanich, in an interview with the IDG News Service at Mobile World Congress in Barcelona.

Intel ran into problems with production of its code-named “Broadwell” processors using 14nm manufacturing technology last year. Due to insufficient yields, the world’s largest maker of microprocessors had to delay introduction of its latest chips by about a year. However, since “Skylake” brings a lot of innovations, Intel did not want to delay it significantly. As a result, “Broadwell” products will have a relatively short lifecycle.

Intel will introduce the first “Skylake” processors in the form of dual-core Core M chips in the third quarter of this calendar year. The CPUs will power high-performance tablets, hybrid 2-in-1 personal computers and ultra-thin notebooks. It is expected that many mobile devices powered by Intel Core M “Skylake” will support Rezence wireless charging and WiGig short-range transmission technology.

In late Q3 or early Q4 the Santa Clara, California-based chip designer will introduce its first Core i3, Core i5 and Core i7 chips featuring “Skylake” micro-architecture for mainstream personal computers, including desktops and laptops. The lineup is projected to include chips with unlocked multiplier designed for enthusiast-class desktop PCs. Systems featuring the new “Skylake” processors will have improved storage performance thanks to native support of SATA Express. In addition, many “Skylake”-powered PCs will use DDR4 memory and support a variety of other innovations.

Intel also plans to introduce Xeon processors with “Skylake” cores for uniprocessor servers later this year. While there are plans to bring “Skylake” architecture to Xeon chips for dual-processor and multi-processor servers, Intel yet has to outline exact plans concerning the move.

Intel “Skylake” processors will be made using 14nm process technology and will feature a brand new micro-architecture that is designed to improve performance and power efficiency of central processing units. Unfortunately, not all “Skylake” processors will support 512-bit AVX 3.2 instructions, according to unofficial information.

Friday, 21 August 2015

Resistive Memory - ReRam

The memory tech that will eventually replace NAND flash, finally in market

What is ReRam?

ReRam is Resistive random-access memory (RRAM or ReRAM) is a type of non-volatile (NV) random-access (RAM) computer memory that works by changing the resistance across a dielectric solid-state material often referred to as a memristor. The biggest advantage of ReRAM technology is its good compatibility with CMOS technologies.

It is under development by a number of companies, and some have already patented their own versions of the technology. The memory operates by changing the resistance of special dielectric material called a memresistor (memory resistor) whose resistance varies depending on the applied voltage.

What makes ReRam?

From the viewpoint of the material choice, the advantage of ReRAM is evident. It is possible to fabricate MOM structures easily by using the oxides widely used in the current semiconductor technologies. Low-current ReRAM operation was reported in the CuOx-based MOM structure. The CuOx layer was grown by the thermal oxidation of the 0.18-μm Cu. NiO and CoO are being intensively studied as oxide materials for ReRAM, and these transition metal elements are also used in metal silicides employed as gate materials. Recently, the good scaling feasibility of ReRAM was demonstrated in an HfOx-based memory with a cell size of 30 nm. The devices in a 1-kbit array exhibited a high device yield (~100%) and robust cycling endurance (>106) with a pulse width of 40 ns. The memory cell consisted of a TiN/Ti/HfOx/TiN structure. Here, the Ti overlayer played the role of oxygen gettering for better ReRAM operation. The gettering effect has already been investigated in HfOx as a high-k material for the gate dielectric films in CMOS devices. The academic and technological knowledge about high-k materials will be very useful in the design of the stacking structure for a ReRAM device.

How ReRam Works?

RRAM is the result of a new kind of dielectric material which is not permanently damaged and fails when dielectric breakdown occurs; for a memresistor, the dielectric breakdown is temporary and reversible. When voltage is deliberately applied to a memresistor, microscopic conductive paths called filaments are created in the material. The filaments are caused by phenomena like metal migration or even physical defects. Filaments can be broken and reversed by applying different external voltages. It is this creation and destruction of filaments in large quantities that allows for storage of digital data. Materials that have memresistor characteristics include oxides of titanium and nickel, some electrolytes, semiconductor materials, and even a few organic compounds have been tested to have these characteristics.

The principal advantage of RRAM over other non-volatile technology is high switching speed. Because of the thinness of the memresistors, it has a great potential for high storage density, greater read and write speeds, lower power usage, and cheaper cost than flash memory. Flash memory cannot continue to scale because of the limits of the materials, so RRAM will soon replace flash memory.

Monday, 29 June 2015

Difference between simulation and emulation

A simulation is a system that behaves similar to something else, but is implemented in an entirely different way. It provides the basic behaviour of a system but may not necessarily abide by all of the rules of the system being simulated. It is there to give you an idea about how something works.

Think of a flight simulator as an example. It looks and feels like you are flying an airplane, but you are completely disconnected from the reality of flying the plane, and you can bend or break those rules as you see fit. E.g.; Fly an Airbus A380 upside down between London and Sydney without breaking it.

An emulation is a system that behaves exactly like something else, and abides by all of the rules of the system being emulated. It’s like duplicating every aspect of the original device’s behaviour. It is effectively a complete replication of another system, right down to being binary compatible with the emulated system's inputs and outputs, but operating in a different environment to the environment of the original emulated system. The rules are fixed, and cannot be changed or the system fails.

Today hardware emulation has become an very popular tool for verification because of following reasons:

In the past few years, the emulation user community has expanded exponentially by the addition of software developers to the traditional base of hardware designers and verification engineers.

Also, uses of hardware emulation have multiplied because of its versatility as a resource for debugging both the hardware and software of complex system-on-chip (SoC) designs. Hardware emulation is the only verification tool that can be deployed in more than one mode. In fact, it can be used in four main modes, some of which can be combined for added versatility. Because of this resourcefulness, hardware emulation can be used to achieve several verification objectives.

Following are the deployment modes for hardware emulator. These are characterized by type of stimulus applied to DUT:

In Circuit Emulation (ICE) : This was considered to be the traditional method when hardware emulation was deployed. In this case, the DUT is mapped inside the emulator and connected in in-circuit emulation (ICE) mode to the target system in place of a chip or processor for debug prior to silicon availability.
Transaction Based Acceleration (TBX) : Transaction-based emulation moves verification up a level of abstraction from the register transfer level (RTL), improving performance and debug productivity. It’s gaining popularity over the ICE mode because the physical target system is replaced by a virtual target system using a hardware verification language (HVL) such as
SystemVerilog, SystemC, or C++.
Simulation Testbench Acceleration : In this mode, an RTL testbench drives the DUT in the emulator via a programmable logic interface (PLI). In general, this is the slowest performance mode, but it has some advantages, such as the fact that it does not require changes to the testbench.
Embeded Software Acceleration : In this mode, the software code is executed on the DUT processor mapped inside the emulator. This is the fastest performance mode, making it the choice for processing billions of verification cycles necessary to boot an operating system.

It is possible to mix some of the above modes, such as processing embedded software together with a virtual testbench driving the DUT via verification IP or even in ICE mode.

Wednesday, 13 May 2015

IEEE Standards

IEEE 802 refers to a family of IEEE standards dealing with local area networks and metropolitan area networks.

More specifically, the IEEE 802 standards are restricted to networks carrying variable-size packets. By contrast, in cell relay networks data is transmitted in short, uniformly sized units called cells. Isochronous networks, where data is transmitted as a steady stream of octets, or groups of octets, at regular time intervals, are also out of the scope of this standard. The number 802 was simply the next free number IEEE could assign, though “802” is sometimes associated with the date the first meeting was held — February 1980.

The services and protocols specified in IEEE 802 map to the lower two layers (Data Link and Physical) of the seven-layer OSI networking reference model. In fact, IEEE 802 splits the OSI Data Link Layer into two sub-layers named Logical Link Control (LLC) and Media Access Control (MAC), so that the layers can be listed like this:

Data link layer
LLC Sublayer
MAC Sublayer
Physical layer

The IEEE 802 family of standards is maintained by the IEEE 802 LAN/MAN Standards Committee (LMSC). The most widely used standards are for the Ethernet family, Token Ring, Wireless LAN, Bridging and Virtual Bridged LANs. An individual Working Group provides the focus for each area.

IEEE developed a set of 802 network standards. They include:

IEEE 802.1: Standards related to network management.
IEEE 802.2: General standard for the data link layer in the OSI Reference Model. The IEEE divides this layer into two sub-layers -- the logical link control (LLC) layer and the media access control (MAC) layer. The MAC layer varies for different network types and is defined by standards IEEE 802.3 through IEEE 802.5.
IEEE 802.3: Defines the MAC layer for bus networks that use CSMA/CD. This is the basis of the Ethernet standard.
IEEE 802.4: Defines the MAC layer for bus networks that use a token-passing mechanism (token bus networks).
IEEE 802.5: Defines the MAC layer for token-ring networks.
IEEE 802.6: Standard for Metropolitan Area Networks (MANs).

Tuesday, 12 May 2015

Ethernet–Introduction

In today's business world, reliable and efficient access to information has become an important asset in the quest to achieve a competitive advantage. File cabinets and mountains of papers have given way to computers that store and manage information electronically.

Computer networking technologies are the glue that binds these elements together. Networking allows one computer to send information to and receive information from another. We can classify network technologies as belonging to one of two basic groups. Local area network (LAN) technologies connect many devices that are relatively close to each other, usually in the same building. The library terminals that display book information would connect over a local area network. Wide area network (WAN) technologies connect a smaller number of devices that can be many kilometers apart.

In comparison to WANs, LANs are faster and more reliable, but improvements in technology continue to blur the line of demarcation. Fiber optic cables have allowed LAN
technologies to connect devices tens of kilometers apart, while at the same time greatly
improving the speed and reliability of WANs.

Ethernet

Ethernet has been a relatively inexpensive, reasonably fast, and very popular LAN technology for several decades. Two individuals at Xerox PARC -- Bob Metcalfe and D.R. Boggs -- developed Ethernet beginning in 1972 and specifications based on this work appeared in IEEE 802.3 in 1980. Ethernet has since become the most popular and most widely deployed network technology in the world. Many of the issues involved with Ethernet are common to many network technologies, and understanding how Ethernet addressed these issues can provide a foundation that will improve your understanding of networking in general.

The Ethernet standard has grown to encompass new technologies as computer networking has matured. Specified in a standard, IEEE 802.3, an Ethernet LAN typically uses coaxial cable or special grades of twisted pair wires. Ethernet is also used in wireless LANs. Ethernet uses the CSMA/CD access method to handle simultaneous demands. The most commonly installed Ethernet systems are called 10BASE-T and provide transmission speeds up to 10 Mbps. Devices are connected to the cable and compete for access using a Carrier Sense Multiple Access with Collision Detection (CSMA/CD) protocol. Fast Ethernet or 100BASE-T provides transmission speeds up to 100 megabits per second and is typically used for LAN backbone systems, supporting workstations with 10BASE-T cards. Gigabit Ethernet provides an even higher level of backbone support at 1000 megabits per second (1 gigabit or 1 billion bits per second). 10-Gigabit Ethernet provides up to 10 billion bits per second.

Sunday, 15 March 2015

A Cache Memory

Today we feel to revise what we know about cache memory. A cache is a memory device that improves performance of the processor by transparently storing data such that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere.

Access to cache can result in either one of the following: cache miss or cache hit.Cache hit means that the requested data is contained in the cache and cache miss means data is not found there in cache.On cache hit processor takes data from cache itself for processing.On cache miss the data is fetched from the original memory location.Cache memories are volatile and small in storage size.Since the storage size is small the address decoding takes less time and hence caches are faster then normal physical memories(RAM's) in computers.

As I said the data is stored transparently in cache.This means that the user who is requesting data from the cache need not know whether data is stored in cache or system memory.It is handled by the processor.The word cache means "conceal" in French.

A simple cache contains three fields.

1. An index which is local to the cache.

2. A tag which is the index with reference to the main memory.This will let the processor know the location in main memory where an exact copy of data is stored.

3. Data, which is actual data needed by the processor.

When processor needs some data from the memory it first checks in cache.It sees all the tag fields in the cache to see whether same data is available in cache.If the tag is found then the corresponding data is taken.Otherwise a cache miss error is asserted and the main memory is accessed.Also the cache memory is updated with the recent memory access.This is called cache update on cache miss.

During a cache update if the cache is full, then it has to delete a row.This is decided on a cache replacement algorithm.Some algorithms are:

1. LRU - Least recently used data is replaced.

2. MRU - Most recently used data is replaced.

3. Random replacement - Simple, used in ARM processors.

4. Belady's Algorithm - discards the data which may not be used for the longest time in future.Not perfectly implementable in practice.

The average memory access time of a cache enabled system can be calculated using the hit and miss ratio of a cache.

Average memory access time = (Time_cache * Hit_counts ) + ( (Time_cache + Time_mm) * Miss_counts)

where,

Time_cache and Time_mm is the time needed to access a location for cache and main memory respectively.
Hit_counts and Miss_counts are the hit and miss probabilities.

There are two types cache writing: write back(copy-back) and write through.

When the data at a particular memory location is updated then this data must be written back to cache.If the data is updated only in the cache then it is called write back.If the updating of data happens both in cache and main memory then it is called write through.Write through keeps the cache and memory synchronized.In the write back operation since the cache data is not same as the main memory data it is marked as "dirty" data.These dirty data will be written back into main memory when the particular data is cleared from the cache.If a miss happens in a write-back cache it may sometimes require two memory accesses to service : one to first write the dirty location to memory and then another to read the new location from memory.

The main memory locations may be altered without proper updating in cache by peripherals using DMA or by a multi core processor.This results in a out of date data in cache.These type of data is called "stale" data.To solve these stale data problems we have to use cache coherence protocols between the cache managers to keep the data consistent.

All caches are CAM(content accessible memory).And for efficiency we have to scan all the memory contents in one cycle.This requires parallel hardware.Also higher the memory size the more is the memory access time.

Let us see now,how a cache is made.Say we have a 32 bit main memory in our system and the cache chip size is 4 Kb.Also say each line in cache stores 32 bytes so that there are totally 128 lines.Each line in cache have two fields. Address(4 bytes) and Data.The address is further divided into two fields- Tag(27 bits) and offset(5 bits for indexing a particular byte among the data).Remember that the tag contain the MSB 27 bits of the address here.These kind of caches are called Fully associative caches.Since the tag is 27 bits(relatively long) it takes more time to read data from Fully associative cache.Also more hardware circuit is required for parallel reading of tags from the CAM type cache.So they are expensive but more efficient.

Fully associative cache

Another type of cache architecture is known as direct mapped cache.In this the address is divided into three fields named tag(20 bits),index(7 bits used as an index the 128 lines in cache) and offset(5 bits).The problem with this type of cache is that the cache is less efficient since the main memory cannot be copied to any line in cache as in fully associative cache.This is because the addresses with the same index will be mapped to the same line in cache.But the cache access time is less here.In certain situations you may get a cache miss for almost every access.So they are cheap but less efficient.

Direct mapped Cache

Another type of cache is called set associative cache which has the advantages of both direct mapped and fully associative caches.These are again subdivided based on the number of bits in the index field.

2. 2 way set associative cache - In this type of cache we have two group of lines,each containing 64 lines.The cache has the same number of fields as direct mapped cache but tag has 21 bits and index has 6 bits here.

2. 4 way set associative cache - Here we have 4 groups each contains 32 lines.index has 5 bits and tag has 22 bits.

2-way and 4-way set associative caches