High Bandwidth Memory: The greatest innovation that you don’t care about.
Not long ago, even as late as 2009, the desktop computer was dead. Its eulogy was being written by multiple popular tech blogs. The microprocessor transistor process was seemingly stuck at 45 nanometers, prompting the IEEE to declare that Moore's Law had reached its limit. Major developers such as EA and Konami were largely ignoring the platform. Sales had hit new lows for many PC manufacturers, such as Dell, and even the staunchest of the PC faithful were starting to wonder whether the desktop computer could survive the onslaught of touchscreen devices, smartphones and netbooks (yes, those netbooks). Even nVidia, long the purveyor of state-of-the-art GPUs, was taking heat for its 2009 300-series GeForce cards, which were just rebranded 200-series cards.
Enter Nehalem. While the Core series of processors was technically introduced in 2006, Nehalem, named after a river in Oregon, reached mainstream desktops in 2009. It was built on the same 45 nanometer high-k metal gate ("Hi-k") process that Intel had introduced with Penryn in 2007, yet it delivered better performance in the same power and thermal envelope as its predecessors. Here was a major leap in computing power that took place without a change in manufacturing process. In 2010, nVidia followed with the GTX 400 and 500 series cards, whose flagships shipped with 1.5GB of VRAM as standard. Since then, both Intel and nVidia have dominated the processor and GPU markets and have led the way in pure performance, performance per watt and thermal efficiency. By 2015, Intel had moved to a 14 nanometer manufacturing process, with 10 nanometers on its roadmap. Intel has crashed through the supposed wall that was meant to end Moore's Law, and today, with Skylake, it offers powerful processors that are affordable and efficient.
It seems, however, that CPUs are beginning to level off in terms of generational improvements. While the move to DDR4 memory has improved bandwidth and power efficiency, the benchmark differences between 2011's Sandy Bridge i7-2600K and 2015's Skylake i7-6700K aren't all that impressive. GPUs, on the other hand, have exploded in recent years. nVidia's 900-series cards from 2014-15 are widely considered among the best ever in price-to-performance: the GTX 970, released in 2014, is often 40% or more faster than its predecessor, 2013's GTX 770. As GPUs keep gaining power while CPUs level off, the demand to shift computing tasks to the GPU continues to grow. System-on-a-chip (SoC) designs in smartphones keep getting more powerful, and offloading work to the GPU has become an extremely important factor in mobile programming: moving suitable parallel tasks from the CPU to the GPU can save battery life and cut processing time. There is, however, a fundamental hurdle to merging the CPU and GPU: memory.
To fully understand the importance of volatile memory, you must first understand how a CPU and a GPU use it. A CPU, in layman's terms, consists of circuits performing mathematical operations on data supplied primarily by addressable memory, or RAM. RAM is shared throughout a PC's systems and "loaned out" to the processor when needed. RAM, however, isn't a CPU's only means of storing data. CPUs contain small, fast caches that hold copies of data from frequently used memory locations, and in typical workloads most memory accesses are served from these caches. Simply put: for most operations, a CPU has its own memory source. In a GPU, external memory serves a much greater purpose. GPUs need to manipulate as much memory as possible as quickly as possible. Consider that most 3D games refresh the screen 30 to 60 times per second. At 60 frames per second, a GPU must write 60 frames' worth of pixel data to memory every second, on top of reading the textures and geometry that go into each frame. This rendering happens continuously, with all of the immediately accessible visual data stored in the GPU's own memory. Just as in a CPU, data cycles between the GPU's memory banks and its cores, except that a GPU has hundreds or even thousands of simpler, slower cores working in parallel, while a CPU has a handful of fast, complex cores, each built to race through a largely sequential stream of instructions. All those cores greatly increase the demand on memory and create the bottlenecks that hold back many lower-end graphics cards. This memory limitation, rather than raw processing power, is often the biggest difference separating graphics cards. In addition to the demand for memory capacity, the width of the memory bus (the "road" connecting the memory to the controller that issues the instructions) limits data transfer rates: peak bandwidth is the width of the bus multiplied by how fast each of its lines can carry data.
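To put a rough number on the frame-rate arithmetic above, the short Python sketch below computes how many bytes per second a GPU writes just to store finished frames. It assumes 4 bytes per pixel (32-bit color) and counts only a single write of each completed frame; the function name framebuffer_bytes_per_second is illustrative, not taken from any library.

```python
# Back-of-the-envelope estimate: bytes per second a GPU writes just to store
# the finished image. Assumption: 4 bytes per pixel (8 bits each for red,
# green, blue and alpha), and each completed frame is written exactly once.
# Textures, geometry, depth buffers and overdraw are deliberately ignored.

BYTES_PER_PIXEL = 4  # assumed 32-bit color


def framebuffer_bytes_per_second(width: int, height: int, fps: int) -> int:
    """Bytes written per second to store `fps` finished frames of width x height."""
    return width * height * BYTES_PER_PIXEL * fps


if __name__ == "__main__":
    for width, height in [(1920, 1080), (2560, 1440), (3840, 2160)]:
        for fps in (30, 60):
            rate = framebuffer_bytes_per_second(width, height, fps)
            print(f"{width}x{height} @ {fps} fps: {rate / 1e9:.2f} GB/s")
```

At 1920x1080 and 60 fps this comes to about 0.5 GB/s, a small fraction of what a modern graphics card's memory can move. The real load is far larger, because every frame also involves reading textures and geometry, testing depth, drawing many pixels more than once and running post-processing passes, which is why graphics cards are built around memory bandwidth rather than around merely holding the final image.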
High bandwidth memory (HBM), developed by AMD in collaboration with SK Hynix, has already produced dramatic bandwidth gains over the aging GDDR5 standard. The GDDR5 on a graphics card typically delivers several times the bandwidth of a desktop's DDR4, thanks to both a faster data rate per pin and a much wider bus (256 to 512 bits on many graphics cards, versus 128 bits for dual-channel DDR4), though it is tuned for throughput rather than low latency. GDDR5 also takes up board space and draws considerable power, and we're approaching the point where the power consumed by GDDR5 eats into a graphics card's fixed power budget, leaving less for the GPU itself. HBM, instead of surrounding the GPU with separate memory chips, stacks DRAM dies vertically, links them with through-silicon vias (TSVs), and places the stacks right beside the GPU (or CPU) on a silicon interposer. This arrangement not only puts the memory much closer to the processor, but also widens the interface to 1,024 bits per stack (4,096 bits across the four stacks on AMD's Radeon R9 Fury X). Think of it as widening the road between memory and controller from 32 lanes per GDDR5 chip to 1,024 lanes per HBM stack: each lane carries slower traffic, but the road as a whole moves far more. Because data travels only a few millimeters across the interposer at a modest clock speed and a lower voltage, HBM also moves more data per watt; AMD claimed more than three times the bandwidth per watt of GDDR5. While these chips run at a lower frequency than GDDR5, their far higher bandwidth (128 GB/s per first-generation HBM stack versus up to 28 GB/s per GDDR5 chip, with the second-generation HBM2 standard pushing four stacks past 1,000 GB/s) more than compensates. Plus, since GPUs rely on parallel processing and hide memory delays by keeping thousands of threads in flight, the lower frequency matters far less than it would in the CPU-DDR relationship.
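The bandwidth figures above follow from simple arithmetic: peak bandwidth equals bus width times the data rate per pin, divided by eight bits per byte. The sketch below plugs in published spec-sheet figures (7 Gbps GDDR5, 1 Gbps first-generation HBM, DDR4-2133 system memory) and computes theoretical peaks only; sustained real-world bandwidth is lower. The helper name peak_bandwidth_gb_s is hypothetical.

```python
# Peak (theoretical) memory bandwidth from two spec-sheet numbers:
# bus width in bits and data rate per pin in gigabits per second.
# Sustained, real-world bandwidth is always somewhat lower than this peak.


def peak_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Gigabits moved per second across the whole bus, divided by 8 to get GB/s."""
    return bus_width_bits * gbps_per_pin / 8


# (bus width in bits, data rate per pin in Gb/s) for some 2015-era configurations
CONFIGS = {
    "Dual-channel DDR4-2133 (system RAM)": (128, 2.133),
    "One GDDR5 chip at 7 Gbps": (32, 7.0),
    "GeForce GTX 980 (256-bit GDDR5)": (256, 7.0),
    "One HBM1 stack at 1 Gbps": (1024, 1.0),
    "Radeon R9 Fury X (four HBM1 stacks)": (4096, 1.0),
}

if __name__ == "__main__":
    for name, (width, rate) in CONFIGS.items():
        print(f"{name}: {peak_bandwidth_gb_s(width, rate):.1f} GB/s")
```

The results (about 34, 28, 224, 128 and 512 GB/s) show HBM's trade-off in numbers: each HBM pin runs at a seventh of GDDR5's rate, yet a 1,024-bit stack still outpaces a 32-bit GDDR5 chip more than fourfold. Plugging in the 2 Gbps per pin allowed by the HBM2 standard gives 256 GB/s per stack, or 1,024 GB/s across four stacks.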
In addition to GPUs, HBM has exciting applications in mobile computing and desktop processors. In 2015, Intel released a largely unsung series of desktop processors based on the Broadwell architecture. The series' flagship i7, the 5775C, was quickly replaced by the new Skylake processors and was simultaneously overshadowed by the existing and popular Haswell and Haswell-E processors. It never stood a chance. Why do I bring up the 5775C? Because it's arguably the best SoC Intel has ever released, and it may not even have been meant to be. The graphics-rendering power of the 5775C and its integrated Iris Pro 6200 graphics blew anything Haswell had to offer out of the water, thanks in large part to the 128MB of embedded DRAM (eDRAM) included in the processor package, which acts as a large cache between the processor and system memory. This eDRAM has yet to resurface in a socketed desktop processor, probably because eDRAM is expensive and dedicated graphics cards dominate the enthusiast space. But the 5775C was an important landmark for Intel, even if no one was paying attention. Here was a powerful i7 with enough graphical muscle to keep up with low-end discrete graphics cards. Even AMD's CPU-GPU combos, such as the A10-7800 and the faster A10-7850K, generally couldn't keep up. This foray into eDRAM by Intel will likely serve as a precursor to the power of HBM in a CPU package. Just as L2 caches once sat off the processor die, eDRAM and HBM caches will eventually be standard in the processor package. Within a few years and a couple of generations, we could very well see i7s with enough graphical power to render 3D games at high resolutions. The mobile space will benefit in much the same way. While Samsung recently achieved a breakthrough by squeezing 6GB into a single mobile DRAM package, HBM is likely the next step for smartphones. With HBM, mobile devices could set new benchmarks for battery life, processing power and 3D performance. Gaming-centric devices like the nVidia Shield will maybe, just maybe, be able to move mobile gaming beyond the pay-to-win, repetitive funk it's been stuck in for years.
So, should you care about HBM? Right now, unless you're the techiest of techies, the answer is probably no. But HBM is changing the computing game. AMD's Radeon R9 Nano, easily the most impressive small form-factor graphics card ever made, is already showing off the power of HBM. nVidia's next generation of cards (codename: Pascal) is due out this year, and the enthusiast models are expected to showcase HBM as their primary selling point. Soon, HBM will be in your pocket. Or on your wrist. Or in your tablet.
2016 is primed to be a huge year for the PC industry. Enthusiast and gaming PCs are more cost-effective than ever. Developers have come back to the platform in droves over the last few years, and Steam has given consoles a run for their money. HBM will only reinforce the reality that the PC is back, and this time it isn't going anywhere.