NVIDIA makes 3-D graphics processors that are built into products made by PC original equipment manufacturers (OEMs) and add-in board makers. The company's RIVA128 graphics processor combines 3-D and 2-D graphics on a single chip and is designed to provide a lower cost alternative to multi-chip or multi-board graphics systems. Customers include STB Systems (63% of sales) and Diamond Multimedia Systems (31%). These two companies incorporate NVIDIA's processors into add-in boards that are then sold to OEMs such as Compaq, Dell, Gateway, and Micron Technology. NVIDIA is fighting patent-infringement suits filed by 3DFX Interactive, Silicon Graphics and S3 that seek to block the sale of its RIVA processors.
Web Site: nvidia.com
Expected IPO date: Week of Jan. 18, 1999
Update August 12, 2021
NVIDIA still makes its bread and butter with graphics chips (graphics processing units or GPUs) and it is dominant (NVIDIA's share of the graphics chip market grew from 75% in Q1 2020 to 81% in Q1 2021, far ahead of closest competitor AMD).
Nvidia's Graphics segment includes the GeForce GPUs for gaming and PCs, the GeForce NOW game-streaming service and related infrastructure, and solutions for gaming platforms. It also includes the Quadro/NVIDIA RTX GPUs for enterprise workstation graphics, vGPU software for cloud-based visual and virtual computing, and automotive platforms for infotainment systems. In 2020, the Graphics segment generated $9.8 billion, or about 59%, of Nvidia's total revenue. This was up 28.7% compared to the previous year. The segment's operating income grew 41.2% to $4.6 billion, comprising about 64% of the total.
The Compute and Networking segment includes Nvidia's Data Center platforms as well as systems for AI, high-performance computing, and accelerated computing. It also includes Mellanox networking and interconnect solutions, automotive AI Cockpit, autonomous driving development agreements, autonomous vehicle solutions, and Jetson for robotics and other embedded platforms. The Compute and Networking segment delivered revenue of $6.8 billion in 2020, up 108.6% from the previous year, and accounts for about 41% of Nvidia's total revenue. Its operating income grew 239.3% to $2.5 billion, about 36% of the company's total operating income.
AI is considered by management and observers to be the future, and NVIDIA even incorporates AI into its RTX graphics chips with DLSS (deep learning super sampling). DLSS is a video rendering technique that boosts frame rates by rendering frames at a lower resolution than displayed and using deep learning, a type of AI, to upscale the frames so that they look as sharp as expected at the native resolution.
Data centers are a fast-growing area. One of Nvidia’s newer concepts in AI hardware for data centers is the BlueField DPU (data processing unit), first revealed at the GTC in October 2020. In April 2021 the company unveiled BlueField-3, a DPU it said was designed specifically for “AI and accelerated computing.” Like Nvidia GPUs, its DPUs are accelerators, meaning they are meant to offload compute-heavy tasks from a system’s CPU, leaving the latter with more capacity to tackle other workloads. DPUs are powered by Arm chips. Nvidia DPUs, based on the BlueField SmartNICs by Mellanox (acquired by Nvidia in a deal announced in 2019), take on things like software-defined networking, storage management, and security workloads. They’re also eventually expected to offload server virtualization, via a partnership with VMware as part of VMware’s Project Monterey.
NVIDIA is actively involved in AI supercomputers. NVIDIA technologies power 342 systems on the TOP500 list released at the ISC High Performance event in June 2021, including 70 percent of all new systems and eight of the top 10. The latest ranking of the world’s most powerful systems shows high performance computing centers are increasingly adopting AI. It also demonstrates that users continue to embrace the combination of NVIDIA AI, accelerated computing and networking technologies to run their scientific and commercial workloads. NVIDIA is also at the forefront of developing autonomous vehicles, and of virtual reality and AI applications with its Omniverse.
Here is a one-and-a-half-hour video of the May 2021 GTC Keynote by NVIDIA CEO Jensen Huang, discussing the latest developments in graphics chips, data centers, supercomputers, autonomous vehicles and the Omniverse. Just this week it was revealed that the entire GTC Keynote, including Jensen Huang and his kitchen, was simulated in Omniverse. Scroll to 13:00 for the start.
Nvidia was formed in 1993 and immediately began work on its first product, the NV1. After two years of development, the NV1 was officially launched in 1995. An innovative chipset for its time, the NV1 was capable of handling both 2D and 3D video, and it included audio processing hardware. Following Sega's decision to use the NV1 inside of its Saturn game console, Nvidia also incorporated support for the Saturn controller, which enabled desktop graphics cards to also use the controller. A unique aspect of the NV1's graphics accelerator is that it used quadratic surfaces as the most basic geometric primitive. This made it difficult for game designers to add support for the NV1 or to design games for it. This became increasingly problematic when Microsoft released its first revision of the DirectX gaming API, which was designed with polygons as the most basic geometric primitive.
Nvidia started work on the NV2 as a successor to the NV1, but after a series of disagreements with Sega, Sega opted to use PowerVR technology inside of its Dreamcast console and the NV2 was cancelled. The Riva 128, also known as the NV3, launched in 1997 and was considerably more successful. It switched from using quadratic surfaces as the most basic geometric primitive to the far more common polygon. This made it easier to add support for the Riva 128 in games. The GPU also used polygon texture mapping with mixed results. This allowed the GPU to render frames more quickly, but it had reduced image quality.
In 1998, Nvidia introduced its most explosive card to date, the Riva TNT (code named "NV4"). Similar to the NV3, the NV4 was capable of rendering both 2D and 3D graphics. Nvidia improved over the NV3 by enabling support for 32-bit "True Color," expanding the RAM to 16MB of SDR SDRAM and increasing performance. Although the AGP slot was becoming increasingly popular, a large number of systems didn't contain one, so Nvidia sold the NV4 primarily as a PCI graphics accelerator and produced a relatively small number of AGP-compatible cards. Starting with the Riva TNT, Nvidia made a strong effort to regularly update its drivers in order to improve compatibility and performance. In 1999, Nvidia made another grab for the performance crown with the Riva TNT2 (codenamed "NV5"). The Riva TNT2 was architecturally similar to the original Riva TNT, but thanks to an improved rendering engine it was able to perform about 10 to 17 percent faster than its predecessor at the same clock speed. Nvidia also added support for AGP 4X slots, which provided more bandwidth to the card, and doubled the amount of VRAM to 32MB. Probably the most significant improvement was the transition to 250 nm, which allowed Nvidia to clock the Riva TNT2 up to 175 MHz.
In late 1999, Nvidia announced the GeForce 256 (code-named "NV10"). Prior to the GeForce 256, essentially all video cards were referred to as "graphics accelerators" or simply as "video cards," but Nvidia opted to call the GeForce 256 a "GPU." Nvidia packed in several new features with this card including hardware T&L (Transform and Lighting) processing, which allowed the GPU to perform calculations that were typically relegated to the CPU. Since the T&L Engine was fixed-function hardware designed specifically for this task, its throughput was roughly five times higher than a then high-end Pentium III processor clocked at 550 MHz. Nvidia followed the NV10 GeForce 256 up with the GeForce2. The architecture of the GeForce2 was similar to that of its predecessor, but Nvidia was able to double the TMUs attached to each pixel pipeline by further shrinking the die with 180 nm transistors. Nvidia used three different cores, codenamed NV11, NV15, and NV16, inside of GeForce2-branded cards. All of these cores used the same architecture, but NV11 contained just two pixel pipelines while the NV15 and NV16 cores had four, and NV16 operated at higher clock rates than NV15.
In 2001, the GeForce3 (codenamed "NV20") arrived as Nvidia's first DirectX 8-compatible card. The core contained 60 million transistors manufactured at 150 nm, which could be clocked up to 250 MHz. Nvidia introduced a new memory subsystem on the GeForce3 called "Lightspeed Memory Architecture" (LMA), which was designed to compress the Z-buffer and reduce the overall demand on the memory's limited bandwidth. It was also designed to accelerate FSAA using a special algorithm called "Quincunx." Overall performance was higher than the GeForce2, but due to the complexity of the GPU it was fairly expensive to produce, and thus carried a high price tag in comparison. NVIDIA would once again find itself back in the home console market as a key component of Microsoft's original Xbox in 2001. The Xbox used hardware nearly identical to what you would find inside of modern PCs at that time, and the GPU designed by Nvidia was essentially a tweaked GeForce3. Just like the NV20 GPU, the NV2A inside of the Xbox contained four pixel pipelines with two TMUs each. Nvidia also created the Xbox's audio hardware known as MCPX, or "SoundStorm".
Nvidia started to shake things up in 2002 by introducing several GPUs based on different architectures. All of these were branded as GeForce4. At the low end of the GeForce4 stack was the NV17, which was essentially an NV11 GeForce2 die that had been shrunk using 150 nm transistors and clocked between 250 and 300 MHz. It was a drastically simpler design compared to the NV20, which made it an affordable product that Nvidia could push to both mobile and desktop markets. With the NV17 covering the lower half of the market, Nvidia launched NV25 to cover the high end. The NV25 was developed as an improvement upon the GeForce3's architecture, and essentially had the same resources with four pixel pipelines, eight TMUs, and four ROPs. The NV25 did have twice as many vertex shaders (an increase from one to two), however, and it featured the updated LMA-II system. Overall, the NV25 contained 63 million transistors, just 3 million more than the GeForce3. The GeForce4 NV25 also had a clock speed advantage over the GeForce3, ranging between 225 and 300 MHz. The 128MB DDR memory was clocked between 500 and 650 MHz. In 2002, the gaming world welcomed the arrival of Microsoft's DirectX 9 API, which was one of the most heavily used and influential gaming APIs for several years. ATI and Nvidia both scrambled to develop DX9-compliant hardware, which meant the new GPUs had to support Pixel Shader 2.0. ATI beat Nvidia to the market in August 2002 with its first DX9-capable cards, but by the end of 2002 Nvidia launched its FX 5000 series.
Just one year after the launch of the FX 5000 series, Nvidia released the 6000 series. The GeForce 6800 Ultra was Nvidia's flagship powered by the NV40. With 222 million transistors, 16 pixel superscalar pipelines (with one pixel shader, TMU, and ROP on each), six vertex shaders, Pixel Shader 3.0 support, and 32-bit floating-point precision, the NV40 had vastly more resources at its disposal than the NV30. This is also not counting native support for up to 512MB of GDDR3 over a 256-bit bus, giving the GPU more memory and better memory performance than its predecessor. These GPUs were produced with the same 130 nm technology as the FX 5000 series. After Nvidia secured its position at the high end of the GPU market, it turned its attention to producing a new mid-range graphics chip known as the NV43. This GPU was used inside of the Nvidia GeForce 6600, and it had essentially half of the execution resources of the NV40. It also relied on a narrower 128-bit bus. The NV43 had one key advantage, however, as it was shrunk using 110 nm transistors. The reduced number of resources made the NV43 relatively inexpensive to produce, while the new fabrication technology helped reduce power consumption and boost clock speeds by roughly 20% compared to the GeForce 6800. The GeForce 6800 was succeeded by the GeForce 7800 GTX, which used a new GPU code-named G70. Based on the same 110 nm technology as NV43, the G70 contained a total of 24 pixel pipelines with 24 TMUs, eight vertex shaders, and 16 ROPs. The GPU could access up to 256MB of GDDR3 clocked at up to 600 MHz (1.2 GHz DDR) over a 256-bit bus. The core itself operated at 430 MHz.
Nvidia introduced its Tesla microarchitecture with the GeForce 8000 series - the company's first unified shader design. Tesla would become one of Nvidia's longest-running architectures, as it was used inside the GeForce 8000, GeForce 9000, GeForce 100, GeForce 200, and GeForce 300 series of GPUs. Nvidia continued to use the Tesla architecture in its GeForce 9000 series products, but with a few revisions. Nvidia's G92 core inside the 9000-series flagship was essentially just a die shrink of G80. By fabricating G92 at 65 nm, Nvidia was able to hit clock speeds ranging from 600 to 675 MHz all while reducing overall power consumption. Nvidia later released the GeForce 9800 GTX with a single G92 core clocked at 675 MHz and 512MB of GDDR3. This 9800 GTX was slightly faster than the 8800 Ultra thanks to its higher clock speed, but it also ran into issues due to its limited RAM capacity. Eventually, Nvidia created the GeForce 9800 GTX+ with a new 55 nm chip code-named G92B. This allowed Nvidia to push clock speed up to 738 MHz, but the most significant improvement that the 9800 GTX+ possessed was its 1GB of memory. Nvidia introduced the GT200 core based on an improved Tesla architecture in 2008. Changes made to the architecture included an improved scheduler and instruction set, a wider memory interface, and an altered core ratio. Whereas the G92 had eight Texture Processor Clusters (TPC) with 16 EUs and eight TMUs, the GT200 used ten TPCs with 24 EUs and eight TMUs each. Nvidia also doubled the number of ROPs from 16 in the G92 to 32 in the GT200. The memory bus was extended from a 256-bit interface to a 512-bit wide connection to the GDDR3 memory pool.
Tesla and the GeForce 8000, 9000, 100, 200, and 300 series were followed by Nvidia's Fermi architecture and the GeForce 400 series in 2010. The largest Fermi chip ever produced was the GF100, which contained four GPCs. Each GPC had four Streaming Multiprocessors, with 32 CUDA cores, four TMUs, three ROPs, and a PolyMorph Engine. A perfect GF100 core shipped with a total of 512 CUDA cores, 64 TMUs, 48 ROPs, and 16 PolyMorph Engines. The GeForce GTX 580 was succeeded by the GTX 680, which used a GK104 based on the Kepler architecture. This marked a transition to 28 nm manufacturing, which is partially responsible for the GK104 being far more efficient than GF110. Compared to the GF110, GK104 also has twice as many TMUs and three times as many CUDA cores. The increase in resources didn't triple performance, but it did increase performance by between 10 and 30% depending on the game. Overall efficiency increased even more. Nvidia introduced its Maxwell architecture in 2014 with a focus on efficiency. The initial flagship, the GM204, launched inside the GeForce GTX 980. A key difference between Maxwell and Kepler is the memory sub-system. GM204 has a narrower 256-bit bus, but Nvidia achieved greater utilization of the available bandwidth by implementing a powerful memory compression algorithm. The GM204 also utilizes a large 2MB L2 cache that further reduces the impact of the narrower memory interface.
The Pascal architecture succeeded Maxwell, and marked Nvidia's transition to a new 16 nm FinFET process. This helped to increase the architectural efficiency and drive up clock speed. The 314 mm² GP104 used inside the GeForce GTX 1080 contains a whopping 7.2 billion transistors. With 2,560 CUDA cores, 160 TMUs, 64 ROPs, and 20 PolyMorph engines, the GeForce GTX 1080 was far more powerful than the GeForce GTX 980 Ti. Nvidia pushed the performance of the 1000 series further with the release of its GP102 GPU. This part features 3,840 CUDA cores with a 352-bit memory interface, and it is also produced on a 16 nm process. It first appeared inside of the Titan X with a partially disabled die that left 3,584 cores clocked at 1,531 MHz. It was equipped with 12GB of GDDR5X memory clocked at 10 Gbps and had a max TDP of 250 W.
Once again, Nvidia veered in a different direction with its Turing architecture. The addition of dedicated hardware for ray tracing (RTX) and A.I. (Tensor Cores) brings real-time ray tracing to the gaming world for the first time. It’s a quantum leap in terms of realistic lighting and reflection effects in games, and a rendering technique Nvidia’s CEO Jensen Huang calls the “Holy Grail” of the graphics industry. Nvidia first announced Turing as the foundation of new professional Quadro cards, but followed that up the next week with a trio of gaming-focused GeForce RTX cards, the 2070, 2080, and 2080 Ti. While the initial focus was all about ray tracing and AI-assisted super-sampling, Nvidia also promised the RTX 2080 would deliver performance improvements of between 35 and 125 percent compared to the previous-generation GTX 1080. But there was also a fair bit of initial backlash over significant generation-over-generation price increases, which pushed the RTX 2080 Ti Founders Edition card to an MSRP of $1,199, compared to the $699 launch price of the GTX 1080 Ti.
2020 would turn out to be a year of disparate fortunes. Against the background of a global pandemic, AMD, Intel, and Nvidia all released new graphics cards containing new architectures and product designs. Nvidia introduced an improved version of DLSS in March, which used a very different process from the first iteration. Now, the tensor cores in users' graphics cards would process the inference algorithm to upscale the image, and overall, the new system was well received. Desktop PC enthusiasts would have to wait until later in the year for a new batch of GPUs, but their patience was rewarded by the GeForce RTX 3000 and Radeon RX 6000 series of cards. Nvidia's models brought Ampere to the masses, although there were significant differences between the GA100 chip in the A100 and the GA102 that drove the RTX line-up. The latter was essentially an update of Turing, featuring improvements to the CUDA, Tensor, and RT cores.
NVIDIA GeForce RTX 3080
Taking a sample of AMD and Nvidia's largest GPUs over the years shows a vaguely linear trend in the growth of die sizes, but it also highlights how altering the process node can make an enormous difference (for example, compare Vega 10 and 20 sizes). However, there's far too much variation in the data for it to be used to reliably estimate what size of processor one could be seeing over the next ten years. Perhaps a better approach would be to look at the processing power the above GPUs offered, for the given unit density (i.e. millions of transistors per square millimetre). While peak FP32 throughput, measured in billions of floating point operations per second, isn't the only metric that should be used to judge the capability of a GPU, it is a comparable one. This is because general shader operations form the bulk of the processing load and will continue to do so for a while. When we look at a graph of those figures (below), it paints a rather different picture. There are outliers that affect the trends somewhat, but even with them removed, the overall pattern is broadly the same.
It shows us that Nvidia has consistently focused on increasing raw processing power with each new design -- something that makes sense given how the same chips are used in general and professional models. The same was true of AMD until they released RDNA, where the product is solely aimed at gaming.
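The two metrics discussed above, unit density and peak FP32 throughput, are simple arithmetic. The following is a minimal sketch using only the Pascal figures quoted earlier in this post (GP104 die size and transistor count, Titan X core count and clock). The factor of two FLOPs per core per clock (one fused multiply-add) is an assumption of this sketch, and the function names are made up for illustration.

```python
# Figures quoted earlier in this post for the Pascal generation.
GP104_TRANSISTORS_MILLIONS = 7200   # 7.2 billion transistors
GP104_DIE_AREA_MM2 = 314            # die area in square millimetres

TITAN_X_CUDA_CORES = 3584           # enabled cores on the GP102-based Titan X
TITAN_X_CLOCK_GHZ = 1.531           # 1,531 MHz

# Assumption: each core retires one fused multiply-add per clock,
# which is conventionally counted as 2 floating point operations.
FLOPS_PER_CORE_PER_CLOCK = 2


def unit_density(transistors_millions, die_area_mm2):
    """Millions of transistors per square millimetre."""
    return transistors_millions / die_area_mm2


def peak_fp32_gflops(cores, clock_ghz, flops_per_clock=FLOPS_PER_CORE_PER_CLOCK):
    """Theoretical peak FP32 throughput in billions of FLOPs per second."""
    return cores * clock_ghz * flops_per_clock


density = unit_density(GP104_TRANSISTORS_MILLIONS, GP104_DIE_AREA_MM2)
gflops = peak_fp32_gflops(TITAN_X_CUDA_CORES, TITAN_X_CLOCK_GHZ)

print(f"GP104 unit density: {density:.1f} million transistors per mm^2")
print(f"Titan X peak FP32:  {gflops:,.0f} GFLOPS")
```

These are theoretical peaks; real workloads rarely sustain them, which is one reason FP32 throughput is a comparable metric rather than a complete one.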
AI And GPUs
Shall we play a game? How video games transformed AI Message 33443736
After 40 years in the wilderness, two huge breakthroughs are fueling an AI renaissance. The internet handed us a near unlimited amount of data. A recent IBM paper found 90% of the world’s data has been created in just the last two years. From the 290+ billion photos shared on Facebook, to millions of e-books, billions of online articles and images, we now have endless fodder for neural networks. The breathtaking jump in computing power is the other half of the equation. RiskHedge readers know computer chips are the “brains” of electronics like your phone and laptop. Chips contain billions of “brain cells” called transistors. The more transistors on a chip, the faster it is. And in the past decade, a special type of computer chip emerged as the perfect fit for neural networks. Do you remember the blocky graphics on video games like Mario and Sonic from the ‘90s? If you have kids who are gamers, you’ll know graphics have gotten far more realistic since then.
This incredible jump is due to chips called graphics processing units (GPUs). GPUs can perform thousands of calculations all at once, which helps create these movie-like graphics. That’s different from traditional chips, which work through calculations one by one. Around 2006, Stanford researchers discovered that GPUs’ “parallel processing” abilities were perfect for AI training. For example, do you remember Google’s Brain project? The machine taught itself to recognize cats and people by watching YouTube videos. It was powered by one of Google’s giant data centers, running on 2,000 traditional computer chips. In fact, the project cost a hefty $5 billion. Stanford researchers then built the same machine with GPUs instead. A dozen GPUs delivered the same data crunching performance as 2,000 traditional chips. And it slashed costs from $5 billion to $33,000! The huge leap in computing power and explosion of data means we finally have the “lifeblood” of AI.
The one company with a booming AI business is NVIDIA (NVDA). NVIDIA invented graphics processing units back in the 1990s. It’s solely responsible for the realistic video game graphics we have today. And then we discovered these gaming chips were perfect for training neural networks. NVIDIA stumbled into AI by accident, but early on, it realized it was a huge opportunity. Soon after, NVIDIA started building chips specifically optimized for machine learning. And in the first half of 2020, AI-related sales topped $2.8 billion.
In fact, more than 90% of neural network training runs on NVIDIA GPUs today. Its AI-chips are light years ahead of the competition. Its newest system, the A100, is described as an “AI supercomputer in a box.” With more than 54 billion transistors, it’s the most powerful chip system ever created. In fact, just one A100 packs the same computing power as 300 data center servers. And it does it for one-tenth the cost, takes up one-sixtieth the space, and runs on one-twentieth the power consumption of a typical server room. A single A100 reduces a whole room of servers to one rack. NVIDIA has a virtual monopoly on neural network training. And every breakthrough worth mentioning has been powered by its GPUs. Computer vision is one of the world’s most important disruptions. And graphics chips are perfect for helping computers to “see.” NVIDIA crafted its DRIVE chips specially for self-driving cars. These chips power several robocar startups including Zoox, which Amazon just snapped up for $1.2 billion. With NVIDIA’s backing, vision disruptor Trigo is transforming grocery stores into giant supercomputers.
AI And Data Centers
AI And Autonomous Vehicles
AI And Robotics
AI And The Omniverse
In 2020 NVIDIA announced a $40 billion bid for AI chip designer ARM. This year there have been various regulatory inquiries, particularly by China and Great Britain, due to monopoly concerns. Win or lose, NVIDIA will thrive.
The future of AI chips is the application-specific integrated circuit (ASIC), e.g., Google's Tensor Processing Unit (TPU).
The following video discusses the advantages of GPU over X86 CPU, and the advantages of ASIC over GPU.
The Future - Competition
Besides the obvious competition from Intel, AMD and Google's TPU, there are start-ups in both China and the West which want to dethrone NVIDIA as Emperor of AI Chips.
For a comprehensive discussion of AI and AI companies in general, see the Artificial Intelligence, Robotics and Automation board moderated by my friend Glenn Petersen. Subject 59856
Modern AI is based on Deep Learning algorithms. Deep learning is a subset of machine learning built on neural networks with three or more layers. These neural networks attempt to simulate the behavior of the human brain (albeit far from matching its ability), allowing them to “learn” from large amounts of data.
Deep Learning Algorithms for AI
The first, one-hour video explains how this works. Amazingly, it is just least squares minimization of the neural network loss function using gradient descent, a first-order cousin of multi-dimensional Newton-Raphson. See the second, half-hour video. Who thought calculus would come in handy?
1. MIT Introduction to Deep Learning
2. Gradient Descent, Step-by-Step
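To make the idea concrete, here is a minimal sketch of least squares minimization by gradient descent. It is not taken from either video. It fits the simplest possible "network", a single line y = w*x + b, and the toy data, learning rate, and step count are assumptions chosen for illustration.

```python
# Toy data generated from y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def loss(w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, b):
    """Partial derivatives of the loss with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def gradient_descent(learning_rate=0.05, steps=2000):
    """Start at (0, 0) and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradient(w, b)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


w, b = gradient_descent()
print(f"w = {w:.4f}, b = {b:.4f}, loss = {loss(w, b):.2e}")
```

A deep network replaces the line with many layers and the two parameters with millions, but the update rule (step each parameter against its partial derivative) is the same.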
Math Issues: Optimizing With Multiple Peaks Or Valleys
A problem with gradient descent optimization is that it settles at whichever stationary point it happens to reach from its starting point. Plain gradient descent only moves downhill, so it finds minima, while Newton-Raphson applied to the gradient can land on maxima as well. Worse, there can be multiple peaks and valleys, so more properly these methods find local extrema. In machine learning, one is typically interested in global minima. This makes the problem considerably more difficult, particularly since loss functions for deep learning neural networks can have millions or even billions of parameters.
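The multiple-valleys problem shows up even in one dimension. The sketch below runs gradient descent on a made-up function with two valleys, f(x) = x^4 - 3x^2 + x, from two different starting points. The function, learning rate, and starting points are all assumptions chosen for illustration.

```python
def f(x):
    """A toy loss with two valleys: a deeper one on the left, a shallower one on the right."""
    return x ** 4 - 3 * x ** 2 + x


def df(x):
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=1000):
    """Plain gradient descent in one dimension."""
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x


x_left = descend(-2.0)    # starts on the left slope
x_right = descend(2.0)    # starts on the right slope

print(f"start -2.0 -> x = {x_left:.3f}, f(x) = {f(x_left):.3f}")
print(f"start +2.0 -> x = {x_right:.3f}, f(x) = {f(x_right):.3f}")
```

Both runs converge, but to different valleys, and only the run started on the left reaches the deeper one. Which minimum is found depends entirely on where the search starts.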
Another problem has to do with the size of the data sets used to train deep learning neural networks, which can be huge. Since gradient descent is an iterative process, it becomes prohibitively time-consuming to evaluate the loss function at each and every data point, even with high-performance AI chips. This leads to stochastic gradient descent: the loss function is evaluated at a relatively small random sample of the data at each iterative step.
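As a rough illustration of the sampling idea, the sketch below fits a line to 1,000 synthetic points but computes each gradient step from a random batch of only 32. The data set, batch size, learning rate, and fixed random seed are assumptions of this sketch, not values from the videos.

```python
import random

# Synthetic, noise-free data from y = 2x + 1 on [0, 1) (an assumption).
N = 1000
xs = [i / N for i in range(N)]
ys = [2.0 * x + 1.0 for x in xs]


def batch_gradient(w, b, indices):
    """Gradient of the mean squared error over the sampled points only."""
    n = len(indices)
    dw = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in indices) / n
    db = sum(2 * (w * xs[i] + b - ys[i]) for i in indices) / n
    return dw, db


def stochastic_gradient_descent(batch_size=32, learning_rate=0.1, steps=5000, seed=0):
    """Each step looks at a small random sample instead of all N points."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    w, b = 0.0, 0.0
    for _ in range(steps):
        indices = rng.sample(range(N), batch_size)
        dw, db = batch_gradient(w, b, indices)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


w_sgd, b_sgd = stochastic_gradient_descent()
print(f"w = {w_sgd:.4f}, b = {b_sgd:.4f}")
```

Each step here touches 32 points rather than 1,000, so the per-step cost drops by about a factor of 30. With real, noisy data the estimates would jitter around the minimum rather than settle exactly, which is the price paid for the cheaper steps.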
Stochastic Gradient Descent:
Exponential Growth Of NVIDIA