|From: Frank Sully||9/20/2021 8:16:33 PM|
|WHERE CHINA’S LONG ROAD TO DATACENTER COMPUTE INDEPENDENCE LEADS|
September 20, 2021 Timothy Prickett Morgan
The Sunway TaihuLight machine has a peak performance of 125.4 petaflops across 10,649,600 cores. It sports 1.31 petabytes of main memory. To put the peak performance figure in some context, recall that the top supercomputer (by far) until this announcement had been Tianhe-2, with 33.86 petaflop capability. One key difference, other than the clear peak potential, is that TaihuLight came out of the gate with demonstrated high performance on real-world applications, some of which are able to utilize over 8 million of the machine’s 10 million-plus cores.
While we are big fans of laissez faire capitalism like that of the United States and sometimes Europe – right up to the point where monopolies naturally form and therefore competition essentially stops, and thus monopolists need to be regulated in some fashion to promote the common good as well as their own profits – we also see the benefits that accrue from a command economy like that which China has built over the past four decades.
A recently rumored announcement of a GPU designed by Chinese chip maker Jingjia Micro and presumably etched by Semiconductor Manufacturing International Corp (SMIC), the indigenous foundry in China that is playing catch up to Taiwan Semiconductor Manufacturing Co, Intel, GlobalFoundries, and Samsung Semiconductor, got us to thinking about this and what it might mean when – and if – China ever reaches datacenter compute independence.
Taking Steps Five Years At A Time
While China has been successful in many areas, particularly in becoming the manufacturing center of the world, it has not been particularly successful in achieving independence in datacenter compute. Some of that has to do with the immaturity of its chip foundry business, and some of it has to do with its limited experience in making big, wonking, complex CPU and GPU designs that can take on the big loads in the datacenter. China has a bit of a chicken and egg problem here, and as usual, the smartphone and tablet markets are giving the Middle Kingdom’s chip designers and foundries the experience they need to take it up another notch to take on the datacenter.
The motivations are certainly there for China to achieve chip independence: the current supply chain issues in semiconductors, as well as the messy geopolitical situation between China and the United States, which draws in Taiwan, South Korea, Japan, and Europe as well. Like every other country on Earth, China has an imbalance between semiconductor production and semiconductor consumption, and that is partly a function of the immense amount of electronics and computer manufacturing that has been moved to China over the past two decades.
According to Dauxe Consulting, which provides research into the Chinese market, back in 2003 China consumed about 18.5 percent of semiconductors (that’s revenue, not shipments), which was a little bit less than the Americas (19.4 percent), Europe (19.4 percent), or Japan (23.4 percent). SMIC was only founded in 2000 and had negligible semiconductor shipment revenue at the time. Fast forward to 2019, which is the last year for which data is publicly available, and China’s chip manufacturing accounts for about 30 percent of chip revenues in the aggregate, but the chips that Chinese companies buy to build stuff account for over 60 percent of semiconductor consumption (which is revenues going to SMIC as well as all of the other foundries, big and small, around the world). This is a huge imbalance, and it is not surprising that the Chinese government wants to achieve chip independence.
While there may be strong political and economic reasons for Chinese chip independence, that independence might mean China’s reach outside of its own markets diminishes in proportion to how much it can take care of its own business. China can compel its own state, regional, and national governments as well as state-controlled businesses to Buy China, but it can’t do that outside of its political borders. It can make companies and governments in Africa and South America attractive offers they probably won’t refuse. It will be a harder sell indeed in the United States and Europe and their cultural and economic satellites.
More about that in a moment.
Let’s start our Chinese datacenter compute overview with that GPU chip from Jingjia Micro that we heard about last week, because it illustrates the problem China has. We backed through all of the stories and found that a site called MyDrivers is the originator of the story, as far as we can see, and has this table nicked from Jingjia Micro to show how the JM9 series of GPUs stacks up against the Nvidia GeForce GTX 1050 and GTX 1080 GPUs that debuted in late 2015 and that started shipping in 2016 in volume:
There are two of these JM9 series GPUs from Jingjia, and they are equal to or better than the Nvidia equivalents. The top-end JM9271 is the interesting one as far as we are concerned because it has a PCI-Express 4.0 interface and, thanks to HBM2 stacked memory weighing in at 16 GB, it has twice the capacity of the GTX 1080. At 512 GB/sec, it has 60 percent more memory bandwidth, while burning 11.1 percent more power and delivering 9.8 percent lower performance at 8 teraflops at FP32 single precision.
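The percentages above can be turned around to see what they imply about the GTX 1080. This is only a rough sketch: the helper name `implied_baseline` is made up for illustration, and the only inputs are the JM9271 figures and percentage gaps quoted in the paragraph above (the power figure is left out because no wattage is given in the text).

```python
# Back out the GTX 1080 figures implied by the percentages quoted in the text.
# Inputs are the JM9271 numbers from the article; nothing here is measured data.

def implied_baseline(value, pct_diff):
    """Return the baseline for which `value` is pct_diff percent higher (+) or lower (-)."""
    return value / (1.0 + pct_diff / 100.0)

JM9271_BANDWIDTH_GBS = 512.0   # GB/sec, from the text
JM9271_FP32_TFLOPS = 8.0       # teraflops at FP32, from the text

# "60 percent more memory bandwidth" than the GTX 1080
gtx1080_bandwidth_gbs = implied_baseline(JM9271_BANDWIDTH_GBS, +60.0)

# "9.8 percent lower performance" than the GTX 1080
gtx1080_fp32_tflops = implied_baseline(JM9271_FP32_TFLOPS, -9.8)

print(f"Implied GTX 1080 bandwidth: {gtx1080_bandwidth_gbs:.0f} GB/sec")
print(f"Implied GTX 1080 FP32:      {gtx1080_fp32_tflops:.2f} teraflops")
```

Dividing by (1 + percent/100) rather than subtracting the percentage matters here, because each gap is stated relative to the Nvidia card, not relative to the Jingjia one.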
This Jingjia card is puny compared to the top-of-the-line “Ampere” GA100 GPU engine from Nvidia, which runs at 1.41 GHz, has 40 GB or 80 GB of HBM2E stacked memory, and 19.49 teraflops at single precision. The cheaper Ampere GA102 processor used in the GeForce RTX 3090 gamer GPU (as well as the slower RTX 3080) runs at 1.71 GHz, has 24 GB of GDDR6X memory, and delivers an incredible 35.69 teraflops at FP32 precision – and has ray tracing accelerators that can also be used to boost machine learning inference. The Ampere A100 and RTX 3090 devices burn 400 watts and 350 watts, respectively, because the laws of physics must be obeyed. If you want to run faster these days, you also have to run hotter because Moore’s Law transistor shrinks are harder to come by.
Architecturally speaking, the JM9 series is about five years behind Nvidia, with the exception of the HBM memory and the PCI-Express 4.0 interface. The chip is implemented in SMIC’s 28 nanometer processes, which is not even close to the 14 nanometer process that SMIC has working or its follow-on, which is akin to TSMC’s 10 nanometer node and Samsung’s 8 nanometer node (the latter process being used to make the Ampere RTX GPUs). Jingjia is hanging back, getting its architecture out there and tested before it jumps to a process shrink. TSMC has had 28 nanometer in the field for a decade now.
This is not even close to China’s best effort. Tianshu Zhixin is working on a 7 nanometer GPU accelerator called “Big Island” that looks to be etched by TSMC and to include its CoWoS packaging (the same one used by Nvidia for its GPU accelerator cards). The Big Island GPU is aimed squarely at HPC and AI acceleration in the datacenter, not gaming, and it will absolutely be competitive if the reports (on very thin data and a lot of big talk, it looks like) pan out. Another company called Biren Technology is working on its own GPU accelerator for the datacenter, and thin reports out of China say the Biren chip, etched using TSMC 7 nanometer processes, will compete with Nvidia’s next-gen “Hopper” GPUs. We shall see when Biren ships its GPU next year.
We are skeptical of such claims, and reasonably so. Just look at the plan for the “Godson” family of MIPS-derived and X86-emulating processors that were created by the Institute of Computing Technology at the Chinese Academy of Sciences. (You know CAS, they are the largest shareholder in Chinese IT gear maker Lenovo.) We reported with great interest on the Godson processors (also known by the synonymous name Loongson) and the roadmap to span them from handhelds to supercomputers way back in February 2011. These processors made their way into the Dawning 6000 supercomputers made by Sugon, but as far as we know they did not really get any of the traction that Sugon had hoped in the datacenter.
It remains to be seen if the Loongson 3A5000 clone of the AMD Epyc processor, which is derived from the four-core Ryzen chiplet used in the “Naples” Epyc processor from 2017 and which is said to have its own “in-house” GS464V microarchitecture (oh, give me a break. . . .), will do better in the broader Chinese datacenter market. With the licensing limited to the original Zen 1 cores and the four-core chiplets, the AMD-China joint venture, called Tianjin Haiguang Advanced Technology Investment Co, has the Chinese Academy of Sciences as a big (but not majority) shareholder, and it is expected that a variant of this processor will be at the heart of at least one of China’s exascale HPC systems.
By the way, the old VIA Technologies (the third company with an X86 license) has partnered with the Shanghai Municipal Government to create the Zhaoxin Semiconductor partnership, which makes client devices based on the X86 architecture. Zhaoxin could be tapped to make a big, bad X86 processor at some point. Why not?
Thanks to being blacklisted by the US government, Huawei Technologies, one of the dominant IT equipment suppliers on Earth, has every motivation to help create an indigenous and healthy market for CPUs, GPUs, and other kinds of ASICs in China, and has a good footing with the design efforts of its arm’s length (pun intended) fabless semiconductor division, HiSilicon. The HiSilicon Kunpeng CPUs and Kirin GPUs hew pretty close to the Arm Holdings roadmaps, which is fine, and there is no reason to believe that, if properly motivated – meaning enough money is thrown at it and China takes an attitude that it is going to be very aggressive with Huawei sales outside of the United States and Europe – it could not do more custom CPUs and even GPUs. It could acquire Jingjia, Tianshu Zhixin, or Biren, for that matter.
For a while there, it looked like Suzhou PowerCore, a revamped PowerPC re-implementer that joined IBM’s OpenPower Consortium and that delivered a variant of the Power8 processor for the Chinese market, might try to extend into the Power9 and Power10 eras with its own Power chip designs. But that does not seem to have happened, or if it did, it is being done secretly.
Then there is the future Sunway exascale supercomputer at the National Supercomputing Center in Wuxi, which is one of the three exascale systems being funded by the Chinese government. It has a custom processor, a kicker to the SW26010 processor used in the original Sunway TaihuLight supercomputer, which also dates from 2016. The SW26010 had 260 cores, 256 of them skinny cores for doing math and four of them fat cores for managing data that feeds the cores, and we think that the Sunway exascale machine won’t have a big architectural change, but will have some tweaks, add more compute element blocks to the die, and ride down the die shrink to reach exascale. The SW26010 and its kicker, which we have jokingly called the SW52020 because it has double of everything, mix architectural elements of CPUs and math accelerators, much as Fujitsu’s A64FX Arm chips do. The A64FX is used in the “Fugaku” pre-exascale supercomputer at the RIKEN lab in Japan. Hewlett Packard Enterprise is reselling the A64FX in Apollo supercomputer clusters, but as far as we know, no one is reselling SW26010 in any commercial machines.
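The core arithmetic in that paragraph is easy to check. The sketch below is a back-of-the-envelope illustration only: the 10,649,600 total is the widely reported TaihuLight core count (the text describes it as "10 million-plus"), and the "SW52020" line simply doubles everything, following the joke name rather than any published spec.

```python
# Core-count arithmetic for the SW26010 as described in the text.
SKINNY_CORES = 256   # math cores per SW26010
FAT_CORES = 4        # management cores per SW26010

cores_per_chip = SKINNY_CORES + FAT_CORES

# Assumption: TaihuLight's widely reported total core count.
TAIHULIGHT_TOTAL_CORES = 10_649_600
chips_in_taihulight = TAIHULIGHT_TOTAL_CORES // cores_per_chip

# Hypothetical "SW52020": double of everything, per the nickname only.
doubled_cores_per_chip = 2 * cores_per_chip

print(f"Cores per SW26010:           {cores_per_chip}")
print(f"Implied chips in TaihuLight: {chips_in_taihulight:,}")
print(f"Cores per doubled chip:      {doubled_cores_per_chip}")
```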
Arm server chip maker Phytium made a lot of noise back in 2016 with its four-core “Earth” and 64-core “Mars” Arm server chips, but almost immediately went mostly dark thanks to the trade war between the US and China that really got going in 2018.
The most successful indigenous accelerator to be developed and manufactured in China is the Matrix2000 DSP accelerator used at the National Super Computer Center in Guangzhou. That Matrix2000 chip, which uses DSPs to do single-precision and double-precision math acceleration in an offload model from CPU hosts, just like GPUs and FPGAs, was created because Intel’s “Knights” many-core X86 accelerators were blocked for sale to China back in 2013 for supercomputers. The Matrix2000 DSP engines, along with the proprietary TH-Express 2+ interconnect, were deployed in the Tianhe-2A supercomputer with 4.8 teraflops of oomph each at FP32 single precision. That was back in 2015, mind you, when the GTX 1080 was being unveiled by Nvidia, for comparison.
As far as we know, these Matrix2000 DSP engines were not commercialized beyond this system and the upcoming Tianhe-3 exascale system, which will use a 64-core Phytium 2000+ CPU and a Matrix2000+ DSP accelerator. One-off or two-off compute engines are interesting, of course, but they don’t change the world except inasmuch as they show what can be done with a particular technology. But the real point is to bring such compute engines to the masses, thereby lowering their unit costs as volumes increase.
And China surely has masses. But a lot of Chinese organizations, both in government and in industry, have free will when it comes to architectures. But that could change. China could whittle down the choices for datacenter compute to a few architectures, all of them homegrown and all of them isolated from the rest of the world. It has enough money – and enough market of its own – to do that.
|From: Frank Sully||9/21/2021 11:32:00 AM|
|3 Top Artificial Intelligence Stocks to Buy in September|
Nvidia, Palantir, and Salesforce are all solid AI stocks.
Sep 21, 2021 at 8:45AM
- Nvidia’s data center GPU sales will surge as the world’s software platforms process more machine learning and AI tasks.
- Palantir’s margin expansion rate suggests its platforms have impressive pricing power.
- Salesforce expects its annual revenue to more than double in five years as companies automate.
Many investors might think of sentient robots when tech pundits discuss the booming artificial intelligence (AI) market. However, intelligent robots only represent a tiny sliver of a worldwide AI market that is projected to grow at a compound annual growth rate (CAGR) of 35.6% from 2021 to 2026, according to Facts and Factors.
A large portion of that market actually revolves around algorithms and software platforms that help companies make data-driven decisions, automate repetitive tasks, streamline their operations, and cut costs.
Let's examine three top AI stocks that will benefit from the market's expansion.
Nvidia (NASDAQ:NVDA) is the world's top producer of discrete GPUs. It controlled 83% of the market in the second quarter of 2021, according to JPR, while its rival Advanced Micro Devices controlled the remaining 17%.
Nvidia's discrete GPUs are usually associated with high-end PC gaming, but it also sells high-end GPUs for data centers that process AI and machine learning tasks more efficiently than stand-alone CPUs.
Nvidia's main data center products include its A100 Tensor Core GPU and the DGX A100 AI system, which bundles together eight A100 GPUs. All three of the public cloud leaders -- Amazon, Microsoft, and Alphabet's Google -- currently use Nvidia's A100 GPUs to power some of their AI services.
Nvidia also acquired the data center networking equipment maker Mellanox last April to further strengthen that core business. Nvidia's data center revenue surged 124% to $6.7 billion, or 40% of its top line, in fiscal 2021 (which ended in January). Its total revenue rose 53% to $16.7 billion.
Analysts expect Nvidia's revenue and earnings to rise 54% and 65%, respectively, this year, as it sells more gaming and data center GPUs. It faces some near-term headwinds with the ongoing chip shortage and its delayed takeover of Arm Holdings, but its stock still looks reasonably valued at 47 times forward earnings.
Palantir (NYSE:PLTR) is a data mining and analytics company that operates two main platforms: Gotham for government agencies and Foundry for large enterprise customers.
Palantir's platforms collect data from disparate sources, process it with AI algorithms, and help organizations make informed decisions. The U.S. military uses Gotham to plan missions, while the CIA -- one of Palantir's earliest investors -- uses it to gather intel. Palantir leverages that hardened reputation to attract big enterprise customers like BP and Rio Tinto to its Foundry platform.
Palantir's revenue rose 47% to $1.1 billion in 2020, and it expects its revenue to grow at least 30% annually from 2021 to 2025. That ambitious forecast suggests it will generate more than $4 billion in revenue in 2025. Palantir isn't profitable yet, but its adjusted gross and operating margins are expanding and suggest its platforms still have impressive pricing power.
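The "more than $4 billion" figure follows from simple compounding. Here is a minimal sketch of that arithmetic, assuming exactly 30% growth in each of the five years from 2021 through 2025 (the forecast's floor); the function name `project_revenue` is just an illustrative helper.

```python
# Compound the 2020 revenue forward at the minimum forecast growth rate.

def project_revenue(base, annual_growth, years):
    """Revenue after `years` of compounding at `annual_growth` (0.30 means 30%)."""
    return base * (1.0 + annual_growth) ** years

REVENUE_2020_BILLIONS = 1.1   # from the text
MIN_ANNUAL_GROWTH = 0.30      # "at least 30% annually"
YEARS = 5                     # 2021 through 2025

revenue_2025 = project_revenue(REVENUE_2020_BILLIONS, MIN_ANNUAL_GROWTH, YEARS)
print(f"Projected 2025 revenue: ${revenue_2025:.2f} billion")
```

Since 30% is stated as a minimum, this is a lower bound on the forecast rather than a prediction.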
Palantir's stock isn't cheap at 37 times this year's sales, but its ambitious growth targets and its ultimate goal of becoming the "default operating system for data across the U.S. government" make it a top AI stock to buy.
Salesforce (NYSE:CRM) is the world's largest cloud-based customer relationship management (CRM) service provider. It also provides cloud-based e-commerce, marketing, and analytics services.
Salesforce's services help companies manage their sales teams and customer relationships more efficiently, automate tasks, and reduce their overall dependence on on-site human employees. It unites all those platforms with its data visualization platform Tableau, its newly acquired enterprise communication platform Slack, and its AI-powered Einstein assistant.
Salesforce's revenue rose 24% to $21.25 billion in fiscal 2021 (which ended this January), and it expects to more than double its annual revenue to over $50 billion by fiscal 2026. It expects that growth to be buoyed by the secular expansion of all of its five main end markets -- which include sales, service, marketing & commerce, platform, and analytics & integration.
That's an impressive forecast for a stock that trades at 59 times forward earnings and less than 10 times this year's sales. A few concerns about Salesforce's $27.7 billion takeover of Slack have been depressing the stock's valuations lately, but it's still well-poised to profit from a growing need for cloud-based CRM services and other AI-powered data-crunching tools.
|From: Frank Sully||9/22/2021 11:21:14 AM|
|NVIDIA Calls UK AI Strategy “Important Step,” Will Open Cambridge-1 Supercomputer to UK Healthcare Startups September 22, 2021|
Sept. 22, 2021 — NVIDIA today called the U.K. government’s launch of its AI Strategy an important step forward, and announced a program to open the Cambridge-1 supercomputer to U.K. healthcare startups.
David Hogan, vice president of Enterprise EMEA at NVIDIA, said, “Today is an important step in furthering the U.K.’s strategic advantage as a global leader in AI. NVIDIA is proud to support the U.K.’s AI ecosystem with Cambridge-1, the country’s most powerful supercomputer, and our Inception program that includes more than 500 of the U.K.’s most dynamic AI startups.”
NVIDIA will also today announce the next phase for Cambridge-1, in which U.K.-based startups will be able to submit applications to harness the system’s capabilities, during a talk at the Wired Health: Tech conference by Kimberly Powell, NVIDIA’s vice president of Healthcare.
“AI and digital biology are reshaping the drug discovery process, and startups are by definition on the bleeding edge of innovation,” she said. “Cambridge-1 is the modern instrument for science and we look forward to opening the possibilities for discovery even wider to the U.K. startup ecosystem.”
Powell will also describe work underway with U.K. biotech company and NVIDIA Inception member Peptone, which will have access to Cambridge-1. Peptone is developing a protein engineering system that blends generative AI models and computational molecular physics to discover therapies to fight inflammatory diseases like COPD, psoriasis and asthma.
“Access to the compute power of Cambridge-1 will be a game-changer in our effort to fuse computation with laboratory experiments to change the way protein drugs are engineered,” said Dr. Kamil Tamiola, Peptone CEO and founder. “We plan to use Cambridge-1 to vastly improve the design of antibodies to help treat numerous inflammatory diseases.”
NVIDIA anticipates that giving U.K. startups the opportunity to use Cambridge-1 will accelerate their work, enabling them to bring innovative products and services to market faster, as well as ensure that the U.K. remains a compelling location in which to develop and scale up their businesses.
Startups that are selected for the new program will not only gain access to Cambridge-1. They will also be invited to meet with the system’s founding partners to amplify collaboration potential, and access membership benefits of NVIDIA Inception, a global program designed to nurture startups, which has more U.K. startups as members than from any other country in Europe.
Founding partners of Cambridge-1 are: AstraZeneca, GSK, Guy’s and St Thomas’ NHS Foundation Trust, King’s College London, and Oxford Nanopore Technologies.
NVIDIA Inception provides startups with critical go-to-market support, training, and technology. Benefits include access to hands-on, cloud-based training through the NVIDIA Deep Learning Institute, preferred pricing on hardware, invitations to exclusive networking events, opportunities to engage with venture capital partners and more. Startups in NVIDIA Inception remain supported throughout their entire life cycle, helping them accelerate both platform development and time to market.
Startup applications can be submitted here before December 30 at midnight GMT, with the announcement of those selected expected early in 2022.
Launched in July 2021, Cambridge-1 is the U.K.’s most powerful supercomputer. It is the first NVIDIA supercomputer designed and built for external research access. NVIDIA will collaborate with researchers to make much of this work available to the greater scientific community.
Featuring 80 DGX A100 systems integrating NVIDIA A100 GPUs, Bluefield-2 DPUs and NVIDIA HDR InfiniBand networking, Cambridge-1 is an NVIDIA DGX SuperPOD that delivers more than 400 petaflops of AI performance and 8 petaflops of Linpack performance. The system is located at a facility operated by NVIDIA partner Kao Data.
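For a sense of scale, the system totals above can be broken down per GPU. This is a rough sketch, assuming eight A100 GPUs per DGX A100 system (a figure not stated in this press release) and treating the "more than 400 petaflops" number as a floor.

```python
# Per-GPU breakdown of the Cambridge-1 figures quoted in the text.
DGX_SYSTEMS = 80            # from the text
GPUS_PER_DGX = 8            # assumption: eight A100 GPUs per DGX A100
AI_PETAFLOPS = 400.0        # "more than 400 petaflops of AI performance"
LINPACK_PETAFLOPS = 8.0     # "8 petaflops of Linpack performance"

total_gpus = DGX_SYSTEMS * GPUS_PER_DGX

# Convert petaflops to teraflops (x1000) before dividing across GPUs.
ai_tflops_per_gpu = AI_PETAFLOPS * 1000.0 / total_gpus
linpack_tflops_per_gpu = LINPACK_PETAFLOPS * 1000.0 / total_gpus

print(f"Total GPUs:             {total_gpus}")
print(f"AI teraflops per GPU:   at least {ai_tflops_per_gpu:.0f}")
print(f"Linpack teraflops/GPU:  {linpack_tflops_per_gpu:.1f}")
```

The wide gap between the two per-GPU numbers reflects that the "AI performance" and Linpack figures measure very different kinds of math, so they should not be compared directly.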
Cambridge-1 is the first supercomputer NVIDIA has dedicated to advancing industry-specific research in the U.K. The company also intends to build an AI Center for Excellence in Cambridge featuring a new Arm-based supercomputer, which will support more industries across the country.
Cambridge-1 was launched with five founding partners: AstraZeneca, GSK, Guy’s and St Thomas’ NHS Foundation Trust, King’s College London, and Oxford Nanopore.
NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market and has redefined modern computer graphics, high performance computing and artificial intelligence. The company’s pioneering work in accelerated computing and AI is reshaping trillion-dollar industries, such as transportation, healthcare and manufacturing, and fueling the growth of many others. More information at https://nvidianews.nvidia.com.
|From: Frank Sully||9/22/2021 12:30:20 PM|
|This Catalyst Could Give Nvidia Stock a Big Boost|
The graphics card specialist is making waves in a lucrative market.
- Cloud gaming adoption is increasing at a terrific pace.
- Nvidia has already made a solid dent in the cloud gaming space with the GeForce NOW service.
- GeForce NOW's increased coverage, affordable pricing, and big game library will be tailwinds for Nvidia in this market.
The video gaming industry has been a big catalyst for Nvidia (NASDAQ:NVDA) in recent years, helping the company clock terrific revenue and earnings growth and boosting its stock price as gamers have lapped up its powerful graphics cards to elevate their gaming experience.
The good news for Nvidia investors is that graphics card demand is going to boom in the coming years, and the company is in a solid position to take advantage of that thanks to its dominant market share. However, there is an additional catalyst that could give Nvidia's video gaming business a big shot in the arm over the next few years: cloud gaming. Let's take a closer look at the cloud gaming market, and check how Nvidia is looking to make the most of this multibillion-dollar opportunity.
Cloud gaming adoption is growing rapidly
Newzoo, a provider of market research and analytics for video gaming and esports, estimates that the global cloud gaming market is on track to generate $1.6 billion in revenue this year, with the number of paying users jumping to 23.7 million. That may not look like a big deal for Nvidia right now given that it has generated nearly $22 billion in revenue in the trailing 12 months. However, the pace at which the cloud gaming market is growing means that it could soon reach a point where it moves the needle in a big way for Nvidia.
Newzoo estimates that the cloud gaming market could hit $6.5 billion in revenue by 2024, growing more than four-fold compared to this year's estimated revenue. The research firm also points out that the addressable user market for cloud gaming could be as big as 165 million by the end of 2021, indicating that there are millions of users out there that could buy cloud gaming subscriptions.
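The "more than four-fold" claim, and the annual growth rate it implies, can be checked in a few lines. This is a sketch only: it assumes the $1.6 billion estimate applies to 2021 and the $6.5 billion estimate to 2024, so three years of compounding, and the variable names are illustrative.

```python
# Growth multiple and implied compound annual growth rate for the Newzoo figures.
REVENUE_2021_BILLIONS = 1.6   # this year's estimate, from the text
REVENUE_2024_BILLIONS = 6.5   # 2024 estimate, from the text
YEARS = 3                     # assumption: 2021 -> 2024

growth_multiple = REVENUE_2024_BILLIONS / REVENUE_2021_BILLIONS

# CAGR: the constant yearly rate that turns the start value into the end value.
implied_cagr = growth_multiple ** (1.0 / YEARS) - 1.0

print(f"Growth multiple: {growth_multiple:.2f}x")
print(f"Implied CAGR:    {implied_cagr:.1%}")
```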
In fact, Newzoo points out that 94% of the gamers it surveyed have either tried cloud gaming already or are willing to try it, which means that the market could quickly expand. Nvidia is becoming a dominant player in the cloud gaming space, which could add billions of dollars to its revenue in the long run.
Nvidia is pulling the right strings to tap this massive opportunity
Nvidia pointed out in March this year that its GeForce NOW cloud gaming service was nearing 10 million members. This is impressive considering that the service was launched in February 2020 with a subscription costing $5 per month. The company is now offering a premium subscription service priced at $9.99 per month or $99.99 a year.
The introductory $5-a-month subscription will remain available to members who were already on that plan before the new Priority membership was rolled out. This effectively means that the new GeForce NOW customers will increase Nvidia's revenue per user from the cloud gaming business. It wouldn't be surprising to see the service gain traction among gamers because of the benefits on offer.
The premium subscription will give gamers access to ray-tracing-enabled games, as well as its deep learning super sampling (DLSS) feature that upscales selected games to a higher resolution for a more immersive experience. What's more, Nvidia has a library of 1,000 PC (personal computer) games on the GeForce NOW platform, giving gamers a wide range of titles to choose from.
It is also worth noting that Nvidia is rapidly opening new data centers and upgrading the capacity of existing ones to capture more of the cloud gaming market. The company has 27 data centers that enable GeForce NOW in 75 countries.
Another important insight worth noting is that 65% of Nvidia's 10 million GeForce NOW members play games on underpowered PCs or Chromebooks. Those users wouldn't have been able to run resource-hungry games without Nvidia's data centers, which do the heavy lifting and transmit the gameplay to users' screens. Nvidia says that 80% of the gaming sessions on GeForce NOW take place on devices that wouldn't have been able to run those games locally because of weak hardware or incompatibility.
This explains why the demand for cloud gaming has spiked substantially -- consumers need not invest in expensive hardware, nor do they need to buy game titles separately. They can simply buy subscriptions from Nvidia and choose from over a thousand games that the GeForce NOW library provides.
More importantly, Nvidia is expanding into new markets such as Southeast Asia, while bolstering its presence in other areas such as Latin America and the Middle East. As such, the company's GeForce NOW subscriber count could keep growing at a fast clip in the future.
Gauging the financial impact
With paying users of cloud gaming expected to hit nearly 24 million this year and Nvidia already having scored 10 million GeForce NOW subscribers, the company has got off to a good start in this market.
The addressable market that Nvidia could tap into is also expected to hit 165 million potential subscribers by the end of 2021, as discussed earlier. If Nvidia manages to corner half of those potential paying cloud gaming subscribers in the next few years and get $100 a year from each subscriber (based on the annual GeForce NOW subscription plan), the company could be looking at substantial annual revenue from the cloud gaming business. This should give investors yet another reason to buy this growth stock that is already winning big in graphics cards and data centers.
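The scenario described above works out as follows. This is a minimal sketch of the article's own hypothetical, not a forecast: the 50% share and $100 per subscriber come from the paragraph, the roughly $22 billion trailing revenue figure is the one quoted earlier in the article, and the variable names are illustrative.

```python
# Revenue implied by the article's hypothetical cloud gaming scenario.
ADDRESSABLE_USERS = 165_000_000   # potential subscribers by end of 2021
CAPTURED_SHARE = 0.5              # "corner half of those" (hypothetical)
REVENUE_PER_USER = 100.0          # dollars per year, roughly the annual plan

subscribers = ADDRESSABLE_USERS * CAPTURED_SHARE
annual_revenue = subscribers * REVENUE_PER_USER

# For scale: Nvidia's trailing-12-month revenue of nearly $22 billion.
TRAILING_REVENUE = 22_000_000_000.0
share_of_trailing = annual_revenue / TRAILING_REVENUE

print(f"Subscribers:      {subscribers:,.0f}")
print(f"Annual revenue:   ${annual_revenue / 1e9:.2f} billion")
print(f"Vs. trailing rev: {share_of_trailing:.1%}")
```

Changing `CAPTURED_SHARE` shows how sensitive the result is to the assumption that Nvidia wins half of the addressable market.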
|From: Frank Sully||9/22/2021 2:01:11 PM|
|NVIDIA Extends AI Inference Performance Leadership, with Debut Results on Arm-based Servers|
The latest MLPerf benchmarks show NVIDIA has extended its high watermarks in performance and energy efficiency for AI inference to Arm as well as x86 computers.
September 22, 2021 by DAVE SALVATOR
NVIDIA delivers the best results in AI inference using either x86 or Arm-based CPUs, according to benchmarks released today.
It’s the third consecutive time NVIDIA has set records in performance and energy efficiency on inference tests from MLCommons, an industry benchmarking group formed in May 2018.
And it’s the first time the data-center category tests have run on an Arm-based system, giving users more choice in how they deploy AI, the most transformative technology of our time.
Tale of the Tape
NVIDIA AI platform-powered computers topped all seven performance tests of inference in the latest round with systems from NVIDIA and nine of our ecosystem partners including Alibaba, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Inspur, Lenovo, Nettrix and Supermicro.
And NVIDIA is the only company to report results on all MLPerf tests in this and every round to date.
Inference is what happens when a computer runs AI software to recognize an object or make a prediction. It’s a process that uses a deep learning model to filter data, finding results no human could capture.
MLPerf’s inference benchmarks are based on today’s most popular AI workloads and scenarios, covering computer vision, medical imaging, natural language processing, recommendation systems, reinforcement learning and more.
So, whatever AI applications they deploy, users can set their own records with NVIDIA.
Why Performance Matters
AI models and datasets continue to grow as AI use cases expand from the data center to the edge and beyond. That’s why users need performance that’s both dependable and flexible to deploy.
MLPerf gives users the confidence to make informed buying decisions. It’s backed by dozens of industry leaders, including Alibaba, Arm, Baidu, Google, Intel and NVIDIA, so the tests are transparent and objective.
Flexing Arm for Enterprise AI
The Arm architecture is making headway into data centers around the world, in part thanks to its energy efficiency, performance increases and expanding software ecosystem.
The latest benchmarks show that as a GPU-accelerated platform, Arm-based servers using Ampere Altra CPUs deliver near-equal performance to similarly configured x86-based servers for AI inference jobs. In fact, in one of the tests, the Arm-based server out-performed a similar x86 system.
NVIDIA has a long tradition of supporting every CPU architecture, so we’re proud to see Arm prove its AI prowess in a peer-reviewed industry benchmark.
“Arm, as a founding member of MLCommons, is committed to the process of creating standards and benchmarks to better address challenges and inspire innovation in the accelerated computing industry,” said David Lecomber, a senior director of HPC and tools at Arm.
“The latest inference results demonstrate the readiness of Arm-based systems powered by Arm-based CPUs and NVIDIA GPUs for tackling a broad array of AI workloads in the data center,” he added.
Partners Show Their AI Powers
NVIDIA’s AI technology is backed by a large and growing ecosystem.
Seven OEMs submitted a total of 22 GPU-accelerated platforms in the latest benchmarks.
Most of these server models are NVIDIA-Certified, validated for running a diverse range of accelerated workloads. And many of them support NVIDIA AI Enterprise, software officially released last month.
Our partners participating in this round included Dell Technologies, Fujitsu, Hewlett Packard Enterprise, Inspur, Lenovo, Nettrix and Supermicro as well as cloud-service provider Alibaba.
The Power of Software
A key ingredient of NVIDIA’s AI success across all use cases is our full software stack.
For inference, that includes pre-trained AI models for a wide variety of use cases. The NVIDIA TAO Toolkit customizes those models for specific applications using transfer learning.
Our NVIDIA TensorRT software optimizes AI models so they make best use of memory and run faster. We routinely use it for MLPerf tests, and it’s available for both x86 and Arm-based systems.
We also employed our NVIDIA Triton Inference Server software and Multi-Instance GPU (MIG) capability in these benchmarks. They deliver for all developers the kind of performance that usually requires expert coders.
Thanks to continuous improvements in this software stack, NVIDIA achieved gains of up to 20 percent in performance and 15 percent in energy efficiency from previous MLPerf inference benchmarks just four months ago.
All the software we used in the latest tests is available from the MLPerf repository, so anyone can reproduce our benchmark results. We continually add this code into our deep learning frameworks and containers available on NGC, our software hub for GPU applications.
It’s part of a full-stack AI offering, supporting every major processor architecture, proven in the latest industry benchmarks and available to tackle real AI jobs today.
To learn more about the NVIDIA inference platform, check out our NVIDIA Inference Technology Overview.
|From: Frank Sully||9/22/2021 2:30:09 PM|
|In The Latest AI Benchmarks, Nvidia Remains The Champ, But Qualcomm Is Rising Fast|
NVIDIA rules the performance roost, Qualcomm demonstrates exceptional power efficiency, and Intel demonstrates the power of software.
Every three months, the not-for-profit group MLCommons publishes a slew of peer-reviewed MLPerf benchmark results for deep learning, alternating between training and inference processing. This time around, it was Inference Processing V1.1. Over 50 members agree on a set of benchmarks and data sets they feel are representative of real AI workloads such as image and language processing. And then the fun begins.
From what I hear from vendors, these benchmarks are increasingly being used in Requests for Proposals for AI gear, and also serve as a robust test bed for engineers of new chip designs and optimization software. So everyone wins, whether or not they publish. This time around NVIDIA, Intel, and Qualcomm added new models and configurations, and results were submitted from Dell, HPE, Lenovo, NVIDIA, Inspur, Gigabyte, Supermicro, and Nettrix.
And the winner is…
Before we get to the accelerators, a few comments about inference processing. Unlike training, where the AI job is the job, inference is usually a small part of an application, which of course runs on an x86 server. Consequently, Intel pretty much owns the data center inference market; most models perform quite well on Xeons. Aware of this, Intel has continually updated the hardware and software to run faster to keep those customers happy and on the platform. We will cover more details on this in a moment, but for models requiring more performance or dedicated throughput for more complex models, accelerators are the way to go.
On that front, Nvidia remains the fastest AI accelerator for every workload on a single-chip basis. At the system level, however, a 16-card Qualcomm system delivered the fastest ResNet-50 performance with 342K images per second, ahead of an 8-GPU Inspur server at 329K. While Qualcomm increased their model coverage, Nvidia was the only firm to submit benchmark results for every AI model. Intel submitted results for nearly all models in data center processing.
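The single-chip versus system-level distinction comes down to simple division. The following is a minimal Python sketch that uses only the two ResNet-50 figures quoted above and assumes throughput is spread evenly across the accelerators in each server, which the article does not state. The function name is illustrative and not part of any MLPerf tooling.

```python
def per_accelerator_throughput(system_images_per_sec: float, num_accelerators: int) -> float:
    """Average images/sec per accelerator, assuming an even split across cards."""
    if num_accelerators <= 0:
        raise ValueError("num_accelerators must be positive")
    return system_images_per_sec / num_accelerators


# Figures as quoted in the article (rounded to the nearest thousand images/sec).
qualcomm_system = 342_000  # 16-card Qualcomm system
inspur_system = 329_000    # 8-GPU Inspur server

qualcomm_per_card = per_accelerator_throughput(qualcomm_system, 16)
nvidia_per_gpu = per_accelerator_throughput(inspur_system, 8)

print(f"Qualcomm per card: {qualcomm_per_card:,.0f} images/sec")
print(f"Nvidia per GPU:    {nvidia_per_gpu:,.0f} images/sec")
print(f"Per-chip ratio:    {nvidia_per_gpu / qualcomm_per_card:.2f}x")
```

Under that even-split assumption, each GPU in the Inspur server handles nearly twice as many images per second as each Qualcomm card, while the Qualcomm system comes out ahead in aggregate by using twice as many cards.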
Qualcomm shared results for both the Snapdragon at the edge, and the Cloud AI100, both offering rock solid performance with absolute leadership power efficiency across the board. Here’s some of the data.
Nvidia, as usual, clobbered the inference incumbent, which is the Intel Xeon CPU. Qualcomm outperformed the NVIDIA A30, however.
Every year, Nvidia demonstrates improved performance, even on the same hardware, thanks to continuous improvements in its software stack. In particular, TensorRT improves performance and efficiency by pre-processing the neural network, performing functions such as quantization to lower-precision formats and arithmetic. But the star of the Nvidia software show for inference is increasingly the Triton Inference Server, which manages the run-time optimizations and workload balancing using Kubernetes. Nvidia has open-sourced Triton, and it now supports x86 CPUs as well as Nvidia GPUs. In fact, since it is open, Triton could be extended with backends for other accelerators and GPUs, saving startups a lot of software development work.
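To make "quantization to lower-precision formats" concrete, here is a minimal Python sketch of generic symmetric INT8 quantization. It is an illustration of the general idea only, not TensorRT's actual calibration procedure or API; the function names and the sample weights are made up for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -1.27, 0.05, 0.0, 0.64]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each weight is stored in 8 bits instead of 32, and the round-trip error is bounded by half the scale. Production toolchains choose the scale far more carefully, for instance per channel or from calibration data, but the trade of a small accuracy loss for lower memory traffic and cheaper arithmetic is the same.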
Nvidia demonstrated nearly 50% better performance. (Image: NVIDIA)
In some applications, power efficiency is a critical factor for success, but only if the platform achieves the required performance and latency for the models being deployed. Thanks to years of research in AI and Qualcomm’s mobile processor legacy, the AI engines in Snapdragon and the Cloud AI100 deliver both, with up to half the power per transaction for many models versus the Nvidia A100, and nearly four times the efficiency of the Nvidia A10.
Qualcomm Cloud AI100 offered the best performance per watt of all submissions.
Back to Intel, the new Ice Lake Xeon performed quite well, with up to 3X improvement over the previous Cooper Lake CPU for DLRM (recommendation engines), and 1.5X on other models. Recommendation engines represent a huge market in which Xeon rules, and for which other contenders are investing heavily, so this is a very good move for Intel.
Intel demonstrated 50%-300% better performance with Ice Lake vs. the previous Cooper Lake results. This was accomplished in part by the improvements the engineering team has realized in the software stack.
Intel also demonstrated significant performance improvements in the development stack for Xeon. The most dramatic improvement was again for DLRM, in which sparse data and weights are common. In this case, Intel delivered over 5X performance improvement on the same hardware.
Intel shared the five-fold performance improvement from sparsity, on the same chip.
As followers of Cambrian-AI know, we believe that MLPerf presents vendors and users with valuable benchmarks, a real-world testing platform, and of course keeps analysts quite busy slicing and dicing all the data. In three months we expect a lot of exciting results, and look forward to sharing our insights and analysis with you then.
|From: Frank Sully||9/22/2021 2:48:55 PM|
|Nvidia cosies up to Open Robotics for hardware-accelerated ROS|
Hopes to tempt roboticists over to its Jetson platform with new simulation features, drop-in acceleration code
Gareth Halfacree, Wed 22 Sep 2021
Nvidia has linked up with Open Robotics to drive new artificial intelligence capabilities in the Robot Operating System (ROS).
The non-exclusive agreement will see Open Robotics extending ROS 2, the latest version of the open-source robotics framework, to better support Nvidia hardware – and in particular its Jetson range, low-power parts which combine Arm cores with the company's own GPU and deep-learning accelerator cores to drive edge and embedded artificial intelligence applications.
"Our users have been building and simulating robots with Nvidia hardware for years, and we want to make sure that ROS 2 and Ignition work well on those platforms," Brian Gerkey, Open Robotics' chief exec, told The Register.
"We get most excited by two things: robots and open source. This partnership has both. We're working together with Nvidia to improve the developer experience for the global robotics community by extending the open source software on which roboticists rely. We're excited to work directly with Nvidia and have their support as we extend our software to take maximum advantage of their hardware."
The team-up will see Open Robotics working on ROS to improve the data flow between the various processors – CPU, GPU, NVDLA, and Tensor Cores – on Nvidia's Jetson hardware as a means to boost processing of high-bandwidth data.
As part of that, Open Robotics' Ignition and Nvidia's Isaac Sim simulation environments are to gain interoperability – meaning robot and environment models can be moved from one to the other, at least when the software is finished some time early next year.
As for why Nvidia's accelerated computing portfolio, and in particular its embedded Jetson family of products, should appeal to robot-makers, Gerkey said: "Nvidia has invested heavily in compute hardware that's relevant for modern robotics and AI workloads. Robots ingest and process large data volumes from sensors such as cameras and lasers. Nvidia's architecture allows that data flow to happen incredibly efficiently."
Murali Gopalakrishna, head of product management, Intelligent Machines, at Nvidia said of the hookup: "Nvidia's GPU-accelerated computing platform is at the core of many AI robot applications and many of those are developed using ROS, so it is logical that we work closely with Open Robotics to advance the field of robotics."
The work also brings with it some new Isaac GEMs, hardware-accelerated packages for ROS designed to replace code which would otherwise run on the CPU. The latest GEMs include packages for handling stereo imaging and point cloud data, colour space conversion, lens distortion correction, and the detection and processing of AprilTags – QR Code-style 2D fiducial tags developed at the University of Michigan.
The partnership doesn't mean the two are going steady, though. "We are eager to extend ROS 2 in similar ways on other accelerated hardware," Gerkey told us of planned support for other devices like Intel's Myriad X and Google's TPU – to say nothing of GPU hardware from Nvidia rival AMD.
"In fact, we plan for the work we do together with Nvidia to lay the foundation for additional extensions for additional architectures. To other hardware manufacturers: please contact us to talk about extensions for your platform!"
The latest Isaac GEMs are available on Nvidia's GitHub repository now; the interoperable simulation environments, meanwhile, aren't expected to release until the (northern hemisphere) spring of 2022.
Nvidia's Gopalakrishna said it was possible for ROS developers to begin experimenting before the release date. "The simulator already has a ROS 1 and ROS 2 bridge and has examples of using many of the popular ROS packages for navigation (nav2) and manipulation (MoveIt). Many of these developers are also leveraging Isaac Sim to generate synthetic data to train the perception stack in their robots. Our spring release will bring additional functionality like interoperability between Gazebo Ignition and Isaac Sim."
When we asked what performance uplift users could expect from the new Isaac GEMs compared to CPU-only packages, we were told: "The amount of performance gain will vary depending on how much inherent parallelism exists in a given workload. But we can say that we are seeing an order of magnitude increase in performance for perception and AI related workloads. By using the appropriate processor to accelerate the different tasks, we see increased performance and better power efficiency."
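One common way to reason about the point that gains depend on "how much inherent parallelism exists" is Amdahl's law. Nvidia did not cite it; it is brought in here purely as an illustrative model. The Python sketch below assumes a hypothetical 50x speedup on the portion of a workload that can be offloaded, a number chosen for the example and not a measured figure.

```python
def amdahl_speedup(parallel_fraction: float, accel_factor: float) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated.

    Amdahl's law: S = 1 / ((1 - p) + p / s)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accel_factor <= 0:
        raise ValueError("accel_factor must be positive")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / accel_factor)


# Hypothetical: the offloaded portion runs 50x faster on the accelerator.
ACCEL_FACTOR = 50.0
for fraction in (0.5, 0.9, 0.99):
    speedup = amdahl_speedup(fraction, ACCEL_FACTOR)
    print(f"{fraction:.0%} parallel -> {speedup:.1f}x overall")
```

In this model a workload that is only half parallelisable can never exceed a 2x overall gain no matter how fast the accelerator is, whereas an order-of-magnitude gain requires that the large majority of the runtime be offloadable, which fits the quote's focus on perception and AI workloads.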
As for additional features in the pipeline, Gopalakrishna said: "Nvidia is working with Open Robotics to make the ROS framework more streamlined for hardware acceleration and we will also continue to release multiple new Isaac GEMs, our hardware accelerated software packages for ROS.
"Some of these will be DNNs which are commonly used in robotics perception stacks. On the simulator side, we are working to add support for more sensors and robots and release more samples that are relevant to the ROS community." ®
|From: Frank Sully||9/22/2021 2:55:11 PM|
|Next Generation: ‘Teens in AI’ Takes on the Ada Lovelace Hackathon|
September 22, 2021
by LIZ AUSTIN
Jobs in data science and AI are among the fastest growing in the entire workforce, according to LinkedIn’s 2021 Jobs Report.
Teens in AI, a London-based initiative, is working to inspire the next generation of AI researchers, entrepreneurs and leaders through a combination of hackathons, accelerators, networking events and bootcamps.
In October, the organization, with support from NVIDIA, will host the annual Ada Lovelace Hackathon, created for young women ages 11-18 to get a glimpse of all that can be done in the world of AI.
Inspired By AI
The need to embolden young women to join the tech industry is great.
Only 30 percent of the world’s science researchers are women. And fewer than one in five authors at leading AI conferences are women, about the same ratio of those teaching AI-related subjects, according to the AI Now Institute.
Founded by social entrepreneur Elena Sinel, Teens in AI is trying to change that. It aims to give young people — especially young women — early exposure to AI that’s being developed and deployed to promote social good.
The organization, which was launched at the 2018 AI for Good Global Summit at the United Nations, has an expansive network of mentors from some of the world’s leading companies. These volunteers work with students and inspire them to use AI to address social, humanitarian and environmental challenges.
“A shortage of STEM skills costs businesses billions of dollars every year, impacting UK businesses alone by about £1.5 billion a year,” Sinel said. “Yet with so few girls, especially those from disadvantaged backgrounds, studying STEM, we are depriving ourselves of potential talent.”
Sinel said that Teens in AI makes STEM education approachable and increases exposure to female role models, showing young women that a bright STEM career isn’t reserved for only males.
“We can’t do this on our own, so we’re constantly on the lookout for like-minded corporate partners like NVIDIA who will work with us to grow this community of young people who want to make the world more inclusive and sustainable,” she said.
Ada Lovelace Hackathon
With the company’s support, the Ada Lovelace Hackathon — named for the 19th century mathematician who is often regarded as the first computer programmer — showcases speakers and mentors to encourage young women to pursue a career in AI. This year’s event is expected to reach more than 1,000 girls from 20+ countries.
Participants will have the opportunity to receive prizes and get access to NVIDIA Deep Learning Institute credits for more advanced hands-on training and experience.
NVIDIA employees around the world will serve as mentors and judges.
Kate Kallot, head of emerging areas at NVIDIA, judged last year’s Ada Lovelace Hackathon, as well as August’s Global AI Accelerator Program for Teens in AI.
“I hope to inform and inspire young people in how they can help fuel applications and the AI revolution,” Kallot said. “While there’s a heavy demand for people with technical skills, what’s also needed is a future AI workforce that is truly reflective of our diverse world.”
Kallot talked more about the importance of fighting racial biases in the AI industry on a recent Teens in AI podcast episode.
NVIDIA’s support of Teens in AI is part of our broader commitment to bringing more diversity to tech, expanding access to AI education and championing opportunities for traditionally underrepresented groups.
This year, we announced a partnership with the Boys & Girls Clubs of Western Pennsylvania to develop an open-source AI and robotics curriculum for high school students. The collaboration has given hundreds of Jetson Nano developer kits to educators in schools and nonprofits, through the NVIDIA Jetson Nano 2GB Developer Kit Grant Program.
NVIDIA also works with minority-serving institutions and diversity-focused professional organizations to offer training opportunities — including free seats for hands-on certification courses through the NVIDIA Deep Learning Institute.
Driving the Future of AI
AI is growing at an incredible rate. The AI market is predicted to be worth $360 billion by 2028, up from just $35 billion in 2020, and is expected to add $880 billion to the U.K. economy by 2035.
Over 90 percent of leading businesses have an ongoing investment in AI, 23 percent of customer service organizations are using AI-powered chatbots and 46 percent of people are using AI every single day.
In such a landscape, encouraging young people across the globe to embark on their AI journeys is all the more important.
Learn more about Teens in AI and the NVIDIA Jetson Grant Program.
|From: Frank Sully||9/22/2021 9:49:07 PM|
|Robotics Gets A Jolt From An Open Robotics-Nvidia Partnership|
Materials handling robot using AI perception
Nvidia and Open Robotics announced a partnership to enhance the ROS 2 (Robot Operating System) development suite. The partnership essentially combines the two most powerful robotics development environments and the two largest groups of robotics developers.
First released in 2010, ROS has been a key open-source platform for robotics developers supported by various companies in a variety of industries and government research organizations like DARPA and NASA. While the platform has continued to grow and includes the Ignition simulation environment, it has been primarily targeting traditional CPU computing models. Over the past several years, however, Nvidia has pioneered heterogeneous and AI computing for IoT and edge applications through the development of its Jetson platforms, software development kits (SDKs) like Isaac for robotics, toolkits like Nvidia TAO (Train, Adapt, and Optimize) for simplifying AI model development and deployment, and Omniverse Isaac Sim for synthetic data generation and robotics simulation. Both environments are open to developers and provide valuable code, models, data sets, and simulation resources. Now the two can be combined into Nvidia’s Omniverse collaborative development environment to allow developers to simultaneously develop everything from the physical robot to synthetic data sets to train the robot.
The Jetson product family and Isaac Robotics platform
For the ROS developers, this opens a world of possibilities. Pulling ROS into the Nvidia environment offers the developer the ability to leverage offload/acceleration engines like a GPU, shared memory, and predesigned hardware acceleration algorithms Nvidia calls Isaac Gems. Thus far, Nvidia is offering three Gems for image processing and DNN-based perception models, including SGM Stereo Disparity and Point Cloud, Color Space Conversion and Lens Distortion Correction, and AprilTags Detection. The performance lift from offloading depends on the specific algorithm, but Nvidia expects that some will result in an order of magnitude improvement in performance versus the same implementation on a CPU. In addition, the Isaac Sim includes support for ROS and ROS2 algorithms, including ROS April Tag, ROS Stereo Camera, ROS Services, MoveIt Motion Planning Framework, Native Python ROS Usage, and ROS2 Navigation. The Isaac Sim can also be used to generate synthetic data to train and test perception models. The predesigned algorithms combined with the synthetic data allow even the most novice developer or startup to quickly develop robotic platforms.
ROS developers seeking to add AI technologies to their products will also be able to leverage other Nvidia SDKs, such as Fleet Command for remote system management, Riva for conversational AI, and Deepstream for video streaming analytics. Most important, from Tirias Research’s perspective, is the ability to leverage the Omniverse environment, which allows multiple simultaneous users with seamless interaction between tools, and the massive amounts of new data and machine learning (ML) models being developed by Nvidia.
Although Nvidia has SDKs for various applications, such as Isaac for robotics, Clara for healthcare, and Drive for autonomous vehicles, the ML models for each of these segments are increasingly overlapping. In discussing this point, Nvidia’s General Manager of Robotics, Murali Gopalakrishna, indicated that there is considerable crossover in the development of the SDKs and models for many of the applications. According to Mr. Gopalakrishna, “the only difference is the data; the decisions are still the same.” As a result, advances in one market or application typically benefit multiple markets and applications.
Worldwide forecast for robots
According to data from Statista, the robotics market is projected to grow at more than 25% annually, an increase from approximately 20% prior to COVID. COVID is pushing the use of robotics in everything from healthcare and manufacturing to agriculture and food delivery. Leveraging the advancements in AI, sensors, wireless communications (5G), and semiconductor technology, robotics is rapidly moving into the mainstream of society. By 2025, the global robotics market will reach $210 billion, but that is a fraction of the value of the products and services that will be generated by robotics. Having evaluated various development platforms and tools, I can attest to the value of the resources that the Nvidia Isaac and ROS platforms offer developers. Both make it easy for developers to begin developing new robotic platforms, but the combination of the two, for lack of a better way to describe it, democratizes robotic development and AI for robotics. The joining of the two environments also brings together the two largest robotics developer communities, both focused on open-source collaboration.