IBM AI Hardware Research Center has delivered significant digital AI logic, and now turns their attention to solving AI problems in an entirely new way.
The IBM AI Hardware Research Center is located in the TJ Watson Center near Yorktown Heights, New York. IBM
Gary Fritz, Cambrian-AI Research Analyst, contributed to this article.
AI is showing up in nearly every aspect of business. Larger and more complex Deep Neural Nets (DNNs) keep delivering ever-more-remarkable results. The challenge, as always, is power and performance.
NVIDIA has been the leader to beat for years in the data center, with Qualcomm and Apple leading the way in mobile. NVIDIA got an early start when they realized their multi-core graphics cards were a perfect match for the massive amounts of calculations required to train and execute DNNs. NVIDIA’s tech has spurred huge growth in the sector; NVIDIA chalked up just over $2B last quarter in data center revenue, and AI accounts for a large (although unknown) portion of that high-margin treasure trove.
Here comes Analog Computing
It’s tough to beat NVIDIA at their own game, so several vendors are taking a different approach. Mythic, a Silicon Valley startup, has already released their first analog computation engine, and IBM Research is investing in an analog computation roadmap. But before we dive further into the deep end of the pool, just what is analog computation?
Traditional computers use digital storage and digital math. Data values are stored as binary representations. The typical computer architecture has a compute section (one or more CPUs or GPUs) and a memory bank. The system shuffles data from memory into the CPU or GPU for calculation, then shuffles the results back out to memory. This constant data motion greatly increases the performance overhead and energy cost of the operation.
Analog computation is an entirely different approach. Numeric values are represented by continuously variable circuit values (voltage levels, charge level, or other mechanisms) in analog memory cells. Analog calculations are handled by the analog circuitry in the memory array. Each cell is “programmed” with analog circuitry, and the resulting analog value represents the desired answer. Calculations are performed directly in the memory cell, so there is no need to move data back and forth to a CPU. The massively parallel calculations possible with this approach are a perfect match for the enormous matrix calculations required to train or execute a DNN.
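To make the in-memory idea concrete, here is a minimal numerical sketch of an analog matrix-vector multiply. It is not IBM's or Mythic's design. It assumes the weights are stored as cell conductances, the inputs are applied as row voltages, and each column's output current is the sum of the per-cell currents (Ohm's law per cell, Kirchhoff's current law per column). The function name, the example values, and the Gaussian read-noise model are all assumptions made for illustration.

```python
import random

def crossbar_mvm(voltages, conductances, noise_std=0.0, rng=None):
    """Simulate one analog matrix-vector multiply on a resistive crossbar.

    voltages:      input vector applied to the rows (volts)
    conductances:  matrix G[row][col] held in the memory cells (siemens)
    noise_std:     hypothetical Gaussian read noise added to each column current
    """
    rng = rng or random.Random(0)
    n_cols = len(conductances[0])
    currents = []
    for col in range(n_cols):
        # Ohm's law per cell (I = V * G), summed down the column wire.
        total = sum(v * row[col] for v, row in zip(voltages, conductances))
        currents.append(total + rng.gauss(0.0, noise_std))
    return currents

# Illustrative values only: a 3x2 weight matrix and a 3-element input.
weights = [[1.0, 0.5], [0.0, 2.0], [3.0, 1.0]]
inputs = [0.1, 0.2, 0.3]

ideal = crossbar_mvm(inputs, weights)                  # exact dot products
noisy = crossbar_mvm(inputs, weights, noise_std=0.01)  # with simulated read noise
```

Every column is evaluated at once in a physical array, which is where the parallelism comes from; the loop here only stands in for that. The noisy variant hints at the precision trade-off analog designs have to manage.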
IBM Research Sees Analog as the Next Big Thing in AI
IBM’s analog implementation uses memristive technology. Memristors are the fourth fundamental circuit component type in addition to resistors, capacitors, and inductors. IBM uses memristive Phase-Change Memory (PCM) or Resistive Memory (ReRAM) to store analog DNN synaptic weights. Circuits are built on the chip to do the desired calculations with the analog values. This includes forward propagation for DNN inference, and additional backward propagation for weight updates for training.
IBM plans to integrate analog compute engines alongside traditional digital calculations. An analog in-memory calculation engine could handle the large-scale DNN calculations, working in partnership with traditional CPU models.
IBM Research typically delivers technology through two channels. The first, of course, is to license the Intellectual Property to tech companies. The second is to turn their inventions into innovations in their own products. From a hardware standpoint, we could envision IBM building multi-chip modules that attach one or more Analog AI accelerators to systems, possibly using IBM’s future DBHi, or Direct Bonded Heterogeneous Integration, to interconnect the accelerator to a CPU. Also note that IBM recently announced on-die digital AI accelerators as part of the next generation Z system’s Telum processor. The reduced-precision arithmetic core was derived from the technology developed by the IBM Research AI Hardware Center.
This movie isn’t over yet. There remains significant work to do, and a lot of invention especially if IBM wants to train neural networks in analog. But IBM must feel fairly confident in their prospects to start writing blogs about the technology’s prospects. Data Center power efficiency is becoming a really big deal, with some projections forecasting a 5X increase in 10 years, to 10% of worldwide power consumption. We cannot afford that, and analog could make a huge dent in reducing that.
Significant challenges remain, but analog technology has terrific potential. For more information, see our Research Note here.
All things that move will become autonomous. And all the robots out there are getting smarter, fast! NVIDIA announced its latest initiatives to deliver a suite of perception technologies for developers seeking innovative ways to incorporate cutting-edge computer vision and AI/ML functionality into their ROS-based robotics applications. These new tools aim to reduce development time and improve performance in software projects.
NVIDIA and Open Robotics have entered into an agreement to accelerate the performance of ROS 2 on NVIDIA’s Jetson AI platform, as well as GPU-based systems. The two companies will also enable seamless simulation interoperability between Ignition Gazebo’s system and NVIDIA Isaac Sim on Omniverse. Software resulting from this partnership is expected to be released in the spring of 2022.
The Jetson platform is the go-to solution for robotics. It enables high-performance, low latency processing that helps robots be responsive and safe while also being collaborative. Open Robotics will be enhancing the ROS 2 framework to allow for efficient management of data flow and shared memory across GPU processors. This should significantly improve performance when processing applications that rely heavily on high bandwidth, such as lidar sensors in robotic systems. Apart from improved deployment on Jetson, Open Robotics and NVIDIA plan to integrate Ignition Gazebo and NVIDIA Isaac Sim.
By connecting these two simulators together, ROS developers can easily move their robots and environments between Ignition and Isaac Sim to run large-scale simulations. They will be able to use each simulator’s advanced features such as high fidelity dynamics or photorealistic rendering to generate synthetic data when training AI models.
Isaac GEMs have just been released for ROS with significant speedup! This is really exciting news and it’s great that you can try out these new features now.
Isaac GEMs for ROS are hardware-accelerated packages that make it easier to build high-performance solutions on the Jetson platform. The focus of these GEMs is improving throughput in image processing and DNN-based perception models, which have become increasingly important to roboticists. These GEMs reduce load while providing significant performance gains, so developers can spend less time worrying about power budgets and resource constraints.
I am neutral on Nvidia (NVDA), as its strong growth rate and bullish Wall Street consensus are offset by its fairly rich valuation.
Nvidia is an American multinational technology company that is credited with the invention of the graphics processing unit (GPU) for gaming.
The company is a pioneer in designing systems on a chip for accelerated computing, self-driving cars, and AI, and is a leader in fueling growth in manufacturing, transportation, healthcare, and other industries.
Nvidia’s Q2 2021 earnings report announced that NVIDIA RTX is featured in over 130 games and applications, including Minecraft RTX and Adobe (ADBE) products.
The game-lag reducing NVIDIA Reflex ecosystem is supported in 20 games, including some of the major e-sports titles. The company also announced that it launched NVIDIA Base Command and Fleet Command, which simplify the management of edge AI through a cloud service, which is transformative for many industries.
Furthermore, Nvidia also launched the NVIDIA Omniverse, a real-time 3D simulation and virtual collaboration platform.
For the second quarter of 2021, Nvidia reported revenue of $6.5 billion, showing gains of 68% from last year, and an increase of 15% from the first quarter.
Gaming revenue came in at $3.1 billion, showing an 85% growth from the previous year. This was driven by an exceptionally strong demand in the Gaming category that outstripped supply, and the introduction of GeForce RTX 3080 Ti and GeForce RTX 3070 Ti graphic cards.
The Professional Visualization sector also recorded second-quarter revenue of $519 million, showing an increase of 40% from the first quarter of 2021, and an increase of 156% from the previous year.
The company announced record Data Center revenue of $2.4 billion, which is up 35% from a year earlier. The Automotive category also showed a revenue increase of 37%, resulting in $152 million in revenue.
Nvidia has a positive outlook for the current quarter, where it expects its revenue to rise to an estimated $6.8 billion, and GAAP and non-GAAP gross margins to rise to 65.2% and 67%.
Nvidia stock does not look particularly cheap or expensive here, as it is priced at a fairly high forward P/E ratio of 53.2x, but is also growing at a very strong clip.
Normalized earnings per share are expected to increase by 61.6% in 2022, and 11.8% in 2023.
Wall Street’s Take
From Wall Street analysts, Nvidia earns a Strong Buy analyst consensus, based on 23 Buy ratings, one Hold rating, and one Sell rating in the past three months. Additionally, the average NVDA price target of $237.27 puts the upside potential at 7.5%.
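As a quick check on those figures, the share price implied by the quoted target and upside can be backed out. This sketch assumes upside is defined as target divided by current price, minus one; the function names are illustrative.

```python
def upside_potential(price_target, current_price):
    """Fractional upside implied by a price target."""
    return price_target / current_price - 1.0

def implied_price(price_target, upside):
    """Invert the upside formula to recover the reference share price."""
    return price_target / (1.0 + upside)

# Figures quoted in the article: $237.27 target, 7.5% upside.
reference_price = implied_price(237.27, 0.075)
check = upside_potential(237.27, reference_price)
```

Under that definition the reference price works out to roughly $220.7 per share.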
Summary and Conclusions
Nvidia is enjoying rapid growth, and has very strong support from Wall Street analysts. The stock is not extremely cheap, but is likely not extremely overvalued here either.
Disclosure: At the time of publication, Samuel Smith did not have a position in any of the securities mentioned in this article.
The human hand is one of the most fascinating creations of nature, and replicating it is one of the most sought-after goals of artificial intelligence and robotics researchers. A robotic hand that could manipulate objects as we do would be enormously useful in factories, warehouses, offices, and homes.
Yet despite tremendous progress in the field, research on robotic hands remains extremely expensive and limited to a few very wealthy companies and research labs.
Now, new research promises to make robotics research available to resource-constrained organizations. In a paper published on arXiv, researchers at the University of Toronto, Nvidia, and other organizations have presented a new system that leverages highly efficient deep reinforcement learning techniques and optimized simulated environments to train robotic hands at a fraction of the cost it would normally take.
Training robotic hands is expensive
OpenAI trained an AI-powered robotic hand to solve the Rubik’s Cube (Image source: YouTube)
In 2019, OpenAI presented Dactyl, a robotic hand that could manipulate a Rubik’s cube with impressive dexterity (though still significantly inferior to human dexterity). But it took 13,000 years’ worth of training to get it to the point where it could handle objects reliably.
How do you fit 13,000 years of training into a short period of time? Fortunately, many software tasks can be parallelized. You can train multiple reinforcement learning agents concurrently and merge their learned parameters. Parallelization can help to reduce the time it takes to train the AI that controls the robotic hand.
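The idea of training agents concurrently and merging what they learn can be sketched with a toy example. This is not how Dactyl was trained; the article does not describe OpenAI's merge scheme. The sketch assumes the simplest possible merge (element-wise averaging of parameter vectors), and the worker is a stand-in that runs noisy gradient descent toward a made-up target of 3.0. Threads are used only for brevity.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def train_worker(seed, start_params, steps=100, lr=0.1):
    """Toy stand-in for one RL agent: noisy gradient descent toward 3.0."""
    rng = random.Random(seed)
    params = list(start_params)
    for _ in range(steps):
        for i, p in enumerate(params):
            # Gradient of (p - 3)^2 plus noise, mimicking a noisy learning signal.
            noisy_grad = 2.0 * (p - 3.0) + rng.gauss(0.0, 0.5)
            params[i] = p - lr * noisy_grad
    return params

def merge_parameters(worker_params):
    """Merge by element-wise averaging (an assumed, deliberately simple scheme)."""
    n = len(worker_params)
    return [sum(values) / n for values in zip(*worker_params)]

def train_in_parallel(num_workers=8, dim=4):
    start = [0.0] * dim
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda s: train_worker(s, start), range(num_workers)))
    return merge_parameters(results)

merged = train_in_parallel()
```

Averaging across workers also reduces the noise in each worker's estimate, which is one reason parallel experience collection helps beyond raw wall-clock speed.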
However, speed comes at a cost. One solution is to create thousands of physical robotic hands and train them simultaneously, a path that would be financially prohibitive even for the wealthiest tech companies. Another solution is to use a simulated environment. With simulated environments, researchers can train hundreds of AI agents at the same time, and then finetune the model on a real physical robot. The combination of simulation and physical training has become the norm in robotics, autonomous driving, and other areas of research that require interactions with the real world.
Simulations have their own challenges, however, and the computational costs can still be too much for smaller firms.
OpenAI, which has the financial backing of some of the wealthiest companies and investors, developed Dactyl using expensive robotic hands and an even more expensive compute cluster comprising around 30,000 CPU cores.
The TriFinger platform reduced the costs of robotic research but still had several challenges. PyBullet, which is a CPU-based environment, is noisy and slow and makes it hard to train reinforcement learning models efficiently. Poor simulated learning creates complications and widens the “sim2real gap,” the performance drop that the trained RL model suffers from when transferred to a physical robot. Consequently, robotics researchers need to go through multiple cycles of switching between simulated training and physical testing to tune their RL models.
“Previous work on in-hand manipulation required large clusters of CPUs to run on. Furthermore, the engineering effort required to scale reinforcement learning methods has been prohibitive for most research teams,” Arthur Allshire, lead author of the paper and a Simulation and Robotics Intern at Nvidia, told TechTalks. “This meant that despite progress in scaling deep RL, further algorithmic or systems progress has been difficult. And the hardware cost and maintenance time associated with systems such as the Shadow Hand [used in OpenAI Dactyl] … has limited the accessibility of hardware to test learning algorithms on.”
Building on top of the work of the TriFinger team, this new group of researchers aimed to improve the quality of simulated learning while keeping the costs low.
Training RL agents with single-GPU simulation
The researchers trained their models in the Nvidia Isaac Gym simulated environment and transferred the learning to a remote Europe-based robotics lab
The researchers replaced PyBullet with Nvidia’s Isaac Gym, a simulated environment that can run efficiently on desktop-grade GPUs. Isaac Gym leverages Nvidia’s PhysX GPU-accelerated engine to allow thousands of parallel simulations on a single GPU. It can provide around 100,000 samples per second on an RTX 3090 GPU.
“Our task is suitable for resource-constrained research labs. Our method took one day to train on a single desktop-level GPU and CPU. Every academic lab working in machine learning has access to this level of resources,” Allshire said.
According to the paper, an entire setup to run the system, including training, inference, and physical robot hardware, can be purchased for less than $10,000.
The efficiency of the GPU-powered virtual environment enabled the researchers to train their reinforcement learning models in a high-fidelity simulation without reducing the speed of the training process. Higher fidelity makes the training environment more realistic, reducing the sim2real gap and the need for finetuning the model with physical robots.
The researchers used a sample object manipulation task to test their reinforcement learning system. As input, the RL model receives proprioceptive data from the simulated robot along with eight keypoints that represent the pose of the target object in three-dimensional Euclidean space. The model’s output is the torques that are applied to the motors of the robot’s nine joints.
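One way to picture the keypoint representation: a pose (position plus orientation) can be encoded as the 3D locations of eight points rigidly attached to the object. The sketch below assumes those points are the corners of a box around the object and that orientation is given as a unit quaternion; both are assumptions for illustration, and the default `half_extent` value is hypothetical. Flattened, eight keypoints give 24 input values.

```python
from itertools import product

def rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )

def pose_to_keypoints(position, quaternion, half_extent=0.0325):
    """Return the 8 box corners of the object in world coordinates."""
    keypoints = []
    for corner in product((-half_extent, half_extent), repeat=3):
        rotated = rotate(quaternion, corner)
        keypoints.append(tuple(p + r for p, r in zip(position, rotated)))
    return keypoints

def flatten_keypoints(keypoints):
    """Flatten 8 (x, y, z) keypoints into a 24-value observation slice."""
    return [coord for kp in keypoints for coord in kp]

kps = pose_to_keypoints((0.0, 0.0, 0.05), (1.0, 0.0, 0.0, 0.0))
observation_slice = flatten_keypoints(kps)
```

A representation like this puts position and orientation errors in the same units (distances between points), which makes it convenient both as a network input and as a basis for a reward.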
The system uses Proximal Policy Optimization (PPO), a model-free RL algorithm. Model-free algorithms obviate the need to learn or compute an explicit model of the environment’s dynamics, which is computationally very expensive, especially when you’re dealing with the physical world. AI researchers often seek cost-efficient, model-free solutions to their reinforcement learning problems.
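For readers unfamiliar with PPO, its core is a clipped surrogate objective that limits how far a single update can move the policy. The sketch below computes that objective for a batch of samples in plain Python. The article does not give the paper's hyperparameters, so the clipping value of 0.2 is only a commonly used default, and the function name is illustrative.

```python
import math

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average clipped surrogate objective (to be maximized).

    new_log_probs: log-probabilities of the taken actions under the current policy
    old_log_probs: log-probabilities under the policy that collected the data
    advantages:    advantage estimates for those actions
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio outside the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

unchanged = ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])
capped_gain = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
capped_loss = ppo_clip_objective([math.log(0.5)], [0.0], [-1.0])
```

When the policy has not changed, the objective is just the mean advantage. Once the ratio leaves the clip range in the direction that would improve the objective, the gain is capped.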
The researchers designed the reward of robotic hand RL as a balance between the fingers’ distance from the object, the object’s destination location, and the intended pose.
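A reward of that shape can be sketched as a weighted sum of three distance terms. The exact formulation and weights in the paper are not given in this article, so everything below (the functional form, the equal default weights, and the argument names) is a hypothetical illustration of the balance being described.

```python
import math

def shaped_reward(fingertips, object_pos, goal_pos, keypoints, goal_keypoints,
                  w_finger=1.0, w_position=1.0, w_pose=1.0):
    """Hypothetical shaped reward: higher (closer to 0) is better."""
    # Mean distance from each fingertip to the object (encourages contact).
    finger_term = sum(math.dist(f, object_pos) for f in fingertips) / len(fingertips)
    # Distance from the object to its destination.
    position_term = math.dist(object_pos, goal_pos)
    # Mean distance between current and target keypoints (captures orientation).
    pose_term = sum(math.dist(k, g) for k, g in zip(keypoints, goal_keypoints)) / len(keypoints)
    return -(w_finger * finger_term + w_position * position_term + w_pose * pose_term)

goal = (0.0, 0.0, 0.1)
far = shaped_reward([(0.1, 0.0, 0.0)], (0.0, 0.0, 0.0), goal, [(0.0, 0.0, 0.0)], [goal])
near = shaped_reward([(0.0, 0.0, 0.09)], (0.0, 0.0, 0.09), goal, [(0.0, 0.0, 0.09)], [goal])
```

Tuning the weights is where the "balance" comes in: too much weight on the fingertip term and the hand hugs the object without moving it, too little and it may never make contact.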
To further improve the model’s robustness, the researchers added random noise to different elements of the environment during training.
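This practice is often called domain randomization. A minimal sketch, assuming per-episode scaling of a few physical parameters plus additive sensor noise; the parameter names, nominal values, and noise magnitudes are hypothetical and not taken from the paper.

```python
import random

# Hypothetical nominal simulation parameters.
NOMINAL = {"object_mass_kg": 0.1, "friction": 1.0, "motor_gain": 1.0}

def randomize_environment(rng, nominal=NOMINAL, scale=0.2):
    """Sample per-episode parameters within +/- scale of their nominal values."""
    return {name: value * rng.uniform(1.0 - scale, 1.0 + scale)
            for name, value in nominal.items()}

def noisy_observation(rng, observation, noise_std=0.002):
    """Add Gaussian sensor noise to every observation value."""
    return [x + rng.gauss(0.0, noise_std) for x in observation]

rng = random.Random(42)
episode_params = randomize_environment(rng)
obs = noisy_observation(rng, [0.0, 0.05, 0.1])
```

A policy that succeeds across many such perturbed episodes is less likely to depend on details the simulator gets slightly wrong.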
Testing on real robots
Once the reinforcement learning system was trained in the simulated environment, the researchers tested it in the real world through remote access to the TriFinger robots provided by the Real Robot Challenge. They replaced the proprioceptive and image input of the simulator with the sensor and camera information provided by the remote robot lab.
The trained system transferred its abilities to the real robot with a seven-percent drop in accuracy, an impressive sim2real gap improvement in comparison to previous methods.
The keypoint-based object tracking was especially useful in ensuring that the robot’s object-handling capabilities generalized across different scales, poses, conditions, and objects.
“One limitation of our method—deploying on a cluster we did not have direct physical access to—was the difficulty in trying other objects. However, we were able to try other objects in simulation and our policies proved relatively robust with zero-shot transfer performance from the cube,” Allshire said.
The researchers say that the same technique can work on robotic hands with more degrees of freedom. They did not have the physical robot to measure the sim2real gap, but the Isaac Gym simulator also includes complex robotic hands such as the Shadow Hand used in Dactyl.
This system can be integrated with other reinforcement learning systems that address other aspects of robotics, such as navigation and pathfinding, to form a more complete solution to train mobile robots. “For example, you could have our method controlling the low-level control of a gripper while higher level planners or even learning-based algorithms are able to operate at a higher level of abstraction,” Allshire said.
The researchers believe that their work presents “a path for democratization of robot learning and a viable solution through large scale simulation and robotics-as-a-service.”
AI, Robotics and Automation board gets refresh! This year the board has been much more active after I became interested in AI a year ago. Several months ago the board got a new name and new moderator (Glenn Petersen), who is revamping the Introduction header, which was focused on an extensive discussion of the Singularity. He has started with a new logo for the board. I like it.
Autonomous trucks need to lighten the load when it comes to mapping, while still perceiving their surrounding environments reliably.
That’s the approach Kodiak Robotics, a Silicon Valley-based self-driving truck startup, is taking to deploy safer and more efficient delivery and logistics. Today, the company unveiled its fourth-generation vehicle, powered by NVIDIA DRIVE Orin, that uses lightweight mapping and a discreet, modular hardware design to achieve level 4 self-driving capabilities.
By avoiding an over-reliance on high-definition maps and focusing on a flexible architecture, Kodiak aims to deploy self-driving systems that are always accurate as well as straightforward to install and modify.
“The way you manufacture and maintain a system is incredibly important for the trucking industry, fleets must be able to stay up and running,” said Don Burnette, co-founder and CEO of Kodiak.
This easy adaptability is crucial for an industry experiencing the dual pressures of high demand for delivery and a low supply of drivers.
E-commerce orders increased nearly 60 percent year-over-year in 2020, according to last-mile technology vendor Convey Inc., with 36 percent of shoppers opting for same-day delivery. At the same time, the trucking industry is experiencing a 92 percent turnover rate — the amount of workers joining or leaving the field in a given year — and the American Trucking Associations estimates it will be short 160,000 drivers by 2028.
This confluence of factors requires an easy solution for trucking companies to adopt while maintaining road safety.
Maps are critical to autonomous driving, helping self-driving vehicles locate themselves in space and plan routes.
Rather than rely on pre-constructed HD maps, which may not be updated in real time to reflect road changes such as construction or new traffic patterns, Kodiak vehicles perceive their environment live while using maps primarily for navigation.
This lightweight mapping strategy requires the vehicle to detect all road objects, signs and more. Such real-time perception requires high-performance, centralized AI compute architected to meet the highest safety standards.
NVIDIA DRIVE Orin achieves over 250 TOPS and is designed to handle the many applications and deep neural networks that run simultaneously in autonomous vehicles, while achieving systematic safety standards such as ISO 26262 ASIL-D.
NVIDIA DRIVE Orin
NVIDIA DRIVE Orin provides the Kodiak Driver with the data and computing power it needs to reliably make and implement decisions — safely and securely.
“NVIDIA DRIVE makes it possible to centralize the vehicle’s compute, helping provide a safe and stable path to full autonomy,” Burnette said.
It’s What’s Not on the Outside That Counts
In keeping with the company’s focus on safety, Kodiak’s autonomous trucks aren’t designed to turn heads.
The fourth-generation trucks feature a modular and discreet sensor suite in just three locations: a slim “center pod” on the front roofline of the truck, and pods integrated into both of the side mirrors. This low-profile sensor placement simplifies installation and maintenance, while increasing safety.
“When you see these trucks, you’re going to ignore them,” Burnette said.
By building this discreet system with the open and scalable NVIDIA DRIVE platform at its core, Kodiak can continue to focus on flexibility and live perception without sacrificing safety and security.
Outscanding: That’s one way to describe the groundbreaking work in AI and medical imaging from researchers at top medical centers and universities.
The arrival of AI in healthcare is arguably nowhere more apparent than in radiology, where machine learning is supporting workflows side by side with caregivers every step of the way.
At St. Jude Children’s Research Hospital, researchers are using AI to study the effects of cancer treatment on brain structures.
Data scientists at the University of California, San Francisco, are helping clinical teams automate parts of their radiology workflow with machine learning.
In Shanghai, radiologists are using AI to help identify bone fractures and multiple diseases from single images.
Learn more about the incredible research and technology advancing medical imaging worldwide below. And register for the next GPU Technology Conference, running online Nov. 8-11, to hear from healthcare experts using AI and accelerated computing around the globe.
St. Jude Researchers Extract Data to Find Cures With NVIDIA DGX
Medulloblastoma is the most common malignant brain tumor diagnosed in children. With extensive therapy, the average survival rate is between 70 and 75 percent. However, the therapy, while treating cancer, can cause other problems such as long-term neurocognitive deficits.
It can also cause short-term brain damage called posterior fossa syndrome, causing problems with language, emotions and movement that can last for weeks or years.
Zhaohua Lu, a biostatistician at St. Jude Children’s Research Hospital, is using neuroimaging data acquired after radiotherapy to compare the neurocognitive outcomes of patient subgroups with different brain structures affected by the cancer treatment.
This research into structural brain changes following medulloblastoma treatment will enable earlier prognosis and more targeted interventions.
“We use neural imaging data to distinguish the patients into subgroups according to their MRI measurements. So the MRI measurement at baseline after radiotherapy is a quite promising predictor for the neurocognitive outcomes 36 months later,” said Lu.
Using an NVIDIA DGX A100 for the study, Lu and the team developed a 3D convolutional autoencoder to extract features from the neuroimaging data. In a future study, the team will investigate potential confounders like age, gender and treatment intensity to build a more rigorous model that further supports their hypothesis.
Driving UCSF Research to Clinical Radiology Application With NVIDIA Clara
As demand for imaging and radiologist efforts increases, clinical teams can benefit from university research by bringing machine learning into every step of the radiology workflow — from data acquisition and inference to review and clinical practice.
One application of this machine learning model is to detect hip fractures in patients for triage. The classification algorithm uses object detection to identify the left and right hips, classify each as fractured or not fractured, and identify whether there is hardware in the hip. When a fracture is detected, the case is accelerated to the top of the list for faster review.
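The triage step can be pictured as a priority queue: cases flagged with a fracture jump ahead of unflagged ones, while arrival order is kept within each group. This is a minimal sketch of that idea, not UCSF's implementation; the class and method names are hypothetical.

```python
import heapq
from itertools import count

class TriageWorklist:
    """Reading worklist where model-flagged cases are reviewed first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def add_case(self, case_id, fracture_detected):
        priority = 0 if fracture_detected else 1  # lower value is reviewed sooner
        heapq.heappush(self._heap, (priority, next(self._arrival), case_id))

    def next_case(self):
        return heapq.heappop(self._heap)[2]

worklist = TriageWorklist()
worklist.add_case("A", fracture_detected=False)
worklist.add_case("B", fracture_detected=True)
worklist.add_case("C", fracture_detected=False)
worklist.add_case("D", fracture_detected=True)
review_order = [worklist.next_case() for _ in range(4)]
```

Here the two flagged cases (B, then D) are reviewed before the unflagged ones (A, then C).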
“We were once discussing the implementation of inserting an algorithm into a clinical workflow and the physician told us that ‘if we could do it in two clicks, that would be too slow.’ It needed to be one click for the physicians to get into the data,” said Beck Olson, a data scientist at the UCSF Center for Intelligent Imaging. “These workflows are razor-thin in terms of efficiency and we need to seamlessly integrate these results.”
The UCSF framework routes the data through radiologists’ existing workflows using the NVIDIA Clara Deploy software development kit. This data then goes to the inference model and quantification is performed using custom operators running on the NVIDIA Clara Train SDK.
The results are then sent into the open-source XNAT platform for clinical review. Clinicians are able to provide feedback to data scientists, who in turn retrain the machine learning model, which can then be used to update the inference model NVIDIA Clara Deploy is using — providing a seamless experience for radiologists.
Modality Manufacturer Uses AI to Identify Multiple Diseases From a Single Image
In trauma patients, multiple rib fractures can indicate how severe their injuries are — a key indicator of potential respiratory failure and overall mortality. Radiologists have to dedicate time and effort to catch these injuries before it’s too late.
Shanghai-based United Imaging Intelligence, a member of the NVIDIA Inception acceleration platform, uses AI to make medical devices more efficient, optimize clinical workflows and facilitate advanced research.
“We believe the proper way to see AI is making it a ‘best friend’ for healthcare professionals and empower them to be better in doing their jobs — but not replace them,” said Terrence Chen, CEO of UII America.
With AI, UII’s technologies can analyze and detect multiple diseases from a single image. From a chest CT scan, its uAI Portal can identify rib fractures, lung nodules and lymphadenopathy, as well as other diseases like pneumonia, COVID-19, esophageal cancer, spinal tumors and breast masses and lesions.
NVIDIA has announced its GTC keynote at which CEO Jensen Huang will take center stage on the 9th of November.
NVIDIA CEO, Jensen Huang, To Address GTC Keynote on 9th November - Will Unveil New AI Tech & Products
According to NVIDIA, the virtual GTC conference will take place from the 8th to the 11th of November. CEO Jensen Huang will deliver the keynote, along with several executives from other companies, on the 9th of November at 9 AM Central European Time. The company has also announced that the event will host a range of new AI technologies and products with a focus on deep learning, data science, high-performance computing, robotics, data center/networking, and graphics.
Following are the major speakers at the NVIDIA GTC event:
Anima Anandkumar, director of ML research at NVIDIA and Bren Professor at Caltech
Alan Aspuru-Guzik, professor of chemistry and computer science, University of Toronto
Alan Bekker, head of conversational AI, Snap
Samy Bengio, senior director of AI and ML research, Apple
Kay Firth-Butterfield, head of AI and ML, World Economic Forum
Axel Gern, CTO, Daimler Trucks
Fei-Fei Li, professor of computer science, Stanford University
Keith Perry, CIO, St. Jude Children’s Research Hospital
Venkatesh Ramanathan, director of data science, PayPal
Ilya Sutskever, co-founder and chief scientist, OpenAI
Tim Sweeney, founder and CEO, Epic Games
Nir Zuk, founder and CTO, Palo Alto Networks
Huang’s keynote will be livestreamed on Nov. 9 at 9 a.m. Central European Time/4 p.m. China Standard Time/12 a.m. Pacific Standard Time, with a rebroadcast at 8 a.m. PST for viewers in the Americas. Registration is free and is not required to view the keynote.
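The listed times can be cross-checked with a short conversion. This sketch uses fixed UTC offsets as they stood on Nov. 9, 2021: UTC+1 for Central European Time, UTC+8 for China, and UTC-8 for the US Pacific coast, where daylight saving time had ended on Nov. 7. The constant names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets in effect on Nov. 9, 2021 (assumed, not looked up from a tz database).
CET = timezone(timedelta(hours=1), "CET")
CHINA = timezone(timedelta(hours=8), "CST")
PACIFIC = timezone(timedelta(hours=-8), "PST")

keynote = datetime(2021, 11, 9, 9, 0, tzinfo=CET)
in_china = keynote.astimezone(CHINA)      # 4 p.m. on Nov. 9
in_pacific = keynote.astimezone(PACIFIC)  # midnight at the start of Nov. 9
```

The midnight Pacific start explains the separate morning rebroadcast for viewers in the Americas.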
More than 200,000 developers, innovators, researchers and creators are expected to register for the event, which will focus on deep learning, data science, high performance computing, robotics, data center/networking and graphics. Speakers share the latest breakthroughs that are transforming some of the world’s largest industries, such as healthcare, transportation, manufacturing, retail and finance.
Leaders from hundreds of other organizations will also present, including Amazon, Arm, AstraZeneca, Baidu, BMW, Domino’s, Electronic Arts, Facebook, Ford, Google, Kroger, Microsoft, MIT, Oak Ridge National Laboratory, Red Hat, Rolls-Royce, Salesforce, Samsung, ServiceNow, Snap, Volvo, Walmart and WPP.
“GTC is a great opportunity for developers and business leaders to learn the latest advances in AI, accelerated computing and computer graphics from the world’s top innovators, scientists and researchers,” said Greg Estes, vice president of Developer Programs at NVIDIA. “Startups, academia and the largest enterprises all come together at GTC, giving attendees a unique opportunity to share ideas and collaborate across boundaries to create the future.”
In recent years, GTC has expanded from high performance computing and graphics to include areas such as cloud and enterprise computing, where AI breakthroughs are often deployed. The keynote and other talks provide corporate and IT leaders the latest on how to configure secure, accelerated data centers that support modern workloads including AI, machine learning and natural language processing.
This event shouldn't be confused with the core GTC that will take place next year on 21st March 2022. With that said, we can expect a range of technologies, not just limited to datacenter, to be unveiled during the keynote.
AMD Announces Ambitious Goal to Increase Energy Efficiency of Processors Running AI Training and High Performance Computing Applications 30x by 2025
September 29, 2021 9:00am EDT
High-performance AMD EPYC™ CPUs and AMD Instinct™ accelerators target delivering unprecedented advance in energy efficiency for Artificial Intelligence training and Supercomputing applications
SANTA CLARA, Calif., Sept. 29, 2021 (GLOBE NEWSWIRE) -- AMD (NASDAQ: AMD) today announced a goal to deliver a 30x increase in energy efficiency for AMD EPYC CPUs and AMD Instinct accelerators in Artificial Intelligence (AI) training and High Performance Computing (HPC) applications running on accelerated compute nodes by 2025.1 Accomplishing this ambitious goal will require AMD to increase the energy efficiency of a compute node at a rate that is more than 2.5x faster than the aggregate industry-wide improvement made during the last five years.2
Accelerated compute nodes are the most powerful and advanced computing systems in the world used for scientific research and large-scale supercomputer simulations. They provide the computing capability used by scientists to achieve breakthroughs across many fields including material sciences, climate predictions, genomics, drug discovery and alternative energy. Accelerated nodes are also integral for training AI neural networks that are currently used for activities including speech recognition, language translation and expert recommendation systems, with similar promising uses over the coming decade. The 30x goal would save billions of kilowatt hours of electricity in 2025, reducing the power required for these systems to complete a single calculation by 97% over five years.
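The 97% figure follows directly from the 30x goal: if a node does 30 times the work per unit of energy, each calculation needs 1/30 of the energy it did before. A minimal Python sketch of that arithmetic (the function name is illustrative, not something from the release):

```python
def energy_reduction(efficiency_gain: float) -> float:
    """Fraction of energy saved per operation for a given efficiency multiple."""
    return 1.0 - 1.0 / efficiency_gain


# A 30x efficiency gain means each operation uses 1/30 of the original energy.
reduction = energy_reduction(30.0)
print(f"Energy per calculation falls by {reduction:.1%}")  # 96.7%, i.e. about 97%
```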
“Achieving gains in processor energy efficiency is a long-term design priority for AMD and we are now setting a new goal for modern compute nodes using our high-performance CPUs and accelerators when applied to AI training and high-performance computing deployments,” said Mark Papermaster, executive vice president and CTO, AMD. “Focused on these very important segments and the value proposition for leading companies to enhance their environmental stewardship, AMD’s 30x goal outpaces industry energy efficiency performance in these areas by 150% compared to the previous five-year time period.”
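The "more than 2.5x faster" wording in the announcement and the "150%" in the quote above are two ways of stating the same ratio. The short Python sketch below shows the conversion. It also derives the industry baseline the numbers imply, assuming (the release does not spell this out) that the 2.5x compares total five-year improvement multiples; the variable names are illustrative.

```python
def percent_faster(ratio: float) -> float:
    """Convert an 'N times as fast' ratio into 'percent faster'."""
    return (ratio - 1.0) * 100.0


goal_multiple = 30.0   # AMD's stated five-year goal
rate_ratio = 2.5       # "more than 2.5x faster" than the industry trend

# Assumption: the ratio compares five-year totals, so the implied
# industry-wide improvement over the same period is the goal divided by it.
implied_baseline = goal_multiple / rate_ratio

print(f"{rate_ratio}x as fast = {percent_faster(rate_ratio):.0f}% faster")
print(f"Implied industry baseline: about {implied_baseline:.0f}x over five years")
```

Because the release says "more than" 2.5x, the implied baseline should be read as an upper bound under this assumption.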
“With computing becoming ubiquitous from edge to core to cloud, AMD has taken a bold position on the energy efficiency of its processors, this time for the accelerated compute for AI and High Performance Computing applications,” said Addison Snell, CEO of Intersect360 Research. “Future gains are more difficult now as the historical advantages that come with Moore’s Law have greatly diminished. A 30-times improvement in energy efficiency in five years will be an impressive technical achievement that will demonstrate the strength of AMD technology and their emphasis on environmental sustainability.”
Increased energy efficiency for accelerated computing applications is part of the company’s new goals in Environmental, Social, Governance (ESG) spanning its operations, supply chain and products. For more than twenty-five years, AMD has been transparently reporting on its environmental stewardship and performance. For its recent achievements in product energy efficiency, AMD was named to Fortune’s Change the World list in 2020 that recognizes outstanding efforts by companies to tackle society’s unmet needs.
In addition to compute node performance/Watt measurements [3], to make the goal particularly relevant to worldwide energy use, AMD uses segment-specific datacenter power utilization effectiveness (PUE) with equipment utilization taken into account. [3] The energy consumption baseline uses the same industry energy per operation improvement rates as from 2015-2020, extrapolated to 2025. The measure of energy per operation improvement in each segment from 2020-2025 is weighted by the projected worldwide volumes [4] multiplied by the Typical Energy Consumption (TEC) of each computing segment to arrive at a meaningful metric of actual energy usage improvement worldwide.
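The weighting described above can be sketched in Python as a weighted average, where each segment's weight is its projected volume multiplied by its TEC. This is one plausible reading of the paragraph, not AMD's published formula, and every number in the example is a made-up placeholder rather than real data.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    projected_units: float  # projected worldwide volume (placeholder value)
    tec_kwh: float          # Typical Energy Consumption per unit (placeholder value)
    improvement: float      # energy-per-operation improvement multiple (placeholder value)


def weighted_improvement(segments: list[Segment]) -> float:
    """Average the per-segment improvement, weighting each by volume x TEC."""
    weights = [s.projected_units * s.tec_kwh for s in segments]
    total_weight = sum(weights)
    return sum(w * s.improvement for w, s in zip(weights, segments)) / total_weight


# Hypothetical segments, chosen only to make the arithmetic easy to follow.
segments = [
    Segment("segment A", projected_units=1000, tec_kwh=2.0, improvement=20.0),
    Segment("segment B", projected_units=500, tec_kwh=4.0, improvement=40.0),
]
print(weighted_improvement(segments))
```

Both hypothetical segments carry the same weight (2,000), so the result sits midway between their individual improvement multiples.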
Dr. Jonathan Koomey, President, Koomey Analytics, said “The energy efficiency goal set by AMD for accelerated compute nodes used for AI training and High Performance Computing fully reflects modern workloads, representative operating behaviors and accurate benchmarking methodology.”
For more than 50 years AMD has driven innovation in high-performance computing, graphics and visualization technologies - the building blocks for gaming, immersive platforms and the datacenter. Hundreds of millions of consumers, leading Fortune 500 businesses and cutting-edge scientific research facilities around the world rely on AMD technology daily to improve how they live, work and play. AMD employees around the world are focused on building great products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, Facebook and Twitter pages.
AMD, the AMD Arrow logo, EPYC, Instinct and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
_________________________________
[1] Includes AMD high performance CPU and GPU accelerators used for AI training and High-Performance Computing in a 4-Accelerator, CPU hosted configuration. Goal calculations are based on performance scores as measured by standard performance metrics (HPC: Linpack DGEMM kernel FLOPS with 4k matrix size. AI training: lower precision training-focused floating point math GEMM kernels such as FP16 or BF16 FLOPS operating on 4k matrices) divided by the rated power consumption of a representative accelerated compute node including the CPU host + memory, and 4 GPU accelerators.
[2] Based on 2015-2020 industry trends in energy efficiency gains and data center energy consumption in 2025.
[3] The CPU socket and GPU node power consumptions are based on segment-specific utilization (active vs. idle) percentages then multiplied by PUE to determine actual energy use for calculation of the performance per Watt.
[4] Total 2025 Server CPUs - 18.8 Mu (IDC - Q1 2021 Tracker), Total HPC CPUs – 3.3Mu (Hyperion- Q4 2020 Tracker), Total 2025 HPC GPUs 624k (Hyperion HPC Market Analysis, April ’21)
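The footnotes describe performance per Watt as a benchmark score divided by node power, with the power figure adjusted for utilization (active vs. idle) and then multiplied by PUE. A rough Python sketch of that calculation follows. The function names, the simple active/idle blend and all input values are assumptions for illustration; AMD has not published its exact formula or inputs here.

```python
def effective_power_watts(active_w: float, idle_w: float,
                          active_fraction: float, pue: float) -> float:
    """Blend active and idle power by utilization, then scale by datacenter PUE."""
    average_w = active_fraction * active_w + (1.0 - active_fraction) * idle_w
    return average_w * pue


def perf_per_watt(flops: float, active_w: float, idle_w: float,
                  active_fraction: float, pue: float) -> float:
    """Benchmark FLOPS divided by the utilization- and PUE-adjusted node power."""
    return flops / effective_power_watts(active_w, idle_w, active_fraction, pue)


# Hypothetical node: 2,000 W active, 500 W idle, busy half the time, PUE of 1.5.
power = effective_power_watts(2000.0, 500.0, active_fraction=0.5, pue=1.5)
score = perf_per_watt(3.75e15, 2000.0, 500.0, active_fraction=0.5, pue=1.5)
print(f"{power:.0f} W effective, {score:.2e} FLOPS per Watt")
```

A higher PUE (a less efficient datacenter) raises the effective power and lowers the performance per Watt, which is why the footnotes treat it as part of the metric.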
My comment: Thanks for an interesting article. Growth from new AI ventures is definitely the future, but I would think autonomous vehicles is the #1 bet, with the Metaverse (Omniverse) an interesting contender and the Isaac robotics simulation platform another strong performer. Cheers, Frank