
Technology Stocks: NVIDIA Corporation (NVDA)

From: Frank Sully, 4/20/2021 2:44:27 PM

AI Funding Spree: +$300m for Groq, +$676m for SambaNova
by Dr. Ian Cutress on April 19, 2021 7:00 AM EST

The growth of AI has seen a resurgence in venture capital funding for silicon start-ups. Designing AI silicon for machine learning, both for training and inference, has become hot property in Silicon Valley, especially as machine learning compute and memory requirements coalesce into tangible targets for this silicon to go after. A number of these companies are already shipping high-performance processors to customers, and are looking for further funding to help support those customers, expand the customer base, and develop next-generation products until the company becomes profitable or is acquired. The two latest funding rounds for AI silicon were announced this past week.

Groq (Series C, $300m, Tensor Streaming Processor Q100)

When Groq’s first product came onto the scene, detailed by the Microprocessor Report back in January 2020, it was described as the first PetaOP processor that eschewed traditional many-core designs and instead implemented a single VLIW-like core with hundreds of functional units. In this method, the data is subject to instruction flow, rather than instructions being reliant on data flow, saving the synchronization and decode overhead that many-core processors require.

The end result is a product that implements 400,000 multiply-accumulate units, but the key marketing metric is deterministic performance. Using this single-core methodology, Groq’s Q100 TSP will take the same time to run a given inference workload every time, without any quality-of-service requirements. CEO Jonathan Ross told us that Groq’s TSP enables workloads that were previously unusable due to long-tail quality-of-service performance degradation (i.e. worst-case results take too long). This is especially important in analysis that requires batch size 1, such as video.

The Groq ecosystem also means that distribution across many TSPs simply scales out inferences per second, with multiple Q100 parts under the same algorithm all implementing the same deterministic performance.

Jonathan stated to us, as the company has stated in the past, that Groq was built on a compiler-first approach. Historically this sort of approach (as with Itanium and other VLIW processors) puts a lot of pressure on the compiler to do the optimization, and often leads to concerns about the product as a whole. However, we were told that the team never touched any silicon design until six months into the software and compiler work, allowing the company to lock down the key aspects of the major ML frameworks before even designing the silicon.

As part of its funding efforts, Groq reached out to us for a company update. All of Groq’s hardware and software work to date has been achieved through two rounds of VC funding, totaling $67.3m, with about $50m used so far. With that capital the company has designed, built, and deployed the Q100 TSP to almost a dozen customers, including in the audio/visual industry, datacenters, and government labs. The second-generation product is also well underway. This latest Series C funding round of $300m, led by Tiger Global Management and D1 Capital, will allow the company to expand from 120 people to 250 by the end of the year, support current and future customers with bigger teams, and enable a progressive roadmap.

Groq stated in our briefing that its second-generation product will build on its unique design points, offering alternatives for customers that were interested in Q100 but have other requirements for their workloads. Each generation of Groq’s TSP, according to the company, will have half a dozen unique selling points in the market (some public, some not), with at least one goal being to displace as many GPUs as possible with a single TSP in order to give customers the best TCO.

SambaNova (Series D, $676m, Cardinal AI)

The second company this week is SambaNova, whose Series D funding is a staggering $676 million, led by SoftBank’s Vision Fund 2, with new investors Temasek and GIC joining existing backers such as BlackRock, Intel Capital, GV (formerly Google Ventures) and others. To date SambaNova has raised over $1.1 billion in investment, enabling a $5 billion valuation.

SambaNova’s entry into the AI silicon space is with its Cardinal AI processor. Rather than focusing on machine learning inference workloads, such as trying to identify animals with a known algorithm, the Cardinal AI processor is one of the few dedicated implementations to provide peak training performance. Training is a substantially harder problem than inference, especially as training algorithms are constantly changing and requirements for the biggest datasets are seemingly ever increasing.

The Cardinal AI processor has already featured on AnandTech, when SambaNova announced its eight-socket solution known as the ‘DataScale SN10-8R’. In a quarter rack design, an EPYC Rome x86 system is paired with eight Cardinal processors backed by 12 terabytes of DDR4-3200 memory, and SambaNova can scale this to a half-rack or full-rack solution. Each Cardinal AI processor has 1.5 TB of DDR4, with six memory channels for 153 GB/s bandwidth per processor. Within each eight socket configuration, the chips are connected in an all-to-all fashion with 64x PCIe 4.0 lanes to dedicated switching network silicon (like an NVSwitch) for 128 GB/s in each direction to all other processors. The protocol being used over PCIe is custom to SambaNova. The switches also enable system-to-system connectivity that allows SambaNova to scale as required. SambaNova is quoting that a dual-rack solution will outperform an equivalent DGX-A100 deployment by 40% and will be at a much lower power, or enable companies to coalesce a 16-rack 1024 V100 deployment into a single quarter-rack DataScale system.
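The memory and interconnect figures quoted above can be sanity-checked with some back-of-envelope arithmetic. The sketch below is not from SambaNova; it assumes a standard 64-bit (8-byte) DDR4 channel and the standard PCIe 4.0 signalling rate of 16 GT/s per lane with 128b/130b encoding, none of which the article states explicitly.

```python
# Back-of-envelope check of the DataScale SN10-8R figures quoted in the article.
# Assumptions (not stated in the article): each DDR4 channel is 64 bits (8 bytes)
# wide, and PCIe 4.0 runs at 16 GT/s per lane with 128b/130b line encoding.

# DDR4-3200: 3200 mega-transfers per second, 8 bytes per transfer, six channels.
DDR4_TRANSFERS_PER_SEC = 3200e6
BYTES_PER_TRANSFER = 8
CHANNELS_PER_PROCESSOR = 6
ddr_bandwidth_gb_s = (
    DDR4_TRANSFERS_PER_SEC * BYTES_PER_TRANSFER * CHANNELS_PER_PROCESSOR / 1e9
)

# Eight processors, each with 1.5 TB of DDR4.
PROCESSORS = 8
TB_PER_PROCESSOR = 1.5
total_memory_tb = PROCESSORS * TB_PER_PROCESSOR

# 64 PCIe 4.0 lanes, one direction, after encoding overhead.
PCIE4_BITS_PER_SEC_PER_LANE = 16e9
ENCODING_EFFICIENCY = 128 / 130
LANES = 64
pcie_gb_s = PCIE4_BITS_PER_SEC_PER_LANE * ENCODING_EFFICIENCY / 8 * LANES / 1e9

print(f"DDR4 bandwidth per processor: {ddr_bandwidth_gb_s:.1f} GB/s")
print(f"Total DDR4 across {PROCESSORS} processors: {total_memory_tb:.1f} TB")
print(f"PCIe 4.0 x{LANES}, per direction: {pcie_gb_s:.1f} GB/s")
```

Under these assumptions the DDR4 figure comes to 153.6 GB/s and the total memory to 12 TB, matching the article. The PCIe figure lands at about 126 GB/s per direction, which is consistent with the quoted 128 GB/s if that number uses the nominal 2 GB/s per lane without encoding overhead.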

SambaNova’s customers are looking for a mix of private and public cloud options, and as a result the flagship offering is a Dataflow-as-a-Service product line that gives customers a subscription model for AI initiatives without purchasing the hardware outright. These subscription systems can be deployed on the subscribing company’s own premises and managed remotely by SambaNova. The company claims that TensorFlow or PyTorch workloads can be rebuilt using SambaNova’s compiler in less than an hour.

SambaNova has not given many more details on its architecture as yet; however, the company does state that it can enable AI training that requires large image datasets (50000x50000 pixel images, for example) for astronomy, oil-and-gas, or medical imaging, which often requires losing resolution/accuracy on other platforms. The Cardinal AI processor can also perform in-the-loop training, allowing for model reclassification and optimization of inference-with-training workloads on the fly by enabling a heterogeneous zerocopy-style solution. GPUs instead have to memory dump and/or kernel switch, which can be a significant part of any utilization analysis.

The company has now been through four rounds of funding:

  • Series A, $56m, led by Walden International and Google Ventures
  • Series B, $150m, led by Intel Capital
  • Series C, $250m, led by BlackRock
  • Series D, $676m, led by SoftBank

This puts SambaNova almost at the top of AI chip funding with $1132m, just behind Horizon Robotics ($1600m), but ahead of Graphcore ($460m), Groq ($367m), Nuvia ($293m, acquired by Qualcomm), Cambricon ($200m), and Cerebras ($112m).
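The totals and ranking above follow directly from the round sizes quoted in the article. The short sketch below only re-adds those quoted numbers (in millions of US dollars); the variable names are illustrative, and the figures for companies other than SambaNova and Groq are taken as stated rather than derived.

```python
# Re-adding the funding figures quoted in the article (millions of US dollars).
sambanova_rounds = {
    "Series A": 56,
    "Series B": 150,
    "Series C": 250,
    "Series D": 676,
}
sambanova_total = sum(sambanova_rounds.values())

# Groq: two earlier rounds totaling $67.3m, plus the new $300m Series C.
groq_total = 67.3 + 300

# Totals for the other companies are the article's figures, not derived here.
totals = {
    "Horizon Robotics": 1600,
    "SambaNova": sambanova_total,
    "Graphcore": 460,
    "Groq": round(groq_total),
    "Nuvia": 293,
    "Cambricon": 200,
    "Cerebras": 112,
}

# Sort companies from most to least funding.
ranking = sorted(totals, key=totals.get, reverse=True)
for place, company in enumerate(ranking, start=1):
    print(f"{place}. {company}: ${totals[company]}m")
```

The four SambaNova rounds sum to $1132m and Groq's to about $367m, so the ordering given in the article holds for the figures it quotes.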



From: Frank Sully, 4/22/2021 11:25:47 AM
Behind NVIDIA’s Megatron


The team performed training iterations on models with a trillion parameters at 502 petaFLOP/s on 3072 GPUs by combining three techniques.
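For a sense of scale, the aggregate figure can be converted to an average per-GPU throughput. This is a rough sketch that simply divides the quoted total evenly across the quoted GPU count; it assumes 1 petaFLOP/s = 1000 teraFLOP/s and says nothing about how the work was actually distributed.

```python
# Average per-GPU throughput implied by the quoted aggregate numbers.
TOTAL_PETAFLOPS = 502   # aggregate throughput quoted in the post
GPU_COUNT = 3072        # number of GPUs quoted in the post

# Convert petaFLOP/s to teraFLOP/s and divide evenly across GPUs.
per_gpu_teraflops = TOTAL_PETAFLOPS * 1000 / GPU_COUNT

print(f"Average throughput: {per_gpu_teraflops:.1f} teraFLOP/s per GPU")
```

That works out to roughly 163 teraFLOP/s per GPU on average.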


From: Frank Sully, 4/25/2021 2:36:34 AM
What Is Quantum Computing?


From: Frank Sully, 4/27/2021 10:01:16 PM
Arm Marks Data Center Foray, Intensifying Competition With Intel, AMD: Reuters

Anusuya Lahiri, Benzinga Staff Writer
April 27, 2021 2:19pm
  • British chip technology firm Arm Ltd, which NVIDIA Corporation is acquiring in a $40 billion deal, elaborated on its next generation of data center technology, citing Oracle Corp (NYSE: ORCL) and Alibaba Group Holding Ltd (NYSE: BABA) as customers, Reuters reports.
  • Qualcomm Inc (NASDAQ: QCOM) and Apple Inc (NASDAQ: AAPL) licensed Arm’s underlying intellectual property to create their processor chips.
  • Arm’s entry into the data center processor market will intensify competition with Intel Corp (NASDAQ: INTC) and Advanced Micro Devices Inc (NASDAQ: AMD).
  • Intel competitor NVIDIA Corp (NASDAQ: NVDA) targeted the $40 billion Arm acquisition to capitalize on the latter’s data center push, stated analysts.
  • Arm’s N2 Neoverse computing cores were estimated to be about 40% faster than the previous generation. The artificial intelligence-focused V1 cores were estimated to be 50% faster than traditional methods.
  • Marvell Technology Inc (NASDAQ: MRVL) was making a chip using Arm’s new technology. Oracle will use Arm-based chips produced by Ampere Computing for its cloud computing business. Alibaba was set to offer Arm-based cloud computing services via an unnamed chip vendor.
  • Price action: NVDA shares traded lower by 0.58% at $615.51 on the last check Tuesday.


From: Frank Sully, 4/28/2021 5:52:45 PM
How AI Improves Radar Perception for Autonomous Vehicles


From: Frank Sully, 4/28/2021 6:12:53 PM
AMD vs Nvidia: Which GPUs Are Best for Ray Tracing?


From: Frank Sully, 4/29/2021 3:18:59 PM
Why Cloudflare and NVIDIA Are Bringing AI to the Edge


From: Frank Sully, 5/1/2021 8:36:24 AM
Why Investors Shouldn't Underestimate NVIDIA


From: Frank Sully, 5/1/2021 8:44:53 AM
Nvidia Is Not Just a Graphics Chip Company Anymore

Four-minute video.


From: Frank Sully, 5/2/2021 6:51:04 PM
NVIDIA "I Am AI" Video
