To: FUBHO who wrote (31156), 7/21/2019 9:31:43 AM
From: Vattila
Is BFloat16 something that belongs in an accelerator or in a general-purpose core?

It is obviously useful for ML, but how useful is it across domains? One interesting part of the article is that Intel, rather than implementing BFloat16 in hardware to evaluate it, simulated it with AVX-512, with "only a very slight performance tax". However, according to the article, "Intel will be supporting the format in both its general purpose Xeon line and its purpose-built NNP processor".

I am sceptical of the idea that a balanced general-purpose core should be encumbered by BFloat16 (and AVX-512, for that matter) unless the inclusion can be shown to have greater benefits than executing such code on a GPGPU or a dedicated accelerator.

FP16 vs BFloat16:
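
For anyone who wants to poke at the difference, here is a rough Python sketch of the two layouts. FP16 is 1 sign + 5 exponent + 10 mantissa bits, while BFloat16 is 1 sign + 8 exponent + 7 mantissa bits, i.e. the top half of an IEEE f32. The helper names are mine, and the BFloat16 conversion here simply truncates, which is only one of several possible rounding choices.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE f32 encoding of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_bfloat16_bits(x: float) -> int:
    """Keep only the top 16 bits of the f32 encoding (truncation, an assumption)."""
    return float32_bits(x) >> 16

def bfloat16_bits_to_float(bits: int) -> float:
    """Widen a 16-bit BFloat16 pattern back to f32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def to_fp16_bits(x: float) -> int:
    """Return the IEEE half-precision encoding of x (raises OverflowError if too large)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def split_fields(bits: int, exp_bits: int, man_bits: int):
    """Split an encoding into (sign, biased exponent, mantissa) fields."""
    sign = bits >> (exp_bits + man_bits)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    return sign, exponent, mantissa

if __name__ == "__main__":
    # Same value, two layouts: FP16 uses (5, 10), BFloat16 uses (8, 7).
    print(split_fields(to_fp16_bits(1.0), 5, 10))
    print(split_fields(to_bfloat16_bits(1.0), 8, 7))
    # 1e5 is out of range for FP16 but fits in BFloat16 with reduced precision.
    print(bfloat16_bits_to_float(to_bfloat16_bits(1e5)))
```

The trade-off shows up directly: BFloat16 keeps the f32 exponent range, so large values survive, but it has three fewer mantissa bits than FP16.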

By the way, BFloat16 support was added in AMD's ROCm 2.6 (rocBLAS/Tensile), released earlier this month:

"Radeon ROCm 2.6 brings various information reporting improvements, the first official release of rocThrust and hipCUB, MIGraphX 0.3 for reading models frozen from Tensorflow, MIOpen 2.0 with Bfloat16 support and other features, BFloat 16 for rocBLAS/Tensible, AMD Infinity Fabric Link support, RCCL2 support, rocFFT improvements, ROCm SMI fixes, and other enhancements."

However, there is no hardware support for BFloat16 in Radeon yet:

"Added mixed precision bfloat16/IEEE f32 to gemm_ex. The input and output matrices are bfloat16. All arithmetic is in IEEE f32."
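
To make that scheme concrete, here is a minimal pure-Python sketch of what "bfloat16 in and out, all arithmetic in f32" could look like. This is not rocBLAS code: the function names are hypothetical, round-to-nearest-even is my assumption for the final conversion, and NaN handling and transposes/scaling factors are left out.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest IEEE f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def round_to_bfloat16(x: float) -> float:
    """Round x to a BFloat16-representable value (nearest-even, assumed; NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add half of the dropped range, plus the lowest kept bit to break ties to even.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def gemm_bf16_emulated(a, b):
    """Multiply matrices a (m x k) and b (k x n) given as lists of rows.

    Inputs are first rounded to BFloat16, products and sums are kept in f32,
    and only the final result is rounded back to BFloat16.
    """
    a = [[round_to_bfloat16(v) for v in row] for row in a]
    b = [[round_to_bfloat16(v) for v in row] for row in b]
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # Each step is rounded to f32 to mimic single-precision arithmetic.
                acc = to_f32(acc + to_f32(a[i][p] * b[p][j]))
            c[i][j] = round_to_bfloat16(acc)
    return c

if __name__ == "__main__":
    print(gemm_bf16_emulated([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

The point of accumulating in f32 is that precision is lost only once, at the final conversion, rather than at every multiply-add.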

It would be interesting to see how simulating BFloat16 with RDNA would compare in suitability and performance to what Intel did to evaluate the format with AVX-512.