Technology Stocks: AMD, ARMH, INTC, NVDA

From: neolib 2/15/2012 11:57:58 AM
SUNNYVALE, CA -- (Marketwire) -- 02/15/12 -- AMD (NYSE: AMD) today announced the arrival of its AMD Radeon™ HD 7770 GHz Edition and HD 7750 graphics cards. The AMD Radeon HD 7770 GHz Edition is the first graphics card equipped with a reference engine clock that breaks the one gigahertz barrier -- making it the world's first 1GHz GPU. When coupled with AMD's Graphics Core Next (GCN) Architecture, the AMD Radeon HD 7770 GHz Edition offers incredible, best-in-class entertainment experiences that every gamer deserves. The AMD Radeon HD 7750 is a superior performance-level graphics card that doesn't require its own separate power connector and provides exceptional gaming experiences under 75 watts.



From: FUBHO 2/15/2012 1:24:48 PM
Japanese University Boots Up 800-Teraflop GPU Supercomputer

Michael Feldman, February 14, 2012

Japan's newest supercomputer, an 802-teraflop GPU-accelerated Appro cluster, went into production last week at the University of Tsukuba, just north of Tokyo. The machine represents the lynchpin of the university's HA-PACS project, a three-year effort that will attempt to push the envelope on GPU-pumped supercomputing.

HA-PACS, which stands for Highly Accelerated Parallel Advanced system for Computational Sciences, is just the latest in a series of "PACS" systems at Tsukuba. The original system, known as PACS-9, was installed in 1978 and delivered 7 kiloflops (yes, kiloflops!). Every two to four years thereafter, the university's Center for Computational Sciences upgraded to a new system. The last one, PACS-CS, was deployed in 2006 and topped out at 14.3 teraflops.

The new Appro cluster represents the 8th generation supercomputer at Tsukuba and is the first to be accelerated by GPUs. As you might suspect, the vast majority of the 802 teraflops is provided by the graphics units, in this case based on the latest NVIDIA Tesla GPU part, the M2090. Each cluster node pairs four of them with two 8-core Xeon E5 ("Sandy Bridge") CPUs from Intel.

In aggregate, the 268-node HA-PACS machine will house 1072 GPUs and 536 CPUs, as well as a total of 34 terabytes of memory on the CPU side and an additional 6.4 terabytes for the GPUs. External storage amounts to just over half a petabyte, based on DataDirect Networks' SFA10000 gear. As a result of the high computational density afforded by the graphics chips, the entire cluster fits into just 26 racks and draws a little over 400 kW of power.

Using top-of-the-line CPUs and GPUs makes for a dense and powerful cluster, with each node delivering just shy of 3 teraflops of peak performance. And even though most of the flops are GPU-derived (665 teraflops per M2090), each Xeon E5 chips in with a respectable 166 teraflops, thanks to the addition of the new Advanced Vector Extensions (AVX) instructions.
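The per-chip figures above are almost certainly gigaflops rather than teraflops: 665 GFLOPS is the M2090's double-precision peak, and roughly 166 GFLOPS matches an 8-core Xeon E5 with AVX. Under that assumption, the node and system totals check out:

```python
# Peak-flops sanity check for HA-PACS, assuming the per-chip
# figures are gigaflops (665 GF per M2090, 166 GF per Xeon E5).
GPU_GF = 665
CPU_GF = 166
GPUS_PER_NODE = 4
CPUS_PER_NODE = 2
NODES = 268

node_tf = (GPUS_PER_NODE * GPU_GF + CPUS_PER_NODE * CPU_GF) / 1000
system_tf = NODES * node_tf

print(f"per node: {node_tf:.3f} TF")   # just shy of 3 TF per node
print(f"system:   {system_tf:.0f} TF") # ~802 TF, matching the article
```

The per-node result lands at 2.992 TF and the system total at about 802 TF, so the quoted aggregates are consistent once the units are read as gigaflops.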

This is Appro's second big system deployment at Tsukuba, having delivered the 95-teraflop T2K Open Supercomputer there in 2009. That machine used AMD's quad-core Opterons and no GPUs.

Appro, by the way, is one of the few server vendors offering systems equipped with Xeon E5 CPUs these days, and already claims four such systems on the TOP500 list: "Zin" (961 teraflops) at Lawrence Livermore National Lab, "Luna" (293 teraflops) at Los Alamos National Lab, "Gordon" (262 teraflops) at the San Diego Supercomputer Center and "Chama" at Sandia National Labs. That's a nice accomplishment, considering Intel has yet to officially release the E5 chips into the wild.

CPUs aside, the main focus for HA-PACS is to draw the most performance from the GPU hardware. The project has a two-pronged mission in this regard: to bring more big science codes to the GPU and to develop a tightly coupled parallel computing acceleration mechanism in order to "further optimize the utility of the graphics hardware."

On the application side, HA-PACS will be porting codes to the GPU in the areas of subatomic particles, life sciences, astrophysics, nuclear physics and environmental science. For example, astrophysics applications that deal with radiation transfer can take advantage of ray tracing methods, which modern GPUs are tailor-made for. Likewise, for elementary particle physics, GPUs can be used to great advantage to accelerate dense matrix computations.

On the computational research side, the HA-PACS team is in the process of developing custom hardware to support direct communications between the GPUs. The idea is to enable the graphics processors to quickly shuffle data between themselves without the overhead involved in going through the CPU.

This custom hardware, known as the Tightly Coupled Accelerator (TCA), will be distinct from the HA-PACS base cluster from Appro, but will eventually be integrated with it, says Taisuke Boku, deputy director of the Center for Computational Sciences at the University of Tsukuba. According to him, TCA will use PCIe as a communication channel between the GPUs and employ FPGA technology to facilitate this.

The FPGA will be based on an existing implementation developed at Tsukuba called PEACH, which stands for PCI Express Adaptive Communication Hub. The idea is to provide a controller that enables PCIe devices to directly communicate with one another on a peer-to-peer basis, rather than as slave devices.

To make this work for TCA, an upgraded implementation of the FPGA, known as PEACH2, will be developed. It will incorporate NVIDIA's GPU-Direct communication protocols to facilitate data transfers between the Tesla parts. Bandwidth will also be improved from the original PEACH version, which used four ports of PCIe Gen2 x4 as the communication link. For PEACH2, four ports of PCIe Gen2 x8 will be supported, doubling throughput.
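The doubling follows directly from the link widths. A PCIe Gen2 lane signals at 5 GT/s with 8b/10b encoding, which works out to 0.5 GB/s of payload per lane, per direction. A quick back-of-the-envelope check, using the port counts from the article:

```python
# PCIe Gen2: 5 GT/s per lane, 8b/10b encoding -> 4 Gb/s, i.e.
# 0.5 GB/s of payload per lane, per direction.
GB_PER_LANE = 5.0 * (8 / 10) / 8   # 0.5 GB/s

peach1_gbs = 4 * 4 * GB_PER_LANE   # PEACH:  four Gen2 x4 ports ->  8 GB/s aggregate
peach2_gbs = 4 * 8 * GB_PER_LANE   # PEACH2: four Gen2 x8 ports -> 16 GB/s aggregate

print(peach1_gbs, peach2_gbs, peach2_gbs / peach1_gbs)
```

Going from x4 to x8 per port takes the aggregate from 8 GB/s to 16 GB/s, i.e. exactly the 2x throughput the article describes.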

The first prototype of the TCA is under development now. The plan is to incorporate the technology into a second cluster, which will be glued to the Appro base cluster by early 2013. The TCA cluster will add an additional 200-plus teraflops, bringing the integrated HA-PACS system to over a petaflop.

The HA-PACS work will be a precursor to the future exascale systems already in the minds of Boku and his team at Tsukuba. He believes future exascale systems will require some level of accelerated computing technology due to its inherent advantages in performance and energy efficiency.

"The largest issue on the accelerated computing is how to fill the gap between its powerful internal computation performance and relatively poor external communication performance," says Boku. "In some applications, we may need a paradigm shift toward a new generation of algorithms. HA-PACS will be the testbed for developing these algorithms."


To: FUBHO who wrote (4938), 2/15/2012 2:05:47 PM
From: neolib
And even though most of the flops are GPU-derived (665 teraflops per M2090), each Xeon E5 chips in with a respectable 166 teraflops, thanks to the addition of the new Advanced Vector Extensions (AVX) instructions.

I'd like a PC with just one of those M2090s and one E5, but I suspect either (a) that is the total for all the GPUs and all the CPUs, which would be >800 TF by a bit, so more than the machine is rated at, or (b) there needs to be a decimal point in there somewhere...

More interesting is that there is roughly 5x the memory hanging off the CPU side vs the GPU side, which is nearly the inverse of the compute ratio. I wonder if this is typical, and why? I suppose the CPU side might largely handle data flow to the GPUs, so the local GPU memory doesn't need to be as large?
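The ratio falls straight out of the totals quoted in the article, and the GPU-side figure turns out not to be a design choice at all: divided across the 1072 cards, it comes to the M2090's fixed 6 GB of on-board GDDR5. A quick check (the 128 GB/node reading of the CPU figure is my inference, not from the article):

```python
# Memory totals from the HA-PACS article.
CPU_MEM_TB = 34.0
GPU_MEM_TB = 6.4
NODES = 268
GPUS = 1072

print(f"CPU:GPU memory ratio: {CPU_MEM_TB / GPU_MEM_TB:.1f}x")          # ~5.3x
print(f"CPU memory per node:  {CPU_MEM_TB * 1000 / NODES:.0f} GB")      # ~127 GB (likely a 128 GB config)
print(f"memory per GPU:       {GPU_MEM_TB * 1000 / GPUS:.0f} GB")       # 6 GB, the M2090's capacity
```

So the GPU-side total is simply set by the card's fixed memory, while the CPU-side total reflects however many DIMMs the hosts were configured with.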


To: neolib who wrote (4939), 2/15/2012 3:34:22 PM
From: THE WATSONYOUTH
now we have reports of everything from bad to great
TSMC's 28nm is great, says Xilinx

David Manners

Wednesday 15 February 2012 11:59

Despite reports of low yields on TSMC's 28nm process, Xilinx says it is having the fastest new-node product roll-out in its history, having shipped four of its five 28nm product families in 11 months.

That is half the time it took to roll out initial devices in two product families at 40nm.

TSMC says that the 28nm ramp is 3x faster than the 40nm ramp.

Though 40nm was, of course, a dog of a node.

Xilinx is using TSMC’s HPL process. Its rival Altera is using TSMC’s LP and HP processes.

TSMC's wireless clients are using the LP process.

Xilinx says it has taped out more than 10 devices and expects to be sampling all members of its 28nm portfolio by mid-2012.

‘TSMC's HPL process has given Xilinx a tremendous time-to-market advantage and is providing yields in line with the company's aggressive roadmap and better than previous generations,’ states Xilinx.


To: THE WATSONYOUTH who wrote (4940), 2/15/2012 3:42:02 PM
From: neolib
Somewhere down the middle I guess. Spin on both sides.


From: neolib 2/15/2012 3:46:27 PM
EDN has an editorial on ARM in automotive based on data from Semicast back in December. It appears ARM is the leading 32-bit embedded CPU in automotive, and it's used in a wide range of automotive applications. They claim 3 ARM CPUs on average in every light vehicle produced worldwide in 2011, growing to 7 in 2016. One area not mentioned is powertrain, where I think PPC still dominates.


To: THE WATSONYOUTH who wrote (4940), 2/15/2012 11:04:49 PM
From: Toro Caca
watson, so now we have reports of everything from bad to great
TSMC's 28nm is great, says Xilinx

Do you think that Xilinx products are like APUs, where designs are designs? Might there not be some interaction between design and process that leads to yield problems?

El Toro


To: Toro Caca who wrote (4943), 2/15/2012 11:15:58 PM
From: neolib
Mainly, I'd think Xilinx's 28nm parts are much bigger than AMD's APUs and GPUs from TSMC, so one would expect much worse yields?


From: neolib 2/15/2012 11:24:28 PM
A good read on the subject is here:

It actually claims that the FPGA and CPU/GPU dies are about the same size, but Xilinx chose the HPL process vs the more standard HP process used by the CPU/GPU designs. Details as to why are in the above link.


From: neolib 2/15/2012 11:44:52 PM
AMD gets a reprieve:

Intel to postpone mass shipments of Ivy Bridge processors
Monica Chen, Taipei; Joseph Tsai, DIGITIMES [Thursday 16 February 2012]
Intel recently notified its partners about plans to postpone mass shipments of its upcoming Ivy Bridge processors. Although the company will still announce the new products and ship a small volume of the processors in early April, mass shipments are not expected to occur until after June, according to sources from notebook players.

Most first-tier notebook vendors are having trouble digesting their Sandy Bridge notebook inventories due to the weak global economy, and Intel is likewise burdened by its own Sandy Bridge processor inventory, so the CPU giant plans to delay mass shipments of the new processors to minimize the impact, the sources noted.

With Intel changing its launch schedule, notebook vendors have all started adjusting their plans for new Ivy Bridge models. However, the notebook vendors still believe the PC replacement trend is unlikely to start until after September, when Microsoft launches Windows 8, and that the first three quarters of 2012 will still be a dark period for the notebook industry.

However, Intel's decision to slow down its Ivy Bridge processor launch will benefit USB 3.0 chipmakers such as Renesas, ASMedia and Etron, allowing them to earn an extra quarter of sales, according to sources at the chipmakers. The sources pointed out that the share of third-party USB 3.0 chipmakers in the notebook market had been expected to drop to only around 20% in 2012, but with Intel's delay, their share is expected to climb back to 50%.

