Technology Stocks: Mellanox Technologies, Ltd.

From: PaulAquino 7/9/2017 12:10:01 PM
Deep Learning in the Spotlight at ISC

July 9, 2017 by staff

At ISC High Performance 2017, held in Frankfurt, Germany, deep learning is driving new computing innovation as the HPC industry sets its sights on AI hardware and applications.

This year the conference dedicated an entire day, Wednesday, June 21, to deep learning and the recent advances in artificial intelligence built on it. However, it was not just the conference sessions where deep learning dominated the conversation: the exhibition show floor hosted many new products dedicated to optimizing HPC hardware for deep learning and AI workloads.

Cray announced the Cray Urika-XC analytics software suite, which aims to deliver analytics tools, specifically targeting analytics and deep learning, to the company’s line of Cray XC supercomputers.

Nvidia launched its PCIe-based Volta V100 GPU. The company also demonstrated the use of its GPU technology in combination with deep learning as part of the Human Brain Project.

HPE launched new server solutions aimed specifically at HPC and AI workloads while Mellanox highlighted its work to fine tune technology for AI and deep learning applications. Mellanox announced that deep learning frameworks such as TensorFlow, Caffe2, Microsoft Cognitive Toolkit, and Baidu PaddlePaddle can now leverage the company’s smart offloading capabilities.

Shifting paradigms

The Cray Urika-XC solution is a set of applications and tools optimised to run seamlessly on the Cray XC supercomputing platform. In basic terms the company is taking the toolset it has developed through the Urika GX platform, optimising it for deep learning and then applying the software and toolsets to its XC series of supercomputers.

The software package comprises the Cray Graph Engine; the Apache Spark analytics environment; the BigDL distributed deep learning framework for Spark; the distributed Dask parallel computing libraries for analytics; and widely used languages for analytics including Python, Scala, Java, and R.

The Cray Urika-XC analytics software suite highlights the convergence of traditional HPC and data-intensive computing – such as deep learning – as core workloads for supercomputing systems in the coming years.

As the data volumes in HPC grow, the industry is responding by moving away from the previous FLOPS-centric model to a more data-centric model. This requires innovation not only in parallel processing, network, and storage performance but also in the software and tools used to process the vast quantities of data needed to train deep learning networks.

While deep learning is not the only trigger for this new model, it exemplifies the changing paradigm of architectural design in HPC.
One example of this is the Swiss National Supercomputing Centre (CSCS) in Lugano, Switzerland, which currently uses the Cray Urika-XC solution on Piz Daint, a system that, after its recent upgrade, is now one of the fastest supercomputers in the world.

“CSCS has been responding to the increased needs for data analytics tools and services,” said Professor Thomas Schulthess, director of the Swiss National Supercomputing Centre (CSCS). “We were very fortunate to participate with our Cray supercomputer Piz Daint in the early evaluation phase of the Cray Urika-XC environment. Initial performance results and scaling experiments using a subset of applications including Apache Spark and Python have been very promising. We look forward to exploring future extensions of the Cray Urika-XC analytics software suite.”

Also this week at ISC, Nvidia announced the PCI Express version of its latest Tesla GPU accelerator, the Volta-based V100. The SXM2 form factor card was first announced earlier this year at the company’s GPU Technology Conference (GTC), but users can now use the more traditional PCIe slot to connect the Volta-based GPU.

It is not just hardware in the spotlight, however, as the company also highlighted some of the latest research that is making use of these technologies, such as the Human Brain Project. Created in 2013 by the European Commission, the project’s aims include gathering, organizing and disseminating data describing the brain and its diseases, and simulating the brain itself.

Scientists at the Jülich Research Center (Forschungszentrum Jülich), in Germany, are developing a 3D multi-modal model of the human brain. They do this by analyzing thousands of ultrathin histological brain slices using microscopes and advanced image analysis methods — and then reconstructing these slices into a 3D computer model.

Analyzing and registering high-resolution 2D image data into a 3D reconstruction is both data- and compute-intensive. To process this data as fast as possible, the Jülich researchers are using Jülich’s JURON supercomputer – one of two pilot systems delivered by IBM and NVIDIA to the Jülich Research Center.

The JURON cluster is composed of 18 IBM Minsky servers, each with four Tesla P100 GPU accelerators with NVIDIA NVLink interconnect technology.

Deep learning drives product innovation across the industry

Hewlett Packard Enterprise was also keen to get in on the AI action as the company launched the HPE Apollo 10 Series.
HPE Apollo 10 Series is a new platform, optimised for entry level Deep Learning and AI applications. The HPE Apollo sx40 System is a 1U dual socket Intel Xeon Gen10 server with support for up to 4 NVIDIA Tesla SXM2 GPUs with NVLink. The HPE Apollo pc40 System is a 1U dual socket Intel Xeon Gen10 server with support for up to 4 PCIe GPU cards.

‘Today, customer’s HPC requirements go beyond superior performance and efficiency,’ said Bill Mannel, vice president and general manager, HPC and AI solutions, Hewlett Packard Enterprise. ‘They are also increasingly considering security, agility and cost control. With today’s announcements, we are addressing these considerations and delivering optimised systems, infrastructure management, and services capabilities that provide A New Compute Experience.’

Collaboration to drive AI performance

Mellanox announced that it is optimizing its existing technology to help accelerate deep learning performance. The company announced that deep learning frameworks such as TensorFlow, Caffe2, Microsoft Cognitive Toolkit, and Baidu PaddlePaddle can now leverage Mellanox’ smart offloading capabilities to increase performance and, the company claims, provide near-linear scaling across multiple AI servers.

The Mellanox announcement highlights the work of the company to ensure its products can meet the requirements of users running deep learning workloads but it also demonstrates Mellanox’ willingness to work with partners, such as Nvidia, to further increase performance and integration of their technologies.

“Advanced deep neural networks depend upon the capabilities of smart interconnect to scale to multiple nodes, and move data as fast as possible, which speeds up algorithms and reduces training time,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “By leveraging Mellanox technology and solutions, clusters of machines are now able to learn at a speed, accuracy, and scale, that push the boundaries of the most demanding cognitive computing applications.”

One of the key points of this announcement is that Mellanox is working with partners to ensure that deep learning frameworks and hardware (such as Nvidia GPUs) are compatible with Mellanox interconnect fabric to help promote the use of Mellanox networking solutions to AI/deep learning users.

More information was provided by Duncan Poole, director of platform alliances at NVIDIA: “Developers of deep learning applications can take advantage of optimized frameworks and NVIDIA’s upcoming NCCL 2.0 library which implements native support for InfiniBand verbs and automatically selects GPUDirect RDMA for multi-node or NVIDIA NVLink when available for intra-node communications.”

From: PaulAquino 7/10/2017 2:12:06 PM
Intel has 'most to lose' from 'tectonic shift in computing,' Jefferies says in downgrading stock

Jefferies downgraded Intel from hold to underperform on Monday, saying the chipmaker has the "most to lose" in the "4th tectonic shift in computing."

Jefferies said it is downgrading Intel because "its Xeon/Xeon PHI platform is disadvantaged vs NVidia in emerging parallel workloads like deep neural networking."

Jefferies calls out several areas of concern for Intel, including Microsoft's new Windows support for ARM processors and the rapid 200 percent year-over-year growth of Nvidia's data-center business.

Nvidia has been one of the market's hottest stocks recently; SoftBank Group bought a $4 billion stake in Nvidia in May. Shares of Nvidia are up 28 percent this year.

In a separate note on the semiconductor sector, Jefferies says it sees a major "tectonic shift" in the industry that will favor parallel computing platforms already used by AMD, Nvidia, Cavium and Xilinx.

Earlier tectonic shifts noted by Jefferies included the mainframe era in the 1950s; the minicomputer era in the 1970s; the personal computer era in the 1980s and 1990s; the cellphone/server era in the 2000s and the parallel processing IoT era we're just now entering.

"NVDA was the first to recognize and successfully invest in a HW/SW platform (GPU/CUDA) targeted specifically at parallel processing applications, and our field checks suggest it is years ahead of its competition," Jefferies said, referring to Nvidia's strategy to take advantage of computing power from graphics processing units versus standard processors.

Jefferies reset its 2018 price target for Intel to $29 from $38. Shares of Intel were at $33.30 in Monday's premarket, down 1.7 percent.

From: PaulAquino 7/10/2017 2:47:46 PM
1 Recommendation
Ethernet Getting Back On The Moore’s Law Track

July 10, 2017 Timothy Prickett Morgan

It would be ideal if we lived in a universe where it was possible to increase the capacity of compute, storage, and networking at the same pace so as to keep all three elements expanding in balance. The irony is that over the past two decades, when the industry needed for networking to advance the most, Ethernet got a little stuck in the mud.

But Ethernet has pulled out of its boots, left them in the swamp, and is back to being barefoot again on much more solid ground where it can run faster. The move from 10 Gb/sec to 40 Gb/sec was slow and costly, and if it were not for the hyperscalers and their intense bandwidth hunger we might not even be at 100 Gb/sec Layer 2 and Layer 3 switching, much less standing at the transition to 200 Gb/sec and looking ahead to the not-too-distant future when 400 Gb/sec will be available.

Bandwidth has come along just at the right moment, when advances in CPU throughput are stalling as raw core performance did a decade ago and as new adjunct processing capabilities, embodied in GPUs, FPGAs, and various kinds of specialized processors, are coming to market to get compute back on the Moore’s Law track. Storage, thanks to flash and persistent flash-like and DRAM-like memories such as 3D XPoint from Intel and Micron Technology, is also undergoing an evolution. It is a fun time to be a system architect, but perhaps only because we know that with these advanced networking options bandwidth is not going to be a bottleneck.

The innovation that is allowing Ethernet to not leap ahead so much as jump to where it should have already been is PAM-4 signaling. The typical non-return to zero, or NRZ, modulation used with Ethernet switching hardware, cabling, and server adapters can encode one bit on a signal. With pulse amplitude modulation, or PAM, multiple levels of signaling can be encoded, so multiple bits can be encoded in the signal. With PAM-4, there are four levels of signaling which allow for two bits of data to be encoded at the same time on the signal, which doubles the effective bandwidth of a signal without increasing the clock rate. And looking ahead down the road, there is a possibility of stuffing even more bits in the wire using higher levels of PAM, and the whiteboards of the networking world are sketching out how to do three bits per signal with PAM-8 encoding and four bits per signal with PAM-16 encoding.
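The bits-per-symbol arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names and the Gray-coded mapping of bit pairs to amplitude levels are assumptions made for the sketch, not something taken from the article or from Mellanox documentation.

```python
import math

def bits_per_symbol(levels: int) -> int:
    """Bits carried by one symbol that can take `levels` distinct amplitudes."""
    bits = int(math.log2(levels))
    if 2 ** bits != levels:
        raise ValueError("levels must be a power of two")
    return bits

# Illustrative Gray-coded mapping of bit pairs to four amplitude levels
# (an assumption for this sketch; adjacent levels differ by a single bit).
PAM4_LEVELS = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}

def pam4_encode(bits):
    """Pack a flat list of bits into PAM-4 symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM-4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

if __name__ == "__main__":
    for name, levels in [("NRZ", 2), ("PAM-4", 4), ("PAM-8", 8), ("PAM-16", 16)]:
        print(f"{name}: {bits_per_symbol(levels)} bit(s) per symbol")
    data = [1, 0, 0, 1, 1, 1, 0, 0]
    symbols = pam4_encode(data)
    print(len(data), "bits ->", len(symbols), "symbols:", symbols)
```

Eight bits become four symbols, which is why the same symbol rate carries twice the data.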

With 40 Gb/sec Ethernet, we originally had 10 Gb/sec lanes aggregated. This was not a very energy efficient way to do 40 Gb/sec, and it was even worse for early 100 Gb/sec Ethernet aggregation gear, which ganged up ten 10 Gb/sec lanes. When the hyperscalers nudged the industry along in July 2014 to backcast this 25 GHz (well, really 28 GHz before encoding) to 25 Gb/sec and 50 Gb/sec Ethernet switching with backwards compatibility to run 10 Gb/sec and 40 Gb/sec, the industry did it. So we got to affordable 100 Gb/sec switching with four lanes running at 25 Gb/sec, and there were even cheaper 25 Gb/sec and 50 Gb/sec options for situations where bandwidth needs were not as high, and at a much better cost. (Generally, you got 2.5X the bandwidth for 1.5X to 1.8X the cost, depending on the switch configuration.)

With the 200 Gb/sec Spectrum-2 Ethernet switching that Mellanox Technologies is rolling out, and that other switch makers are going to adopt, the signaling is still running at 25 GHz effective, but with the Spectrum-2 gear Mellanox has just unveiled, it is layering on PAM-4 modulation to double pump the wires, so it delivers 50 Gb/sec per lane even though it is still running at the same speed as 100 Gb/sec Ethernet lanes. And to reach 400 Gb/sec with Spectrum-2 gear, Mellanox is planning to widen out to eight lanes running at this 25 GHz (effective) while layering on PAM-4 modulation to get 50 Gb/sec effective per lane. At some point, the lane speed will have to increase to 50 GHz, but with PAM-8 modulation the switching at eight lanes could be doubled again to 800 Gb/sec, and with PAM-16 you could hit 1.6 Tb/sec. Adding in the 50 GHz real signaling here would get us to 3.2 Tb/sec – something that still probably seems like a dream and that is probably also very far into the future.
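As a rough check on the lane arithmetic for the generations already shipping or announced, port speed is simply lanes times effective signaling rate times bits per symbol. The sketch below is a simplification under stated assumptions: it uses the effective 10 GHz and 25 GHz rates quoted in the text, ignores encoding overhead, and the function name is hypothetical.

```python
def port_gbps(lanes: int, effective_ghz: float, bits_per_symbol: int) -> float:
    """Port bandwidth in Gb/sec, ignoring encoding overhead (a simplification)."""
    return lanes * effective_ghz * bits_per_symbol

# (label, lanes, effective signaling rate in GHz, bits per symbol)
GENERATIONS = [
    ("40 Gb/sec, 4 x 10G NRZ", 4, 10, 1),
    ("early 100 Gb/sec, 10 x 10G NRZ", 10, 10, 1),
    ("100 Gb/sec, 4 x 25G NRZ", 4, 25, 1),
    ("200 Gb/sec, 4 lanes with PAM-4", 4, 25, 2),
    ("400 Gb/sec, 8 lanes with PAM-4", 8, 25, 2),
]

if __name__ == "__main__":
    for label, lanes, ghz, bits in GENERATIONS:
        per_lane = ghz * bits
        print(f"{label}: {per_lane} Gb/sec per lane, "
              f"{port_gbps(lanes, ghz, bits)} Gb/sec per port")
```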

This all sounds a lot easier in theory than it will be to actually engineer, Kevin Deierling, vice president of marketing at Mellanox, tells The Next Platform. “You can go to PAM-8 and you can go to PAM-16, but when you do that, you are starting to shrink the signal and it gets harder and harder to discriminate from one level in the signal and the next. Your signal-to-noise ratio goes away because you are shrinking your signal. Some folks are saying let’s go to PAM-8 modulation, and other folks are saying that they need to use faster signaling rates like 50 GHz. I think we will see a combination of both.”
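Deierling's point about the shrinking signal can be put in rough numbers. The sketch below assumes an idealized link: equally spaced levels, the same peak-to-peak voltage swing as NRZ, and unchanged noise. Under those assumptions the gap between adjacent levels is 1/(M-1) of the full swing for M levels, which corresponds to a penalty of 20·log10(M-1) dB relative to NRZ. Real links use equalization and forward error correction, so treat these as back-of-the-envelope figures, not measurements.

```python
import math

def eye_height_fraction(levels: int) -> float:
    """Spacing between adjacent levels as a fraction of the full swing."""
    return 1.0 / (levels - 1)

def penalty_db(levels: int) -> float:
    """Idealized amplitude penalty versus two-level NRZ, in dB."""
    return 20 * math.log10(levels - 1)

if __name__ == "__main__":
    for name, levels in [("NRZ", 2), ("PAM-4", 4), ("PAM-8", 8), ("PAM-16", 16)]:
        print(f"{name}: level spacing {eye_height_fraction(levels):.3f} of swing, "
              f"penalty {penalty_db(levels):.1f} dB")
```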

The sweet thing about using PAM-4 to get to 200 Gb/sec switching is that the same SFP28 and QSFP28 adapters and cables that were used for 100 Gb/sec switching (and that are used for the 200 Gb/sec Quantum HDR InfiniBand that was launched by Mellanox last year and that will start shipping later this year) are used for the doubled up Ethernet speed bump. You need better copper cables for Spectrum-2 because the signal-to-noise ratio is shrinking, and similarly the optical transceivers need to be tweaked for the same reason. But the form factors for the adapters and switch ports remain the same.

With the 400 Gb/sec Spectrum-2 switching, the adapters have new wider form factors, with Mellanox supporting the QSFP-DD (short for double density) option instead of the OSFP (short for Octal Small Form-factor Pluggable) option for optical ports. Deierling says Mellanox will let the market decide and support whatever it wants – one, the other, or both – but it is starting with QSFP-DD.

The Spectrum-2 ASIC can deliver 6.4 Tb/sec of aggregate switching bandwidth, and it can be carved up in a bunch of ways, including 16 ports at 400 Gb/sec, 32 ports at 200 Gb/sec, 64 ports at 100 Gb/sec (using splitter cables), and 128 ports running at 25 Gb/sec or 50 Gb/sec (again, using splitter cables). The Spectrum-2 chip can handle up to 9.52 billion packets per second, and has enough on-chip SRAM to handle access control lists (ACLs) that span up to 512,000 entries; with one of the 200 Gb/sec ports and a special FPGA accelerator that is designed to act as an interface to a chunk of external DRAM next to the chip, the Spectrum-2 can handle up to 2 million additional routes on the ACL. Deierling says this makes it the first internet-scale Ethernet switch based on a commodity ASIC that is suitable for hyperscaler-class customers who want to do Layer 3 routing on a box at datacenter scale.
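The port carvings and the packet rate can be sanity-checked with simple arithmetic. In the sketch below, the link between 6.4 Tb/sec and 9.52 billion packets per second is an assumption: it supposes the figure refers to minimum-size 64-byte Ethernet frames plus the standard 20 bytes of preamble and inter-frame gap on the wire. The article does not say how the number was derived, and the names used here are hypothetical.

```python
ASIC_GBPS = 6400  # 6.4 Tb/sec aggregate, as quoted for Spectrum-2

# port speed in Gb/sec -> port count, as listed in the text
PORT_CONFIGS = {400: 16, 200: 32, 100: 64, 50: 128, 25: 128}

def config_gbps(speed_gbps: int, ports: int) -> int:
    """Total bandwidth consumed by one way of carving up the ASIC."""
    return speed_gbps * ports

def max_packets_per_sec(gbps: float, frame_bytes: int = 64,
                        overhead_bytes: int = 20) -> float:
    """Line-rate packet count, assuming minimum-size frames plus
    preamble and inter-frame gap (an assumption, see lead-in)."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return gbps * 1e9 / bits_on_wire

if __name__ == "__main__":
    for speed, ports in PORT_CONFIGS.items():
        total = config_gbps(speed, ports)
        print(f"{ports} x {speed} Gb/sec = {total} Gb/sec "
              f"({100 * total / ASIC_GBPS:.0f}% of the ASIC)")
    print(f"{max_packets_per_sec(ASIC_GBPS) / 1e9:.2f} billion packets/sec")
```

Note that 128 ports at 25 Gb/sec use only half of the aggregate bandwidth; the port count, not the bandwidth, is the limit in that configuration.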

As for latency, which is something that everyone is always concerned with, the port-to-port hop on the Spectrum-2 switch is around 300 nanoseconds, and this is about as low as the Ethernet protocol, which imposes a lot of overhead, can go, according to Deierling. The SwitchX-2 and Quantum InfiniBand ASICs from Mellanox can push latencies down to 100 nanoseconds or a tiny bit lower, but that is where InfiniBand hits a wall.

At any rate, Mellanox reckons that Spectrum-2 has the advantage in switching capacity, with somewhere between 1.6X and 1.8X the aggregate switching bandwidth compared to its competition – and without packet loss – and somewhere on the order of 1.5X to 1.7X lower latency, too.

At the moment, Mellanox is peddling four different configurations of its Spectrum-2 switches, which are shown below:

The Spectrum-2 switches are being made available in two different form factors, two full width devices and two half width devices. The SN3700 has a straight 32 ports running at 200 Gb/sec for flat, Clos style networks, while the SN3410 has 48 ports running at 50 Gb/sec with eight uplinks running at 200 Gb/sec for more standard three tiered networks used in the enterprise and sometimes on the edges of the datacenter at hyperscalers. The SN3100 is a half-width switch that has 16 ports running at 200 Gb/sec, and the SN3200 has 16 ports running at 400 Gb/sec.

It is interesting that there is not a full width SN series switch with 400 Gb/sec ports. This is intentionally so and based on the expected deployment scenarios. In scenarios where a very high bandwidth switch is needed to create a storage cluster or a hyperconverged storage platform, 16 ports in a rack is enough and two switches at 16 ports provides redundant paths between compute and storage or hyperconverged compute-storage nodes to prevent outages.

There is even a scenario in which the VMS Wizard software for the Spectrum-2 switches combines a quad of the 200 Gb/sec and 400 Gb/sec switches with 64 of the SN3410 devices to create a virtual modular switch that can support up to 3,072 ports in a single management domain. Take a look:

This Virtual Modular Switch is about 25 percent less expensive than actual modular switches with the same port count, which also have lower bandwidth and higher latency.
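The port counts quoted above for the four switch models and the virtual modular switch can be tallied as follows. This is a sketch built only from the figures in the text; the oversubscription ratio for the SN3410 is derived here (downlink bandwidth divided by uplink bandwidth) rather than quoted by Mellanox, and the data structure is an assumption for illustration.

```python
# model -> list of (port count, Gb/sec per port), from the text
MODELS = {
    "SN3700": [(32, 200)],
    "SN3410": [(48, 50), (8, 200)],  # 48 downlinks plus 8 uplinks
    "SN3100": [(16, 200)],
    "SN3200": [(16, 400)],
}

def total_gbps(port_groups) -> int:
    """Sum of port bandwidth across all port groups on a switch."""
    return sum(count * speed for count, speed in port_groups)

def oversubscription(downlinks, uplinks) -> float:
    """Downlink bandwidth divided by uplink bandwidth."""
    return (downlinks[0] * downlinks[1]) / (uplinks[0] * uplinks[1])

def vms_ports(leaf_switches: int, downlinks_per_leaf: int) -> int:
    """Edge ports in a virtual modular switch built from leaf switches."""
    return leaf_switches * downlinks_per_leaf

if __name__ == "__main__":
    for model, groups in MODELS.items():
        print(f"{model}: {total_gbps(groups) / 1000:.1f} Tb/sec of port bandwidth")
    down, up = MODELS["SN3410"]
    print(f"SN3410 oversubscription: {oversubscription(down, up):.1f}:1")
    print(f"VMS with 64 SN3410 leaves: {vms_ports(64, 48)} ports")
```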

Programmability is a big issue with networking these days, and the Spectrum-2 devices will be fully programmable and support both a homegrown compiler and scripting stack created by Mellanox as well as the P4 compiler that was created by Barefoot Networks for its “Tofino” Ethernet switch ASICs and that is being standardized upon by some hyperscalers. Mellanox expects hyperscalers to want to do a lot of their own programming, but expects that most enterprise customers will simply run the protocols and routines that Mellanox itself codes for the machines. The point is, when a new protocol or extension comes along, Spectrum-2 will be able to adopt it and customers will not have to wait until new silicon comes out. The industry waited far too long for VXLAN to be supported in chips, and that will not happen again.

As for pricing, the more bandwidth you get, the more you pay, but the cost per bit keeps coming down and will for the 200 Gb/sec and 400 Gb/sec speeds embodied in the Spectrum-2 lineup. Pricing depends on volumes and on the cabling, of course, but here is how it generally looks. With the jump from 40 Gb/sec to 100 Gb/sec switching (based on the 25G standard), customers got a 2.5X bandwidth boost for somewhere between 1.5X and 1.8X the price – somewhere around a 20 percent to 30 percent price/performance benefit. Today, almost two years later, 100 Gb/sec ports are at price parity with 40 Gb/sec ports back then, and Deierling says that a 100 Gb/sec port costs around $300 for a very high volume hyperscaler and something like $600 per port for a typical enterprise customer. The jump to 200 Gb/sec will follow a similar pattern. Customers moving from 100 Gb/sec to 200 Gb/sec switches (moving from Spectrum to Spectrum-2 devices in the Mellanox lineup) will get 2X the bandwidth for 1.5X the cost. Similarly, those jumping from 100 Gb/sec to 400 Gb/sec will get 4X the bandwidth per port for 3X the cost.
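The cost-per-bit effect of the 200 Gb/sec and 400 Gb/sec multiples is easy to work out. In the sketch below, the bandwidth and cost multiples and the $300 and $600 per-port figures come from the text; applying the multiples to those 100 Gb/sec port prices to get implied 200 Gb/sec and 400 Gb/sec port prices is an assumption made for illustration, not pricing quoted by Mellanox.

```python
def relative_cost_per_bit(bandwidth_multiple: float, cost_multiple: float) -> float:
    """Cost per bit relative to the starting generation (1.0 = no change)."""
    return cost_multiple / bandwidth_multiple

# target speed -> (bandwidth multiple, cost multiple) versus 100 Gb/sec
UPGRADES = {"200 Gb/sec": (2.0, 1.5), "400 Gb/sec": (4.0, 3.0)}

# 100 Gb/sec per-port prices quoted by Deierling, in dollars
PORT_PRICE_100G = {"hyperscaler": 300, "enterprise": 600}

if __name__ == "__main__":
    for speed, (bw, cost) in UPGRADES.items():
        ratio = relative_cost_per_bit(bw, cost)
        print(f"{speed}: {ratio:.2f}x the cost per bit "
              f"({(1 - ratio) * 100:.0f}% lower)")
        for buyer, price in PORT_PRICE_100G.items():
            # Implied price, assuming the multiple applies to today's port price
            print(f"  implied {buyer} port price: ${price * cost:.0f}")
```

Both jumps land at the same place: three quarters of the cost per bit of a 100 Gb/sec port.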

Over time, we expect that there will be price parity between 100 Gb/sec pricing today and 200 Gb/sec pricing, perhaps two years hence, and that the premium for 400 Gb/sec will be more like 50 percent than 100 percent. But those are just guesses. A lot depends on what happens in the enterprise. What we do know is that enterprises are increasingly being forced by their applications and the latency demands of their end user applications to deploy the kind of fat tree networks that are common at HPC centers and hyperscalers and they are moving away from the over-subscribed, tiered networks of the past where they could skimp on the switch devices and hope the latencies were not too bad.

From: PaulAquino 7/11/2017 9:16:32 AM
OpenPower, Efficiency Tweaks Define Europe’s DAVIDE Supercomputer

July 11, 2017 Jeffrey Burt

When talking about the future of supercomputers and high-performance computing, the focus tends to fall on the ongoing and high-profile competition between the United States with its slowly eroding place as the kingpin in the industry and China and the tens of billions of dollars that the government has invested in recent years to rapidly expand the reach of the country’s tech community and the use of home-grown technologies in massive new systems.

Both trends were on display at the recent International Supercomputing Conference in Frankfurt, Germany, where China not only continued to hold the top two spots on the Top500 list of the world’s fastest supercomputers with the Sunway TaihuLight and Tianhe-2 systems, but a performance boost in the Cray-based Piz Daint supercomputer in Switzerland pushed it into the number-three spot, marking only the second time in 24 years – and the first time since November 1996 – that a U.S.-based supercomputer has not held one of the top three spots.

That said, the United States had five of the top 10 fastest systems on the list, and still has the most – 169 – of the top 500, with China trailing in second at 160. But as we’ve noted here at The Next Platform, the United States’ dominance of the HPC space is no longer assured, and the combination of China’s aggressive efforts in the field and worries about what a Trump administration may mean to funding of U.S. initiatives has fueled speculation of what the future holds and has garnered much of the attention.

However, not to be overlooked, Europe – as illustrated by the rise of the Piz Daint system, which saw its performance double with the addition of more Nvidia Tesla P100 GPU accelerators – is making its own case as a significant player in the HPC arena. For example, Germany houses 28 of the supercomputers in the latest Top500 list released last month, with France and the United Kingdom each housing 17. In total, Europe houses 105 of the Top500 systems, good for 21 percent of the market and third behind the United States and China.

Something that didn’t generate a lot of attention at the show was the introduction of the DAVIDE (Development for an Added Value Infrastructure Designed in Europe) system onto the list, coming in at 299 with a performance of 654.2 teraflops and a peak performance of more than 991 teraflops. Built by Italian vendor E4 Computer Engineering, DAVIDE came out of a multi-year Pre-Commercial Procurement (PCP) project of the Partnership for Advanced Computing in Europe (PRACE), a nonprofit association headquartered in Brussels. PRACE in 2014 kicked off the first phase of its project to fund the development of a highly power-efficient HPC system design.

PCP is a process in Europe in which different vendors compete through multiple phases of development in a project. A procurer like PRACE gets multiple vendors involved in the initial stage of the program, and the vendors then compete through phases – solution design, prototyping, original development, and validation and testing of first projects. Through each evaluation phase, the number of competing vendors is reduced. The idea is to have the procurers share risks and benefits of innovation with the vendors. Over the course of almost three years, the number of system vendors for this program was whittled down from four – E4, Bull, Megaware and Maxeler Technologies – with E4 last year being awarded the contract to build its system.

The goal was to build an HPC system that can run highly parallelized, memory-intensive workloads such as weather forecasting, machine learning, genomic sequencing and computational fluid dynamics, the type of applications that are becoming more commonplace in HPC environments. At the same time, power efficiency also was key. With a peak performance of more than 991 teraflops, the 45-node cluster consumes less than 2 kilowatts per node, according to E4.

E4 already offers systems powered by Intel’s x86 processors, ARM-based chip and GPUs, but for DAVIDE, the company opted for IBM’s OpenPower Foundation, an open hardware development community that IBM officials launched with partners like Nvidia and Google in 2014 to extend the reach of the vendor’s Power architecture beyond the core data center and into new growth areas like hyperscale environments and emerging workloads – including machine learning, artificial intelligence and virtual reality – while cutting into Intel’s dominant share of the server chip market. E4 already is a member of the foundation and builds other systems running on Power.

Each 2U node of the cluster is based on the OpenPower systems design codenamed “Minsky” and runs on two 3.62GHz Power8+ eight-core processors and four Nvidia Tesla P100 SXM2 GPUs based on the company’s “Pascal” architecture and aimed at such modern workloads as artificial intelligence and climate predictions. The GPUs are hooked into the CPUs via Nvidia’s NVLink high-speed interconnect, and the nodes use Mellanox Technologies’ EDR 100 Gb/s InfiniBand interconnects as well as 1 Gigabit Ethernet networking. Each node offers a maximum performance of 22 TFlops. In total, the cluster runs on 10,800 Power8+ cores, with 11,520GB of memory and SSD SATA and NVMe drives for storage. It runs the CentOS Linux operating system.
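The headline DAVIDE figures hang together, as a quick calculation shows. The sketch below uses only numbers quoted in the article (45 nodes, 22 TFlops per node, 654.2 teraflops measured, 991 teraflops peak, 11,520GB of memory, under 2 kilowatts per node); the variable and function names are hypothetical, and the power figure is an upper bound implied by the per-node limit rather than a measured value.

```python
NODES = 45
NODE_PEAK_TFLOPS = 22        # maximum per-node performance quoted by E4
MEASURED_TFLOPS = 654.2      # Top500 measured performance
PEAK_TFLOPS = 991            # quoted peak ("more than 991 teraflops")
MEMORY_GB = 11520
MAX_KW_PER_NODE = 2          # "less than 2 kilowatts per node"

def summed_node_peak() -> int:
    """Peak performance implied by the per-node figure."""
    return NODES * NODE_PEAK_TFLOPS

def measured_efficiency() -> float:
    """Measured performance as a fraction of quoted peak."""
    return MEASURED_TFLOPS / PEAK_TFLOPS

def memory_per_node_gb() -> float:
    return MEMORY_GB / NODES

def node_power_upper_bound_kw() -> int:
    """Upper bound on total node power implied by the per-node limit."""
    return NODES * MAX_KW_PER_NODE

if __name__ == "__main__":
    print(f"Summed node peak: {summed_node_peak()} TFlops")
    print(f"Measured / peak: {measured_efficiency():.0%}")
    print(f"Memory per node: {memory_per_node_gb():.0f} GB")
    print(f"Node power: under {node_power_upper_bound_kw()} kW")
```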

The Minsky nodes are designed to be air-cooled, but to increase the power efficiency of the system, E4 is leveraging technology from CoolIT Systems that uses direct hot-water cooling – at between 35 and 40 degrees Celsius – for the CPUs and GPUs, with cooling capacity of 40 kW. Each rack includes an independent liquid-liquid or liquid-air heat exchanger unit with redundant pumps, and the compute nodes are connected to the heat exchanger via pipes and a side bar for water distribution. E4 officials estimate that the CoolIT technology can extract about 80 percent of heat generated by the nodes.

The chassis is custom-built and based on the OpenRack form factor.

Power efficiency is further enhanced by software developed in conjunction with the University of Bologna that enables fine-grain measuring and monitoring of power consumption by the nodes and the system as a whole through data collected from components like the CPUs, GPUs, memory components and fans. There also is the ability to cap the amount of power used and schedule tasks based on the amount of power being consumed and to profile the power an application uses, according to the vendor. A dedicated power monitor interface – based on the BeagleBone Black Board, an open-source development platform – enables frequent direct sampling from the power backplane and integrates with the system-level power management software.

DAVIDE takes advantage of other IBM technologies, including IBM XL compilers, ESSL math library and Spectrum MPI, and APIs enable developers to tune the cluster’s performance and power consumption.

E4 currently is making DAVIDE available to some users for such jobs as porting applications and profiling energy consumption.

From: PaulAquino 7/11/2017 12:27:54 PM
1 Recommendation
Mellanox InfiniBand and Ethernet Solutions Accelerate New Intel® Xeon® Scalable Processor-Based Platforms for High Return on Investment

Provides Improved Performance, Efficiency and Scalability for High-Performance, Cloud, Hyperscale, Artificial Intelligence, Storage Applications and More

July 11, 2017 12:15 PM Eastern Daylight Time

SUNNYVALE, Calif. & YOKNEAM, Israel--(BUSINESS WIRE)--Mellanox® Technologies, Ltd. (NASDAQ:MLNX), a leading supplier of high-performance, end-to-end smart interconnect solutions for data center servers and storage systems, today announced that its InfiniBand and Ethernet smart interconnect solutions deliver improved performance, efficiency and scalability for new Intel® Xeon® Scalable processor-based compute and storage platforms. In order to make the most of the increase in processor performance and the increased number of processor cores, and to maximize the data center return on investment, faster and more intelligent interconnect is needed. Mellanox Ethernet solutions help users migrate from 10 and 40 gigabit networks to 25, 50 and 100 gigabit speeds and, soon, 200 and 400 gigabit speeds, which support the new data analysis capabilities in these new platforms. Mellanox InfiniBand solutions provide In-Network Computing acceleration engines to enhance the Intel® Xeon® Scalable processor usage and overall applications productivity. Leveraging Mellanox technology, IT managers and data center users can enjoy an optimized out-of-the-box experience, along with backwards and future compatibility, thereby protecting their investment.

“The combination of Mellanox In-Network Computing technology, the highest data speeds and other acceleration engines, in concert with Intel’s new Intel® Xeon® Scalable processor-based platforms, enables data center users to enjoy new performance and scalability levels in the best possible way,” said Gilad Shainer, vice president of marketing, Mellanox Technologies. “We have been testing a variety of applications and have demonstrated an increase of 25 to 35 percent in overall system performance with Intel® Xeon® Scalable processors, compared to Intel’s previous-generation processor. We are happy to work with Intel and our partners to ensure that our Ethernet and InfiniBand solutions are fully optimized and ready to provide a world-leading performance and out-of-the-box experience for Intel® Xeon® Scalable processor-based compute and storage systems.”


The entire Mellanox InfiniBand and Ethernet solution portfolio is currently available for the new Intel® Xeon® Scalable platforms.

To: PaulAquino who wrote (863) 7/11/2017 1:46:56 PM
From: brokendreams
Wait, so the products still work moving to the next x86 platform and there is more application performance because of more cores, speeds and feeds? Wow, imagine that. Just like every other instance.

Ok, here's the real news. Skylake is finally "out" and getting into people's hands. All those deals that Mellanox is waiting to fill due to the delay can now begin filling. I'm expecting them to at least reach the low-end of their guidance for q2 while projecting a nice increase in q3 and maybe a larger pop in q4.

Also, IBTA made an announcement today and one goody inside was that "HDR InfiniBand" gear would be testing in October.

From: brokendreams 7/11/2017 4:05:19 PM
This is the big ?

In yet another notable step forward, the new Intel Xeon Scalable platform includes processors that integrate Intel Omni-Path Architecture (Intel OPA). Intel OPA provides 100Gbps bandwidth with low latency for HPC clusters. With this integration, users can take advantage of a fabric built for HPC without consuming I/O ports and slots, freeing those up for other platform uses.

In addition:
Provides up to 4x10GbE high speed Ethernet capability for high data throughput and low latency workloads, reducing total system cost, power consumption and network latency. Ideal for software-defined storage solutions, NVM Express* over fabric solutions and virtual machine migrations.

This is all integrated in the new Xeon Scalable Processors

From: PaulAquino 7/12/2017 8:17:37 AM
1 Recommendation
Mellanox Spectrum™-2 Ethernet Switch

Product Brief

From: brokendreams 7/17/2017 11:07:30 AM
1 Recommendation
One thing to keep in mind. There is always a bottleneck somewhere.

If you look at the 400 Gb/sec Ethernet products coming from Innovium and Mellanox Technologies, it would be nice if PCI-Express 5.0 was here next year instead of two years from now. Switching is getting back in synch with compute, but the PCI bus is lagging, and that is not a good thing considering how many things that are moving very fast now hang off of it.

From: daytraderdude 7/26/2017 10:52:52 AM
How bad will it be this time?
