HUNTSVILLE, Ala., Feb. 21 — Abaco Systems today announced the innovative SWE540 6U OpenVPX 40 Gigabit Ethernet switch. It provides high-speed Ethernet connectivity for Abaco’s latest generation of high performance computing solutions, such as the SBC627 single board computer and the recently announced DSP282A digital signal processor. This enables the company to deliver significant performance upgrades for its growing range of rapidly deployable, complete, pre-integrated mission ready systems. These solutions target high performance embedded computing (HPEC) applications that require the transfer of large amounts of data with the lowest possible latency, such as radar, surveillance, situational awareness and imaging.
The rugged SWE540 is uniquely powerful and flexible: it is the only 6U OpenVPX 40GigE switch currently available that supports full Layer 2/3 features, including hardware Layer 3 forwarding at fabric-speed rates. Layer 3 switching and routing are critical requirements for advanced security and for complex networks. The switch provides dynamic routing over standard routing protocols, enabling a flexible range of network/fabric configurations and applications.
Superior patent-pending cooling technology limits system thermal load while still enabling the SWE540 to run at peak performance, further enhancing its robust reliability.
The SWE540 supports multiple OpenVPX profiles and uses the latest high performance switch silicon technology to support 40 Gigabit Ethernet performance across 20 ports, allowing configuration within demanding HPEC systems and a broad range of Abaco’s mission ready systems.
It also provides a straightforward, cost-effective upgrade for existing users of Abaco’s GBX460 switch, maximizing the long-term value of customers’ investments while enabling a significant performance increase for those looking to achieve faster transfers of large amounts of data with lower latencies.
The switch features Abaco’s latest OpenWare switch management software, allowing it to be easily customized for specific customer requirements. OpenWare provides support for a wide range of network protocols and MIBs (management information bases) with extensive capabilities for multicast, Quality of Service, VLANs, and differentiated services. The OpenWare management interface may be accessed via serial console, SNMP, Telnet, SSH or web interface.
The combination of the SWE540’s hardware and the OpenWare switch management software delivers comprehensive security capabilities. Designed for deployment in security-sensitive, mission critical applications, the SWE540’s features include denial-of-service attack prevention; user password mechanisms with multiple levels of security; military-level authorization schemes, including 802.1X; and sanitization, which allows non-volatile storage to be overwritten if a system is compromised. Survivability is further enhanced by ECC protection on the management processor memory, which offers higher reliability in harsh environments.
“The SWE540 is representative of over 20 years of Abaco’s leadership in network switch design and network engineering for mission critical applications and will enable us to deliver a step-change in performance for Abaco’s mission ready systems and customer-built systems alike,” said Mrinal Iyengar, VP, Product Management, Abaco Systems. “Other 40 Gigabit Ethernet solutions offer limited routing capability – either by not offering L3 routing at all, or by limiting it to static routing. This provides far less flexibility, limiting the use to simpler, fixed networks. The SWE540 provides L3 forwarding in the switch fabric, supported by routing protocols such as RIPv3, OSPF and so on – making it ideally suited for networks that are more complex, security sensitive, or that will be required to scale and be upgraded over time.”
The SWE540 is available in both air-cooled and conduction-cooled versions, and can optionally support four QSFP+ and two 1000BaseT ports on the front panel. Rear transition modules are available to enable access to 40 Gigabit and one Gigabit ports off the backplane.
About Abaco Systems
With more than 30 years of experience, Abaco Systems is the global leader in open architecture rugged embedded mission ready systems. We deliver market-leading commercial off-the-shelf and custom products, together with best in class program lifecycle management. This, together with our 800+ professionals’ unwavering focus on our customers’ success, reduces program cost and risk, allows technology insertion with affordable readiness and enables platforms to successfully reach deployment sooner and with a lower total cost of ownership. With an active presence in hundreds of national asset platforms on land, sea and in the air, Abaco Systems is trusted where it matters most. www.abaco.com
Source: Abaco Systems
Feb. 21 — Xcelerit, the leading provider of acceleration solutions for Quantitative Finance, engineering and research, has added another architecture to its expanding portfolio of processor support. The new Nvidia Tesla P100 GPU accelerator delivers 5 teraflops of double precision arithmetic – an unprecedented level of computing power that will enable new applications in machine learning, quantitative finance and supercomputing more broadly. Accessing this power normally requires specialist expert programming to handle the data transfers, threads and synchronisation, memory access, and register usage. The Xcelerit SDK has been making it easier for programmers to access this power over a succession of GPU architectures as well as more mainstream systems like many-core CPUs. Hicham Lahlou, Xcelerit’s CEO, is enthusiastic about the P100 – “Our customers always want the latest, fastest hardware, and those that have used Xcelerit since the beginning can now painlessly move their code from CPU to the P100 GPU and back with no changes required.”
The Xcelerit SDK works by allowing users to quickly identify the compute-intensive parts of their C++ code and augment them with a simple programming model that reveals the hidden potential for parallelism. Once this is done, the SDK automatically takes care of mapping the code to any of the supported architectures and schedules the tasks to take maximum advantage of the underlying hardware. “The SDK is really quite adaptable,” says Lahlou. “We have been able to tailor it to many processor architectures, instruction sets, memory configurations and interconnects to squeeze every drop of performance out of the underlying hardware.” Lahlou feels that this capability may become even more essential over the coming years. “We are seeing a healthy diversity in processor designs coming from manufacturers such as Intel, Nvidia, Qualcomm and others – making performance code work super-efficiently across all of those platforms will keep us on our toes over the coming 12-18 months,” he said.
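The Xcelerit programming model itself is proprietary C++, but the general pattern the article describes — marking a compute-intensive kernel once so a runtime can map it to whichever backend is available — can be sketched generically. The decorator and backend names below are invented for illustration and are not the Xcelerit API:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical decorator marking a kernel as data-parallel; the runtime
# (not the user) decides how to map it onto the hardware.
def offload(func):
    def run(data, backend="serial"):
        if backend == "threads":            # stand-in for "many-core CPU"
            with ThreadPoolExecutor() as pool:
                return list(pool.map(func, data))
        return [func(x) for x in data]      # fallback: plain serial loop
    return run

@offload
def payoff(spot):
    # Toy quant-finance kernel: intrinsic value of a call at strike 100.
    return max(spot - 100.0, 0.0)

# The same annotated kernel runs unchanged on either backend.
serial = payoff([90.0, 105.0, 120.0])
parallel = payoff([90.0, 105.0, 120.0], backend="threads")
print(serial)    # [0.0, 5.0, 20.0]
```

The point of the pattern is that user code names *what* is parallel, while the dispatch layer owns *where* it runs — which is why, as Lahlou says, code can move between CPU and GPU "with no changes required."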
Xcelerit is a leading provider of acceleration solutions for Quantitative Finance, engineering, and research. Our portfolio of solutions addresses a range of acceleration challenges from algorithmic optimisations to software acceleration.
Xcelerit has been recognized as a finalist for the Red Herring Europe Top 100 and Red Herring Top 100 Global awards, and is a two-time winner of HPCwire’s “Best use of High Performance Computing in Financial Services” award. Our satisfied customers include the leading firms in investment banking, asset management, and insurance. For more information, please visit www.xcelerit.com.
The post Xcelerit Adds New Architecture to Portfolio of Processor Support appeared first on HPCwire.
The partnership will see Hammer adding Spectra Logic’s high-capacity workflow, tape and disk-based products to its portfolio and will allow Spectra Logic to strengthen its position in key backup and archive markets. By partnering with Hammer, Spectra will grow its enterprise-level reseller customer base.
Jason Beeson, Hammer’s Commercial Director, said: “This is an excellent opportunity to increase our high-performance computing offering to our partners and customers. By adding Spectra Logic’s bespoke data workflow storage solutions we can reach a whole new genre of highly data-dependent users who are seeking a complete data workflow, from input and day-to-day use right through to deep storage and archiving.”
Spectra Logic’s integrated tape and disk products have become the prevailing standard for those sectors challenged with storing, managing and accessing massive amounts of data and where high-performance computing is mission critical; sectors such as media and entertainment, scientific research, healthcare and financial services.
Because Spectra Logic’s object-based storage systems link directly to the public cloud, this distribution agreement will also enable Hammer to enhance its current cloud portfolio.
At the centre of Spectra Logic’s hybrid storage ecosystem lies the Spectra BlackPearl Deep Storage Gateway, which enables users to easily store large data sets forever at virtually no cost. It provides a single interface into deep storage using cloud protocols. The Spectra Logic product family delivers the industry’s best combination of high-density, scalable storage designed for superior performance and capacity, and includes tape libraries, such as the Spectra TFinity and Spectra T950, as well as disk products, featuring the Spectra Verde and Spectra ArcticBlue disk solutions.
Brian Grainger, Chief Sales Officer at Spectra Logic, said: “We’ve seen a major increase in demand for high-capacity, deep storage solutions, which have become essential to a wide range of businesses grappling with the challenges of storing, managing and accessing data while dealing with the rapidly evolving mandates in Europe that are driving changes in storage requirements.
“With Hammer’s help, we aim to reach more businesses that can benefit from our product range. We are keen to work with Hammer because it is among the most specialised storage distributors in Europe and has a range of products which are complementary to our own.”
Gerard Marlow, General Manager – OEM & Whitebox Storage at Hammer, said: “For 25 years, Hammer has added value at every opportunity to ensure our channel of resellers meet and exceed their customer expectations. To achieve this we constantly seek to add leading and innovative products to our portfolio and the range of deep storage products from Spectra Logic will meet this objective.”
Spectra Logic is the third company this year to join Hammer’s portfolio of world-class vendors, following Samsung Semiconductors and Huawei.
Source: Spectra Logic
The post Spectra Logic, Hammer Announce EMEA-Wide Distribution Deal appeared first on HPCwire.
In a scaling breakthrough for oil and gas discovery, ExxonMobil geoscientists report they have harnessed the power of 717,000 processors – the equivalent of 22,000 32-processor computers – to run complex oil and gas reservoir simulation models.
This is more than four times the previous number of processors used in energy exploration HPC implementations, according to ExxonMobil, which worked with the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and its Cray XE6 “Blue Waters” petascale system.
“Reservoir simulation has long been seen as difficult to scale beyond a few thousand processors,” John D. Kuzan, manager, reservoir function for ExxonMobil Upstream Research Company, told HPCwire’s sister publication EnterpriseTech, “and even then, ‘scale’ might mean (at best on a simple problem) ~50 percent efficiency. The ability to scale to 700,000-plus is remarkable – and gives us confidence that in the day-to-day use of this capability we will be efficient at a few thousand processors for a given simulation run (knowing that on any given day in the corporation there are many simulations being run).”
The objective of the scaling effort is to enable ExxonMobil geoscientists and engineers to make more, and better, drilling decisions by predicting reservoir performance more efficiently. The company said the run produced data output thousands of times faster than typical oil and gas industry reservoir simulations, and that it represented the largest processor count reported by the energy industry.
ExxonMobil’s scientists, who have worked with the NCSA on various projects since 2008, began work on the “half million” challenge – i.e., scaling reservoir simulations past half a million processors – in 2015. NCSA’s Blue Waters system is one of the most powerful supercomputers in the world. Scientists and engineers use the system on a range of engineering and scientific problems. It uses hundreds of thousands of computational cores to achieve peak performance of more than 13 quadrillion calculations per second and has more than 1.5 PB of memory, 25 PB of disk storage and 500 PB of tape storage.
The reservoir simulation benchmark involved a series of multi-million to billion cell models on Blue Waters using hundreds of thousands of processors simultaneously. The project required optimization of all aspects of the reservoir simulator, from input/output to communications across hundreds of thousands of processors.
“The partnership with NCSA was important because we had the opportunity to use ‘all’ of Blue Waters,” said Kuzan, “and when trying to use the full capability/capacity of a machine the logistics can be a challenge. It means not having the machine for some other project (even if it is for only a few minutes per run). The NCSA was willing to accommodate this and worked very hard not to disrupt others using the machine.”
The simulations were run on a proprietary ExxonMobil application, one that Kuzan said has not yet been named but is referred to as the “integrated reservoir modeling and simulation platform.”
Reservoir simulation studies are used to guide decisions, such as well placement, the design of facilities and development of operational strategies, to minimize financial and environmental costs. To model complex processes accurately for the flow of oil, water, and natural gas in the reservoir, simulation software solves a number of complex equations. Current reservoir management practices in the oil and gas industry are often hampered by the slow speed of reservoir simulation.
“NCSA’s Blue Waters sustained petascale system, which has benefited the open science community so tremendously, is also helping industry break through barriers in massively parallel computing,” said Bill Gropp, NCSA’s acting director.
The post ExxonMobil, NCSA, Cray Scale Reservoir Simulation to 700,000+ Processors appeared first on HPCwire.
Since our initial coverage of the TSUBAME3.0 supercomputer yesterday, more details have come to light on this innovative project. Of particular interest is a new board design for NVLink-equipped Pascal P100 GPUs that will create another entrant to the space currently occupied by Nvidia’s DGX-1 system, IBM’s “Minsky” platform and the Supermicro SuperServer (1028GQ-TXR).
The press photo shared by Tokyo Tech revealed TSUBAME3.0 to be an HPE-branded SGI ICE supercomputer. The choice is not surprising considering that SGI has long held a strong presence in Japan. SGI Japan, the primary contractor here, has collaborated with Tokyo Tech on a brand-new board design that we’ve been told is destined for the HPE product line.
TSUBAME3.0 node design (source: Tokyo Tech)
The board is the first of its kind in employing Nvidia GPUs (four), NVLink processor interconnect technology, Intel processors (two) and the Intel Omni-Path Architecture (OPA) fabric. Four SXM2 P100s are configured into a hybrid mesh cube, making full use of the NVLink (1.0) interconnect to offer a large amount of memory bandwidth between the GPUs. As you can see in the figure on the right, each half of the quad connects to its own PLX PCIe switch, which links to an Intel Xeon CPU. The PCIe switches also enable direct one-to-one connections between the GPUs and an Omni-Path link. A slide from a presentation shared by Tokyo Tech depicts how this hooks into the fabric.
TSUBAME3.0 will comprise 540 such nodes for a total of 2,160 SXM2 P100s and 1,080 Xeon E5-2680 V4 (14 core) CPUs.
At the rack level, 36 server blades house a total of 144 Pascals and 72 Xeons. The components are water cooled with an inlet water temperature of a warm 32 degrees Celsius, for a PUE of 1.033. “That’s lower than any other supercomputer I know,” commented Tokyo Tech Professor Satoshi Matsuoka, who is leading the design.
Each node also has 2TB of NVMe SSD for I/O acceleration, totaling more than 1 petabyte for the entire system. It can be used locally, or aggregated on the fly with BeeGFS as an ad-hoc “burst buffer” filesystem, Matsuoka told us.
The second-tier storage is composed of DDN’s EXAScaler technology, which uses controller integration to achieve a 15.9PB Lustre parallel file system in three racks.
TSUBAME3.0 node overview (source: Tokyo Tech)
With 15 SGI ICE XA racks and two network racks, TSUBAME3.0 delivers 12.2 petaflops of spec’d computational power within 20 racks (excluding the in-row chillers). This makes TSUBAME 3.0 the smallest >10 petaflops machine in the world, said Matsuoka, who offered for comparison the K computer (10.5 Linpack petaflops, 11.3 peak) which extends to 1,000 racks, a 66X delta.
Like TSUBAME2.0/2.5, the new system continues the endorsement of smart partitioning. “The TSUBAME3.0 node is ‘fat’ but we want flexible partitioning,” said Matsuoka. “We will be using container technology as a default, being able to partition the nodes arbitrarily into pieces for flexible scheduling and achieving very high utilization. A job that uses only CPUs or just one GPU won’t waste the remaining resources on the node.”
As we noted in our earlier coverage, total rated system performance is 12.15 double-precision petaflops, 24.3 single-precision petaflops and 47.2 half-precision petaflops, aka “AI-Petaflops.”
“Since we will keep TSUBAME2.5 and KFC alive, the combined ‘AI-capable’ performances of the three machines will reach 65.8 petaflops, making it the biggest capacity infrastructure for ML/AI in Japan, or 6 times faster than the K-computer,” said Matsuoka.
Satoshi Matsuoka with the TSUBAME3.0 blade
At yesterday’s press event in Japan, Professor Matsuoka also revealed that Tokyo Tech and the National Institute of Advanced Industrial Science and Technology (AIST) will open their joint “Open Innovation Laboratory” (OIL) next Monday, Feb. 20. Prof. Matsuoka will lead this organization, and TSUBAME3.0 will be partially used for these joint efforts. The main resource of OIL will be an upcoming massive AI supercomputer, named “ABCI,” announced in late November 2016. So in some respects, TSUBAME3.0, with an operational target of summer 2017, will be a prototype for ABCI, which has a targeted installation of Q1 2018.
“Overall, I believe TSUBAME3.0 to be way above class compared to any supercomputers that exist, including the [other] GPU-based ones,” Professor Matsuoka told HPCwire. “There are not really any technical compromises, and thus the efficiency of the machine by every metric will be extremely good.”
The post TSUBAME3.0 Points to Future HPE Pascal-NVLink-OPA Server appeared first on HPCwire.
SANTA CLARA, Calif., Feb. 17 — DataDirect Networks (DDN) today announced that the Tokyo Institute of Technology (Tokyo Tech) has selected DDN as its strategic storage infrastructure provider for the new TSUBAME3.0 supercomputing system. The innovative design of the TSUBAME3.0 is a major step along an evolutionary path toward a fundamental convergence of data and compute. TSUBAME3.0 breaks with many of the conventions of the world’s top supercomputers, incorporating elements and design points from containerization, cloud, artificial intelligence (AI) and Big Data, and it exhibits extreme innovation in the area of power consumption and system efficiency.
“As we run out the clock on Moore’s law, performance enhancements will increasingly be driven by improvements in data access times that come from faster storage media and networks, innovative data access approaches and the improvement of algorithms that interact with data subsystems,” said Satoshi Matsuoka, Professor, Ph.D., of the High Performance Computing Systems Group, GSIC, Tokyo Institute of Technology.
The IO infrastructure of TSUBAME3.0 combines fast in-node NVMe SSDs and a large, fast, Lustre-based system from DDN. The 15.9PB Lustre parallel file system, composed of three of DDN’s high-end ES14KX storage appliances, is rated at a peak performance of 150GB/s. The TSUBAME collaboration represents an evolutionary branch of HPC that could well develop into the dominant HPC paradigm at about the time the most advanced supercomputing nations and consortia achieve Exascale computing.
DDN and Tokyo Tech have worked together, starting with TSUBAME2.0, the previous generation supercomputer at Tokyo Tech, which debuted in the #4 spot on the Top500 and was certified as “the Greenest Production Supercomputer in the World.”
“Our collaboration with Tokyo Tech began more than six years ago and has spanned several implementations of the TSUBAME system,” said Robert Triendl, senior vice president of global sales, marketing and field services, DDN. “What is exciting about working with Satoshi and his team is the clear vision of advancing research computing from systems that support tightly-coupled simulations toward a new generation of data-centric infrastructures for the future of research big data but also AI and machine learning.”
Operated by the Global Scientific Information and Computing Center at Tokyo Tech, the TSUBAME systems are used by a variety of scientific disciplines and a wide-ranging community of users. Tokyo Tech researchers – professors and students – are the top users of the system, followed by industrial users, foreign researchers and external researchers working in collaboration with Tokyo Tech professors.
“Tokyo Tech is very pleased with our DDN solution and long-term partnership, and we are looking forward to teaming with DDN on future storage technologies for new application areas, such as graph computing and machine learning,” added Matsuoka.
DataDirect Networks (DDN) is the world’s leading big data storage supplier to data-intensive, global organizations. For more than 18 years, DDN has designed, developed, deployed and optimized systems, software and storage solutions that enable enterprises, service providers, universities and government agencies to generate more value and to accelerate time to insight from their data and information, on premise and in the cloud. Organizations leverage the power of DDN storage technology and the deep technical expertise of its team to capture, store, process, analyze, collaborate and distribute data, information and content at the largest scale in the most efficient, reliable and cost-effective manner. DDN customers include many of the world’s leading financial services firms and banks, healthcare and life science organizations, manufacturing and energy companies, government and research facilities, and web and cloud service providers. For more information, go to www.ddn.com or call 1-800-837-2298.
The post Tokyo Tech Selects DDN as Storage Infrastructure Provider for New TSUBAME3.0 Supercomputer appeared first on HPCwire.
Feb. 17 — One of the main tools doctors use to detect diseases and injuries in cases ranging from multiple sclerosis to broken bones is magnetic resonance imaging (MRI). However, the results of an MRI scan take hours or days to interpret and analyze. This means that if a more detailed investigation is needed, or there is a problem with the scan, the patient needs to return for a follow-up.
A new, supercomputing-powered, real-time analysis system may change that.
Researchers from the Texas Advanced Computing Center (TACC), The University of Texas Health Science Center (UTHSC) and Philips Healthcare, have developed a new, automated platform capable of returning in-depth analyses of MRI scans in minutes, thereby minimizing patient callbacks, saving millions of dollars annually, and advancing precision medicine.
The team presented a proof-of-concept demonstration of the platform at the International Conference on Biomedical and Health Informatics this week in Orlando, Florida.
The platform they developed combines the imaging capabilities of the Philips MRI scanner with the processing power of the Stampede supercomputer – one of the fastest in the world – using the TACC-developed Agave API Platform infrastructure to facilitate communication, data transfer, and job control between the two.
An API, or Application Program Interface, is a set of protocols and tools that specify how software components should interact. Agave manages the execution of the computing jobs and handles the flow of data from site to site. It has been used for a range of problems, from plant genomics to molecular simulations, and allows researchers to access cyberinfrastructure resources like Stampede via the web.
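In practice, submitting work through an API like Agave amounts to posting a JSON job description that names the registered app, the compute system and the input data; the platform then stages the data and manages execution. As a rough, hypothetical sketch (the app ID, system ID and input key below are invented for illustration; the real fields and endpoints are defined by the Agave documentation):

```python
import json

# Hypothetical Agave-style job request. The appId, executionSystem and
# input URI are made-up placeholders, not values from the TACC project.
job = {
    "name": "mri-analysis-demo",
    "appId": "mri-pipeline-1.0",               # assumed registered app
    "executionSystem": "stampede.tacc.xsede",  # assumed compute system
    "inputs": {"scan": "agave://storage/scans/patient42.dcm"},
    "parameters": {"mode": "quality-control"},
    "archive": True,                           # keep outputs after the run
}
payload = json.dumps(job)

# Submission would be an authenticated POST of this payload to the
# tenant's jobs endpoint; the platform then stages inputs to the cluster,
# runs the app, and archives the results for retrieval over the web.
print(payload[:40])
```

The value for a clinic is that this one document is the entire interface: the scanner-side software never needs to know about schedulers, queues or file systems on Stampede.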
“The Agave Platform brings the power of high-performance computing into the clinic,” said William (Joe) Allen, a life science researcher for TACC and lead author on the paper. “This gives radiologists and other clinical staff the means to provide real-time quality control, precision medicine, and overall better care to the patient.”
The entire article can be found here.
Source: Aaron Dubrow, TACC
The post Stampede Supercomputer Assists With Real-Time MRI Analysis appeared first on HPCwire.
Feb. 17 — Advanced Clustering Technologies is helping customers solve challenges by integrating NVIDIA Tesla P100 accelerators into its line of high performance computing clusters. Advanced Clustering Technologies builds custom, turn-key HPC clusters that are used for a wide range of workloads including analytics, deep learning, life sciences, engineering simulation and modeling, climate and weather study, energy exploration, and improving manufacturing processes.
“NVIDIA-enabled GPU clusters are proving very effective for our customers in academia, research and industry,” said Jim Paugh, Director of Sales at Advanced Clustering. “The Tesla P100 is a giant step forward in accelerating scientific research, which leads to breakthroughs in a wide variety of disciplines.”
Tesla P100 GPU accelerators are based on NVIDIA’s latest Pascal GPU architecture, which provides the throughput of more than 32 commodity CPU-based nodes. The Tesla P100 specifications are:
- 5.3 teraflops double-precision performance
- 10.6 teraflops single-precision performance
- 21.2 teraflops half-precision performance
- 732GB/sec memory bandwidth with CoWoS HBM2 stacked memory
- ECC protection for increased reliability
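The three throughput figures above reflect Pascal's 1:2:4 ratio between double-, single- and half-precision rates on the P100 — each step down in precision doubles the peak rate:

```python
# P100 peak throughput follows a 1:2:4 FP64:FP32:FP16 ratio, so the
# spec-sheet numbers are successive doublings of the double-precision rate.
fp64 = 5.3            # teraflops, double precision
fp32 = 2 * fp64       # 10.6 teraflops, single precision
fp16 = 2 * fp32       # 21.2 teraflops, half precision
print(fp32, fp16)
```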
“Customers taking advantage of Advanced Clustering’s high performance computing clusters with integrated NVIDIA Tesla P100 GPUs benefit from the most technologically advanced accelerated computing solution in the market – greatly speeding workload performance across analytics, simulation and modeling, deep learning and more,” said Randy Lewis, Senior Director of Worldwide Field Operations at NVIDIA.
About Advanced Clustering
Advanced Clustering Technologies, a privately held corporation based in Kansas City, Missouri, is dedicated to developing high-performance computing (HPC) solutions. The company provides highly customized turn-key cluster systems — utilizing out-of-the-box technology — to companies and organizations with specialized computing needs.
The technical and sales teams have more than 50 years of combined industry experience and comprehensive knowledge in the areas of cluster topologies and cluster configurations. In business since 2001, Advanced Clustering Technologies’ commitment to exceeding client expectations has earned the company the reputation as one of the nation’s premier providers of high performance computing systems. For more details, please visit http://www.advancedclustering.com/technologies/gpu-computing/.
Source: Advanced Clustering
The post Advanced Clustering Integrating NVIDIA Tesla P100 Accelerators Into Line of HPC Clusters appeared first on HPCwire.
In a press event Friday afternoon local time in Japan, Tokyo Institute of Technology (Tokyo Tech) announced its plans for the TSUBAME3.0 supercomputer, which will be Japan’s “fastest AI supercomputer” when it comes online this summer (2017). Projections are that it will deliver 12.2 double-precision petaflops and, operated in tandem with TSUBAME2.5, 64.3 half-precision petaflops (peak specs).
Nvidia was the first vendor to publicly share the news in the US. We know that Nvidia will be supplying Pascal P100 GPUs, but the big surprise here is the system vendor. The Nvidia blog did not specifically mention HPE or SGI but it did include this photo with a caption referencing it as TSUBAME3.0:
TSUBAME3.0 (Source: Nvidia)
That is most certainly an HPE-rebrand of the SGI ICE XA supercomputer, which would make this the first SGI system win since the supercomputer maker was brought into the HPE fold. For fun, here’s a photo of the University of Tokyo’s “supercomputer system B,” an SGI ICE XA/UV hybrid system:
(Source: University of Tokyo, Institute for Solid State Physics)
TSUBAME3.0 is on track to deliver more than two times the performance of its predecessor, TSUBAME2.5, which ranks 40th on the latest Top500 list (Nov. 2016) with a LINPACK score of 2.8 petaflops (peak: 5.6 petaflops). When TSUBAME was upgraded from 2.0 to 2.5 in the fall of 2013, the HP ProLiant SL390s hardware stayed the same, but the GPU was switched from the Nvidia (Fermi) Tesla M2050 to the (Kepler) Tesla K20X.
Increasingly, we’re seeing Nvidia refer to half-precision floating point capability as “AI computation.” Half-precision is suitable for many AI training workloads (but by no means all) and it’s usually sufficient for inferencing tasks.
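Python's standard struct module can encode IEEE 754 half precision (the 'e' format code), which makes the trade-off easy to see: fp16 carries only 11 significant bits, plenty for many training workloads but far short of fp64's 53.

```python
import struct

def to_fp16(x):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# With only 11 significant bits, integers above 2048 lose exactness,
# and 0.1 picks up visible rounding error.
print(to_fp16(2048.0))  # 2048.0
print(to_fp16(2049.0))  # 2048.0 -- rounds to the nearest representable value
print(to_fp16(0.1))     # 0.0999755859375
```

That coarseness is why half precision suits error-tolerant training and inferencing but not, say, the double-precision reservoir or docking simulations discussed elsewhere in this issue.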
With this rubric in mind, Nvidia says TSUBAME3.0 is expected to deliver more than 47 petaflops of “AI horsepower” and when operated in tandem with TSUBAME2.5, the top speed increases to 64.3 petaflops, which would give it the distinction of being Japan’s highest performing AI supercomputer.
According to a press release issued in Japan, DDN will be supplying the storage infrastructure for TSUBAME3.0. The high-end storage vendor is providing a combination of high-speed in-node NVMe SSDs and its high-speed Lustre-based EXAScaler parallel file system, consisting of three racks of DDN’s high-end ES14KX appliance with a capacity of 15.9 petabytes and a peak performance of 150 GB/sec.
TSUBAME3.0 is expected to be up and running this summer. The Nvidia release notes, “It will be used for education and high-technology research at Tokyo Tech, and be accessible to outside researchers in the private sector. It will also serve as an information infrastructure center for leading Japanese universities.”
“NVIDIA’s broad AI ecosystem, including thousands of deep learning and inference applications, will enable Tokyo Tech to begin training TSUBAME3.0 immediately to help us more quickly solve some of the world’s once unsolvable problems,” said Tokyo Tech Professor Satoshi Matsuoka, who has been leading the TSUBAME program since it began.
“Artificial intelligence is rapidly becoming a key application for supercomputing,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “NVIDIA’s GPU computing platform merges AI with HPC, accelerating computation so that scientists and researchers can drive life-changing advances in such fields as healthcare, energy and transportation.”
We remind you the story is still breaking, but wanted to share what we know at this point. We’ll add further details as they become available.
The post Tokyo Tech’s TSUBAME3.0 Will Be First HPE-SGI Super appeared first on HPCwire.
Within the haystack of a lethal disease such as ALS (amyotrophic lateral sclerosis / Lou Gehrig’s Disease) there exists, somewhere, the needle that will pierce this therapy-resistant affliction. Finding the needle is a trial-and-error process of monumental proportions for scientists at pharmaceutical companies, medical research centers and academic institutions. As models grow in scale so too does the need for HPC resources to run simulations iteratively, to try-and-fail fast until success is found.
That’s all well and good if there’s ready access to HPC on premises. If not, drug developers, such as ALS researcher Dr. May Khanna, Pharmacology Department assistant professor at the University of Arizona, have turned to HPC resources provided by public cloud services. But using AWS, Azure or Google introduces a host of daunting compute management problems that tax the skills and time availability of most on-site IT staffs.
These tasks include data placement, instance provisioning, job scheduling, configuring software and networks, cluster startup and tear-down, cloud provider setup, cost management and instance health checking. To handle these cloud orchestration functions tied to 5,000 cores of Google Cloud Preemptible VMs (PVMs), Dr. Khanna and her team at Arizona turned to Cycle Computing to run “molecular docking” simulations at scale with Schrödinger’s Glide molecular modeling drug design software.
The results: simulations that would otherwise take months have been compressed to a few hours, short enough to be run during one of Dr. Khanna’s seminars and the output shared with students.
Developing new drugs to target a specific disease often starts with the building blocks of the compounds that become the drugs. The process begins with finding small molecules that can target specific proteins that, when combined, can interact in a way that becomes the disease’s starting point. The goal is to find a molecule that breaks the proteins apart. This is done by simulating how the small molecules dock to the specific protein locations. These simulations are computationally intensive, and many molecules need to be simulated to find a few good candidates.
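The docking stage described above is, in essence, a score-and-filter loop: score every candidate molecule against the target protein and keep only the best. The sketch below is an illustrative toy, not Schrödinger’s API; `dock_score` is a stand-in for a real docking run, and its scoring rule is a placeholder.

```python
# Toy virtual-screening loop: score each compound against one target
# protein and keep the top candidates. Lower (more negative) docking
# scores indicate a better predicted fit, as in real docking codes.
from dataclasses import dataclass
from typing import List

@dataclass
class Compound:
    name: str
    score: float = 0.0

def dock_score(compound_name: str) -> float:
    """Placeholder for a docking simulation; a real run would call Glide."""
    return -len(compound_name) * 0.5  # fake score for illustration only

def screen(library: List[str], keep: int) -> List[Compound]:
    scored = [Compound(n, dock_score(n)) for n in library]
    scored.sort(key=lambda c: c.score)  # best (lowest) scores first
    return scored[:keep]

hits = screen(["mol-a", "mol-bc", "mol-def"], keep=2)
```

In a real campaign the library holds a million compounds and each score takes minutes of compute, which is exactly why the loop must be fanned out across thousands of cores.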
Without powerful compute resources, researchers must artificially constrain their searches, limiting the number of molecules to simulate. And they only check an area of the protein known to be biologically active. Even with these constraints, running simulations takes a long time. Done right, molecular docking is an iterative process that requires simulation, biological verification, and then further refinement. Shortening the iteration time is important to advancing the research.
The objective of Dr. Khanna’s work was to simulate the docking of 1 million compounds to one target protein. After a simulation was complete, the protein was produced in the lab, and compounds were then tested with nuclear magnetic resonance spectroscopy.
“It’s a target (protein) that’s been implicated in ALS,” the energetic Dr. Khanna told EnterpriseTech. “The idea is that the particular protein was very interesting, people who modulated it in different ways found some significant improvement in the ALS models they have with (lab) mice. The closer we can link biology to what we’re seeing as a target, the better chance of actually getting to a real therapeutic.”
“Modulating,” Dr. Khanna explained, is disrupting two proteins interacting in a way that is associated with ALS, a disease that currently afflicts about 20,000 Americans and for which there is no cure. “We’re trying to disrupt them, to release them to do their normal jobs,” she said.
She said CycleCloud plays a central role in running Schrödinger Glide simulations. Without Google Cloud PVMs, simulations would take too long and model sizes would be too small to generate meaningful results. Without CycleCloud, the management of 5,000 PVM nodes would not be possible.
CycleCloud provides a web-based GUI, a command line interface and APIs to define cloud-based clusters. It auto-scales clusters by instance type, maximum cluster size and costing parameters, deploying systems of up to 156,000 cores while validating each piece of the infrastructure. Additionally, it syncs in-house data repositories with cloud locations in a policy- or job-driven fashion to lower costs.
It should be noted that the use of Google Cloud’s PVMs, while helping to hold down the cost of running the simulations to $200, contributes an additional degree of complexity to Dr. Khanna’s project work. Preemptible compute capacity offers the advantage of a consistent price not subject to dynamic demand pricing, as other public cloud instances are. PVMs are assigned to a job for a finite period of time but – here’s the rub – they can be revoked at any moment. Dr. Khanna’s workflow, consisting of small, short-running jobs, was ideal for leveraging PVMs, but the instances can still disappear without warning.
In the case of Dr. Khanna’s ALS research work, said Jason Stowe, CEO of Cycle Computing, “if you’re willing to give up the node, but you’re able to use it during that timeframe at substantially lower cost, that allows you to get a lot more computing bang for your buck. CycleCloud automates the process, taking care of nodes that go away, making sure the environment isn’t corrupted, and other technical aspects that we take care of so the user doesn’t have to.”
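The requeue-on-preemption pattern Stowe describes can be sketched in a few lines. This is an illustrative toy, not CycleCloud’s implementation: a simulated preemption simply puts the job back on the queue to be retried on a surviving node.

```python
# Minimal sketch of preemption-tolerant scheduling: jobs running on a
# revoked node go back on the queue; every job eventually completes.
import random
from collections import deque

def run_with_requeue(jobs, preempt_chance=0.3, seed=42):
    rng = random.Random(seed)   # fixed seed makes the demo deterministic
    queue = deque(jobs)
    done = []
    while queue:
        job = queue.popleft()
        if rng.random() < preempt_chance:
            queue.append(job)   # node revoked mid-run: requeue the job
        else:
            done.append(job)    # job finished before any preemption
    return done

finished = run_with_requeue([f"dock-{i}" for i in range(10)])
```

This only works well because each docking sub-job is small and short-running; losing one costs minutes of compute, not hours.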
The simulation process is divided into two parts. The first step uses the Schrödinger LigPrep package, which converts 2D structures to the 3D format used in the next stage. This stage started with 4 GB of input data staged to an NFS filer. The output data was approximately 800KB and was stored on the NFS filer as well. To get the simulation done as efficiently as possible, the workload was split into 300 smaller jobs to assist in scaling the next stage of the workflow. In total, the first stage consumed 1500 core-hours of computation.
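The job-splitting step can be sketched as follows; `split_into_jobs` is a hypothetical helper for illustration, not part of the Schrödinger or CycleCloud tooling.

```python
# Divide a compound library into N roughly equal sub-jobs so the next
# stage of the workflow can scale out across many preemptible nodes.
def split_into_jobs(items, n_jobs):
    """Distribute items across n_jobs chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

compounds = list(range(1_000_000))      # stand-in for the 1M-compound library
jobs = split_into_jobs(compounds, 300)  # the 300 sub-jobs in the article
```

Near-equal chunk sizes keep the sub-jobs short and uniform, which is what makes them a good match for preemptible capacity.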
The Schrödinger Glide software package performs the second stage of the process, where the actual docking simulation is performed. Each of the 300 sub-jobs consists of four stages, each with an attendant prep stage. The total consumption was approximately 20,000 core-hours using 5,000 cores of n1-highcpu-16 instances. Each instance had 16 virtual cores with 60 gigabytes of RAM. The CycleCloud software dynamically sized the cluster based on the number of jobs in queue and replaced preempted instances.
Dr. Khanna’s research is in the early stages of a process that, if successful, could take several years to reach human clinical trials.
“The faster we can do this, the less time we have to wait for results, so we can go back and test it again and try to figure out what compounds are really binding,” she said, “the faster the process can move along.”
Dr. Khanna said plans are in place to increase the size of the pool of potential compounds, as well as include other proteins in the simulation to look for interactions that would not typically be seen until later in the process. The team will also simulate over the entire surface of the protein instead of just a known-active area, unlocking “an amazing amount of power” in the search process, she said.
“That jump between docking to binding to biological testing takes a really long time, but I think we can move forward on that with this cloud computing capacity,” she said. “The mice data that we saw was really exciting…, you could see true significant changes with the mice. I can’t tell you we’ve discovered the greatest thing for ALS, but showing that if we take these small molecules and we can see improvement, even that is so significant.”
The post Drug Developers Use Google Cloud HPC in the Fight against ALS appeared first on HPCwire.
HAARLEM, The Netherlands, Feb. 16, 2017 — Asperitas, a cleantech startup from the Amsterdam area, one of the world’s datacentre hotspots, is introducing a unique solution based on a total liquid cooling concept called Immersed Computing.
After 1.5 years of research and development with an ecosystem of partners, Asperitas is launching its first market-ready solution, the AIC24, at the leading international industry event Data Centre World & Cloud Expo Europe.
The Asperitas AIC24 is at the centre of Immersed Computing. It is a closed system and the first water-cooled oil-immersion system that relies on natural convection to circulate the dielectric liquid, resulting in a fully self-contained, plug-and-play modular system. The AIC24 needs far less infrastructure than any other liquid-cooling installation, saving energy and costs at all levels of datacentre operations. The company bills it as the most sustainable solution available for IT environments today, ensuring the highest possible efficiency in availability, energy reduction and reuse while increasing capacity and greatly improving density.
The AIC24 is designed to ensure the highest possible continuity for cloud providers. Total immersion ensures no oxygen gets in touch with the IT components, preventing oxidation. Thermal shock is greatly reduced due to the high heat capacity of liquid. The immersed environment only has minor temperature fluctuations, greatly reducing stress by thermal expansion on micro-electronics. These factors eliminate the root cause for most of the physical degradation of micro-electronics over time.
Plug and Play green advanced computing anywhere
The AIC24 is Plug and Play. A single module requires only power, access to a water loop and data connectivity to operate. Combined with its silent workings, these limited requirements enable high flexibility in deployment sites and scenarios for the AIC24.
Two specially designed Convection Drives, combining forced water flow with the natural flow of the oil, are capable of transferring 24 kW of heat from the oil while keeping all IT components at allowable operating temperatures.
To maximise IT capacity, the Asperitas Universal Cassette (AUC) can contain multiple physical servers. Each module accommodates 24 AUCs as well as 2 Universal Switching Cassettes, which currently adds up to 48 immersed servers and 2 immersed switches.
Immersed Computing is a concept driven by sustainability, efficiency and flexibility, and it goes far beyond technology alone. In many situations, Immersed Computing can save more than 50% of the total energy footprint. With immersion, IT energy use drops by 10-45% because fans are no longer needed, while other energy consumers such as cooling installations can achieve up to 95% energy reduction. Immersion also allows warm-water cooling, which yields further savings on cooling installations, and it enables high-temperature heat reuse.
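As a rough illustration of how such savings can combine, consider a hypothetical baseline of a 100 kW IT load with 40 kW of cooling overhead (roughly a PUE of 1.4); the baseline split and the 20% fan-savings figure below are assumptions for the example, not Asperitas data.

```python
# Combine per-component savings into a total-energy-footprint reduction.
def total_saving(it_kw, cooling_kw, fan_saving, cooling_saving):
    baseline = it_kw + cooling_kw
    after = it_kw * (1 - fan_saving) + cooling_kw * (1 - cooling_saving)
    return 1 - after / baseline

# Assumed: 100 kW IT load, 40 kW cooling overhead, 20% of IT energy
# saved by removing fans, 95% of cooling energy saved by immersion.
saving = total_saving(it_kw=100, cooling_kw=40,
                      fan_saving=0.20, cooling_saving=0.95)
```

Under these assumptions the total footprint shrinks by roughly 41%, which shows how the headline “more than 50%” figure depends on how cooling-heavy the baseline facility is.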
Immersed Computing includes an optimised way of work, highly effective deployment, flexible choice of IT and drastic simplification of datacentre design. Offering great advantages on all levels of any datacentre value chain, Immersed Computing realises maximum results in Cloud, Private and Edge environments.
Asperitas is a cleantech company focused on greening the datacentre industry by introducing immersed computing.
The Asperitas Development partners include University of Leeds, Aircraft Development and Systems Engineering (ADSE), Vienna Scientific Cluster, Super Micro, Schleifenbauer and Brink Industrial. Asperitas is furthermore recognised and supported by the Netherlands Enterprise Agency as a promising new cleantech company.
The post Dutch Startup Offers Immersive Cooling for Cloud, Edge and HPC Datacenter appeared first on HPCwire.
Here at HPCwire, we aim to keep the HPC community apprised of the most relevant and interesting news items that get tweeted throughout the week. The tweets that caught our eye this past week are presented below.
— Chris Mustain (@ChrisMustain) February 16, 2017
— Data Center Systems (@InspurServer) February 16, 2017
— Bilel Hadri (@mnoukhiya) February 13, 2017
— LauraSchulz (@lauraschulz) February 15, 2017
— Wayne State C&IT (@WayneStateCIT) February 13, 2017
— Chris Mustain (@ChrisMustain) February 16, 2017
— Sharon Broude Geva (@SBroudeGeva) February 15, 2017
— Fernanda Foertter (@hpcprogrammer) February 15, 2017
— NCI Australia (@NCInews) February 12, 2017
— Chris Mustain (@ChrisMustain) February 16, 2017
— George Markomanolis (@geomark) February 13, 2017
— NCSAatIllinois (@NCSAatIllinois) February 13, 2017
— Fernanda Foertter (@hpcprogrammer) February 14, 2017
February 16, 2017 — The 2017 ASC Student Supercomputer Challenge (ASC17) held its opening ceremony at Zhengzhou University. 230 teams from all over the world will take on the world’s fastest supercomputer, Sunway TaihuLight, an artificial intelligence application, and a Gordon Bell Prize-nominated application as they compete for 20 places in the finals. Hundreds of supercomputing experts and team representatives from around the world attended the opening ceremony.
The number of teams registered for the ASC17 Challenge has reached a new high, up 31% compared to last year. The competition platforms and applications have been designed to be leading-edge: Sunway TaihuLight and the most advanced supercomputer in Henan province (in central China) will run the competition applications. Baidu’s AI application for intelligent-driving traffic prediction, and MASNUM_WAVE, a high-resolution global surface wave simulation that was a 2016 Gordon Bell Prize finalist, will give the teams the opportunity to take on both the “super brain” and “big science” challenges. Meanwhile, the ASC17 finals will include 20 teams instead of the original 16.
Wang Endong, initiator of the ASC challenge, academician of the Chinese Academy of Engineering and Chief Scientist at Inspur, said that with the convergence of HPC, big data and cloud computing, intelligent computing as represented by artificial intelligence will become the most important component of the coming computing industry and will bring new challenges in computing technologies. For two consecutive seasons, the ASC Challenge has included AI applications in the hope that students will come to understand deep learning algorithms and acquire knowledge of big data and cutting-edge computing technologies, thereby grooming interdisciplinary supercomputing talent for the future.
On the day of the opening ceremony, Henan province’s fastest supercomputer, housed at the Zhengzhou University Supercomputing Center, was launched and became one of the competition platforms for ASC17. Liu Jiongtian, academician of the Chinese Academy of Engineering and President of Zhengzhou University, was unable to attend the event but said he believed the platform will allow teams worldwide to experience the latest technology, such as the KNL many-core architecture. It will also help accelerate supercomputing application innovation in Zhengzhou and Henan province, groom supercomputing talent in the region, promote smart-city development in Zhengzhou, and support the rapid economic development of central China.
Yang Guangwen, director of the National Supercomputing Center in Wuxi, said that all the processors used in Sunway TaihuLight were developed domestically in China, and that it is the world’s first supercomputer to achieve 100 petaflops. Using Sunway TaihuLight as the competition platform will give each team the opportunity to experience the world’s fastest supercomputer and so better promote the training of young talent. At the same time, the international exchanges resulting from the ASC17 Challenge will help more people appreciate China’s capability for independent design in the supercomputing domain.
The organizers of ASC17 Challenge have also arranged a 2-day intensive training camp for the participants, where experts from National Supercomputing Center in Wuxi, Baidu, and Inspur conducted comprehensive and systematic lectures. Topics included the design of a supercomputer system, the KNL, deep learning application optimization solutions and techniques on using Sunway TaihuLight.
The ASC Student Supercomputer Challenge was initiated by China and is supported by experts and institutions worldwide. The competition aims to be a platform for exchanges among young supercomputing talent from different countries and regions, as well as to groom young talent. It also aims to be a key driving force in promoting technological and industrial innovation by improving standards in supercomputing applications and research. The ASC Challenge has been held for six years; this year’s ASC17 Challenge is co-organized by Zhengzhou University, the National Supercomputing Centre in Wuxi, and Inspur.
The post 230 Teams worldwide join ASC17 to challenge AI and TaihuLight appeared first on HPCwire.
FRANKFURT, Germany, Feb. 16 — In a continuous effort to diversify topics at the ISC High Performance conference, the organizers are pleased to announce that two of this year’s presentations in the Distinguished Talk series will focus on data analytics in manufacturing and scientific applications.
The ISC 2017 Distinguished Talk series will offer five talks, spread over Tuesday, June 20 and Wednesday, June 21. The five-day technical program sessions will be held from Sunday, June 18 through Thursday, June 22. Over 3,000 attendees are expected at this year’s conference.
On Tuesday afternoon at 1:45 PM, cybernetics expert, Dr. Sabine Jeschke, who heads the Cybernetics Lab at the RWTH Aachen University, will deliver a talk about “Robots in Crowds – Robots and Clouds.” Jeschke’s presentation will be followed by one from physicist Kerstin Tackmann, from the German Electron Synchrotron (DESY) research center, who will discuss big data and machine learning techniques used for the ATLAS experiment at the Large Hadron Collider.
Jeschke’s research expertise lies in distributed artificial intelligence, robotics, automation, and virtual worlds, among other areas. In her talk, she will discuss new trends in mobile robotic systems, with special emphasis on the relationship between AI, cognitive systems and robotics. She will also present new paradigms for robotic platforms, for example, humanoids, robots on wheels, animal-like robots and industrial robots, with respect to their application areas and their physical realization.
In her abstract, she specifies the need to consider big data and its analytics as a critical aspect of robotics. At the same time, Jeschke also identifies how high performance computing will need to be applied to robotic systems. She will be sharing all of these topics and more with the ISC 2017 audience.
Tackmann, who will be speaking immediately after Jeschke, will give an overview of the ATLAS experiment, with particular attention to its enormous flow of data generated by the ATLAS detectors. She will present some of the experiment’s results, and give an overview of the technologies employed to store, search and retrieve experimental data and metadata, including the use of analytics tools and machine learning techniques.
The ATLAS experiment is one of the two multi-purpose experiments at the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) in Geneva. Since ATLAS began collecting data in 2009, it has been used to understand the processes described by the Standard Model of elementary particle physics, identify the Higgs boson, and search for particles and phenomena beyond the Standard Model.
Tackmann, who earned her PhD in experimental particle physics from the University of California, Berkeley, has been involved in the ATLAS Experiment since 2011. She leads the Helmholtz Young Investigators Group Higgs Physics with Photons, which is a part of the ATLAS group at the DESY center.
The post Data Analytics Gets the Spotlight in Distinguished Talk Series at ISC 2017 appeared first on HPCwire.
SPRING, Tex., Feb. 16 — ExxonMobil, working with the National Center for Supercomputing Applications (NCSA), has achieved a major breakthrough in reservoir simulation, running proprietary software on more than four times the number of processors previously used for complex oil and gas reservoir simulation models to improve exploration and production results.
The breakthrough in parallel simulation used 716,800 processors, the equivalent of harnessing the power of 22,400 computers with 32 processors per computer. ExxonMobil geoscientists and engineers can now make better investment decisions by more efficiently predicting reservoir performance under geological uncertainty to assess a higher volume of alternative development plans in less time.
The record run produced data output thousands of times faster than typical oil and gas industry reservoir simulations. It used the largest processor count reported by the oil and gas industry, and was one of the largest simulations reported by industry in engineering disciplines such as aerospace and manufacturing.
“This breakthrough has unlocked new potential for ExxonMobil’s geoscientists and engineers to make more informed and timely decisions on the development and management of oil and gas reservoirs,” said Tom Schuessler, president of ExxonMobil Upstream Research Company. “As our industry looks for cost-effective and environmentally responsible ways to find and develop oil and gas fields, we rely on this type of technology to model the complex processes that govern the flow of oil, water and gas in various reservoirs.”
The major breakthrough in parallel simulation results in dramatic reductions in the amount of time previously taken to study oil and gas reservoirs. Reservoir simulation studies are used to guide decisions such as well placement, the design of facilities and development of operational strategies to minimize financial and environmental risk. To model complex processes accurately for the flow of oil, water, and natural gas in the reservoir, simulation software must solve a number of complex equations. Current reservoir management practices in the oil and gas industry are often hampered by the slow speed of reservoir simulation.
ExxonMobil’s scientists worked closely with the NCSA to benchmark a series of multi-million to billion cell models on NCSA’s Blue Waters supercomputer. This new reservoir simulation capability efficiently uses hundreds of thousands of processors simultaneously and will have dramatic impact on reservoir management workflows.
“NCSA’s Blue Waters sustained petascale system, which has benefited the open science community so tremendously, is also helping industry break through barriers in massively parallel computing,” said Bill Gropp, NCSA’s acting director. “NCSA is thrilled to have worked closely with ExxonMobil to achieve the kind of sustained performance that is so critical in advancing science and engineering.”
ExxonMobil’s collaboration with the NCSA required careful planning and optimization of all aspects of the reservoir simulator from input/output to improving communications across hundreds of thousands of processors. These efforts have delivered strong scalability on several processor counts ranging from more than 1,000 to nearly 717,000, the latter being the full capacity of NCSA’s Cray XE6 system.
ExxonMobil, the largest publicly traded international oil and gas company, uses technology and innovation to help meet the world’s growing energy needs. We hold an industry-leading inventory of resources and are one of the largest integrated refiners, marketers of petroleum products and chemical manufacturers. For more information, visit www.exxonmobil.com or follow us on Twitter www.twitter.com/exxonmobil.
About the National Center for Supercomputing Applications
The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign provides supercomputing and advanced digital resources for the nation’s science enterprise. At NCSA, University of Illinois faculty, staff, students, and collaborators from around the globe use advanced digital resources to address research grand challenges for the benefit of science and society. NCSA has been advancing one third of the Fortune 50 for more than 30 years by bringing industry, researchers and students together to solve grand challenges at rapid speed and scale. The Blue Waters Project is supported by the National Science Foundation through awards ACI-0725070 and ACI-1238993.
The post ExxonMobil and NCSA Achieve Simulation Breakthrough appeared first on HPCwire.
Feb. 16 — The SC Test of Time Award (ToTA) Committee is soliciting nominations for this year’s Test-of-Time Award to be given at the SC17 Conference in November 2017 in Denver, CO. The ToTA Award recognizes an outstanding paper that has deeply influenced the HPC discipline. It is a mark of historical impact and recognition that the paper has changed HPC trends.
The award is also an incentive for researchers and students to send their best work to the SC Conference and a tool to understand why and how results last in the HPC discipline. Papers that appeared in the SC Conference Series are considered for this award. A paper must be at least 10 years old, from the twenty conference years 1988 to 2007, inclusive.
- Submissions Close: April 1, 2017
- Web Submissions: https://submissions.supercomputing.org/
- Email Contact: firstname.lastname@example.org
Francis Alexander, a physicist with extensive management and leadership experience in computational science research, has been named Deputy Director of the Computational Science Initiative at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory. Alexander comes to Brookhaven Lab from DOE’s Los Alamos National Laboratory, where he was the acting division leader of the Computer, Computational, and Statistical Sciences (CCS) Division.
In his new role as deputy director, Alexander will work with CSI Director Kerstin Kleese van Dam to expand CSI’s research portfolio and realize its potential in data-driven discovery. He will serve as the primary liaison to national security agencies, as well as develop strategic partnerships with other national laboratories, universities, and research institutions. His current research interest is the intersection of machine learning and physics (and other domain sciences).
“I was drawn to Brookhaven by the exciting opportunity to strengthen the ties between computational science and the significant experimental facilities—the Relativistic Heavy Ion Collider, the National Synchrotron Light Source II, and the Center for Functional Nanomaterials [all DOE Office of Science User Facilities],” said Alexander. “The challenge of getting the most out of high-throughput and data-rich science experiments is extremely exciting to me. I very much look forward to working with the talented individuals at Brookhaven on a variety of projects, and am grateful for the opportunity to be part of such a respected institution.”
During his more than 20 years at Los Alamos, he held several leadership roles, including as leader of the CCS Division’s Information Sciences Group and leader of the Information Science and Technology Institute. Alexander first joined Los Alamos in 1991 as a postdoctoral researcher at the Center for Nonlinear Studies. He returned to Los Alamos in 1998 after doing postdoctoral work at the Institute for Scientific Computing Research at DOE’s Lawrence Livermore National Laboratory and serving as a research assistant professor at Boston University’s Center for Computational Science.
Link to full article on the Brookhaven website: https://www.bnl.gov/newsroom/news.php?a=112057
The post Alexander Named Dep. Dir. of Brookhaven Computational Initiative appeared first on HPCwire.
Ever wonder what the inside of a machine learning model looks like? Today Graphcore released fascinating images that show how the computational graph concept maps to a new graph processor and graph programming framework it’s creating.
AlexNet graph (Image Source: Graphcore)
Graphcore is a UK-based startup that’s building a new processor, called the Intelligent Processing Unit (IPU), which is designed specifically to run machine learning workloads. Graphcore says systems that have its IPU processors, which will plug into traditional X86 servers via PCIe interfaces, will have more than 100x the memory bandwidth than scalar CPUs, and will outperform both CPUs and vector GPUs for emerging machine learning workloads for both training and scoring stages.
The company is also developing a software framework called Poplar that will abstract the machine learning application development process from the underlying IPU-based hardware. Poplar was written in C++ and will be able to take applications written in other frameworks, like TensorFlow and MXNet, and compile them into optimized code to execute on IPU-boosted hardware. It will feature C++ and Python interfaces.
All modern machine learning frameworks like TensorFlow, MxNet, Caffe, Theano, and Torch use the concept of a computational graph as an abstraction, says Graphcore’s Matt Fyles, who wrote today’s blog post.
“The graph compiler builds up an intermediate representation of the computational graph to be scheduled and deployed across one or many IPU devices,” Fyles writes. “The compiler can display this computational graph, so an application written at the level of a machine learning framework reveals an image of the computational graph which runs on the IPU.”
Resnet-50 graph execution plan (Image Source: Graphcore)
This is where the images come from. The image at the top of the page shows a graph based on the AlexNet architecture, which is a powerful deep neural network used in image classification workloads among others.
“Our Poplar graph compiler has converted a description of the network into a computational graph of 18.7 million vertices and 115.8 million edges,” Fyles writes. “This graph represents AlexNet as a highly-parallel execution plan for the IPU. The vertices of the graph represent computation processes and the edges represent communication between processes. The layers in the graph are labelled with the corresponding layers from the high level description of the network. The clearly visible clustering is the result of intensive communication between processes in each layer of the network, with lighter communication between layers.”
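The abstraction Fyles describes, vertices as compute processes and edges as communication between them, can be illustrated with a toy graph executor. This is a sketch of the general computational-graph concept shared by frameworks like TensorFlow and MXNet, not Poplar’s API.

```python
# Toy computational graph: vertices are compute steps, edges carry data
# between them; run() executes vertices in dependency (topological) order.
from collections import defaultdict, deque

class Graph:
    def __init__(self):
        self.ops = {}                   # vertex name -> (callable, inputs)
        self.edges = defaultdict(list)  # producer -> list of consumers
        self.indeg = {}                 # vertex name -> number of inputs

    def add_op(self, name, fn, inputs=()):
        self.ops[name] = (fn, list(inputs))
        for src in inputs:
            self.edges[src].append(name)
        self.indeg[name] = len(inputs)

    def run(self):
        values = {}
        indeg = dict(self.indeg)        # copy so the graph can be re-run
        ready = deque(n for n in self.ops if indeg[n] == 0)
        while ready:
            n = ready.popleft()
            fn, ins = self.ops[n]
            values[n] = fn(*(values[i] for i in ins))
            for consumer in self.edges[n]:
                indeg[consumer] -= 1
                if indeg[consumer] == 0:
                    ready.append(consumer)
        return values

g = Graph()
g.add_op("x", lambda: 3.0)                                # input
g.add_op("w", lambda: 2.0)                                # weight
g.add_op("mul", lambda a, b: a * b, inputs=("x", "w"))    # x * w
g.add_op("relu", lambda v: max(0.0, v), inputs=("mul",))  # activation
out = g.run()
```

Graphcore’s compiled AlexNet graph is the same idea scaled to 18.7 million vertices and 115.8 million edges, with the scheduler deciding which vertices run in parallel.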
Graphcore also generated images of the graph execution plan for a deep neural network built on ResNet, which Microsoft Research released in 2015. Poplar was used to compile a 50-layer deep neural network into a graph execution plan with 3.22 million vertices and 6.21 million edges.
One of the unique aspects of the ResNet architecture is that it allows deep networks to be assembled from repeated sections. Graphcore says its IPU only needs to define these sections once, and can then call them repeatedly, using the same code but with different “network weight data.”
The graph computational execution plan for LIGO data (Image Source: Graphcore)
“Deep networks of this style are executed very efficiently as the whole model can be permanently hosted on an IPU, escaping the external memory bottleneck which limits GPU performance,” the company says.
Finally, Graphcore shared a computational graph execution plan that involved time-series data gathered from astrophysicists working at the University of Illinois. The researchers used the MXnet DNN framework to analyze data collected from the LIGO gravitational wave detector, which looks for gravitational abnormalities caused by the presence of black holes. The image that Graphcore shared is the “full forward and backward pass of the neural network trained on the LIGO data to be used for signal analysis,” the company says.
“These images are striking because they look so much like a human brain scan once the complexity of the connections is revealed,” Fyles writes, “and they are incredibly beautiful too.”
Graphcore emerged from stealth mode last October, when it announced a $30 million Series A round to help finance development of products. Its machine learning (ML) and deep learning acceleration solutions include a PCIe card that plugs directly into a server’s bus.
The “when will clouds be ready for HPC” question has ebbed and flowed for years. It seems clear that for at least some workloads and on some clouds, the answer is now. HPC cloud specialist Nimbix, for example, focuses on providing fast interconnect, large memory, and heterogeneous architecture specifically tailored for HPC. The goliath public clouds have likewise steadily incorporated needed technology and (perhaps less decisively) pricing options.
A new study posted on arXiv.org last week – Comparative benchmarking of cloud computing vendors with High Performance Linpack – authored by Exabyte.io, an admittedly biased source, reports the answer is an unambiguous yes to the question of whether popular clouds can accommodate HPC and further examines some of the differences between a few of the major players.
“For high performance computing (HPC) workloads that traditionally required large and cost-intensive hardware procurement, the feasibility and advantages of cloud computing are still debated. In particular, it is often questioned whether software applications that require distributed memory can be efficiently run on ‘commodity’ compute infrastructure publicly available from cloud computing vendors,” write the authors, Mohammad Mohammadi and Timur Bazhirov of Exabyte.io.
“We benchmarked the performance of the best available computing hardware from public cloud providers with high performance Linpack. We optimized the benchmark for each computing environment and evaluated the relative performance for distributed memory calculations. We found Microsoft Azure to deliver the best results, and demonstrated that the performance per single computing core on public cloud to be comparable to modern traditional supercomputing systems.
“Based on our findings we suggest that the concept of high performance computing in the cloud is ready for a widespread adoption and can provide a viable and cost-efficient alternative to capital-intensive on-premises hardware deployments.”
Exabyte.io is a young company building a cloud-based environment to assist organizations with materials design – hence it has a horse in the race. Company marketing info on its website states, “Exabyte.io powers the adoption of high-performance cloud computing for design and discovery of advanced materials, devices and chemicals from nanoscale. We combine high fidelity simulation techniques, large-scale data analytics and machine learning tools into a hosted environment available for public, private and hybrid cloud deployments.”
Leaving its interest aside, the study is a useful read. Here’s a list of the offerings evaluated:
- Amazon Web Services (AWS)
- Microsoft Azure
- Rackspace
- IBM Softlayer
- National Energy Research Scientific Computing Center (NERSC)
The benchmarking was done using the High Performance Linpack (HPL) program, which solves a random system of linear equations, represented by a dense matrix, in double precision (64-bit) arithmetic on distributed-memory computers. “It does so through a two-dimensional block-cyclic data distribution, and right-looking variant of the LU factorization with row partial pivoting.” It is a portable and freely available software package.
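A toy, single-process sketch can illustrate what HPL measures — this is not HPL itself (which is distributed MPI code), just a NumPy analogue: solve a random dense system in double precision via LU with partial pivoting (what `numpy.linalg.solve` uses underneath), then report an achieved rate using HPL's nominal flop count of 2/3·n³ + 2·n².

```python
import time
import numpy as np

# Random dense system Ax = b in double precision.
n = 1000
rng = np.random.default_rng(42)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)          # LU factorization with partial pivoting
elapsed = time.perf_counter() - t0

# HPL's nominal operation count for the factorization + solve.
flops = (2.0 / 3.0) * n**3 + 2.0 * n**2

# HPL also checks a scaled residual to validate the solution.
residual = np.linalg.norm(A @ x - b) / (np.linalg.norm(A) * np.linalg.norm(x))
print(f"{flops / elapsed / 1e9:.2f} GFLOPS, scaled residual {residual:.2e}")
```

Real HPL distributes the matrix block-cyclically across MPI ranks and tunes block size, process grid, and broadcast algorithm per machine — which is the per-environment optimization the authors describe.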
Three different AWS scenarios were tested: hyper-threaded, non-hyper-threaded, and non-hyper-threaded with placement groups. On Azure, three different instance types were used: F-series, A-series, and H-series. Compute1-60 instances were used on Rackspace. The benchmark was also run on NERSC’s Edison supercomputer with hyper-threading enabled. Edison, of course, is a Cray XC30, with a peak performance of 2.57 PFLOPS, 133,824 compute cores, 357 terabytes of memory, and 7.56 petabytes of disk, holding the number 60 spot on the TOP500 list. Specific configurations are shown below.
In many cases the performances were quite similar but each also had strengths and weaknesses. For example, network saturation at scale and slower processor clock speeds affected IBM Softlayer’s performance according to the study. The authors also noted: “AWS and Rackspace show a significant degree of parallel performance degradation, such that at 32 nodes the measured performance is about one-half of the peak value.”
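The degradation the authors describe is visible in the study's headline metric, the speedup ratio Rmax/Rpeak (measured Linpack rate over theoretical peak). A small sketch with hypothetical hardware numbers — not figures from the paper — shows how the ratio is formed and how it falls when scaling efficiency drops:

```python
# Hypothetical numbers for illustration; the study's actual
# measurements are in the paper and its Figure 1.
def speedup_ratio(rmax_gflops, nodes, cores_per_node,
                  ghz, flops_per_cycle=16):
    """Ratio of measured Linpack rate (Rmax) to theoretical peak (Rpeak)."""
    rpeak = nodes * cores_per_node * ghz * flops_per_cycle  # GFLOPS
    return rmax_gflops / rpeak

# Single node achieving 400 GFLOPS on 16 cores at 2.4 GHz:
one_node = speedup_ratio(400.0, 1, 16, 2.4)

# 32 nodes delivering only half the per-node rate, as the study
# reports for AWS and Rackspace at that scale:
thirty_two = speedup_ratio(0.5 * 32 * 400.0, 32, 16, 2.4)

print(round(one_node, 3), round(thirty_two, 3))
```

Halving the per-node rate at 32 nodes halves the speedup ratio, which is exactly the shape of the degradation curves the authors report.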
The brief paper is best read in full for the details; performance data for each of the clouds is presented. Below is a summary figure of cloud performances.

Figure 1: Speedup ratios (the ratios of maximum speedup Rmax to peak speedup Rpeak) against the number of nodes for all benchmarked cases. Speedup ratios for 1, 2, 4, 8, 16 and 32 nodes are investigated and given by points. Lines are drawn to guide the eye. The legend is as follows: AWS – Amazon Web Services in the default hyper-threaded regime; AWS-NHT – same, with hyperthreading disabled; AWS-NHT-PG – same, with placement group option enabled; AZ – Microsoft Azure standard F16 instances; AZ-IB-A – same provider, A9 instances; AZ-IB-H – same provider, H16 instances; RS – Rackspace compute1-60 instances; SL – IBM/Softlayer virtual servers; NERSC – Edison computing facility of the National Energy Research Scientific Computing Center.
On balance, argue the authors, “Our results demonstrate that the current generation of publicly available cloud computing systems are capable of delivering comparable, if not better, performance than the top-tier traditional high performance computing systems. This fact confirms that cloud computing is already a viable and cost-effective alternative to traditional cost-intensive supercomputing procurement.”
Here is a link to the paper on arXiv.org: https://arxiv.org/pdf/1702.02968.pdf
Feb. 15 — The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign is pleased to announce the Blue Waters Weekly Webinar Series. The series will provide the research and education communities with a variety of opportunities to learn about methods, tools, and resources available to advance their computational and data analytics skills, with an emphasis on scaling to petascale and beyond.
Webinars will generally occur every Wednesday, with a few exceptions to avoid conflicts with major HPC conferences and events. All sessions will be free and open to everyone who registers. Registered participants will be able to pose questions using NCSA’s Blue Waters Slack environment. Registration is required for access to YouTube Live broadcasts. Webinars will begin at 10 a.m. Central Time (UTC-6).
Each webinar will be led by a developer or an expert on the topic. The first visualization webinar, “Introduction to Data Visualization” hosted by Vetria Byrd, Purdue University, will take place on March 1, 2017; the first workflows webinar, “Overview of Scientific Workflows” will be hosted by Scott Callaghan, University of Southern California, on March 8, 2017; and the first petascale application improvement discovery webinar, “Getting I/O Done with Parallel HDF5 on Blue Waters” hosted by Gerd Heber, HDF Group, will take place March 29, 2017. The list of webinar tracks as well as specific sessions will be refined and expanded over time.
For more information about the webinar series, including registration, abstracts, speakers, as well as links to YouTube recordings, please visit the Blue Waters webinar series webpage.