HPC Wire

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

IDEAS Scales up to Foster Better Software Development for Exascale

Mon, 09/25/2017 - 13:05

Sept. 25, 2017 — Scalability of scientific applications is a major focus of the Department of Energy’s Exascale Computing Project (ECP), and in that vein, a project known as IDEAS-ECP, or Interoperable Design of Extreme-scale Application Software, is also being scaled up to deliver insight on software development to the research community.

One channel is a series of webinars on best practices for HPC software developers which began in June. IDEAS-ECP members will also present a half-day tutorial on Better Scientific Software and host a birds-of-a-feather (BOF) session at the SC17 conference in November.

The IDEAS-ECP webinar series debuted with a talk on “Python in HPC” by Rollin Thomas of the National Energy Research Scientific Computing Center (NERSC), at Lawrence Berkeley National Laboratory (Berkeley Lab) in collaboration with William Scullin of Argonne National Laboratory and Matt Belhorn from Oak Ridge National Laboratory. More than 200 people registered for the event and 158 attended.

“At the meeting of ECP principal investigators in January, we noticed from the presentations that a number of the projects use Python, but getting good performance out of Python on highly parallel machines is difficult so we chose that as the topic for our first webinar,” said Berkeley Lab’s Osni Marques, who is the IDEAS-ECP lead for developing webinar content. “Our ultimate goal is to provide strategies for improving productivity and the development of applications, libraries and software.”

The webinars and SC17 tutorial are part of the IDEAS-ECP Outreach effort led by David Bernholdt of Oak Ridge National Laboratory. Bernholdt said that the original IDEAS project grew out of longstanding recognition in the HPC science community that fostering better software development would lead to more productive and sustainable software.

“When ECP was ramping up, people started talking about the need for improved practices as they would need to take a quantum leap to get their software and applications ready for exascale,” Bernholdt said. “So we persuaded them that IDEAS could be valuable for ECP, as well as have even broader impacts.”

Originally funded by DOE’s offices of Advanced Scientific Computing Research and Biological and Environmental Research, IDEAS launched a biweekly series of webinars in 2016, which was a demanding schedule. As part of ECP, the team rebooted the webinars to a more sustainable monthly schedule.

The webinars are announced via an 800-member mailing list and registration is required (and free). Each webinar is also archived for viewing by anyone.

Bernholdt said that the webinars are collaborative, like many of DOE’s research projects. In the case of the session on Python, the three participating labs each handle Python a bit differently and there are many nuances. By being aware of this, attendees will be better informed if they run into a snag.

The team peruses the literature and speaks with researchers at meetings to glean topics of interest for the sessions.

The tutorials are also evolving to meet the needs of researchers. The SC17 tutorial, to be led by Bernholdt, Anshu Dubey of Argonne National Laboratory and Michael A. Heroux and Alicia Klinvex of Sandia National Laboratories, is an evolved version of a tutorial given at the February 2017 SIAM Conference on Computational Science and Engineering. A version of it will also be presented at the PRACE (Partnership for Advanced Computing in Europe) IT4Innovations meeting Oct. 4-5 in Prague.

Bernholdt said the project fills an important niche, as software development for both science and HPC takes place in a different context from mainstream computing, and IDEAS tailors its information to the scientific HPC community.

The overall IDEAS-ECP project is led by Mike Heroux of Sandia National Laboratories and Lois Curfman McInnes of Argonne. Ashley Barker of the Oak Ridge Leadership Computing Center is the ECP manager for the project.

Source: IDEAS-ECP Project

The post IDEAS Scales up to Foster Better Software Development for Exascale appeared first on HPCwire.

China’s TianHe-2A will Use Proprietary Accelerator and Boast 94 Petaflops Peak

Mon, 09/25/2017 - 11:35

The details of China’s upgrade to TianHe-2 (MilkyWay-2) – now TianHe-2A – were revealed last week at the Third International High Performance Computing Forum (IHPCF2017) in China. The TianHe-2A will use a proprietary accelerator (Matrix-2000), a proprietary network, and provide support for OpenMP and OpenCL. The upgrade is about 25% complete and expected to be fully functional by November 2017 according to a report by Jack Dongarra who attended the meeting and has written a fairly detailed summary.

“The most significant enhancement to the system is the upgrade to the TianHe-2 nodes; the old Intel Xeon Phi Knights Corner (KNC) accelerators will be replaced with a proprietary accelerator called the Matrix-2000. In addition, the network has been enhanced, the memory increased, and the number of cabinets expanded. The completed system, when fully integrated with 4,981,760 cores and 3.4 PB of primary memory, will have a theoretical peak performance of 94.97 petaflops, which is roughly double the performance of the existing TianHe-2 system. NUDT also developed the heterogeneous programming environment for the Matrix-2000 with support for OpenMP and OpenCL,” writes Dongarra (Report on The Tianhe-2A System).

Dongarra told HPCwire, “The Matrix-2000 was designed by the NUDT people. They claim it was fabbed in China. They did not want to have the manufacturing process disclosed.”

The TianHe-2 vaulted China atop the Top500 list in June of 2013 where it stayed until June 2016 when China’s Sunway TaihuLight topped the list with a LINPACK of 93 petaflops. The Sunway was China’s first supercomputer to use homegrown processors (see HPCwire article, China Debuts 93-Petaflops ‘Sunway’ with Homegrown Processors). China has held the top two positions ever since.

“The TianHe-2A is one of the three prototype systems for Exascale in China. The others are the TaiHu Light in Wuxi and the Sugon Machine based on X86 architecture,” said Dongarra.

Each of the 17,792 TH-2A compute nodes will use two of Intel’s Ivy Bridge CPUs (12 cores clocked at 2.2 GHz) and two of the new NUDT-designed Matrix-2000 accelerators (128 cores clocked at 1.2 GHz). This combination results in a compute system with 35,584 Ivy Bridge CPUs and 35,584 Matrix-2000 accelerators, reports Dongarra.

Introduction of the China-developed Matrix-2000 accelerator showcases China’s continued progress towards technology independence.

As described by Dongarra, each Matrix-2000 has 128 compute cores clocked at 1.2 GHz, achieving 2.4576 teraflops of peak performance. The peak power dissipation is about 240 Watts and the dimensions are 66mm by 66mm. The accelerator itself is configured with four supernodes (SNs) that are connected through a scalable on-chip communication network. Each SN has 32 compute cores and maintains cache coherence. The accelerator supports eight DDR4-2400 channels and is integrated with a ×16 PCI Express 3.0 endpoint port. The compute core is an in-order 8~12 stage reduced instruction set computer (RISC) pipeline extended with a 256-bit vector instruction set architecture (ISA). Two 256-bit vector functional units (VFUs) are integrated into each compute core, resulting in 16 double precision FLOPs per cycle. Thus, the peak performance of the Matrix-2000 can be calculated as: 4 SNs × 32 cores × 16 FLOPs per cycle × 1.2 GHz clock = 2.4576 Tflop/s.
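The peak-performance arithmetic above can be checked with a few lines of Python. This is only a sketch built from the figures quoted in this article. One input is an assumption that the article does not state: the Ivy Bridge figure of 8 double precision FLOPs per cycle per core (a 256-bit AVX add plus a multiply, without FMA). The constant and function names are illustrative.

```python
# Figures quoted in the text (from Dongarra's report).
NODES = 17_792
CPUS_PER_NODE = 2          # Intel Ivy Bridge CPUs per node
ACCELS_PER_NODE = 2        # Matrix-2000 accelerators per node

M2K_CORES = 4 * 32         # 4 supernodes x 32 cores
M2K_FLOPS_PER_CYCLE = 16   # two 256-bit VFUs per core
M2K_GHZ = 1.2

IVB_CORES = 12
IVB_GHZ = 2.2
# ASSUMPTION (not stated in the text): 8 double precision FLOPs per cycle
# per Ivy Bridge core (256-bit AVX add plus multiply, no FMA).
IVB_FLOPS_PER_CYCLE = 8


def chip_peak_gflops(cores: int, flops_per_cycle: int, ghz: float) -> float:
    """Theoretical peak of one chip in Gflop/s."""
    return cores * flops_per_cycle * ghz


m2k_gflops = chip_peak_gflops(M2K_CORES, M2K_FLOPS_PER_CYCLE, M2K_GHZ)
ivb_gflops = chip_peak_gflops(IVB_CORES, IVB_FLOPS_PER_CYCLE, IVB_GHZ)

# Whole-system totals: every node carries two CPUs and two accelerators.
total_cores = NODES * (CPUS_PER_NODE * IVB_CORES + ACCELS_PER_NODE * M2K_CORES)
node_gflops = CPUS_PER_NODE * ivb_gflops + ACCELS_PER_NODE * m2k_gflops
system_pflops = NODES * node_gflops / 1e6

print(f"Matrix-2000 peak: {m2k_gflops / 1000:.4f} Tflop/s")
print(f"Total cores:      {total_cores:,}")
print(f"System peak:      {system_pflops:.2f} Pflop/s")
```

Under that assumption, the totals agree with the 4,981,760 cores and 94.97 petaflops quoted from the report, with the accelerators contributing the large majority of the peak.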

As shown below, a TH-2A compute blade is composed of two parts: the CPM (left) and the APU (middle). The CPM integrates four Ivy Bridge CPUs, and the APU integrates four Matrix-2000 accelerators. Each compute blade contains two heterogeneous compute nodes.

The TH-2A upgrades required the design and implementation of a heterogeneous computing software stack for the Matrix-2000 accelerator, writes Dongarra. This software stack provides a compiling and execution environment for OpenMP 4.5 and OpenCL 1.2. The runtime software stack is illustrated in the figure below.

“In kernel mode, there is a light-weight Linux-based operating system (OS), with the accelerator device driver embedded within it, running on the Matrix-2000 that provides device resource management and data communication with the host CPU through the PCI Express connection. The OS manages the computing cores through an elaborately designed thread pool mechanism, which enables task scheduling with low overhead and high efficiency.”

China’s rapid advance in supercomputing and its accelerated effort to build its own technology ecosystem have been a hot topic for some time. Dongarra captures the dynamics and technology achievement neatly in his summary:

“In February 2015, the US Department of Commerce prevented some Chinese research groups from receiving Intel technology. The department cited concerns about nuclear research being performed on compute systems equipped with Intel components. The research centers affected include: NSCC-G, site of Tianhe-2; the National SC Center Tianjin, site of Tianhe-1A; the NUDT, developer; and the National SC Center Changsha, location of NUDT.

“At the 2015 International Supercomputing Conference (ISC) in Frankfurt, Yutong Lu, the director of the NSCC-G, described the TianHe-2A system (Figure 10). Most of what was shown in her slide in 2015 has been realized in the Matrix-2000 accelerator. They had hoped to replace the Intel KNC accelerator in their TH-2 with the Matrix-2000 by 2016. However, because of delays that has not happened until very recently.

“After the embargo on Intel components by the US Department of Commerce, it has taken NUDT about two years to design and implement a replacement for the Intel Xeon Phi KNC accelerator. Their replacement is about the same level of performance as the current generation of Intel’s Xeon Phi, known as Knights Landing (KNL). Equaling the performance of the state-of-the-art KNL chip and developing the accompanying software stack in such a short time is an impressive result.”

Last week’s IHPCF2017 meeting, sponsored by the Ministry of Science and Technology (MOST) and the National Science Foundation of China (NSFC), organized by NUDT, and hosted by the National Supercomputer Center in Guangzhou (NSCC-GZ), was held on September 18–20, 2017 in Guangzhou, China. There were roughly 160 attendees, reported Dongarra.

Given this latest announcement, and speculation about what may be happening with the TaihuLight system, the SC17 conference in November should spark interesting discussion. Clearly the international jostling for sway in the race to pre- and full exascale machines continues to heat up. Just last week, the U.S. Exascale Computing Project announced the retirement of Paul Messina as director and the appointment of Doug Kothe as new director.

Expectations are high that Summit (Oak Ridge National Laboratory) will be at or near the top of the Top500 list. Likewise, there’s been speculation that Sierra (Lawrence Livermore National Laboratory) might be ready by then. It’s been a while since the U.S. was top dog in the Top500. In any case, it will be interesting to see the next batch of LINPACK scores and what shuffling of the Top500 emerges.

Link to Dongarra’s excellent summary paper: https://www.dropbox.com/s/0jyh5qlgok73t1f/TH-2A-report.pdf?dl=0

HPE’s Spaceborne Computer Powers Up in Space and Achieves One TeraFLOP

Mon, 09/25/2017 - 09:39

Sept. 25, 2017 — It’s been one month since the SpaceX CRS-12 rocket launched from the Kennedy Space Center in Florida on August 14, sending HPE’s Spaceborne Computer to the International Space Station (ISS).

It has been a long month for us, HPE’s Spaceborne Computer engineers, as we waited for an update from NASA on when we would be scheduled to power on the system in space. With much of my team and one of NASA’s main command centers based in Houston, we have all been tied up with the aftermath of Hurricane Harvey. We finally received confirmation that all systems are a go for Thursday, September 14.

The day arrives and my colleague, Dave Petersen, and I are anxious. It’s 5:00 A.M. ET and we log in to NASA’s portal to set up our viewing windows. After a few minutes, we’re watching astronauts floating around the ISS. “Look! It’s our babies!” I say as I watch the astronauts bolt the Spaceborne Computer into its designated server rack located in the ceiling. Although the system weighs 124 pounds on Earth, it’s weightless in space so the bolts are tiny and merely used to hold it in place.

With the system securely in place, the astronaut conducts the physical installation that connects the water cooling and Ethernet cables. This is essential to do before powering on the supercomputer to ensure there isn’t a water leak. With no leak in sight, it’s time to plug the system into the two inverters. This is a huge moment of anticipation for us, as we anxiously await and watch from Earth. “Will the system power up? Will the inverters blow? Will the circuit breaker trip?” These are just a few thoughts racing through my head.

The first of two green lights appears on the first inverter. Then another single green light appears on the second inverter. A moment of panic. There are supposed to be two green lights on each inverter. The astronaut turns on the 48 volt and 110 volt AC. A long few seconds pass and there are two green lights on each inverter. “We did it!” I think to myself. “The Spaceborne Computer is powered up!”

The Spaceborne Computer fully installed in the International Space Station

Now that all the hardware installations have been successfully completed, it’s time for me to take over and begin the necessary software downloads and system housekeeping. All goes as planned. The system in space is running identically to its twin on Earth.

Finally, the moment we’ve all been waiting for. I am ready to launch the multi-node High Performance LINPACK (HPL) benchmark test. This test will determine how many multiplications per second the system can produce. If all goes well, we’ll run the High Performance Conjugate Gradients (HPCG), which was designed to complement the HPL benchmark. These collective tests are the basis of the Top500’s ranking of supercomputers on Earth from fastest to slowest.

We launch the HPL run and wait patiently. We’re hoping for the highest number (most multiplications) we can get, so I don’t dare check the progress for about 15 minutes or so for fear of slowing anything down. The HPL run is finally complete and not only are the results valid, but the Spaceborne Computer achieves over one trillion calculations per second, also known as one teraFLOP, which is up to 30 times faster than a laptop. We don’t hesitate to begin the HPCG. It verifies the results.

We’re ecstatic. This is exactly what we’ve been hoping for. HPE’s Spaceborne Computer is the first high performance commercial off-the-shelf (COTS) computer system to run one teraFLOP at the International Space Station.

Original article: https://news.hpe.com/hpes-spaceborne-computer-successfully-powers-up-in-space-and-achieves-one-teraflop/

Source: Mark Fernandez, HPE

SC17 Preview: Invited Talk Lineup Includes Gordon Bell, Paul Messina and Many Others

Mon, 09/25/2017 - 09:27

With the addition of esteemed supercomputing pioneer Gordon Bell to its invited talk lineup, SC17 now boasts a total of 12 invited talks on its agenda.

As SC explains, “Invited Talks are a premier component of the SC conference program, complementing the presentations of the regular technical papers program. Invited talks feature leaders in high performance computing, networking, analysis and storage who present innovative technical contributions and their applications. At all invited talks, you should expect to hear about pioneering technical achievements, the latest innovations in supercomputing, networking and data analytics and broad efforts to answer some of the most complex questions of our time.”

SC’s communications team has provided the entire listing along with the days and times. All talks will be in the Mile High Ballroom at the Colorado Convention Center in Denver, Colorado. Here’s the rundown as they will appear on the agenda.

Tuesday, Nov. 14, 10:30am-12pm

Dr. Paul Messina, Argonne National Laboratory: “The U.S. DOE Exascale Computing Project – Goals and Challenges” (a link to more information will be forthcoming)

Theresa L. Windus, Iowa State University & Ames Laboratory: “Taking the Nanoscale to the Exascale” (a link to more information will be forthcoming)

Tuesday, Nov. 14, 3:30-5pm

Gordon Bell: “Thirty Years of the Gordon Bell Prize”

Erich Strohmaier, Jack Dongarra, Horst Simon, Martin Meuer: “The TOP500 List – Past, Present and Future”

Wednesday, Nov. 15, 10:30am-12pm

Judy Qiu, Indiana University: “Harp-DAAL: A Next Generation Platform for High Performance Machine Learning on HPC-Cloud”

Hans-Joachim Bungartz, Technical University of Munich (TUM): “Citius, Altius, Fortius!” (a link to more information will be forthcoming)

Wednesday, Nov. 15, 3:30-5pm

Dr. Pradeep Dubey, Intel Labs: “Artificial Intelligence and The Virtuous Cycle of Compute”

Dr. Alexandre Bayen, UC Berkeley: “Inference and Control in Routing Games”

Thursday, Nov. 16, 8:30-10am

Dr. Haohuan Fu, National Supercomputing Center-Wuxi, China: “Lessons on Integrating and Utilizing 10 Million Cores: Experience of Sunway TaihuLight”

Dr. Rommie E. Amaro, UC San Diego: “Molecular Simulation at the Mesoscale”

Thursday, Nov. 16, 10:30am-12pm

Dr. Katsuki Fujisawa, Kyushu University: “Cyber-physical System and Industrial Applications of Large-Scale Graph Analysis and Optimization Problem”

Dr. Catherine Graves, Hewlett Packard Labs: “Computing with Physics: Analog Computation and Neural Network Classification with a Dot Product Engine”

Computing Pioneer Gordon Bell to Speak at High Performance Computing Conference

Mon, 09/25/2017 - 07:51

DENVER, Sept. 25, 2017 — Everyone knows that computers have grown exponentially faster and more powerful. But only a few know exactly why and how – and Gordon Bell has been a sage among them since the 1960s as the former head of R&D at Digital Equipment Corp. (DEC), establishing the minicomputer industry with the PDP and VAX computers.

A globally recognized pioneer in the supercomputing world, Bell will be sharing his latest reflections and insights with his fellow scientists, engineers and researchers at SC17 from November 12-17, 2017 in Denver, Colo.

Bell will highlight the work of the winners of the ACM Gordon Bell Prize from the past 30 years. The prize is presented by the Association for Computing Machinery (ACM), and its recipients’ achievements have chronicled the important innovations and transitions of high performance computing (HPC), including the rise of parallel computing, a computing architecture that breaks down problems into smaller ones that may be solved simultaneously.

This body of work also represents innovations in large-scale data analytics, data-gathering hardware and other improvements in computing state of the art. According to Bell, the prize has recognized every gain in parallelism from widely distributed workstations to China’s Sunway TaihuLight 10-million core system in 2016.

“We are honored to have the legendary Gordon Bell speak at SC17,” said Conference Chair Bernd Mohr, from Germany’s Jülich Supercomputing Centre. “The prize he established has helped foster the rapid adoption of new paradigms, given recognition for specialized hardware, as well as rewarded the winners’ tremendous efforts and creativity – especially in maximizing the application of the ever-increasing capabilities of parallel computing systems. It has been a beacon for discovery and making the ‘might be possible’ an actual reality.”

Bell is a researcher emeritus (ret.) at Microsoft Research. His interests include extreme lifelogging, preserving everything in cyberspace and Bell’s Law describing the birth, evolution and death of computer classes. He is a founding trustee of the Computer History Museum, Mountain View, CA.

Since 1965, Bell has evangelized scalable systems, starting with his interest in multiprocessors. In 1987, he led the cross-agency group as head of NSF’s Computing Directorate that made “the plan” for the National Research and Education Network (NREN), which came to be known as the Internet. Bell is a financial supporter for ACM’s annual Gordon Bell Prize.

About SC17

SC17 is an international conference showcasing the many ways high performance computing, networking, storage and analysis lead to advances in scientific discovery, research, education and commerce. The annual event, created and sponsored by ACM (Association for Computing Machinery) and the IEEE Computer Society, attracts HPC professionals and educators from around the globe to participate in its complete technical education program, workshops, tutorials, a world-class exhibit area, demonstrations and opportunities for hands-on learning.

Source: SC17

Lustre: Stronger-than-Ever

Mon, 09/25/2017 - 01:02

Clearly, Lustre* will continue to dominate the persistent parallel file system arena, at least for a few years. The development of such complex technology doesn’t move as quickly as it does for many other applications, and even though parallel file systems may soon be replaced, a gap would still exist until the replacement technology becomes available. DDN® announced in November 2016 that all its Lustre features would be merged into the Lustre master branch to allow the entire community to have more transparent access to the code, reducing the overhead of code development management and better aligning with the latest advancements. Although numerous contributors and collaborators have asked why DDN would choose to share these patches rather than leverage them as a competitive advantage and differentiator, DDN is committed to delivering these features as a foundation framework coded into the Lustre file system. These features will then support DDN’s broader development, which is now looking into areas such as security, performance, RAS, and data management.

Along with the recently announced features, DDN proposes a novel approach for Lustre’s policy engine (LiPE) that aims to reduce installation and deployment complexity while delivering significantly faster results. LiPE relies on a set of components that allows the engine to scan Lustre MDTs quickly, create an in-memory mapping of the file system’s objects, and implement data management policies based on that mapped information. This approach initially allows users to define policies that trigger data automation via Lustre HSM hooks or external data management mechanisms. In the next stage of development, LiPE may be integrated with a File Heat Map mechanism for more automated and transparent data management, resulting in better utilization of parallel storage infrastructure. (File Heat Map is another feature under development that will create a file mapping that weights the state object according to its utilization. For example, over time, the weight of unmodified files will decay, indicating the likelihood of such a file being a WORM-style file suitable for moving into a different disk tier.)
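The article describes File Heat Map only at a high level, so the following Python sketch is purely illustrative and is not DDN’s implementation. It assumes an exponential decay with a configurable half-life, and every name in it (`FileHeat`, `record_modification`, `is_cold`, `half_life_s`, `threshold`) is hypothetical. It shows the general idea of a weight that rises when a file is modified and decays while it is left untouched.

```python
from dataclasses import dataclass


@dataclass
class FileHeat:
    """Hypothetical per-file heat record; not an actual Lustre structure."""
    weight: float = 0.0
    last_update: float = 0.0  # timestamp in seconds

    def _decay(self, now: float, half_life_s: float) -> None:
        # ASSUMPTION: exponential decay; the article does not name a formula.
        elapsed = max(0.0, now - self.last_update)
        self.weight *= 0.5 ** (elapsed / half_life_s)
        self.last_update = now

    def record_modification(self, now: float, half_life_s: float,
                            amount: float = 1.0) -> None:
        """Decay the old weight up to `now`, then add heat for this write."""
        self._decay(now, half_life_s)
        self.weight += amount

    def current(self, now: float, half_life_s: float) -> float:
        """Return the weight as of `now`."""
        self._decay(now, half_life_s)
        return self.weight


def is_cold(heat: FileHeat, now: float, half_life_s: float,
            threshold: float) -> bool:
    """A file whose weight fell below the threshold is a candidate for a
    slower tier (for example, a WORM-style file)."""
    return heat.current(now, half_life_s) < threshold


DAY = 86_400.0
heat = FileHeat()
heat.record_modification(now=0.0, half_life_s=DAY)
heat.record_modification(now=DAY, half_life_s=DAY)
weight_day3 = heat.current(now=3 * DAY, half_life_s=DAY)
cold_day10 = is_cold(heat, now=10 * DAY, half_life_s=DAY, threshold=0.01)
```

A policy engine could scan such records periodically and hand cold files to an HSM hook; the half-life and threshold would be tuning parameters chosen by the site.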

Regarding performance, DDN has designed and developed a new Quality of Service (QoS) approach. QoS based on the Token Bucket Filter algorithm has been implemented at the OST level, allowing system administrators to define the maximum number of RPCs to be issued by a user/group or job ID to a given OST. Throttling performance provides I/O control and bandwidth reservation; for example, by guaranteeing that jobs with higher priority run in a more predictable time, performance variations due to I/O delays can be avoided. A new initiative between DDN and a few renowned European universities will investigate the implementation of a high-level tool, possibly at the user level, that would allow easier utilization and configuration of QoS with a set of new usability enhancements.
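For readers unfamiliar with the token bucket algorithm, here is a minimal Python sketch of the general technique applied per job ID. It is not Lustre’s implementation: the class names, the `(rate, burst)` limit format and the explicit `now` clock argument are assumptions made for illustration, and the sketch simply reports whether a request may proceed rather than queueing it.

```python
class TokenBucket:
    """Generic token bucket: tokens refill at `rate` per second up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start with a full bucket
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill according to the time elapsed since the last call.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class OstThrottle:
    """Hypothetical per-target throttle keyed by job ID."""

    def __init__(self, limits: dict):
        # limits maps job_id -> (requests per second, burst size)
        self.buckets = {job: TokenBucket(rate, burst)
                        for job, (rate, burst) in limits.items()}

    def admit(self, job_id: str, now: float) -> bool:
        bucket = self.buckets.get(job_id)
        # Jobs without a configured limit are not throttled in this sketch.
        return True if bucket is None else bucket.allow(now)


throttle = OstThrottle({"job42": (2.0, 2.0)})   # 2 requests/s, burst of 2
decisions = [throttle.admit("job42", now=0.0) for _ in range(3)]
later = throttle.admit("job42", now=0.5)        # one token has refilled
```

With these example limits, the first two requests at time zero are admitted, the third is refused, and a request half a second later is admitted again because one token has refilled. Giving a lower rate to low-priority jobs leaves headroom for high-priority ones, which is the predictability benefit described above.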

Other interesting features from DDN that will be available on Lustre 2.10 and its minor releases during the LTS cycle include the Project Quotas facility, single-thread performance enhancements, and secured Lustre (MLS and isolation), among others. In keeping with new HPC trends, a tremendous amount of work has also been invested into the integration of Lustre with Linux container-based workloads, providing native Lustre file system capabilities within containers, support for new kernel and specialized Artificial Intelligence and Machine Learning appliances. Customers who are moving toward software-defined storage may be surprised to learn that, as part of its strategy regarding parallel file systems, DDN has also recently announced that it will support ZFS and Lustre as software-only.

For more information about DDN’s Lustre offerings, visit the EXAScaler® product page.

Note: Other names and brands may be claimed as the property of others.

GlobalFoundries Puts Wind in AMD’s Sails with 12nm FinFET

Sun, 09/24/2017 - 13:54

From its annual tech conference last week (Sept. 13), where GlobalFoundries welcomed more than 600 semiconductor professionals (reaching the Santa Clara venue’s max capacity and doubling 2016 attendee numbers), the one-of-four foundry business launched a raft of announcements, including a new 12nm FinFET process for high performance applications. Prominent customer Advanced Micro Devices announced it will use the 12nm technology in its Ryzen CPUs and Vega GPUs, bolstering its competitiveness against Intel and Nvidia.

GlobalFoundries said the new 12nm platform, which will be ready for risk production in the first half of 2018, will offer a 15 percent improvement in circuit density and a greater than 10 percent performance boost over industry 16nm FinFET solutions. (There’s an intended improvement over 14nm too, of course, but no specific figures were offered.) Manufactured at GlobalFoundries’ Fab 8 factory in Malta, NY, 12LP (LP stands for Leading Performance) builds on the company’s 14nm FinFET platform, which has been in high-volume production since early 2016.

“It pushes new design rules and some new constructs,” said GlobalFoundries CEO Sanjay Jha, “but our fundamental focus is to enable people who have already designed 14nm to be able to migrate to 12LP. It is a cost reduction path as well as a performance enhancement path for a number of our customers, including AMD.”

Source: GlobalFoundries

The 12LP platform features enhancements for RF/analog applications, and also new market-focused capability for automotive, a major focus for GlobalFoundries and close customer AMD, which according to a report published Wednesday from CNBC has snagged Tesla as a customer (however both GlobalFoundries and AMD categorically deny the rumor).

The 12nm technology is an intermediate step on the way to the 7nm FinFET node, slated for risk production in the first half of 2018. “It’s not a full redesign, but there is some design work to move into it,” said GlobalFoundries Chief Technology Officer Gary Patton in a press briefing. “We want it to be as close to an extension of 14 as possible. If you’ve invested all this design IP in 14,” he added, “you want to extend that as much as possible. We’ve already done several performance enhancements on our 14nm, and this is just the next step to provide another performance enhancement but also provide a path to get some additional scaling [ahead of 7nm].”

Tirias Research analyst Jim McGregor has a positive outlook on 12nm, despite what he believes is a marketing-driven naming convention. “It’s really a subnode,” he shared with HPCwire, “It’s kind of funny because most people don’t make their subnodes public. Every company, every manufacturer whether it be Intel or GlobalFoundries has those sub-nodes. GlobalFoundries isn’t doing a full node at 10nm; they’re still going to 7nm.”

“So the announcement of calling it 12nm kind of surprised me but [having a subnode is] not something new. Where most people are just getting into 10nm manufacturing at the end of this year, right now, GlobalFoundries is pushing for the beginning of their manufacturing for first half of 2018, so they’re still aggressive. It surprised me a little bit, but I think that it was more of a benefit for AMD than anything else.

“AMD is very competitive with Intel right now, and in some cases is slaughtering Intel, so they don’t want to be perceived as being behind the curve of Intel from a process perspective,” McGregor added.

AMD uses GlobalFoundries 14nm FinFET process technology for its x86 Zen-based chips (Ryzen and EPYC) and for its Vega high performance GPUs. So far only Ryzen and Vega have been tapped for a 12nm upgrade; whether the EPYC server platform will also take this intermediary step may well hinge on 7nm’s readiness and capacity.

With the introduction of its 7nm process node, GlobalFoundries is touting a 40 percent performance improvement over 14nm and a 60 percent power reduction. The company is launching 7nm with optical lithography and has designed the technology to be drop-in compliant with EUV.

GlobalFoundries’ CEO Jha said he expects 12nm to be the last optical node, followed by 7nm becoming the first EUV node. “It will start out optical just as… 90nm started out being 200mm and then migrated to 300mm and of course the vast majority of 90nm shipped actually happened in 300mm so at the moment, I think the last 200mm node that we see is really 130-110nm and thereafter everything is 300mm. I think EUV will turn out to be that way; 7nm is the first place that will start.”

Jha’s keynote emphasized the coming age of connected intelligence (read EnterpriseTech coverage here) that is pushing the development of new silicon technologies.

“We’re seeing an important shift in the business model of the foundry business,” the CEO said. “System companies, like Google, like Amazon, like Tesla, like Microsoft, are coming directly to foundries, they are working with EDA companies, IP companies and system design houses to get the IP. They want to control the hardware/software interface for the next generation of AI developments. They really want to control the architecture of both hardware and software, and it’s been the scenario over the last 10 years.

“People who control and capture the hardware/software interface capture most of the value in the industry, and certainly Apple has proven that’s where innovation occurs. I think more and more people are beginning to see this business model and I think we’re seeing more system houses hiring semiconductor engineers and driving innovation.”

Machine Learning at HPC User Forum: Drilling into Specific Use Cases

Fri, 09/22/2017 - 12:47

The 66th HPC User Forum was held September 5-7 in Milwaukee, Wisconsin, at the elegant and historic Pfister Hotel. The 1893 Victorian décor and art of “The Grand Hotel Of The West” contrasted nicely with presentations on the latest trends in modern computing – deep learning, machine learning and AI.

Over the course of two days of presentations, a couple of common themes became obvious: first, that machine and deep learning are currently focused on specific rather than general use cases, and second, that ML and DL need to be part of an integrated workflow to be effective.

This was exemplified by Dr. Maarten Sierhuis from Nissan Research Facility Silicon Valley with his presentation “Technologies for Making Self-Driving Vehicles the Norm.” One of the most engaging talks, Dr. Sierhuis’s multi-media presentation on the triumphs and challenges facing Nissan while developing its self-driving vehicle program showcased that machine and deep learning “drives” the autonomous vehicle revolution.

The challenge that Nissan and other deep learning practitioners face is that current deep learning algorithms are programmed to learn to do one thing extremely well – the specific use case: image recognition of stop signs for example. Once an algorithm learns to recognize stop signs, the same amount of discrete learning must apply for every other road sign a vehicle may encounter. To create a general-purpose “road sign learning algorithm”, not only do you need a massive amount of image data (in the tens of millions of varied images), but also the compute to power the learning effort.

Dr. Weng-Keen Wong from the NSF echoed much the same distinction between the specific and general case algorithm during his talk “Research in Deep Learning: A Perspective From NSF,” and the distinction was also mentioned by Nvidia’s Dale Southard during the disruptive technology panel. Arno Kolster from Providentia Worldwide, in his presentation “Machine and Deep Learning: Practical Deployments and Best Practices for the Next Two Years,” claimed as well that general purpose learning algorithms are obviously the way to go, but are still some time out.

Nissan’s Dr. Sierhuis went on to highlight some challenges computers still face which human drivers take for granted. For example, what does an autonomous vehicle do when a road crew is blocking the road in front of it? As human drivers, we’d simply move into the opposite lane to “just go around”, but to algorithms, this breaks all the rules: crossing a double line, checking the opposite lane for oncoming traffic, shoulder checking, ensuring no crossing pedestrians, etc. All need real-time re-programming for the encountering vehicle and other vehicles arriving at the obstacle.

Nissan proposes an “FAA-like” control system, but the viability of such a system remains to be seen. Certainly, autonomous technologies are integrating slowly into new cars to augment human drivers, but a complete self-driving vehicle won’t appear in the marketplace overnight; cars will continue to function in a hybrid mode for some time. Rest assured, many of today’s young folks likely will never learn how to drive (or ask their parents to borrow the car on Saturday night).

This algorithmic specificity spotlights the difficulty of integrating deep learning into an actual production workflow.

Tim Barr’s (Cray) “Perspectives on HPC-Enabled AI” showed how Cray’s HPC technologies can be leveraged for Machine and Deep Learning for vision, speech and language. Stating that it all starts with analytics, Mr. Barr illustrated how industries such as Daimler improve manufacturing processes and products by leveraging deep learning to curtail vehicle noise and reduce vibration in its newest vehicles. Nikunj Oza from NASA Ames gave examples of machine learning behind aviation safety and astronaut health maintenance in “NASA Perspective on Deep Learning.” Dr. Oza’s background in analytics brought a fresh perspective to the proceedings and showcased that machine learning from history has earned a real place alongside modeling for industrial best practices.

In the simulation space, a fascinating talk from the LLNL HPC4Mfg program was William Elmer’s (LLNL) discussion of Procter & Gamble’s “Faster Turnaround for Multiscale Models of Paper Fiber Products.” Simulating various paper product textures and fibers greatly reduces the amount of energy used in drying and compaction. Likewise, Shiloh Industries’ Hal Gerber described “High Pressure Casting for Structural Requirements and The Implications on Simulation.” Shiloh’s team leverages HPC for changing vehicle structure, especially in creating lighter components with composites like carbon fiber and mixed materials.

It’s clear from the discussion that machine learning and AI are set to be first class citizens alongside traditional simulation within the HPC community in short order. While still unproven and with a wide variety of new software implementations, HP Labs presented a first-of-its-kind analysis of ML benchmarking on HPC Platforms. Hewlett Packard Labs’ Natalia Vassilieva’s “Characterization and Benchmarking of Deep Learning” showcased the “Book of Recipes” HP Labs is developing with various hardware and software configurations. Fresh off their integration of SGI technology into their technology stack, the talk not only highlighted the newer software platforms which the learning systems leverage, but demonstrated that HPE’s portfolio of systems and experience in both HPC and hyper scale environments is impressive indeed.

Graham Anthony, CFO of BioVista spoke on the “Pursuit of Sustainable Healthcare Through Personalized Medicine With HPC.” Mr. Anthony was very passionate about the work BioVista is doing with HPE and how HPC and deep learning change the costs of healthcare by increased precision in treatment through deriving better insights from data. BioVista takes insight from deep learning and feeds that into simulations for better treatments – a true illustration that learning is here to stay, and works hand in hand with business process flows for traditional HPC.

In his talk entitled “Charliecloud: Containers are Good for More Than Serving Cat Pictures?” Reid Priedhorsky from LANL covered a wide range of topics, including software stacks and design philosophy, and demoed Charliecloud, which enables execution of Docker containers on supercomputers.

The tongue-in-cheek title about cat pictures being synonymous with deep learning image recognition is not by accident. Stand-alone image recognition is really cool, but as expounded upon above, the true benefit from deep learning is having an integrated workflow where data sources are ingested by a general purpose deep learning platform with outcomes that benefit business, industry and academia.

From the talks, it is also clear that Machine Learning, Deep Learning and AI are presently fueled more by industry than by academia. This could be due to strategic and competitive business drivers as well as the sheer amount of data that companies like Facebook, Baidu and Google have available to them, driving AI research and deep learning-backed products. HPC might not be needed to push these disciplines forward, which is likely why we see this trend becoming more prevalent in everyday news.

There was obvious concern from the audience about a future where machines rule the world. Ethical questions of companies knowingly replacing workers with robots or AI came up in a very lively discussion. Some argued that there is a place for both humans and AI — quieting the fear that tens of thousands of people would be replaced by algorithms and robots. Others see a more dismal human future with evil and malevolent robots taking control and little left for humans to do. These are, of course, difficult questions to answer and further debates will engage and entertain everyone as we keep moving toward an uncertain, technical future.

On a lighter note, Wednesday evening’s dinner featured a local volunteer docent, Dave Fehlauer, giving an enjoyable, informative talk on Captain Frederick Pabst: his family, his world and his well-known Milwaukee staple, The Pabst Brewing Company.

By all accounts, this was one of the most enjoyed HPC User Forum meetings. With a coherent theme and a dynamic range of presentations, the Forum kept everyone’s interest and showcased the realm of possibilities within this encouraging trend of computing, both from industry and academic research perspectives.

The next domestic HPC User Forum will be held April 16-18, 2018 at the Loews Ventana Canyon in Tucson, Arizona. See http://hpcuserforum.com for further information.

About the Author

Arno Kolster is Principal & Co-Founder of Providentia Worldwide, a technical consulting firm. Arno focuses on bridging enterprise and HPC architectures and was co-winner of IDC’s HPC Innovation Award with his partner Ryan Quick in 2012 and 2014. He was recipient of the Alan El Faye HPC Inspiration Award in 2016. Arno can be reached at Arno.kolster@providentiaworldwide.com.

The post Machine Learning at HPC User Forum: Drilling into Specific Use Cases appeared first on HPCwire.

Biosoft Integrates Lab Equipment for Genetics Research with Help from PSSC Labs

Fri, 09/22/2017 - 09:43

LAKE FOREST, Calif., Sept. 22, 2017 — PSSC Labs, a developer of custom High-Performance Computing (HPC) and Big Data computing solutions, today announced its work with Biosoft Integrators to provide powerful, turn-key HPC Cluster solutions for researchers in the biotech and genetic research fields.

Biosoft Integrators (BSI) works with researchers around the world to integrate laboratory technology platforms. With extensive experience in laboratory settings, the company’s founders realized that equipment and software are often poorly integrated and lack the functionality to work with each other, requiring researchers to manually transfer work and data between software and equipment. BSI provides researchers with greater efficiency and management by providing tools which unify the laboratory and laboratory informatics. BSI brings knowledge, experience and technology platforms to the biotechnology marketplace, serving everything from the manually tracked lab to the fully automated and integrated consumer genomics facility.

PSSC Labs will work with BSI to create truly turn-key high performance computing (HPC) clusters, servers and storage solutions. PSSC Labs has already delivered several hundred computing platforms for worldwide genomics and bioinformatics research. Utilizing the PowerWulf HPC Cluster as a base solution platform, PSSC Labs and BSI can customize individual components for a specific end user’s research goals.

PowerWulf HPC Clusters are proven compatible with several genomics research platforms including both Illumina and Pacific Biosciences. Each solution includes the latest Intel Xeon processors, high performance memory, advanced storage arrays and fast networking topology. The PowerWulf HPC Clusters also include PSSC Labs CBeST Cluster Management Toolkit to help researchers easily manage, monitor, maintain and upgrade their clusters.

“PSSC Labs was willing to work with us to design each HPC system, even allowing our software engineers to work directly with personnel at their production facility to ensure each HPC platform was built to work with each individual research project,” said Stu Shannon, Co-Founder and COO of BSI. “The performance and reliability of PSSC Labs’ products are amazing. Many of our clients are conducting research in remote regions in southeast Asia, where repairs to equipment are extremely difficult to perform, and since partnering with PSSC Labs, the HPC systems have required little more than the occasional hard drive replacement.”

PSSC Labs’ PowerWulf HPC Cluster offers a reliable, flexible, high performance computing platform for a variety of applications in the following verticals: Design & Engineering, Life Sciences, Physical Science, Financial Services and Machine/Deep Learning.

Every PowerWulf HPC Cluster includes a three-year unlimited phone/email support package (additional year support available) with all support provided by their US based team of experienced engineers. Prices for a custom built PowerWulf HPC Cluster solution start at $20,000.  For more information see http://www.pssclabs.com/solutions/hpc-cluster/

About PSSC Labs

For technology powered visionaries with a passion for challenging the status quo, PSSC Labs is the answer for hand-crafted HPC and Big Data computing solutions that deliver relentless performance with the absolute lowest total cost of ownership.  All products are designed and built at the company’s headquarters in Lake Forest, California.

Source: PSSC Labs

The post Biosoft Integrates Lab Equipment for Genetics Research with Help from PSSC Labs appeared first on HPCwire.

Intel Awards Paderborn University a Cluster Powered by Xeon Processors and Arria 10 FPGAs

Fri, 09/22/2017 - 09:20

Sept. 22, 2017 — The Paderborn Center for Parallel Computing (PC²) has been selected by Intel to host a computer cluster that uses Intel’s Xeon processor with its Arria 10 FPGA software development platform. This server cluster connects an Intel Xeon processor with an in-package field-programmable gate array (FPGA) via the platform’s high-speed QuickPath interconnect, improving system bandwidth. The Intel FPGA can be programmed to serve as a workload-optimized accelerator offering substantial performance, agility, and energy-efficiency advantages. This solution is suitable for a number of application domains, such as machine learning, data encryption, compression, image processing and video-stream processing. The platform is also an ideal experimentation platform for innovative operating system or computing systems research that focuses on novel approaches to integrating CPUs with accelerators at the software and hardware level.

“We are very happy to have been selected by Intel as one of only two academic sites worldwide to host a cluster based on Intel Xeon processors and Intel Arria 10 FPGAs. Our computing center has a strong research background in accelerating demanding applications with FPGAs. The availability of these systems allows us to further expand our leadership in this area and – as a next step – bring Intel FPGA accelerators from the lab to HPC production systems,” says Prof. Dr. Christian Plessl, director of the Paderborn Center for Parallel Computing, who has been active in this research area for almost two decades.

Researchers worldwide can get access to the cluster by applying to Intel’s Hardware Accelerator Research Program. “We are looking forward to collaborating with Intel and other members of the Hardware Accelerator Research Program on using FPGA acceleration for emerging HPC and data center workloads. By provisioning access to the system to a large number of researchers, we are also gathering experience in how to manage systems with FPGA accelerators in a multi-user setting and for handling parallel applications that use multiple servers with FPGAs. This experience is crucial for deploying systems with FPGAs at scale,” explains Dr. Tobias Kenter, senior researcher and FPGA expert at the Paderborn Center for Parallel Computing.

Currently, the Paderborn Center is working on accelerating applications including theoretical physics, material sciences and machine learning with FPGAs.  This work is in collaboration with scientists from the application areas. In addition, novel domain-specific programming approaches for FPGAs are being developed to simplify the use of FPGAs for developers without a hardware design background.

About the Paderborn Center for Parallel Computing

The Paderborn Center for Parallel Computing, PC², is a scientific institute of Paderborn University, Germany. Our mission is to advance interdisciplinary research in parallel and distributed computing with innovative computer systems. We operate several high-performance cluster systems with up to 10,000 cores to provide HPC services to researchers from computational sciences at Paderborn University and the state of North Rhine-Westphalia.

One of our key research areas is the study of computing systems with FPGA accelerators for energy-efficient HPC. The ability to customize the processing architecture implemented by the FPGA to the needs of applications allows us to build high-performance and at the same time energy-efficient accelerators for demanding applications.

Source: Paderborn Center

The post Intel Awards Paderborn University a Cluster Powered by Xeon Processors and Arria 10 FPGAs appeared first on HPCwire.

Google Cloud Makes Good on Promise to Add Nvidia P100 GPUs

Thu, 09/21/2017 - 16:11

Google has taken down the notice on its cloud platform website that says Nvidia Tesla P100s are “coming soon.” That’s because the search giant has announced the beta launch of the high-end P100 Nvidia Tesla GPUs on the Google Cloud Platform as well as general availability of Tesla K80s, which have been in public beta since February.

Google also announced today (Sept. 21) discounts for users running virtual machine instances for more than one week per month on Google Compute Engine. The discounts, which increase on a sliding scale based on monthly usage, apply to both K80 and P100 GPUs. Google said the discounts mean customers pay only for the number of minutes they use an instance during a given month.

Google and other public cloud providers have been ramping GPU integration on their platforms as a way of differentiating their services in a cutthroat market that is gradually shifting to multi-cloud deployments. A recent industry survey found that enterprises are on average using three public cloud providers as they seek to spread out workloads and avoid vendor lock-in.

For its part, Google is stressing cloud GPUs as a way of accelerating workloads that utilize machine learning training and inference as well as geophysical data processing, genomics and other high-performance computing applications.

Released last year as a datacenter accelerator, the Tesla P100 GPU, based on Nvidia’s Pascal architecture, is touted as delivering a ten-fold performance increase compared to the K80. Google said the rollout would allow cloud customers to attach up to four P100s or eight K80s per VM. It is also offering up to four K80 boards with two GPUs per board.

Google is the latest public cloud vendor to embrace Nvidia’s P100 GPUs for hardware acceleration in the cloud. IBM said in April it would add P100s to its Bluemix development cloud for customers running computing intensive workloads such as deep learning and data analytics. Microsoft followed in May with plans to debut Pascal-generation GPU instances on its Azure cloud later this year. Microsoft hasn’t deployed them yet though (and neither has cloud king Amazon), which makes Google the first of the big three to have them.

With K80 GPUs from Nvidia now generally available on Google Compute Engine and P100s in beta, cloud GPUs are now being integrated “at all levels of the stack,” the company noted in a blog post announcing the hardware upgrades.

In terms of infrastructure, GPU workloads can run with VMs or application containers. For machine learning applications, Google stressed that its cloud tools could be reconfigured to leverage cloud GPUs to reduce the time required to train and scale models using the TensorFlow machine intelligence library.

The cloud GPUs are available within Google’s U.S. East and West Coast regions as well as European West and Asia East regions.

The post Google Cloud Makes Good on Promise to Add Nvidia P100 GPUs appeared first on HPCwire.

Cray Wins $48M Supercomputer Contract from KISTI

Thu, 09/21/2017 - 15:48

It was a good day for Cray which won a $48 million contract from the Korea Institute of Science and Technology Information (KISTI) for a 128-rack CS500 cluster supercomputer. The new system, equipped with Intel Xeon Scalable processors and Intel Xeon Phi processors, will be the largest supercomputer in South Korea and will provide supercomputing services for universities, research institutes, and industries.

The see-saw sales cycles for supercomputer vendors are always challenging, and Cray had hit a couple of speed bumps caused by market slowness and other issues. The new system is expected to be put into production in 2018.

“Our supercomputing division is focused on maximizing research performance while significantly reducing research duration and costs by building a top-notch supercomputing infrastructure,” said Pillwoo Lee, General Director, KISTI. “Cray’s proficiency in designing large and complex high-performance computing systems ensures our researchers can now apply highly-advanced HPC cluster technologies towards resolving scientific problems using the power of Cray supercomputers.”

Since 1962, KISTI has served as a national science and technology information center and has provided information that researchers need to enhance South Korea’s national competitiveness as a specialized science and technology research institute supported by the government.

The Cray CS500 systems provide flexible node configurations featuring the latest processor and interconnect technologies giving customers the ability to tailor a system to specific needs — from an all-purpose high-performance computing cluster to an accelerated system configured for shared memory, large memory, or accelerator-based tasks. The contract includes the product and services.

“Leading global supercomputing centers like KISTI are pushing the boundaries of science and technology for the benefit of everyone,” said Trish Damkroger, Vice President of Technical Computing at Intel. “The leading Intel Xeon Scalable processors, Intel Xeon Phi processors and high-bandwidth Intel Omni-Path Architecture, combined with the expertise and innovation of Cray supercomputers, unleash researchers to achieve groundbreaking discoveries that address society’s most complex challenges and yield answers faster than has ever been possible before.”

Link to release: http://investors.cray.com/phoenix.zhtml?c=98390&p=irol-newsArticle&ID=2302209

The post Cray Wins $48M Supercomputer Contract from KISTI appeared first on HPCwire.

Avoiding the Storage Silo Trap; Best Practices for Data Storage in Scientific Research

Thu, 09/21/2017 - 14:42

From mismatches between compute and storage capabilities to colossal data volumes, data storage presents a number of challenges for scientific research. And as silos pop up and challenges expand, the pace of research often suffers.

The post Avoiding the Storage Silo Trap; Best Practices for Data Storage in Scientific Research appeared first on HPCwire.

Adolfy Hoisie to Lead Brookhaven’s Computing for National Security Effort

Thu, 09/21/2017 - 13:50

Brookhaven National Laboratory announced today that Adolfy Hoisie will chair its newly formed Computing for National Security department, which is part of Brookhaven’s new Computational Science Initiative (CSI).

“We see a huge potential to make a positive impact on the nation’s security by bringing our unique extreme-scale data expertise to bear on challenges of national importance,” said CSI Director Kerstin Kleese van Dam in the announcement. “The formation of this new department in CSI is our first step in this direction.”

Adolfy Hoisie, Brookhaven

Worries over computer and cyber attack need little introduction. The rapid growth in internet traffic and users and the voluminous data exchanges required between organizations to conduct business make the protection of the nation’s critical assets—including power grid infrastructure, telecommunication networks, and nuclear power stations—a big data–real-time analysis challenge.

Hoisie is an experienced and familiar name in the HPC community. Most recently he was founding director of the Department of Energy’s Center for Advanced Technology Evaluation (CENATE) based at Pacific Northwest National Laboratory. He first joined PNNL as a laboratory fellow in 2010, and went on to direct the Advanced Computing, Mathematics, and Data Division, and serve as PNNL’s lead for DOE’s ASCR programs.

“Adolfy is a long-time principal investigator in DOE’s Advanced Scientific Computing Research (ASCR) programs,” said Kleese van Dam. “At Brookhaven, he will continue in this capacity and contribute to solving computing challenges faced by other federal agencies, including those within the Department of Defense, such as the Defense Threat Reduction Agency and Defense Advanced Research Projects Agency, and the National Nuclear Security Administration. In addition, he will work closely with me and my leadership team to further CSI’s overall computing endeavors.”

Brookhaven describes the scope of the effort as, “From field-programmable gate arrays (configurable computing devices) integrated with traditional central processing units, and quantum computing that takes advantage of the way the tiniest of particles behave, to neuromorphic computing that mimics the neural networks of the human brain, these architectures are someday expected to perform operations much more quickly and with less energy. Ensuring the optimal performance of these architectures and achieving the timescales needed for different national security applications requires evaluating new hardware technologies and developing the needed system software, programming models, and analytical software in tandem.”

Link to announcement: https://www.bnl.gov/newsroom/news.php?a=212363

The post Adolfy Hoisie to Lead Brookhaven’s Computing for National Security Effort appeared first on HPCwire.

Stanford University and UberCloud Achieve Breakthrough in Living Heart Simulations

Thu, 09/21/2017 - 13:00

Cardiac arrhythmia can be an undesirable and potentially lethal side effect of drugs. During this condition, the electrical activity of the heart turns chaotic, decimating its pumping function and thus diminishing the circulation of blood through the body. Some kinds of cardiac arrhythmia, if not treated with a defibrillator, will cause death within minutes.

Before a new drug reaches the market, pharmaceutical companies need to check for the risk of inducing arrhythmias. Currently, this process takes years and involves costly animal and human studies. In this project, the Living Matter Laboratory of Stanford University developed a new software tool enabling drug developers to quickly assess the viability of a new compound. This means better and safer drugs reaching the market to improve patients’ lives.

This research project has been performed by researchers from the Living Matter Laboratory at Stanford University, and supported by Living Heart Project members from SIMULIA, Hewlett Packard Enterprise, Advania, and UberCloud. It is based on the development of a Living Heart Model (LHM) that encompasses advanced electro-physiological modeling. The end goal is to create a biventricular finite element model to be used to study drug-induced arrhythmias of a human heart.

The Living Heart Project is uniting leading cardiovascular researchers, educators, medical device developers, regulatory agencies, and practicing cardiologists around the world on a shared mission to develop and validate highly accurate personalized digital human heart models. These models will establish a unified foundation for cardiovascular in silico medicine and serve as a common technology base for education and training, medical device design, testing, clinical diagnosis and regulatory science —creating an effective path for rapidly translating current and future cutting-edge innovations directly into improved patient care.

The Stanford team, in conjunction with SIMULIA, has developed a multi-scale 3-dimensional model of the heart that can predict the risk of these lethal arrhythmias caused by drugs. The team added capabilities to the Living Heart Model to include highly detailed cellular models, to differentiate cell types within the tissue and to compute electro-cardiograms (ECGs). This model is now able to bridge the gap between the effect of drugs at the cellular level and the chaotic electrical propagation that a patient would experience at the organ level.

A computational model that is able to assess the response of new drug compounds rapidly and inexpensively is of great interest for pharmaceutical companies, doctors, and patients. Such a tool will increase the number of successful drugs that reach the market, while decreasing cost and time to develop them, and thus help hundreds of thousands of patients in the future. However, the creation of a suitable model requires taking a multiscale approach that is computationally expensive: the electrical activity of cells is modelled in high detail and resolved simultaneously in the entire heart. Due to the fast dynamics that occur in this problem, the spatial and temporal resolutions are highly demanding.

Figure 1: Tetrahedral mesh (left) and cube mesh (right)

During the preparation and Proof of Concept phase (UberCloud Experiment 196), we set out to build and calibrate the healthy baseline case, which we then perturbed with different drugs. After creating the UberCloud software container for SIMULIA’s Abaqus 2017 and deploying it on the HPE server in the Advania cloud, we started refining the computational mesh, which consisted of roughly 5 million tetrahedral elements and 1 million nodes. Due to the intricate geometry of the heart, the mesh quality limited the time step, which in this case was 0.0012 ms for a total simulation time of 5000 ms. After realizing that it would be very difficult to calibrate our model with such a long runtime, we decided to work on our mesh, which was the bottleneck in speeding up our model. We created a mesh made out of cube elements (Figure 1). With this approach, we lost the smoothness of the outer surface, but reduced the number of elements by a factor of ten and increased the time step by a factor of four, for the same element size (0.7 mm). With a much faster model, we were able to calibrate the healthy baseline case, which was assessed by electro-cardiogram (ECG) tracing (Figure 2) that recapitulates the essential features.

Figure 2: ECG tracing for the healthy, baseline case
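To put the mesh change described above in perspective, the short Python sketch below works through the arithmetic using only the figures quoted in the text (0.0012 ms time step, 5000 ms of simulated time, roughly 5 million tetrahedral elements, ten times fewer elements and a four times larger step with cube elements). The cost model, in which work scales with elements multiplied by time steps, is an assumption made here for illustration and is not a claim about how Abaqus actually scales.

```python
# Back-of-the-envelope estimate of what the cube mesh buys in time stepping.
# Input figures are those quoted in the article; the cost model
# (work proportional to elements * steps) is an illustrative assumption.

TOTAL_TIME_MS = 5000.0      # simulated time per run
TET_DT_MS = 0.0012          # time step allowed by the tetrahedral mesh
TET_ELEMENTS = 5_000_000    # "roughly 5 million tetrahedral elements"
ELEMENT_REDUCTION = 10      # cube mesh: ten times fewer elements
DT_INCREASE = 4             # cube mesh: four times larger time step


def step_count(total_ms: float, dt_ms: float) -> int:
    """Number of time steps needed to cover total_ms at step size dt_ms."""
    return round(total_ms / dt_ms)


tet_steps = step_count(TOTAL_TIME_MS, TET_DT_MS)
cube_steps = step_count(TOTAL_TIME_MS, TET_DT_MS * DT_INCREASE)
cube_elements = TET_ELEMENTS // ELEMENT_REDUCTION

# Idealized ratio of element updates, tetrahedral mesh versus cube mesh.
work_ratio = (TET_ELEMENTS * tet_steps) / (cube_elements * cube_steps)

print(f"tetrahedral mesh: {tet_steps:,} steps")
print(f"cube mesh:        {cube_steps:,} steps")
print(f"idealized work reduction: about {work_ratio:.0f}x")
```

Under these assumptions the tetrahedral mesh needs a little over four million time steps per run, which makes clear why calibration was impractical. The idealized ratio is only a rough guide and should not be read as the measured speedup of the simulations.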

During the final production phase, we adapted all features of the model to a finer mesh, now with 7.5 million nodes and 250,000,000 internal variables that are updated and stored within each step of the simulation. We ran 42 simulations to study whether a drug causes arrhythmias or not. With all the changes above, we were able to speed up one simulation by a factor of 27, which then (still) took 40 hours using 160 CPU cores on Advania’s HPE system equipped with the latest Intel Broadwell E5-2683v4 nodes and Intel OmniPath interconnect. In these simulations, we applied the drugs by blocking different ionic currents in our cellular model, replicating what is observed in cellular experiments. For each case, we let the heart beat naturally and observed whether an arrhythmia developed.
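The production figures above translate into a compute budget that can be estimated with a few lines of Python. The inputs (42 simulations, 40 hours each, 160 cores, a speedup factor of 27) come from the text; treating every run as taking the same 40 hours, and reading the speedup as applying at the same core count, are assumptions made for this sketch.

```python
# Rough compute budget for the production phase.
# Assumes all 42 runs took about the quoted 40 hours on 160 cores,
# and that the 27x speedup applies at the same core count.

SIMULATIONS = 42
HOURS_PER_RUN = 40
CORES = 160
SPEEDUP = 27

core_hours_per_run = HOURS_PER_RUN * CORES
total_core_hours = SIMULATIONS * core_hours_per_run

# Wall-clock time one run would have needed without the speedup.
hours_without_speedup = HOURS_PER_RUN * SPEEDUP
days_without_speedup = hours_without_speedup / 24

print(f"core-hours per run:   {core_hours_per_run:,}")
print(f"core-hours, 42 runs:  {total_core_hours:,}")
print(f"one run without the speedup: {hours_without_speedup} h "
      f"({days_without_speedup:.0f} days)")
```

Under these assumptions a single run without the optimizations would have taken well over a month of wall-clock time, which is consistent with the authors' decision to rework the mesh before attempting the drug study.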

Figure 3: Evolution of the electrical activity for the baseline case (no drug) and after the application of Quinidine. The electrical propagation turns chaotic after the drug is applied, showing the high risk of Quinidine to produce arrhythmias.

Figure 3 shows the application of the drug Quinidine, an anti-arrhythmic agent that nevertheless carries a high risk of producing Torsades de Pointes, a particular type of arrhythmia. It shows the electrical transmembrane potentials, which have been widely used in studies of normal and pathological heart rhythms and defibrillation, for a healthy versus a pathological heart. The propagation of the electrical potential turns chaotic (Figure 3, bottom) when compared to the baseline case (Figure 3, top), showing that our model is able to correctly and reliably predict the pro-arrhythmic risk of commonly used drugs. We envision that our model will help researchers, regulatory agencies, and pharmaceutical companies rationalize safe drug development and reduce the time-to-market of new drugs.

Acknowledgement: The authors are deeply grateful for the support from Hewlett Packard Enterprise (the Sponsor), Dassault Systemes SIMULIA (for Abaqus 2017), Advania (providing HPC Cloud resources), and the UberCloud tech team for containerizing Abaqus and integrating all software and hardware components into one seamless solution stack.

The post Stanford University and UberCloud Achieve Breakthrough in Living Heart Simulations appeared first on HPCwire.

PNNL’s Center for Advanced Tech Evaluation Seeks Wider HPC Community Ties

Thu, 09/21/2017 - 13:00

Two years ago the Department of Energy established the Center for Advanced Technology Evaluation (CENATE) at Pacific Northwest National Laboratory (PNNL). CENATE’s ambitious mission was to be a proving ground for near-term and long-term technologies that could impact DoE workloads and HPC broadly. This month the leadership baton was passed from founding director Adolfy Hoisie to Kevin Barker, a veteran PNNL researcher and member of the CENATE project since its start. Hoisie has moved to Brookhaven National Lab to lead another new initiative as chair of the just-formed Computing for National Security Department.

In its short lifespan, CENATE has made steady strides. It has assembled an impressive infrastructure of test and measurement capabilities to explore computer technology. It has tackled several specific projects, ranging from the study of novel architecture from Data Vortex and Nvidia’s DGX-1 to longer horizon efforts around neuromorphic technology. The change in leadership, emphasizes Barker, won’t alter CENATE’s ambitious plans, but it will enable refinement of several processes, notably an effort to forge tighter links to the HPC community writ large, including DoE researchers, academia, and commercial technology partners.

Today there are about a dozen CENATE staff at PNNL, says Barker. One of the biggest changes will be standing up a more inclusive, more activist steering committee to guide CENATE.

Kevin Barker, P.I., Center for Advanced Technology Evaluation (CENATE), PNNL

Recently, HPCwire talked with Barker about the developing plans for CENATE and its priorities. Barker is certainly no stranger to HPC. He joined PNNL in 2010 as a senior HPC research scientist, rising to team lead for modeling and simulation in 2014. Before joining PNNL, Barker spent nearly six years at Los Alamos National Laboratory as an HPC research scientist.

HPCwire: Your prior CENATE experience will no doubt be helpful. Perhaps you could describe your role and provide a sense of what CENATE has accomplished to date.

Barker: Really, I’ve been with CENATE since it began. It had a couple of focus areas in terms of how it was organized internally. I was in charge of one of those areas around performance predictions. The idea was that CENATE would be a program that encompassed not only testbed and performance measurements but also would take those performance measurements we could get from physical systems, or prototype systems at small scale, and use performance prediction techniques to explore what those performance impacts would be at large scale. That was my role. Now, I am the PI.

In the first two years of the project, CENATE has deployed testbeds incorporating emerging technologies in the areas of processing, memory, and networking, and has engaged with the research community to assess the applicability of these technologies to workloads of interest to the Department of Energy. Specifically, CENATE has explored high-throughput architectures applied to large-scale Machine Learning frameworks; non-volatile memories; reconfigurable optical networking technology; and self-routing, dynamic, congestion-free networks applied to graph analytics. Through a broad community engagement, CENATE has ensured that its findings are fed back through workshops and deep collaborations with leading researchers.

HPCwire: That’s an extensive list. What’s going to change?

Barker: This change in leadership isn’t a dramatic change in terms of the technical capabilities or what we hope to accomplish technically. Now we want to ensure CENATE is more tightly integrated with the outside community, the HPC community in the DoE, and the vendor space. We also want to make sure the work we are doing at CENATE has an impact back at the DoE.

We’re working on getting a good plan in place to accomplish that – engaging with the vendor community, engaging with application developers, systems software developers, and the DoE complex, and making CENATE resources available to those people so that we can have a collaborative research environment. They can bring their problems to CENATE and we could provide access to some of these novel and emerging technologies that CENATE is tasked with assessing.

HPCwire: Maybe we should step back and review CENATE’s mission. How do you see it evolving, particularly in light of establishing a more inclusive and activist steering committee?

Barker: Again, it hasn’t really changed. When the steering committee stands up we envision CENATE taking on two kinds of tracks in terms of the research and the resources that we look at in each track. In the first track, we envision a shorter time scale where we are looking at technologies that are very near to market that we can get close to either prototype hardware, early release hardware, or engineering sample hardware. For the second track, in terms of timescale, we want CENATE to have an impact on more novel or high risk architectural approaches. So we might look at such things as beyond Moore’s Law computing technologies.

We envision the steering committee having a big impact because we want to have some indication from the community regarding which technologies we should be most interested in from a community perspective. [Tentatively] we envision a fixed six-month schedule of steering committee meetings, in particular to discuss what architectures should be looked at in the next six-month window and feedback from the previous six-month window. We haven’t decided yet whether those will take the form of a meeting or a workshop where we have more community involvement from outside just the steering committee. Those are some of the things still under discussion.

HPCwire: Given the importance of the steering committee, how big will it be and who will be on it?

Barker: It could be 15 or so organizations, maybe a person from each organization. We would like to have participation from other labs in the DoE community, and potentially academic partners. For example, Indiana University is a major user of the Data Vortex architecture, so it makes sense for them to participate. The third group is the commercial vendor space. We want to have this settled (and up on the web) before supercomputing (SC17, November 12-17).

HPCwire: There are so many new technologies bubbling up at various stages of development. Adolfy had mentioned interest in developing neuromorphic chips. Is that work ongoing? What’s the thinking on longer term technologies?

Barker: We are definitely interested in these longer term technologies and think that CENATE can have a big impact in the community, presenting that to the funding sources and saying, hey, we want CENATE really to be positioned to have an impact beyond the next thing that you can buy from your hardware vendor. We want to explore the next-gen technologies that aren’t necessarily tied to commercial products at this point but may still have real impact, particularly in the generation beyond exascale.

For example, the exascale systems are pretty well covered with the Exascale Computing Project. We’re very familiar now with what those systems are going to look like. People are very focused on getting their applications to run on those architectures. That’s not really where we see CENATE having a play. In looking beyond that, what are the technologies that are going to shape high performance computing beyond the exascale timeframe? We really want CENATE to be positioned to have an impact in those areas. This is what I mean by saying a refinement of the CENATE direction. Up until now CENATE has looked at a number of near-to-market or new-to-market technologies. And they have had a big impact. The DGX-1 is a great example. We stood up a DGX-1 and immediately we had users from around the lab complex and academia clamoring to get on the machine to explore how their applications are going to perform, to develop system software, and things such as that.

But we want CENATE also to look beyond that, at things like extreme heterogeneity and software reconfigurable computing. So this is really why we are placing an emphasis on the connection to the research community, so that we can get, as much as is possible, an accurate prediction of why these are the technologies that we think are going to make an impact. How can CENATE position itself to help assess those technologies in the near term? That might involve a much deeper dive into specific technologies. CENATE doesn’t have an unlimited amount of resources (time, personnel, dollars), so it’s very important we target those things as effectively as we can.

HPCwire: Funding is always an issue on advanced research and DoE is a big player. What about tapping into programs such as DARPA’s Electronics Resurgence Initiative (ERI) which is focused on post-Moore’s law technology and just received additional funding?

Barker: There are definitely some programs that are worth [looking at]. We are not working with ERI in particular, although that could be something we explore together with program management from DoE headquarters. There are some opportunities exactly along those lines that we are looking into, but nothing firm at this point.

HPCwire: Given the various changes, and the enhanced role of the steering committee, is it premature to identify the top five technologies we are going to tackle in the next year?

Barker: Exactly. Unfortunately the end of the year is kind of a busy time in the HPC world with SC (supercomputing conference) and everything else going on. We hope to have those kinds of things pinned down with at least some degree of certainty within the next few months.

HPCwire: One of the distinguishing aspects of CENATE is the diversity and sophistication of the test and measurement equipment and capabilities at PNNL. What’s happening on that front?

Barker: We have equipment for testing power and energy as well as for thermal measurement capability. That is still all in place. We’re expanding the evaluation test suite that we have been using up until this point, the benchmark codes. CENATE itself has an evaluation test suite in addition to reaching out to collaborators who are interested in the equipment and who bring their own software test suite. We’re interested in looking at these machines in the context of numerical simulation, high performance computing codes, as well as graph analytics codes, machine learning codes, so we are expanding that set of benchmark codes, but the measurement capabilities we have in place are still in place.

HPCwire: It sounds like, among other things, you are adapting your capabilities to be able to handle emerging, nontraditional ‘HPC’ needs such as deep learning and data analytics?

Barker: Right. One of the important things when we are looking at these architectures, and the DGX-1 is a good example, is we want to evaluate those technologies in the mode they are designed to operate in. The DGX-1 really is designed as a deep learning/machine learning architecture. Exploring simply traditional HPC simulation codes on it might not be the most appropriate thing to do. We want to paint it [DGX-1 performance and potential] in the light it was designed for. Our evaluation suite of kernels and benchmarks needs to encompass those application areas that these architectures are targeting. And things like machine learning and deep learning are becoming such a part of the DoE workload that for CENATE to remain relevant to the DoE we need to have that capability. The DoE HPC landscape is much more than tightly coupled code.

HPCwire: In the past there had been talk of CENATE workshops and other outreach efforts to diffuse CENATE learnings into the community but I don’t think much has happened along those lines yet. How do you share results and what are the plans going forward?

Barker: This is one area where we have decided that some refinement is necessary. Currently the mechanism that we use to present some of these results back to the community is through publications. It’s a pretty typical route. We’ve had some success there and, for example, we have papers on our work with DGX-1 in the submission process right now. We want to expand how we do this and are still developing the plans.

Hosting user group meetings is another way. Just two weeks ago, we hosted the first Data Vortex user group meeting at PNNL and CENATE was a player in that it brought together a couple of other programs that were looking at the Data Vortex architecture. That was a really successful workshop. Researchers from DoE, other government agencies, and academia came here to PNNL specifically about the Data Vortex architecture, which is a big architecture in CENATE. We actually have two Data Vortex machines. That’s an example we can point to where we can say CENATE is making an impact in the community.

The NDA issues are sometimes very tricky but we have some experience with other projects where similar issues have arisen so we do have some strategies to deal with NDA issues.

HPCwire: How will you reach potential collaborators? There’s the technical steering committee, but given its relatively small size, how will you reach beyond its immediate interests and attract other collaborators?

Barker: We are standing up a new CENATE web site that we hope to have up very soon, which will solicit this kind of input and have a mechanism where we can say if you’re a commercial or vendor partner and you want to participate in the CENATE program, here’s how you can get in touch with us. We definitely don’t want to be an exclusive club. We want to cast a wide net in terms of the types of technologies that are represented in the steering committee. Some of this is still in progress.

One of the things we are exploring [for the web site] is a way to have potentially interested external parties propose what they would like to do and the equipment they would be interested in evaluating. Again, this is where the technical steering committee comes in to evaluate these proposals. It might be a model where – and this is what we are moving towards – we essentially put out a call [for proposals]. That sounds a bit formal. CENATE is not a funding organization and won’t fund external collaborators. But it will be a way for submitters to say what interesting problems they are interested in solving that CENATE could then participate in and possibly provide access to technology. So if you are a professor with some graduate students you might say, ‘Here’s an application that we want to develop and we want to explore how it might work on architecture x but we don’t have the means to get architecture x, can CENATE help?’

HPCwire: Thank you for your time.

The post PNNL’s Center for Advanced Tech Evaluation Seeks Wider HPC Community Ties appeared first on HPCwire.

Los Alamos Gains Role in High-Performance Computing for Materials Program

Thu, 09/21/2017 - 11:34

LOS ALAMOS, N.M., Sept. 21, 2017 — A new high-performance computing (HPC) initiative announced this week by the U.S. Department of Energy will help U.S. industry accelerate the development of new or improved materials for use in severe environments. Los Alamos National Laboratory, with a strong history in the materials science field, will be taking an active role in the initiative.

“Understanding and predicting material performance under extreme environments is a foundational capability at Los Alamos,” said David Teter, Materials Science and Technology division leader at Los Alamos. “We are well suited to apply our extensive materials capabilities and our high-performance computing resources to industrial challenges in extreme environment materials, as this program will better help U.S. industry compete in a global market.”

“The High-Performance Computing for Materials Program will provide opportunities for our industry partners to access the high-performance computing capabilities and expertise of DOE’s national labs as they work to create and improve technologies that combat extreme conditions,” said Secretary of Energy Rick Perry. “This initiative combines two crucial elements of the Administration’s mission at DOE – advances in high-performance computing and the improved transition of energy technologies to market.”

The HPC4Mtls initiative will initially focus on challenges facing industry as they work to develop new or improved materials that can sustain extreme conditions—including extreme pressure, radiation, and temperature, corrosion, chemical environment, vibration, fatigue, or stress states. It will focus on developing improved lightweight material technologies, as well. The program aims to enable a step change in the cost, development time, and performance of materials in severe environments and save millions of dollars in fuel and maintenance across sectors. These material advancements will also increase U.S. competitiveness in the global marketplace.

Through HPC4Mtls, industry will be able to solve common materials issues, discover new or improved materials and structures, and enhance their products and processes using the labs’ world-class computational resources and capabilities. These capabilities include:

  • Access to HPC systems, including five of the world’s ten fastest computers
  • Higher-fidelity simulations to augment products or processes
  • Prediction of material behavior in specific severe environments
  • Modeling of missing physical phenomena to enable more realistic simulations
  • Development of more complex models to capture interactions between physical phenomena
  • Access to expertise in computational fluid dynamics, thermodynamics, kinetics, materials modeling, and additive manufacturing.

Companies will be selected to participate in the initiative through an open, two-stage, competitive process and will contribute at least 20 percent of project costs. DOE will hold a closed-press workshop on October 12, 2017 in Pittsburgh, PA to provide more information on the program and engage U.S.-based companies, industry, universities, and government stakeholders.

Sponsored by DOE’s Office of Fossil Energy, the High Performance Computing for Materials (HPC4Mtls) Program is part of the larger HPC4 Energy Innovation Initiative, a Department-wide effort comprised of the Office of Fossil Energy, the Office of Energy Efficiency and Renewable Energy, and the Office of Nuclear Energy. Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory, and the National Energy Technology Laboratory serve as the principal leads on this initiative, which could ultimately lower emissions, reduce fuel and maintenance costs across the economy, and save millions of dollars.

About Los Alamos National Laboratory (www.lanl.gov)

Los Alamos National Laboratory, a multidisciplinary research institution engaged in strategic science on behalf of national security, is operated by Los Alamos National Security, LLC, a team composed of Bechtel National, the University of California, BWX Technologies, Inc. and URS Corporation for the Department of Energy’s National Nuclear Security Administration.

Los Alamos enhances national security by ensuring the safety and reliability of the U.S. nuclear stockpile, developing technologies to reduce threats from weapons of mass destruction, and solving problems related to energy, environment, infrastructure, health and global security concerns.

Source: Los Alamos National Laboratory

The post Los Alamos Gains Role in High-Performance Computing for Materials Program appeared first on HPCwire.

ALCF Simulations Aim to Reduce Jet Engine Noise

Thu, 09/21/2017 - 10:53

CHICAGO, Ill., Sept. 21, 2017 — Humans make a lot of noise. The riffs of heavy metal bands like Metallica and Kiss have soared to levels in the 130-decibel range, levels sure to lead to auditory damage.

But try as they might, bands just can’t compete with the decibel ranges produced by jet engines. They are, said Joe Nichols, among the loudest sources of human-made noise that exist.

An assistant professor of Aerospace Engineering and Mechanics at the University of Minnesota, Nichols is fascinated by sound and its ability to find order in chaos – and by applying that understanding to the development of new technologies that can reduce noise in aircraft.

Nichols is working with the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy (DOE) Office of Science User Facility within the DOE’s Argonne National Laboratory, to create high-fidelity computer simulations to determine how jet turbulence produces noise. The results may lead to novel engineering designs that reduce noise over commercial flight paths and on aircraft carrier decks.

“Noise tells you something about the fundamental nature of turbulence, because noise reveals order that is otherwise hidden in complex, highly nonlinear, chaotic phenomena,” he said.

That is why jet noise presents both a challenging and a beautiful problem for Nichols.

Taming the roar of the engine

Jet engines produce noise in different ways, but mainly it comes from the high-speed exhaust stream that leaves the nozzle at the rear of the engine. And planes are loudest when they move slowly, such as at takeoff or at landing. As the exhaust stream meets relatively still air, it creates tremendous shear that quickly becomes unstable. The turbulence produced from this instability becomes the roar of the engine.

Aeronautic engineers incorporate chevrons, broken eggshell-shaped patterns, into exhaust nozzle designs to change the shape of the jet as it leaves the engine. The idea is to reduce the noise by changing the pattern of the turbulence. But much of the design work remains a guessing game.

Working with ALCF computational scientist Ramesh Balakrishnan and Argonne’s supercomputer Mira, Nichols and his team are applying computational fluid dynamics to remove some of that guesswork. They start by conducting high-fidelity large eddy simulations that accurately capture the physics of the turbulence that is making the noise.

From those simulations they extract reduced-order, or more concise, models that explain what part of the turbulence actually makes the sound. In addition to improving scientific understanding of jet noise, these reduced-order models also provide a fast, yet accurate, means for engineers to evaluate new designs.

Simulating complex geometries like jet turbulence requires the use of an unstructured mesh — a non-uniform 3-D grid — to represent the dynamics involved. In this case, one simulation could have 500 million grid points. Multiply that by five to account for pressure, density and three components of velocity to describe the flow at every grid point. That equates to billions of degrees of freedom, or the number of variables Mira uses to simulate jet noise.

“But what if inside the jet turbulence there is a skeleton of coherent flow structures that we can describe with just 50 degrees of freedom,” suggested Nichols. “Which aspects are most important to the jet noise production? How do the flow structures interact with each other? How closely can the skeleton model represent the high-fidelity simulation?”
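The counts in the two paragraphs above can be checked with a few lines of arithmetic. This is only a sketch: it assumes exactly five unknowns per grid point, as the text describes, and takes the 50-degree-of-freedom skeleton Nichols mentions as a hypothetical target size. The variable names are illustrative and do not come from the team's code.

```python
# Figures quoted in the text above.
grid_points = 500_000_000   # grid points in one large eddy simulation
unknowns_per_point = 5      # pressure, density, three velocity components

# Degrees of freedom in the full simulation.
full_dof = grid_points * unknowns_per_point

# Size of the hypothetical "skeleton" model Nichols describes.
reduced_dof = 50

# How many full-simulation unknowns each reduced-model unknown stands in for.
compression = full_dof // reduced_dof

print(f"Full simulation: {full_dof:,} degrees of freedom")
print(f"Reduced model:   {reduced_dof} degrees of freedom")
print(f"Reduction:       {compression:,}x fewer unknowns")
```

With these numbers the full simulation carries 2.5 billion degrees of freedom, so a 50-variable skeleton would be a reduction by a factor of 50 million.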

This work, published last year in the journal Physics of Fluids, could help engineers more precisely direct the modeling of jet engine nozzle geometries by determining, for instance, the ideal number and length of chevrons.

“What distinguishes Joe’s work from those of the other computational fluid dynamics projects at ALCF is that it involves the development of a method that could mature into becoming a design tool for aero-acoustics,” said ALCF’s Balakrishnan. “His project leverages computational data with what he calls input-output analysis, which reveals the origins of jet noise that are otherwise hidden in direct run-of-the-mill forward simulations, or even experiments.”

Simulating waves of aviation

One of the leading ways to predict the instability waves that create sound inside of turbulence is through methods based on a type of computational tool called parabolized stability equations. But while they’re good at predicting supersonic sound sources, they have a hard time predicting all the components of subsonic jet noise, especially in the sideline direction, or perpendicular to the exhaust stream.

The University of Minnesota team developed a new method based on input-output analysis that can predict both the downstream noise and the sideline noise. While it was thought that the sideline noise was random, the input-output modes show coherent structure in the jet that is connected to the sideline noise, such that it can be predicted and controlled.

Nichols also uses a variation on the input-output analysis to study noise produced by impingement, where a jet blast is directed at a flat surface, such as aircraft taking off from or hovering over an aircraft carrier deck.

Like decibel-breaking guitar licks, impingement produces a feedback loop when the turbulence hits a flat surface and accelerates outward. As the noise loops back towards the jet nozzle, new turbulence is triggered, creating extremely large tones that can reach into the 170-decibel range and do structural damage to the aircraft in question.
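To put the two noise levels mentioned in this article side by side, the decibel scale can be converted back to ratios. The sketch below assumes that the 130-decibel and 170-decibel figures are both sound pressure levels measured against the same reference under comparable conditions, which the article does not state. The function names are illustrative.

```python
def intensity_ratio(db_difference: float) -> float:
    """Ratio of sound intensities for a given level difference in decibels."""
    return 10 ** (db_difference / 10)


def pressure_ratio(db_difference: float) -> float:
    """Ratio of sound pressure amplitudes for the same level difference."""
    return 10 ** (db_difference / 20)


concert_db = 130       # level quoted for loud rock bands
impingement_db = 170   # level quoted for impinging-jet tones

# ASSUMPTION: both levels use the same reference and measurement conditions.
difference = impingement_db - concert_db

print(f"Level difference: {difference} dB")
print(f"Intensity ratio:  {intensity_ratio(difference):,.0f}x")
print(f"Pressure ratio:   {pressure_ratio(difference):,.0f}x")
```

Under those assumptions, the 40-decibel gap corresponds to 10,000 times the sound intensity and 100 times the pressure amplitude.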

Nichols and his team are applying computational fluid dynamics to reduce the noise by changing the pattern of the turbulence. With Nichols are Anubhav Dwivedi (left) and Jinah Jeun (right), graduate students in Aerospace Engineering and Mechanics at the University of Minnesota. (Image courtesy of University of Minnesota.)

The team turned to Mira to conduct a high-fidelity simulation of an impinging jet without any modifications, and then measured the noise it produced. Compared with ongoing experiments, the simulation predicted those same tones very accurately. A reduced-order model of the simulations helped Nichols more precisely predict how to change the jet configuration to eliminate feedback tones. Another simulation of the modified jet showed that the tones were almost completely gone.

“The simulations play a crucial role because they let us see spatio-temporally resolved fluid motions that would be impossible to measure experimentally, especially if you’re talking about a hot exhaust moving at Mach 1.5,” noted Nichols.

This research, says Balakrishnan, is still a work in progress, but the results are encouraging. While it still needs some refinement, it holds the promise of becoming a design tool that jet engine manufacturers may one day use to help quiet the skies.

For electric guitar makers Fender and Gibson, on the other hand, perhaps not so much.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.

The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit the Office of Science website.

Source: John Spizzirri, ANL

The post ALCF Simulations Aim to Reduce Jet Engine Noise appeared first on HPCwire.

NSF-funded ‘SLATE’ Platform to Stitch Together Global Science Efforts

Thu, 09/21/2017 - 08:14

Sept. 21, 2017 — Today’s most ambitious scientific quests — from the cosmic radiation measurements by the South Pole Telescope to the particle physics of the Large Hadron Collider — are multi-institutional research collaborations requiring computing environments that connect instrumentation, data, and computational resources. Because of the scale of the data and the complexity of this science,  these resources are often distributed among university research computing centers, national high performance computing centers, or commercial cloud providers.  This resource heterogeneity causes scientists to spend more time on the technical aspects of computation than on discoveries and knowledge creation, while computing support staff are required to invest more effort integrating domain specific software with limited applicability beyond the community served.

With Services Layer At The Edge (SLATE), a $4 million project funded by the National Science Foundation, a team from the Enrico Fermi and Computation Institutes at University of Chicago will lead an effort with the Universities of Michigan and Utah to provide technology that simplifies connecting university and laboratory data center capabilities to the national cyberinfrastructure ecosystem. Once installed, SLATE connects local research groups with their far-flung collaborators, allowing central research teams to automate the exchange of data, software and computing tasks among institutions without burdening local system administrators with installation and operation of highly customized scientific computing services. By stitching together these resources, SLATE will also expand the reach of domain-specific “science gateways” and multi-site research platforms.

SLATE works by implementing “cyberinfrastructure as code”, augmenting high bandwidth science networks with a programmable “underlayment” edge platform. This platform hosts advanced services needed for higher-level capabilities such as data and software delivery, workflow services and science gateway components.

SLATE uses best-of-breed data center virtualization components, and where available, software defined networking, to enable automation of lifecycle management tasks by domain experts. As such, it simplifies the creation of scalable platforms that connect research teams, institutions and resources, accelerating science while reducing operational costs and development time. Since SLATE needs only commodity components, it can be used for distributed systems across all data center types and scales, thus enabling creation of ubiquitous, science-driven cyberinfrastructure.


At UChicago, the SLATE team will partner with the Research Computing Center and Information Technology Services to help the ATLAS experiment at CERN, the South Pole Telescope and the XENON dark matter search collaborations create the advanced cyberinfrastructure necessary for rapidly sharing data, computer cycles and software between partner institutions.  The resulting systems will provide blueprints for national and international research platforms supporting a variety of science domains.

For example, the SLATE team will work with researchers from the Computation Institute’s Knowledge Lab to develop a hybrid platform that elastically scales computational social science applications between commercial cloud and campus HPC resources. The platform will allow researchers to use their local computational resources with the analytical tools and sensitive data shared through Knowledge Lab’s Cloud Kotta infrastructure, reducing cost and preserving data security.

“SLATE is about creating a ubiquitous cyberinfrastructure substrate for hosting, orchestrating and managing the entire lifecycle of higher level services that power scientific applications that span multiple institutions,” said Rob Gardner, a Research Professor in the Enrico Fermi Institute and Senior Fellow in the Computation Institute. “It clears a pathway for rapidly delivering capabilities to an institution, maximizing the science impact of local research IT investments.”

Many universities and research laboratories use a “Science DMZ” architecture to balance security with the ability to rapidly move large amounts of data in and out of the local network. As sciences from physics to biology to astronomy become more data-heavy, both the complexity of these subnetworks and the need for them grow rapidly, placing additional strain on local IT teams.

That stress is further compounded when local scientists join multi-institutional collaborations, often requiring the installation of specialized, domain-specific services for the sharing of compute and data resources.

“Science, ultimately, is a collective endeavor. Most scientists don’t work in a vacuum, they work in collaboration with their peers at other institutions,” said Shawn McKee, director of the Center for Network and Storage-Enabled Collaborative Computational Science at the University of Michigan. “They often need to share not only data, but systems that allow execution of workflows across multiple institutions. Today, it is a very labor-intensive, manual process to stitch together data centers into platforms that provide the research computing environment required by forefront scientific discoveries.”

With SLATE, research groups will be able to fully participate in multi-institutional collaborations and contribute resources to their collective platforms with minimal hands-on effort from their local IT team. When joining a project, the researchers and admins can select a package of software from a cloud-based service — a kind of “app store” — that allows them to connect and work with the other partners.

“Software and data can then be updated automatically by experts from the platform operations and research teams, with little to no assistance required from local IT personnel,” said Joe Breen, Senior IT Architect for Advanced Networking Initiatives at the University of Utah’s Center for High Performance Computing. “While the SLATE platform is designed to work in any data center environment, it will utilize advanced network capabilities, such as software defined overlay networks, when the devices support it.”

By reducing the technical expertise and time demands for participating in multi-institution collaborations, the SLATE platform will be especially helpful to smaller universities that lack the resources and staff of larger institutions and computing centers. The SLATE functionality can also support the development of “science gateways” which make it easier for individual researchers to connect to HPC resources such as the Open Science Grid and XSEDE.

“A central goal of SLATE is to lower the threshold for campuses and researchers to create research platforms within the national cyberinfrastructure,” Gardner said.

Initial partner sites for testing the SLATE platform and developing its architecture include New Mexico State University and Clemson University, where the focus will be creating distributed cyberinfrastructure in support of large-scale bioinformatics and genomics workflows. The project will also work with the Science Gateways Community Institute, an NSF-funded Scientific Software Innovation Institute, on SLATE integration to make gateways more powerful and reach more researchers and resources.

Source: Rob Mitchum, University of Chicago

The post NSF-funded ‘SLATE’ Platform to Stitch Together Global Science Efforts appeared first on HPCwire.

Berkeley Lab Cosmology Software Scales Up to 658,784 Knights Landing Cores

Wed, 09/20/2017 - 15:00

Sept. 20 — The Cosmic Microwave Background (CMB) is the oldest light ever observed and is a wellspring of information about our cosmic past. This ancient light began its journey across space when the universe was just 380,000 years old. Today it fills the cosmos with microwaves. By parsing its subtle features with telescopes and supercomputers, cosmologists have gained insights into both the properties of our universe and fundamental physics.

Despite all that we’ve learned from the CMB so far, there is still much about the universe that remains a mystery. Next-generation experiments like CMB Stage-4 (CMB-S4) will probe this primordial light at even higher sensitivity to learn more about the evolution of space and time and the nature of matter. But before this can happen scientists need to ensure that their data analysis infrastructure will be able to handle the information deluge.

Cumulative daily maps of the sky temperature and polarization at each frequency showing how the atmosphere and noise integrate down over time. The year-long campaign spanned 129 observation-days during which the ACTpol SS patch was available for a 13-hour constant elevation scan. To make these maps, the signal, noise, and atmosphere observations were combined (including percent-level detector calibration error), filtered with a third-order polynomial, and binned into pixels. (Image Credit: Julian Borrill, Berkeley Lab)

That’s where researchers in Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) Computational Cosmology Center (C3) come in. They recently achieved a critical milestone in preparation for upcoming CMB experiments: scaling their data simulation and reduction framework TOAST (Time Ordered Astrophysics Scalable Tools) to run on all 658,784 Intel Knights Landing (KNL) Xeon Phi processor cores on the National Energy Research Scientific Computing Center’s (NERSC’s) Cori system.

The team also extended TOAST’s capabilities to support ground-based telescope observations, including implementing a module to simulate the noise introduced by looking through the atmosphere, which must then be removed to get a clear picture of the CMB. All of these achievements were made possible with funding from Berkeley Lab’s Laboratory Directed Research and Development (LDRD) program.

“Over the next 10 years, the CMB community is expecting a 1,000-fold increase in the volume of data being gathered and analyzed—better than Moore’s Law scaling, even as we enter an era of energy-constrained computing,” says Julian Borrill, a cosmologist in Berkeley Lab’s Computational Research Division (CRD) and head of C3. “This means that we’ve got to sit at the bleeding edge of computing just to keep up with the data volume.”

TOAST: Balancing Scientific Accessibility and Performance

Cori Supercomputer at NERSC.

To ensure that they are making the most of the latest in computing technology, the C3 team worked closely with staff from NERSC, Intel and Cray to get their TOAST code to run on all 658,784 KNL processor cores of the Cori supercomputer. This collaboration is part of the NERSC Exascale Science Applications Program (NESAP), which helps science code teams adapt their software to take advantage of Cori’s manycore architecture and could be a stepping-stone to next-generation exascale supercomputers.

“In the CMB community, telescope properties differ greatly, and up until now each group typically had its own approach to processing data. To my knowledge, TOAST is the first attempt to create a tool that is useful for the entire CMB community,” says Ted Kisner, a Computer Systems Engineer in C3 and one of the lead TOAST developers.

“TOAST has a modular design that allows it to adapt to any detector or telescope quite easily,” says Rollin Thomas, a big data architect at NERSC who helped the team scale TOAST on Cori. “So instead of having a lot of different people independently re-inventing the wheel for each new experiment, thanks to C3 there is now a tool that the whole community can embrace.”

According to Kisner, the challenges to building a tool that can be used by the entire CMB community were both technical and sociological. Technically, the framework had to perform well at high concurrency on a variety of systems, including supercomputers, desktop workstations and laptops. It also had to be flexible enough to interface with different data formats and other software tools. Sociologically, parts of the framework that researchers interact with frequently had to be written in a high-level programming language that many scientists are familiar with.

The C3 team achieved a balance between computing performance and accessibility by creating a hybrid application. Parts of the framework are written in C and C++ to ensure that it can run efficiently on supercomputers, but it also includes a layer written in Python, so that researchers can easily manipulate the data and prototype new analysis algorithms.
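The layering described above can be pictured with a small, purely illustrative Python sketch. Every name in it (`Observation`, `SubtractMean`, `BinMap`, `run_pipeline`) is hypothetical and is not taken from TOAST; in a real hybrid framework the inner loops would presumably live in compiled C or C++ code, while the Python layer would mainly chain the steps together.

```python
# Illustrative sketch only: these classes are hypothetical, not TOAST's API.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Observation:
    """Time-ordered samples per detector, plus the sky pixel seen at each sample."""
    signal: Dict[str, List[float]]  # detector name -> samples
    pixels: Dict[str, List[int]]    # detector name -> pixel index per sample


class SubtractMean:
    """Simplest possible filter: remove each detector's mean (a 0th-order polynomial)."""

    def apply(self, obs: Observation) -> None:
        for det, samples in obs.signal.items():
            mean = sum(samples) / len(samples)
            obs.signal[det] = [s - mean for s in samples]


@dataclass
class BinMap:
    """Average all samples, from every detector, that fall in the same sky pixel."""
    npix: int
    sky_map: List[float] = field(default_factory=list)

    def apply(self, obs: Observation) -> None:
        total = [0.0] * self.npix
        hits = [0] * self.npix
        for det, samples in obs.signal.items():
            for pix, value in zip(obs.pixels[det], samples):
                total[pix] += value
                hits[pix] += 1
        # Pixels that were never observed are left at zero.
        self.sky_map = [t / h if h else 0.0 for t, h in zip(total, hits)]


def run_pipeline(obs: Observation, operators) -> None:
    """Apply each processing step to the observation, in order."""
    for op in operators:
        op.apply(obs)


# Two made-up detectors with different offsets scanning the same three pixels.
obs = Observation(
    signal={"det_a": [1.0, 2.0, 3.0], "det_b": [11.0, 12.0, 13.0]},
    pixels={"det_a": [0, 1, 2], "det_b": [0, 1, 2]},
)
binner = BinMap(npix=3)
run_pipeline(obs, [SubtractMean(), binner])
print(binner.sky_map)  # [-1.0, 0.0, 1.0]
```

Swapping in a different filter or map-maker only means passing a different list of operators, which is the kind of per-experiment customization a scripting layer makes cheap.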

“Python is a tremendously popular and important programming language, it’s easy to learn and scientists value how productive it makes them. For many scientists and graduate students, this is the only programming language they know,” says Thomas. “By making Python the interface to TOAST, the C3 team essentially opens up HPC resources and experiments to scientists that would otherwise be struggling with big data and not have access to supercomputers. It also helps scientists focus their customization efforts at parts of the code where differences between experiments matter the most, and re-use lower-level algorithms common across all the experiments.”

To ensure that all of TOAST could effectively scale up to 658,784 KNL cores, Thomas and his colleagues at NERSC helped the team launch their software on Cori with Shifter, an open-source software package developed at NERSC to help supercomputer users easily and securely run software packaged as Linux containers. Linux container solutions like Shifter allow an application to be packaged with its entire software stack, including libraries, binaries and scripts, and to define other run-time parameters such as environment variables. This makes it easy for a user to run applications repeatedly and reliably, even at large scale.

“This collaboration is a great example of what NERSC’s NESAP for data program can do for science,” says Thomas. “By fostering collaborations between the C3 team and Intel engineers, we increased their productivity on KNL. Then, we got them to scale up to 658,784 KNL cores with Shifter. This is the biggest Shifter job done for science so far.”

With this recent hero run, the cosmologists also accomplished an important scientific milestone: simulating and mapping 50,000 detectors observing 20 percent of the sky at 7 frequencies for 1 year. That’s the scale of data expected to be collected by the Simons Observatory, which is an important stepping-stone to CMB-S4.

“Collaboration with NERSC is essential for Intel Python engineers – this is a unique opportunity for us to scale Python and other tools to hundreds of thousands of cores,” says Sergey Maidanov, Software Engineering Manager at Intel. “TOAST was among a few applications where multiple tools helped to identify and address performance scaling bottlenecks, from Intel MKL and Intel VTune Amplifier to Intel Trace Analyzer and Collector and other tools. Such a collaboration helps us to set the direction for our tools development.”

Accounting for the Atmosphere


The telescope’s view through one realization of turbulent, wind-blown, atmospheric water vapor. The volume of atmosphere being simulated depended on (a) the scan width and duration and (b) the wind speed and direction, both of which changed every 20 minutes. The entire observation used about 5,000 such realizations. (Image Credit: Julian Borrill)

The C3 team originally deployed TOAST at NERSC nearly a decade ago primarily to support data analysis for Planck, a space-based mission that observed the sky for four years with 72 detectors. By contrast, CMB-S4 will scan the sky with a suite of ground-based telescopes, fielding a total of around 500,000 detectors for about five years beginning in the mid-2020s.

In preparation for these ground-based observations, the C3 team recently added an atmospheric simulation module that naturally generates correlated atmospheric noise for all detectors, even detectors on different telescopes in the same location. This approach allows researchers to test new analysis algorithms on much more realistic simulated data.

“As each detector observes the microwave sky through the atmosphere it captures a lot of thermal radiation from water vapor, producing extremely correlated noise fluctuations between the detectors,” says Reijo Keskitalo, a C3 computer systems engineer who led the atmospheric simulation model development.

Keskitalo notes that previous efforts by the CMB community typically simulated the correlated atmospheric noise for each detector separately. The problem with this approach is it can’t scale to the huge numbers of detectors expected for experiments like CMB-S4. But by simulating the common atmosphere observed by all the detectors once, the novel C3 method ensures that the simulations are both tractable and realistic.

“For satellite experiments like Planck, the atmosphere isn’t an issue. But when you are observing the CMB with ground-based telescopes, the atmospheric noise problem is malignant because it doesn’t average out with more detectors. Ultimately, we needed a tool that would simulate something that looks like the atmosphere because you don’t get a realistic idea of experiment performance without it,” says Keskitalo.
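A toy calculation shows why a shared atmosphere does not average out. It assumes unit-variance Gaussian noise and an arbitrary choice of 100 detectors; the function name and all numbers are illustrative assumptions, not part of TOAST's atmospheric module.

```python
# Toy model only: noise levels and detector counts are assumptions chosen
# for illustration, not values from TOAST or from any experiment.
import random
import statistics


def averaged_rms(n_det, n_samples, shared, seed=0):
    """RMS of the detector-averaged timestream.

    shared=True:  one atmospheric value per time sample, seen by all detectors.
    shared=False: every detector gets its own independent atmospheric value.
    """
    rng = random.Random(seed)
    averaged = []
    for _ in range(n_samples):
        common = rng.gauss(0.0, 1.0)  # atmosphere drawn once for this sample
        total = 0.0
        for _ in range(n_det):
            atmosphere = common if shared else rng.gauss(0.0, 1.0)
            total += atmosphere + rng.gauss(0.0, 1.0)  # plus detector noise
        averaged.append(total / n_det)
    return statistics.pstdev(averaged)


shared_rms = averaged_rms(n_det=100, n_samples=2000, shared=True)
independent_rms = averaged_rms(n_det=100, n_samples=2000, shared=False)
print(f"shared atmosphere:      {shared_rms:.2f}")
print(f"independent atmosphere: {independent_rms:.2f}")
```

In this model the independent case falls roughly as one over the square root of the detector count, while the shared case stays near the amplitude of the atmosphere itself, no matter how many detectors are added.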

“The ability to simulate and reduce the extraordinary data volume with sufficient precision and speed will be absolutely critical to achieving CMB-S4’s science goals,” says Borrill.

In the short term, tens of realizations are needed to develop the mission concept, he adds. In the medium term, hundreds of realizations are required for detailed mission design and the validation and verification of the analysis pipelines. Long term, tens of thousands of realizations will be vital for the Monte Carlo methods used to obtain the final science results.

“CMB-S4 will be a large, distributed collaboration involving at least 4 DOE labs. We will continue to use NERSC – which has supported the CMB community for 20 years now – and, given our requirements, likely need the Argonne Leadership Computing Facility (ALCF) systems too. There will inevitably be several generations of HPC architecture over the lifetime of this effort, and our recent work is a stepping stone that allows us to take full advantage of the Xeon Phi based systems currently being deployed at NERSC,” says Borrill.

The work was funded through Berkeley Lab’s LDRD program, which is designed to seed innovative science and new research directions. NERSC and ALCF are both DOE Office of Science User Facilities.

The Office of Science of the U.S. Department of Energy supports Berkeley Lab. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

About NERSC and Berkeley Lab

The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 6,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. DOE Office of Science. Learn more about computing sciences at Berkeley Lab.

Source: NERSC

The post Berkeley Lab Cosmology Software Scales Up to 658,784 Knights Landing Cores appeared first on HPCwire.