HPC Wire

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

Mellanox Introduces Spectrum-2 Open Ethernet Switch

Thu, 07/06/2017 - 08:27

SUNNYVALE, Calif. & YOKNEAM, Israel, July 6, 2017 — Mellanox Technologies, Ltd. (NASDAQ: MLNX), supplier of high-performance, end-to-end smart interconnect solutions for data center servers and storage systems, today announced the Spectrum-2, the world’s most scalable 200 gigabit and 400 gigabit Open Ethernet switch solution. Spectrum-2 is designed to set new records of data center scalability, more than 10 times higher than market competitors, and reduces data center operational costs by delivering 1.3 times better power efficiency. Moreover, Spectrum-2 provides new levels of programmability and optimizes routing capabilities for building the most efficient Ethernet-based compute and storage infrastructures.

Spectrum-2 provides industry-leading Ethernet connectivity for up to 16 ports of 400GbE, 32 ports of 200GbE, 64 ports of 100GbE and 128 ports of 50GbE and 25GbE, and enables a rich set of enhancements, including increased flexibility and port density, to build a variety of switch platforms optimized for cloud, Hyperscale, Enterprise data center, big data, artificial intelligence, financial, storage and more applications.
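The port counts above can be cross-checked with simple multiplication. The sketch below is illustrative only: the announcement does not state a total switching capacity, and the calculation assumes every port in a configuration runs at its listed speed at the same time, counted in one direction. The names `PORT_CONFIGS` and `aggregate_gbps` are made up for this example.

```python
# Port configurations quoted in the announcement: (port count, speed in Gb/s).
PORT_CONFIGS = [(16, 400), (32, 200), (64, 100), (128, 50), (128, 25)]


def aggregate_gbps(ports: int, speed_gbps: int) -> int:
    """One-direction aggregate bandwidth of a port configuration, in Gb/s."""
    return ports * speed_gbps


for ports, speed in PORT_CONFIGS:
    total = aggregate_gbps(ports, speed)
    print(f"{ports:>3} x {speed:>3}GbE = {total / 1000:.1f} Tb/s")
```

Under these assumptions the first four configurations all come to the same 6.4 Tb/s, while 128 ports of 25GbE use half of that.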

Spectrum-2’s innovative design provides IT managers the capability to fully optimize their network for specific customer requirements, and to maximize their data center return on investment. Moreover, Spectrum-2 delivers unmatched power efficiency when compared to alternative offerings, improving data center total cost of ownership. The solution implements a complete set of the network protocols within the switch ASIC in the most efficient way, providing users with all the functionality needed, out-of-box. Additionally, Spectrum-2 includes a flexible parser and packet modifier which can be programmed to process new types of future protocols, thereby future proofing the data center.

“Data Center customers are looking to significantly increase the Ethernet switch bandwidth in their networks while simultaneously raising the levels of programmability and visibility,” said Seamus Crehan, President, Crehan Research. “The Spectrum-2 switch from Mellanox not only addresses these needs, but does so with a cost-effective Open Ethernet solution.”

“Enterprise adoption of off-premises cloud services in conjunction with adoption of data driven computation using artificial intelligence (AI) techniques and machine learning (ML) are some of the key drivers for 200GE and 400GE networking in the data center. In addition to speed, the rapid pace of innovation in cloud service provider data centers demands a programmable network where new protocols can be introduced without changing switch hardware,” said Cliff Grossner, Ph.D., research director and advisor, Data Center Research Practice, IHS Markit. “In a recent IHS Markit report, we learnt that off-premises cloud service revenue is expected to hit $343 billion in 2021, up from $126 billion in 2016; this will drive the need for high speed and power efficient programmable networking.”

“Spectrum-2 Open Ethernet switch enables our customers and partners to meet the voracious demands of data speed, data processing and real time data analytics, and to gain competitive advantages,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “With 10 times better scalability, 1.3 times better power efficiency, full programmability and flexibility, and the capability to seamlessly migrate to 200G and 400G data speeds, Spectrum-2 provides data centers with the ability to maximize return on investment and future proof their investment.”

Spectrum-2 is the first 400G and 200G Ethernet switch that provides adaptive routing and load balancing while guaranteeing Zero Packet Loss and Unconditional Port Performance. These capabilities enable predictable and highest network performance. The solution also doubles data capacity while providing the lowest latency (300 nanoseconds), 1.4 times lower than alternative offerings. Furthermore, Spectrum-2 is the ideal foundation for Ethernet storage fabrics to connect the next generation of high performance Flash based storage platforms, and combines cloud agility and scalability with enterprise reliability.

Spectrum-2 extends the capabilities of the first generation of Spectrum, which is deployed in thousands of data centers around the world. Spectrum enables IT managers to achieve leading performance and efficiency for 10G infrastructures and higher, and to effectively and economically migrate from 10G to 25G, 50G and 100G speeds. Spectrum capabilities were highlighted in a Tolly test report which demonstrated superior performance versus competitor products. Spectrum-2 maintains the same API as Spectrum, for porting software onto the ASIC via the Open SDK/SAI API or Linux upstream driver (Switchdev), and supports all of the standard network operating systems and interfaces including Cumulus Linux, SONIC, standard Linux distributions and more.

Spectrum-2 also supports an extensive set of telemetry capabilities, including the latest in-band network telemetry standard, which provide operators with full visibility into their network and allow them to monitor, diagnose and analyze every aspect of operations. This greatly simplifies data center management and enables IT managers to fully optimize the network to their data center application’s needs.

Industry Quotes:

“Our relationship with Mellanox crosses interconnect technologies and allows us to stay ahead of the market in terms of innovation and propels our portfolio evolution,” said Mr. Liu Ning, Deputy Director of Baidu SYS department. “We are looking forward to seeing Mellanox’s new generation switch product, Spectrum-2, come to market.”

“In the world of networking, capacity is king, and Mellanox does it again with Spectrum-2, pushing the limits on port speeds, densities, packet buffer size, and functionality,” said JR Rivers, Co-Founder & Chief Technology Officer, Cumulus Networks. “Our customers are on the forefront of applying web-scale principles in their data centers, and Spectrum-2 enables our customers to build high performance networks that leverage programmability capabilities and leading edge telemetry-based fabric validation.”

“Deploying scalable, reliable, and simple data management is core to our storage solutions that offer enterprise reliability, cloud scalability, efficiency, and performance,” said Marty Lans, Sr. Director Storage Connectivity and Ecosystem Engineering at Hewlett Packard Enterprise. “Spectrum-2 offers the performance, scalability, and reliability required in a storage fabric that underpins next generation storage architectures.”

“We are pleased to see Mellanox innovating with Spectrum-2,” said Li Li, SVP, General Manager of Product Sales and Marketing at New H3C Group. “As the industry migrates to 200 gigabit and beyond, we are seeking new levels of programmability along with significant performance enhancements. Spectrum-2 holds the promise of providing the best of both worlds.”

“The exponential growth of data as organizations embrace cognitive computing and artificial intelligence requires faster and more scalable network infrastructures,” said Bryan Talik, director, IBM OpenPOWER System Enablement. “The Mellanox Spectrum-2 enables IBM to deliver better scalability and network optimization for OpenPOWER systems.”

“We are strongly focused on maximizing our data center ROI,” said Mr. Leijun Hu, VP of Inspur Group. “With Mellanox’s new Spectrum-2, we can see a clear path to 200 gigabit and beyond with impressive scalability and unmatched power efficiency. Spectrum-2 has a rich feature set that will also allow us to fully optimize our network to suit our specific needs which is critical to addressing our expanding business needs.”

“The increase in data volume and the need to support more users require faster network speeds and higher scalability,” said Tao Liu, VP at Kingsoft Cloud. “Mellanox Ethernet solutions empower our cloud infrastructure today and we look forward to using the advanced capabilities of Spectrum-2.”

“Mellanox Open Ethernet Spectrum and the upcoming Spectrum-2 switches enable to optimize industry wide data centers for best performance and efficiency,” said Yuval Bachar, Principal Engineer, Global Infrastructure Architecture and Strategy at LinkedIn. “The industry need for the exponential data center and edge growth requires to build a scalable and robust infrastructure. The Mellanox solution will enable this growth, offering robust feature-set and capabilities.”

“As a longtime partner of Mellanox, we are thrilled to see the industry’s first 400G and 200G Ethernet switch offering adaptive routing and load balancing while guaranteeing Zero Packet Loss,” said Mr. Chaoqun Sha, SVP of Technology at Sugon. “Spectrum-2 will give the industry the highest network and predictable performance, which is key to our providing our customers with world-class service and support.”


Spectrum-2 SDK is available now for early-access. The Spectrum-2 switch ASIC is expected to be available later this year.

About Mellanox

Mellanox Technologies (NASDAQ: MLNX) is a leading supplier of end-to-end Ethernet and InfiniBand smart interconnect solutions and services for servers and storage. Mellanox interconnect solutions increase data center efficiency by providing the highest throughput and lowest latency, delivering data faster to applications and unlocking system performance capability. Mellanox offers a choice of fast interconnect products: adapters, switches, software and silicon that accelerate application runtime and maximize business results for a wide range of markets including high performance computing, enterprise data centers, Web 2.0, cloud, storage and financial services. More information is available at: www.mellanox.com.

Source: Mellanox

The post Mellanox Introduces Spectrum-2 Open Ethernet Switch appeared first on HPCwire.

Rescale Joins the Automotive Simulation Center Stuttgart

Thu, 07/06/2017 - 08:25

STUTTGART, Germany, July 6, 2017 — Rescale is pleased to announce that it has become a full member of the Automotive Simulation Center Stuttgart, or asc(s. The asc(s is a non-profit organization promoting high-performance simulation in virtual vehicle development. It consists of automotive OEMs and suppliers, software and hardware manufacturers, engineering service providers, and research institutes.

The vehicle design-to-manufacturing process is becoming increasingly simulation driven, but the sheer number of components and systems in a modern car makes multi-disciplinary design for crashworthiness, NVH, durability, fatigue, thermodynamics, electromagnetics, and fluid flow extremely challenging. Fortunately, simulation software and high-performance hardware have both made huge gains in capability over recent years and it’s now possible to simulate complex multi-physics system problems and even run large design of experiments (DOEs) and optimizations on scalable cloud data center hardware. Many IT departments with on-premise capacity are struggling to support cutting-edge applications such as deep learning for autonomous driving and electromagnetic compatibility (EMC) simulation, making cloud or hybrid-cloud simulation highly attractive.

Rescale offers a turnkey “big compute” solution for the automotive industry to exploit the best available software and hardware. Rescale works with a number of automotive customers including Honda, Nissan, Toyota, American Axle, Pinnacle Engines, Magna, and Siemens. Through its partners, Rescale has over 200 pre-installed software tools and over 60 worldwide data centers on tap. Applications include airflow over the vehicle, multibody impact, combustion effectiveness, oil flow, electromagnetic compatibility, and deep learning for autonomous driving.

“Rescale is delighted to be accepted as a member of asc(s,” said Wolfgang Dreyer, Rescale’s EMEA General Manager. “asc(s provides a forum for simulation innovation across the European automotive sector and Rescale is enabling scalable, turnkey on-demand high-performance computing, all pivotal for making automotive simulation cost-effective, fast, and efficient. We look forward to working with the members of the association to better understand industry requirements and trends and to push the boundaries of automotive simulation.”

About asc(s

The asc(s is a non-profit association for know-how carriers in the field of automotive simulation. The company provides its members with the possibility to advance new simulation methods for virtual vehicle development fast and efficiently – particularly if these place high demands on the computing power and data volume.

The asc(s promotes, supports and realises the method development in the field of automotive simulation. Being an interest group and multiplier, the association can offer its members a wide range of services and activities. The main focus of the activities is the concentration of expertise from the automotive and supply industry, software and hardware manufacturers, engineering service providers and research institutes. The asc(s provides the environment for cooperation. Enterprises work hand in hand at the asc(s, thus gaining new impulses for the development of their products.

About Rescale

Rescale is the global leader for high-performance computing simulations and deep learning in the cloud. Trusted by the Global Fortune 500, Rescale empowers the world’s top scientists and engineers to develop the most innovative new products and perform groundbreaking research and development faster and at lower cost. Rescale’s ScaleX platform transforms traditional fixed IT resources into flexible hybrid, private, and public cloud resources—built on the largest and most powerful high-performance computing network in the world. For more information on Rescale’s ScaleX platform, visit www.rescale.com.

Source: Rescale

Mercury Systems Acquires Richland Technologies, LLC

Thu, 07/06/2017 - 07:06

ANDOVER, Mass., July 6, 2017 — Mercury Systems, Inc. (NASDAQ: MRCY, www.mrcy.com) today announced that it has acquired Richland Technologies, LLC (RTL). Based in Duluth, Ga., RTL specializes in safety-critical and high integrity systems, software, and hardware development as well as safety-certification services for mission-critical applications. In addition, the Company is a leader in safety-certifiable embedded graphics software for commercial and military aerospace applications. The acquisition complements Mercury’s acquisition of Creative Electronic Systems (CES) last November by providing additional capabilities in safety-critical markets as well as the opportunity to leverage RTL’s U.S. presence and expertise. Together, the RTL and CES acquisitions position Mercury uniquely as a leading provider of secure and safety-critical processing subsystems for aerospace and defense customers. Terms of the transaction were not disclosed. The acquisition is not expected to have a material impact on Mercury’s financial results for the first quarter or full fiscal year 2018. Mercury intends to maintain RTL’s presence in Duluth, Ga.

“We are very pleased to welcome RTL to the Mercury family,” said Mark Aslett, Mercury’s President and Chief Executive Officer. “Mercury gained a very strong footprint in safety-critical avionics with the acquisition of CES, based in Geneva, Switzerland, which we have renamed and is now operating as Mercury Mission Systems International (MMSI) as part of our Sensor and Mission Processing product line. The combination of RTL with MMSI gives us a strong U.S. presence in the safety-critical avionics market, adding significant systems engineering, safety-critical software and hardware development and certification expertise to our existing mission computing portfolio. These new capabilities will enhance Mercury’s market penetration in commercial aerospace, defense platform management, C4I and mission computing – markets that are very closely aligned with Mercury’s existing market focus,” Aslett concluded.

For more information on Mercury Systems visit www.mrcy.com or contact Mercury at (866) 627-6951 or info@mrcy.com.

About Mercury Systems

Mercury Systems (NASDAQ:MRCY) is a leading commercial provider of secure sensor and mission processing subsystems. Optimized for customer and mission success, Mercury’s solutions power a wide variety of critical defense and intelligence programs. Headquartered in Andover, Mass., Mercury is pioneering a next-generation defense electronics business model specifically designed to meet the industry’s current and emerging technology needs. To learn more, visit www.mrcy.com.

Source: Mercury Systems

NCSA Grants $2.6M in Blue Waters Awards to Illinois Researchers

Thu, 07/06/2017 - 07:03

URBANA, Ill., July 6, 2017 — The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign has awarded 3,697,000 node hours (NH) of time on the Blue Waters supercomputer to Illinois researchers from Spring 2017 proposal submissions.

The combined value of these awards is over $2.6 million, and through the life of the Blue Waters program, NCSA has awarded over 43 million node hours to UI researchers, a value of nearly $27 million. Some of the time allocated for Blue Waters will go to projects that focus on HIV research, Laser Interferometer Gravitational-Wave Observatory (LIGO) simulations, genomics and global warming research.
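For readers who want to relate the dollar figures to the node-hour totals, the following is a rough back-of-the-envelope sketch. It is not NCSA's valuation method: the inputs are the rounded figures quoted above (which the article qualifies with "over" and "nearly"), and the function name `implied_rate` is invented for this example.

```python
def implied_rate(dollars: float, node_hours: float) -> float:
    """Implied dollar value of one node hour, given a total value and total hours."""
    return dollars / node_hours


# Spring 2017 awards: about $2.6 million for 3,697,000 node hours.
spring_2017 = implied_rate(2.6e6, 3_697_000)

# Life of the program: about $27 million for about 43 million node hours.
lifetime = implied_rate(27e6, 43e6)

print(f"Spring 2017: ${spring_2017:.2f} per node hour")
print(f"Program lifetime: ${lifetime:.2f} per node hour")
```

With these rounded inputs the implied value is roughly $0.70 per node hour for the Spring 2017 awards and roughly $0.63 over the life of the program.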

NCSA researchers Eliu Huerta and Roland Haas will use their 495,000 NH allocation to generate catalogs of numerical relativity (NR) simulations with the open source NR software called the Einstein Toolkit. NCSA is an official member of the LIGO Scientific Collaboration (LSC), and Huerta and Haas share these simulations with the LSC to contribute to the validation of new gravitational wave transients.

Why Blue Waters? When LIGO detects a new gravitational wave transient, the NR waveforms are used to validate the astrophysical origin of the signal, and to constrain its astrophysical parameters. This time-critical analysis requires dozens of NR simulations, each requiring thousands of Blue Waters node hours of computer time that must be run in parallel to sample a higher dimensional parameter space. No other resource but Blue Waters can provide the required computational power on short notice.

Tandy Warnow, a professor in computer science and bioengineering at the University of Illinois at Urbana-Champaign, has been awarded 125,000 NH for her work on improving methods for phylogenomics, proteomics and metagenomics.

“The Blue Waters allocation is allowing us to develop new methods with much greater accuracy by testing and refining our algorithmic designs, so that we end up with new computational methods that are much more accurate than any current method, and that can scale to ultra-large datasets. None of this would be possible without Blue Waters!”

Juan Perilla and Jodi Hadden, researchers in the Theoretical and Computational Biophysics Group at the Beckman Institute, were allocated 582,000 NH for research focusing on virus capsids and their interactions with human factors and with antiviral drugs. Perilla says he and Hadden will use the allocation to study the effects of assembly inhibitors on the Hepatitis-B virus capsid and the HIV-1 capsid. “Blue Waters enables us to perform accurate, all-atom simulations of drug-compounds bound to the viral capsid and allows us to perform large-scale analysis of the results from the simulations,” said Perilla.

Atmospheric sciences associate professor Ryan Sriver and PhD candidate Hui Li are using Blue Waters to explore the interactions between tropical cyclones (e.g. hurricanes) and Earth’s climate.

Sriver and Li are conducting a series of high-resolution global climate simulations using the Community Earth System Model (CESM), which features a 25 km atmosphere component capable of capturing realistic tropical cyclone activity—number, location, intensity—on a global scale. Results will enable key insights into the importance of tropical cyclones within Earth’s coupled climate system, as well as how storm activity may change in the future.

Narayana R. Aluru, Department of Mechanical Science and Engineering, could not perform his molecular dynamics simulations without the petascale power of Blue Waters. Aluru’s work focuses on systematic, thermodynamically consistent, structure-based coarse graining of room temperature ionic liquids. “Since the size of these ions is several nanometers and their interactions are highly dominated by electrostatics, the all-atom simulation of these systems is computationally expensive but critical. We perform molecular dynamics simulations which involve up to 400,000 atoms. These computationally expensive computations would not be possible to perform without a petascale supercomputer (Blue Waters),” said Aluru.

“The goal of our project is to uncover principles that drive the evolution of function in proteins,” said Gustavo Caetano-Anollés of the Department of Crop Sciences at the University of Illinois at Urbana-Champaign. “During previous Blue Waters allocations, we have performed 87 and 116 molecular dynamics simulations of protein loops on the timescales of approximately 10-12 and 50-70 nanoseconds, respectively,” said Caetano-Anollés. “In order to accomplish our goals in a timely fashion, we take advantage of efficient scalability on the NAMD simulation system in Blue Waters. Scalability coupled with GPU-computing provides acceleration gain vital to completing a project of this magnitude.”

“Using Blue Waters will allow us to self-consistently address the entire sequence of events leading to the development of a geomagnetic storm, and for the first time, to assess the implications of the induced electric fields to the enhancements of the near-Earth currents. This will provide the connection between the macro-scale dynamics and micro-scale processes leading to the development of a geomagnetic storm. As a result, this will significantly improve our space weather prediction capabilities,” said Raluca Ilie, Department of Electrical and Computer Engineering, about the impact of Blue Waters on her work, “Quantifying the Effects of Inductive Electric Fields in the Terrestrial Magnetosphere.”

Other researchers awarded Blue Waters allocations for Spring 2017 include:

  • Stuart Shapiro (Department of Physics), Milton Ruiz, Antonios Tsokaros and Vasileios Paschalidis: 500,000 NH for “Gravitational and Electromagnetic Signatures of Compact Binary Mergers: General Relativistic Simulations at the Petascale”
  • Matthew West (Department of Atmospheric Sciences), Nicole Riemer and Jeffrey H. Curtis: 240,000 NH for “Verification of a Global Aerosol Model using a 3D Particle-Resolved Model”
  • Hsi-Yu Schive (NCSA), Matthew Turk, John Zuhone, Nathan Goldbaum and Jeremiah Ostriker: 230,000 NH for “Ultra-High Resolution Astrophysical Simulations with GAMER”
  • Aleksei Aksimentiev (Department of Physics): 225,000 NH for “Epigenetic Regulation of Chromatin Structure and Dynamics”
  • Mark Neubauer (Department of Physics), Philip Chang, Rob Gardner and Dewen Zhong: 140,000 NH for “Enabling Discoveries at the Large Hadron Collider through Advanced Computation and Deep Learning”
  • Benjamin Hooberman (Department of Physics), Amir Farbin, Matt Zhang and Ryan Reece: 125,000 NH for “Employing Deep Learning Techniques for Particle Identification at the Large Hadron Collider”
  • Nancy Makri (Department of Chemistry), Peter Walters and Amartya Bose: 100,000 NH for “Quantum-Classical Path Integral Simulation of Charge Transfer Reactions”
  • Brad Sutton (Department of Bioengineering), Curtis Johnson, Alex Cerjanic and Aaron Anderson: 100,000 NH for “HPC-Based Approaches for High-Resolution, Quantitative MRI Applications”
  • Brian Thomas (Department of Mechanical Science & Engineering): 75,000 NH for “Multiphysics Modeling of Steel Continuous Casting”

Source: NCSA

NSF Provides Status Report on U.S. Doctorate Education

Thu, 07/06/2017 - 07:00

The U.S. remains a potent factory for doctorate degrees according to the most recent National Science Foundation Survey of Earned Doctorates (SED). In 2015 the U.S. awarded 55,006 research doctorate degrees, the most ever recorded in the SED, with the lion’s share awarded in science and engineering fields. Math and computer sciences remained the most desirable doctorates in terms of income and immediate job prospects but accounted for a small proportion of all doctorates awarded.

The 2015 SED report, which was posted late last week on the NSF web site, warned continued U.S. preeminence is not a given:

“The American system of doctoral education is widely considered to be among the world’s best, as evidenced by the large and growing number of international students each year—many of them among the top students in their countries—who choose to pursue the doctoral degree at U.S. universities. But the continued preeminence of U.S. doctoral education is not assured. Other nations, recognizing the contributions doctorate recipients make to economies and cultures, are investing heavily in doctoral education. Unless doctoral education in the United States continues to improve, the world’s brightest students, including U.S. citizens, may go elsewhere for the doctoral degree, and they may begin careers elsewhere as well.”

Notably, the study deliberately omits professional degrees such as the M.D., J.D., and Psy.D., which are aimed at professional practice rather than research jobs. Top-line trends cited in the latest SED report include:

  • Science and engineering (S&E) degrees continued a 40-year trend of outpacing non-S&E degrees.
  • From 1975 to 2015, the number of S&E degrees more than doubled, with an average annual growth of 1.9 percent.
  • The number of non-S&E degrees awarded in 2015 is virtually identical to the number awarded in 1975. As a result of the different growth rates, the proportion of S&E doctorates climbed from 58 percent in 1975 to 75 percent in 2015.
  • The number of S&E doctorates awarded to temporary visa holders grew to 14,037 in 2015, up 2 percent compared to the previous year and up 30 percent since 2005.
  • The number of S&E doctorates awarded to U.S. citizens and permanent residents grew to 24,547 in 2015, up 3 percent from the previous year and 43 percent since 2005.
  • During the 2005 to 2015 period, 10 countries accounted for 71 percent of the doctorates awarded to temporary visa holders. The top three — China, India and South Korea — accounted for more than half of the doctorates awarded to temporary visa holders.
  • Women earned 46 percent of all doctorates in 2015, continuing a trend of women’s increasing prevalence in the annual total of recipients.

The study is replete with statistics and readily navigable online. What follows is a very brief sampling of SED findings.
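The growth figures in the list above can be checked against each other with a short calculation. This is a rough sketch, not taken from the SED report: it treats 1.9 percent as a constant compound rate over 40 years, and it treats the non-S&E count as exactly flat (the report says "virtually identical"). The function and variable names are invented for this example.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiplier after compounding a constant annual rate."""
    return (1 + annual_rate) ** years


# 1.9 percent average annual growth from 1975 to 2015 (40 years).
compounded = growth_factor(0.019, 40)

# Years needed to double at a constant 1.9 percent annual rate.
doubling_years = math.log(2) / math.log(1 + 0.019)

# Cross-check from the share figures: if the non-S&E count is flat, the
# S&E count scales with the ratio of S&E to non-S&E doctorates.
ratio_1975 = 0.58 / 0.42
ratio_2015 = 0.75 / 0.25
implied_by_shares = ratio_2015 / ratio_1975

print(f"Compounded growth over 40 years: x{compounded:.2f}")
print(f"Doubling time at 1.9 percent: {doubling_years:.1f} years")
print(f"Growth implied by share change: x{implied_by_shares:.2f}")
```

Both routes give a multiplier a little above 2.1, consistent with the statement that S&E degrees "more than doubled" over the period.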

More Doctorates Being Awarded
As shown here, the number of doctorates awarded has risen steadily. Science and engineering degrees, as noted earlier, have grown fastest. That said, the relative number of students pursuing math and computer sciences doctorates hasn’t grown much. It’s now around seven percent of the total science and engineering doctorates awarded and still ranks last among disciplines.


Women Still Underrepresented in Sciences
As noted in the report, women’s share of doctorates awarded has grown over the past two decades. In 2015 women earned the majority of doctorates awarded in every broad field of study except physical and earth sciences, mathematics and computer sciences, and engineering. By contrast, women’s share of math and computer sciences doctorates was nearly static. Indeed, attracting women to HPC has long been a challenge and goal at NSF and elsewhere in the HPC community.

“Although women earned only about one-third of the 2015 doctorates awarded in physical and earth sciences and less than one-fourth of the doctorates in engineering, their relative shares of doctorates awarded in those fields has been growing rapidly. From 2005 to 2015, the proportion of doctorates in physical and earth sciences awarded to women increased by 6 percentage points, and the share of women in engineering grew by 5 percentage points. The proportion of female doctorate recipients in mathematics and computer sciences has grown more modestly, by 1 percentage point from 2005 to 2015.”


Job Market for Doctorates is Static
Job markets are always subject to generalized economic swings; that said, the NSF study reports that newly minted S&E doctorates in 2015 often faced stiff challenges as measured against past trends.

“In every broad science and engineering (S&E) field, the proportion of 2015 doctorate recipients who reported definite commitments for employment or postdoctoral (postdoc) study was at or near the lowest level of the past 15 years, and it was 4 to 13 percentage points below the proportion reported in 2006, the most recent high point in definite commitments for S&E fields.”


Foreign Students – Stay or Return?
Given President Trump’s ongoing efforts to tighten visa and immigration regulation, there has been a good deal of discussion around foreign graduate students. This 2015 study doesn’t capture that dynamic but it does present some detail around where doctoral students come from and what their plans are for remaining in the U.S. or not. This section of the report is best read directly.

Perhaps not surprisingly, China is the dominant country of origin for doctoral students, followed by India and South Korea. Europe, of course, has a well-developed graduate educational infrastructure.


Link to NSF summary article: https://nsf.gov/news/news_summ.jsp?cntn_id=242416&org=NSF&from=news

Link to the full NSF 2015 SED report: https://www.nsf.gov/statistics/2017/nsf17306/report/about-this-report.cfm

Nvidia, Baidu Expand AI Partnership 

Wed, 07/05/2017 - 13:28

Today at its developer conference in Beijing, Baidu announced a broadening of its AI partnership with Nvidia, including plans to bring Nvidia’s recently announced 120 TFLOPS Volta GPUs to Baidu Cloud and adoption of Nvidia’s DRIVE PX platform for Baidu’s newly named “Apollo” self-driving car strategy.

Although the companies did not disclose financial details, Nvidia stock jumped nearly 4 percent within hours of the announcement of the deal, which expands Nvidia’s entrée to the vast potential of the Chinese market.

In addition, Baidu said it will optimize its open source PaddlePaddle deep learning framework for Volta GPUs and bring AI capabilities to the Chinese consumer market by adding Baidu’s Duer OS voice-recognition AI system to Nvidia SHIELD TV.

“We see AI transforming every industry, and our strategy is to help democratize AI everywhere, in every cloud, in every AI framework, from the datacenter to the edge to the self-driving car,” said Ian Buck, Nvidia vice president and general manager of accelerated computing, in a pre-announcement press briefing.

Baidu also announced it will deploy in its datacenters Nvidia’s HGX reference architecture with Tesla Volta V100 and Tesla P4 GPU accelerators for AI training and inference. Combined with Baidu’s PaddlePaddle deep learning framework and Nvidia’s TensorRT deep learning inference software, “researchers and companies can harness state-of-the-art technology to develop products and services with real-time understanding of images, speech, text and video,” Nvidia said.

The availability of the Volta GPU architecture within the PaddlePaddle deep learning framework is aimed at helping researchers and companies, along with Baidu, develop AI applications for search rankings, image classification services, real-time speech understanding, visual character recognition and other AI-powered services.

In announcing its selection of Nvidia’s DRIVE PX 2 AI supercomputer for its open source Apollo autonomous vehicle platform, Baidu said Apollo will also incorporate Tesla GPUs along with Nvidia CUDA and TensorRT software, adding that the self-driving car that Baidu showed recently at CES Asia was powered by DRIVE PX 2.

Several Chinese automakers today announced that they will join the Eco Partner Alliance of Apollo, including Changan, Chery Automobile, FAW, and Greatwall Motor.

In the Chinese AI home market, Baidu Duer OS, the company’s conversational AI system, will provide voice command capabilities to NVIDIA’s SHIELD TV for streaming video, gaming and smart home assistance. A version of the streamer, with custom software made for China, will be available later this year.

“NVIDIA and Baidu have pioneered significant advances in deep learning and AI,” said Buck. “We believe AI is the most powerful technology force of our time, with the potential to revolutionize every industry. Our collaboration aligns our exceptional technical resources to create AI computing platforms for all developers – from academic research, startups creating breakthrough AI applications, and autonomous vehicles.”

Editor’s note: This article first appeared in HPCwire’s sister publication EnterpriseTech.

The post Nvidia, Baidu Expand AI Partnership  appeared first on HPCwire.

SC17 Registration is Now Open

Wed, 07/05/2017 - 12:35

DENVER, Co., July 5, 2017 — Registration for SC17—the premier international conference on high performance computing, networking, storage and analysis—officially opens today, Wednesday, July 5.

SC17 continues the long tradition of a robust and engaging program. Specifically, this year SC17 offers a very competitive paper program with 327 submissions – of which only 61 papers were accepted. It truly will be the best of the best. In addition, there will be lively discussions in 12 panels, and a variety of half- and full-day workshops will complement the overall Technical Program.

An important note: The fee structure for SC17 makes it advantageous to register early. Registering early for the Technical Program can save you up to $275 off your registration (depending on your registration category). Also, registering for Tutorials by October 15 can save you up to $350 off that registration. Register early for both the Technical Program and Tutorials and save up to $625!

Click here for more information and to register.

Source: SC17

The post SC17 Registration is Now Open appeared first on HPCwire.

Mellanox Schedules Release of Q2 2017 Financial Results

Wed, 07/05/2017 - 10:51

SUNNYVALE, Calif. & YOKNEAM, Israel, July 5, 2017 — Mellanox Technologies, Ltd. (NASDAQ: MLNX), a leading supplier of high-performance, end-to-end interconnect solutions for data center servers and storage systems, today announced that it will release its financial results for the second quarter 2017 after the market closes on Wednesday, July 26, 2017.

Following the release, Mellanox will conduct a conference call at 2 p.m. Pacific Time (5 p.m. Eastern Time). To listen to the call: dial +1-888-632-3384 (non-U.S. residents: +1-785-424-1675) approximately ten minutes prior to the start time.

The Mellanox financial results conference call will be available, via a live webcast, on the investor relations section of the Mellanox website at: http://ir.mellanox.com.

Interested parties may access the website 15 minutes prior to the start of the call to download and install any necessary audio software. An archived webcast replay will also be available on the Mellanox website.

About Mellanox

Mellanox Technologies (NASDAQ: MLNX) is a leading supplier of end-to-end Ethernet and InfiniBand intelligent interconnect solutions and services for servers, storage, and hyper-converged infrastructure. Mellanox’s intelligent interconnect solutions increase data center efficiency by providing the highest throughput and lowest latency, delivering data faster to applications and unlocking system performance. Mellanox offers a choice of high performance solutions: network and multicore processors, network adapters, switches, cables, software and silicon, that accelerate application runtime and maximize business results for a wide range of markets including high performance computing, enterprise data centers, Web 2.0, cloud, storage, network security, telecom and financial services. More information is available at www.mellanox.com.

Source: Mellanox

The post Mellanox Schedules Release of Q2 2017 Financial Results appeared first on HPCwire.

Brookhaven Lab Hosts Five-Day GPU Hackathon

Wed, 07/05/2017 - 09:51
From June 5 through 9, Brookhaven Lab’s Computational Science Initiative hosted “Brookathon”

July 5, 2017 — On June 5, coding “sprinters”—teams of computational, theoretical, and domain scientists; software developers; and graduate and postdoctoral students—took their marks at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory, beginning the first of five days of nonstop programming from early morning until night. During this coding marathon, or “hackathon,” they learned how to program their scientific applications on devices for accelerated computing called graphics processing units (GPUs). Guiding them toward the finish line were GPU programming experts from national labs, universities, and technology companies who donated their time to serve as mentors. The goal by the end of the week was for the teams new to GPU programming to leave with their applications running on GPUs—or at least with the knowledge of how to do so—and for the teams who had come with their applications already accelerated on GPUs to leave with an optimized version.

The era of GPU-accelerated computing 

GPU-accelerated computing—the combined use of GPUs and central processing units (CPUs)—is increasingly being used as a way to run applications much faster. Computationally intensive portions of an application are offloaded from the CPU, which consists of a few cores optimized for serial processing (tasks execute one at a time in sequential order), to the GPU, which contains thousands of smaller, more efficient cores optimized for parallel processing (multiple tasks are processed simultaneously).

Nicholas D’Imperio, chair of Brookhaven Lab’s Computational Science Laboratory, holds a graphics processing unit (GPU) made by NVIDIA.

However, while GPUs potentially offer a very high memory bandwidth (rate at which data can be stored in and read from memory by a processor) and arithmetic performance for a wide range of applications, they are currently difficult to program. One of the challenges is that developers cannot simply take the existing code that runs on a CPU and have it automatically run on a GPU; they need to rewrite or adapt portions of the code. Another challenge is efficiently getting data onto the GPUs in the first place, as data transfer between the CPU and GPU can be quite slow. Though parallel programming standards such as OpenACC and GPU advances such as hardware and software for managing data transfer make these processes easier, GPU-accelerated computing is still a relatively new concept.
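
The trade-off described above, between faster GPU arithmetic and the cost of moving data between the CPU and GPU, can be illustrated with a rough back-of-the-envelope model. The sketch below is only an assumption-laden illustration: the function names, the 10 GB/s link speed, the data size, and the timings are all hypothetical figures chosen for the example, not measurements from Brookathon or any particular system.

```python
# Rough, illustrative model of when GPU offload pays off.
# All figures below are hypothetical, not measurements.

def offload_time(data_bytes, link_bytes_per_s, gpu_compute_s):
    """Time to copy data to the GPU and back, plus GPU compute time."""
    transfer_s = 2 * data_bytes / link_bytes_per_s  # host -> device and back
    return transfer_s + gpu_compute_s


def speedup(cpu_compute_s, data_bytes, link_bytes_per_s, gpu_compute_s):
    """End-to-end speedup of offloading versus staying on the CPU."""
    return cpu_compute_s / offload_time(data_bytes, link_bytes_per_s, gpu_compute_s)


GB = 1e9

# Hypothetical compute-heavy kernel: 8 GB of data, 10 GB/s link,
# 40 s on the CPU versus 2 s on the GPU.
heavy = speedup(cpu_compute_s=40.0, data_bytes=8 * GB,
                link_bytes_per_s=10 * GB, gpu_compute_s=2.0)

# Same data volume, but a light kernel: 2 s on the CPU, 0.1 s on the GPU.
light = speedup(cpu_compute_s=2.0, data_bytes=8 * GB,
                link_bytes_per_s=10 * GB, gpu_compute_s=0.1)

print(f"compute-heavy kernel speedup: {heavy:.1f}x")
print(f"compute-light kernel speedup: {light:.1f}x")
```

Under these assumed numbers, the compute-heavy kernel still gains roughly an order of magnitude, while the light kernel gains almost nothing because the transfer time dominates. This is one reason why keeping data resident on the GPU between kernels matters so much in practice.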

A hackathon with a history

Here’s where “Brookathon,” hosted by Brookhaven Lab’s Computational Science Initiative (CSI) and jointly organized with DOE’s Oak Ridge National Laboratory, Stony Brook University, and the University of Delaware, came in.

“The architecture of GPUs, which were originally designed to display graphics in video games, is quite different from that of CPUs,” said CSI computational scientist Meifeng Lin, who coordinated Brookathon with the help of an organizing committee and was a member of one of the teams participating in the event. “People are not used to programming GPUs as much as CPUs. The goal of hackathons like Brookathon is to lessen the learning curve, enabling the use of GPUs on next-generation high-performance-computing (HPC) systems for scientific applications.”

Brookathon is the latest in a series of GPU hackathons that began in 2014 at Oak Ridge Leadership Computing Facility (OLCF)—a DOE Office of Science User Facility that is home to the nation’s most powerful science supercomputer, Titan, and other hybrid CPU-GPU systems. So far, OLCF’s Fernanda Foertter, an HPC user support specialist and programmer, has helped organize and host 10 hackathons across the United States and abroad, including Brookathon and one at the Jülich Supercomputing Centre in Germany earlier this year.

Members of the organizing committee explain the motivation behind Brookathon and the other hackathons in the series, and participants and mentors discuss their experiences.

“Hackathons are intense team-based training events,” said Foertter. “The hope is that the teams go home and continue to work on their codes.”

The idea to host at Brookhaven started in May 2016, when Lin and Brookhaven colleagues attended their first GPU hackathon, hosted at the University of Delaware. There, they worked on a code for lattice quantum chromodynamics (QCD) simulations, which help physicists understand the interactions between particles called quarks and gluons. But in using the OpenACC programming standard, they realized it did not sufficiently support the C++ programming language that their code library was written in. Around this time, Brookhaven became a member of OpenACC so that CSI scientists could help shape the standard to include the features needed to support their codes on GPUs. Through the University of Delaware hackathon and weekly calls with OpenACC members, Lin came into contact with Foertter and Sunita Chandrasekaran, an assistant professor of computer science at the University of Delaware who organized that hackathon, both of whom were on board with bringing a hackathon to Brookhaven.

“Brookhaven had just gotten a computing cluster with GPUs, so the timing was great,” said Lin. “In CSI’s Computational Science Laboratory, where I work, we get a lot of requests from scientists around Brookhaven to get their codes to run on GPUs. Hackathons provide the intense hands-on mentoring that helps to make this happen.”

Teams from near and far

A total of 22 applications were submitted for a spot at Brookathon, half of which came from Brookhaven Lab or nearby Stony Brook University teams. According to Lin, Brookathon received the highest number of applications of any of the hackathons to date. Ultimately, a review committee of OpenACC members accepted applications from 10 teams, each of which brought a different application to accelerate on GPUs:

  • Team AstroGPU from Stony Brook University: codes for simulating astrophysical fluid flows
  • Team Grid Makers from Brookhaven, Fermilab, Boston University, and the University of Utah (Lin’s team): a multigrid solver for linear equations and a general data-parallel library (called Grid), both related to application development for lattice QCD under DOE’s Exascale Computing Project
  • Team HackDpotato from Stony Brook University: a genetic algorithm for protein simulation
  • Team Lightning Speed OCT (for optical coherence tomography) from Lehigh University: a program for real-time image processing and three-dimensional image display of biological tissues
  • Team MUSIC (for MUScl for Ion Collision) from Brookhaven and Stony Brook University: a code for simulating the evolution of the quark-gluon plasma produced at Brookhaven’s Relativistic Heavy Ion Collider (RHIC)—a DOE Office of Science User Facility
  • Team NEK/CEED from DOE’s Argonne National Laboratory, the University of Minnesota, and the University of Illinois Urbana-Champaign: fluid dynamics and electromagnetic codes (Nek5000 and NekCEM, respectively) for modeling small modular reactors (SMR) and graphene-based surface materials—related to two DOE Exascale Computing Projects, Center for Efficient Exascale Discretizations (CEED) and ExaSMR
  • Team Stars from the STAR from Brookhaven, Central China Normal University, and Shanghai Institute of Applied Physics: an online cluster-finding algorithm for the energy-deposition clusters measured at Brookhaven’s Solenoidal Tracker at RHIC (STAR) detector, which searches for signatures of the quark-gluon plasma
  • Team The Fastest Trigger of the East from the UK’s Rutherford Appleton Laboratory, Lancaster University, and Queen Mary University of London: software that reads out data in real time from 40,000 photosensors that collect light generated by neutrino particles, discards the useless majority of the data, and sends the useful bits to be written to disk for future analysis; the software will be used in a particle physics experiment in Japan (Hyper-Kamiokande)
  • Team UD-AccSequencer from the University of Delaware: a code for an existing next-generation-sequencing tool for aligning thousands of DNA sequences (BarraCUDA)
  • Team Uduh from the University of Delaware and the University of Houston: a code for molecular dynamics simulations, which scientists use to study the interactions between molecules

“The domain scientists—not necessarily computer science programmers—who come together for five days to migrate their scientific codes to GPUs are very excited to be here,” said Chandrasekaran. “From running into compiler and runtime errors during programming and reaching out to compiler developers for help to participating in daily scrum sessions to provide progress updates, the teams really have a hands-on experience in which they can accomplish a lot in a short amount of time.”

Read the full story at: https://www.bnl.gov/newsroom/news.php?a=212273

Source: BNL

The post Brookhaven Lab Hosts Five-Day GPU Hackathon appeared first on HPCwire.

Argonne’s Theta Supercomputer Goes Online

Wed, 07/05/2017 - 08:33

ARGONNE, Ill., July 5, 2017 — Theta, a new production supercomputer located at the U.S. Department of Energy’s Argonne National Laboratory, is officially open to the research community. The new machine’s massively parallel, many-core architecture continues Argonne’s leadership computing program towards its future Aurora system.

Theta was built onsite at the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science User Facility, where it will operate alongside Mira, an IBM Blue Gene/Q supercomputer. Both machines are fully dedicated to supporting a wide range of scientific and engineering research campaigns. Theta, an Intel-Cray system, entered production on July 1.

The new supercomputer will immediately begin supporting several 2017-2018 DOE Advanced Scientific Computing Research (ASCR) Leadership Computing Challenge (ALCC) projects. The ALCC is a major allocation program that supports scientists from industry, academia, and national laboratories working on advancements in targeted DOE mission areas. Theta will also support projects from the ALCF Data Science Program, ALCF’s discretionary award program, and, eventually, the DOE’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program—the major means by which the scientific community gains access to the DOE’s fastest supercomputers dedicated to open science.

Designed in collaboration with Intel and Cray, Theta is a 9.65-petaflops system based on the second-generation Intel Xeon Phi processor and Cray’s high-performance computing software stack. Capable of nearly 10 quadrillion calculations per second, Theta will enable researchers to break new ground in scientific investigations that range from modeling the inner workings of the brain to developing new materials for renewable energy applications.

“Theta’s unique architectural features represent a new and exciting era in simulation science capabilities,” said ALCF Director of Science Katherine Riley. “These same capabilities will also support data-driven and machine-learning problems, which are increasingly becoming significant drivers of large-scale scientific computing.”

Now that Theta is available as a production resource, researchers can apply for computing time through the facility’s various allocation programs. Although the INCITE and ALCC calls for proposals recently closed, researchers can apply for Director’s Discretionary awards at any time.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.

The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit the Office of Science website.

Source: ANL

The post Argonne’s Theta Supercomputer Goes Online appeared first on HPCwire.

BSC Participates in European Public Procurement of Innovation Solutions

Wed, 07/05/2017 - 08:30

BARCELONA, July 5, 2017 — Barcelona Supercomputing Center (BSC), together with four other leading European HPC centres, is participating for the first time in a market consultation for the purchase of HPC systems, as published today in the Official Journal of the European Union (OJEU). The Public Procurement of Innovative Solutions (PPI) involves four public procurers (BSC, CINECA, FZJ/JSC and GENCI) located in four different countries (Spain, Italy, Germany and France) working together in a joint procurement. This collaboration takes place within the Public Procurement of Innovative Solutions for High-Performance Computing (PPI4HPC) project, funded by the European Commission.

On 6 September 2017, an Open Dialogue Event (ODE) will take place in Brussels to inform all interested suppliers about the procurers’ expectations and plans, as well as to gather feedback from the market. The PPI is an administrative action to foster innovation, geared towards enhancing the development of new innovative markets from the demand side through the instrument of public procurement.

BSC Operations Director and PPI4HPC project lead at BSC, Sergi Girona, points out that “for the first time in Europe, a joint European procurement of innovative HPC systems will organise a meeting with IT companies and purchase the HPC technologies of the future. BSC, as one of the main supercomputing centres in Europe, will be part of this innovative project”.

BSC solution focused on compute and storage infrastructure

BSC will acquire a compute and storage infrastructure for high performance data analytics (HPDA), to be installed during the first half of 2019. The BSC HPDA infrastructure will combine compute nodes with innovative storage technologies such as NVRAM in conjunction with storage technologies already in place at BSC, such as standard hard drives and tape infrastructure, configured in a tiered storage solution to store hundreds of petabytes of scientific data. This solution will allow data to be supplied for BSC’s HPC and HPDA resources in the near future, during the pre-exascale period.

As for the compute element, a high performance data analytics infrastructure will be acquired to complement the HPC production clusters currently at BSC, which will pre-process simulation data with new big data or analytics paradigms and algorithms.

The BSC solution is thus expected to provide innovative technologies for massive data storage and tiering. For further information about the PPI, please visit the PPI4HPC project’s website.


A group of leading European supercomputing centres decided to form a buyers’ group to execute a joint Public Procurement of Innovative Solutions (PPI) for the first time in the area of high-performance computing (HPC). The co-funding by the European Commission (EC) will allow for a significant enhancement of the planned pre-exascale HPC infrastructure from 2019 and pave the way for future joint investments in Europe. The total investment planned will be about €73 million. The HPC centres involved – BSC, CEA/GENCI, CINECA and JUELICH – have a strong track record in providing supercomputing resources at European level.


Source: BSC

The post BSC Participates in European Public Procurement of Innovation Solutions appeared first on HPCwire.

The Virtual Institute – High Productivity Supercomputing Celebrates 10th Anniversary

Wed, 07/05/2017 - 08:27

July 5, 2017 — The perpetual focus on hardware performance as a primary success metric in high-performance computing (HPC) often diverts attention from the role of people in the process of producing application output. But it is ultimately this output and the rate at which it can be delivered, in other words the productivity of HPC, which justifies the huge investments in this technology. However, the time needed to come up with a specific result, or the “time to solution” as it is often called, depends on many factors, including the speed and quality of software development. This is one of the solution steps where people play a major role. Obviously, their productivity can be enhanced with tools such as debuggers and performance profilers, which help them to find and eliminate errors or diagnose and improve performance.

Ten years ago, the Virtual Institute – High Productivity Supercomputing (VI-HPS) was created exactly with this goal in mind. Application developers should be able to focus on the science to accomplish instead of having to spend major portions of their time solving problems related to their software. With initial funding from the Helmholtz Association, the umbrella organization of the major national research laboratories in Germany, the institute was founded on the initiative of Forschungszentrum Jülich together with RWTH Aachen University, Technische Universität Dresden, and the University of Tennessee.

Since then, the members of the institute have developed powerful programming tools, in particular for the purpose of analyzing HPC application correctness and performance, which are today used across the globe. Major emphasis was given to the definition of common interfaces and exchange formats between these tools to improve the interoperability between them and lower their development cost. A series of international tuning workshops and tutorials taught hundreds of application developers how to use them. Finally, the institute organized numerous academic workshops to foster the HPC tools community and offer especially young researchers a forum to present novel program analysis methods. Today, the institute encompasses twelve member organizations from five countries.

On June 23rd, 2017, the institute celebrated its 10th anniversary at a workshop held in Seeheim, Germany. Anshu Dubey from Argonne National Laboratory, one of the keynote speakers, explained that in HPC usually all parts of the software are under research, an important difference from software development in many other areas, leading to an economy of incentives where pure development is often not appropriately rewarded. In his historical review, Felix Wolf from TU Darmstadt, the spokesman of VI-HPS, looked back on important milestones such as the bylaws introduced to cover the rapid expansion of VI-HPS taking place a few years ago. In another keynote, Satoshi Matsuoka from the Tokyo Institute of Technology/AIST, Japan, highlighted the recent advances in artificial intelligence and Big Data analytics as well as the challenges these pose for the design of future HPC systems. Finally, all members of VI-HPS presented their latest productivity-related research and outlined their future strategies.

Source: The Virtual Institute – High Productivity Supercomputing

The post The Virtual Institute – High Productivity Supercomputing Celebrates 10th Anniversary appeared first on HPCwire.

Intersect360 Survey Shows Continued InfiniBand Dominance

Tue, 07/04/2017 - 08:48

There were few surprises in Intersect360 Research’s just released report on interconnect use in HPC. InfiniBand and Ethernet remain the dominant protocols across all segments (system, storage, LAN) and Mellanox and Cisco lead the supplier pack. The big question is when or if Intel’s Omni-Path fabric will break through. Less than one percent of the sites surveyed (system and storage interconnect) reported using Omni-Path.

“Although this share trails well behind Mellanox, Intel will move toward the integration of its interconnect technologies into its processor roadmap with the introduction of Omni-Path. This potentially changes the dynamics of the system fabric business considerably, as Intel may begin to market the network as a feature extension of the Intel processing environment,” according to the Intersect360 Research report.

“For its part, Mellanox has preemptively responded both strategically and technologically, surrounding itself with powerful partners in the OpenPOWER Foundation and coming to market with features such as multi-host technologies, which argue for keeping host-bus adapter technology off-chip.”

Indeed, Mellanox and Intel have waged a war of words over the past two years surrounding the direction of network technology and the merits of off-loading networking instructions from the host CPU and distributing more processing power throughout the network. Of course, these are still early days for Omni-Path. Battling benchmarks aside, InfiniBand remains firmly entrenched at the high end, although 100 Gigabit Ethernet is also gaining traction.

It is important to note the Intersect360 data is from its 2016+ HPC site survey. This most recent and ninth consecutive Intersect360 survey was conducted in the second and third quarter of 2016 and received responses from 240 sites. Combined with entries from the prior two surveys, 487 HPC sites are represented in Site Census reports in 2016. In total, 474 sites reported interconnect and network characteristics for 723 HPC systems, 633 storage systems, and 638 LANs. The next survey should provide stronger directional guidance for Omni-Path.

Among key highlights from the report are:

  • Over 30% of system interconnect and LAN installations reported using 1 Gigabit Ethernet. “We believe that these slower technologies are often used as secondary administrative connections on clusters, and as primary interconnect for small throughput-oriented clusters. In the LAN realm, we see Gigabit Ethernet as still in use for smaller organizations and/or for subnetworks supporting departments/workgroups within larger organizations. Still, the tenacity of this technology surprises us.”
  • About 72% of Gigabit Ethernet was mentioned as a secondary interconnect, not primary. Gigabit Ethernet comes on many systems as a standard cluster interconnect, contributing to its high use in distributed memory systems.
  • InfiniBand continues to be the preferred high-performance system interconnect. “If we exclude Ethernet 1G (dubbed high-performance interconnect), installations of InfiniBand are about two times the combined installations of Ethernet (10G, 40G, and 100G). Within InfiniBand installations, InfiniBand 40G continues to be the most installed. However, for systems acquired since 2014, InfiniBand 56G is the most popular choice for systems.”
  • Ten Gigabit Ethernet is used more for storage and LAN installations than any other protocol. Installations of 10 Gigabit Ethernet account for 35% of all storage networks reported and 35% of all LANs reported. InfiniBand has been gradually increasing its share of storage networks, rising to 34% from 31%, with almost all of this coming from InfiniBand 56G.

Figure 3 provides a visual of the transition to higher speeds for InfiniBand system interconnects for systems reported since our very first Site Census survey in 2008. For systems acquired in 2007, we saw about equal distribution between InfiniBand 10G and 20G. By 2009, more systems reported using InfiniBand 40G. InfiniBand 40G accounted for the majority share of systems until 2014, when InfiniBand 56G took over as the primary installed system interconnect. The transition is fast, with the latest performance leader accounting for the majority of shipments within two to three years of availability.

Two main drivers of the overall market, reports Intersect360, are 1) the growth in data volume and the stress it puts on interconnects and 2) a persistent “if it’s not broke don’t fix it” attitude with regard to switching to new technologies. Ethernet is benefiting from the latter.

Parallelization of code is another major influence. “Architecting interconnects for parallel applications performance has long been a major concern for MPP systems which are built around proprietary interconnects, and supercomputer-class clusters which tend to use the fastest general-purpose network technology. We believe that the trend towards greater application parallelization at all levels will drive requirements for network performance down market into high-end and midrange computing configurations,” according to Intersect360.

The report is best read in full. Here’s a brief excerpt from the report’s conclusion:

“The transition to the latest or faster interconnect appears to be occurring at about the same rate as the life cycle of servers – every two to three years. With each system refresh, the latest or best price/performance interconnect is chosen. Ultimately, though, application needs drive what system performance requirements are needed. The cost of components limit the rate of adoption. Our data suggests most academic and government sites, along with some of the commercial sites, particularly energy, large manufacturing, and bio-science sites, value the performance of InfiniBand for system interconnects. Many of the applications in these areas support and users leverage multi-processing, GPUs, and multi-core architectures.”

Perhaps not surprisingly, Mellanox was the top supplier for system interconnects (42% of mentions) and storage networks (35% of mentions) – in fact, Intersect360 reports Mellanox gained market share in all segments over its 2015 showing. Cisco continues to be the leading supplier for the LAN market, with 46% of the mentions, according to Intersect360.

Link to report summary: http://www.intersect360.com/industry/reports.php?id=149

The post Intersect360 Survey Shows Continued InfiniBand Dominance appeared first on HPCwire.

NEC Accelerates Machine Learning for Vector Computers

Mon, 07/03/2017 - 14:39

TOKYO, July 3 — NEC Corporation today announced that it has developed data processing technology that accelerates the execution of machine learning on vector computers by more than 50 times in comparison to Spark technologies (*1).

This newly developed data processing utilizes computing and communications technologies that leverage “sparse matrix” data structures in order to significantly accelerate the performance of vector computers in machine learning.
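To make the term concrete: a sparse matrix format stores only the nonzero entries of a matrix, so work and memory scale with the number of nonzeros rather than with the full matrix size. The release does not say which format NEC uses. The sketch below assumes the widely used compressed sparse row (CSR) layout, and the function name csr_spmv and its parameters are made up for this illustration.

```c
#include <stddef.h>

/*
 * Sparse matrix-vector product y = A * x with A in CSR form (an assumed
 * format, not necessarily the one NEC uses).
 *   row_ptr : nrows + 1 offsets; row r owns entries row_ptr[r] .. row_ptr[r+1]-1
 *   col_idx : column index of each stored nonzero
 *   val     : value of each stored nonzero
 */
void csr_spmv(size_t nrows, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t r = 0; r < nrows; ++r) {
        double sum = 0.0;
        /* Only the stored nonzeros of row r are visited. */
        for (size_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[r] = sum;
    }
}
```

The inner loop over the nonzeros of a row is the kind of long, regular loop that vector hardware can process efficiently, although the indirect access through col_idx requires gather support.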

Furthermore, NEC developed middleware that incorporates sparse matrix structures in order to simplify the use of machine learning. As a result, users are able to easily launch this middleware from Python or Spark infrastructures, which are commonly used for data analysis, without special programming.

“This technology enables users to quickly benefit from the results of machine learning, including the optimized placement of web advertisements, recommendations, and document analysis,” said Yuichi Nakamura, General Manager, System Platform Research Laboratories, NEC Corporation. “Furthermore, low-cost analysis using a small number of servers enables a wide range of users to take advantage of large-scale data analysis that was formerly only available to large companies.”

NEC’s next-generation vector computer (*2) is being developed to flexibly meet a wide range of price and performance needs. This data processing technology expands the capabilities of next-generation vector computers to include large-scale data analysis, such as machine learning, in addition to numerical calculation, the conventional specialty of vector computers.

NEC will introduce this technology on July 5 at the International Symposium on Parallel and Distributed Computing 2017 (ISPDC-2017) held in Innsbruck, Austria, from Monday, July 3 to Thursday, July 6. For more information on the ISPDC-2017, please visit the following link: http://ispdc2017.dps.uibk.ac.at/


*1) Spark is a distributed processing infrastructure developed by the Apache Software Foundation for open-source software that is used in clusters connecting multiple servers.
*2) NEC begins developing next-generation vector supercomputer

Source: NEC

The post NEC Accelerates Machine Learning for Vector Computers appeared first on HPCwire.

Atmospheric Data Solutions Taps PSSC Labs to Provide HPC Clusters for Weather Modeling

Mon, 07/03/2017 - 14:36

LAKE FOREST, Calif., July 3 — PSSC Labs, a developer of custom high performance computing (HPC) and big data computing solutions, today announced it is working with Atmospheric Data Solutions, LLC (ADS) to provide powerful, turn-key HPC Cluster solutions for its weather modeling solutions.

Atmospheric Data Solutions works with various public and private agencies, including major utility providers, to develop atmospheric science products that help mitigate and manage risk from severe weather and future climate change. The weather modeling solutions that ADS creates include high impact weather forecast guidance products, tailored regional wildfire forecast guidance products, and utility load and outage forecasts – all requiring analysis of a large quantity of data that demands high performance computing to maximize accuracy and maximize the number of times models can be run daily.

PSSC Labs will work with ADS to provide powerful and customized supercomputing solutions for their weather modeling products, maximizing performance while staying within the budgetary constraints of each organization utilizing the end product. In addition to deploying PSSC Labs’ PowerWulf Clusters, ADS works with PSSC Labs to ensure the installation of custom modeling software on all HPC solutions, providing a truly turn-key solution that is delivered ready to use.

The PowerWulf Cluster consists of 768 Intel Xeon Processor Cores, 4 Nvidia Tesla GPU Adapters, 2.1 TB System Memory, and 40TB+ Storage, all connected via Mellanox InfiniBand Interconnects, with additional configurations available. The PowerWulf Cluster includes PSSC Labs’ CBeST Cluster Management Toolkit to simplify management, monitoring and maintenance. PSSC Labs will continue to support the HPC Cluster by providing operating system upgrades and continued system maintenance.

“PSSC Labs was accommodating every step of the way, whether it was finding the best hardware configuration within our client’s budget or allowing our own engineers on site to work on the clusters before delivery,” said Scott Capps, Principal and Founder of ADS. “The result is that our clients can now run models four times a day, as opposed to only twice a day with previous HPC set ups, with the results from the models delivered faster as well.”

PSSC Labs’ PowerWulf HPC Clusters offer a reliable, flexible, high performance computing platform for a variety of applications in the following verticals: Design & Engineering, Life Sciences, Physical Science, Financial Services and Machine/Deep Learning.

Every PowerWulf HPC Cluster includes a three-year unlimited phone / email support package (additional year support available) with all support provided by their US based team of experienced engineers. Prices for a custom built PowerWulf HPC Cluster solution start at $20,000.  For more information see http://www.pssclabs.com/solutions/hpc-cluster/

About PSSC Labs

For technology powered visionaries with a passion for challenging the status quo, PSSC Labs is the answer for hand-crafted HPC and Big Data computing solutions that deliver relentless performance with the absolute lowest total cost of ownership. All products are designed and built at the company’s headquarters in Lake Forest, California. For more information: 949-380-7288, www.pssclabs.com, sales@pssclabs.com.

Source: PSSC Labs

The post Atmospheric Data Solutions Taps PSSC Labs to Provide HPC Clusters for Weather Modeling appeared first on HPCwire.

‘Qudits’ Join the Strange Zoo of Quantum Computing

Mon, 07/03/2017 - 12:57

By now the sheer repetition of the term qubit has made it seem comprehensible and quantum computing not so strange. Brace yourself. Here comes the ‘qudit’ – another form of quantum information but one that is able to assume very many values at once.

“Instead of creating quantum computers based on qubits that can each adopt only two possible options, scientists have now developed a microchip that can generate “qudits” that can each assume 10 or more states, potentially opening up a new way to creating incredibly powerful quantum computers, a new study finds,” writes Charles Choi for the IEEE Spectrum.

Choi’s article, ‘Qudits: The Real Future of Quantum Computing?’ was posted last Friday and briefly examines work published at the same time in Nature, ‘On-chip generation of high-dimensional entangled quantum states and their coherent control,’ suggesting a way to create these multi-dimensional qudits.

Here’s a brief excerpt from the IEEE Spectrum article:

“Now scientists have for the first time created a microchip that can generate two entangled qudits each with 10 states, for 100 dimensions total, more than what six entangled qubits could generate. “We have now achieved the compact and easy generation of high-dimensional quantum states,” says study co-lead author Michael Kues, a quantum optics researcher at Canada’s National Institute of Scientific Research, or INRS, its French acronym, in Varennes, Quebec.

“The researchers developed a photonic chip fabricated using techniques similar to ones used for integrated circuits. A laser fires pulses of light into a micro-ring resonator, a 270-micrometer-diameter circle etched onto silica glass, which in turn emits entangled pairs of photons. Each photon is in a superposition of 10 possible wavelengths or colors.

“For example, a high-dimensional photon can be red and yellow and green and blue, although the photons used here were in the infrared wavelength range,” Kues says. Specifically, one photon from each pair spanned wavelengths from 1534 to 1550 nanometers, while the other spanned from 1550 to 1566 nanometers.”

So just when your head stopped spinning at the sound of the word qubit, along comes the qudit. In fairness, the IEEE article points out scientists have long known about the possibility of using qudits and notes, “A quantum computer with 300 qubits could perform more calculations in an instant than there are atoms in the known universe, solving certain problems much faster than classical computers. In principle, a quantum computer with two 32-state qudits would be able to perform as many operations as 10 qubits while skipping the challenges inherent with working with 10 qubits together.”
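The dimension counts quoted above follow from simple arithmetic: n systems with d levels each span a joint state space of d^n dimensions. A minimal sketch in C, where state_space_dim is a name made up for this illustration:

```c
#include <stdint.h>

/* Dimension of the joint state space of n quantum systems with d levels each. */
uint64_t state_space_dim(uint64_t d, unsigned n)
{
    uint64_t dim = 1;
    for (unsigned i = 0; i < n; ++i)
        dim *= d;   /* each added system multiplies the dimension by d */
    return dim;
}
```

Under this counting, two 10-state qudits give state_space_dim(10, 2) = 100, which exceeds the 64 dimensions of six qubits, and two 32-state qudits give 1024, the same as 10 qubits.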

The feature image is of the microchip fabricated by the researchers. Below is a diagram (Nature) of the work.

Researchers used the setup pictured above to create, manipulate, and detect qudits. The experiment starts when a laser fires pulses of light into a micro-ring resonator, which in turn emits entangled pairs of photons. Because the ring has multiple resonances, the photons have optical spectrums with a set of evenly spaced frequencies (red and blue peaks), a process known as spontaneous four-wave mixing (SFWM). The researchers were able to use each of the frequencies to encode information, which means the photons act as qudits. Each qudit is in a superposition of 10 possible states, extending the usual binary alphabet (0 and 1) of quantum bits. The researchers also showed they could perform basic gate operations on the qudits using optical filters and modulators, and then detect the results using single-photon counters.

Link to IEEE Spectrum article: http://spectrum.ieee.org/tech-talk/computing/hardware/qudits-the-real-future-of-quantum-computing

Link to Nature paper: https://www.nature.com/articles/nature22986.epdf?referrer_access_token=m2Cde8lf2Zh2R9vqdRitfdRgN0jAjWel9jnR3ZoTv0PJityhJkSWpq1THf-VSsArUhH5B2sAknySsan793cm3_eBBo9MOlyHeYxjGaqZnurhzcH7meLV3MMg5Q5-D4vlMlU-NCaRIE4XBnNREmU0z1WU8YYGcro3-m56ZnOv-djeJfdioz8743j4LAE5I8vkMm6oc8W8_hmdFSbxIjbVWNw4YvBWh0_Ct8hYflCuOY38KpBEFFTmoncxMDjN8a7vpt_r52ScoN43wj4CEhpr7A%3D%3D&tracking_referrer=spectrum.ieee.org

The post ‘Qudits’ Join the Strange Zoo of Quantum Computing appeared first on HPCwire.

Optimizing Codes for Heterogeneous HPC Clusters Using OpenACC

Mon, 07/03/2017 - 07:00

Looking at the Top500 and Green500 rankings, it is clear that most HPC systems are heterogeneous architectures using COTS (Commercial Off-The-Shelf) hardware, combining traditional multi-core CPUs with massively parallel accelerators, such as GPUs and MICs.

With processor frequencies now hitting a solid wall, the only truly open avenue for riding Moore’s law today is increasing hardware parallelism in several different ways: more computing nodes, more processors in each node, more cores within each processor, and longer vector instructions in each core. This trend means that applications must learn to use all these levels of hardware parallelism efficiently if we want to see performance measured at the application level growing consistently with hardware performance. Adding to this complexity, single computing nodes adopt different architectures, with multi-core CPUs supporting different instruction sets, vector lengths and cache organizations. GPUs provided by different vendors also have different architectures in terms of number of cores, cache organization, etc. For code developers the current goal is to map all the parallelism available at application level onto all hardware resources using architecture-oblivious approaches that target portability of both code and performance across different architectures.

Several programming languages and frameworks try to tackle the different levels of parallelism available in hardware systems, but most of them are not portable across different architectures. As an example, GPUs are largely used for scientific HPC applications because a reasonable compromise of easy programmability and performance has been made possible by ad-hoc proprietary languages (e.g., CUDA for Nvidia GPUs), but these languages are by definition not portable to different accelerators. Several open-standard languages have tried to address this problem (e.g., OpenCL), targeting in principle multiple architectures, but the lack of support from various vendors has limited their usefulness.

The need to exploit the computing power of these systems in conjunction with the lack of standardization in their hardware and/or programming frameworks raised new issues for software development strongly impacting software maintainability, portability and performance. The use of proprietary languages targeting specific architectures, or open-standard languages not embraced by all vendors, often led to multiple implementations of the same code to target different architectures. For this reason there are several implementations for various scientific codes, e.g., MPI plus OpenMP and C/C++ to target CPU based clusters; MPI plus CUDA to target Nvidia GPU based clusters; or MPI plus OpenCL for AMD GPU based clusters.

The developers who pursued this strategy soon realized that maintaining multiple versions of the same code is very expensive. This is even worse for scientific software development, since it is often characterized by frequent code modifications, by the need for strong performance optimization, and also by a long software lifetime, which may span tens of years. Ideally, a programming language for scientific HPC applications should be portable across most current architectures, allow applications to run efficiently, and also make it possible to run on future architectures without requiring a complete code rewrite.

Directive-based programming models try to address exactly this problem, abstracting parallel programming to a descriptive level, where programmers help the compiler to identify parallelism in the code, as opposed to a prescriptive level, where programmers must specify how the code should be mapped onto the hardware of the target machine.

OpenMP (Open Multi-Processing) is probably the most common of such programming models, already used by a wide scientific community, but initially it was not designed to support accelerators. To fill this gap, in November 2011, a new standard named OpenACC (Open Accelerators) was proposed by Cray, PGI, Nvidia, and CAPS. OpenACC is a programming standard for parallel computing that allows programmers to annotate C, C++ or Fortran code to indicate to the compiler parallelizable regions to be offloaded to a generic accelerator.
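As a minimal sketch of what such an annotation looks like (an illustrative loop, not taken from the Lattice QCD code discussed in this article; the function name saxpy_acc is made up here), a single OpenACC directive marks a loop as parallelizable and describes which arrays must be moved to and from the accelerator. A compiler that does not have OpenACC enabled simply ignores the pragma and builds ordinary serial C.

```c
/* y[i] = a * x[i] + y[i] for i in [0, n).
 * The directive is a hint: without OpenACC support it is ignored
 * and the loop runs serially on the host. */
void saxpy_acc(int n, float a, const float *restrict x, float *restrict y)
{
    /* copyin: x is only read on the device; copy: y is read and written back. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

OpenACC processing is typically switched on with a compiler flag, for example -acc for the PGI compilers or -fopenacc for GCC; the exact options and supported targets depend on the compiler version.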

Both OpenMP and OpenACC are based on directives: OpenMP was introduced to manage parallelism on traditional multi-core CPUs, while OpenACC was initially developed to provide the accelerator support missing from OpenMP. Today these two frameworks are converging and extending their scope to cover a large subset of HPC architectures: OpenMP version 4.0 has been designed to also support code offloading to accelerators, while compilers supporting OpenACC (such as PGI or GCC) are starting to use the same directives to also target multi-core CPUs.

“First as a member of the Cray technical staff and now as a member of the Nvidia technical staff, I am working to ensure that OpenMP and OpenACC move towards parity whenever possible,” said James Beyer, Co-chair OpenMP accelerator sub-committee and OpenACC technical committee.

Back in 2014 our research group at the University of Ferrara, in collaboration with the Theoretical Physics group of the University of Pisa, started the development of a Lattice QCD Monte Carlo application, aiming to make it portable across different heterogeneous HPC systems. From the computational point of view, this kind of simulation mainly executes stencil operations performing complex vector-matrix multiplications on a 4-dimensional lattice.

At the time we were using two different versions developed within the Pisa group: a C++ implementation targeting CPU based clusters and a C++/CUDA implementation targeting Nvidia GPU based clusters. Maintaining the two different versions was particularly expensive, so the availability of a language such as OpenACC offered the interesting possibility to move towards a single portable implementation. The main interest was towards GPU based clusters, but we also aimed to target other architectures like the Intel Knights Landing (KNL, not available yet at the time).

We started this project coming from an earlier experience of porting a similar application to OpenCL, which, although an open standard, later ceased to be supported on Nvidia GPUs, forcing us to completely rewrite the application. From this point of view a directive-based OpenACC code provides an additional safeguard: when the directives are ignored, it is still perfectly working plain C, C++ or Fortran code, which can be “easily” re-annotated using other directives and run on other architectures.

Although decorating code with directives seems a straightforward operation requiring minimal programming effort, this is often not enough if performance portability is required in addition to just code portability.

Just to mention one issue, memory data layout has a strong impact on performance on different architectures, and this design step is critical when implementing new codes, as changing the data layout at a later stage is seldom a viable option. The two C++ and CUDA versions we were starting from diverged exactly in the data layout used to store the lattice: we had an AoS (Array of Structures) layout for the CPU-optimized version and an SoA (Structure of Arrays) layout for GPUs.

We started by porting the most computationally intensive kernel of the full code, the so-called Dirac Operator, to plain C, annotating it with OpenACC directives, and developed a first benchmark. This benchmark was used to evaluate possible performance drawbacks associated with an architecture-agnostic implementation. It provided very useful information on the performance impact of different data layouts; we were happy to learn that the Structure of Arrays (SoA) memory data layout is preferred when using GPUs, but also when using modern CPUs, if vectorization is enforced. This stems from the fact that the SoA format allows vector units to process many sites of the application domain (the lattice, in our case) in parallel, favoring architectures with long vector units (e.g. with wide SIMD instructions). Modern CPUs tend to have longer and longer vector units and we expect this trend to continue in the future. For this reason, data structures related to the lattice in our code were designed to follow the SoA paradigm.
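The difference between the two layouts can be sketched on a toy example. The structures below are hypothetical (three real components per site and a fixed NSITES, far simpler than the actual lattice data structures, which the article does not show), but they illustrate why SoA suits vector units: consecutive sites of the same component sit next to each other in memory.

```c
#include <stddef.h>

#define NSITES 8   /* hypothetical tiny lattice volume, for illustration only */

/* AoS: all components of one site are adjacent; the same component of
 * neighbouring sites is separated by a stride of three doubles. */
typedef struct { double c0, c1, c2; } site_aos;

/* SoA: each component is stored contiguously across all sites, so a
 * vector load can fetch that component for several sites at once. */
typedef struct { double c0[NSITES], c1[NSITES], c2[NSITES]; } field_soa;

/* Squared norm of the field, AoS version (strided access per component). */
double norm2_aos(const site_aos *f, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += f[i].c0 * f[i].c0 + f[i].c1 * f[i].c1 + f[i].c2 * f[i].c2;
    return s;
}

/* Squared norm of the field, SoA version (unit-stride access per component). */
double norm2_soa(const field_soa *f, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += f->c0[i] * f->c0[i] + f->c1[i] * f->c1[i] + f->c2[i] * f->c2[i];
    return s;
}
```

Both functions compute the same result; only the memory access pattern differs, which is what decides how well the loop maps onto wide SIMD units or GPU threads.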

Since at that time no OpenACC compiler for CPU was able to use vector instructions, we replaced OpenACC directives with OpenMP ones and compiled the code using the Intel Compiler. Table 1 shows the results of this benchmark.

After this initial benchmark, further development iterations led to a full implementation of the complete Monte Carlo code annotated with OpenACC directives and portable across several architectures. To give an idea of the level of performance portability, we report in Table 2 the execution times of the Dirac operator, compiled by the PGI 16.10 compiler (which now also targets multi-core CPUs) on a variety of architectures: Haswell and Broadwell Intel CPUs, the W9100 AMD GPU and Kepler and Pascal Nvidia GPUs.

Concerning code portability, we have shown that the same user-grade code implementation runs on an interesting variety of state-of-the-art architectures. When we focus on performance portability, some issues are still present. The Dirac operator is strongly memory-bound, so both Intel CPUs should be roughly three times slower than Kepler GPUs, corresponding to their respective memory bandwidths (about 70GB/s vs. 240GB/s); what we measure is that performance is approximately 10 times worse on the Haswell CPU than on one K80 GPU. The Broadwell CPU runs approximately two times faster than the Haswell CPU, at least for some lattice sizes, but still does not reach the memory-bandwidth limit. We have identified two main reasons for this non-optimal behavior, and both of them point to some still immature features of the PGI compiler when targeting x86 architectures:

  • Parallelization: when encountering nested loops, the compiler splits the outer loop across different threads, while inner loops are executed serially or vectorized within each thread. Thus, in this implementation, the four nested loops over the four lattice dimensions cannot be efficiently divided into a sufficiently large number of threads to exploit all the available cores of modern CPUs.
  • Vectorization: as reported by the compilation logs, the compiler fails to vectorize the Dirac operator. To verify whether this is related to how we have coded these functions, we have translated the OpenACC directives into the corresponding OpenMP ones, without changing the C code, and compiled using the Intel compiler (version 17.0.1). In this case the compiler succeeds in vectorizing the function, which then runs a factor of 2 faster.
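Both points can be illustrated on a toy four-dimensional loop nest (a hypothetical site-wise scaling, far simpler than the Dirac operator; the function names are made up for this sketch). In the first function an OpenACC collapse(4) clause asks the compiler to treat the four loops as a single iteration space instead of parallelizing only the outer one; whether a given compiler exploits this efficiently on CPUs is not something the article reports. The second function keeps the same C body and swaps in OpenMP directives, mirroring the kind of translation described above.

```c
#include <stddef.h>

/* out[site] = a * in[site] on an nt x nz x ny x nx lattice, OpenACC version. */
void scale_lattice_acc(int nt, int nz, int ny, int nx,
                       double a, const double *in, double *out)
{
    /* collapse(4) exposes nt*nz*ny*nx independent iterations. */
    #pragma acc parallel loop collapse(4) \
        copyin(in[0:nt*nz*ny*nx]) copyout(out[0:nt*nz*ny*nx])
    for (int t = 0; t < nt; ++t)
        for (int z = 0; z < nz; ++z)
            for (int y = 0; y < ny; ++y)
                for (int x = 0; x < nx; ++x) {
                    size_t i = (((size_t)t * nz + z) * ny + y) * nx + x;
                    out[i] = a * in[i];
                }
}

/* Same loop body with OpenMP directives: threads share the three outer
 * loops, and the innermost unit-stride loop is marked for vectorization. */
void scale_lattice_omp(int nt, int nz, int ny, int nx,
                       double a, const double *in, double *out)
{
    #pragma omp parallel for collapse(3)
    for (int t = 0; t < nt; ++t)
        for (int z = 0; z < nz; ++z)
            for (int y = 0; y < ny; ++y) {
                #pragma omp simd
                for (int x = 0; x < nx; ++x) {
                    size_t i = (((size_t)t * nz + z) * ny + y) * nx + x;
                    out[i] = a * in[i];
                }
            }
}
```

Compiled without OpenACC or OpenMP support, both functions reduce to the same serial C loop nest, which is the property that makes moving between the two directive sets comparatively cheap.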

For AMD GPUs, too, performance is worse than expected and the compiler is not yet sufficiently stable (we had erratic compiler crashes). To make things even worse, we found that support for this architecture has been dropped by the PGI compiler (16.10 is the last version supporting AMD devices), and thus if no other compilers appear on the market, running OpenACC applications on AMD GPUs will not be easy in the future.

On the other hand, for Nvidia GPUs, performance results are similar to the ones obtainable by our previous CUDA implementation, showing a maximum performance drop of 25 percent for the full simulation code, only in some particular simulation conditions.

In conclusion, a portable implementation of a full Monte Carlo LQCD simulation is now in production on CPU and GPU clusters. The code runs efficiently on Nvidia GPUs, while performance on Intel CPUs could still be improved. We are confident that future releases of the PGI compiler will be able to fill the gap. Finally, we are also able to run on AMD GPUs, but for this architecture compiler support is an open issue with little hope for the future. In the near future we look forward to testing our code on the Intel KNL, as soon as reasonably stable official PGI support for that processor becomes available. As a final remark, we have shown that translating OpenACC codes to OpenMP and vice versa is a reasonably easy task, so, whichever is the winner, we see a nice future for our application.


Claudio Bonati, INFN and University of Pisa
Simone Coscetti, INFN Pisa
Massimo D’Elia, INFN and University of Pisa
Michele Mesiti, INFN and University of Pisa
Francesco Negro, INFN Pisa
Enrico Calore, INFN and University of Ferrara
Sebastiano Fabio Schifano, INFN and University of Ferrara
Giorgio Silvi, INFN and University of Ferrara
Raffaele Tripiccione, INFN and University of Ferrara

The post Optimizing Codes for Heterogeneous HPC Clusters Using OpenACC appeared first on HPCwire.