HPC Wire

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

Exascale FY18 Budget – The Senate Provides Their Input

Thu, 07/27/2017 - 09:39

In the federal budgeting world, “regular order” is a meaningful term that is fondly remembered by members of both Congress and the Executive Branch. Regular order is the established process whereby an Administration submits a budget request in February. Then, in about the April timeframe, the House and Senate Appropriations Subcommittees, and then the full Committees, produce their own budgets. After that, both the full House and full Senate pass their respective versions, which leads to a conference committee that reconciles the differences. The compromise appropriations bill is then passed by both Houses of Congress and sent to the President to be signed into law. All of this is supposed to happen by October 1st, the start of the new federal fiscal year.

The reason that regular order is fondly remembered is that it has been many years since the process was carried out as designed. Generally, the first two steps occur, but the rest of the process falls by the wayside. The first step, submission of the President’s request to Congress, occurred for FY18 on May 23rd. For the Department of Energy, the House completed the first part of the second step when it passed its appropriations bill on July 12th, and the Senate Appropriations Committee completed the last part of the second step on July 20th. Unfortunately, this is probably the end of the “regular order” process, but more on that later.

Just like the President’s request and the House mark-up, the news from the Senate for the U.S. exascale efforts is quite encouraging. The most straightforward news concerns the National Nuclear Security Administration (NNSA) exascale activities. The Senate provided $161 million to the Advanced Simulation and Computing (ASC) program and $22 million in infrastructure, for a total of $193 million to NNSA for exascale. These numbers are the same as both the President’s request and the House mark-up.

The situation for the Office of Science (SC) Advanced Scientific Computing Research (ASCR) program continues to be complicated. The Senate provided $184 million for the ASCR Exascale Computing Project (ECP). This is a bit lower than the President’s request ($197 million) but higher than the House Appropriations Committee’s mark-up ($170 million). The Senate provided a big boost for both the Oak Ridge and Argonne Leadership Computing Facilities. The appropriation for Oak Ridge was $150 million and for Argonne was $100 million, for a total of $250 million. This is significantly higher than the $150 million requested by the President and provided by the House. Finally, the Senate mark-up did not repeat the mysterious language that appeared in the request about a possible advanced-architecture exascale system at Argonne.

Bottom line — based on the publicly available information, it looks like the U.S. is on track to invest somewhere between $503 million (House mark-up) and $617 million (Senate mark-up) in exascale activities in FY18, with the President’s request at $509 million. That is great news for the U.S. and its exascale programs. Having such positive numbers from the President, House, and Senate leaves the NNSA and SC programs in a very good position to actually see that level of funding in FY18.

Unfortunately, for FY18, having such relatively uniform numbers among the President’s, House’s, and Senate’s budgets is the exception rather than the rule. Most of the rest of the Senate Appropriations Committee mark-up of the DOE budget is in wild contrast to the House mark-up and the President’s request. A great example is the funding for the Advanced Research Projects Agency-Energy (ARPA-E). In his request, the President proposed shutting down ARPA-E with zero funding for FY18. The House agreed. However, the Senate Appropriations Committee gave ARPA-E $330 million with explicit directions to keep the agency going. There are plenty of other examples of huge divergence among the President’s, House’s, and Senate’s views of the FY18 budget. For the DOE Office of Energy Efficiency and Renewable Energy, the budgets are almost a billion dollars apart.

So — what does this mean for the DOE FY18 federal budget? First, it is extremely likely that “regular order” is out the window. The process requires a conference committee to reach compromises between the House and Senate numbers, and when the numbers are so far apart, it is difficult to see how that will be possible. That leads to the question: how can the budget move forward? Over the last few years, Congress and the Administration have used two mechanisms to bypass the regular budget process. The first is known as a Continuing Resolution (CR). Basically, a CR is Congress’ way of admitting it cannot pass a budget; it allows the Executive Branch to continue to spend at the previous year’s levels. CRs are challenging for Executive Branch agencies because they are not allowed to start new programs. Also, generally, out of an abundance of caution, agencies will only spend at the lowest of the three marks (President’s request, House, or Senate). This is particularly difficult when the lowest of the marks is zero.

After a CR, the second mechanism is known as an omnibus appropriation. This type of appropriation is a very large “catch-all” bill drafted by the House and Senate. Usually, omnibus bills appear after one or two CRs, when it becomes clear that it will be impossible to pass a budget. Omnibus appropriations are used on an emergency basis, under the threat of a government shutdown, and the details usually get little or no debate. However, given the challenges of the FY18 budget, there is already some discussion of skipping straight to a series of mini-omnibuses (aka minibuses).

For exascale, whether it is regular order, a CR, or an omnibus, things are looking very good. Maintaining and extending U.S. leadership for exascale computing is clearly an issue that all parties agree deserves robust funding. Prospects for FY18 look very good. Of course, the DOE challenge then turns from getting funding to using it well.

About the Author

Alex Larzelere is a senior fellow at the U.S. Council on Competitiveness, the president of Larzelere & Associates Consulting and HPCwire’s policy editor. He is currently a technologist, speaker and author on a number of disruptive technologies that include: advanced modeling and simulation; high performance computing; artificial intelligence; the Internet of Things; and additive manufacturing. Alex’s career has included time in federal service (working closely with DOE national labs), private industry, and as founder of a small business. Throughout that time, he led programs that implemented the use of cutting edge advanced computing technologies to enable high resolution, multi-physics simulations of complex physical systems. Alex is the author of “Delivering Insight: The History of the Accelerated Strategic Computing Initiative (ASCI).”

The post Exascale FY18 Budget – The Senate Provides Their Input appeared first on HPCwire.

Cray to Expand Storage Portfolio Through Strategic Transaction and Partnership with Seagate

Wed, 07/26/2017 - 17:26

SEATTLE, July 26, 2017 — Global supercomputer leader Cray Inc. today announced it has signed a definitive agreement with Seagate to complete a strategic transaction and enter into a partnership centered around the ClusterStor high-performance storage business. As part of the partnership, Seagate and Cray will collaborate on future ClusterStor products. Cray will continue to support and enhance the ClusterStor product line and to support customers going forward. Cray expects to add more than 100 Seagate employees and contractors and to assume certain customer support obligations associated with the ClusterStor product line.

“This partnership will be an exciting next step for Cray as we broaden our portfolio and expand our ability to create a leadership position in high performance storage products,” said Peter Ungaro, president and CEO of Cray. “Building upon our long-term strategy and the amazing growth of data, storage is becoming more important in our market. With the push to exascale computing and the explosive growth in artificial intelligence, deep learning and analytics, the ability to integrate compute and storage into supercomputing systems is more critical than ever.”

“Adding Seagate’s ClusterStor product line to our DataWarp and Sonexion storage products will enable us to provide a more complete solution to customers,” continued Peter Ungaro. “Current ClusterStor customers and partners can be assured that we will continue to advance and support the ClusterStor products. In addition, I look forward to welcoming our new Cray employees along with the ClusterStor partners and reseller channel – strengthening our strategic positioning for growth into the future.”

“In 2012 Cray became our first OEM and has continued over the years to be our largest and most strategic ClusterStor partner. Today’s announcement is really the perfect evolution of that continuing, special partnership in HPC,” said Ken Claffey, vice president and general manager, Storage Systems Group at Seagate. “As the leader in the supercomputing market, Cray will be a great home for the ClusterStor, employees, customers and partners.”

Highlights of the transaction and partnership include:

  • Seagate’s existing ClusterStor Lustre, Secure Data Appliance and Spectrum Scale customers will be able to receive service and support from Cray.
  • Cray expects to add more than 100 employees, primarily in R&D, customer service and channel and reseller support.
  • Cray and Seagate will collaborate to incorporate the latest Seagate technology into future ClusterStor and Cray Sonexion products.
  • Cray will receive certain assistance from Seagate to provide support on existing service contracts.
  • The transaction closing is expected to be completed late in the third quarter of 2017.
  • Cray expects the net impact of the transaction to be roughly breakeven for 2018.
  • Cray will provide further detail on its second quarter 2017 financial results conference call on July 27, 2017.  Instructions for joining the conference call can be found below.

Conference Call Information

Cray will host a conference call on Thursday, July 27, 2017, at 1:30 p.m. PDT (4:30 p.m. EDT) to discuss its financial results for the second quarter ended June 30, 2017. To access the call, please dial into the conference at least 10 minutes prior to the beginning of the call at (855) 894-4205. International callers should dial (765) 889-6838 and use the conference ID #56308197. To listen to the audio webcast, go to the Investors section of the Cray website at www.cray.com/company/investors.

About Cray Inc.

Global supercomputing leader Cray Inc. (Nasdaq:CRAY) provides innovative systems and solutions enabling scientists and engineers in industry, academia and government to meet existing and future simulation and analytics challenges. Leveraging more than 40 years of experience in developing and servicing the world’s most advanced supercomputers, Cray offers a comprehensive portfolio of supercomputers and big data storage and analytics solutions delivering unrivaled performance, efficiency and scalability. Cray’s Adaptive Supercomputing vision is focused on delivering innovative next-generation products that integrate diverse processing technologies into a unified architecture, allowing customers to meet the market’s continued demand for realized performance. Go to www.cray.com for more information.

Source: Cray Inc.


India Plots Three-Phase Indigenous Supercomputing Strategy

Wed, 07/26/2017 - 17:13

Additional details on India’s plans to stand up an indigenous supercomputer came to light earlier this week. As reported in the Indian press, the Rs 4,500-crore (~$675 million) supercomputing project, approved by the Indian government in March 2015, is preparing to install six machines, ranging from 500 teraflops to 2 petaflops, by year end. Three of these will be completely foreign-sourced and three will begin incorporating Indian design elements in preparation for a fully made-in-India supercomputer.

Under the leadership of prime minister Narendra Modi and within the auspices of the “Make in India” initiative, at least fifty new supercomputers will be built over three phases of a seven-year program. This is all part of India’s National Supercomputing Mission (NSM) to create a grid of supercomputers connecting academic and research institutions across the country. Rajat Moona, director-general of the Centre for Development of Advanced Computing (C-DAC), has said that at least 50 percent of the supercomputers will be Indian-made.

According to Milind Kulkarni, a senior scientist with the ministry of science and technology cited by Hindustan Times, six supercomputers will be built in the first phase with three of these to be foreign built. Moving toward India’s goal of indigenous supercomputing, the remaining three will be manufactured abroad, but assembled in India with C-DAC handling the overall system design.

The six supercomputers are destined for four technology institutes (Banaras Hindu University, Kanpur, Kharagpur and Hyderabad) as well as the Indian Institute of Science Education and Research in Pune and the Indian Institute of Science in Bengaluru. Two of the systems are on track for a peak computing capacity of two petaflops with the remainder being smaller machines of approximately 500 teraflops.

During the first two phases of the National Supercomputing Mission, India will concentrate its efforts on designing and manufacturing subsystems such as high-speed switches and compute nodes. The goal of the third phase will be to deploy a machine that is completely Indian-made or very close to it.

India’s no stranger to homegrown supercomputing. In 1991, C-DAC stood up India’s first indigenous supercomputer, the PARAM 8000. Much like China’s recent indigenous HPC surge, the PARAM program for Indian-made supercomputers was created in the face of a technology embargo enacted by the United States. The first PARAM system was benchmarked at 5 Gflops, making it one of the fastest systems in the world at the time.

India’s supercomputing prowess has waxed and waned in the following decades. The nation achieved its highest Top500 ranking (#4) with “Eka” in 2007 and reached its highest system share in 2013 with twelve machines. But as of the latest list installment (June 2017), this count had fallen to four. The most performant of these, at #165, is the 901-teraflops Cray XC40 installed at the Supercomputing Education and Research Center (SERC) at the Indian Institute of Science (IISc) in Bangalore.


NIH Awards $9.3M for Further Development of PHENIX Structural Biology Software

Wed, 07/26/2017 - 14:43

July 26, 2017 — The National Institutes of Health (NIH) has awarded $9.3 million to the Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) to support ongoing development of PHENIX, a software suite for solving three-dimensional macromolecular structures.

Pavel Afonine in Berkeley Lab’s Molecular Biophysics and Integrated Bioimaging Division created a PHENIX tool to calculate difference maps in real space from cryo-EM data. This figure shows such a map for one adenosine triphosphate molecule in a complex called the 26S proteasome. (Credit: Pavel Afonine/Berkeley Lab)

PHENIX, which stands for Python-based Hierarchical ENvironment for Integrated Xtallography, uses modern programming concepts and advanced algorithms to automate the analysis of structural biology data. The grant is awarded through the National Institute of General Medical Sciences at NIH.

Officially launched in 2000 with NIH funding, the project is a collaboration among researchers based at Berkeley Lab, Los Alamos National Laboratory, Cambridge University, and Duke University.

“The impetus behind PHENIX is a desire to make the computational aspects of crystallography more automated, reducing human error and speeding solutions,” said PHENIX principal investigator Paul Adams, director of Berkeley Lab’s Molecular Biophysics and Integrated Bioimaging Division.

Knowing the precise location of each atom in a molecule in three dimensions is critical to understanding how the molecule functions and interacts with other molecules. The shapes of the proteins in a cell, for example, reveal a lot about the work they do—whether it’s providing structural support or motility, forming selectively permeable membrane channels, or powering the cell’s metabolism. DNA sequence alone still doesn’t give enough information about how a molecule folds, or how it changes conformation.

The software is openly and freely available for academic users. Over the years, the international community of crystallographers has contributed to PHENIX’s development through the open source Computational Crystallography Toolbox.

In the past 15 years, Adams noted, “the level of automation and computation algorithms that we’ve developed has really changed the way people do a lot of structural biology work.”

Recent improvements to PHENIX in the area of experimental phasing have enabled researchers to make use of noisier, lower resolution data that previously would have been discarded.

One major development in the field of structural biology over the past five years has been the ascendance of cryo-electron microscopy (cryo-EM), where a molecule is rapidly cooled and inserted into an electron microscope. The technology owes its current popularity to advances in electron detector technology, including a direct electron detector developed by Peter Denes and colleagues at the National Center for Electron Microscopy at Berkeley Lab’s Molecular Foundry, a DOE Office of Science User Facility.

“Even before this revolution, as they’re calling it, took place we thought that cryo-EM would be an important area for us to get involved in,” Adams said.

He added that half of the work proposed under the new grant will be developing new methods for building, refining, and validating models in cryo-EM. The other half, he said, will be continuing the themes of extending crystallographic methods to work at lower resolution and with weaker data, and making it possible to solve molecular structures when there’s no similar model in the database.

“This new cryo-EM approach can produce pictures that are nearly as clear as those from X-ray diffraction, but often with less work,” said Tom Terwilliger, whose group in the Bioscience Division at Los Alamos National Laboratory is a partner in the collaborative PHENIX project. “Our new tools will make it easier for researchers to interpret their pictures from cryo-EM, hopefully making this new technique as routine as X-ray crystallography is today.”

Adams noted that this kind of multi-institutional, multi-disciplinary, technology development is something at which the national labs excel.

“We are very grateful to NIH for recognizing the value of the project and graciously agreeing to continue funding it,” he said. “Their commitment has been long-term; it’s been very impressive and has had a big impact on the research community.”

Other Berkeley Lab Biosciences MBIB collaborators on PHENIX include: Pavel Afonine, Dorothee Liebschner, Nigel Moriarty, Billy Poon, and Oleg Sobolev.

About Berkeley Lab

Lawrence Berkeley National Laboratory addresses the world’s most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab’s scientific expertise has been recognized with 13 Nobel Prizes. The University of California manages Berkeley Lab for the U.S. Department of Energy’s Office of Science. For more, visit www.lbl.gov.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

Source: Berkeley Lab


Tuning InfiniBand Interconnects Using Congestion Control

Wed, 07/26/2017 - 14:42

InfiniBand is among the most common and well-known cluster interconnect technologies. However, the complexities of an InfiniBand (IB) network can frustrate the most experienced cluster administrators. Maintaining a balanced fabric topology, dealing with underperforming hosts and links, and chasing potential improvements keeps all of us on our toes. Sometimes, though, a little research and experimentation can find unexpected performance and stability gains.

For example, consider a 1,300-node cluster using Intel TrueScale IB for job communication and a Panasas ActiveStor filesystem for storage. Panasas only communicates to clients via Ethernet and not IB, so a group of Mellanox switches act as gateways from the Panasas Ethernet to the TrueScale IB.

Every system has bottlenecks; in our case, the links to and from these IB/Ethernet gateways showed congestion due to the large amount of disk traffic. This adversely affected the whole cluster — jobs couldn’t get the data they needed, and the increased congestion interfered with other IB traffic as well.

Fortunately, InfiniBand provides a congestion control mechanism that can help mitigate the effects of severe congestion on the fabric. We were able to implement this feature to save the expense and trouble of adding additional IB/Ethernet gateways.

What Is InfiniBand Congestion Control?

InfiniBand is intended to be a lossless fabric. IB switches won’t drop packets for flow control unless they absolutely have to, usually in cases of hardware failure or malformed packets. Instead of dropping packets and retransmitting, like Ethernet does, InfiniBand uses a system of credits to perform flow control.

Communication occurs between IB endpoints, which in turn are issued credits based on the amount of buffer space the receiving device has. If the credit cost of the data to be transmitted is less than the credits remaining on the receiving device, the data is transmitted. Otherwise, the transmitting device holds on to the data until the receiving device has sufficient credits free.
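This credit exchange can be sketched as a toy model (illustrative only — the class and method names are ours, not actual IB verbs APIs):

```python
class Link:
    """Toy model of InfiniBand credit-based flow control (illustrative only)."""

    def __init__(self, receiver_buffer_credits):
        self.credits = receiver_buffer_credits  # granted by the receiver
        self.pending = []                       # data held back by the sender

    def send(self, packet_cost):
        """Transmit only if the receiver has advertised enough credits."""
        if packet_cost <= self.credits:
            self.credits -= packet_cost
            return True           # packet goes on the wire
        self.pending.append(packet_cost)
        return False              # sender holds the data -- nothing is dropped

    def receiver_frees_buffer(self, freed):
        """Receiver drains its buffer and returns credits to the sender."""
        self.credits += freed
        still_waiting = []
        for cost in self.pending:  # retry held packets with the new credits
            if cost <= self.credits:
                self.credits -= cost
            else:
                still_waiting.append(cost)
        self.pending = still_waiting


link = Link(receiver_buffer_credits=4)
assert link.send(3) is True    # fits within advertised credits
assert link.send(3) is False   # held, not dropped (unlike Ethernet)
link.receiver_frees_buffer(3)
assert link.pending == []      # held packet sent once credits returned
```

The key contrast with Ethernet is visible in the second `send`: the packet is parked, never discarded, until the receiver frees buffer space.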

This method of flow control works well for normal loads on well-balanced, non-oversubscribed IB fabrics. However, if the fabric is unbalanced or oversubscribed or just heavily loaded, some links may be oversaturated with traffic beyond the ability of the credit mechanism to help.

Congestion can be observed by checking the IB error counters. When an IB device attempts to transmit data but the receiving device cannot receive data due to congestion, the PortXmitWait counter is incremented. If the congestion is so bad that the data cannot be transmitted before the time-to-live on the packet expires, the packet is discarded and the PortXmitDiscards counter is incremented. If you’re seeing high values of PortXmitWait and PortXmitDiscards counters, enabling congestion control may help manage congestion on your InfiniBand fabric.
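These counters can be read with the `perfquery` tool from the infiniband-diags package. As a quick way to flag the symptoms, one can scan its output for the two counters; the sketch below is a hedged example — the exact output layout varies by infiniband-diags version, and the sample text here is fabricated for illustration:

```python
import re

def find_congestion(perfquery_output, threshold=0):
    """Scan perfquery-style output for congestion symptoms.

    Returns a dict mapping PortXmitWait / PortXmitDiscards to their
    values whenever they exceed the given threshold.
    """
    symptoms = {}
    for line in perfquery_output.splitlines():
        # perfquery prints lines like "PortXmitWait:....12345"
        m = re.match(r"(PortXmitWait|PortXmitDiscards):\.*(\d+)", line.strip())
        if m and int(m.group(2)) > threshold:
            symptoms[m.group(1)] = int(m.group(2))
    return symptoms


sample = """\
# Port extended counters: Lid 42 port 1
PortXmitData:....................992874
PortXmitWait:....................52417
PortXmitDiscards:................17
"""
print(find_congestion(sample))  # {'PortXmitWait': 52417, 'PortXmitDiscards': 17}
```

Running this periodically across all ports (after feeding it real `perfquery` output) gives a crude congestion survey; nonzero PortXmitDiscards anywhere is the strongest signal that tuning is needed.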

How Does InfiniBand Congestion Control Work?

When an IB switch detects congestion on a link, it sets a special bit, called the Forward Explicit Congestion Notification (FECN) bit, which informs the destination device that congestion has been detected on the link. When the destination receives a packet marked with the FECN bit, the destination device notifies the sending device of the congestion via a Backward Explicit Congestion Notification (BECN) bit.

When the source receives the BECN bit notification from the destination, the sending (source) device begins to throttle the amount of data it sends to the destination. The mechanism it uses is the credits system – by reducing the credits available to the destination, the size and rate of the packets are effectively decreased. The sending device may also add a delay between packets to provide the destination device time to catch up on data.

Over time, the source device increases credits for the destination device, gradually increasing the number of packets sent. If the destination device continues to receive FECN packets from its switch, it again transmits BECN packets to the source device and the throttling is increased again. Without the reception of BECN packets from the destination device, the source device eventually returns to normal packet transmission. This balancing act is managed by congestion control parameters, which require tuning for each environment.
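The throttle-and-recover behavior described above amounts to a simple control loop. The model below is purely illustrative (the function name and the specific cut/recovery constants are our assumptions — in real hardware the response is governed by the congestion control table and timers):

```python
def next_rate(current_rate, becn_received, full_rate=100,
              throttle_factor=2, recovery_step=10):
    """One step of the source's injection-rate control loop.

    On receiving a BECN, the source cuts its rate (fewer credits plus an
    inter-packet delay); otherwise it creeps back toward full speed.
    """
    if becn_received:
        return max(1, current_rate // throttle_factor)
    return min(full_rate, current_rate + recovery_step)


rate = 100
history = []
for becn in [True, True, False, False, False, True, False]:
    rate = next_rate(rate, becn)
    history.append(rate)
print(history)  # [50, 25, 35, 45, 55, 27, 37]
```

The trace shows the characteristic sawtooth: sharp multiplicative cuts while BECNs keep arriving, then slow additive recovery once the congestion clears.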

After enabling InfiniBand congestion control and tuning it properly, we realized a 15 percent improvement in our Panasas file system benchmark testing. The PortXmitDiscards counters were completely clear, and the PortXmitWait counters were significantly smaller, indicating that congestion control was doing its job.

Given that no additional hardware or other costs were required to achieve these results, a speed increase of 15 percent plus increased stability of the IB fabric was a nice result.

How Can I Enable InfiniBand Congestion Control?

Congestion control must be enabled on all IB devices and hosts, as well as on the IB subnet manager. This process includes turning on congestion control and setting a congestion control key on each device, as well as tuning the congestion control tables and parameters on each host and switch.

After congestion control is enabled on each IB device, the OpenSM configuration file must be modified to tune the subnet manager’s congestion control manager. Please note that mistuned parameters will either wreak havoc on a fabric or be completely ineffectual, so be careful – and do plenty of testing on a safe “test” system. Never attempt this on a live or production system.
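For orientation, congestion control settings for the subnet manager live in OpenSM's main configuration file (often /etc/opensm/opensm.conf). The fragment below shows the general shape of such a configuration; the option names and every value are illustrative assumptions, not a recommendation — they differ across OpenSM versions, so generate your version's default config (e.g. with `opensm -c <file>`) and consult the opensm man page before changing anything:

```
# Illustrative opensm.conf fragment -- verify option names and defaults
# against your own OpenSM version's generated config before use.

# Turn on the congestion control manager and set the CC key
congestion_control TRUE
cc_key 0x0000000000000000

# Switch-side settings: which ports to monitor and when to mark FECN
cc_sw_cong_setting_control_map 0x1
cc_sw_cong_setting_threshold 0x0F

# HCA-side settings: the congestion control table (CCT) and timers that
# govern how hard and how long sources throttle after receiving a BECN
cc_ca_cong_setting_port_control 0x0001
cc_ca_cong_setting_ccti_timer 150
```

Values like the thresholds and timers above are exactly the per-environment tuning knobs this article warns about; start conservative, and only on a test fabric.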

Enabling InfiniBand congestion control had an immediate positive effect on our IB fabric. If you are suffering from fabric congestion issues, enabling congestion control may provide similar relief for your fabric as well, without the cost of additional hardware.

About the Author

Adam Dorsey is a systems administrator and site lead for RedLine Performance Solutions.


Velocity Micro Announces a New Line of Workstation and HPC Desktops

Wed, 07/26/2017 - 14:40

RICHMOND, Va., July 27, 2017 — Velocity Micro, the premier builder of award-winning enthusiast desktops, laptops, high performance computing solutions, and professional workstations, announces the availability of a suite of new workstation offerings designed for CAD design, 3D rendering, multimedia creation, and scientific calculation. Headlining the refreshed category are an AMD Epyc workstation with up to 64 physical cores, new Intel Xeon Scalable Processor offerings, and Threadripper, AMD’s newest Ryzen processor, now available for pre-order. Select systems from the new offerings will also be available for display and demonstration at Siggraph, August 1-3, at the Los Angeles Convention Center.

“Our heritage is deeply rooted in building ultra-performance workstation PCs, extending back to when I founded the company twenty years ago,” said Randy Copeland, President and CEO of Velocity Micro. “By refreshing these offerings and making AMD’s Threadripper processors available for preorder, we’re reasserting our commitment to this category and this customer with the most complete line of custom workstation solutions available anywhere.”

Velocity Micro’s new workstations are designed and optimized for CAD design, advanced 3D renders, multimedia creation, and scientific calculation using software such as Adobe Creative Cloud, 3DS Max, AutoCAD, Solidworks, Revit, Maya, Houdini, and many others. A few options include:

  • ProMagix HD80a – A single-CPU workstation now powered by the AMD Ryzen Threadripper 1950X with 16 cores (32 threads). Starting at $2,899, now available for pre-order.
  • ProMagix HD360a – A dual-CPU workstation form factor featuring AMD Epyc processors with up to 32 cores (64 threads) per chip. The only one of its kind.
  • ProMagix HD360i – A dual-CPU workstation powered by Intel’s new Xeon Scalable processors (formerly Purley) with up to 28 cores (56 threads) per chip. Starting at $2,599.
  • ProMagix G480 – Specifically designed for single- or double-precision GPU computing with up to 70 Tflops of processing power via 8x GPUs. Starting at $7,495.
  • ProMagix Smallblock – This small form factor render station powered by AMD Ryzen is optimized for multithreaded applications. Starting at $1,199.

All Velocity Micro desktops ship from and are supported in Richmond, VA with no preinstalled bloatware. To custom configure an award-winning desktop or to learn more, visit VelocityMicro.com, call 888-300-4450, or visit Siggraph booth #205 August 1-3.

Images: https://app.box.com/s/hpr2zkzth55pqksp7dsqnyas8buhrc20

About Velocity Micro

Velocity Micro is the premier high-performance personal computer provider in North America. Founded in 1992, Richmond, VA-based Velocity Micro custom builds award-winning gaming, mobile, multimedia, small business, custom workstation, and visual supercomputers. Velocity Micro products are currently available at retail from Newegg.com and Amazon.com. For more information, please call (888) 300-4450 or visit www.VelocityMicro.com.

Source: Velocity Micro


Information Scientist Herbert Van de Sompel to Receive Paul Evan Peters Award

Wed, 07/26/2017 - 10:46

LOS ALAMOS, N.M., July 26, 2017 — Herbert Van de Sompel, research scientist at the Research Library of the Los Alamos National Laboratory, has been named the 2017 recipient of the Paul Evan Peters Award from the Coalition for Networked Information (CNI), the Association of Research Libraries, and EDUCAUSE. The award recognizes notable, lasting achievements in the creation and innovative use of network-based information resources and services that advance scholarship and intellectual productivity.

“For the last two decades Herbert, working with a range of collaborators, has made a sustained series of key contributions that have helped shape the current networked infrastructure to support scholarship,” noted CNI executive director Clifford Lynch. “While many people accomplish one really important thing in their careers, I am struck by the breadth and scope of his contributions.” Lynch added, “I’ve had the privilege of working with Herbert on several of these initiatives over the years, and I was honored in 2000 to be invited to serve as a special external member of the PhD committee at the University of Ghent, where he received his doctorate.”

Nominated by over a dozen highly respected members of the information science community, Van de Sompel is widely recognized as having created robust, scalable infrastructures that have had a profound and lasting impact on scholarly communication. The application of some of his groundbreaking work has become an integral part of the core technology infrastructure for thousands of libraries worldwide, helping to connect information across the Internet, and constantly working to further his dream of “a scholarly communication system that fully embraces the Web.”

An accomplished researcher and information scientist, Van de Sompel is perhaps best known for his role in the development of protocols designed to expose data and make them accessible to other systems, forging links that connect related information, thereby enhancing, facilitating, and deepening the research process. These initiatives include the OpenURL framework (stemming from his earlier work on the SFX link resolver), as well as the Open Archives Initiative (OAI), which included the Protocol for Metadata Harvesting (OAI-PMH) and the Object Reuse and Exchange (OAI-ORE) scheme. Other notable contributions include the Memento protocol, which enables browsers to access earlier versions of the Web easily, and ResourceSync, which allows applications to remain synchronized with evolving content collections.
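To make the Memento idea concrete: a client asks a "TimeGate" for a past version of a resource by sending an Accept-Datetime header, and the response carries Link relations pointing at the original resource and the archived copy. A minimal sketch of that negotiation pattern (RFC 7089) follows; the example URLs are hypothetical, and a real client would send these headers over HTTP.

```python
from email.utils import format_datetime
from datetime import datetime, timezone

def timegate_request_headers(when):
    """Headers a Memento client sends to a TimeGate to request a past version."""
    return {"Accept-Datetime": format_datetime(when, usegmt=True)}

def parse_link_header(value):
    """Minimal parser for the Link header a Memento response carries."""
    rels = {}
    for part in value.split(","):
        url, _, params = part.strip().partition(";")
        url = url.strip().lstrip("<").rstrip(">")
        for p in params.split(";"):
            if p.strip().startswith("rel="):
                rels[p.strip()[4:].strip('"')] = url
    return rels

# Ask for the page as it looked on January 1, 2015:
headers = timegate_request_headers(datetime(2015, 1, 1, tzinfo=timezone.utc))

# A (hypothetical) TimeGate response then links the archived copy back to the original:
links = parse_link_header(
    '<http://example.org/page>; rel="original", '
    '<http://archive.example/20150101/page>; rel="memento"'
)
```

A browser plugin or crawler following rel="memento" lands on the archived snapshot closest to the requested datetime.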

“I applaud Van de Sompel’s milestones in developing robust, scalable digital infrastructure for the world of research, which is the crucial underpinning for a future that could promote unfettered access for all to the entire scholarly corpus,” stated Elliott Shore, Association of Research Libraries executive director.

Van de Sompel was hired by his alma mater, Ghent University (Belgium), in 1981 to launch its library automation efforts. Over time, the focus shifted to providing access to a wide variety of scholarly information sources, leveraging the technologies of the day to reach the largest possible end-user base, and by the late 1990s the work of his team was considered among the best in Europe. In 2000 he received a PhD from Ghent University for work on context-sensitive linking, which led to the OpenURL standard and library linking servers. Following stints at Cornell University and the British Library, he joined Los Alamos as an information scientist in 2002, and he now leads the Prototyping Team at the Research Library. He also serves as visiting professor at the DANS data archive in the Netherlands.

Widely sought after for advisory boards and panels, Van de Sompel served as a member of the European Union High Level Expert Group on Scientific Data, as well as the Core Experts Group for the Europeana Thematic Network, charged with building a digital repository of European cultural assets. He has won numerous awards, including the Los Alamos National Laboratory Fellows Prize for Outstanding Research (2015) and the SPARC Innovator Award (2006) by the Scholarly Publishing and Academic Resources Coalition (SPARC), of which he was the first recipient.

A four-member committee selected Van de Sompel for the award: Jeffrey MacKie-Mason, university librarian and chief digital scholarship officer at the University of California, Berkeley; Marilyn McMillan, (retired), former vice president for IT and chief IT officer at New York University; Winston Tabb, Sheridan dean of university libraries and museums at Johns Hopkins University; and Joan Lippincott, associate executive director of the Coalition for Networked Information.

Named for CNI’s founding director, the award will be presented during the CNI membership meeting in Washington, DC, to be held December 11–12, 2017, where Van de Sompel will deliver the Paul Evan Peters Memorial Lecture. The talk will be recorded and made available on CNI’s YouTube and Vimeo channels after the meeting concludes. Previous award recipients include Donald A.B. Lindberg (2014), Christine L. Borgman (2011), Daniel Atkins (2008), Paul Ginsparg (2006), Brewster Kahle (2004), Vinton Cerf (2002), and Tim Berners-Lee (2000).

For more information, visit the award website at www.cni.org/go/pep-award/, or contact the CNI communications coordinator at diane@cni.org.

About Los Alamos National Laboratory (www.lanl.gov)

Los Alamos National Laboratory, a multidisciplinary research institution engaged in strategic science on behalf of national security, is operated by Los Alamos National Security, LLC, a team composed of Bechtel National, the University of California, BWX Technologies, Inc. and URS Corporation for the Department of Energy’s National Nuclear Security Administration.

Source: LANL

The post Information Scientist Herbert Van de Sompel to Receive Paul Evan Peters Award appeared first on HPCwire.

Pixit Media Expands North American Team

Wed, 07/26/2017 - 08:30

July 26, 2017 — Pixit Media, the leader in Software Defined Storage for media workflows, today announced the appointment of John Aiken, Laurent Lacore and Adam Hansler to Pixit’s new North American team.

John Aiken takes the position of Vice President of Sales, Americas.  John has over 25 years of experience delivering innovative storage solutions, having worked with market leaders such as Brocade and Hitachi Data Systems. John joins Pixit Media from Software-Defined High Performance Computing solutions provider Re-Store, where he served as Director of Sales and Business Development. John will be focused on Pixit’s growth strategy across North America, delivering the value of truly “software-defined” storage solutions.

Laurent Lacore joins Pixit as a Senior Solutions Architect. With previous positions at Atempo, Quantum and Seagate, Laurent comes with over 20 years of experience in storage, data management and system integration with a focus on media and entertainment workflows. Laurent will be key in designing solutions for Pixit Media customers that address complex workflow requirements efficiently and cost-effectively.

As Technical Engineer, Adam Hansler will support Pixit’s fast-growing U.S. customer base. Adam has worked as a field and sales engineer for a number of years and understands the challenges our clients face on a day-to-day basis.

“We’re delighted to have John, Laurent and Adam join us at Pixit Media to fuel our global growth,” says Ben Leaver, Chief Executive Officer and Co-Founder. “Their experience of architecting, delivering and supporting software-defined solutions ensures we have exactly the right team in place to serve a market that has been constrained by vendor lock-in for far too long. It’s an extremely exciting time for us all.”

John, Laurent and Adam are all based in Pixit Media’s new North American headquarters in San Diego, California.

About Pixit Media

Guaranteed and predictable performance, simple licensing models and innovative data management capabilities make Pixit Media an easy choice for content creators and distributors looking to dramatically improve the efficiency of their workflows. For more information on Pixit Media, please visit www.pixitmedia.com

Source: Pixit Media

The post Pixit Media Expands North American Team appeared first on HPCwire.

Toshiba NVMe SSDs Now Available with ThinkSystem, ThinkAgile Servers

Tue, 07/25/2017 - 10:02

IRVINE, Calif., July 25, 2017 — Toshiba America Electronic Components, Inc. (TAEC), a committed technology leader, today announces its collaboration with Lenovo to integrate its PX04P Series of NVM Express (NVMe) SSDs with Lenovo’s new ThinkSystem and ThinkAgile servers to drive high performance and endurance while giving customers flexible storage and data center infrastructure options. ThinkSystem servers aim to expand traditional data center infrastructure to enable hyperscale and HPC deployments, and ThinkAgile focuses on software-defined deployments.

The PX04P Series is available in AIC and 2.5” form factors, each with a x4 PCIe Gen3 interface. Offered in capacities up to 3.84TB, the PX04P boasts endurance ratings of up to 10 DWPD. Used in flexible storage systems from Lenovo, the PX04P performs well across a range of high-performance I/O and mixed workload levels. It is also power-efficient, delivering 660K random-read (RR) IOPS at only 18.5W max. Toshiba offers a broad portfolio of innovative, proven storage products, with end-to-end integration of NVMe SSDs spanning Toshiba flash, controller and firmware.
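As a back-of-the-envelope check on those figures, the quoted 660K random-read IOPS at 18.5W and the 10 DWPD rating on the 3.84TB model imply the efficiency and lifetime-write numbers below. Note the five-year warranty window is an assumption for illustration; the release does not state one.

```python
# Power efficiency implied by the quoted figures: 660K random-read IOPS at 18.5 W.
iops = 660_000
watts = 18.5
iops_per_watt = iops / watts  # roughly 35,700 IOPS per watt

# Lifetime writes implied by 10 DWPD on the 3.84 TB model, assuming a
# hypothetical 5-year warranty window (not stated in the release).
capacity_tb = 3.84
dwpd = 10
years = 5
lifetime_pb = capacity_tb * dwpd * 365 * years / 1000  # about 70 PB written
```

Numbers like these are how drive vendors typically translate DWPD ratings into total-bytes-written endurance claims.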

“Lenovo is committed to advancing the data center experience and providing our customers flexibility and agility to scale as workloads change,” said Kamran Amini, General Manager, Server and Storage System Business Unit, Lenovo. “By integrating our new ThinkSystem and ThinkAgile portfolios with Toshiba’s enterprise NVMe SSDs, this collaboration allows customers both reliability and efficiency needed for better business outcomes.”

This news demonstrates Toshiba’s commitment to continued momentum and growth in the industry, following IDC market reports that identified Toshiba as the fastest-growing vendor for 2016 in the $17 billion SSD segment.

“The PX04P enterprise NVMe family combines high performance with a highly efficient power profile,” said Jeremy Werner, vice president, SSD Marketing and Product Planning, Toshiba America Electronic Components, Inc. “Toshiba’s SSDs are vertically integrated and designed to deliver the quality and reliability required by data centers. Our collaboration with Lenovo provides flexible storage for multiple, demanding workloads.”

The PX04P Series SSDs are now available. For more information on Toshiba’s line of storage products, please visit: http://toshiba.semicon-storage.com/us/product/storage-products.html. For more information on our entire line of consumer storage solutions visit: http://storage.toshiba.com/consumerhdd. To learn more about Toshiba’s storage solutions, follow @ToshibaStorage on Twitter.

About Toshiba Corp. and Toshiba America Electronic Components, Inc. (TAEC)

Through proven commitment, lasting relationships and advanced, reliable electronic components, Toshiba enables its customers to create market-leading designs. Toshiba is the heartbeat within product breakthroughs from OEMs, ODMs, CMs, VARs, distributors and fabless chip companies worldwide. A committed electronic components leader, Toshiba designs and manufactures high-quality flash memory-based storage solutions, solid state drives (SSDs), hard disk drives (HDDs), discrete devices, custom SoCs/ASICs, imaging products, microcontrollers, wireless components, mobile peripheral devices, advanced materials and medical tubes that make possible today’s leading smartphones, tablets, cameras, medical devices, automotive electronics, industrial applications, enterprise solutions and more.

Source: Toshiba

The post Toshiba NVMe SSDs Now Available with ThinkSystem, ThinkAgile Servers appeared first on HPCwire.

Synopsys Introduces DesignWare High Bandwidth Memory 2 IP solution

Tue, 07/25/2017 - 09:57

MOUNTAIN VIEW, Calif., July 25, 2017 — Synopsys, Inc. (Nasdaq: SNPS) today introduced its complete DesignWare High Bandwidth Memory 2 (HBM2) IP solution consisting of controller, PHY and verification IP, enabling designers to achieve up to 307 GB/s aggregate bandwidth, which is 12 times the bandwidth of a DDR4 interface operating at 3200 Mb/s data rate. In addition, the DesignWare HBM2 IP solution delivers approximately ten times better energy efficiency than DDR4. Advanced graphics, high-performance computing (HPC) and networking applications are requiring more memory bandwidth to keep pace with the increasing compute performance brought by advanced process technologies. With the DesignWare HBM2 IP solution, designers can achieve their memory throughput requirements with minimal power consumption and low latency. The new DesignWare HBM2 IP solution is built on Synopsys’ silicon-proven HBM and DDR4 IP, which has been validated in hundreds of designs and shipped in millions of systems-on-chips (SoCs), enabling designers to lower integration risk and accelerate adoption of the new standard.
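The arithmetic behind those headline numbers is straightforward, assuming the standard 1024-bit (eight 128-bit channel) HBM2 stack interface running at 2.4 Gb/s per pin, compared against a conventional 64-bit DDR4-3200 channel:

```python
# One HBM2 stack: 8 channels x 128 bits = 1024 data pins at 2.4 Gb/s per pin.
hbm2_gbs = 1024 * 2.4 / 8   # 307.2 GB/s per stack

# DDR4-3200 on a standard 64-bit channel.
ddr4_gbs = 64 * 3.2 / 8     # 25.6 GB/s

ratio = hbm2_gbs / ddr4_gbs  # 12.0 -- the "12 times" in the release
```

The energy-efficiency advantage comes largely from the short, dense in-package routes of the 2.5D interposer rather than from the raw signaling rate.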

“We selected Synopsys’ DesignWare HBM2 IP solution to take full advantage of the bandwidth and power efficiency of the 16GB of HBM2 memory in our Radeon Vega Frontier Edition graphics cards,” said Joe Macri, corporate VP and product CTO at AMD. “Synopsys’ deep expertise in memory interfaces enabled us to successfully integrate HBM2 IP into the ‘Vega’ GPU architecture and achieve aggressive power and memory bandwidth targets to serve machine learning and advanced graphics applications.”

The complete DesignWare HBM2 IP solution provides unique functionality that enables designers to achieve their memory bandwidth, latency and power objectives. The DesignWare HBM2 Controller supports pseudo-channel operation in either lock step or memory interleaved mode, allowing users to maximize bandwidth based on their unique traffic pattern. Both the HBM2 controller and PHY utilize a DFI 4.0-compatible interface to simplify integration with custom DFI-compliant controllers and PHYs.

The DesignWare HBM2 PHY IP offers four trained power management states and fast frequency switching that allows the SoC to manage power consumption by quickly changing between operating frequencies. The DesignWare HBM2 PHY enables a microbump array that matches the JEDEC HBM2 SDRAM standard for the shortest possible 2.5D package routes and highest signal integrity. To simplify HBM2 SDRAM testing, the DesignWare HBM2 PHY IP provides an IEEE 1500 port with an access loopback mode for testing and training the link between the SoC and HBM2 SDRAM.

Synopsys VC Verification IP for HBM is fully compliant to HBM JEDEC specification (including HBM2) and provides protocol, methodology, verification and productivity features including built-in protocol checks, coverage and verification plans, and Verdi protocol-aware debug and performance analysis, enabling users to achieve rapid verification of HBM-based designs.

“Increasing memory bandwidth without overtaxing power and area budgets is critical for graphics, HPC and networking applications,” said John Koeter, vice president of marketing for IP at Synopsys. “As the leading provider of memory IP, Synopsys has engaged closely with several lead customers to develop an HBM2 IP solution that enables designers to address increasing throughput requirements with improved latency and power efficiency for their high-performance SoC designs.”

Availability & Resources

The DesignWare HBM2 PHY and VC Verification IP are available now for 14- and 7-nm process technologies, with additional process technologies in development. For availability information on the DesignWare HBM2 Controller IP, please contact Synopsys.

About DesignWare IP

Synopsys is a leading provider of high-quality, silicon-proven IP solutions for SoC designs. The broad DesignWare IP portfolio includes logic libraries, embedded memories, embedded test, analog IP, wired and wireless interface IP, security IP, embedded processors and subsystems. To accelerate prototyping, software development and integration of IP into SoCs, Synopsys’ IP Accelerated initiative offers IP prototyping kits, IP software development kits and IP subsystems. Synopsys’ extensive investment in IP quality, comprehensive technical support and robust IP development methodology enables designers to reduce integration risk and accelerate time-to-market. For more information on DesignWare IP, visit www.synopsys.com/designware.

About Synopsys

Synopsys, Inc. (Nasdaq: SNPS) is the Silicon to Software partner for innovative companies developing the electronic products and software applications we rely on every day. As the world’s 15th largest software company, Synopsys has a long history of being a global leader in electronic design automation (EDA) and semiconductor IP and is also growing its leadership in software security and quality solutions. Whether you’re a system-on-chip (SoC) designer creating advanced semiconductors, or a software developer writing applications that require the highest security and quality, Synopsys has the solutions needed to deliver innovative, high-quality, secure products. Learn more at www.synopsys.com.

Source: Synopsys

The post Synopsys Introduces DesignWare High Bandwidth Memory 2 IP solution appeared first on HPCwire.

ORNL Acquires D-Wave 2000Q Cloud Services for Hybrid Computing

Tue, 07/25/2017 - 09:47

HANOVER, M.D., July 25, 2017 — D-Wave Systems Inc., the leader in quantum computing systems and software, and Oak Ridge National Laboratory (ORNL), the largest U.S. Department of Energy science and energy laboratory, today announced an agreement aimed at advancing hybrid computing applications, particularly targeted at helping accelerate future exascale applications. Under the agreement, ORNL scientists will have cloud access to a D-Wave 2000Q system to allow for exploration of hybrid computing architectures as a way to achieve better solutions for scientific applications.

ORNL is a multi-program research laboratory dedicated to helping ensure America’s security and prosperity by addressing its energy, environmental, and nuclear challenges through transformative science and technology solutions. ORNL employs almost 5000 people, including scientists and engineers in more than 100 disciplines, and houses Titan, the nation’s fastest supercomputer.

“ORNL researchers are investigating the use of quantum, neuromorphic, and other new computing architectures with the potential to accelerate applications and programs important to the Department of Energy,” said Dr. Jeff Nichols, Associate Laboratory Director of Computing and Computational Sciences at Oak Ridge National Laboratory. “This agreement fits squarely within our objective of providing distinctive equipment and unique facilities to our researchers to solve some of the nation’s most compelling computing challenges. This program is also a natural extension of the lab’s leadership in high-performance computing, with the next step being to accelerate the nation’s exascale program.”

As part of the joint effort, D-Wave personnel will work with ORNL to map applications to the D-Wave architecture in order to solve new types of problems, and to solve existing problems faster by combining computing architectures.

“Advancing the problem-solving capabilities of quantum computing takes dedicated collaboration with leading scientists and industry experts,” said Robert “Bo” Ewald, president of D-Wave International. “Our work with ORNL’s exceptional community of researchers and scientists will help us understand the potential of new hybrid computing architectures, and hopefully lead to faster and better solutions for critical and complex problems.”

In addition to advancing ORNL’s own applications and programs, ORNL and D-Wave aim to share these results with the scientific user community to enable improved hybrid computing applications.

About D-Wave Systems Inc.
D-Wave is the leader in the development and delivery of quantum computing systems and software, and the world’s only commercial supplier of quantum computers. Our mission is to unlock the power of quantum computing for the world. We believe that quantum computing will enable solutions to the most challenging national defense, scientific, technical, and commercial problems. D-Wave’s systems are being used by some of the world’s most advanced organizations, including Lockheed Martin, Google, NASA Ames, USRA, USC, and Los Alamos National Laboratory. With headquarters near Vancouver, Canada, D-Wave’s U.S. operations are based in Palo Alto, CA and Hanover, MD. D-Wave has a blue-chip investor base including Goldman Sachs, Bezos Expeditions, DFJ, In-Q-Tel, BDC Capital, Growthworks, 180 Degree Capital Corp., International Investment and Underwriting, and Kensington Partners Limited. For more information, visit: www.dwavesys.com.

About the Oak Ridge National Laboratory
Oak Ridge National Laboratory is the largest US Department of Energy science and energy laboratory, conducting basic and applied research to deliver transformative solutions to compelling problems in energy and security. ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science.

Source: D-Wave

The post ORNL Acquires D-Wave 2000Q Cloud Services for Hybrid Computing appeared first on HPCwire.

Rescale Names Gabriel Broner VP, GM of HPC

Tue, 07/25/2017 - 08:30

SAN FRANCISCO, Calif., July 25, 2017 — Rescale, the leading enterprise big compute platform provider for engineering simulation and scientific research on cloud high-performance computing (HPC), today announced that Gabriel Broner has joined the company as Vice President and General Manager of High-Performance Computing, and that Jonathan Oakley has joined the company as Vice President of Marketing.

Broner brings 25 years of industry experience to Rescale. He has held roles as operating system architect at Cray, general manager at Microsoft, head of innovation at Ericsson, and most recently vice president and general manager of high-performance computing at SGI.

Broner’s experience with customers, understanding their needs and translating those needs into HPC products and solutions, has helped multiple enterprises innovate in their core business. Rescale’s customers and prospective customers will benefit directly from this experience, which will help them understand the benefits of adopting cloud and ultimately allow them to significantly increase product design throughput.

“Rescale offers HPC users the possibility to instantly run simulations on large systems with the architecture of their choice, which enables companies to accelerate the pace of innovation,” said Broner. “I am very excited to join this talented group of people at Rescale who are driving the next big disruption in HPC.”

“Gabriel brings many years of experience in technical and executive roles in the HPC industry leading teams through disruptive technology transitions,” said Joris Poort, CEO at Rescale. “We look forward to leveraging Gabriel’s experience to help customers accelerate innovation through the transition from fixed legacy on-premise systems to more flexible, scalable, and cost-effective big compute solutions.”

Oakley has a background in engineering software and 15 years of experience in North American enterprise sales, marketing and business development for CST, a high-growth German-based simulation software company, through to its successful acquisition by Dassault Systemes in 2016.

Oakley will form part of the Rescale executive team and will be responsible for building the marketing team at Rescale and developing its core messaging, branding, and product awareness strategy, ensuring that the market understands the full potential and benefits of Rescale’s turnkey cloud HPC solution.

“Jonathan brings a mix of engineering and business growth experience to Rescale,” said Joris Poort, CEO at Rescale. “The company looks forward to accelerating market awareness and bringing the Rescale solution to a market that is eager to find cost-effective and scalable alternatives to on-premise HPC.”

Previously, Oakley worked at BAE Systems, Cobham, and CST of America. At CST, as VP of Sales & Marketing he helped to grow the North American operation from 3 to 50 people over a period of 15 years and opened two California offices in addition to the Boston headquarters.  During his tenure, CST North America maintained year-over-year revenue growth with a CAGR of 14%.

“At CST I saw an increasing requirement for robust high-performance computing as the software capability increased and analysts started to create larger, system-level simulations,” said Oakley. “On-premise hardware is limited, aging, and very expensive to maintain. Rescale offers a great alternative turnkey solution by moving the big compute problem securely to the multi-cloud, where the latest hardware is always on tap. I am very excited to be at the start of a new revolution in cloud computing and look forward to building global awareness of Rescale’s solutions.”

With the growing acceptance of cloud for SaaS and IaaS, HPC and big compute in the cloud are the logical progression. The flexibility and scalability of Rescale’s solution make it attractive for all stakeholders in the enterprise, from CIO to end-user. With a huge software library and worldwide infrastructure, Rescale is able to offer solutions for everyone from startups to the largest multinationals.

About Rescale

Rescale is the global leader for high-performance computing simulations and deep learning in the cloud. Trusted by the Global Fortune 500, Rescale empowers the world’s top scientists and engineers to develop the most innovative new products and perform groundbreaking research and development faster and at lower cost. Rescale’s ScaleX platform transforms traditional fixed IT resources into flexible hybrid, private, and public cloud resources—built on the largest and most powerful high-performance computing network in the world. For more information on Rescale’s ScaleX platform, visit www.rescale.com.

Source: Rescale

The post Rescale Names Gabriel Broner VP, GM of HPC appeared first on HPCwire.

NSF Project Sets Up First Machine Learning Cyberinfrastructure – CHASE-CI

Tue, 07/25/2017 - 07:10

Earlier this month, the National Science Foundation issued a $1 million grant to Larry Smarr, director of Calit2, and a group of his colleagues to create a community infrastructure in support of machine learning research. The ambitious plan – Cognitive Hardware and Software Ecosystem, Community Infrastructure (CHASE-CI) – is intended to leverage the high-speed Pacific Research Platform (PRP) and put fast GPU appliances into the hands of researchers to tackle machine learning hardware, software, and architecture issues.

Given the abrupt rise of machine learning and its distinct needs versus traditional FLOPS-dominated HPC, the CHASE-CI effort seems a natural next step in learning how to harness PRP’s high bandwidth for use with big data projects and machine learning. Perhaps not coincidentally Smarr is also principal investigator for PRP. As described in the NSF abstract, CHASE-CI “will build a cloud of hundreds of affordable Graphics Processing Units (GPUs), networked together with a variety of neural network machines to facilitate development of next generation cognitive computing.”

Those are big goals. Last week, Smarr and co-PI Thomas DeFanti spoke with HPCwire about the CHASE-CI project. It has many facets: hardware, including von Neumann (vN) and non von Neumann (NvN) architectures; software frameworks (e.g., Caffe and TensorFlow); six specific algorithm families (details near the end of the article); and cost containment are all key target areas. In building out PRP, the effort leveraged existing optical networks such as GLIF by building termination devices based on PCs and providing them to research scientists. The new device – dubbed FIONA (Flexible I/O Network Appliance) – was developed by PRP co-PI Philip Papadopoulos and is critical to the new CHASE-CI effort. A little background on PRP may be helpful.

Larry Smarr, director, Calit2

As explained by Smarr, the basic PRP idea was to experiment with a cyberinfrastructure appropriate for a broad set of big-data applications that aren’t suited to the commodity internet because of the size of the datasets. To handle the high-speed bandwidth, you need a big bucket at the end of the fiber, notes Smarr. FIONAs filled the bill; the devices are stuffed with high-performance, high-capacity SSDs and high-speed NICs but are based on the humble and less expensive PC.

“They could take the high data rate without TCP backing up and thereby lowering the overall bandwidth, which traditionally has been a problem if you try to go directly to spinning disk,” says Smarr. Currently, there are on the order of 40 or 50 of these FIONAs deployed across the West Coast. Although 100 gigabit throughput is possible via the fiber, most researchers are getting 10 gigabit, still a big improvement.

DOE tests the PRP performance regularly using MadDash (Monitoring and Debugging Dashboard), a visualization tool. “There are test transfers of 10 gigabytes of data, four times a day, among 25 organizations, so that’s roughly about 300 transfers four times a day. The reason why we picked that number, 10 gigabytes, was because that’s the amount of data you need to get TCP up to full speed,” says Smarr.
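One reading of Smarr’s figure: if each unordered pair among the 25 participating organizations exchanges one 10-gigabyte transfer per test round, the per-round count is exactly the number of pairs (this pairwise-mesh interpretation is an assumption; the quote does not spell out the test topology):

```python
from math import comb

sites = 25
transfers_per_round = comb(sites, 2)  # C(25, 2) = 300 unordered pairs
rounds_per_day = 4
transfers_per_day = transfers_per_round * rounds_per_day  # 1,200 transfers/day
gigabytes_per_day = transfers_per_day * 10                # 12,000 GB moved daily
```

At roughly 12 TB of test traffic per day, the mesh doubles as a continuous stress test of the fiber paths themselves.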

Thomas DeFanti, co-PI, CHASE-CI

Networks are currently testing out at 5, 6, 7, 8 and 9 gigabits per second, which is nearly full utilization. “Some of them really nail it at 9.9 gigabits per second. If you go to 40 gigabit networks that we have, we are getting 13 and 14 gigabits per second and that’s because of the [constrained] software we are using. If we go to a different software, which is not what scientists routinely use [except] the high energy physics people, then we can get 30 or 40 or 100 gigabits per second – that’s where we max out with the PC architecture and the disk drives on those high end units,” explains DeFanti.

The PRP has proven to be very successful, say Smarr and DeFanti. PRP v1, basically the network of FIONAs, is complete. PRP v2 is in the works. The latter is intended to investigate advanced software concepts such as software defined networking and security and not intended to replace PRP v1. Now, Smarr wants to soup up FIONAs with FPGAs, hook them into the PRP, and tackle machine learning. And certainly hardware is just a portion of the machine learning challenge being addressed.

Data showing increase in PRP performance over time.

Like PRP before it, CHASE-CI is a response to an NSF call for community computer science infrastructure. Unlike PRP, which is focused on applications (e.g. geoscience, bioscience) and whose architecture was largely defined by guidance from domain scientists, CHASE-CI is being driven by needs of computer scientists trying to support big data and machine learning.

The full principal investigator team is an experienced and accomplished group including: Smarr, (Principal Investigator), Calit2; Tajana Rosing (Co-Principal Investigator), Professor, CSE Department, UCSD; Ilkay Altintas (Co-Principal Investigator), Chief Data Science Officer, San Diego Supercomputer Center; DeFanti (Co-Principal Investigator), Full Research Scientist at the UCSD Qualcomm Institute, a division of Calit2;  and Kenneth Kreutz-Delgado (Co-Principal Investigator), Professor, ECE Department, UCSD.

“What they didn’t ask for [in the PRP grant] was what computer scientists need to support big data and machine learning. So we went back to the campuses and found the computer scientists, faculty and staff that were working on machine learning and ended up with 30 of them that wrote up their research to put into this proposal,” says Smarr. “We asked what was bottlenecking the work and [they responded] it was a lack of access to GPUs to do the compute intensive aspects of machine learning like training data sets on big neural nets.”

Zeroing in on GPUs, particularly GPUs that emphasize lower precision, is perhaps predictable.

“[In traditional] HPC you need 64-bit and error correction and all of that kind of stuff which is provided very nicely by Nvidia’s Tesla line, for instance, but actually because of the noise that is inherent in the data in most machine learning applications it turns out that single precision 32-bit is just fine and that’s much less expensive than the double precision,” says Smarr. For this reason, the project is focusing on less expensive “gaming GPUs” which fit fine into the slots on the FIONAs since they are PCs.

The NSF proposal first called for putting ten GPUs into each FIONA. “But we decided eight is probably optimal, eight of these front line game GPUs and we are deploying 32 of those FIONAs in this new grant across PRP to these researchers, and because they are all connected at 10 gigabits/s we can essentially treat them as a cloud,” says Smarr. There are ten campuses initially participating: UC San Diego, UC Berkeley, UC Irvine, UC Riverside, UC Santa Cruz, UC Merced, San Diego State University, Caltech, Stanford, and Montana State University (a brief summary of researchers and their intended focus by campus, taken from the grant proposal, appears at the end of the article).

As shown in the cost comparison below, the premium for high-end GPUs such as the Nvidia P100 is dramatic. The CHASE-CI plan is to stick with commodity gaming GPUs, like the Nvidia 1080, since they are used in large volumes, which keeps the prices down and the improvements coming. Nevertheless, Smarr and DeFanti emphasize they are vendor agnostic and that other vendors have expressed interest in the program.

“Every year Nvidia comes out with a new set of devices and then halfway through the year they come out with an accelerated version, so in some sense you are on a six-month cycle. The game cards are around $600 and every year the [cost performance] gets better,” says DeFanti. “We buy what’s available, build, test, and benchmark and then we wait for the next round. [Notably], people in the community do have different needs – some need more memory, some would rather have twice as many $250 GPUs because they are really fast and just have less memory. So it is really kind of custom and there’s some negotiation with users, but they all fit in the same chassis.”
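To illustrate the premium in concrete terms, cost per peak single-precision TFLOPS can be compared directly. The figures below are approximate, launch-era assumptions for a GTX 1080-class gaming card and a P100-class datacenter card, not numbers from the article:

```python
# Approximate launch-era figures (illustrative assumptions, not article data):
gaming = {"price_usd": 600, "fp32_tflops": 8.9}       # GTX 1080-class card
datacenter = {"price_usd": 6000, "fp32_tflops": 9.3}  # P100-class card

def usd_per_tflops(gpu):
    """Dollars per peak single-precision TFLOPS."""
    return gpu["price_usd"] / gpu["fp32_tflops"]

gaming_cost = usd_per_tflops(gaming)          # on the order of $70/TFLOPS
datacenter_cost = usd_per_tflops(datacenter)  # on the order of $650/TFLOPS
```

On these rough numbers the gap is nearly an order of magnitude, which is why a 32-bit-tolerant workload like neural-net training makes gaming cards so attractive; the datacenter parts earn their premium through ECC, double-precision throughput, and NVLink-class interconnects that CHASE-CI’s target workloads largely do not need.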

DeFanti argues the practice of simulating networks on CPUs has slowed machine learning’s growth. “Machine learning involves looking at gigabytes of data, millions of pictures, and basically doing that is a brute force calculation that works fine in 32 bit, nobody uses 64 bit. You chew on these things literally for a week even on a GPU, which is much faster than a CPU for these kinds of things. That’s a reason why this field was sort of sluggish just using simulators; it took too much time on desktop CPUs. The first phase of this revolution is getting everybody off simulators.”
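The 32-bit-versus-64-bit trade-off Smarr and DeFanti describe is easy to see in practice. A minimal NumPy sketch (illustrative only, not code from the project) shows why single precision halves the memory traffic for exactly the kind of brute-force matrix arithmetic that dominates training:

```python
import numpy as np

# A toy "layer": multiply a batch of inputs by a weight matrix,
# the core operation in neural network training.
batch = np.random.rand(4096, 1024)
weights = np.random.rand(1024, 1024)

for dtype in (np.float32, np.float64):
    x = batch.astype(dtype)
    w = weights.astype(dtype)
    y = x @ w  # works identically in either precision
    # float32 halves the bytes moved per batch -- the property
    # that lets cheaper gaming GPUs keep up on ML workloads
    print(dtype.__name__, "bytes per batch:", x.nbytes)
```

Since the noise floor of most training data exceeds float32 rounding error, the cheaper precision loses nothing that matters to the result.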

That said, getting high performance GPUs into the hands of researchers and students is only half of the machine learning story. Training is hard and compute-intensive and can take weeks to months depending upon the problem. But once a network is trained, the compute power required for the inference engine is considerably less. Power consumption becomes the challenge, particularly because these trained networks are expected to be widely deployed on mobile platforms.

Here, CHASE-CI is examining a wide range of device types and architectures. Calit2, for example, has been working with IBM’s neuromorphic TrueNorth chip for a couple of years. It also had a strong role in helping KnuEdge develop its DSP-based neural net chip. (KnuEdge, of course, was founded by former NASA administrator Daniel Goldin.) FPGAs also show promise.

Says Smarr, “They have got to be very energy efficient. You have this whole new generation of largely non von Neumann architectures that are capable of executing these machine learning algorithms on say streams of video data, radar data, LIDAR data, things like that, that make decisions in real time like approval on credit cards. We are building up a set of these different architectures – von Neumann and non von Neumann – and making those available to these 30 machine learning experts.”

CHASE-CI is also digging into the needed software ecosystem to support machine learning. The grant states “representative algorithms from each of the following six families will be selected to form a standardized core suite of ‘component algorithms’ that have been tuned for optimal performance on the component coprocessors.” Here they are:

  • Deep Neural Network (DNN) and Recurrent Neural Network (RNN) algorithms, including layered networks having fully-connected and convolutional layers (CNNs), variational autoencoders (VAEs), and generative adversarial networks (GANs). Training will be done using modern approaches to backpropagation, stochastic sampling, bootstrap sampling, and restricted (and unrestricted) Boltzmann ML. NNs provide powerful classification and detection performance and can automatically extract a hierarchy of features.
  • Reinforcement Learning (RL) algorithms and related approximate Markov decision process (MDP) algorithms. RL and inverse-RL algorithms have critical applications in areas of dynamic decision-making, robotics and human/robotic transfer learning.
  • Variational Autoencoder (VAE) and Markov Chain Monte Carlo (MCMC) stochastic sampling algorithms supporting the training of generative models and metrics for evaluating the quality of generative model algorithms. Stochastic sampling algorithms are appropriate for training generative models on analog and digital spiking neurons. Novel metrics are appropriate.
  • Support Vector Machine (SVM) algorithms. SVMs can perform an inner product and thresholding in a high-dimensional feature space via use of the “kernel trick”.
  • Sparse Signal Processing (SSP) algorithms for sparse signal processing and compressive sensing, including Sparse Bayesian Learning (SBL). Algorithms which exploit source sparsity, possibly using learned over-complete dictionaries, are very important in domains such as medical image processing and brain-computing interfacing.
  • Latent Variable Analysis (LVA) algorithms for source separation, such as PCA, ICA, and IVA. LV models typically assume that a solution exists in some latent variable space within which the components are statistically independent. This class of algorithms includes factor analysis (FA) and non-negative matrix factorization (NMF) algorithms.
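As one concrete instance of the families above, the “kernel trick” mentioned in the SVM entry can be sketched in a few lines of scikit-learn (an illustrative example, not code from the CHASE-CI suite):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel computes inner products in an implicit high-dimensional
# feature space -- the "kernel trick" -- so a linear separator there
# yields a nonlinear decision boundary back in 2-D.
clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```

The same thresholded-inner-product structure is what makes SVMs a natural fit for the coprocessor benchmarking the grant describes.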

Despite such ambitious hardware and software goals, Smarr suggests early results from CHASE-CI should be available sometime in the fall. Don’t get the wrong idea, he cautions. CHASE-CI is a research project for computer science not a production platform.

“We are not trying to be a big production site or anything else. But we are trying to really explore, as this field develops, not just the hardware platforms we’ve talked about but the software and algorithm issues. There’s a whole bunch of different modes of machine learning and statistical analysis that we are trying to match between the algorithms and the architectures, both von Neumann and non von Neumann.

“For 30 years I have been sort of collecting architectures and mapping a wide swath of algorithms on them to port applications. Here we are doing it again but now for machine learning.”

Sample List of CHASE-CI Researchers and Area of Work*

  • UC San Diego: Ravi Ramamoorthi (ML for processing light field imagery), Manmohan Chandraker (3D scene reconstruction), Arun Kumar (deep learning for database systems), Rajesh Gupta (accelerator-centric SOCs), Gary Cottrell (comp. cognitive neuroscience & computer vision), Nuno Vasconcelos (computer vision & ML), Todd Hylton (contextual robotics), Jurgen Schulze (VR), Ken Kreutz-Delgado (ML and NvN), Larry Smarr (ML and microbiome), Tajana Rosing (energy efficiency of running MLs on NvNs), Falko Kuester (ML on NvNs in drones).
  • UC Berkeley: James Demmel (CA algorithms), Trevor Darrell (ML libraries)
  • UC Irvine: Padhraic Smyth (ML for biomedicine and climate science), Jeffrey Krichmar (computational neuroscience), Nikil Dutt (FPGAs), Anima Anandkumar (ML)
  • UC Riverside: Walid Najjar (FPGAs), Amit Roy-Chowdhury (image and video analysis)
  • UC Santa Cruz: Dimitris Achlioptas (linear layers and random bipartite graphs), Lise Getoor (large-scale graph processing), Ramakrishna Akella (multi-modal prediction and retrieval), Shawfeng Dong (Bayesian deep learning and CNN models in astronomy)
  • UC Merced: Yang Quan Chen (agricultural drones)
  • San Diego State: Baris Aksanli (adaptive learning for historical data)
  • Caltech: Yisong Yue (scalable deep learning methods for complex prediction settings)
  • Stanford: Anshul Kundaje (ML & genetics), Ron Dror (structure-based drug design using ML)
  • Montana State: John Sheppard (ML and probabilistic methods to solve large systems problems)

*  Excerpted from the CHASE-CI grant proposal

Images courtesy of Larry Smarr, Calit2

The post NSF Project Sets Up First Machine Learning Cyberinfrastructure – CHASE-CI appeared first on HPCwire.

DARPA Continues Investment in Post-Moore’s Technologies

Mon, 07/24/2017 - 10:58

The U.S. military long ago ceded dominance in electronics innovation to Silicon Valley, the DoD-backed powerhouse that has driven microelectronics development for decades. With Moore’s Law clearly running out of steam, the Defense Advanced Research Projects Agency (DARPA) is attempting to reinvigorate and leverage a vibrant domestic chip sector with a $200 million initiative designed among other things to push the boundaries of chip architectures like GPUs.

DARPA recently announced that its Electronics Resurgence Initiative seeks to move beyond Moore’s Law chip scaling. Among the new fronts to be opened by the defense agency are extending GPU frameworks that underlie machine-learning tools to develop “reconfigurable physical structures that adjust to the needs of the software they support.”

While it remains unclear how enterprises might benefit directly from the chip initiative overseen by DARPA’s Microsystems Technology Office, the agency does have a reputation dating back to the earliest days of the Internet for funding high-risk technology R&D that eventually makes its way into the commercial sector.

The DARPA effort also attempts to lay the groundwork for a post Moore’s Law era where, according to the agency, research will focus on “integrating different semiconductor materials on individual chips, ‘sticky logic’ devices that combine processing and memory functions and vertical rather than only planar integration of microsystem components.”

As the focus of chip technology zeroes in on data driven enterprise applications, DARPA said it would cast a wider net to harness semiconductor innovation that would lead to a post-Moore’s Law generation of microelectronic systems benefitting military and commercial users.

The effort runs in parallel with recent attempts by DoD to tap into the sustained burst of technology and development innovation in Silicon Valley. As the technology entrepreneur Steve Blank has documented, the 20th century electronics explosion was initially funded by the U.S military beginning as early as World War II, continuing throughout the Cold War confrontation with the former Soviet Union.

The DARPA effort primarily seeks to establish new development models that go beyond chip scaling. “We need to break away from tradition and embrace the kinds of innovations that the new initiative is all about,” emphasized William Chappell, director of DARPA’s Microsystems Technology Office. The program will “embrace progress through circuit specialization and to wrangle the complexity of the next phase of advances, which will have broad implications on both commercial and national defense interests,” Chappell added.

The post-Moore’s Law research effort will complement the recently created Joint University Microelectronics Program (JUMP), a research effort in basic electronics being co-funded by DARPA and Semiconductor Research Corporation (SRC), an industry consortium based in Durham, N.C. Among the chip makers contributing to JUMP are IBM, Intel Corp., Micron Technology and Taiwan Semiconductor Manufacturing Co.

SRC members and DARPA are expected to kick in more than $150 million for the five-year project. Focus areas include high-frequency sensor networks, distributed and cognitive computing along with “intelligent memory and storage.”

As DARPA continues to invest in device technology, it is also attempting to leverage what Chappell calls the “software-defined world.” The agency sees virtualization and other software technologies as one way of addressing skyrocketing weapons costs. Hence, the agency is also investing more research funding in areas such as algorithm development and circuit design for applications such as dynamic spectrum sharing, a capability that would allow the military to squeeze more capacity out of crowded electromagnetic spectrum.

The post DARPA Continues Investment in Post-Moore’s Technologies appeared first on HPCwire.

PPPL Researchers Simulate Impact of Recycled Atoms on Plasma Turbulence

Mon, 07/24/2017 - 10:04

PRINCETON, N.J., July 24, 2017 — Turbulence, the violently unruly disturbance of plasma, can prevent plasma from growing hot enough to fuel fusion reactions. The impact on turbulence of atoms recycled from the walls of the tokamaks that confine the plasma has long puzzled researchers. These atoms are neutral, meaning that they have no charge and are thus unaffected by the tokamak’s magnetic field or plasma turbulence, unlike the electrons and ions — or atomic nuclei — in the plasma. Yet experiments have suggested that the neutral atoms may be significantly enhancing the edge plasma turbulence, hence the theoretical interest in their effects.

In the first basic-physics attempt to study the atoms’ impact, physicists at the U.S. Department of Energy’s (DOE) Princeton Plasma Physics Laboratory (PPPL) have modeled how the recycled neutrals, which arise when hot plasma strikes a tokamak’s walls, increase turbulence driven by what is called the “ion temperature gradient” (ITG). This gradient is present at the edge of a fusion plasma in tokamaks and represents the transition from the hot core of the plasma to the colder boundary adjacent to the surrounding material surfaces.

Extreme-scale computer code

Researchers used the extreme-scale XGC1 kinetic code to achieve the simulation, which represented the first step in exploring the overall conditions created by recycled neutrals. “Simulating plasma turbulence in the edge region is quite difficult,” said physicist Daren Stotler. “Development of the XGC1 code enabled us to incorporate basic neutral particle physics into kinetic computer calculations, in multiscale, with microscopic turbulence and macroscale background dynamics,” he said. “This wasn’t previously possible.”

The results, reported in the journal Nuclear Fusion in July, showed that neutral atoms enhance ITG turbulence in two ways:

  • First, they cool plasma in the pedestal, or transport barrier, at the edge of the plasma and thereby increase the ITG gradient.
  • Next, they reduce the sheared, or differing, rates of plasma rotation. Sheared rotation lessens turbulence and helps stabilize fusion plasmas.

Comparison with experiments

Going forward, researchers plan to compare results of their model with experimental observations, a task that will require more complete simulations that include other turbulence modes. Findings could lead to improved understanding of the transition of plasmas from low confinement to high confinement, or H-mode — the mode in which future tokamaks are expected to operate. Researchers generally consider lower recycling, and hence fewer neutrals, as conducive to H-mode operation. This work may also lead to a better understanding of the plasma performance in ITER, the international fusion facility under construction in France, in which the neutral recycling may differ from that observed in existing tokamaks.

This research was performed under the supervision of PPPL physicist C.S. Chang. Modeling was done on two DOE Office of Science User Facilities: the Titan supercomputer at the Oak Ridge Leadership Computing Facility and the Edison supercomputer at the National Energy Research Scientific Computing Center, with joint support from the DOE Offices of Advanced Scientific Computing Research and Fusion Energy Sciences.

About PPPL

PPPL, on Princeton University’s Forrestal Campus in Plainsboro, N.J., is devoted to creating new knowledge about the physics of plasmas — ultra-hot, charged gases — and to developing practical solutions for the creation of fusion energy. The Laboratory is managed by the University for the U.S. Department of Energy’s Office of Science, which is the largest single supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

Source: PPPL

The post PPPL Researchers Simulate Impact of Recycled Atoms on Plasma Turbulence appeared first on HPCwire.

Making Cloud Bursting a Reality

Mon, 07/24/2017 - 01:01

To stay ahead of the competition, businesses today must run increasingly intricate modeling and simulation algorithms and perform more sophisticated analysis on larger datasets. And they need the results of those compute jobs faster than ever before. To accomplish this requires access to unprecedented amounts of high performance computing (HPC) capacity.

Unfortunately, most companies find there are a few significant obstacles in the way. To start, data center square footage and the amount of power and cooling that is available are fixed, preventing many companies from adding the capacity they need quickly or easily. And most companies do not have the budget or staff to install and maintain the additional systems that might only be needed sporadically to meet the compute demands of a single workload or during the early stages of a new project.

Cycle Computing has a solution to these problems. Teaming with Dell EMC and Intel, Cycle Computing complements on-premises HPC capacity by offering seamless cloud bursting capabilities to meet today’s growing and unpredictable HPC demands.

The basic idea is to make optimal use of the space, power, and cooling that is available by filling the data center with the most powerful Dell EMC HPC systems that use the newest generation Intel processors, as well as fast storage and interconnect technologies. And then when additional compute capacity is needed, give companies an easy way to run their jobs externally on cloud compute services like Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure.

This approach provides two main benefits. First, the most demanding jobs — the ones that need the fastest execution times and require the highest performance cores, memory, and infrastructure – run in-house. When a job requires more capacity than that which is available on-premises, Cycle Computing provides a way for that job to run seamlessly on a cloud service.

Second, to ensure the premium HPC capability is used to maximum efficiency, companies can off-load less demanding workflows to compute cloud services. In this way, a job that has a greater impact on the business gets higher priority on the in-house HPC systems and does not get stuck in execution queues.

Bringing the right elements together

Rather than having one set of software that runs on-site and another that runs in the cloud, Cycle’s approach is more of an infrastructure extension. “Companies need a way to migrate internal workloads to an external cloud without changing code,” said Jason Stowe, CEO at Cycle Computing.

Working with Dell EMC and Intel, Cycle helps companies make use of best-of-breed equipment to run their most demanding workloads on-premises. It then gives them essentially an infrastructure extension to complement internal clusters with cloud instances. “When you have infrastructure extension off of a local cluster, it provides higher capacity when needed and keeps queue waits short so you no longer have to wait to run a job.”

CycleCloud bursting capabilities are available from Dell EMC. The goal is to make efficient use of on-site capacity and ensure critical jobs get to run on the most powerful systems. “If you own a Maserati, you can certainly use it to take the kids to school and do other chores,” said Stowe. “But when you need it to race, you want to be sure it is available.”

This is accomplished using Cycle Computing’s CycleCloud, which is designed to enable enterprise IT organizations to provide secure and flexible cloud HPC and Big Compute environments to their end users.

The solution has workflow features that allow secure, company-controlled, yet easy access to cloud resources. Some of the key features include job submission, monitoring, and administration; scalability from ones to thousands of instances; dynamic scaling of Big Data, Big Compute, and HPC workloads based on work queues; and SubmitOnce™ technology for seamless submission to internal or cloud resources.

The newest release of CycleCloud lets companies easily set cost alerts on a per-cluster basis. Companies can set the alert to be dollars per day or per month. This gives organizations a great way to manage consumption and assure that users are not blowing through budgets.

Cycle Computing’s cloud bursting technology is helping companies in financial services, oil and gas, life sciences, and other fields make the transition to this new mode of computing. The common traits of such organizations are that they want to speed time to market and time to results, but they are limited by access to compute capacity.

In some cases, the solution allowed organizations to carry out work on a single project – work that would not have been possible with existing HPC resources.

That was the case with a University of Arizona study of pharmacological treatments for pain, which includes research that uses protein binding to develop possible treatments. Using CycleCloud software to manage a 5,000 core Google Cloud Preemptible VM cluster running Schrödinger® Glide™ enabled research that scientists never thought possible. The cluster was used to run 20,000 hours of docking computations in four hours for $192. Past work was only able to simulate 50,000 compounds, which yielded a grand total of four possible hits.

Using the cloud cluster, the researcher was able to simulate binding of a million compounds in just a few hours. From that million, 600 were hits.
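The economics reported above follow directly from the cluster size and runtime. A back-of-the-envelope check using the article’s own figures (the per-core-hour rate is inferred here, not quoted by Cycle or Google):

```python
cores = 5_000     # Google Cloud Preemptible VM cluster size
wall_hours = 4    # reported wall-clock time
cost_usd = 192    # reported total cost

# Total compute delivered: matches the "20,000 hours of docking
# computations" figure in the article.
core_hours = cores * wall_hours
print("core-hours:", core_hours)

# Implied preemptible rate, rounded to four decimal places.
print("USD per core-hour:", round(cost_usd / core_hours, 4))
```

At well under a cent per core-hour, the preemptible pricing is what makes a one-off 5,000-core burst cheaper than a single workstation.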

In another example, the HyperXite team from the University of California used CycleCloud when it competed in the SpaceX Hyperloop Pod competition. The HyperXite team is studying the fluid dynamics of the fuselage of its new vehicle. HyperXite has optimized the suspension, levitation, and braking systems of a model and has gone through many design changes to reduce drag and lift, and to minimize mass and maximize speed. Typically, a full simulation requires over 5000 CPU-hours. The team leveraged CycleCloud to run ANSYS Fluent™ on Microsoft Azure Big Compute to complete their iterations in 48 hours, enabling them to get results fast enough to make modifications to the design then rerun the simulations until they were able to converge on a final solution. All for less than $600 in simulation costs.

Beyond providing HPC capacity for a one-time project, CycleCloud lets companies integrate cloud bursting into their everyday workflows. Quantifiable business results obtained by some key users bear this out. They include:

Bringing products to market faster: Using the large scale afforded by the cloud, Western Digital increased the amount of simulation done before designs are physically prototyped. This saves time and money: simulations are completed in 7 hours instead of 30 days, with $2.9M TCO savings compared to purchasing hardware for in-house computation.

Improve business processes: After using the cloud for its Federal Reserve stress test, one insurance company began using cloud for month-end risk analysis reports. With a dedicated cluster 4 times larger than the internal resource, they shortened the report run time from 20 days to four. Eventually, they began running their daily reports in the cloud, increasing the resolution and frequency of the modeling and achieving results that more than offset the increased cost.

Enable worldwide expansion: Thermo Fisher Scientific’s Ion Reporter platform is used around the world to investigate DNA variation. Using CycleCloud to manage provisioning and configuration means the Thermo Fisher team can focus on developing their software instead of managing the infrastructure. Adding a China-based offering was easy: just copy the configuration and launch a new cluster in the China region.


Companies can get CycleCloud from Dell EMC. This cloud bursting solution lets companies maximize the use of their production HPC systems, allowing access to extra capacity when needed without having to expand their data centers or incur CAPEX and OPEX costs for systems that might only be used sparingly.

For more information about providing HPC environments for today’s demanding workloads, visit: http://www.dell.com/hpc


The post Making Cloud Bursting a Reality appeared first on HPCwire.

Supermicro Schedules Call for Fourth Quarter, Fiscal 2017 Financial Results

Fri, 07/21/2017 - 09:10

SAN JOSE, Calif., July 21, 2017 — Super Micro Computer, Inc. (NASDAQ: SMCI), a global leader in high-performance, high-efficiency server, storage technology and green computing, today provided preliminary information regarding its financial results for the fourth quarter ended June 30, 2017.

The Company also announced it will release fourth quarter and fiscal 2017 financial results on Thursday, August 3, 2017, immediately after the close of regular trading, followed by a teleconference beginning at 2:00 p.m. (Pacific Time).

The Company now anticipates it will report revenue for its fourth quarter of fiscal 2017 in the range of $712 million to $717 million. This compares to the Company’s previous guidance range of $655 million to $715 million. Revenue exceeded expectations primarily due to stronger sales in Asia and to storage customers, with strong shipments accelerating late in the quarter.

The Company also anticipates its non-GAAP earnings per diluted share will be in a range of $0.35 to $0.37. This includes an estimated negative impact of $0.09 related to three major items: expiring customer agreements with unfavorable DRAM and SSD pricing; urgent new projects with unanticipated R&D expense from major partners with NRE to be recovered in later quarters; and tax impact due to our global corporate tax structure. This compares to the Company’s previous guidance of $0.40 to $0.50.

“While Supermicro exceeded revenue expectations with record high revenues due to growth in our Asia business as well as new storage customers, earnings were lower than forecast. We were negatively impacted by DRAM and SSD price increases under a number of expiring fixed-price long-term customer agreements. Operating expenses were also higher due to urgent new project R&D expense with major partners for development of programs such as AI and autonomous driving. We will provide more detail on the fourth quarter at the time of our earnings call,” said Charles Liang, Chairman and Chief Executive Officer. “Notwithstanding these transitory operational impacts, Supermicro continues to expand its customer base and grow market share. With over 100 SKUs launched for Skylake and NVMe Storage solutions, Supermicro is prepared for strong growth in the upcoming technology cycle. At this time, with the Company’s strong pipeline of business exiting the fourth quarter, we expect that the September quarter will exceed the seasonally adjusted revenue that we typically expect.”

Conference Call/Webcast Information for August 3, 2017

Supermicro will hold a teleconference to announce its fourth quarter and fiscal 2017 financial results on Thursday, August 3, 2017, beginning at 2:00 p.m. (Pacific Time). Those wishing to participate in the conference call should dial 1-888-352-6793 (International callers dial 1-719-325-4753) a few minutes prior to the call’s start to register. The conference ID is 7567416. A replay of the call will be available through 11:59 p.m. (Eastern Time) on Thursday, August 17, 2017, by dialing 1-844-512-2921 (International callers dial 1-412-317-6671) and entering replay PIN 7567416.

Those wishing to access the live or archived webcast via the Internet should go to the Investor Relations tab of the Supermicro website at www.Supermicro.com.

About Super Micro Computer, Inc.

Supermicro, a global leader in high-performance, high-efficiency server technology and innovation is a premier provider of end-to-end green computing solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, HPC and Embedded Systems worldwide. Supermicro’s advanced Server Building Block Solutions offer a vast array of components for building energy-efficient, application-optimized, computing solutions. Architecture innovations include Twin, TwinPro, FatTwin, Ultra Series, MicroCloud, MicroBlade, SuperBlade, Double-sided Storage, Battery Backup Power (BBP) modules and WIO/UIO.

Source: Supermicro

The post Supermicro Schedules Call for Fourth Quarter, Fiscal 2017 Financial Results appeared first on HPCwire.

Graphcore Readies Launch of 16nm Colossus-IPU Chip

Thu, 07/20/2017 - 20:20

A second $30 million funding round for U.K. AI chip developer Graphcore sets up the company to go to market with its “intelligent processing unit” (IPU) in 2017 with scale-up production for enterprise datacenters and cloud environments planned in 2018. The company emerged from stealth mode in the fall of 2016 with $30 million in series A funding. Total raised capital now stands at $60 million.

The funding infusion will facilitate the launch of Graphcore’s first chip, codenamed Colossus. The 16nm massively parallel, mixed-precision floating point processor is expected to ship to early-access customers before the end of 2017, with general availability to follow in early 2018.

Presenting at the Research and Applied AI Summit (RAAIS) on June 30, 2017, Graphcore CTO Simon Knowles shared high-level details of the architecture, captured in this slide (and first reported on by EE News Europe):

“[With 1,000 processor units per chip], it’s designed from the ground up for machine intelligence,” said Knowles. “It’s nothing like a GPU; it’s a very large state-of-the-art 16nm FinFET chip and we put two of those on a PCI Express card and bolt them together.

“It’s designed to be clustered into arrays but [for] our first product we put two on a PCI Express card and that’s equivalent to buying a GPU card. You can plug it into the same socket and you can run the same software that you’ve written in TensorFlow or something like that. It will just run faster; same power budget, same physical form factor.”

Knowles emphasized that Graphcore is not building a machine specifically for neural networks. “Neural networks are clearly important but it has to think more fundamentally [to have long-term utility],” he said, presenting the following listing of desirable attributes:

Knowles claims performance will be “well beyond the Nvidia Volta and well beyond the Google TPU2 card.”

Graphcore is also working on establishing a community of developers and partners for its Poplar graph framework software, which provides an interface to multiple machine learning frameworks, including Tensorflow, MxNet, Caffe2 and PyTorch.

When the company debuted last year, Graphcore CEO Nigel Toon told us he sees AI as an opportunity for disruption in the silicon status quo much like mobile was — notably for ARM — but with potentially an even larger market. Research firm Tractica forecasts that AI will grow to tens of billions of dollars a year in less than a decade.

In addition to its planned launch and software ecosystem development strategies, Graphcore already has its sights set on securing a 2018 funding round and future growth opportunities. The company intends to more than double its 60-person workforce over the next year.

The series B funding round, announced today (July 20), is led by London-based venture capital firm Atomico. The money comes out of the $765 million Atomico IV venture fund, one of the largest VC funds ever raised in Europe. Returning investors include Amadeus Capital, Robert Bosch Venture Capital, C4 Ventures, Dell Technologies Capital, Draper Esprit, Foundation Capital, Pitango and Samsung Catalyst Fund.

Notable solo investors include new Uber chief scientist Zoubin Ghahramani; cofounder of London-based machine learning AI startup DeepMind, Demis Hassabis; and UC Berkeley professor and OpenAI researcher Pieter Abbeel. Cofounders of the non-profit AI research firm OpenAI, Greg Brockman and Ilya Sutskever, are also investing.

Watch Simon Knowles’ entire RAAIS talk below:

The post Graphcore Readies Launch of 16nm Colossus-IPU Chip appeared first on HPCwire.

Buchanan Named Deputy for Science and Technology at ORNL

Thu, 07/20/2017 - 14:27

OAK RIDGE, Tenn., July 20, 2017 — Michelle Buchanan, an accomplished scientific leader and researcher, has been appointed Deputy for Science and Technology at the Department of Energy’s Oak Ridge National Laboratory by new Lab Director Thomas Zacharia. Her appointment is effective Oct. 1, 2017.

“Dr. Buchanan’s research accomplishments, programmatic expertise, and reputation for achievement support ORNL’s role as a premier research institution that provides scientific expertise and breakthroughs that are critical to national priorities in energy, industry, and national security,” said Zacharia, who served in the deputy’s position until becoming lab director on July 1.

Buchanan has been Associate Laboratory Director for Physical Sciences since 2004, with responsibilities including the lab’s Chemical Sciences, Physics, and Materials Science and Technology divisions, as well as its Center for Nanophase Materials Sciences, a DOE Office of Science user facility. The lab will conduct an international search for her replacement.

As Deputy for S&T, Buchanan’s responsibilities will cover the range of ORNL research—computing and computational sciences, neutron science, nuclear science and engineering, the physical sciences, energy and environmental science, and national security—as well as the lab’s leadership role in U.S. ITER, the Exascale Computing Project, and ORNL research centers and institutes.

“The scientific challenges and impact of Oak Ridge’s research has compelled me for many years,” said Buchanan, who came to the lab as a chemist in 1978. “It is a great privilege to be entrusted with shaping our future as a laboratory. My focus will be on strengthening collaborations across our diverse disciplines and promoting scientific achievement among ORNL staff, as well as the world-leading scientists who use ORNL facilities and benefit from our expertise.”

Buchanan is a fellow of the American Chemical Society and the American Association for the Advancement of Science. She has written or contributed to more than 100 scientific publications and reports, holds two patents, edited a book on Fourier transform mass spectrometry, and worked extensively at the national level helping shape research directions for DOE as well as the National Science Foundation. She has held multiple positions in the American Chemical Society and the American Society for Mass Spectrometry. She is currently a member of the Board on Chemical Sciences and Technology, National Academy of Sciences, and serves on advisory boards for the University of Wisconsin-Madison Department of Chemistry, the University of Tennessee Department of Chemistry, the National Science Foundation Advisory Committee for Environmental Research and Education, and the Georgia Institute of Technology Southeastern Nanotechnology Infrastructure Corridor (SENIC). Her stature in the research community has made her an effective advocate for increased opportunities for women, girls, and other underrepresented groups in STEM-based careers.

Buchanan earned her bachelor’s degree in chemistry from the University of Kansas and her doctorate in chemistry from the University of Wisconsin-Madison. Her research focused on the development of mass spectrometry for trace detection of materials related to energy, health, and the environment for multiple DOE offices and other federal agencies.

UT-Battelle manages ORNL for DOE’s Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit http://science.energy.gov/.

Source: ORNL

The post Buchanan Named Deputy for Science and Technology at ORNL appeared first on HPCwire.

Fine-Tuning Severe Hail Forecasting with Machine Learning

Thu, 07/20/2017 - 12:55

If you've ever been caught outside during a severe hail storm, the sight of greenish-tinted clouds on the horizon probably causes serious knots in the pit of your stomach; if you haven't, it should at least give you pause. There's good reason for that instinctive reaction. Just consider that a massive hail storm that battered the Denver metro area with golf ball-size hail on May 8, 2017, is expected to result in more than 150,000 car insurance claims and an estimated $1.4 billion in damage to property in and around Denver. The fact is that even in 2017, emergency responders, airports and everyone else going about their business must gamble with forecast uncertainties about hail. So how great would it be if you could get accurate warnings highlighting the path of severe hail storms, along with expected hail size, 1–3 hours before a storm passes through?

If the Severe Hail Analysis and Prediction (SHARP) project, which is funded through a grant from the National Science Foundation (NSF), accomplishes its goal of developing an accurate “warn-on-forecast” model for severe hail storms, this could happen in the next five to 10 years. Of course, there is a lot of scientific work to be done in the meantime, along with a need for significantly more computing power.

A two-pronged approach to hail research

The Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma (OU) undertook the SHARP project in 2014 after hypothesizing that hail representation in numerical weather prediction (NWP) models, which mathematically model atmospheric physics to predict storms, could be improved by assimilating data from a host of data sources, and that advanced data-mining techniques could improve predictions of hail size and coverage.

Nathan Snook and Amy McGovern, two of the co-principal investigators on the project, say that CAPS pursues its hail research on two fronts. On one front, large amounts of data from various weather observing systems are ingested into weather models to create very high resolution forecasts. The other uses machine learning to sift through weather model output from CAPS and the National Center for Atmospheric Research to discover new knowledge hidden in large data sets and apply post-model calibration corrections that produce more skillful forecasts. For nearly four years, these projects have relied on the Texas Advanced Computing Center's (TACC) Stampede system, an important part of NSF's portfolio for advanced computing infrastructure that enables cutting-edge foundational research for computational and data-intensive science and engineering.
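That post-model calibration step can be sketched in miniature. The snippet below fits a one-feature logistic model mapping a raw forecast diagnostic to a hail probability; it is a toy stand-in for the tree-ensemble methods the CAPS team actually uses, and every number in it is invented for illustration.

```python
# Toy post-model calibration: map a raw model diagnostic (e.g. a scaled
# updraft strength) to a severe-hail probability with one-feature logistic
# regression. A stand-in for the tree-ensemble methods used in practice;
# the training pairs below are synthetic.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, epochs=2000):
    """Fit p(hail) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * xi + b) - yi) * xi for xi, yi in zip(x, y)) / n
        grad_b = sum(sigmoid(w * xi + b) - yi for xi, yi in zip(x, y)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic training pairs: (scaled diagnostic, severe hail observed?)
x = [0.1, 0.3, 0.5, 1.2, 1.8, 2.5, 3.0, 3.6]
y = [0,   0,   0,   0,   1,   1,   1,   1]

w, b = fit_logistic(x, y)
print(f"calibrated p(hail) at diagnostic 3.0: {sigmoid(w * 3.0 + b):.2f}")
```

The real work replaces the single feature with many model fields and the logistic fit with random forests or gradient-boosted trees, but the shape of the problem — learn a correction from forecast output to observed outcomes — is the same.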

The high-resolution modeling is currently post-event and area specific, while the machine learning analysis is done in real time on a nationwide basis. The reason for the difference comes down to workload sizes. “For the high-resolution work, we use data from interesting historical cases to try to accurately predict the size and scope of hail that passes through a specific area,” explains Snook, a CAPS research scientist who focuses on the warn-on-forecast work. “We deal with 1 to 5 TB of data for each case study that we run, and run different experiments on different days, so our computing demands are enormous and the currently available resources simply aren’t powerful enough for real-time analysis.”

McGovern, an associate professor of computer science and adjunct associate professor in the School of Meteorology at OU, says that although the machine learning algorithms are computationally intensive to train, it’s no problem to run them in real time because they are at a much coarser resolution than the data sets that Snook’s team uses (3km vs. 500m) and require fewer resources. “Our resource challenges are mainly around having enough storage and bandwidth to transfer all of the data we need on a daily basis…the data sets come from all over the U.S. and they are quite large, so there are a lot of I/O challenges,” explains McGovern.

Both research efforts rely heavily on data from the NOAA Hazardous Weather Testbed (HWT) to support their experiments. “The HWT gathers a wealth of numerical forecast data by collecting forecasts from various research institutions for about five to six weeks every spring. We use that data for a lot of our high-resolution experiments as well as for our machine learning. It’s ideal for the machine learning work because it’s a big data set that is relatively stable from year to year,” says Snook.

Chipping away at high-resolution, real time forecasts

CAPS primarily uses two models for its high-resolution research: the Weather Research and Forecasting (WRF) model, a widely used mesoscale numerical weather prediction system, and an in-house model called the Advanced Regional Prediction System (ARPS). Snook says ARPS is also tuned for mesoscale weather analysis and is quite effective at efficiently assimilating radar and surface observations from a lot of different sources. In fact, to achieve greater accuracy in its warn-on-forecast modeling research, the CAPS team uses models with grid points spaced every 500m, as opposed to the 3km spacing typical in many operational high-resolution models. CAPS made the six-fold increase to better support probabilistic 1-3 hour forecasts of the size of hail and the specific counties and cities it will impact. Snook notes that the National Weather Service is moving toward the use of mesoscale forecasts in severe weather operations and that his team’s progress so far has been promising. In several case studies, their high-resolution forecasts have skillfully predicted the path of individual hailstorms up to three hours in advance—one such case is shown in figure 1.

Figure 1: A comparison of radar-indicated hail versus a 0–90 minute CAPS hail forecast for a May 20, 2013 storm in Oklahoma (inset photo shows image of actual hail from the storm).
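The computational price of that six-fold refinement is easy to estimate. The back-of-the-envelope sketch below assumes the vertical levels stay fixed and that explicit time stepping ties the timestep to the grid spacing (the CFL condition) — both simplifying assumptions, not figures from the project.

```python
# Rough cost of refining a convection-allowing grid from 3 km to 500 m.
# Assumes unchanged vertical levels and an explicit scheme whose timestep
# must shrink in proportion to the grid spacing (CFL condition).

coarse_dx = 3000.0   # grid spacing, meters
fine_dx = 500.0      # grid spacing, meters

refinement = coarse_dx / fine_dx       # 6x finer in each horizontal direction
horizontal_factor = refinement ** 2    # 36x more grid columns
timestep_factor = refinement           # 6x more timesteps per forecast hour
total_factor = horizontal_factor * timestep_factor

print(f"{refinement:.0f}x finer grid -> roughly {total_factor:.0f}x the compute")
# roughly 216x
```

That two-orders-of-magnitude jump is why the 500m runs remain post-event case studies while the coarser 3km machine learning pipeline can run in real time.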

While the CAPS team is wrapping up the first phase of its research, Snook and his team have identified areas where they need to further improve their model, and are submitting a new proposal to fund additional work. “As you can imagine, we’re nowhere near the computing power needed to track every hailstone and raindrop, so we’re still dealing with a lot of uncertainty in any storm… We have to make bulk estimates about the types of particles that exist in a given model volume, so when you’re talking about simulating something like an individual thunderstorm, it’s easy to introduce small errors which can then quickly grow into large errors throughout the model domain,” explains Snook. “Our new focus is on improving the microphysics within the model—that is, the parameters the model uses to define precipitation, such as cloud water, hail, snow or rain. If we are successful at that, we could see a large improvement in the quality of hail forecasts.”

Going deeper into forecast data with machine learning

Unlike the current high-resolution research, CAPS runs the machine learning prediction portion of the project using near real-time daily forecast data from the various groups participating in the HWT. CAPS compares daily real-time forecast data against historical HWT data sets, using a variety of algorithms and techniques to tease out important information hidden in forecasted storms nationwide. “Although raw forecasts provide some value, they include a lot of additional information that’s not immediately accessible. Machine learning methods are better at predicting the probability and potential size, distribution and severity of hail 24 to 48 hours in advance,” explains McGovern. “We are trying to improve the predictions from what SPC [the Storm Prediction Center] and the current models do.”

Figure 2 is a good illustration of how the machine learning models have improved the prediction of events for a specific case study. The figure, which highlights storms reported in the southern plains on May 27, 2015, compares the predictions using three different methods:

• Machine learning (left)

• A single parameter from the models, currently used to estimate hail (middle)

• A state-of-the-art algorithm currently used to estimate hail size (right)

The green circles show a 25-mile (40 km) radius around hail reports from that day, and the pink shading shows the probability of severe hail as predicted by each method. Although the updraft helicity method (middle) gets the locations generally right, its probabilities are quite low. HAILCAST (right) overpredicts hail in the southeast while missing the main event in Oklahoma, Kansas, and Texas. The machine learning model (left) has the highest probabilities of hail exactly where it occurred. In general, this is a good example of how machine learning is now outperforming current prediction methods.
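The neighborhood check implied by those green circles — does a high forecast probability fall within 40 km of an observed hail report? — can be sketched as follows. The coordinates, probability threshold, and `forecast_hit` helper are all illustrative, not taken from the study.

```python
# Minimal neighborhood-verification sketch: a forecast "hits" if any grid
# point whose severe-hail probability exceeds a threshold lies within a
# fixed radius (here 40 km) of an observed hail report.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def forecast_hit(report, forecast_points, prob_threshold=0.5, radius_km=40.0):
    """True if any point over the threshold is within the radius of the report."""
    lat0, lon0 = report
    return any(
        prob >= prob_threshold and haversine_km(lat0, lon0, lat, lon) <= radius_km
        for lat, lon, prob in forecast_points
    )

# Illustrative report near Norman, OK, and two forecast grid points.
report = (35.22, -97.44)
points = [(35.35, -97.49, 0.7),   # ~15 km away, high probability -> a hit
          (33.00, -96.70, 0.8)]   # far away, doesn't count
print(forecast_hit(report, points))  # True: the nearby point qualifies
```

Aggregating hits, misses, and false alarms over many reports is what produces the skill comparisons shown in the figure.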

Currently, McGovern’s team is focusing on two aspects of hail forecasts: “First, we are working to get the machine learning methods into production in the Storm Prediction Center to support some high-resolution models they will be releasing. Second, we are improving the predictions by making use of the full 3D data available from the models,” explains McGovern.

Figure 2: A case study that shows the superior accuracy of the machine learning methods (left) compared to other methods.

A welcome resource boost

Snook says that the machine learning and high-resolution research efforts have each generated close to 100 TB of data to sift through, so access to ample computing resources is essential to continued progress. That’s why Snook and McGovern are looking forward to using TACC’s Stampede2 system, which began supporting early users in May and will be fully deployed to the research community later this summer. The new system from Dell includes 4,200 Intel Xeon Phi processors and 1,736 Intel Xeon processors, as well as an Intel Omni-Path Architecture fabric, a 10GigE/40GigE management network, and more than 600 TB of memory. It is expected to double the performance of the previous Stampede system, with a peak performance of up to 18 petaflops.

McGovern’s team also runs some of the machine learning work locally on the Schooner system at the OU Supercomputing Center for Education and Research (OSCER). Schooner, which includes a combination of Dell PowerEdge R430 and R730 nodes that are based on the Intel Xeon processor E5-2650 and E5-2670 product families as well as more than 450TB of storage, has a peak performance of 346.9 teraflops. “Schooner is a great resource for us because they allow ‘condo nodes’ so we can avoid lines and we also have our own disk that doesn’t get wiped every day,” says McGovern. “It’s also nice being able to collaborate directly with the HPC experts at OSCER.”

Between the two systems, Snook and McGovern expect to continue making steady progress on their research. That doesn’t mean real-time, high-resolution forecasts are right around the corner, however. “I hope that in five to 10 years, the warn-on-forecast approach becomes a reality, but it’s going to take a lot more research and computing power before we get there,” says Snook.

The post Fine-Tuning Severe Hail Forecasting with Machine Learning appeared first on HPCwire.