HPC Wire

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

Gartner Reveals the 2017 Hype Cycle for Data Management

Fri, 09/29/2017 - 11:41

STAMFORD, Conn., Sept. 29, 2017 — As data becomes ever more distributed across multiple systems, organizations have to cope with increasingly complex ecosystems and digital business requirements. The Gartner, Inc. Hype Cycle for Data Management helps CIOs, chief data officers (CDOs) and other senior data and analytics leaders understand the maturity of the data management technologies they are evaluating to provide a cohesive data management ecosystem in their organizations.

“Data management continues to be central to the move toward digital business. As requirements change within the architecture of the organization and place greater demands on underlying technology, the maturity and capability of many of the technologies highlighted in the Hype Cycle will advance rapidly,” said Donald Feinberg, vice president and distinguished analyst at Gartner. “Recent years have seen many new additions to the Hype Cycle, including in-memory, cloud, data virtualization, advanced analytics, data as a service, machine learning, graph, non-relational and Hadoop.”

Two technologies are of particular interest, in that they show the impact cloud computing is having on the data management discipline. Hadoop distributions are deemed to be obsolete before reaching the Plateau of Productivity because the complexity and questionable usefulness of the entire Hadoop stack is causing many organizations to reconsider its role in their information infrastructure. Instead, organizations are looking at increasingly competitive and convenient cloud-based options with on-demand pricing and fit-for-purpose data processing options.

As part of the same cloud-led trend, SQL interfaces to cloud object stores have appeared at the Innovation Trigger stage. “We expect these interfaces to represent the future of cloud database Platform as a Service (PaaS) and reach the Plateau within two to five years because they are the focus of most cloud vendors and products in this space,” said Mr. Feinberg. “They enable organizations to interact with data stored in the cloud, using a familiar SQL syntax. Object stores are well suited to storing large volumes of multistructured data, typical of data lakes.”
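
To make the pattern concrete, here is a minimal sketch of the kind of interaction Gartner describes, using Amazon S3 Select through boto3 as one example of a SQL interface over a cloud object store; the bucket, object key and column names are hypothetical, and other vendors expose the same idea through external tables or serverless query services.

```python
# Hypothetical example of SQL over a cloud object store (here: Amazon S3 Select via boto3).
# The bucket, key and columns are invented for illustration.
import boto3

s3 = boto3.client("s3")

response = s3.select_object_content(
    Bucket="example-data-lake",              # hypothetical bucket
    Key="sensors/2017/09/readings.csv",      # hypothetical object in the data lake
    ExpressionType="SQL",
    Expression=(
        "SELECT s.device_id, s.temperature "
        "FROM S3Object s "
        "WHERE CAST(s.temperature AS FLOAT) > 30.0"
    ),
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The result streams back as events; print the record payloads as they arrive.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```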

Of the 35 other technologies highlighted on the 2017 Hype Cycle for Data Management, four are judged to be transformational in nature. Two — event stream processing (ESP) and operational in-memory database management system (IMDBMS) — are expected to reach the Plateau of Productivity within two to five years, while both blockchain and distributed ledgers are expected to take five to 10 years.

Event Stream Processing

ESP is one of the key enablers of digital business, algorithmic business and intelligent business operations. ESP technology, including distributed stream computing platforms (DSCPs) and event processing platforms (EPPs), is maturing rapidly. Stream analytics provided by ESP software improves the quality of decision-making by presenting information that could otherwise be overlooked.
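
As an illustration of the kind of continuous computation ESP engines perform, here is a toy sliding-window example in plain Python; it is a sketch of the concept, not the API of any particular DSCP or EPP product, and the sensor stream is synthetic.

```python
# Toy stream analytics: flag readings that deviate sharply from a rolling baseline.
from collections import deque
from statistics import mean

def rolling_alerts(events, window=10, threshold=2.0):
    """Yield an alert whenever the latest reading deviates from the rolling mean
    of the previous `window` readings by more than `threshold`."""
    history = deque(maxlen=window)
    for ts, value in events:
        if len(history) == window and abs(value - mean(history)) > threshold:
            yield ts, value, mean(history)
        history.append(value)

# Synthetic stream of (timestamp, sensor reading) pairs with one spike at t=42.
stream = [(t, 20.0 + (5.0 if t == 42 else 0.1 * (t % 3))) for t in range(60)]
for ts, value, baseline in rolling_alerts(stream):
    print(f"t={ts}: reading {value:.1f} deviates from baseline {baseline:.1f}")
```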

Operational In-Memory DBMS

Operational In-Memory database management systems (IMDBMS) technology is maturing and growing in acceptance, although the infrastructure required to support it remains relatively expensive. Another inhibitor to the growth of operational IMDBMS technology is the need for persistence models that support the high levels of availability required to meet transaction SLAs. Nevertheless, operational IMDBMSs for transactions have the potential to make a tremendous impact on business value by speeding up data transactions 100 to 1,000 times.
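
The speedup claim comes largely from taking disk I/O out of the transaction commit path. The sketch below illustrates that effect with SQLite, which is not an operational IMDBMS, by running the same per-row-commit workload against an on-disk and an in-memory database; the actual ratio will vary widely with hardware and configuration.

```python
# Minimal sketch: same transactional workload on disk vs. fully in memory.
import os
import sqlite3
import tempfile
import time

def run(conn, rows=2_000):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    start = time.perf_counter()
    for i in range(rows):
        with conn:  # one transaction (commit) per insert
            conn.execute("INSERT INTO orders (amount) VALUES (?)", (i * 1.0,))
    return time.perf_counter() - start

path = os.path.join(tempfile.mkdtemp(), "orders.db")
disk = run(sqlite3.connect(path))
memory = run(sqlite3.connect(":memory:"))
print(f"disk: {disk:.2f}s  in-memory: {memory:.2f}s  speedup: {disk / memory:.0f}x")
```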

Blockchain

Public distributed ledgers, including blockchain, continue to have high visibility, although organizations remain cautious about the future of public (permission-less) distributed ledger concepts due to scalability, risk and governance issues. Most business use cases have yet to be proven, and extreme price volatility in bitcoin persists. Presupposing the technical and business challenges of distributed ledgers can be overcome, organizations are most likely, in the short term, to use distributed ledgers for operational efficiency gains via the use of shared information and infrastructure. Longer term, Gartner expects a complete reformation of whole industries and commercial activity as the programmable economy develops and ledgers contribute to the monetization of new ecosystems.

Distributed Ledgers

The requirements for more standards and enterprise-scale capabilities are evolving slowly, but distributed ledgers are still not adoptable in a mission-critical at-scale context. Their value propositions, compared with existing technology, are also not clearly established, making the widespread acceptance of the technology problematic. Private distributed ledger concepts are gaining traction, because they hold the promise to transform industry operating models and overcome some of the issues of scalability, risk management and governance that plague public ledgers. As with blockchain, however, many business use cases are unproven at this time.

Gartner clients can learn more in the report “Hype Cycle for Data Management 2017.” This research is part of the Gartner Trend Insight Report “2017 Hype Cycles Highlight Enterprise and Ecosystem Digital Disruptions.” With over 1,800 profiles of technologies, services and disciplines spanning over 100 Hype Cycles focused on a diversity of regions, industries and roles, this Trend Insight Report is designed to help CIOs and IT leaders respond to the opportunities and threats affecting their businesses, take the lead in technology-enabled business innovations and help their organizations define an effective digital business strategy. Information on further 2017 Hype Cycles covering data and analytics can be found on the Gartner Blog Network.

*Previously titled “Hype Cycle for Information Infrastructure, 2016,” the “Hype Cycle for Data Management, 2017” covers the broad aspects and technologies that describe, organize, integrate, share and govern data.

About the Gartner Data & Analytics Summits 2017/18

Gartner analysts will provide additional analysis on data and analytics leadership trends at the Gartner Data & Analytics Summits, taking place: November 20-21, 2017 in Frankfurt; February 26-27, 2018 in Sydney; March 5-8, 2018 in Grapevine, Texas; and March 19-21, 2018 in London. Follow news and updates from the events on Twitter using #GartnerDA.

About Gartner

Gartner, Inc. (NYSE: IT) is a leading research and advisory company. The company helps business leaders across all major functions in every industry and enterprise size with the objective insights they need to make the right decisions. Gartner’s comprehensive suite of services delivers strategic advice and proven best practices to help clients succeed in their mission-critical priorities. Gartner is headquartered in Stamford, Connecticut, U.S.A., and has more than 13,000 associates serving clients in 11,000 enterprises in 100 countries. For more information, visit www.gartner.com.

Source: Gartner

The post Gartner Reveals the 2017 Hype Cycle for Data Management appeared first on HPCwire.

NSF Gives Grants to Three Universities to Create Online Data Collaboration Platform

Fri, 09/29/2017 - 09:56

SALT LAKE CITY, Utah, Sept. 29, 2017 — The National Science Foundation has awarded the universities of Utah, Chicago and Michigan a collective $4 million, four-year grant to produce SLATE, a new online platform for stitching together large amounts of data from multiple institutions and reducing the friction commonly found in multi-faceted collaborations.

When complete, SLATE, which stands for Services Layer At The Edge, will be a local software and hardware platform that connects and interacts with cloud resources. The platform will reduce the need for technical expertise, the amount of physical resources such as servers, and the time demands on individual IT departments, especially for smaller universities that lack the resources of larger institutions and computing centers.

From the cosmic radiation measurements by the South Pole Telescope to the particle physics of CERN, multi-institutional research collaborations require computing environments that connect instruments, data and storage servers. Because of the complexity of the science, and the scale of the data, these resources are often distributed among university research computing centers, national high-performance computing centers or commercial cloud providers. This resource disparity causes scientists to spend more time on the technical aspects of computation than on discoveries and knowledge creation.

As a partner on the project, the University of Utah will contribute to the reference architecture, advanced networking aspects, core design, implementation, and outreach to other principal investigators from science disciplines and partner universities. Senior IT architect Joe Breen explained that the goal is to have one simple platform for the end user.

“Software will be updated just like an app by experts from the platform operations and research teams, with little to no assistance required from local IT personnel. The SLATE platform is designed to work in any data center environment and will utilize advanced network capabilities, if available.”

The platform

Once SLATE is installed, central research teams will be able to connect with far-flung research groups, allowing the exchange of data to be automated. Sharing software and computing tasks among institutions will no longer burden local system administrators with the installation and operation of highly customized scientific computing services. By stitching together these resources, SLATE will also expand the reach of these domain-specific “science gateways.”

SLATE works by implementing “cyberinfrastructure as code,” augmenting high-bandwidth science networks with a programmable “underlayment” edge platform. This platform hosts advanced services needed for higher-level capabilities such as data and software delivery, workflow services and science gateway components.
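
The article does not describe SLATE’s actual interface, so the sketch below is purely hypothetical; it only illustrates the general “cyberinfrastructure as code” pattern of a declarative, version-controllable service description that a platform agent could turn into a running deployment. The `EdgeService` spec and `render_docker_commands` helper are invented stand-ins, not SLATE components.

```python
# Hypothetical illustration of infrastructure-as-code for an edge service.
from dataclasses import dataclass
from typing import List

@dataclass
class EdgeService:
    name: str          # logical service name
    image: str         # container image providing the service
    port: int          # port exposed at the site's network edge
    replicas: int = 1  # number of instances the site should run

def render_docker_commands(spec: EdgeService) -> List[str]:
    """Render the declarative spec into plain `docker run` commands, standing in
    for whatever orchestrator a real edge platform would drive."""
    return [
        f"docker run -d --name {spec.name}-{i} -p {spec.port + i}:{spec.port} {spec.image}"
        for i in range(spec.replicas)
    ]

# Example: a hypothetical data-transfer gateway declared once and applied at any site.
xfer = EdgeService(name="xfer-gateway", image="example.org/transfer-gw:latest", port=2811, replicas=2)
for cmd in render_docker_commands(xfer):
    print(cmd)
```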

“A central goal of SLATE is to lower the threshold for campuses and researchers to create research platforms within the national cyberinfrastructure,” said University of Chicago senior fellow Robert Gardner.

Practical applications

Today’s most ambitious scientific investigations are too large for a single university or laboratory to tackle alone. Dozens of international collaborations comprised of scientific groups and institutions must coordinate the collection and analysis of immense data streams. These data streams include dark matter searches, the detection of new particles at the high-energy frontier and the precise measurement of radiation from the early universe. The data can come from telescopes, particle accelerators and other advanced instruments.

Today, many universities and research laboratories use a “Science DMZ” architecture to balance the need for security with the ability to rapidly move large amounts of data in and out of the local network. As sciences from physics to biology to astronomy become more data-heavy, the complexity and need for these subnetworks grows rapidly, placing additional strain on local IT teams.

Since 2003, a team of computation and Enrico Fermi Institute scientists led by Gardner has partnered with global projects to create the advanced cyberinfrastructure necessary for rapidly sharing data, computer cycles and software between partner institutions.

User benefits

“Science, ultimately, is a collective endeavor. Most scientists don’t work in a vacuum, they work in collaboration with their peers at other institutions,” said Shawn McKee, director of the Center for Network and Storage-Enabled Collaborative Computational Science at the University of Michigan. “They often need to share not only data, but systems that allow execution of workflows across multiple institutions. Today, it is a very labor-intensive, manual process to stitch together data centers into platforms that provide the research computing environment required by forefront scientific discoveries.”

With SLATE, local research groups will be able to fully participate in multi-institutional collaborations and contribute resources to their collective platforms with minimal hands-on effort from their local IT team. When joining a project, the researchers and admins can select a package of software from a cloud-based service — a kind of app store — that allows them to connect and work with the other partners.

By reducing the technical expertise and time demands for participating in multi-institution collaborations, the SLATE platform will be especially helpful to smaller universities that lack the resources and staff of larger institutions and computing centers. The SLATE functionality can also support the development of “science gateways” that make it easier for individual researchers to connect to HPC resources such as the Open Science Grid and XSEDE.

Source: University of Utah

The post NSF Gives Grants to Three Universities to Create Online Data Collaboration Platform appeared first on HPCwire.

US Exascale Program – Some Additional Clarity

Thu, 09/28/2017 - 12:12

The last time we left the Department of Energy’s exascale computing program in July, things were looking very positive. Both the U.S. House and Senate had passed solid Fiscal Year 2018 (FY-18) appropriations for the exascale activities of both the National Nuclear Security Administration (NNSA) and the Office of Science (SC). However, it also looked like there would be potential major challenges with other parts of the DOE’s budget. These included significant differences on some programs, such as ARPA-E, where the House and Senate appropriation bills were more than $300 million apart.

After its August recess, Congress was expected to have some major budget fights in September. This not only included reconciling the difference between the versions of House and Senate appropriations, but also the question of raising the U.S. debt ceiling. Then on September 6th, those potential fights came to a sudden end when President Trump reached an agreement with the House and Senate Democratic leaders for a FY-18 Continuing Budget Resolution (CR) and to raise the government debt ceiling until early December 2017. That effectively maintains the Exascale FY-17 status quo in the short term. From a funding perspective, things for the exascale program continue to look very good.

On September 26th and 27th, some more clarity about the technical aspects of the program was provided during the public SC Advanced Scientific Computing Advisory Committee (ASCAC) meeting. The ASCAC is an officially endorsed Federal Advisory Committee Act (FACA) group that meets regularly to provide advice to the Advanced Scientific Computing Research (ASCR) program. During the meeting, Barb Helland, the associate director for the ASCR office, provided a presentation about the status of their activities. The presentation included some very interesting information about the status of the SC Exascale program.

On slide number 7, she told the ASCAC that there had been a shift in the delivery of the Argonne National Laboratory (ANL) Aurora system. That computer had originally been scheduled to be delivered in 2018 with a performance of 180 petaflops. However, the revised plan for the system is for a 1,000 petaflops (or 1 exaflops) computer to be delivered in 2021. The machine would use “novel technology choices” and would focus on the three pillars of simulation, big data, and machine learning. This shift in the program seems to explain the House’s concern “that the deployment plan for an exascale machine has undergone major changes without an appropriately defined cost and performance baseline.”

Ms. Helland reported that the shift in the machine architecture had been subject to a review in September of 2017 and had received very favorable comments. These included, “The system as presented is exciting with many novel technology choices that can change the way computing is done. The committee supports the bold strategy and innovation, which is required to meet the targets of exascale computing. The committee sees a credible path to success.” Another comment was, “The hardware choices/design within the node is extremely well thought through. Early projections suggest that the system will support a broad workload.” She also reported that a Rebaseline Independent Project Review was scheduled for November 7th to 9th.

Another important piece of news concerned the status of the installation of the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility’s Summit computer. This is expected to be a 150-petaflops computer based on IBM Power9 processors with Nvidia Volta graphics processing units (GPUs). During the meeting, it was reported that the system cabinets had been installed along with the interconnection switches. The compute node boards are expected to arrive toward the end of October, with acceptance testing to start soon after that. It was also reported that installation of the NNSA’s Lawrence Livermore National Laboratory (LLNL) Sierra computer (similar to Summit) was also underway. One interesting feature of the ORNL computers is that they are installed on a concrete slab with all of the supporting wiring and cooling coming from overhead.

During her presentation, Barb Helland made the point that ASCR would soon be releasing information about the procurement of additional exascale systems to be delivered in the 2022 timeframe. No details were provided, but she explained that these systems would be follow-on systems to the ones delivered as part of the CORAL procurement.

Finally, there were two other interesting exascale revelations during the ASCAC meeting. One was the clarification of the differences between the acronyms ECI and ECP that appeared in the President’s budget request. Slide number 5 provides the definitions of the terms and states that the ECI (Exascale Computing Initiative) is the partnership between the NNSA and SC. The ECP (Exascale Computing Project), on the other hand, is a subprogram within ASCR (SC-ECP) that includes only support for research and development activities in applications and, in partnership with NNSA, investments in the software and hardware technology and co-design required for the design of capable exascale computers. The other revelation was that Paul Messina of ANL, the founding director of the ECP, is stepping down and will be replaced as of October 1st by Doug Kothe of ORNL. The ASCAC thanked Paul for his service to the country in establishing the foundations for the ECP.

All in all, the most recent ASCAC meeting provided some valuable insights into the U.S. exascale program. Certainly not all of the questions have been answered, but the information provided at the meeting helps to clarify the Department of Energy’s cutting-edge computing program. Perhaps the best news is that the program is still receiving strong Presidential and Congressional support. However, the new December 2017 budget deadline continues to lurk in the background. Once again, more to come.

About the Author

Alex Larzelere is a senior fellow at the U.S. Council on Competitiveness, the president of Larzelere & Associates Consulting and HPCwire’s policy editor. He is currently a technologist, speaker and author on a number of disruptive technologies that include: advanced modeling and simulation; high performance computing; artificial intelligence; the Internet of Things; and additive manufacturing. Alex’s career has included time in federal service (working closely with DOE national labs), private industry, and as founder of a small business. Throughout that time, he led programs that implemented the use of cutting edge advanced computing technologies to enable high resolution, multi-physics simulations of complex physical systems. Alex is the author of “Delivering Insight: The History of the Accelerated Strategic Computing Initiative (ASCI).”

The post US Exascale Program – Some Additional Clarity appeared first on HPCwire.

Penguin Computing Announces NVIDIA Tesla V100-based Servers

Thu, 09/28/2017 - 12:05

FREMONT, Calif., Sept. 28, 2017 — Penguin Computing, provider of high performance computing, enterprise datacenter and cloud solutions, today announced strategic support for the field of artificial intelligence through availability of its servers based on the highly-advanced NVIDIA Tesla V100 GPU accelerator, powered by the NVIDIA Volta GPU architecture.

“Deep learning, machine learning and artificial intelligence are vital tools for addressing the world’s most complex challenges and improving many aspects of our lives,” said William Wu, Director of Product Management, Penguin Computing. “Our breadth of products covers configurations that accelerate various demanding workloads – maximizing performance, minimizing P2P latency of multiple GPUs and providing minimal power consumption through creative cooling solutions.”

NVIDIA Tesla V100 GPUs join an expansive GPU server line that covers Penguin Computing’s Relion servers (Intel-based) and Altus servers (AMD-based) in both 19” and 21” Tundra form factors. Penguin Computing will debut a high-density 21” Tundra 1OU GPU server supporting 4x Tesla V100 SXM2, and a 19” 4U GPU server supporting 8x Tesla V100 SXM2 with NVIDIA NVLink interconnect technology, optionally in a single root complex.

The NVIDIA Volta architecture is bolstered by pairing NVIDIA CUDA cores and NVIDIA Tensor Cores within a unified architecture. A single server with Tesla V100 GPUs can replace hundreds of CPU servers for AI. Equipped with 640 Tensor Cores, Tesla V100 delivers 125 TeraFLOPS of deep learning performance. That’s 12X Tensor FLOPS for deep learning training, and 6X Tensor FLOPS for deep learning inference when compared to NVIDIA Pascal GPUs.

“Penguin Computing continues to demonstrate leadership by providing Volta-based systems to support critical AI research,” said Paresh Kharya, Group Product Marketing Manager, NVIDIA. “Tesla V100 systems will enable their customers to create innovative AI products and services by accelerating their AI research and deployments.”

Today’s announcement reinforces Penguin Computing’s philosophy and broader capabilities as a full-spectrum provider offering complete solutions. This includes tailored, custom designs that are supportable and scale to large deployments, and fully engineered and architected designs.

About Penguin Computing

Penguin Computing is one of the largest private suppliers of enterprise and high-performance computing solutions in North America and has built and operates a specialized public HPC cloud service, Penguin Computing On-Demand (POD). Penguin Computing pioneers the design, engineering, integration and delivery of solutions that are based on open architectures and comprise non-proprietary components from a variety of vendors. Penguin Computing is also one of a limited number of authorized Open Compute Project (OCP) solution providers leveraging this Facebook-led initiative to bring the most efficient open data center solutions to a broader market, and has announced the Tundra product line which applies the benefits of OCP to high performance computing. Penguin Computing has systems installed with more than 2,500 customers in 40 countries across eight major vertical markets. Visit www.penguincomputing.com to learn more about the company and follow @PenguinHPC on Twitter.

Source: Penguin Computing

The post Penguin Computing Announces NVIDIA Tesla V100-based Servers appeared first on HPCwire.

Students from Underrepresented Groups Research Data Science with Brookhaven Lab

Thu, 09/28/2017 - 11:55

Sept. 28, 2017 — Computing is one of the least diverse science, technology, engineering, and mathematics (STEM) fields, with an underrepresentation of women and minorities, including African Americans and Hispanics. Leveraging this largely untapped talent pool will help address our nation’s growing demand for data scientists. Computational approaches for extracting insights from big data require the creativity, innovation, and collaboration of a diverse workforce.

As part of its efforts to train the next generation of computational and computer scientists, this past summer, the Computational Science Initiative (CSI) at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory hosted a diverse group of high school, undergraduate, and graduate students. This group included students from Jackson State University and Lincoln University, both historically black colleges and universities. The Lincoln University students were supported through the National Science Foundation’s Louis Stokes Alliances for Minority Participation program, which provides research and other academic opportunities for minority students to advance in STEM. Two of the students are recipients of prestigious fellowship programs: the Graduate Education for Minorities (GEM) Fellowship, through which qualified students from underrepresented minorities receive funding to pursue STEM graduate education; and the DOE Computational Science Graduate Fellowship (CSGF), which supports doctoral research using mathematics and computers to solve problems in many scientific fields of study, including astrophysics, environmental science, and nuclear engineering.

“To address challenges in science, we need to bring together the best minds available,” said CSI Director Kerstin Kleese van Dam. “Great talents are rare but can be found among all groups, so we reach out to the broadest talent pools in search of our top researchers at every education level and career stage. In return, we offer them the opportunity to work on some of the most exciting problems with experts who are pushing the state of the art in computer science and applied mathematics.”

The students’ research spanned many areas, including visualization and machine learning techniques for big data analysis, modeling and simulation applications, and automated approaches to data validation and verification.

To read the full story, with graphics, please visit the original story at: https://www.bnl.gov/newsroom/news.php?a=212478

Source: Ariana Tantillo, Brookhaven National Laboratory

The post Students from Underrepresented Groups Research Data Science with Brookhaven Lab appeared first on HPCwire.

US-Based CryoEM Company, SingleParticle.com, Partners with Bright

Thu, 09/28/2017 - 10:57

SAN JOSE, Calif., Sept. 28, 2017 — Bright Computing, a leader in cluster and cloud infrastructure automation software, today announced a reseller agreement with San Diego-based SingleParticle.com.

SingleParticle.com is the US subsidiary of the Chinese company BlueJay Imaging and specializes in turn-key HPC infrastructure designed for high performance and low total cost of ownership (TCO), serving the global research community of cryo-electron microscopy (cryoEM).

The partnership with Bright Computing enables SingleParticle.com to add cluster management to its portfolio. By offering Bright technology to its customer base, SingleParticle.com reduces the IT burden on its customers’ cryoEM facilities. With Bright Cluster Manager, managing SingleParticle.com clusters becomes much less time-consuming and onerous, enabling customers to focus on solving new structures that can lead to new scientific discoveries and even new drug designs.

SingleParticle.com offers single-particle cryo-electron microscopy (cryoEM), recognized as the Method of the Year 2015 by Nature Methods, for working with macromolecular structures at high resolution. The advancement of cryoEM in recent years brings two challenges to many labs: a) the large amount of image data generated every day by state-of-the-art direct electron detectors, and b) the intensive computation needed for 3D reconstruction software, such as the RELION package developed by MRC-LMB in the UK.

SingleParticle.com’s solution features:

  • Fully scalable solution from 8 nodes to hundreds of nodes
  • Hybrid CPU/GPU cluster nodes in collaboration with AMAX information technologies, a leading provider of enterprise computing infrastructure solutions
  • High performance data storage with a proprietary file system, offering 300PB in single volume and great TCO with no hardware lock-in using commodity hardware
  • Expert support and service with a focus on cryoEM software

Dr. Clara Cai, Manager at SingleParticle.com, commented, “With Bright, the management of an HPC cluster becomes very straightforward, empowering end users to administer their workloads, rather than relying on HPC experts. We are confident that with Bright’s technology, our customers can maintain our turn-key cryoEM cluster with little to no prior HPC experience.”

Clemens Engler, Director Alliances at Bright Computing, added, “We welcome SingleParticle.com to the Bright partner community. This is an exciting opportunity for Bright technology to serve the cryoEM researchers.”

About Bright Computing

Bright Computing is a leading global provider of hardware-agnostic cluster and cloud management software. Bright Cluster Manager, Bright Cluster Manager for Big Data, and Bright OpenStack provide a unified approach to installing, provisioning, configuring, managing, and monitoring HPC clusters, big data clusters, and OpenStack clouds. Bright’s products are currently deployed in more than 650 data centers around the world. Bright Computing’s customer base includes global academic, governmental, financial, healthcare, manufacturing, oil/gas/energy, and pharmaceutical organizations such as Boeing, Intel, NASA, Stanford University, and St. Jude Children’s Research Hospital. Bright partners with Amazon, Cray, Dell, Intel, Nvidia, SGI, and other leading vendors to deliver powerful, integrated solutions for managing advanced IT infrastructure such as high-performance computing clusters, big data clusters, and OpenStack-based private clouds. www.brightcomputing.com

Source: Bright Computing

The post US-Based CryoEM Company, SingleParticle.com, Partners with Bright appeared first on HPCwire.

Exxact Announces HPC Solutions Featuring NVIDIA Tesla V100 GPU Accelerators

Thu, 09/28/2017 - 10:44

FREMONT, Calif., Sept. 28, 2017 — Exxact Corporation, a provider of high performance computing, today announced its planned production of HPC solutions using the new NVIDIA Tesla V100 GPU accelerator. Exxact will be integrating the Tesla V100 into its Quantum series of servers, which are currently offered with NVIDIA Tesla P100 GPUs. The NVIDIA Tesla V100 was first introduced at the GPU Technology Conference 2017 held in San Jose, California. 

The NVIDIA Tesla V100 is engineered for the convergence of AI and HPC. It offers a platform for Exxact HPC systems to excel at both computational science for scientific simulation and data science for finding insights in data. By pairing NVIDIA CUDA cores and Tensor Cores within a unified architecture, a single Exxact Quantum server with Tesla V100 GPUs can replace hundreds of commodity CPU-only servers for both traditional HPC and AI workloads. Every researcher and engineer can now afford an AI supercomputer to tackle their most challenging work with Exxact Quantum servers featuring NVIDIA Tesla V100 GPUs.

“The NVIDIA Tesla V100 GPU accelerator introduces a new foundation for artificial intelligence,” said Jason Chen, Vice President of Exxact Corporation. “With its key compute features coupled with top-tier performance and efficiency, the Tesla V100 GPUs will enable us to create exceptional HPC systems designed to power the most computationally intensive workloads.”

NVIDIA Tesla V100 is the world’s most advanced data center GPU ever built to accelerate AI, HPC, and graphics. Powered by the latest GPU architecture, NVIDIA Volta, Tesla V100 offers the performance of 100 CPUs in a single GPU—enabling data scientists, researchers, and engineers to tackle challenges that were once impossible. The Tesla V100 comes in two form factors:

  • Tesla V100 for NVIDIA NVLink (SXM2): Ultimate performance for deep learning
  • Tesla V100 for PCIe: Highest versatility for all workloads

With 640 Tensor Cores, Tesla V100 is the world’s first GPU to break the 100 teraflops (TFLOPS) barrier of deep learning performance. The next generation of NVIDIA NVLink high-speed interconnect technology connects multiple V100 GPUs at up to 300 GB/s to create the world’s most powerful computing servers. AI models that would consume weeks of computing resources on previous systems can now be trained in a few days. With this dramatic reduction in training time, a whole new world of problems will now be solvable with AI.

Tesla V100 Specifications:

  • 5,120 CUDA cores
  • 640 New Tensor Cores
  • 7.8 TFLOPS double-precision performance with NVIDIA GPU Boost
  • 15.7 TFLOPS single-precision performance with NVIDIA GPU Boost
  • 125 TFLOPS mixed-precision deep learning performance with NVIDIA GPU Boost
  • 300 GB/s bi-directional interconnect bandwidth with NVIDIA NVLink
  • 900 GB/s memory bandwidth with CoWoS HBM2 Stacked Memory
  • 16 GB of CoWoS HBM2 Stacked Memory
  • 300 Watt
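
As a rough cross-check of the figures above (a back-of-the-envelope sketch: the ~1.53 GHz boost clock and the count of 2,560 FP64 units are NVIDIA’s published numbers, not stated in this release; an FMA counts as two FLOPs, and each Tensor Core performs a 4x4x4 matrix FMA, i.e. 128 FLOPs, per clock):

```latex
\begin{align*}
\text{FP32:}   &\quad 5{,}120 \times 2 \times 1.53\,\text{GHz} \approx 15.7\ \text{TFLOPS}\\
\text{FP64:}   &\quad 2{,}560 \times 2 \times 1.53\,\text{GHz} \approx 7.8\ \text{TFLOPS}\\
\text{Tensor:} &\quad 640 \times 128 \times 1.53\,\text{GHz} \approx 125\ \text{TFLOPS}
\end{align*}
```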

About Exxact Corporation

Exxact develops and manufactures innovative computing platforms and solutions that include workstation, server, cluster, and storage products developed for Life Sciences, HPC, Big Data, Cloud, Visualization, Video Wall, and AV applications. With a full range of engineering and logistics services, including consultancy, initial solution validation, manufacturing, implementation, and support, Exxact enables their customers to solve complex computing challenges, meet product development deadlines, improve resource utilization, reduce energy consumption, and maintain a competitive edge. Visit Exxact Corporation at www.exxactcorp.com.

Source: Exxact Corporation

The post Exxact Announces HPC Solutions Featuring NVIDIA Tesla V100 GPU Accelerators appeared first on HPCwire.

AMAX Deep Learning Solutions Upgraded with NVIDIA Tesla V100 GPU Accelerators

Thu, 09/28/2017 - 09:09

FREMONT, Calif., Sept. 28, 2017 — AMAX, a provider of Deep Learning, HPC, Cloud/IaaS servers and appliances, today announced that its GPU solutions, including Deep Learning platforms, are now available with the latest NVIDIA Tesla V100 GPU accelerator. Solutions featuring the V100 GPUs are expected to begin shipping in Q4 2017.

Powered by the new NVIDIA Volta architecture, AMAX’s V100-based computing solutions are the most powerful GPU solutions on the market to accelerate HPC, Deep Learning, and data analytic workloads. The solutions combine the latest Intel Xeon Scalable Processor series with Tesla V100 GPUs to enable 6x the Tensor FLOPS for DL inference when compared to the previous generation NVIDIA Pascal GPUs.

“We are thrilled about the biggest breakthrough we’ve ever seen on data center GPUs,” said James Huang, Product Marketing Manager, AMAX. “This will deliver the most dramatic performance gains and cost savings opportunities for HPC and the AI industry that we cannot wait to see.”

NVIDIA Tesla V100 GPU accelerators are the most advanced data center GPUs ever built to accelerate AI, HPC and graphics applications. Equipped with 640 Tensor Cores, a single V100 GPU offers the performance of up to 100 CPUs, enabling data scientists, researchers, and engineers to tackle challenges that were once thought to be impossible. The V100 features six major technology breakthroughs:

  • New Volta Architecture: By pairing CUDA cores and Tensor Cores within a unified architecture, a single server with Tesla V100 GPUs can replace hundreds of commodity CPU servers for traditional HPC and Deep Learning.
  • Tensor Core: Equipped with 640 Tensor Cores, Tesla V100 delivers 125 TeraFLOPS of deep learning performance. That’s 12X Tensor FLOPS for Deep Learning training, and 6X Tensor FLOPS for DL inference when compared to NVIDIA Pascal GPUs.
  • Next-Generation NVIDIA NVLink Interconnect Technology: NVLink in Tesla V100 delivers 2X higher throughput compared to the previous generation. Up to eight Tesla V100 accelerators can be interconnected at up to 300 GB/s to unleash the highest application performance possible on a single server.
  • Maximum Efficiency Mode: The new maximum efficiency mode allows data centers to achieve up to 40% higher compute capacity per rack within the existing power budget. In this mode, Tesla V100 runs at peak processing efficiency, providing up to 80 percent of the performance at half the power consumption.
  • HBM2: With a combination of improved raw bandwidth of 900 GB/s and higher DRAM utilization efficiency at 95 percent, Tesla V100 delivers 1.5X higher memory bandwidth over Pascal GPUs as measured on STREAM benchmark.
  • Programmability: Tesla V100 is architected from the ground up to simplify programmability. Its new independent thread scheduling enables finer-grain synchronization and improves GPU utilization by sharing resources among small jobs.
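
A minimal sketch, assuming PyTorch and a CUDA-capable Volta-class GPU (neither is part of AMAX’s announcement), of the kind of mixed-precision versus single-precision comparison behind the Tensor FLOPS figures; on Volta the FP16 matrix-multiply path can be served by Tensor Cores.

```python
# Compare FP32 and FP16 matrix-multiply throughput on a CUDA GPU.
import time
import torch

def matmul_tflops(dtype, n=8192, iters=10):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return 2 * n**3 * iters / elapsed / 1e12  # 2*n^3 FLOPs per matmul

print(f"FP32: {matmul_tflops(torch.float32):.1f} TFLOPS")
print(f"FP16: {matmul_tflops(torch.float16):.1f} TFLOPS")
```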

AMAX solutions that will feature the V100 include:

  • MATRIX DL-in-a-Box Solutions — The MATRIX Deep-Learning-in-a-Box solutions provide everything a data scientist needs for Deep Learning development. Powered by Bitfusion Flex, the product line encompasses powerful dev workstations, high-compute density servers, and rackscale clusters featuring pre-installed Docker containers with the latest DL frameworks, and GPU virtualization technology to attach local and remote GPUs. The MATRIX solutions can be used as standalone platforms or combined to create the perfect infrastructure for on-premise AI clouds or elastic DL-as-a-Service platforms.
  • [SMART]Rack AI — [SMART]Rack AI is a turnkey Machine Learning cluster for training and inference at scale. The solution features up to 96x NVIDIA® Tesla® GPU accelerators to deliver up to 1344 TFLOPs of compute power when populated with Tesla V100 PCIe cards. Delivered plug-and-play, the solution also features an All-Flash data repository, 25G high-speed networking, [SMART]DC Data Center Manager, and an In-Rack Battery for graceful shutdown during a power-loss scenario.
  • ServMax G480 — The G480 is a robust 4U 8x GPU platform for HPC and Deep Learning workloads, delivering 56 TFLOPs of double precision or 112 TFLOPs of single precision when populated with Tesla V100 PCIe cards.
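
As a quick arithmetic check (a sketch based only on the figures quoted above), the rack- and server-level numbers are consistent with the Tesla V100 PCIe card’s published ratings of roughly 14 TFLOPS single precision and 7 TFLOPS double precision:

```latex
\begin{align*}
\text{[SMART]Rack AI:} &\quad 96 \times 14\ \text{TFLOPS} = 1{,}344\ \text{TFLOPS (single precision)}\\
\text{ServMax G480:}   &\quad 8 \times 14\ \text{TFLOPS} = 112\ \text{TFLOPS (single)},\qquad 8 \times 7\ \text{TFLOPS} = 56\ \text{TFLOPS (double)}
\end{align*}
```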

As an Elite member of the NVIDIA Partner Network Program, AMAX is stringent in providing cutting-edge technologies, delivering enhanced, energy-efficient performance for the Deep Learning and HPC industries featuring NVIDIA Tesla V100, P100, P40 GPU accelerators, and NVIDIA DGX systems. AMAX is now accepting pre-orders, quotes and consultations for the Tesla V100-based systems. To learn more about AMAX and GPU solutions, please visit www.amax.com or contact AMAX.

About AMAX

AMAX is an award-winning leader in application-tailored data center, HPC and Deep Learning solutions designed for the highest efficiency and optimal performance. Recognized by several industry awards, including First Place at the ImageNet Large Scale Visual Recognition Challenge, AMAX aims to provide cutting-edge solutions to meet specific customer requirements. Whether you are a Fortune 1000 company seeking significant cost savings through better efficiency for your global data centers, or a software startup seeking an experienced manufacturing partner to design and launch your flagship product, AMAX is your trusted solutions provider, delivering the results you need to meet your specific metrics for success. To learn more or request a quote, contact AMAX.

Source: AMAX

The post AMAX Deep Learning Solutions Upgraded with NVIDIA Tesla V100 GPU Accelerators appeared first on HPCwire.

US Coalesces Plans for First Exascale Supercomputer: Aurora in 2021

Wed, 09/27/2017 - 17:34

At the Advanced Scientific Computing Advisory Committee (ASCAC) meeting, in Arlington, Va., yesterday (Sept. 26), it was revealed that the “Aurora” supercomputer is on track to be the United States’ first exascale system. Aurora, originally named as the third pillar of the CORAL “pre-exascale” project, will still be built by Intel and Cray for Argonne National Laboratory, but the delivery date has shifted from 2018 to 2021 and target capability has been expanded from 180 petaflops to 1,000 petaflops (1 exaflop).

The fate of the Argonne Aurora “CORAL” supercomputer has been in limbo since the system failed to make it into the U.S. DOE budget request, while the same budget proposal called for an exascale machine “of novel architecture” to be deployed at Argonne in 2021. Until now, the only official word from the U.S. Exascale Computing Project was that Aurora was being “reviewed for changes and would go forward under a different timeline.”

Officially, the contract has been “extended,” and not cancelled, but the fact remains that the goal of the Collaboration of Oak Ridge, Argonne, and Lawrence Livermore (CORAL) initiative to stand up two distinct pre-exascale architectures was not met.

According to sources we spoke with, a number of people at the DOE are not pleased with the Intel/Cray (Intel is the prime contractor, Cray is the subcontractor) partnership. It’s understood that the two companies could not deliver on the 180-200 petaflops system by next year, as the original contract called for. Now Intel/Cray will push forward with an exascale system that is some 50x larger than any they have stood up.

It’s our understanding that the cancellation of Aurora is not a DOE budgetary measure as has been speculated, and that the DOE and Argonne wanted Aurora. Although it was referred to as an “interim,” or “pre-exascale” machine, the scientific and research community was counting on that system, was eager to begin using it, and they regarded it as a valuable system in its own right. The non-delivery is regarded as disruptive to the scientific/research communities.

Another question is why, since Intel/Cray failed to deliver Aurora and have moved on to a larger exascale system contract, their original CORAL contract hasn’t been cancelled and put out again to bid. With increased global competitiveness, it seems that the DOE stakeholders did not want to further delay the non-IBM/Nvidia side of the exascale track. Conceivably, they could have done a rebid for the Aurora system, but that would leave them with an even bigger gap if they had to spin up a new vendor/system supplier to replace Intel and Cray. Starting the bidding process over again would delay progress toward exascale – and it might even have been the death knell for exascale by 2021, but Intel and Cray now have a giant performance leap to make and three years to do it. Will they stay on the same Phi-based technology path with Knights Hill, or come up with something more “novel,” like the co-packaged Xeon/FPGA processor that Intel is working on, which could provide further efficiencies to meet strict exascale power targets?

These events raise the question of whether the IBM-led effort of IBM/Nvidia/Mellanox is looking very good by comparison. The other CORAL thrusts — Summit at Oak Ridge and Sierra at Lawrence Livermore — are on track, although it remains to be seen whether one, both or neither of these systems will make the cut for the November Top500 list.

We reached out to representatives from Cray, Intel and the Exascale Computing Project (ECP) seeking official comment on the revised Aurora contract. Cray declined to comment and we did not hear back from Intel or ECP by press time. We will update the story as we learn more.

The post US Coalesces Plans for First Exascale Supercomputer: Aurora in 2021 appeared first on HPCwire.
