HPC Wire

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

AMD EPYC Video Takes Aim at Intel’s Broadwell

Mon, 08/14/2017 - 12:11

Let the benchmarking begin. Last week, AMD posted a YouTube video in which one of its EPYC-based systems outperformed a ‘comparable’ Intel Broadwell-based system on the STREAM benchmark and on a test case running ANSYS’s CFD application, Fluent. The intent was to showcase the new AMD chip’s (EPYC) strength on memory-bound HPC applications.

In the video, presenter Joshua Mora, senior manager of field applications engineering at AMD, touts EPYC’s memory controller and the memory bandwidth it delivers. AMD has high hopes for its new EPYC line, both in head-to-head competition with Intel and in potentially creating a single-socket market (see HPCwire article, AMD Charges Back into the Datacenter and HPC Workflows with EPYC Processor). Intel, of course, has been busy with its own introductions (see HPCwire article, Intel Unveils Xeon Scalable Processors). AMD EPYC will ultimately have to compete with Intel’s Skylake and IBM’s Power9 chips.

The tested Intel system featured Xeon E5-2699 v4 processors (22 cores) and the AMD system featured EPYC 7601 (32 cores). Both were dual socket systems. “It is two clusters, tightly coupled with high speed, low latency InfiniBand interconnect running Windows OS,” according to Mora.

The AMD system was roughly 2X better on the STREAM benchmark, which is intended to measure sustainable memory bandwidth. The dual-socket Intel system ran at roughly 116 GB/s while the AMD system ran at roughly 266 GB/s. AMD says STREAM performance is a good proxy for a range of HPC applications. Intel would no doubt offer a different view of the systems’ setup and the comparability of the results.
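STREAM estimates sustainable bandwidth by timing simple vector kernels, the best known being the “triad,” a[i] = b[i] + q*c[i]. A rough NumPy sketch of the idea follows; the array size, repetition count, and timing approach are illustrative, this is not the official benchmark, and the NumPy temporary for q*c adds some extra memory traffic the classic C version avoids:

```python
import time
import numpy as np

def stream_triad_bandwidth(n=10_000_000, q=3.0, reps=5):
    """Rough STREAM-triad-style bandwidth estimate in GB/s (not the official benchmark)."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty_like(b)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        np.add(b, q * c, out=a)  # a[i] = b[i] + q * c[i]
        best = min(best, time.perf_counter() - t0)
    # The classic triad counts 3 arrays of 8-byte doubles: read b, read c, write a.
    bytes_moved = 3 * n * 8
    return bytes_moved / best / 1e9

print(f"triad bandwidth: {stream_triad_bandwidth():.1f} GB/s")
```

Real STREAM results depend heavily on compiler flags, thread placement, and memory population, which is why disputes over comparability, like the one above, are common.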

AMD was roughly 78 percent faster running the Fluent simulation, which was a 14 million cell simulation of various aerodynamic effects on a jet. Mora cited AMD’s greater number of cores as well as its memory bandwidth as factors.

It seems clear AMD is ramping up its effort to win chunks of the datacenter and HPC landscape following its absence from those markets in recent years.

At the time of EPYC’s launch, Scott Aylor, AMD corporate VP and GM of enterprise solutions business, said “It’s not enough to come back with one product, you’ve got to come back with a product cadence that moves as the market moves. So not only are we coming back with EPYC, we’re also [discussing follow-on products] so when customers move with us today on EPYC they know they have a safe home and a migration path with Rome.” AMD has committed to socket compatibility between EPYC 7000 line and Rome, code name of the next scheduled generation AMD processor aimed at the datacenter.

Based on the Zen core, EPYC is a line of system on a chip (SoC) devices designed with enhanced memory bandwidth and fast interconnect in mind. AMD also introduced a one-socket device, optimized for many workloads, which AMD says will invigorate a viable one-socket server market. With EPYC, “we can build a no compromise one-socket offering that will allow us to cover up to 50 percent of the two-socket market that is today held by the [Intel Broadwell] E5-2650 and below,” said Aylor.

Intel is the giant here, and not standing still. It will be interesting to watch the competition. EPYC seems to be a serious contender, though with a lot of mindshare ground to make up.

The post AMD EPYC Video Takes Aim at Intel’s Broadwell appeared first on HPCwire.

Mellanox to Present at Upcoming Investor Conference

Mon, 08/14/2017 - 09:38

SUNNYVALE, Calif. & YOKNEAM, Israel, Aug. 14, 2017 — Mellanox Technologies, Ltd. (NASDAQ: MLNX), a leading supplier of end-to-end interconnect solutions for servers and storage systems, today announced that it will present at the following conference during the third quarter of 2017:

  • Deutsche Bank Technology Conference in Las Vegas, Nevada, Wednesday, Sept. 13th at 3:20 p.m., Pacific Daylight Time.

When available, a webcast of the live event, as well as a replay, will be posted on the company’s investor relations website at: http://ir.mellanox.com.

About Mellanox

Mellanox Technologies (NASDAQ: MLNX) is a leading supplier of end-to-end InfiniBand and Ethernet smart interconnect solutions and services for servers and storage. Mellanox interconnect solutions increase data center efficiency by providing the highest throughput and lowest latency, delivering data faster to applications and unlocking system performance capability. Mellanox offers a choice of fast interconnect products: adapters, switches, software and silicon that accelerate application runtime and maximize business results for a wide range of markets including high performance computing, enterprise data centers, Web 2.0, cloud, storage and financial services. More information is available at: www.mellanox.com.

Source: Mellanox


AI: Simplifying and Scaling for Faster Innovation

Mon, 08/14/2017 - 01:05

The AI revolution has already begun, right?

In some ways it has. Deep learning applications have already bested humans in complex games, including chess, Jeopardy, Go, and poker, and in practical tasks, such as image and speech recognition. They are also impacting our everyday lives, introducing human-like capabilities into personal digital assistants, online preference engines, fraud detection systems and more.

However, these solutions were developed primarily by organizations with deep pockets, deep expertise and high-end computing resources.[1] For the AI revolution to move into the mainstream, cost and complexity must be reduced, so smaller organizations can afford to develop, train, and deploy powerful deep learning applications.

It’s a tough challenge. Interest in AI is high, technologies are in flux and no one can reliably predict what those technologies will look like even five years from now. How do you simplify and drive down costs in such an inherently complex and changing environment?

Intel has a strategy, and it involves software as much as hardware. It also involves HPC.

Optimized Software Building Blocks that are Flexible—and Fast

Figure 1. Intel provides highly-optimized software tools, libraries, and frameworks to simplify the development of fast and scalable AI applications.

Most of today’s deep learning algorithms were not designed to scale on modern computing systems. Intel has been addressing those limitations by working with researchers, vendors and the open-source community to parallelize and vectorize core software components for Intel® Xeon® and Intel® Xeon Phi™ processors.

The optimized tools, libraries, and frameworks often provide order-of-magnitude and higher performance gains, potentially reducing the cost and complexity of the required hardware infrastructure. They also integrate more easily into standards-based environments, so new AI developers have less to learn, deployment is simpler and costs are lower.
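As a small illustration of why parallelized and vectorized kernels matter (a toy comparison, not an Intel result), here is the same sum of squares computed with a pure-Python loop and with a single vectorized NumPy call:

```python
import time
import numpy as np

x = np.random.rand(2_000_000)

# Scalar path: Python-level loop, one element at a time.
t0 = time.perf_counter()
total_loop = 0.0
for v in x:
    total_loop += v * v
loop_time = time.perf_counter() - t0

# Vectorized path: the same reduction in one call to optimized native code.
t0 = time.perf_counter()
total_vec = float(np.dot(x, x))
vec_time = time.perf_counter() - t0

print(f"loop {loop_time:.3f}s, vectorized {vec_time:.5f}s, "
      f"speedup ~{loop_time / vec_time:.0f}x")
```

The gains Intel describes come from applying the same principle, vectorization plus parallelization, inside the math libraries and frameworks themselves, so application developers inherit the speedups without rewriting their code.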

Bring AI and HPC Together to Unleash Broad and Deep Innovation

Figure 2. Intel® HPC Orchestrator simplifies the design, deployment, and use of HPC clusters, and includes optimized development and runtime support for AI applications.

Optimized software development tools help, but deep learning applications are compute-intensive, data sets are growing exponentially, and time-to-results can be key to success. HPC offers a path to scaling compute power and data capacity to address these requirements.

However, combining AI and HPC brings additional challenges. AI and HPC have grown up in relative isolation, and there is currently limited overlap in expertise between the two areas. Intel is working with both communities to provide a better and more open foundation for mutual development.

Intel is also working to extend the benefits of AI and HPC to a broader audience. One example of this effort is Intel® HPC Orchestrator, an extended version of OpenHPC that provides a complete, integrated system software stack for HPC-class computing. Intel HPC Orchestrator will help the HPC ecosystem deliver value to customers more quickly by eliminating the complex and duplicated work of creating, testing, and validating a system software stack. Intel has already integrated its AI-optimized software building blocks into Intel HPC Orchestrator to provide better development and runtime environments for AI applications. Work has also been done to optimize other core components, such as MPI, to provide higher performance and better scaling for the data- and compute-intensive demands of deep learning.

Figure 3. The Intel® Scalable System Framework brings hardware and software together to support the next generation of AI applications on simpler, more affordable, and massively scalable HPC clusters.

Powerful Hardware to Run It All

Of course, AI software can only be as powerful as the hardware that runs it. Intel is delivering disruptive new capabilities in its processors, and supporting them with synchronized advances in workstation and server platforms. Intel engineers are also integrating these advances—along with Intel HPC Orchestrator—into the Intel® Scalable System Framework (Intel SSF), a reference architecture for HPC clusters that are simpler, more affordable, more scalable, and designed to handle the full range of HPC and AI workloads. It’s a platform for the future of AI.

Click on a link to learn more about the benefits Intel SSF brings to AI at each layer of the solution stack: overview, compute, memory, fabric, storage.


[1] For example, Libratus, the application that beat five of the world’s top poker players, was created by a team at Carnegie Mellon University and relied on 600 nodes of a University of Pittsburgh supercomputer for overnight processing during the competition.


YT Project Awarded NSF Grant to Expand to Multiple New Science Domains

Fri, 08/11/2017 - 09:39

URBANA, Ill., Aug. 11, 2017 — The yt Project, an open science environment created to address astrophysical questions through analysis and visualization, has been awarded a $1.6 million grant from the National Science Foundation (NSF) to continue developing the software. The grant will enable yt to expand beyond astrophysics and begin to support other domains, including weather, geophysics and seismology, molecular dynamics and observational astronomy. It will also support the development of curricula for Data Carpentry, to ease the onramp for scientists new to data from these domains.

The yt project is led by Matt Turk along with Nathan Goldbaum, Kacper Kowalik, and Meagan Lang at the National Center for Supercomputing Applications (NCSA) at the University of Illinois’ Urbana campus, in collaboration with Ben Holtzman at Columbia University in the City of New York and Leigh Orf at the University of Wisconsin-Madison. It is an open source, community-driven project working to produce an integrated science environment for collaboratively asking and answering questions about simulations of astrophysical phenomena, applying analysis and visualization to many different problems within the field. Built on an ecosystem of packages from the scientific software community, yt is committed to open science principles and emphasizes a helpful community of users and developers. Many theoretical astrophysics researchers use yt as a key component of all stages of their computational workflow, from debugging to data exploration to the preparation of results for publication.

yt has been used for projects within astrophysics as diverse as mass accretion onto the first stars in the universe, outflows from compact objects and supernovae, and the star formation history of galaxies. It has been used to analyze and visualize some of the largest simulations ever conducted, and visualizations generated by yt have been featured in planetarium shows such as Solar Superstorms, created by the Advanced Visualization Lab at NCSA.

“I’m delighted and honored by this grant, and we hope it will enable us to build, sustain and grow the thriving open science community around yt, and share the increase in productivity and discovery made possible by yt in astrophysics with researchers across the physical sciences,” said Principal Investigator Matt Turk.

This NSF SI2-SSI award is expected to last from October 2017 – September 2022. A copy of the grant proposal may be found here.

Source: NCSA


Nvidia Records Record Revenue for Second Quarter Fiscal 2018

Thu, 08/10/2017 - 20:21

August 10, 2017 — NVIDIA today reported record revenue for the second quarter ended July 30, 2017, of $2.23 billion, up 56 percent from $1.43 billion a year earlier, and up 15 percent from $1.94 billion in the previous quarter.

  • Record revenue of $2.23 billion, up 56 percent from a year ago
  • GAAP EPS of $0.92, up 124 percent from a year ago
  • Non-GAAP EPS of $1.01, up 91 percent from a year ago
  • Broad growth across all platforms

GAAP earnings per diluted share for the quarter were $0.92, up 124 percent from $0.41 a year ago and up 16 percent from $0.79 in the previous quarter. Non-GAAP earnings per diluted share were $1.01, up 91 percent from $0.53 a year earlier and up 19 percent from $0.85 in the previous quarter.

“Adoption of NVIDIA GPU computing is accelerating, driving growth across our businesses,” said Jensen Huang, founder and chief executive officer of NVIDIA. “Datacenter revenue increased more than two and a half times. A growing number of car and robot-taxi companies are choosing our DRIVE PX self-driving computing platform. And in Gaming, increasingly the world’s most popular form of entertainment, we power the fastest growing platforms – GeForce and Nintendo Switch.

“Nearly every industry and company is awakening to the power of AI. Our new Volta GPU, the most complex processor ever built, delivers a 100-fold speedup for deep learning beyond our best GPU of four years ago. This quarter, we shipped Volta in volume to leading AI customers. This is the era of AI, and the NVIDIA GPU has become its brain. We have incredible opportunities ahead of us,” he said.

Capital Return

During the first half of fiscal 2018, NVIDIA paid $758 million in share repurchases and $166 million in cash dividends. For fiscal 2018, NVIDIA intends to return $1.25 billion to shareholders through ongoing quarterly cash dividends and share repurchases.

NVIDIA will pay its next quarterly cash dividend of $0.14 per share on September 18, 2017, to all shareholders of record on August 24, 2017.

For more details and second quarter highlights, see the entire Nvidia press release.

Source: Nvidia Corp.


Livermore Computing, Reddit Asked Them Anything

Thu, 08/10/2017 - 18:57

In case you missed it, the staff of Livermore Computing (LC) at the Lawrence Livermore National Laboratory (LLNL) recently fielded some questions from the internet, part of Reddit’s Science Ask Me Anything (AMA) series. Livermore is home to Sequoia, currently the fifth fastest machine in the world, benchmarked at 17 Linpack petaflops. The IBM BlueGene/Q machine enables the National Nuclear Security Administration (NNSA) to fulfill its stockpile stewardship mission through simulation in lieu of underground testing.

This fall Livermore expects to take delivery of Sierra, the pre-exascale supercomputer that is part of the tri-lab CORAL collaboration. Built by IBM and Nvidia, Sierra will offer about six times more computing power than Sequoia, with a planned peak computational capacity of ~150 petaflops. Like Sequoia, Sierra will serve the NNSA’s program to ensure the safety, security, and effectiveness of the nation’s nuclear deterrent without testing.

Below we round up a few highlights from the AMA — so read on to find out what Livermore Computing representatives think about AI, how they are preparing for the coming Sierra system and the benefits of being in wine country. Plus, they answer the all-important question: how much money could Sequoia make mining bitcoins (hypothetically)?


What is one thing about your work that the general public pushes back against and what one thing would you like them to understand?


Sometimes people ask us why we need ever more powerful supercomputers. No matter how much computing power we have provided, our national missions create a need for more complex simulations and thus a need for more powerful supercomputers.

For more information on the nation’s exascale computing project, see https://exascaleproject.org/. For examples of application areas requiring exascale computing, check out the ECP Research Areas.


Do the future prospects of Artificial Intelligence ring a positive or negative tune, and why?


It depends on how AI is used and the intentions of the user. With the increasing complexity of programs (multiple millions of lines of code), we might reach a point where further development of some applications requires help from an AI agent. One useful area to explore is monitoring the condition of supercomputers and predicting failures. We will always need more intelligence, both human and artificial. Several LLNL projects are investigating AI use in HPC; one effort leverages the TrueNorth brain-inspired supercomputer, and you can read more at https://www.llnl.gov/news/lawrence-livermore-and-ibm-collaborate-build-new-brain-inspired-supercomputer.


Is alcohol bad for computing? Or does it hurt your Livermore?


Alcohol is bad for computers, especially the non-liquid cooled ones. Livermore is known for its wine. Alcohol is good for computer scientists. See the following for reference: https://xkcd.com/323/


What’s the silliest things that can be done on supercomputers?


We often need to test the heat tolerance of supercomputers, so one of our engineers was asked to write a computation to generate heat: not to do otherwise productive work, just to get the computer as hot as we possibly could.


This is a kinda dumb question but how do super computers work


No such thing as a dumb question, only an opportunity to learn! Supercomputers, or HPC systems, work by distributing a task across numerous computers. They benefit from high-speed communication between those computers as they work on parts of the task, and they use message passing between the processes on the various computers to coordinate work on the overall task. This high-speed, coordinated, distributed computation across numerous computers is what makes a supercomputer work.


How excited are you for Sierra?


We are extremely excited for Sierra!

We have people developing tools and programming models, porting millions of lines of code, and generally trying new and interesting things, a lot of them open source. We have sysadmins getting familiar with new hardware, and our networking folks are looking at how to make networks keep up with the incredible computational speedup. We also have several teams focused on helping application developers prepare for running efficiently on Sierra.

Power efficiency is extremely important. We are proud to be active contributors, users, and supporters of the open source community. For info on our software go to https://software.llnl.gov/.


How do you start working in your field? What degrees or certifications are you looking for in perspective candidates?


Well… we have a geologist who is a Linux kernel hacker!

Seriously, we have a range of degrees represented across Livermore Computing. Some of us have no degree at all. We have employees with associate degrees, bachelor’s, master’s, and PhDs. Some are straight out of school, others have been in HPC since before the internet existed.

Fields range from Computer Science and Computer Engineering to Mathematics and Statistics. We also have staff who come to us directly from the physical and life sciences, including Computational Biology, Physics, and others!

Look for employment opportunities here: http://careers-ext.llnl.gov/
Apply for an internship: http://students.llnl.gov/
Here’s the page for this summer’s HPC Cluster Engineer Academy:


This is where my uncle works! Say hi to Greg T for me!!!

Now for my question: -Given the rise of cryptocurrency and it’s dependence on computing power to solve blocks and earn currency. Are there any plans to use existing supercomputing power to mine for cryptocurrency?


Greg says hi!

DOE supercomputers are government resources for national missions. Bitcoin mining would be a misuse of government funds.

In general, though, it’s fun to think about how you could use lots of supercomputing power for Bitcoin mining, but even our machines aren’t big enough to break the system. The number of machines mining bitcoin worldwide has been estimated to have a hash rate many thousands of times faster than all the Top 500 machines combined, so we wouldn’t be able to decide to break the blockchain by ourselves (https://www.forbes.com/sites/peterdetwiler/2016/07/21/mining-bitcoins-is-a-surprisingly-energy-intensive-endeavor/2/#6f0cae8a30f3). Also, mining bitcoins requires a lot of power, and it’s been estimated that even if you used our Sequoia system to mine bitcoin, you’d only make $40/day (https://bitcoinmagazine.com/articles/government-bans-professor-mining-bitcoin-supercomputer-1402002877/). The amount we pay every day to power the machine is a lot more than that. So even if it were legal to mine bitcoins with DOE supercomputers, there’d be no point. The most successful machines for mining bitcoins use low-power custom ASICs built specifically for hashing, and they’ll be more cost-effective than a general purpose CPU or GPU system any day.
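The arithmetic behind that answer is easy to reproduce. In the sketch below, the ~8 MW draw for a Sequoia-class system and the $0.10/kWh electricity price are illustrative assumptions; the $40/day mining revenue is the estimate cited above:

```python
# Back-of-envelope mining economics; all inputs are rough assumptions.
power_mw = 8.0                  # assumed machine power draw, megawatts
price_per_kwh = 0.10            # assumed electricity price, USD per kWh
mining_revenue_per_day = 40.0   # estimate cited in the AMA answer

energy_kwh_per_day = power_mw * 1000 * 24        # MW -> kWh over 24 hours
power_cost_per_day = energy_kwh_per_day * price_per_kwh

print(f"electricity: ${power_cost_per_day:,.0f}/day "
      f"vs mining revenue: ${mining_revenue_per_day:.0f}/day")
# Under these assumptions, electricity alone runs hundreds of times
# the hypothetical mining revenue.
```

Whatever the exact inputs, the gap is so large that the conclusion, mining on a general-purpose supercomputer loses money, is robust.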


Do you have to get every response reviewed and approved before you post it?


This answer is currently under review… Yes.


Read the entire AMA here.

The Livermore Computing group hopes to do another AMA in the future, so start thinking of your questions.

Livermore Computing staff in front of Sequoia


One Stop Systems Introduces the 4U Value Expansion System

Thu, 08/10/2017 - 14:07

ESCONDIDO, Calif., Aug. 10, 2017 — One Stop Systems, Inc. (OSS) introduces the 4U Value Expansion System (4UV). The 4UV is a flexible 18-slot rackmount expansion platform that provides thousands of expansion possibilities at a value price. The two standard configurations support up to eight GPUs or up to sixteen PCIe NVMe SSDs with almost any x86 or Power server on the market today.

The first configuration allows any server to add up to 16 PCIe NVMe add-in card (AIC) SSDs to any node by utilizing 16 PCIe 3.0 slots, two host connections and 2000W of redundant, load-sharing, hot-swap power supplies. By using AIC NVMe SSDs in an expansion system like the 4UV, servers are not limited to only a few SSDs due to a lack of available slots. AIC NVMe SSDs outperform 2.5” NVMe SSDs in throughput, available PCIe lanes and capacities. The 4UV allows any server to scale AIC SSDs into large all-flash arrays for the best performance possible. The second configuration favors high-powered compute accelerators such as GPUs and FPGAs by combining ten PCIe 3.0 slots with two PCIe 3.0 host connections and 4000W of load-sharing power. This configuration supports the highest-powered, two-slot-wide GPUs on the market today, allowing for the most TFLOPS of compute power per node. In addition, two fan choices allow for both high-capacity GPU systems and manual PWM speed control in the same 4UV platform.

“One Stop Systems has combined our expertise in PCIe expansion, our history of custom products and our penchant for high-density, all at a value price,” said Steve Cooper, CEO of OSS. “The Value Expansion System is ideal for customers on a tight budget who need high-density PCIe expansion. Customers can utilize the 4UV for GPUs, flash or a combination of both, providing performance gains in many applications like deep learning, oil and gas exploration, financial calculations, and video rendering.”

Pricing for the 4UV’s two standard configurations starts at $7,995 and are available to order online at https://www.onestopsystems.com/flash-storage-expansion for the flash version and https://www.onestopsystems.com/gpu-expansion for the GPU version. OSS sales engineers are available to assist with pricing for custom configurations.

About One Stop Systems

One Stop Systems designs and manufactures ultra-dense high performance computing (HPC) systems for deep learning, oil and gas exploration, financial trading, media and entertainment, defense and other applications requiring the fastest and most efficient data processing. By utilizing the power of the latest GPU accelerators and flash storage cards, our systems stay on the cutting edge of the latest technologies. We have a reputation as innovators using the very latest technology and design equipment to operate with the highest efficiency. Now OSS offers these exceptional systems to customers who prefer to lease time on them instead of or in addition to purchasing them. OSS is always working to meet our customers’ greater needs. For more information, visit www.onestopsystems.com.

Source: One Stop Systems


SIGCOMM 2017 Showcases Latest in Computer Networking

Thu, 08/10/2017 - 14:04

NEW YORK, Aug. 10, 2017 — The Association for Computing Machinery’s (ACM) Special Interest Group on Data Communication (SIGCOMM) today announced highlights of SIGCOMM 2017, its annual flagship conference, to be held at the University of California, Los Angeles (UCLA) August 21 – 25, 2017. The five-day conference will bring together scholars, practitioners, and students from around the world to discuss the latest in the field of communications and computer networks.

Communication networks and their underlying infrastructure continue to evolve with the rise of the information economy, bringing new challenges as well as new avenues for development. In addition to the main conference, SIGCOMM 2017 will also feature nine workshops and eight tutorials on the latest advances in communication networks, including kernel-bypass networks; Big Data analytics and machine learning; networking and programming languages; and virtual/augmented reality, among others.

“As communication over the Internet becomes ubiquitous, innovations in the communications infrastructure are driving faster speeds, lower latency, and new services, as well as improved reliability and security,” said SIGCOMM 2017 General Co-chair K.K. Ramakrishnan of the University of California, Riverside. “At SIGCOMM 2017, we look forward to showcasing innovative research related to technical design and engineering, regulation, operations and the social implications of computer networking.”

Conference organizers accept research submissions on areas ranging from network architecture and design to analysis, measurement and simulation. “On average, 30-50 papers are selected for presentation at SIGCOMM from the hundreds of papers that are submitted,” added General Co-chair Lixia Zhang. “The recent growth of the conference and the competitive nature of the selection process mean that attendees have the opportunity to access the most current and ground-breaking research in the field.”

2017 ACM SIGCOMM Highlights 

The main conference will open on Tuesday, August 22 with a keynote address by the 2017 SIGCOMM Award recipient, Raj Jain (Washington University, St. Louis). The annual award recognizes lifetime contributions to the field of communication networks and was awarded to Jain “for life-long contributions to computer networking including traffic management, congestion control, and performance analysis.”

As a SIGCOMM first, on Wednesday, August 23, the conference will also feature a keynote talk by Jennifer Rexford (Princeton University), the winner of the 2016 ACM Athena Lecturer Award. This ACM award celebrates women researchers who have made fundamental contributions to computer science and was awarded last year to Rexford “for innovations that improved the efficiency of the Border Gateway Protocol (BGP) in routing Internet traffic, for laying the groundwork for software-defined networks (SDNs), and for contributions in measuring and engineering IP networks.”

Other awards that will be presented during SIGCOMM 2017 include the Test of Time Paper Award and the Doctoral Dissertation Award. The Test of Time award recognizes a paper published 10 to 12 years in the past in Computer Communication Review or any SIGCOMM-sponsored or co-sponsored conference that is deemed to be an outstanding paper whose contents are still a vibrant and useful contribution today.

The recipients of the 2017 SIGCOMM Test of Time Paper Award are “Ethane: Taking Control of the Enterprise” by Martin Casado, Michael J. Freedman, Justin Pettit, Jianying Luo, Nick McKeown, and Scott Shenker (SIGCOMM 2007) and “Measurement and Analysis of Online Social Networks” by Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee (IMC 2007).

The Doctoral Dissertation Award recognizes excellent thesis research by doctoral candidates in the field of computer networking and data communication. The 2016 award winners are Justine Sherry for her dissertation “Middleboxes as a Cloud Service” and Vamsi Talla for his dissertation “Power Communication and Sensing Solutions for Energy Constrained Platforms.”

In addition to the two keynote talks, the main conference will feature 11 technical sessions that showcase the research findings described in the 36 accepted papers (out of 250 submissions). Among the topics covered by these technical sessions are programmable devices, network function virtualization, network monitoring, network verification, protocol design, data centers, wireless communication, internet peering, and internet routing. Rounding out the technical program are a session highlighting the “Best of CCR,” (CCR is the ACM SIGCOMM newsletter), poster and demo sessions, and topic preview sessions.

Workshops and Tutorials 

SIGCOMM 2017 also features nine full-day workshops and three full-day and five half-day tutorials. The workshops cover the following topics:

– Mobile edge communication

– Kernel-bypass networks

– Big Data analytics and machine learning

– Internet Quality-of-Experience

– Networking and programming languages

– Container networking

– Mobility in the evolving Internet architecture

– Virtual reality and augmented reality networks

– Reproducibility

The tutorials introduce participants to topics such as:

– Millimeter-wave wireless networking and sensing

– Named data networking

– Adaptive streaming

– P4 -> NetFPGA

– Programming the data plane

– Understanding latency

– The Netmap framework for NFV applications

– Low latency communication for connected cars

Additional information about SIGCOMM 2017, including a full program and schedule of events, may be found at http://conferences.sigcomm.org/sigcomm/2017.


SIGCOMM (http://www.sigcomm.org/) is ACM’s professional forum for the discussion of topics in the field of communications and computer networks, including technical design and engineering, regulation and operations, and the social implications of computer networking. The SIG’s members are particularly interested in the systems engineering and architectural questions of communication.

About ACM 

ACM, the Association for Computing Machinery (www.acm.org), is the world’s largest educational and scientific computing society, uniting computing educators, researchers and professionals to inspire dialogue, share resources and address the field’s challenges. ACM strengthens the computing profession’s collective voice through strong leadership, promotion of the highest standards, and recognition of technical excellence. ACM supports the professional growth of its members by providing opportunities for life-long learning, career development, and professional networking.

Source: ACM

The post SIGCOMM 2017 Showcases Latest in Computer Networking appeared first on HPCwire.

OSC Helps Researchers Unveil Most Accurate Map of the Invisible Universe

Thu, 08/10/2017 - 10:38

COLUMBUS, Ohio, Aug. 10, 2017 — The Ohio Supercomputer Center played a critical role in helping researchers reach a milestone mapping the growth of the universe from its infancy to present day.

The new results released Aug. 3 confirm the surprisingly simple but puzzling theory that the present universe is composed of only 4 percent ordinary matter, 26 percent mysterious dark matter, and the remaining 70 percent dark energy, the equally mysterious component that causes the accelerating expansion of the universe.

The findings from researchers at The Ohio State University and their colleagues from the Dark Energy Survey (DES) collaboration are based on data collected during the first year of the DES, which covers more than 1,300 square degrees of the sky or about the area of 6,000 full moons. DES uses the Dark Energy Camera mounted on the Blanco 4m telescope at the Cerro Tololo Inter-American Observatory high in the Chilean Andes.

According to Klaus Honscheid, Ph.D., professor of physics and leader of the Ohio State DES group, OSC was critical to getting the research done in a timely manner. His computational specialists, postdoctoral fellows Michael Troxel and Niall MacCrann, used an estimated 300,000 core hours on OSC’s Ruby Cluster through a condo arrangement between OSC and Ohio State’s Center for Cosmology and Astro-Particle Physics (CCAPP).

For standard work, the team took advantage of OSC’s Anaconda environment (Anaconda is an open-source distribution of the Python and R programming languages for large-scale data processing, predictive analytics and scientific computing). The group then used its own software to explore the multi-dimensional parameter space using Markov Chain Monte Carlo techniques, which generate fair samples from a probability distribution. The team also ran validation code, or null tests, for object selection, as well as fitting code that extracts information about objects in the images by simultaneously fitting the same object in all available exposures.
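Markov Chain Monte Carlo, the technique the team used to explore its parameter space, can be illustrated with a minimal Metropolis-Hastings sampler. This is a generic one-parameter toy sketch, not the DES collaboration’s actual code; the standard-normal target stands in for a real cosmological posterior:

```python
import math
import random

def metropolis(log_prob, start, n_steps, step_size=0.5):
    """Metropolis-Hastings: draw fair samples from an unnormalized log-density."""
    random.seed(1)  # fixed seed so this sketch is reproducible
    samples, x = [], start
    lp = log_prob(x)
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step_size)
        lp_new = log_prob(proposal)
        # Accept the move with probability min(1, p_new / p_old)
        if random.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        samples.append(x)
    return samples

# Toy one-parameter "posterior": a standard normal log-density
chain = metropolis(lambda x: -0.5 * x * x, start=3.0, n_steps=20000)
burned = chain[5000:]  # discard burn-in before summarizing
mean = sum(burned) / len(burned)
```

After burn-in, the chain’s sample mean converges toward the target’s true mean (zero here), which is what makes MCMC usable for estimating cosmological parameters from a likelihood that can only be evaluated, not sampled directly.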

The bulk of the team’s computational allocation, roughly 4 million core hours, is at the National Energy Research Scientific Computing Center (NERSC), a federal supercomputing facility in California. However, due to a backlog at NERSC, OSC’s role became key.

According to Honscheid, for the next analysis round the team is considering increasing the amount of work done through OSC. The total survey will last five years, he said, meaning the need for high performance computing will only increase.

In order to collect the data, the team built an incredibly powerful camera for the Blanco 4m telescope.

“We had to construct the most powerful instrument of its kind. It is sensitive enough to collect light from galaxies 8 billion light years away,” said Honscheid.

Key components of the 570 mega-pixel camera were built at Ohio State.

Paradoxically, it is easier to measure the structure of the universe in the distant past than it is to measure it today. In the first 400,000 years after the Big Bang, the universe was filled with a glowing gas, the light from which survives to this day. This cosmic microwave background (CMB) radiation provides a snapshot of the universe at that early time. Since then, the gravity of dark matter has pulled mass together and made the universe clumpier. But dark energy has been fighting back, pushing matter apart. Using the CMB as a start, cosmologists can calculate precisely how this battle plays out over 14 billion years.

“With the new results, we are able for the first time to see the current structure of the universe with a similar level of clarity as we can see its infancy. Dark energy is needed to explain how the infant universe evolved to what we observe now,” said MacCrann, a major contributor to the analysis.

DES scientists used two methods to measure dark matter. First, they created maps of galaxy positions as tracers; second, they precisely measured the shapes of 26 million galaxies to directly map the patterns of dark matter over billions of light years, using a technique called gravitational lensing. Ashley Ross of CCAPP, leader of the DES large-scale structure working group, said, “For the first time we were able to perform these studies with data from the same experiment, allowing us to obtain the most accurate results to date.”

To make these ultra-precise measurements, the DES team developed new ways to detect the tiny lensing distortions of galaxy images, an effect not even visible to the eye, enabling revolutionary advances in understanding these cosmic signals. In the process, they created the largest guide to spotting dark matter in the cosmos ever drawn (see image). The new dark matter map is 10 times the size of the one DES released in 2015 and will eventually be three times larger than it is now.

A large scientific team achieved these results working in seven countries across three continents.

“Successful collaboration at this scale represents many years of deep commitment, collective vision, and sustained effort,” said Ami Choi, CCAPP postdoctoral fellow who worked on the galaxy shape measurements.

Michael Troxel, CCAPP postdoctoral fellow and leader of the weak gravitational lensing analysis, added, “These results are based on unprecedented statistical power and detailed understanding of the telescope and potential biases in the analysis. Crucially, we performed a ‘blind’ analysis, in which we finalized all aspects of the analysis before we knew the results, thereby avoiding confirmation biases.”

The DES measurements of the present universe agree with the results obtained by the Planck satellite that studied the cosmic microwave background radiation from a time when the universe was just 400,000 years old.

“The moment we realized that our measurement matched the Planck result within 7% was thrilling for the entire collaboration,” said Honscheid. “And this is just the beginning for DES with more data already observed. With one more observing season to go, we expect to ultimately use five times more data to learn more about the enigmatic dark sector of the universe.”

The new results from the Dark Energy Survey will be presented by Kavli fellow Elisabeth Krause at the TeV Particle Astrophysics Conference in Columbus on Aug. 9, and by CCAPP’s Troxel at the International Symposium on Lepton Photon Interactions at High Energies in Guangzhou, China, on Aug. 10.

The publications can be accessed on the Dark Energy Survey website.

Ohio State University is an institutional member of the Dark Energy Survey collaboration. Funding for this research comes in part from Ohio State’s Center for Cosmology and Astro-Particle Physics. The Ohio Supercomputer Center provided a portion of the computing power for this project.

The Ohio State DES team includes Honscheid; Paul Martini and David Weinberg, both professors of astronomy; Choi, Ross, MacCrann, and Troxel, all postdoctoral fellows at CCAPP; and doctoral students Su-Jeong Lee and Hui Kong.

Source: OSC

The post OSC Helps Researchers Unveil Most Accurate Map of the Invisible Universe appeared first on HPCwire.

Cavium Announces Support for FC-NVMe Standard

Thu, 08/10/2017 - 10:07

SAN JOSE, Calif., Aug. 10, 2017 — Cavium, Inc. (NASDAQ: CAVM) a leading provider of semiconductor products that enable secure and intelligent processing for enterprise, data center, cloud, wired and wireless networking, announces support for the newly ratified NVMe over Fibre Channel (FC-NVMe) 1.0 standard. Cavium’s industry-leading QLogic 2700 series Gen 6 and 2690 Series Enhanced Gen 5 Fibre Channel host bus adapters (HBAs) support connecting NVMe storage over Fibre Channel networks concurrently with the existing storage by using the updated firmware and drivers.

The new Cavium FC-NVMe drivers are targeted towards customers connecting flash-based storage arrays to servers over Fibre Channel networks. Pre-release software for the Linux OS is available for download by contacting Cavium Sales. Three usage scenarios are supported:

  • Initiator mode: Drivers and firmware for hosts containing initiator mode QLogic 2700 Series Gen 6 or 2690 Series Enhanced Gen 5 HBAs
  • Target mode: Drivers for storage servers or array controllers containing target mode QLogic 2700 Series Gen 6 or 2690 Series Enhanced Gen 5 HBAs
  • Target mode (SPDK): Drivers (based on user mode SPDK technology) for storage servers or array controllers containing target mode QLogic 2700 Series Gen 6 or 2690 Series Enhanced Gen 5 HBAs

The INCITS/T11 committee for Fibre Channel Interfaces recently approved the standard, ensuring concurrent support and interoperability for FC-NVMe and the existing Fibre Channel protocols. Craig Carlson, Chairman of the T11 standards committee on FC-NVMe, said, “Cavium was a leading contributor to the definition and development of the FC-NVMe standard. Cavium actively participated by contributing resources and talent to the effort, and when the committee was investigating the new technology, Cavium engineers took many of the ideas and created POC code and provided valuable and timely feedback.”

Workloads that demand higher throughput, more IOPS and lower latency are moving to flash. The NVMe protocol was designed from the ground up for flash; it features deep parallelism and random access, and allows access to flash over PCI Express (PCIe) to maximize bandwidth. FC-NVMe extends these benefits over a Fibre Channel fabric. The low-latency, lossless and efficient data handling capabilities of Fibre Channel are ideally suited to extending the performance and latency advantages of NVMe over a network.

Key benefits of adopting Cavium Fibre Channel HBAs for Flash environments include:

  • Technology leadership: Cavium chairs the T11 committee working group that developed the FC-NVMe standards, and contributed significant resources to work with ecosystem partners to develop this technology.
  • Innovation: Cavium is driving innovation in software defined storage platforms with target mode drivers for the Storage Performance Development Kit (SPDK) project, which provides high-performance, user-space device drivers enabling the next generation of storage platforms.
  • Performance: Cavium’s QLogic Fibre Channel adapters have been shown to deliver exceptional performance of up to 2.6 million IOPS with low latency, enabling the most demanding enterprise workloads.
  • Standards compliance: Cavium QLogic solutions meet IT standards for interoperability and support. This ensures that customers can deploy Fibre Channel storage of their choice without worries about vendor lock-in or limited choice.
  • Investment protection: With Cavium technology, FC-NVMe workloads can be seamlessly introduced into existing FCP-SCSI fabrics. With QLogic 2700 and 2690 Series FC HBAs, FCP-SCSI and FC-NVMe protocol traffic can run concurrently without requiring any rip and replace of existing infrastructure.
  • Advanced SAN fabric management: Cavium QLogic StorFusion technology delivers a full suite of diagnostics, rapid provisioning and Quality of Service (QoS) throughout the fabric which automate and simplify SAN deployment and orchestration.

“NVMe is a great advancement for the storage industry, driving down latencies and increasing IOPS. And FC-NVMe is an ideal storage protocol to take advantage of this technology transition. We believe Fibre Channel is the right choice for NVMe storage because of its deterministic performance, resiliency, reliability and ubiquitous presence in the data center,” said Vikram Karvat, Vice President and General Manager, Cavium Fibre Channel Storage Group.

About Cavium

Cavium, Inc. (NASDAQ: CAVM), offers a broad portfolio of infrastructure solutions for compute, security, storage, switching, connectivity and baseband processing. Cavium’s highly integrated multi-core SoC products deliver software compatible solutions across low to high performance points enabling secure and intelligent functionality in Enterprise, Data Center and Service Provider Equipment. Cavium processors and solutions are supported by an extensive ecosystem of operating systems, tools, application stacks, hardware reference-designs and other products. Cavium is headquartered in San Jose, CA with design centers in California, Massachusetts, India, Israel, China and Taiwan. For more information about the Company, please visit www.cavium.com.

Source: Cavium

The post Cavium Announces Support for FC-NVMe Standard appeared first on HPCwire.

Galactic Winds Push Researchers to Probe Galaxies at Unprecedented Scale

Wed, 08/09/2017 - 21:32

August 9 — When astronomers peer into the universe, what they see often exceeds the limits of human understanding. Such is the case with low-mass galaxies—galaxies a fraction of the size of our own Milky Way.

These small, faint systems made up of millions or billions of stars, dust, and gas constitute the most common type of galaxy observed in the universe. But according to astrophysicists’ most advanced models, low-mass galaxies should contain many more stars than they appear to contain.

A leading theory for this discrepancy hinges on the fountain-like outflows of gas observed exiting some galaxies. These outflows are driven by the life and death of stars, specifically stellar winds and supernova explosions, which collectively give rise to a phenomenon known as “galactic wind.” As star activity expels gas into intergalactic space, galaxies lose precious raw material to make new stars. The physics and forces at play during this process, however, remain something of a mystery.

To better understand how galactic wind affects star formation in galaxies, a two-person team led by the University of California, Santa Cruz, turned to high-performance computing at the Oak Ridge Leadership Computing Facility (OLCF), a US Department of Energy (DOE) Office of Science User Facility located at DOE’s Oak Ridge National Laboratory (ORNL). Specifically, UC Santa Cruz astrophysicist Brant Robertson and University of Arizona graduate student Evan Schneider (now a Hubble Fellow at Princeton University), scaled up their Cholla hydrodynamics code on the OLCF’s Cray XK7 Titan supercomputer to create highly detailed simulations of galactic wind.

“The process of generating galactic winds is something that requires exquisite resolution over a large volume to understand—much better resolution than other cosmological simulations that model populations of galaxies,” Robertson said. “This is something you really need a machine like Titan to do.”

After earning an allocation on Titan through DOE’s INCITE program, Robertson and Schneider started small, simulating a hot, supernova-driven wind colliding with a cool cloud of gas across 300 light years of space. (A light year equals the distance light travels in 1 year.) The results allowed the team to rule out a potential mechanism for galactic wind.

Now the team is setting its sights higher, aiming to generate nearly a trillion-cell simulation of an entire galaxy, which would be the largest simulation of a galaxy ever. Beyond breaking records, Robertson and Schneider are striving to uncover new details about galactic wind and the forces that regulate galaxies, insights that could improve our understanding of low-mass galaxies, dark matter, and the evolution of the universe.

Lead Institution: University of California, Santa Cruz

Read more: https://www.olcf.ornl.gov/2017/08/08/galactic-winds-push-researchers-to-probe-galaxies-at-unprecedented-scale/

Source: Oak Ridge Leadership Computing Facility

The post Galactic Winds Push Researchers to Probe Galaxies at Unprecedented Scale appeared first on HPCwire.

Oak Ridge to Cut Up to 350 Jobs in 2017; Will Other Labs Follow Suit?

Wed, 08/09/2017 - 11:00

It’s not yet clear whether the staff cuts announced yesterday at Oak Ridge National Laboratory are the first of many across the national labs as the Department of Energy feels pressure from President Trump to cut costs. ORNL director Thomas Zacharia announced the planned layoffs in an email to ORNL employees Tuesday morning.

“From time to time, sustaining our work effectively and efficiently requires the most difficult of decisions, which is to reduce our staff in certain areas of the lab. To allow us to provide for our research missions and to allocate resources most productively, the Department of Energy has approved a Workforce Restructuring Plan proposed by UT-Battelle that will reduce ORNL’s workforce by up to 350 positions by the end of the calendar year,” wrote Zacharia.

A brief account of the planned layoffs and the text of Zacharia’s email is posted on the Oak Ridge Today news outlet.

President Trump’s initial budget called for slashing about $900 million from DOE’s Office of Science, including $185 million from ORNL’s budget. Worry has permeated the academic and government science communities since (see HPCwire articles: Trump Budget Targets NIH, DOE, and EPA; No Mention of NSF; Exascale Escapes 2018 Budget Axe; Rest of Science Suffers). The targeted 350 ORNL jobs are fewer than the 1,600 figure floated by Senator Dianne Feinstein (D-CA) back in June. ORNL employs roughly 4,800 people.

The final U.S. FY2018 budget numbers are not yet settled although the new budget is supposed to take effect October 1.  In any event, ORNL is moving quickly. Meetings to explain the cuts are planned to start this week. The hope is voluntary cuts will be sufficient. The cuts are planned to be completed by year-end.

Here’s a bit more from Zacharia’s email:

ORNL Director Thomas Zacharia

“These reductions will be made primarily among staff who charge to indirect accounts, along with some research staff affected by FY17 funding who could not be placed elsewhere in the Lab. By reducing these positions, ORNL will be able to maintain competitive chargeout rates while freeing resources for discretionary investments that will modernize Lab infrastructure and maintain core research capabilities in the mission areas assigned to ORNL.

Initially, a Self-Select Voluntary Separation Program (VSP) will open on Monday, August 14. Employees can apply for the VSP from August 14 to September 27. Management reserves the right to deny any application, and employees will be notified whether their application has been accepted. Accepted employees will leave the payroll by December 31.”

Link to Oak Ridge Today account: http://oakridgetoday.com/2017/08/08/ornl-reduce-workforce-350-end-year/

The post Oak Ridge to Cut Up to 350 Jobs in 2017; Will Other Labs Follow Suit? appeared first on HPCwire.

Supermicro Previews 1U Petabyte NVMe Storage Supporting “Ruler” Form Factor for Intel SSDs at FMS

Wed, 08/09/2017 - 09:46

SAN JOSE, Calif., Aug. 9, 2017 — Super Micro Computer, Inc. (NASDAQ: SMCI) will showcase its petabyte scale all-flash Non-Volatile Memory Express (NVMe) system at Flash Memory Summit at the Santa Clara Convention Center in Santa Clara, California from August 8th through August 10th, 2017.

With a total of 32 “ruler” form factor SSDs in a 1U system, Supermicro’s new NVMe solution will provide all-NVMe capacity at petabyte scale in 1U of rack space as the company plans to support 32TB rulers in the near future. Compared to current U.2 SSD 2U storage systems, the new “ruler” form factor for Intel SSDs delivers more than double the capacity per rack unit and is 40 percent more thermally efficient.
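The petabyte claim follows directly from the drive count and the planned per-drive capacity. A quick back-of-the-envelope check, using the 32TB per-ruler roadmap figure cited above:

```python
# Sanity-check the headline density claim: 32 "ruler" SSDs per 1U chassis,
# each at the 32 TB capacity Supermicro says it plans to support.
rulers_per_1u = 32
tb_per_ruler = 32
total_tb = rulers_per_1u * tb_per_ruler  # 1024 TB in a single rack unit
total_pb = total_tb / 1024               # ~1 PB (binary units; decimal also rounds to a petabyte)
```

At today’s shipping capacities the same chassis lands at roughly half a petabyte, which matches the article’s “half petabyte this year, full petabyte early next year” timeline.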

Today at FMS 2017, Supermicro announced its role as a key technology partner to Intel as the company rolls out its “ruler” form factor technology.  NVMe technology was developed to unleash the best possible latency and provide faster CPU to data storage performance for advanced computing. The new “ruler” form factor optimizes rack efficiency, delivers unparalleled space-efficient capacity, and simplifies serviceability.

“Our new ‘ruler’ based system with 32 ‘ruler’ form factor SSDs in 1U is the latest example of how Supermicro continues to push the innovation envelope for NVMe technology,” said Charles Liang, President and CEO of Supermicro.  “With more than double the capacity and 40% more thermal efficiency, this Supermicro ‘ruler’ system will take us to Petabyte scale in a single 1U system in the near future – an unimaginable territory just a short time ago.”

“The ‘ruler’ form factor for Intel SSD is designed from the ground up with today’s data centers’ needs in mind, and brings dense storage and efficient management on a massive scale to the data center, breaking free from the legacy of hard drives and add-in cards,” said Bill Leszinske VP, Strategic Planning and Business Development at Intel’s NVM Solutions Group. “We are excited to, once again, transform the way data is stored, build on our long history of storage innovation and see tomorrow’s groundbreaking solutions delivered today using this technology.”

With over 100 NVMe based platforms in its X11 server and storage portfolio, Supermicro is continuously extending its position as the technology innovation leader in NVMe servers and storage.  For example, the Supermicro BigTwin system supports up to 24 NVMe drives in 2U as well as 24 memory modules per node.

“Supermicro provides industry leading support for RAM and NVMe density on the BigTwin model that we are deploying for the new Intel Xeon Scalable processors. These systems allow us to support up to 6 NVMe drives per node for a total of 24 NVMe drives in 2U. This addresses the rapidly increasing performance demands that our clients put on our platforms,” said William Bell VP of Products at PhoenixNAP, a global organization that offers a wide portfolio of cloud, bare metal dedicated servers, colocation and Infrastructure-as-a-Service (IaaS) solutions.

Supermicro’s new all-flash 32 drive NVMe 1U system supports both “ruler” and U.2 form factors to offer customers increased storage flexibility.  This 1U system will support a half petabyte of NVMe storage capacity this year and a full petabyte early next year.

In addition, Supermicro has developed 1U and 2U Ultra servers with 20 directly attached NVMe SSDs.  These new X11 servers feature a non-blocking design, allocating 80 PCI-E lanes to the 20 NVMe SSDs. This approach provides the lowest possible latency and unleashes up to 18 million IOPS in throughput performance.
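A rough sketch of why the 80-lane allocation is non-blocking for 20 drives follows; note that the per-lane throughput figure is a standard PCIe Gen 3 estimate after 128b/130b encoding, not a number from the announcement:

```python
# Non-blocking design: 80 PCI-E lanes dedicated to 20 NVMe SSDs means every
# drive keeps a full x4 link with no switch oversubscription.
pcie_lanes = 80
nvme_drives = 20
lanes_per_drive = pcie_lanes // nvme_drives   # x4 link per SSD
# ~0.985 GB/s usable per PCIe Gen 3 lane after 128b/130b encoding (assumed figure)
gbs_per_drive = lanes_per_drive * 0.985       # ~3.94 GB/s per drive
```

Keeping a dedicated x4 link per SSD is what lets latency stay low under load: no drive ever waits for lanes shared with a neighbor.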

For more information on Supermicro’s all-flash NVMe server solutions, please visit https://www.supermicro.com/products/nfo/NVMe.cfm.

For complete information on Supermicro SuperServer solutions, visit www.supermicro.com.

Follow Supermicro on Facebook and Twitter to receive their latest news and announcements.

About Super Micro Computer, Inc. (NASDAQ: SMCI)
Supermicro (NASDAQ: SMCI), the leading innovator in high-performance, high-efficiency server technology is a premier provider of advanced Server Building Block Solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, HPC and Embedded Systems worldwide. Supermicro is committed to protecting the environment through its “We Keep IT Green” initiative and provides customers with the most energy-efficient, environmentally-friendly solutions available on the market.

Source: Supermicro

The post Supermicro Previews 1U Petabyte NVMe Storage Supporting “Ruler” Form Factor for Intel SSDs at FMS appeared first on HPCwire.

Flash Memory Summit 2017: Liqid Partners with One Stop Systems

Wed, 08/09/2017 - 09:41

SANTA CLARA, Calif., Aug. 9, 2017 — Liqid Inc., leading provider of on-demand composable infrastructure (CI) technologies, today announced it will showcase its Composable Infrastructure (CI) solution at Flash Memory Summit 2017, demonstrating bare-metal CI paired with the Express Box 3600/3450 (EB3600/EB3450), the leading multi-GPU PCIe Gen 3 expansion chassis series from One Stop Systems (OSS). Delivering the unprecedented ability to pool and orchestrate graphics processing alongside storage, networking, and compute, the demo shows how Liqid CI enables users to dynamically leverage multiple NVIDIA GPUs aggregated on the EB-series across the application spectrum, balancing the resource alongside solid state and other datacenter elements. The infrastructure agility achieved through the first-of-its-kind configuration can scale graphics processing to meet the emerging demands of data-hungry applications in fields as diverse as artificial intelligence (AI), the Internet of Things (IoT), Infrastructure-as-a-Service (IaaS), high performance computing (HPC), digital media and entertainment, and much more.

“The Express Box PCIe expansion chassis series from OSS provides the industry’s most sophisticated vessel for aggregating NVIDIA’s powerful GPU resources to deliver breakthrough computational capabilities,” said Steve Cooper, CEO, OSS. “When coupled with innovative CI solutions from Liqid, the value inherent to the EB-series can be amplified through Liqid’s pooling, orchestration, and scaling features to drive improvements in application performance and take greater advantage of GPU technology’s transformative potential.”

The Express Box series of PCIe expansion chassis provides independent PCIe slot subsystems that can be managed separately by different hosts. The EB-series permits IT administrators to aggregate graphics processing resources for workloads that require more GPU processing power than can be produced through installation in existing PCIe slots on servers and workstations. The EB-series is ideal for emerging applications in artificial intelligence and other high density GPU applications that demand mission-critical features.

At Flash Memory Summit, Liqid and OSS will showcase how their solutions can be jointly deployed to scale-out GPU over a composable PCIe fabric, with Liqid CI providing unprecedented infrastructure agility through its ability to pool, scale and orchestrate graphics processing alongside the latest solid-state technologies and other datacenter resources. The live product demonstration will feature the OSS EB-series GPU expansion chassis loaded with NVIDIA Tesla GPUs, and deployed on the Liqid Grid Composable PCIe Fabric Switch, demonstrating dynamic GPU assignment over a high performance PCIe fabric.

With the Liqid CI Platform built on disaggregated resource pools, users are finally freed from the restrictions of the motherboard/chassis paradigm that has remained one of the final and most stubborn physical limitations of the digital world. Liqid CI software allows IT users to orchestrate resources as needed and instantly reallocate physical resources as business needs change.

“Through partnerships with industry-leading companies like OSS, Liqid continues to demonstrate how increasingly critical GPU resources can be more effectively deployed alongside flash media and other datacenter elements to achieve exponential advancements in infrastructure agility and utilization,” said Jay Breakstone, CEO, Liqid. “With a new generation of technologies quickly steering us into the next decade of innovation, our composability solutions can be configured to scale for graphically intensive compute applications such as genetic mapping, oil and gas research, or machine learning algorithms, providing the infrastructure flexibility necessary for continued advancement in a wide variety of fields.”

About Liqid

Liqid delivers unprecedented infrastructure agility, marking the next evolution in data center technology. As a global leader in composable infrastructure (CI), Liqid’s open platform allows users, either manually or through policy-based automation, to effortlessly manage and configure “physical,” bare-metal server systems in seconds. Liqid’s software and hardware work in harmony allowing users, for the first time, to configure their physical server infrastructure on-the-fly. In this way, Liqid enables organizations to adapt to technological and business changes in real time and fully maximize opportunities in today’s digital economy. For more information, contact the Liqid team at info@liqid.com or visit www.liqid.com. Follow Liqid on Twitter, LinkedIn and Google+.

Source: Liqid

The post Flash Memory Summit 2017: Liqid Partners with One Stop Systems appeared first on HPCwire.

Rescale to Enable Data Transfer to Cloud Big Compute Environment via Equinix Cloud Exchange

Wed, 08/09/2017 - 09:37

SAN FRANCISCO, Aug. 9, 2017 — Rescale today announced a collaboration with Equinix, the global data center and interconnection provider, to offer Rescale’s suite of HPC solutions via the Equinix Cloud Exchange. Equinix and Equinix Cloud Exchange enable enterprises to easily migrate their on-premise infrastructure into the world’s most interconnected data centers and connect to multiple cloud service providers via a single port. By working with Rescale and providing access to its new product, ScaleX HyperLink, Equinix will enable extremely fast and secure data transfer to and from Rescale’s multi-cloud big compute environment. ScaleX HyperLink is a fully managed service with high availability and low latency. It takes full advantage of the Equinix global platform and public cloud service providers’ high-speed networking technologies such as AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect or IBM Direct Link.

While the migration of high-performance computing workloads to the cloud is accelerating, enterprises still struggle with how best to manage their existing on-premise hardware and data. By working with Equinix, Rescale now offers Equinix colocation services to ‘lift and shift’ a customer’s hardware into Equinix’s secure and reliable International Business Exchange (IBX) data centers. The collaboration enables enterprises to continue to utilize their capital investment in a private cloud mode via ScaleX HyperLink and enables bursting to unlimited on-demand compute resources via Rescale’s managed global multi-cloud network. Additionally, by moving hardware into an Equinix IBX data center, enterprises will have access to state-of-the-art, private, carrier-neutral, and carrier-dense interconnections. As well as being extremely fast, these private connections are highly secure with no packets traveling over the public internet.

Enterprises that choose to keep their hardware on-site but want to leverage public cloud as part of their hybrid on-premise/public cloud, big compute strategy can also take advantage of ScaleX HyperLink, utilizing Rescale’s managed pay-as-you-go, on-demand cloud model for burst requirements. Enterprises may choose to have their data travel via private networks instead of public internet to ensure they meet the strictest data security requirements. This high-speed, low-latency interconnection between on-premise IT and the public cloud improves data throughput performance and provides more data security and privacy than public internet.

“Although many enterprises have already made the move to cloud for certain data services, many are reluctant to make the move for one of their largest investments: on-premise, high-performance computing,” said Tyler Smith, Head of Partnerships at Rescale. “Rescale helps ease the migration process with a hybrid and burst solution, making best use of existing and cloud resources in tandem, and the partnership with Equinix adds a compelling layer of ultra high-speed data transfer and best-in-class security, while allowing the enterprise to keep their preferred private/public cloud model.”

Steve Steinhilber, Vice President of Business Development at Equinix added, “By working with Rescale, we are able to add a big compute stack to our existing interconnection offerings and IaaS capabilities, and it enables our customers to run high-performance computing jobs in a turnkey and on-demand fashion. Through colocation or just leveraging the Equinix platform, Rescale and Equinix are providing a holistic approach to compute intensive workloads.”

About Rescale

Rescale is the global leader for high-performance computing simulations and deep learning in the cloud. Trusted by the Global Fortune 500, Rescale empowers the world’s top scientists and engineers to develop the most innovative new products and perform groundbreaking research and development faster and at lower cost. Rescale’s ScaleX platform transforms traditional fixed IT resources into flexible hybrid, private, and public cloud resources—built on the largest and most powerful high-performance computing network in the world. For more information on Rescale’s ScaleX platform, visit www.rescale.com.

Source: Rescale

The post Rescale to Enable Data Transfer to Cloud Big Compute Environment via Equinix Cloud Exchange appeared first on HPCwire.

Bolstering the ARM Case for HPC Workloads

Wed, 08/09/2017 - 09:26

A new report sponsored by ARM and prepared by the University of Cambridge (UK) shows strong scaling for two popular CFD programs – OpenFOAM and Cloverleaf – on Cavium ThunderX-based ARM systems.

ARM’s push to penetrate the HPC landscape has been uneven so far, but momentum may be shifting. There are a couple of landmark efforts, notably Japan’s Post-K computer project, for which Fujitsu chose ARM as its processor, and the European Mont Blanc project. Wider adoption of ARM-based systems in the academic and commercial HPC world has been slow. ARM says this latest report is one of several planned efforts to bolster ARM’s case for use with HPC workloads.

The report by the University of Cambridge (HPC Case Study: CFD Applications on ARM), is intended to demonstrate both performance scalability and ease of porting these widely-used applications.

  • OpenFOAM is a popular, general purpose solver package with many academic and industrial users. It can solve a wide range of both steady-state and time-dependent problems, in compressible and incompressible fluids, solid mechanics and electromagnetism. OpenFOAM is written in C++ and parallelized with MPI.
  • Cloverleaf is a shock hydrodynamics code solving the compressible, time-dependent Euler equations using a second-order-accurate staggered-grid method. Cloverleaf is a mini-app, part of the international Mantevo Project, replicating the computational requirements of production codes used at defense laboratories worldwide. Originally written in Fortran, with hybrid parallelization using OpenMP/MPI, it has been ported to a wide range of programming models, allowing application performance to be compared across different parallel architectures.

The Cavium ThunderX architecture as described in the report “is a System on Chip (SoC) based on the ARM-v8 architecture. A single socket is comprised of 48 physical, fully out-of-order cores with up to four DDR4 memory controllers. A compute node typically has two sockets for a total of 96 cores and up to 1TB of attached memory. The mini-cluster used in this study was configured with 128GB of attached DDR4 memory and a 40Gb Ethernet interconnect.”

A figure showing the Cloverleaf performance is below.

Strong scaling results for the Cloverleaf bm series test cases. Curves show the speedup on 2, 4 and 8 ThunderX sockets, normalised to the time of 1 socket
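Speedup and parallel-efficiency curves of this kind reduce to a simple calculation: normalize each run's wall time to the smallest configuration. A minimal sketch, using made-up timings rather than the report's measurements:

```python
def strong_scaling(times_by_sockets):
    """Given {socket_count: wall_time}, return {socket_count: (speedup,
    parallel_efficiency)}, normalized to the smallest socket count."""
    base_n = min(times_by_sockets)
    base_t = times_by_sockets[base_n]
    results = {}
    for n, t in sorted(times_by_sockets.items()):
        speedup = base_t / t                  # how much faster than the base run
        efficiency = speedup / (n / base_n)   # 1.0 means perfect linear scaling
        results[n] = (speedup, efficiency)
    return results

# Made-up wall times (seconds) for 1, 2, 4 and 8 sockets -- illustrative only.
for n, (s, e) in strong_scaling({1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0}).items():
    print(f"{n} sockets: speedup {s:.2f}x, efficiency {e:.0%}")
```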

According to the report, “It is key to note the ease with which these legacy x86 applications ported to ARM’s AArch64 instruction set. The Cloverleaf application code was compiled with no intervention in the code base or the build system. OpenFOAM required some small (~10 lines added) changes to its custom make build system to compile on AArch64 but none of the application code was affected.”

As with all “sponsored” benchmarks, it’s probably smart to take the results with a grain of salt. That said, the ARM camp (silicon providers and ecosystem partners) seems to be mounting a louder campaign to earn a piece of the HPC market.

Link to ARM blog and the report: https://community.arm.com/processors/b/blog/posts/arm-hpc-case-study-university-of-cambridge

The post Bolstering the ARM Case for HPC Workloads appeared first on HPCwire.

IBM Raises the Bar for Distributed Deep Learning

Tue, 08/08/2017 - 12:20

IBM is announcing today an enhancement to its PowerAI software platform aimed at facilitating the practical scaling of AI models on today’s fastest GPUs. Scaling to 256 GPUs with its new distributed deep learning (DDL) library, IBM reports that it has bested previous records set by Google and Facebook on two well-known image recognition workloads.

“This is one of the bigger breakthroughs I have seen in a while in all of the deep learning industry announcements over the last six months,” said Patrick Moorhead, president and principal analyst of Moor Insights & Strategy. “The interesting part is that it is from IBM, not one of the web giants like Google, which means it is available to enterprises for on-prem use using OpenPower hardware and PowerAI software or even through cloud provider Nimbix.”

The crux of the announcement is a new communication algorithm developed by IBM Research scientists and encapsulated as a communication library, called PowerAI DDL. The library and APIs are available today as a technical preview to Power users as part of the PowerAI version 4.0 release. Other efforts to improve multi-node communication have tended to focus on only a single deep learning framework, so it’s notable that the PowerAI DDL is being integrated into multiple frameworks. Currently TensorFlow, Caffe and Torch are supported with plans to add Chainer.

Customers who don’t have their own Power systems can access the new PowerAI software via the Nimbix Power Cloud.

“Like the hyperscalers and large enterprises, Nimbix has been working to build distributed capability into deep learning frameworks and it just so happens that what IBM is announcing is effectively a turnkey software solution that implements that in multiple frameworks,” said Nimbix CEO Steve Hebert.

“This is truly an HPC technology,” he continued. “It’s taking some of the best software components of traditional HPC and marrying those up with AI and deep learning to be able to deliver that solution. Our platform is ideally suited for scaling out in the HPC sense, very low latency for codes that get that linear scaling of problem sizes. That means for deep learning we can start to tackle enterprise-class deep learning problems basically on day one. For this to become available to any company or consumer outside of [the big hyperscalers], like Google, Baidu, etc., it really democratizes access to everybody.”

The multi-ring communication algorithm within DDL is described (see IBM Research paper) as providing a good tradeoff between latency and bandwidth, as well as being adaptable to a variety of network configurations. The full method is proprietary but section 4 of the paper provides a fairly detailed description of the library and algorithm.
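IBM's multi-ring method itself is proprietary, but the plain single-ring allreduce that such schemes generalize is easy to simulate. The sketch below is a toy illustration of a basic ring allreduce (a reduce-scatter phase followed by an allgather phase), not IBM's algorithm:

```python
def ring_allreduce(grads):
    """Toy simulation of a single-ring allreduce over n 'nodes'.
    grads: one equal-length gradient list per node. Returns the state after
    the allreduce: every node holds the elementwise sum. Each node exchanges
    only 2*(n-1)/n of the data volume, which is why ring schemes scale well."""
    n, length = len(grads), len(grads[0])
    data = [list(g) for g in grads]                       # per-node buffers
    bounds = [(k * length // n, (k + 1) * length // n) for k in range(n)]

    # Reduce-scatter: in step s, node i sends chunk (i - s) mod n to node
    # i+1, which accumulates it. After n-1 steps, node i holds the fully
    # reduced chunk (i + 1) mod n.
    for s in range(n - 1):
        for i in range(n):
            lo, hi = bounds[(i - s) % n]
            dst = (i + 1) % n
            for j in range(lo, hi):
                data[dst][j] += data[i][j]

    # Allgather: in step s, node i forwards its fully reduced chunk
    # (i + 1 - s) mod n to node i+1, which overwrites its copy.
    for s in range(n - 1):
        for i in range(n):
            lo, hi = bounds[(i + 1 - s) % n]
            dst = (i + 1) % n
            data[dst][lo:hi] = data[i][lo:hi]
    return data

print(ring_allreduce([[1, 0, 2], [3, 1, 0], [0, 2, 2]]))  # every node: [4, 3, 4]
```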

The current PowerAI DDL implementation is based on Spectrum MPI. “MPI provides many needed facilities, from scheduling processes to basic communication primitives, in a portable, efficient and mature software ecosystem” state the researchers, although they add the “core API can be implemented without MPI if desired.”

To evaluate the performance of its new PowerAI Distributed Deep Learning library, IBM performed two experiments using a cluster of 64 IBM “Minsky” Power8 SL822LC servers, each equipped with four Nvidia Tesla P100 GPUs connected through Nvidia’s high-speed NVLink interconnect. The systems occupied four racks (16 nodes each), connected via InfiniBand.

IBM reports that the combination of its Power hardware and software incurs lower communication overhead for the ResNet-50 neural network using Caffe than what Facebook recently achieved with the Caffe2 deep learning software. The IBM Research DDL software achieved a scaling efficiency of 95 percent using Caffe on its 256-GPU Minsky cluster, whereas Facebook achieved 89 percent scaling efficiency on a 256-GPU (NVIDIA P100-accelerated) DGX-1 cluster using the Caffe2 framework. Implementation differences that could affect the comparison, e.g., Caffe versus Caffe2, are discussed in the IBM Research paper.

Scaling results using Caffe with PowerAI DDL to train a ResNet-50 model using the ImageNet-1K data set on 64 Power8 servers that have a total of 256 Nvidia P100 GPUs (Source: IBM)
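The scaling-efficiency figures quoted above follow from comparing achieved aggregate throughput against ideal linear scaling from a single GPU. A minimal sketch; the throughput numbers below are hypothetical, chosen only to reproduce a 95 percent figure, not IBM's raw data:

```python
def scaling_efficiency(single_gpu_rate, n_gpus, measured_rate):
    """Scaling efficiency: achieved throughput as a fraction of ideal
    linear scaling (n_gpus times the single-GPU rate)."""
    return measured_rate / (single_gpu_rate * n_gpus)

# Hypothetical images/sec figures, for illustration of the arithmetic only:
# 100 images/sec on one GPU, 24,320 aggregate on 256 GPUs -> 95 percent.
print(f"{scaling_efficiency(100.0, 256, 24320.0):.0%}")
```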

In the second benchmark test, IBM Research reported a new image recognition accuracy of 33.8 percent for a ResNet-101 neural network trained on a very large data set (7.5 million images, part of the ImageNet-22k set). The previous record published by Microsoft in 2014 demonstrated 29.8 percent accuracy.

IBM Research fellow Hillery Hunter observed that a 4 percentage point increase in accuracy is a big leap forward, as typical improvements in the past have been less than 1 percent.

Further, with IBM’s distributed deep learning approach, the ResNet-101 neural network model was trained in just seven hours, compared to the 10 days it took Microsoft to train the same model. IBM reported a scaling efficiency of 88 percent.

Sumit Gupta, vice president of AI and HPC within IBM’s Cognitive Systems business unit, believes the increased speed and accuracy will be a huge boon to enterprise clients. “Part of the challenge has been that if it takes 16 days to train an AI model, it’s not really practical,” he said. “You only have a few data scientists when you work in a large enterprise and you really need to make them productive, so bringing down that 16 days to 7 hours makes data scientists much more productive.”

Certain applications are particularly time-constrained. “In security, military, fraud protection, and autonomous vehicles you often only have minutes or seconds to train a system to deal with a new exploit or problem but currently it generally takes days,” said market analyst Rob Enderle. “This effectively reduces days to hours, and provides a potential road map to get to minutes and even seconds.” It’s scenarios like these that make buying Power Systems to speed deep learning far easier to justify, he added.

The list of use cases seemingly grows longer by the day. Recommendation engines, credit card fraud detection, mortgage analysis, upsell/cross-sell to retail clients, and shopping experience analysis are all getting a lot of attention from IBM’s customers.

“The giants like Microsoft and Google and others who have consumer apps, they obviously are getting on the consumer platform a lot of data all the time. So their use cases in many cases are very obvious, finding images of dogs in Google photos,” for example, said Gupta. “But we see enterprise clients have lots of data and lots of use cases they are now getting around to using these methods.”

The next step for IBM researchers is to document scaling beyond 256 GPUs as their current findings indicate that is feasible. “We don’t see a reason why the method would slow down when we double the size of the system,” said Gupta.

The post IBM Raises the Bar for Distributed Deep Learning appeared first on HPCwire.

Deep Learning Thrives in Cancer Moonshot

Tue, 08/08/2017 - 12:11

The U.S. War on Cancer, certainly a worthy cause, is a collection of programs stretching back more than 40 years and abiding under many banners. The latest is the Cancer Moonshot, launched in 2016 by then U.S. Vice President Joe Biden, and passed as part of the 21st Century Cures Act. Over the years, computational technology has become an increasingly important component in all of these cancer-fighting programs, hand-in-glove with dramatic advances in understanding cancer biology and a profusion of new experimental technologies – DNA sequencing is just one.

Recently, deep learning has emerged as a powerful tool in the arsenal and is a driving force in the CANcer Distributed Learning Environment (CANDLE) project, a joint effort between the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI), that is also part of the U.S. Exascale Computing Project (ECP). (For an overview, see HPCwire article, Enlisting Deep Learning in the War on Cancer).

Think of CANDLE as an effort to develop a computational framework comprising: a broad deep learning infrastructure able to run on leadership-class computers and other substantial machines; a large collection of deep learning tools, including specific networks and training data sets for use in cancer research; and, potentially, a path toward the development of new therapeutics – drugs – for use against cancer. Currently, CANDLE helps tackle the specific challenges of three cancer pilots, all using deep learning as the engine. Importantly, the three pilots have begun to yield results. The first software and data set releases took place in March, and just a few weeks ago the first stable ‘CANDLE infrastructure’ was released.

Rick Stevens, Argonne National Laboratory

In the thick of the CANDLE work is principal investigator Rick Stevens, also a co-PI of one of the pilots and associate laboratory director for the computing, environmental and life sciences directorate at Argonne National Laboratory (ANL). Stevens recently talked with HPCwire about CANDLE’s progress. The work encompasses all things deep learning – from novel network types, and algorithm development, to tuning leadership class hardware platforms (GPU and CPU) for deep learning use. Results are shared on GitHub in what is a roughly quarterly schedule.

Perhaps as important, “We are trying to plant inside the labs a critical mass of deep learning research people,” says Stevens. Just as CANDLE embodies broad hopes for applying deep learning to precision medicine, DOE has high hopes for developing deep learning tools able to run on leadership class machines and to be used in science research writ large.

“Everything is moving very fast right now,” says Stevens. “As the scientific community starts to get interesting results, two things will happen. One is we will be able to write papers and say ‘oh we took this network concept from Google and we applied it to material science and we changed it just this way and got this great result and now it is a standard way for designing materials.’ Those kinds of papers will happen.

“But then there is going to be another class of papers where I think in the labs, people that are interested in deep learning more generally and are interested in applying it to things other than driving a car or translating, but now are interested in applying it to particle physics or something – they are going to come up with some new deep learning algorithms, new network types, new layer types, new ways of doing inference for example, that are actually inspired by the problem in the scientific domain. That will take a bit longer.”

Clearly, there’s a lot going on here with CANDLE as the tip of the spear. Here’s a snapshot of the three pilots, which fall under the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program:

  • RAS Molecular Project. This pilot (Molecular Level Pilot for RAS Structure and Dynamics in Cellular Membranes) is intended to develop new computational approaches supporting research already being done under the RAS Initiative. Ultimately the hope is to refine our understanding of the role of the RAS (gene family) and its associated signaling pathway in cancer and to identify new therapeutic targets uniquely present in RAS protein membrane signaling complexes.
  • Pre-Clinical Screening. This pilot (Cellular Level Pilot for Predictive Modeling for Pre-clinical Screening) will develop “machine learning, large-scale data and predictive models based on experimental biological data derived from patient-derived xenografts.” The idea is to create a feedback loop, where the experimental models inform the design of the computational models. These predictive models may point to new targets in cancer and help identify new treatments. Stevens is co-PI.
  • Population Models. This pilot (Population Level Pilot for Population Information Integration, Analysis and Modeling) seeks to develop a scalable framework for efficient abstraction, curation, integration and structuring of medical record information for cancer patients. Such an ‘engine’ would be enormously powerful in many aspects of healthcare (delivery, cost control, research, etc.).

The early work has been impressive. “In March, we put out a set of seven cancer deep learning benchmarks on GitHub that represent each of the three pilot areas. They represent five or six different deep learning network types and have self-contained data that you can download. That was a first major milestone. A couple of weeks ago we pushed out our first release of reliable CANDLE infrastructure. It contains what we call the supervisor which is infrastructure for running large scale hyperparameter searches (for network development) on the leadership computing platforms,” says Stevens.

Hyperparameters are settings like the network structure (number and types of layers), learning parameters (activation functions, optimizers, learning rate) and loss functions. They are fixed before training and are not learned during training. For any given problem there are many possible combinations of hyperparameters, and it is often important to search for good combinations to get improved model performance.
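A hyperparameter search of this kind can be sketched as a loop over randomly sampled configurations. This toy version only stands in for CANDLE's actual supervisor infrastructure; the search space and scoring function below are invented for illustration:

```python
import random

def random_search(objective, space, n_trials, seed=0):
    """Randomly sample hyperparameter combinations and keep the best.
    space: name -> list of candidate values; objective: config -> score
    (higher is better). In a real search each objective call would be a
    full training run dispatched to the cluster."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Invented search space echoing the hyperparameters named above.
space = {
    "n_layers": [2, 4, 8],
    "activation": ["relu", "tanh"],
    "optimizer": ["sgd", "adam"],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}
# Stand-in objective; it exists only to make the example runnable.
toy_score = lambda cfg: -abs(cfg["n_layers"] - 4) - 100 * cfg["learning_rate"]
print(random_search(toy_score, space, n_trials=25))
```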

“It runs here at Argonne, at NERSC and at Oak Ridge National Laboratory, and [with some modest modification] runs pretty much on anybody’s big machines. What it does is manage hundreds or thousands of model runs that are trying to find good choices of model hyperparameters.” The latter is indeed a major accomplishment. (Links to CANDLE releases so far: CANDLE Version 0.1 architecture: https://github.com/ECP-CANDLE/CANDLE; CANDLE benchmarks: https://github.com/ECP-CANDLE/Benchmarks)

Already there have been surprises as illustrated here from Stevens’ pre-clinical screening project.

“We think we understand biology, and so we [were] thinking ‘we know relationships about the data and we can somehow structure the problem in a way that is going to make the biologically relevant information more accessible for the networks.’ The network surprised us by basically saying, ‘no that isn’t helping me, let me do it my own way’ in discovering the features that matter most on its own,” says Stevens.

“Of course I am anthropomorphizing but that was essentially the thought process that we went through. We were surprised by the fact that the machine could do better than we could. I think ultimately we’ll be right. It won the first round, but we’re going to get more clever in how we encode, and I think we’ll still be able to show that us working with the network will do a better job than either working alone.”

As described by Stevens, CANDLE is a layered architecture. At the top is a workflow engine – a Swift/T-based framework called Extreme-scale Model Exploration with Swift (EMEWS) was selected – that manages large numbers of runs on the supercomputers and can be adapted for other more moderate size machines.

“We are using technology like NoSQL database technology for managing the high level database component, and these things are running on GPU-based systems and Intel KNL-based systems. We are trying out some of the new AMD GPUs (Radeon) and are experimenting with some more NDA technology,” Stevens says without discussing which systems have demonstrated the best performance so far.

“For the next level down in terms of scripting environment, we decided early on to build CANDLE on top of Keras, a high level scripting interface language for deep learning that comes out of Google. All of our models get initially implemented in Keras. All of our model management tools and libraries are also implemented in Keras. We’re doing some trial implementation directly in neon (Intel) and MXNet (Amazon),” says Stevens, but there are no plans to switch.

One of the things CANDLE leaders liked about Keras is it sits easily on top of all the major frameworks – TensorFlow, Theano, MXNet, and Microsoft’s Cognitive Toolkit (CNTK). Stevens says, “We don’t program typically directly in those frameworks unless we have to. We program in the abstraction layer on top (Keras). That gives us a lot of productivity. It also gives us a lot of independence from specific frameworks and that is really important because the frameworks are still moving quite quickly.”

Google’s TPU, an inference engine, has not been tested, but the “TPU2 infrastructure is on our list of things to evaluate.” Google, of course, has shown no inclination to sell TPU2 on the open market beyond providing access in the Google cloud (See HPCwire article, Google Debuts TPU v2 and will Add to Google Cloud).

“I think our CANDLE environment in principle could run in the Amazon cloud or Google cloud or Azure cloud, in which case if you were going to do that we would certainly target the TPU/GPU for acceleration. Right now our focus is in CANDLE because it is part of the ECP to target the DOE Exascale platforms as our first priority target. Our second priority target are other clusters that we have in the labs,” says Stevens. CANDLE also runs on NIH’s Biowulf cluster.

Stevens says the project has been talking to Intel about the Nervana platform and the Knights Mill platform and is, of course, already using Nvidia P100s and will work with the V100 (Volta).

OLCF’s recently deployed NVIDIA DGX-1 artificial intelligence supercomputer, featuring eight NVIDIA Tesla GPUs and NVLink technology, will offer scientists and researchers new opportunities to delve into deep learning technologies.

A major governance review took place about a month ago and CANDLE received the go-ahead for second year. Stevens says there are now about 100 people working on CANDLE from four different labs and NCI. “I would say we have good collaborations going on with Nvidia, Intel, and AMD in terms of deep learning optimization interactions with architecture groups. We are also talking to groups at IBM and Cray, and we’ve had a big set of demonstration runs on the three big open science computers.”

Of course the most immediate point of all this infrastructure work is to do worthwhile cancer research. Stevens is closest to the pre-clinical screening project, whose premise is straightforward: screen pairs of known drugs against cancerous cell lines and xenograft tumors, measure the effect – inhibition, growth, etc. – and build predictive models from the data. Most of the work so far has been done with cell lines grown in dishes, but the effort to gather drug response data from human tumor grafts on rodents (the human tumor xenograft model) is ramping up.

Says Stevens, “NCI took the 100 small molecule FDA cancer drugs and devised a high throughput screen that did all pairs of those drugs and concentration levels so we have something like 3.5 million experimental results there. We’ve been building a deep learning model that learns the relationship between the molecular properties of the tumor, in this case the cell line, and the structural properties of the pairs of drugs, and then predicts the percent growth inhibition.

“So we have millions of these pairs of drugs experiments and we have the score of how well the growth was inhibited. The machine learning model is trying to generalize from that training data, a representation that allows us to predict a result for a cell line and a drug combination that it hasn’t seen before. It’s a deep learning model. It’s a multi layered neural network and uses convolution.”

One surprise was how well a one-dimensional convolution layer neural network works on this problem.

“A priori we didn’t believe it would work, and it has better convergence than the fully connected network layers, which was the way we thought we had to do it,” he says. “So that’s an interesting insight. The models are starting to become accurate enough and there is a lot more work to do, but the initial results are encouraging. We’ve got models that are around 87 percent accurate in predicting whether the drug pair will outperform a single drug or not.

“The primary thing we are trying to do is get to a basic model architecture that has a high level of base line predictivity and then we can start optimizing it. Where we are now is encouraging. I think we can do better but we are showing that we have predictive power by building these models but we are also showing we need more data out of the tumors, not different types of data for the same tumor. We need more tumor samples from which we have highly curated molecular assay data and drug screening data,” he says.
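To make the 1D-convolution idea concrete, here is a minimal valid-mode 1D convolution (really cross-correlation, as deep learning frameworks implement it) applied to a made-up feature profile. It illustrates the layer type only; it is not the CANDLE model, and the filter and numbers are invented:

```python
def conv1d(x, kernel, bias=0.0):
    """Valid-mode 1D convolution (cross-correlation, as in deep learning
    frameworks): slide the kernel along the input and take dot products."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(x) - k + 1)
    ]

# A made-up per-feature profile and a hand-set 3-wide filter. A learned
# filter would pick out local patterns; this one responds to rises/falls.
profile = [0.1, 0.9, 0.8, 0.1, 0.0, 0.7]
print(conv1d(profile, [-1.0, 0.0, 1.0]))
```

The appeal over a fully connected layer is that the same small filter is reused at every position, so far fewer parameters are learned.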

There have already been some interesting results: “I have to be careful what I say because we are in the process of publishing those,” he says.

The RAS work is centered at Livermore. A well-known cancer promoter, the RAS family of proteins and pathways has been widely studied. It turns out ANL has the ability to synthesize “nanodiscs” with both wild type RAS and mutated RAS embedded in cell membrane and to obtain images and other biophysical measurements of the RAS proteins. That data is being used to train the simulations.

“The problem that team has been focusing on first is to reproduce computationally the phenomenon known as rafting – rafting is where more than one RAS protein aggregates [in the cell membrane] because that changes the dynamics of the signaling pathway. We have experimental data that shows there’s at least three different states of the membrane. What we are trying to do with the simulations is reproduce that basic biological behavior as a confidence builder exercise before we go to a more complex scenario,” says Stevens.

“One of the challenges is to extract the same kind of observational data from the simulation that you get from the experiment, so one of the novel uses of machine learning has been to actually recognize the different rafting states,” he says.

To a large extent the network training concepts are familiar. Train lower levels of the network on basic elements, progressing to more specific assemblages of those elements at higher levels. Consider training a network to recognize cars, says Stevens. The lower network levels are trained on things like edges, shadows, colors, vertical and horizontal lines. The upper portion is trained to recognize things like windshields and hubcaps, etc. There are lots of new and familiar tools to do this. Convolutional networks are hardly new. Generative networks are hot. He and his team are sorting through the toolkit and developing novel tools as well.

Returning to the car-recognition analogy, he says “[If] now I want it to recognize a sailboat and different types of sailboats, the part of the network that recognized hubcaps and windshields isn’t going to be very useful but the part that learned how to recognize straight edges and colors and corners and shadows is essentially going to be the same. We think the same concept can apply to these drug models, or these drug response models. We are training the network to learn the difference between tumors and regular tissue, training it to recognize the difference between brain cancer and liver cancer, between colorectal cancer and lung cancer, for instance.

“They are recognizing what cancer is, the features that distinguish between normal cells and cancerous cells, and the features that relate to drug effectiveness. The models are trained on drugs, learning drug structures, learning the basic features of how molecules get put together – the kind of basic building blocks, side chains and rings and so on that are in molecules. Those don’t change. We are training on all the known drugs, not just cancer drugs, but all the known drugs, all the chemical compounds that are in the big libraries, so we learn basic chemistry at the bottom so we can combine the networks essentially to predict drug response. They don’t have to relearn most basic features. We just have to in some sense tune them up on the specific patient populations that you are looking for. I think these models could hop from preclinical screening application to actual use and clinical practice fairly quickly.”
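The freeze-the-lower-layers idea can be shown numerically: keep a pretrained feature extractor fixed and fit only a new top layer. The sketch below is a toy stand-in with invented data and a hypothetical `frozen_featurizer`, not the project's code:

```python
def fine_tune(xs, ys, frozen_featurizer, lr=0.1, epochs=2000):
    """Toy transfer learning: the lower-level feature extractor is frozen
    (never updated); only a new top linear layer (w, b) is trained, here
    by plain SGD on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = frozen_featurizer(x)   # reused, frozen lower layers
            err = (w * h + b) - y
            w -= lr * err * h          # only the top layer learns
            b -= lr * err
    return w, b

# Pretend the frozen network maps raw input to one learned feature (x**2);
# the new task's labels happen to be 2*feature + 1 (all invented numbers).
w, b = fine_tune([0.5, 1.0, 1.5], [1.5, 3.0, 5.5], lambda x: x * x)
print(round(w, 2), round(b, 2))  # approaches 2.0 and 1.0
```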

In some sense, deep learning, close kin to the more familiar data analytics, is a new frontier for much of science. Squeezing it under the HPC umbrella makes some people nervous. 64-bit double precision is nowhere in sight. Low precision doesn’t just work fine, it works better. How is this HPC, ask some observers in the HPC community?

Programming frameworks also present a challenge. Consider this snippet from a recent blog by Intel’s Pradeep Dubey, Intel Fellow and director of Intel’s Parallel Computing Lab, on the convergence of HPC and AI, “…unlike a traditional HPC programmer who is well-versed in low-level APIs for parallel and distributed programming, such as OpenMP or MPI, a typical data scientist who trains deep neural networks on a supercomputer is likely only familiar with high-level scripting-language based frameworks like Caffe or TensorFlow.”

Whatever your view of the computational technology being pressed into service, deep learning is spreading in the science community, and indeed, that is one of the CANDLE project’s goals. All of the major labs have deep learning programs and CANDLE has an aggressive ongoing outreach effort to share learnings.

“The CANDLE project is having a lot of impact [just] through its existence. We have kind of turned on a lot of people at the labs who say, ‘oh machine learning is real and deep learning is real and we are making sure that these future architectures we’re building now, the ones we deploy in the 2021 timeframe, are actually well-suited for running deep learning.’ That is causing the scientific community to say I should think about whether I could use deep learning in my problem,” says Stevens.

The post Deep Learning Thrives in Cancer Moonshot appeared first on HPCwire.

Do Big IT Outsourcing Firms Abuse H-1B Program?

Tue, 08/08/2017 - 10:25

Is the H-1B visa program mostly a way to import cheaper IT talent from abroad? Yes, at least as practiced by the top IT labor outsourcing firms, according to an article posted today on IEEE Spectrum.

Written by Prachi Patel, the IEEE Spectrum article, New Data Hammers Home Problems with H-1Bs and Outsourcing Firms, reports that the dominant IT labor outsourcing firms pay less to foreign workers than prevailing U.S. salaries for the same functions. Here’s a brief excerpt:

“The top 20 employers took 37 percent of the approved visas. The top five were all IT outsourcing firms: Cognizant Tech Solutions, Infosys, Tata Consultancy Services, Accenture, and Wipro. All together, these companies took 60,000 visas.

“The average salary of H-1B visa holders was $91,000. But top outsourcing firms paid well below this number, with TCS paying as little as $72,000 on average.

“Non-outsourcing tech companies seem to be more fair. Apple, Cisco, Microsoft, and Google, for instance, pay their H-1B workers average wages of over $120,000. These higher-paying companies generally hire more workers with Master’s degrees, while most outsourcing firms hire mainly Bachelor’s degree holders.”

The debate around the H-1B visa program for IT labor has long been contentious. Employers argue they are unable to find the needed skills within the U.S. According to the IEEE Spectrum article, Microsoft, for instance, pays H-1B software developers $126,000 on average, while the Bureau of Labor Statistics puts the average salary for that job at $132,000. Ron Hira, professor of political science at Howard University, is quoted: “Any firm that is able to pay less money is going to pay less money.”

Link to IEEE Spectrum article: http://spectrum.ieee.org/view-from-the-valley/at-work/tech-careers/new-data-hammers-home-problems-with-h1bs-and-outsourcing-firms

The post Do Big IT Outsourcing Firms Abuse H-1B Program? appeared first on HPCwire.

Gen-Z Consortium Announces Gen-Z Multi-Vendor Demo at FMS 2017

Tue, 08/08/2017 - 10:24

SANTA CLARA, Calif., Aug. 8, 2017 — The Gen-Z Consortium, an organization developing an open systems interconnect designed to provide high-speed, low latency, memory-semantic access to data and devices, today announced the world’s first Gen-Z multi-vendor technology demonstration, connecting compute, memory, and I/O devices at Flash Memory Summit, Santa Clara, August 8-10. Gen-Z technology enables increased performance and scalability for existing enterprise applications and future memory-centric computing applications.

The demonstration utilizes FPGA-based Gen-Z adapters connecting compute nodes to memory pools through a Gen-Z switch, creating a fabric that links servers and memory devices from multiple vendors. The multi-vendor participation reflects strong industry support for Gen-Z and showcases how future data centers can leverage this technology to attain a unified, high-performance and scalable fabric/interconnect. Additionally, a separate demonstration will show the scalable prototype connector defined by the Gen-Z Consortium, running at 112 giga-transfers/sec.

“At Bloomberg, we provide information that powers the global capital markets and our customers depend on us to quickly deliver accurate data, despite exponentially increasing volumes,” said Justin Erenkrantz, head of compute architecture for the finance, media and tech company based in New York City. “Gen-Z’s memory-centric standards-based approach will allow us to bring even more powerful analytics to our customers through a distributed memory and compute fabric.”

“We are excited to showcase the first technology demonstration of Gen-Z that includes solutions from multiple member companies, including a variety of servers, memory and I/O devices, all connected with a Gen-Z fabric,” said Kurtis Bowman, President of the Gen-Z Consortium. “The consortium continues to meet the planned development schedule and we expect to see initial Gen-Z products in the 2019-2020 timeframe. As an open consortium, we encourage all companies to join us in our collaborative effort to develop the architecture and products that will provide the performance required for housing and analyzing the incredible amount of information coming into the data center.”

The Gen-Z Consortium has doubled its member base since inception and now includes more than 40 companies. It has released four draft specifications to date, all of which are available openly on its website: Gen-Z Core Specification, SFF 8639 2.5” Connector Specification, SFF 8639 2.5” Compact Connector Specification, and Gen-Z Scalable Connector Specification. To watch the world’s first Gen-Z demonstration, please visit the Gen-Z Consortium booth #739 at the Flash Memory Summit, August 8-10. To see what member companies are saying about Gen-Z at FMS 2017, visit the website here.

About Gen-Z Consortium

Gen-Z is an open systems interconnect designed to provide memory-semantic access to data and devices via direct-attached, switched or fabric topologies. The Gen-Z Consortium is made up of leading computer industry companies dedicated to creating and commercializing a new data access technology. The consortium’s 12 initial members were: AMD, ARM, Broadcom, Cray, Dell EMC, Hewlett Packard Enterprise, Huawei, IDT, Micron, Samsung, SK hynix, and Xilinx, with that list expanding as reflected on our Member List.

Source: Gen-Z Consortium

The post Gen-Z Consortium Announces Gen-Z Multi-Vendor Demo at FMS 2017 appeared first on HPCwire.