HPCwire

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

XTREME DESIGN Raises $700K in Pre-Series A Round

Fri, 03/03/2017 - 05:00

March 3 — Tokyo-based XTREME DESIGN, a Japanese startup offering on-demand virtual supercomputing in the cloud, announced on January 31 that it had raised $700K in its pre-Series A round. The round was led by freebit Investment and includes individual investors Kotaro Chiba (former vice president of Japanese mobile game developer Colopl) and Yasumasa Manabe (CEO of Takamatsu-Kotohira Electric Railway). The funding follows a 30 million yen (about $260,000) round raised last March from the company's founders and angel investors, bringing total capital raised to $1M.

XTREME DESIGN will also exhibit a new UI/UX concept and data controller at the SXSW Trade Show, which opens in Austin, Texas, on March 10. The brand-new "XTREME DNA 2.0" UI/UX concept targets high performance data analysis (HPDA), aimed primarily at the IoT big data analysis market.

About XTREME DESIGN

XTREME DESIGN was founded in Tokyo in February 2015. It launched and demonstrated "XTREME DNA" at the global supercomputing conference SC16, providing an unattended service for operations monitoring and dynamic reconfiguration of virtual supercomputers deployed on public clouds for effective system utilization. The team has industry-leading technical skills in supercomputing, DevOps, cloud design and UI/UX.

Source: XTREME DESIGN

The post XTREME DESIGN Raises $700K in Pre-Series A Round appeared first on HPCwire.

Dr. Eng Lim Goh Hails New Frontier of Scalable Learning

Thu, 03/02/2017 - 17:07

Dr. Eng Lim Goh is one of the leading HPC visionaries of our time. He has been the driving force behind SGI's technical computing program for nearly two decades, and since HPE's acquisition of SGI closed last November, he has continued as vice president and SGI chief technology officer at HPE.

HPC's ability to drive innovation and benefit humanity is always top of mind for Goh, and he is an enthusiastic proponent of AI and deep learning, fast-moving fields propelled by the confluence of big compute and big data.

Eng Lim Goh

Goh was one of the lead designers on the recently announced Tsubame AI supercomputer, which is now shipping to the Tokyo Institute of Technology (TiTech). With 47 petaflops of 16-bit AI horsepower, Tsubame3.0 will be one of the largest AI supercomputers in the world when it comes online this summer, and the largest in Japan. The system is organized around what Goh and HPE are calling scalable learning, not just for production deep learning as we know it, but to set the stage for near-real-time training so models can learn on the fly.

In this interview with HPCwire, Dr. Goh reviews the current state of AI supercomputing, outlines his vision for where the field is headed and provides insight into the converged SGI-HPE roadmap.

HPCwire: AI is not a new field, but computational power and the availability of data have enabled this renaissance for machine learning and deep learning — how do you view the relationship between HPC and AI?

Dr. Eng Lim Goh: HPC traditionally has been about running simulations. You take in small amounts of data, you run your formulas and equations, you process these data and then you produce massive amounts of data. That has always been the typical approach. Over the years, this produced data has started to become a load on the analytics side, where you get the insights from these data. I've heard of customers saying they've only managed to analyze 10 percent of the data produced because things were getting a bit behind on the analysis side. They have been smart about selecting which 10 percent to analyze and they have been producing good results anyway, but more and more you're realizing that the analysis side is becoming an issue, not just in HPC but also on the instrument side – the Square Kilometre Array, the Large Hadron Collider and so on.

So instruments and HPC are generating a large amount of data increasing the load on the analysis side. With an analysis engine, you’re taking all that data in to try to produce some insight at the other end. Lots of data goes in, distilled insight comes out. A haystack goes in and a needle comes out – that is essentially what that analysis is like. The world therefore has started to invest more and more on the analysis side.

I view this analysis as intensive human-in-the-loop analysis, where an analyst sits there, picks the data, runs some algorithm, produces the next step in that data, then the analyst again – the human-in-the-loop – runs other sets of algorithms to produce an output, going through this loop that ultimately produces insight.

At some point, as data keeps growing, this intensive human-in-the-loop process needs a better way, and I believe that is one of the reasons AI has reemerged: the intensity of the human-in-the-loop, and the questions from analysts about whether there's a way to reduce this intensity. That motivation gives rise to machine learning, where you can automate a bit more of the analysis and thus reduce the intensity of the human-in-the-loop in the analysis process.

So one approach to reduce the intensity of the human-in-the-loop is through machine learning, and this is how they do it. They take the massive amounts of data and put it through a learning algorithm, and the learning algorithm ultimately achieves some decision-making capability. When the next set of data comes along, the machine learning algorithm is applied to it – and now you can see the level of human intervention, the intensity of that human-in-the-loop, is reduced because of the increased automation.

The other reason for the reemergence is that there is now enough data readily available to do machine learning, because machine learning requires massive amounts of data. If you go back 30 years, the internet wasn't there, data was far less available and the amount of data wasn't as big. So this is the second reason – massive amounts of data, more readily available, gave rise to the resurgence in machine learning.

Then you hear all these terms: machine learning, deep learning, AI, neural network. The relationship among them, which I keep emphasizing to our teams, to customers and in my talks, is as follows.

A machine learns to be artificially intelligent; that's the relationship between the two, machine learning and AI. A method of machine learning is called a neural network, and a multi-tier or multi-stage neural network is called deep learning.

And that's how the four terms are related. It turns out the more popular way today to achieve machine learning is through deep learning; deep learning is really a multi-stage neural network, which is a method of machine learning, and through machine learning you get artificial intelligence.
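To make the terminology concrete, here is a minimal sketch of a "multi-stage" network in plain numpy – an illustration of ours, not code from HPE or TiTech; the layer sizes and random weights are arbitrary:

```python
import numpy as np

# Toy two-stage ("deep") neural network forward pass. A neural network is
# stages of weights plus nonlinearities; stacking more than one stage is
# what makes it "deep". Weights here are random, purely for illustration.
rng = np.random.default_rng(42)
x = rng.normal(size=4)                            # one input with 4 features

W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)     # stage 1 weights
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)     # stage 2 weights

h = np.maximum(0.0, W1 @ x + b1)                  # stage 1 + ReLU
logits = W2 @ h + b2                              # stage 2
probs = np.exp(logits) / np.exp(logits).sum()     # softmax over 2 classes
print(probs)                                      # the network's class probabilities
```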

HPCwire: How do supervised and unsupervised learning intersect here?

Goh: Supervised, unsupervised and reinforcement learning are the three things. To simplify it, in supervised learning there is a teacher, and the teacher does two things. The first is to label the inputs: that's a picture of a cat, that's a picture of a cat, that's not a picture of a cat.

The second thing the teacher does is to supervise – to tell the machine whether it got it right or wrong as it makes these wild guesses; this is called supervised learning. With reinforcement learning, you take two instances of the machine learning algorithm and get them to play each other. In reinforcement learning the teacher has taken a step back and has less involvement.

This January, for the first time, the Carnegie Mellon University poker bot running on an HPE supercomputer beat four of the world's top poker players at heads-up no-limit Texas Hold'em [see our coverage here]. Last year it tried and lost, beating only one of the four players; this year it beat all four, and decisively – the sum total of its winnings was $1.7 million, whereas last year the machine lost $700,000.

With poker, unlike chess, you don't know your opponent's position; it is a game of incomplete information, and therefore the AI/ML algorithm needs to make guesses about what your opponent has. It turns out that this is important for applying AI in real life. When you are negotiating, or when you are bidding in an auction, you don't quite know what your opponents are thinking, and the poker AI program CMU has developed is not just for poker; in real life it could be used as a negotiator or as a bidder in an auction.

After the CMU poker match, a player asked one of the developers, “Why did the algorithm, the poker bot, apply the bluff in this way?” The developer said they didn’t know the answer. This is the implication of what it means to have supervised and reinforcement learning. The algorithm is smart enough to make the right decision most of the time, but we are starting to not quite know why it is making those decisions.

Unsupervised learning is, in my opinion, more the frontier, where as the teacher you're saying: I'm not going to do much, I'm not going to supervise, I'm just going to give the AI algorithm massive amounts of unlabeled data.

HPCwire: Let’s talk about the importance of scaling these models.

Goh: Learning takes a long time. If you've heard of Google's cat experiment, they were taking in millions of pictures of cats, and it took days, weeks or sometimes months, depending on how big the machine is, to complete that learning process before you could even start making decisions using that learned machine. You don't have equations like in the HPC case. You essentially have an adder: you're taking all this data and making guesses, you guess until you get it right, and at the end of it you produce a matrix of weights. In Google's case, they call those weights tensors, and that's why they are building tensor processors.

The inference part of machine learning is based on the weights or tensors generated at the end of that learning process. You're putting massive amounts of data into the machine learning algorithm, and what the algorithm ultimately produces is a set or matrix of weights – this type of input gets more weight and that one gets less. The question is how you learn fast, that is, how you produce your weights quickly, so you can start inferring with them. Here you can see the funnel: massive data goes in, little data comes out, and that little data is essentially a matrix of weights.
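As a rough sketch of that data-to-weights funnel (our own toy example on synthetic data, not anything from the TiTech system), the training loop below distills 100,000 samples into eight weights, which are all that inference needs afterward:

```python
import numpy as np

# Toy illustration of the "funnel": many samples in, a small weight vector out.
rng = np.random.default_rng(0)
n_samples, n_features = 100_000, 8
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_samples)      # synthetic labels

# "Learning": gradient descent on a linear model.
w = np.zeros(n_features)
lr = 0.1
for _ in range(200):
    grad = X.T @ (X @ w - y) / n_samples
    w -= lr * grad

# "Inference": the big dataset is no longer needed, only the learned weights.
x_new = rng.normal(size=n_features)
print("learned weights:", np.round(w, 3))
print("prediction for a new sample:", x_new @ w)
```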

The thing is, you want to learn fast. If you don’t want to take weeks or months to do learning because of the massive amount of data you have to ingest to make guesses on, what do you do? What you do is you scale your machine – this is where we come in right? You have to scale the machine because you can’t scale humans.

A "co-design" photo of the newly designed Tsubame3 blade, with TiTech Professor Satoshi Matsuoka and SGI CTO Dr. Eng Lim Goh.

We can't put two human minds together and hope to reduce 20 years of education to 10 years, but machines we can scale. We can build a bigger machine to ingest more data, and that's exactly what TiTech is doing, and what ABCI [short for AI Bridging Cloud Infrastructure, see our coverage here], the follow-on machine [to Tsubame3.0], will do even more. AIST, through a new institute related to TiTech and led by Professor Satoshi Matsuoka, is looking at the next-generation cloud-based machine for artificial intelligence. He has this passion to build the world's biggest AI machine, and TiTech is one part of it – the first step in scaling machine learning: taking a learning process that would typically take months down to weeks, and then bringing it down to days and even lower.

HPCwire: Can you tie this vision for scalable learning into the hardware and software challenges and what innovations are taking place there?

Goh: This was where we worked to fully understand Professor Matsuoka's requirements. He's the visionary, and he thinks in a very big, scalable way about what's the next step for machine learning. If I were to simplify the idea, it is to learn really fast through scaling – but scaling isn't just buying more computers, just as with HPC.

High-performance computing isn't just buying a million laptops, putting them on the internet and creating a virtual machine; it involves picking the right processor, picking the right interconnect, using the interconnect in a way that suits the specific application, and then the software layers on top of it. That is the full set of requirements to achieve HPC, and it is the same with scalable machine learning: we have to pick the right processor, step one.

In terms of the right one: deep learning specifically doesn't need that much precision, but it needs lots of FLOPS, so we looked for a processor that could trade off precision for more FLOPS. It turns out the Nvidia P100 we selected does exactly that, in an almost linear fashion – meaning, in HPC, when you do double precision you get X FLOPS, when you drop down to single precision you get 2X FLOPS, and when you drop down to half precision you get almost 4X FLOPS. So the processor we picked makes that tradeoff in the way that suits TiTech, fully trading off precision for FLOPS.
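A small numpy sketch of the precision-for-throughput trade-off Goh describes – our own illustration, not vendor benchmark code. Halving the bytes per value is what lets hardware with packed half-precision math units, such as the P100, push roughly twice the FLOPS of single precision and four times that of double, at the cost of larger rounding error:

```python
import numpy as np

# Illustration only: fewer bytes per value means more values per second
# through the same memory and compute pipes, but coarser rounding.
rng = np.random.default_rng(1)
x64 = rng.normal(size=1_000_000)            # float64 reference values

for dtype in (np.float64, np.float32, np.float16):
    x = x64.astype(dtype)
    max_round_err = np.max(np.abs(x.astype(np.float64) - x64))
    print(f"{np.dtype(dtype).name:8s}  bytes/value={np.dtype(dtype).itemsize}"
          f"  max rounding error={max_round_err:.2e}")
```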

Secondly, we needed to pick the right interconnect – one with high bandwidth, and not just high bandwidth but as much bandwidth as [unintelligible]. Each node has four P100s from Nvidia and two Xeon processors from Intel. This two-plus-four combination in one node doesn't have just one interconnect; we gave it four. That makes each node a huge ingest and injection-bandwidth engine, and there are 540 of them in the TiTech machine, each node having four Nvidia P100s and two Xeons with not one, not two, but four high-bandwidth interconnects coming out of it.

The reason for this huge bandwidth is that, if you think of a funnel, you are ingesting massive amounts of data on one side to crunch and learn from it – even if you have an engine that can learn really fast, you still need the ability to ingest all that data so you can consume it. So that was yet another innovation. I don't remember [previously] building nodes that have up to four of these ingest interconnects per node. We've done many machines with a single interconnect per node and fewer with dual interconnects, but I believe, at least for me, this is the first time we've built a machine with four interconnects per node.

The other thing is how the interconnects are wired together in a topology. In this case we built a rich fat tree that accounts for quite a portion of the cost of the whole machine. As much as we dedicated cost to the processors, we dedicated a significant portion to the way we connect the many interconnects coming off each node into a rich fat tree. And then, of course, there is the software stack.

We listened very carefully to Professor Matsuoka to achieve his vision and we built to it, and we are pleased that many are now making inquiries about this same class of machine for their own projects.

HPCwire: Are the frameworks keeping up with the scalability on the hardware side?

Goh: The software frameworks in HPC have always tried to catch up with the industry's investment in scaling the hardware. In the early days MPI, for example, which is used to scale HPC applications, did not scale as fast as we could scale the machines, partly because the machines didn't have interconnects that were fast enough and low-latency enough. So the answer is that the frameworks need to scale, and some frameworks are scaling better than others, but essentially I think all will have to look at scaling, just like the HPC world. HPC has different applications, and some of them scale better. Even after so many years, decades, you get most applications scaling well, but you still have some that can't scale to 10,000 or 100,000 cores, for example. It will be the same with the frameworks; some will scale better and others will have their niche.

We are watching the frameworks to make sure that we don’t go in a direction that frameworks cannot adapt to or scale to so it’s not just us running well ahead scaling the hardware. We have to look at frameworks too because the industry relies on a framework, especially the open source ones.

HPCwire: Did you see Baidu brought in ring all-reduce into their SVAIL deep learning framework and open sourced it as libraries?

Goh: Very interesting developments. There will be many of these coming out, with people taking AI algorithms into HPC applications too. But think about an all-reduce, which is a common operation in HPC applications: each core computes its own part of the entire problem, but ultimately you have to collect all the elements and reduce them down to a single number. If you have ten cores, that's fine; but with one hundred cores, one thousand, one million, going to ten million cores in an exascale machine, the all-reduce itself becomes a big data problem, and you're trying to figure out ways to be smarter about it. We've been looking at building hardware just to do reduction, and now that you've built the hardware to do reduction, can the hardware be smarter about the reduction?

In the earlier days, "smart" meant encoding a fixed way of reducing that was the best you knew. But can you be even smarter by being flexible about how you reduce – looking at the pattern of past reductions and being smarter about the next one? That is starting to become a machine that learns.
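For readers unfamiliar with the ring all-reduce technique the question refers to, the sketch below is a single-process numpy simulation of the algorithm – our own illustration, not Baidu's library or HPE code. Each simulated rank passes only one chunk of its buffer around the ring per step, so per-link traffic stays roughly constant as ranks are added:

```python
import numpy as np

def ring_allreduce(buffers):
    """Single-process simulation of a ring all-reduce (sum).

    'buffers' holds one equal-length array per simulated rank. Phase 1 is a
    reduce-scatter; phase 2 is an all-gather of the fully reduced chunks.
    """
    p = len(buffers)
    chunks = [list(np.array_split(np.asarray(b, dtype=float), p)) for b in buffers]

    # Phase 1: reduce-scatter. After p-1 steps, rank r holds the full sum
    # of chunk (r + 1) % p.
    for s in range(p - 1):
        for r in range(p):
            c = (r - 1 - s) % p
            chunks[r][c] = chunks[r][c] + chunks[(r - 1) % p][c]

    # Phase 2: all-gather. The fully reduced chunks circulate around the ring.
    for s in range(p - 1):
        for r in range(p):
            c = (r - s) % p
            chunks[r][c] = chunks[(r - 1) % p][c]

    return [np.concatenate(ch) for ch in chunks]


if __name__ == "__main__":
    ranks = [np.arange(8) * (r + 1) for r in range(4)]   # 4 simulated ranks
    out = ring_allreduce(ranks)
    expected = sum(ranks)                                # element-wise sum
    assert all(np.allclose(o, expected) for o in out)
    print(out[0])                                        # every rank ends up with the sum
```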

HPCwire: Switching gears a bit, it’s been about six months since the acquisition was announced and about three months since it was finalized, what can you tell us about the merged roadmap?

Goh: On the SGI side, which is now part of HPE, clearly the ICE XA continues. The ICE XA can have different nodes in it, and the one sold to TiTech is an AI node – a scalable-learning AI node composed of four Nvidia P100s and two Xeons with four interconnects and the option of Omni-Path or EDR – so this line continues: high-bandwidth, scale-out systems. The machine we shipped to TiTech, as you can see from the photo that was released, is the ICE XA with the HPE green rectangle logo on it. This is public.

On the scale-up side, the UV architecture continues, and it will now be part of the Superdome program. The HPE Apollo line of systems includes both scale-up and scale-out solutions, and the ICE XA is the part that comes in from the SGI side.

HPCwire: Final question – regarding supercomputing in Japan and the other big players there, Fujitsu and NEC, what are the strengths for HPE/SGI in relation to the competition?

Goh: I think that relative to the Japanese vendors, the strengths in this field are on the HPE side. We already have success stories in building scalable learning engines, and we want to keep that edge by bringing what we've learned on the HPC side about building scalable simulation engines over to the AI side, to building scalable learning engines. With HPE's acquisition of SGI, we also bring decades of SGI experience building scalable systems, and we believe that's going to be a huge differentiator.

The second differentiator is our close relationship over all these years with customers and with application providers, from ISVs to customers building their own codes. Over the decades we have not just built scalable HPC systems on our own; we've built them in conjunction with the software and applications that were developing alongside them.

When we realized, almost decades back, that reductions are a key factor, we were already starting to build algorithms and hardware for reduction. We're going to take all that experience and bring it to the scalable learning world. We built our first scalable HPC machine in 1996, so that's two decades of scalable simulation engines. We're going to bring that differentiator and knowledge forward to scalable learning engines – that's the first thing. And the second thing is, as we've been doing that, we've done it in close relationship with how the software frameworks have evolved.

The post Dr. Eng Lim Goh Hails New Frontier of Scalable Learning appeared first on HPCwire.

Weekly Twitter Roundup (March 2, 2017)

Thu, 03/02/2017 - 13:49

Here at HPCwire, we aim to keep the HPC community apprised of the most relevant and interesting news items that get tweeted throughout the week. The tweets that caught our eye this past week are presented below.

David Keyes of @KAUST_HPC is a science rockstar. Love this quote he shared from David Hilbert. #SFC #BESIAMCSE #SIAMCSE17 pic.twitter.com/NzNDSca1MX

— melrom (@melrom) February 27, 2017

Our own Brian Guilfoos teaching a local high school all about #supercomputers! #HPC pic.twitter.com/m1WjcGyW9A

— OhioSupercomputerCtr (@osc) March 2, 2017

We're pleased to be part of the #HPCUserForum this week in Stuttgart #HPC pic.twitter.com/axdzvmUqrK

— Bright Computing (@BrightComputing) March 1, 2017

The @TACC Advanced Computing Systems group is installing 67 mi. of Intel's #OmniPath Architecture cables for Stampede 2, 4,000 cables total. pic.twitter.com/ZAFB6jrUYA

— TACC (@TACC) March 2, 2017

Prof. Resch on future #strategy of HLRS: less #energy, more #data, HPC #applications #exascale #engineering #training pic.twitter.com/mwKRPB2Se1

— HLRS (@HLRS_HPC) March 1, 2017

Now, those are some nice, neat, colorful cables. #Marconi #DataCenter #Supercomputing pic.twitter.com/G0BGnqZQXi

— Data Center Systems (@LenovoServers) March 2, 2017

The four ingredients for next generation algorithms by D. Keyes. #SIAMCSE17 #hpc #KAUST @TheSIAMNews pic.twitter.com/80ZY8bfrlm

— KAUST ECRC (@KAUST_ECRC) February 27, 2017

#HPC User Forum at #HLRS in Stuttgart: #MWK, #EU and participants from research and industry. #SICOSBW represents the needs of #SMEs in HPC pic.twitter.com/KcLGZ1o3G9

— SICOS-BW (@SICOS_BW) February 28, 2017

IBM #HPC #HPDA gurus in Austin Texas @ghannama & Waleed Ezzat & @Mohd_Shanawany pic.twitter.com/rup90oLtDO

— Mohamed ElShanawany (@Mohd_Shanawany) February 28, 2017

.@ETP4H at the IDC #HPC User Forum in Stuttgart pic.twitter.com/jW5GtTUY8y

— HLRS (@HLRS_HPC) February 28, 2017

Come by the #StudentsSC booth at @TheSIAMNews #SIAMCSE conference during breaks to learn about #student programs in #HPC #HPCConnects pic.twitter.com/4bzOumMu6Z

— Student Vols @SC (@StudentsSC) February 27, 2017

The #kaust alumni Ahmad Abdelfattah, the initial architect of KBLAS with @HatemLtaief #SIAMCSE17 @ICL_UTK #hpc pic.twitter.com/iNN2dd3r0n

— KAUST ECRC (@KAUST_ECRC) February 27, 2017

Juergen Kohler presents how @Daimler plans to digitise its @MercedesBenz #cars at the IDC #HPC User Forum pic.twitter.com/MyEicqz3PI

— HLRS (@HLRS_HPC) February 28, 2017

Open ACC and GPU computing with @ORNL @TheSIAMNews #BESIAMCSE #HPC #GPU #compute #HPCConnects @verolero86 pic.twitter.com/Zm2Kuz2e0e

— Sustainable Horizons (@SH_Institute) February 28, 2017

Bastian Koller of @HLRS_HPC presenting the @EXCELLERAT_CoE concept at the IDC #HPC User Forum in Stuttgart pic.twitter.com/bOI3BG2XaP

— EXCELLERAT CoE (@EXCELLERAT_CoE) February 28, 2017

.@FortissimoPro as service for #SMEs presented by Mark Parsons of @EPCCed at the IDC #HPC User Forum @HLRS pic.twitter.com/Q2OEx4phCH

— HLRS (@HLRS_HPC) February 28, 2017

Our director D. Keyes made us proud at #siamcse17 @KAUST_News @TheSIAMNews #hpc pic.twitter.com/Zcj9Hjgw6f

— KAUST ECRC (@KAUST_ECRC) February 27, 2017

Click here to view the top tweets from last week.

The post Weekly Twitter Roundup (March 2, 2017) appeared first on HPCwire.

ISC Announces Industrial Day

Thu, 03/02/2017 - 12:45

March 2 — ISC High Performance is happy to announce that it is offering a full-day program for industrial HPC users, specifically addressing challenges in the industrial manufacturing, transport and logistics sectors.

The Industrial Day, which will be offered on Tuesday, June 20, is designed and chaired by high performance computing experts Dr. Alfred Geiger of T-Systems and Dr. Marie-Christine Sawley of Intel Lab. The new program reflects their own experiences working with user communities, as well as the expectations of past years' attendees.

This year’s ISC High Performance conference will be held at Messe Frankfurt from June 18 – 22, and will be attended by over 3,000 HPC community members, including researchers, scientists and business people.

The Industrial Day will focus on three main areas:

1.    Benefits of exascale computing for industrial users
2.    How to purchase HPC infrastructure
3.    Use cases for high performance data analytics (HPDA), including machine/deep learning, AI, and IoT

The day will begin with a keynote talk at 8:30 am and end at 4:45 pm with a round-table discussion between industrial HPC users and HPC service and technology providers. This discussion will set the agenda for the Industrial Day at ISC 2018.

Professor Dr. Norbert Kroll of the German Aerospace Center (DLR), Institute of Aerodynamics and Flow Technology, has been invited to deliver a keynote address on "High performance computational fluid dynamics for future aircraft design," focusing on numerical flow simulation, which is a key element in the aerodynamic design process, complementing wind tunnel and flight testing.

In his abstract, he reveals that DLR is working on developing a next-generation computational fluid dynamics (CFD) software code, known as Flucs, to provide the basis for a consolidated flow solver. This software offers high flexibility across a wide range of multidisciplinary applications. It is also being designed to enable future HPC hardware utilization and is therefore exascale compatible.

For instance, Flucs follows a multi-level parallelization that adds a shared-memory level on top of the established domain decomposition for distributed memory, allowing for significantly improved scalability. Moreover, Flucs features a higher-order Discontinuous Galerkin method in addition to a second-order finite-volume discretization. The two simulation approaches are highly integrated to maximize code reuse and minimize source-code duplication.
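To illustrate the distributed-memory half of that multi-level scheme, here is a toy, single-process numpy sketch of domain decomposition with halo (ghost-cell) exchange – a generic illustration of the pattern, not DLR's Flucs code; the grid size, smoother and rank count are arbitrary:

```python
import numpy as np

def step_jacobi(u):
    """One Jacobi smoothing step on the interior of a padded slab."""
    new = u.copy()
    new[1:-1] = 0.5 * (u[:-2] + u[2:])
    return new

nx, nranks, nsteps = 64, 4, 50
global_u = np.zeros(nx)
global_u[nx // 2] = 1.0                      # a spike to diffuse

# Split the grid into slabs, each padded with one ghost cell per side.
slabs = [np.zeros(nx // nranks + 2) for _ in range(nranks)]
for r in range(nranks):
    slabs[r][1:-1] = global_u[r * (nx // nranks):(r + 1) * (nx // nranks)]

for _ in range(nsteps):
    # "Halo exchange": copy neighbour boundary values into ghost cells.
    for r in range(nranks):
        if r > 0:
            slabs[r][0] = slabs[r - 1][-2]
        if r < nranks - 1:
            slabs[r][-1] = slabs[r + 1][1]
    # Each rank updates its own slab; in a real code this inner update is
    # where the shared-memory (threaded) level of parallelism would live.
    slabs = [step_jacobi(s) for s in slabs]

result = np.concatenate([s[1:-1] for s in slabs])
print(result[nx // 2 - 4: nx // 2 + 4].round(4))   # diffusion around the spike
```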

Kroll will address the above, as well as the further design aspects of Flucs that aim to tackle the challenges of future aircraft design, and present simulation results demonstrating Flucs’ current capabilities.

About ISC High Performance

First held in 1986, ISC High Performance is the world’s oldest and Europe’s most important conference and networking event for the HPC community. It offers a strong five-day technical program focusing on HPC technological development and its application in scientific fields, as well as its adoption in commercial environments.

Over 400 hand-picked expert speakers and 150 exhibitors, consisting of leading research centers and vendors, will greet attendees at ISC High Performance. A number of events complement the Monday – Wednesday keynotes, including the Distinguished Speaker Series, the Industry Track, the Machine Learning Track, Tutorials, Workshops, the Research Paper Sessions, Birds-of-a-Feather (BoF) Sessions, the Research Poster Sessions, the PhD Forum, the Project Poster Sessions and the Exhibitor Forums.

Source: ISC High Performance

The post ISC Announces Industrial Day appeared first on HPCwire.

Enabling Open Source High Performance Workloads with Red Hat

Thu, 03/02/2017 - 09:46

To find real value in today's data and applications, a high-performance, rapidly scalable, resilient infrastructure foundation is key. Red Hat offers technology that enables high performance workloads on a scale-out foundation that integrates multiple data sources and can move workloads across on-premise and cloud boundaries.

The post Enabling Open Source High Performance Workloads with Red Hat appeared first on HPCwire.

Texas Multicore Technologies Announces Full Support for ARM Processors

Thu, 03/02/2017 - 07:19

AUSTIN, Tex., March 2 — Further strengthening its commitment to support all popular multicore platforms, Texas Multicore Technologies (TMT) today announced that the high performance SequenceL functional programming language and auto-parallelizing compiler now fully support ARM processors running the Linux operating system, including 32- and 64-bit architectures.

“ARM multicore processors are widely used in embedded and mobile devices, with new high core count 64-bit versions now being deployed for servers in data centers and High Performance Computing (HPC) installations,” said Doug Norton, Chief Marketing Officer for TMT. “For example, a 64-bit Cavium ThunderX server chip has 48 ARM custom cores, delivering 384 cores in a single server, well beyond the ability of most humans to program them for optimal use. But core counts are rapidly increasing across the entire ARM ecosystem, so the same need for easier programmability was there for the 32-bit versions common in embedded and IoT (Internet of Things) devices.”

TMT provides computer programming tools and services to modernize software to run on multicore computing platforms with optimal performance and portability. SequenceL is a compact, powerful functional programming language and auto-parallelizing compiler that quickly and easily converts algorithms to robust, massively parallel code. TMT has worked closely with its strategic platform partners ARM, AMD, Intel, Dell, HPE, and IBM to do the hard work of building low-level platform optimizations in its tools so the broad base of software developers, engineers and scientists don't have to.

“Energy efficiency and performance depend greatly on the ability to maximize the use of parallelism in today’s multicore systems,” said Eric Van Hensbergen, director of HPC, ARM. “The work TMT has done with their SequenceL language and compiler to support multiprocessors based on the ARMv7 and ARMv8-A architectures enables our partners to bring the most demanding scientific and industrial applications to market in cost-effective and efficient designs.”

About Texas Multicore Technologies (TMT)

TMT provides auto-parallelizing computer programming tools and services to modernize software to run on multicore computing platforms with optimal performance and portability. Founded in 2009, the company delivers easy to use, auto-parallelizing, race-free programming solutions based on the powerful SequenceL functional programming language to enable faster and better applications sooner. For more information, visit texasmulticore.com.

Source: Texas Multicore Technologies

The post Texas Multicore Technologies Announces Full Support for ARM Processors appeared first on HPCwire.

PEARC17 Paper Submission Deadline Extended to March 13

Thu, 03/02/2017 - 06:35

March 2 — The technical paper submission deadline for the inaugural PEARC Conference – covering the Technology; Software & Data; Workforce, Diversity, and Evaluation; and Accelerating Discovery tracks – has been extended one week to March 13, 2017. Technical papers must be submitted through the PEARC17 EasyChair site at https://easychair.org/my/conference.cgi?conf=pearc17.

The full PEARC17 Call for Participation contains details about each of the four technical tracks of papers. The technical track paper submissions may be full papers (strongly preferred) or extended abstracts.

PEARC17, which will take place in New Orleans, July 9-13, 2017, is open to professionals and students in advanced research computing. Registration for the conference and the hotel is now open. Poster, Visualization Showcase and Birds-of-a-Feather submissions are due by May 1. Opportunities for student travel support to the conference are available via the PEARC17 Student Program.

The PEARC (Practice & Experience in Advanced Research Computing) conference series is being ushered in with support from many organizations and will build upon earlier conferences’ success and core audiences to serve the broader community. Organizations supporting the new conference include XSEDE, the Advancing Research Computing on Campuses: Best Practices Workshop (ARCC), the Science Gateways Community Institute (SGCI), the Campus Research Computing Consortium (CaRC), the ACI-REF consortium, the Blue Waters project, ESnet, Open Science Grid, Compute Canada, the EGI Foundation, the Coalition for Academic Scientific Computation (CASC), and Internet2.

Source: PEARC Conference

The post PEARC17 Paper Submission Deadline Extended to March 13 appeared first on HPCwire.

SC17 Technical Paper Submissions Now Open

Wed, 03/01/2017 - 13:31

March 1 — The SC17 Conference Committee is now accepting submissions for technical papers. The Technical Papers Program at SC is the leading venue for presenting the highest-quality original research, from the foundations of HPC to its emerging frontiers. The Conference Committee solicits submissions of excellent scientific merit that introduce new ideas to the field and stimulate future trends on topics such as applications, systems, parallel algorithms, data analytics and performance modeling. SC also welcomes submissions that make significant contributions to the “state-of-the-practice” by providing compelling insights on best practices for provisioning, using and enhancing high-performance computing systems, services, and facilities.

The SC conference series is dedicated to promoting equality and diversity and recognizes the role that this has in ensuring the success of the conference series. We welcome submissions from all sectors of society.  SC17 is committed to providing an inclusive conference experience for everyone, regardless of gender, sexual orientation, disability, physical appearance, body size, race, or religion.

  • Technical Paper Submissions Open: March 1, 2017
  • Technical Paper Abstracts Submissions Close: March 20, 2017
  • Full Submissions Close: March 27, 2017
  • Notification Sent: June 15, 2017

Web Submissions: https://submissions.supercomputing.org/

Email Contact: papers@info.supercomputing.org

Source: SC17

The post SC17 Technical Paper Submissions Now Open appeared first on HPCwire.

MaX European Centre of Excellence Portal Available for PRACE Community

Wed, 03/01/2017 - 13:00

March 1 — MaX (Materials design at the Exascale) is a user-focused, problem-oriented European Centre of Excellence. It works at the frontiers of the current and future High Performance Computing (HPC) technologies, to enable the best use and evolution of HPC for materials research and innovation.

The user portal offers services of interest to the PRACE community, for example, information and support on the use, performance and scaling of the MaX flagship codes (Quantum ESPRESSO, Siesta, Yambo, Fleur). These services are planned to be free of charge for the PRACE community, at least in a first phase.

MaX is creating an ecosystem of capabilities, ambitious applications, data workflows and analysis, and user-oriented services. At the same time, MaX enables the exascale transition in the materials domain, by developing advanced programming models, novel algorithms, domain-specific libraries, in-memory data management, software/hardware co-design and technology-transfer actions.

Read the inaugural MaX Newsletter. To subscribe to the MaX Newsletter, please visit: http://www.max-centre.eu/subscribe-newsletter/

Source: PRACE

The post MaX European Centre of Excellence Portal Available for PRACE Community appeared first on HPCwire.

HPC Career Notes (March 2017)

Wed, 03/01/2017 - 10:15

In this monthly feature, we’ll keep you up-to-date on the latest career developments for individuals in the high performance computing community. Whether it’s a promotion, new company hire, or even an accolade, we’ve got the details. Check in each month for an updated list and you may even come across someone you know, or better yet, yourself!

Francis Alexander

(Source: Brookhaven National Laboratory)

Francis Alexander has been named deputy director of Brookhaven National Lab's Computational Science Initiative, where he will help to further data-driven scientific discovery. Alexander joins Brookhaven from Los Alamos National Laboratory, where he spent over 20 years in a variety of positions, most recently as acting division leader of the Computer, Computational, and Statistical Sciences Division.

“I was drawn to Brookhaven by the exciting opportunity to strengthen the ties between computational science and the significant experimental facilities—the Relativistic Heavy Ion Collider, the National Synchrotron Light Source II, and the Center for Functional Nanomaterials [all DOE Office of Science User Facilities],” said Alexander. “The challenge of getting the most out of high-throughput and data-rich science experiments is extremely exciting to me. I very much look forward to working with the talented individuals at Brookhaven on a variety of projects, and am grateful for the opportunity to be part of such a respected institution.”

Click here to view our initial coverage of the news.

Donna Cox

(Source: NCSA)

Donna Cox, the director of the Advanced Visualization Laboratory (AVL) at the National Center for Supercomputing Applications, has been honored with the Lifetime Achievement Award at the 8th annual IMERSA 2017 Summit.

IMERSA’s CEO, Dan Neafus, stated, “Donna Cox stands at the center of some of the best-known and innovative projects available in the immersive community. We all owe her and her team a great debt for their significant contributions to the world of fulldome film. This honor is truly the highest accolade we can give to someone who has had such a profound impact on our profession.”

In addition to directing the AVL, Cox is also the director of the Illinois Emerging Digital Research and Education in Arts Media Institute (eDream). More information on Cox and this achievement award can be found here.

“I feel very honored and elated to receive this award,” said Cox. “As I have been putting together my speech, it humbles me to think of all of the women who have sacrificed before, during and after my career. This award also makes me think of all those who have enabled women like me to have the opportunity to pursue educational and artistic pursuits through visualization. I am most appreciative of the teams and individuals I have collaborated with and am proud of the creative work we have been able to accomplish.”

Thom Mason

Thom Mason has stepped down as the director of Oak Ridge National Laboratory to take a new position at Battelle. In his new role as the senior vice president for laboratory operations, Mason will work with Ron Townsend, the executive vice president of global laboratory operations, to lead strategic planning for lab operations that integrate with Battelle’s overall strategic plan. Mason served as the director of ORNL for ten years.

“Thom has been an exemplary scientific leader and we’re fortunate that he will continue to be engaged with Oak Ridge National Laboratory as he uses his experience and expertise to benefit DOE, Battelle, and other labs where Battelle has a management role,” said Joe DiPietro, chairman of the UT-Battelle board of governors.

The post HPC Career Notes (March 2017) appeared first on HPCwire.

Intel Sets High Bar with Workforce Diversity Program Results

Tue, 02/28/2017 - 15:31

Intel’s impressive efforts to achieve workforce diversity and compensation equality edged up yet another notch last year according to the company’s 2016 Diversity and Inclusion Report released today. Here are a few highlights from the chipmaker:

  • Intel exceeded the 2016 hiring target with 45.1 percent diverse hiring and is committed to surpassing this in 2017.
  • Positive gains were also made in the overall representation of women, which rose 2.3 points since 2014 to 25.8 percent.
  • We hit our year-end goal of achieving 100 percent pay parity for both women and underrepresented minorities and achieved promotion parity for females and underrepresented minorities as well.
  • We met our overall diverse retention goal, retaining diverse employees better than parity, which means that we retained the overall diverse population at a higher rate than the counterpart majority.

The Intel program sets a high bar for others and in 2015 was awarded HPCwire's Editor's Choice Award for its Diversity in Technology Initiative. The recent progress is more evidence of Intel's leadership in this area.

Danielle Brown, Intel

"However there is still much work to be done to achieve our 2020 goal of full representation, namely with increasing the number of underrepresented minorities and countering the retention issue. Representation of underrepresented minorities in Intel's U.S. workforce has increased modestly from 12.3 percent in 2014 to 12.5 percent in 2016, leaving room for improvement in 2017 and informing our focus over the next three years," wrote Danielle Brown, Chief Diversity & Inclusion Officer, VP of Human Resources, in a blog (Reflections on Intel's Diversity & Inclusion Journey: 2016 Diversity & Inclusion Annual Report).

Intel reported it maintained 100 percent pay parity for women, a goal achieved in 2015, and also achieved 100 percent pay equity for underrepresented minorities by year-end 2016. “For the first time we are announcing that we achieved promotion parity for women and URMs (under represented minorities) for our U.S. operations in 2016,” wrote Brown.

Link to Brown’s blog: http://blogs.intel.com/csr/2017/02/annual-report-2016/?cid=em-elq-24182&utm_source=elq&utm_medium=email&utm_campaign=24182&elq_cid=1192704

Link to full report: http://app.plan.intel.com/e/er?cid=em-elq-24182&utm_source=elq&utm_medium=email&utm_campaign=24182&elq_cid=1192704&s=334284386&lid=75521&elqTrackId=e4ea159d6b774829a6afd51c9fa1e480&elq=a65561b526f74f239e3ae5e49832cb3d&elqaid=24182&elqat=1

The post Intel Sets High Bar with Workforce Diversity Program Results appeared first on HPCwire.

Supermicro Announces New SuperBlade Server

Tue, 02/28/2017 - 07:04

SAN JOSE, Calif., Feb. 28 — Super Micro Computer, Inc. (NASDAQ: SMCI), a global leader in compute, storage and networking technologies, including green computing, has announced its new SuperBlade server, which delivers a better initial acquisition cost structure than traditional blade, rack-mount and OCP designs, with the density and operational efficiency of blades in an open Rack Scale Design-enabled architecture.

The new 8U SuperBlade supports both current and new-generation Intel Xeon processor-based blade servers with the fastest 100G EDR InfiniBand and Omni-Path switches for mission-critical enterprise as well as data center applications. It also leverages the same Ethernet switches, chassis management modules, and software as the successful MicroBlade for improved reliability, serviceability, and affordability. It maximizes performance and power efficiency with DP and MP processors up to 205 watts in half-height and full-height blades, respectively. The new smaller form factor 4U SuperBlade maximizes density and power efficiency while enabling up to 140 dual-processor servers or 280 single-processor servers per 42U rack.

The shared-infrastructure design of the new SuperBlade enables maximum power efficiency, lowering power consumption by up to 20 percent, delivers industry-leading density of up to 7x that of 1U rack systems, and reduces cabling by up to 96 percent. The SuperBlade leverages open Redfish-based management and Supermicro Rack Scale Design to empower open systems management at scale.

“Our new SuperBlade optimizes not just TCO, but initial acquisition cost with industry-leading server density and maximum performance per watt, per square foot and per dollar,” said Charles Liang, President and CEO of Supermicro. “Our 8U SuperBlade is also the first and only blade system that supports up to 205W Xeon CPUs, NVMe drives and 100G EDR IB or Omni-Path switches ensuring that this architecture is optimized for today and future proofed for the next generation of technology advancements, including next generation Intel Skylake processors.”

New 8U SuperBlade Enclosure

  • Up to 20 half-height 2-socket blade servers with 40 hot-plug NVMe drives
  • Up to 10 full-height 4-socket blade servers with 80 hot-plug NVMe drives
  • One 100G EDR IB or Omni-Path switch
  • Up to 4 Ethernet (1G, 10G, 25G) switches
  • One Chassis Management Module (CMM)
  • Up to 8x (N+1 or N+N redundant) 2200W Titanium Level (96%) digital power supplies

New 4U SuperBlade Enclosure

  • Up to 14 half-height 2-socket blade servers
  • Up to 28 single-socket blade server nodes
  • Up to 2 Ethernet (1G, 10G, 25G) switches
  • Up to 4 (N+1 or N+N redundant) 2200W Titanium Level (96%) digital power supplies
  • One Chassis Management Module (CMM)

SBI-4129P-C2N/T3N

  • 2 Intel Xeon processors (up to 205W); up to 2TB DDR4
  • 2 hot-plug NVMe/SAS3 or 3 hot-plug SATA3 drives per node
  • Up to 5 M.2 NVMe per node
  • 2x 10GbE + 100G InfiniBand/Omni-Path

SBI-8149P-T8N/C4N

  • 4 Intel Xeon processors (up to 205W); up to 6TB DDR4
  • 8 hot-plug NVMe/SATA3 or 4 hot-plug NVMe/SAS3 drives per node
  • Up to 6 M.2 NVMe per node
  • 4x 10GbE or 2x 10GbE + 100G InfiniBand/Omni-Path

For complete information on SuperServer solutions from Supermicro visit www.supermicro.com.

About Super Micro Computer Inc.

Supermicro (NASDAQ: SMCI), the leading innovator in high-performance, high-efficiency server technology, is a premier provider of advanced server Building Block Solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, HPC and Embedded Systems worldwide. Supermicro is committed to protecting the environment through its "We Keep IT Green" initiative and provides customers with the most energy-efficient, environmentally-friendly solutions available on the market.

Source: Super Micro Computer

The post Supermicro Announces New SuperBlade Server appeared first on HPCwire.

SC17 Experiencing Robust Exhibitor Participation

Tue, 02/28/2017 - 06:45

Feb. 28 — Even though SC17 is still more than eight months away, the SC17 Exhibits Committee is reporting a significant positive exhibitor response to participation in the Exhibition in Denver this coming November. In fact, SC17 has already eclipsed the total SC16 exhibitor count and booth space.

“As HPC’s importance only continues to grow, the most important HPC decision makers from both research and industry from all corners of the world recognize that attending SC is essential to their success,” said Bronis R. de Supinski, SC17 Exhibits Chair from Lawrence Livermore National Laboratory.  “We also spend considerable time and resources on adding new components to the exhibition that enhance both the attendee and the exhibitor experience.”

According to de Supinski some of the items being discussed for SC17 are more social media outreach for real-time exhibit floor updates as well as adding some informal meeting or networking points that are not tied to a particular booth. Further, the SC17 Emerging Technologies booth will be incorporated into the exhibit floor.

Booth space is limited, but still available. Click here to view the SC17 Exhibitors Prospectus. Click here to review the real-time floorplan or click here for the Online Application.

Source: SC17

The post SC17 Experiencing Robust Exhibitor Participation appeared first on HPCwire.

Battle Brews over Trump Intentions for Funding Science

Mon, 02/27/2017 - 12:12

The battle over science funding – how much and for what kinds of science – under the Trump administration is heating up. Today, the Information Technology and Innovation Foundation (ITIF) labeled a potential Trump plan to slash funding along the lines of a Heritage Foundation blueprint as harmful to U.S. innovation and competitiveness. Last week, Rep. Lamar Smith (R-TX), chairman of the House Science, Space and Technology Committee, blasted NSF for past "frivolous and wasteful" projects, while still affirming NSF's role as the bedrock of taxpayer-funded basic science.

The emerging tug of war over science funding directions isn't likely to diminish soon as competing forces struggle to influence the new administration's policy. NSF invests about $7 billion of public funds each year in research projects and related activities. ITIF's just-released report (Bad Blueprint: Why Trump Should Ignore the Heritage Plan to Gut Federal Investment) takes direct aim at the Heritage Foundation plan (Blueprint for Balance) said to underpin Trump administration thinking on science and technology funding.

ITIF: “There is no doubt that many federal programs, including some that support business, could be cut, or even eliminated, with little or no negative effect on economic growth. But that doesn’t mean that most could. In fact, many programs are intended to compensate for serious market failures and effectively advance one or more of three key national goals: competitiveness, productivity, and innovation. Rather than being cut or eliminated, such programs should be improved and expanded.

“Such nuance and pragmatism, however, are not Heritage’s strengths; doctrinaire ideology is. Heritage’s analysis to support its efforts to cut $10 trillion from the deficit over 10 years is marked by profound misunderstandings about markets, technology, and the global economy. Markets sometimes work wonders, but they sometimes fail. They fail to provide sufficient incentives for innovation and knowledge creation. In an environment marked by financial market short-termism, markets fail to foster long-term investments in people and capabilities. And even if markets acting alone did maximize economic welfare, that doesn’t mean that maximization will occur on U.S. shores.”

Rep. Smith's commentary (Fund science for a new millennium in America: Lamar Smith), presumably more reflective of the Trump position, was published in USA Today and posted on the committee website; it reads less as an attack on funding levels than as a clear directive to NSF to focus on applied research directly connected to U.S. competitiveness – though defining the latter has always been a matter of debate.

Excerpt: “Despite the U.S. government spending more on research and development than any other country, American pre-eminence in several fields is slipping. Other countries are focusing investments on new technologies, advanced scientific and manufacturing facilities, and harnessing their workforces to go into STEM fields. For example, last year China launched the fastest supercomputer in the world, five times faster than any supercomputer in the United States.

“Business as usual is not the answer. NSF must be as nimble and innovative as the speed of technology, and as open and transparent as information in the digital age. NSF Director France Cordova has publicly committed NSF to accountability and transparency and restoring its original mission to support science in the national interest…When NSF is only able to fund one out of every five proposals submitted by scientists, why did it award $225,000 to study animal photos in National Geographic or $920,000 to study textile-making in Iceland during the Viking era? Why did studying tourism in northern Norway warrant $275,000 of limited federal funds?

“These grants and hundreds like them might be worthwhile projects, but how are they in the national interest and how can they justify taxpayer dollars? The federal government should not fund this type of research at the expense of other potentially ground-breaking science.”

Link to ITIF report: https://itif.org/publications/2017/02/27/trump-administration-would-torpedo-us-growth-if-it-adopts-heritage

Link to Heritage Foundation report: http://www.heritage.org/budget-and-spending/report/blueprint-balance-federal-budget-2017

Link to Rep. Smith commentary: https://science.house.gov/news/in-the-news/fund-science-new-millennium-america-lamar-smith

 

The post Battle Brews over Trump Intentions for Funding Science appeared first on HPCwire.

Google Gets First Dibs on New Skylake Chips

Mon, 02/27/2017 - 09:00

As part of an ongoing effort to differentiate its public cloud services, Google made good this week on its intention to bring custom Xeon Skylake chips from Intel Corp. to its Google Compute Engine. The cloud provider is the first to offer the next-gen Xeons, and is getting access ahead of traditional server-makers like Dell and HPE.

Google announced plans to incorporate the next-generation Intel server chips into its public cloud last November. On Friday (Feb. 24), Urs Hölzle, Google's senior vice president for cloud infrastructure, said the Skylake upgrade would deliver a significant performance boost for demanding applications and workloads ranging from genomic research to machine learning.

The cloud vendor noted that Skylake includes Intel Advanced Vector Extensions (AVX-512) that target workloads such as data analytics, engineering simulations and scientific modeling. When compared to previous generations, the Skylake extensions are touted as doubling floating-point performance “for the heaviest calculations,” Hölzle noted in a blog post.

Internal testing showed improved application performance by as much as 30 percent compared to earlier generations of the Xeon-based chip. The addition of Skylake chips also gives the cloud vendor a temporary performance advantage over its main cloud rivals, Amazon Web Services and Microsoft Azure, as well as server makers. (Intel also collaborates with AWS.)

Google and Intel launched a cloud alliance last fall designed to boost enterprise cloud adoption. At the time, company executives noted that the processor’s AVX-512 extensions could help optimize enterprise and HPC workloads.

“Google and Intel have had a long standing engineering partnership working on datacenter innovation,” Diane Bryant, general manager of Intel’s datacenter group, added in a statement.

“This technology delivers significant enhancements for compute-intensive workloads” such as data analytics.

Hölzle added that Skylake was tweaked for Google Compute Engine’s family of virtual machines, ranging from standard through “custom machine types” to boost the performance of compute instances for enterprise workloads.

Google said Skylake processors are available in five public cloud regions, including those across the United States, Western Europe and the eastern Asian Pacific.

A version of this article also appears on EnterpriseTech.

The post Google Gets First Dibs on New Skylake Chips appeared first on HPCwire.

Mellanox Sets New DPDK Performance Record With ConnectX-5

Mon, 02/27/2017 - 06:59

BARCELONA, Spain, Feb. 27 — Mellanox Technologies, Ltd. (NASDAQ: MLNX), a leading supplier of high-performance, end-to-end interconnect solutions for data center servers and storage systems, has announced that its ConnectX-5 100Gb/s Ethernet Network Interface Card (NIC) has achieved 126 million packets per second (Mpps) of record-setting forwarding capabilities running the open source Data Plane Development Kit (DPDK). This breakthrough performance signifies the maturity of high-volume server I/O to support large-scale, efficient production deployments of Network Function Virtualization (NFV) in both Communication Service Provider (CSP) and cloud data centers. The DPDK performance of 126 Mpps was achieved on HPE ProLiant 380 Gen9 servers with the Mellanox ConnectX-5 100Gb/s interface.

The I/O-intensive nature of Virtualized Network Functions (VNFs) – including virtual firewall, virtual Evolved Packet Core (vEPC), virtual Session Border Controller (vSBC), anti-DDoS and Deep Packet Inspection (DPI) applications – has posed significant challenges to building cost-effective NFV infrastructures that meet packet rate, latency, jitter and security requirements. Leveraging its wealth of experience in building high-performance server/storage I/O components and switching systems for High Performance Computing (HPC), hyperscale data centers, and telecommunications operators, Mellanox has the industry's broadest range of intelligent Ethernet NIC and switch solutions, spanning interface speeds from 10, 25, 40 and 50 to 100Gb/s. In addition, both the Mellanox ConnectX series of NICs and the Spectrum series of Ethernet switches feature best-in-class packet rates with 64-byte traffic, low and consistent latency, and enhanced security with hardware-based memory protection.

In addition to designing cutting-edge hardware, Mellanox also actively works with infrastructure software partners and open source consortiums to drive system-level performance to new levels. Mellanox has continually improved DPDK Poll Mode Driver (PMD) performance and functionality through multiple generations of ConnectX-3 Pro, ConnectX-4, ConnectX-4 Lx, and ConnectX-5 NICs.
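For readers less familiar with how such numbers are generated, DPDK applications bypass the kernel and busy-poll the NIC through a Poll Mode Driver. The sketch below shows the core receive-and-forward loop of a generic DPDK application; it is not Mellanox's benchmark configuration, and the port numbers, queue depths and pool sizes are illustrative assumptions.

// Generic DPDK poll-mode forwarding sketch (illustrative, not the benchmark code).
// Assumes a DPDK installation; build against its headers and run with EAL arguments.
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

static const uint16_t BURST = 32;   // packets pulled per poll (assumed value)
static const uint16_t RX_PORT = 0;  // forward from port 0 to port 1 (assumed)
static const uint16_t TX_PORT = 1;

int main(int argc, char **argv) {
    if (rte_eal_init(argc, argv) < 0) {
        std::fprintf(stderr, "EAL init failed\n");
        return EXIT_FAILURE;
    }

    // One mbuf pool shared by both ports; sizes are illustrative.
    rte_mempool *pool = rte_pktmbuf_pool_create("MBUF_POOL", 8191, 256, 0,
                                                RTE_MBUF_DEFAULT_BUF_SIZE,
                                                rte_socket_id());
    if (pool == nullptr) {
        std::fprintf(stderr, "mbuf pool creation failed\n");
        return EXIT_FAILURE;
    }

    rte_eth_conf port_conf = {};  // default device configuration
    const uint16_t ports[2] = {RX_PORT, TX_PORT};
    for (uint16_t port : ports) {
        rte_eth_dev_configure(port, 1, 1, &port_conf);
        rte_eth_rx_queue_setup(port, 0, 1024, rte_socket_id(), nullptr, pool);
        rte_eth_tx_queue_setup(port, 0, 1024, rte_socket_id(), nullptr);
        rte_eth_dev_start(port);
    }

    // Busy-poll loop: the Poll Mode Driver never sleeps, which is what
    // makes packet rates in the tens of millions per second possible.
    rte_mbuf *bufs[BURST];
    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(RX_PORT, 0, bufs, BURST);
        if (nb_rx == 0)
            continue;
        uint16_t nb_tx = rte_eth_tx_burst(TX_PORT, 0, bufs, nb_rx);
        for (uint16_t i = nb_tx; i < nb_rx; ++i)  // drop unsent packets
            rte_pktmbuf_free(bufs[i]);
    }
    return 0;
}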

“We have established Mellanox as the leading cloud networking vendor by working closely with 9 out of 10 hyperscale customers, who now leverage our advanced offload and acceleration capabilities that boost the total infrastructure efficiency of their cloud, analytics and machine learning deployments,” said Kevin Deierling, vice president of marketing at Mellanox Technologies. “We are extending the same benefits to our CSP customers through a distinctive blend of enhanced packet processing, virtualization and storage offload technologies, enabling them to deploy Telco cloud and NFV with confidence.”

“As CSPs deploy NFV in production, they demand reliable NFV Infrastructure (NFVI) that delivers the quality of service their subscribers demand. A critical aspect of this is making sure the NFVI offers the data packet processing performance required to support the service traffic,” said Claus Pedersen, Director, Communication Service Provider Platforms, Data Center Infrastructure Group, Hewlett Packard Enterprise. “The HPE NFV Infrastructure lab has worked closely with Mellanox to ensure that HPE ProLiant Servers with the Mellanox ConnectX series of NICs will enable our CSP customers to achieve the scale, reliability and efficiency they require of their NFV deployments.”

About Mellanox

Mellanox Technologies (NASDAQ: MLNX) is a leading supplier of end-to-end InfiniBand and Ethernet smart interconnect solutions and services for servers and storage. Mellanox interconnect solutions increase data center efficiency by providing the highest throughput and lowest latency, delivering data faster to applications and unlocking system performance capability. Mellanox offers a choice of fast interconnect products: adapters, switches, software and silicon that accelerate application runtime and maximize business results for a wide range of markets including high performance computing, enterprise data centers, Web 2.0, cloud, storage and financial services. More information is available at: www.mellanox.com.

Source: Mellanox

The post Mellanox Sets New DPDK Performance Record With ConnectX-5 appeared first on HPCwire.

PEARC17 Call for Participation Deadline Approaching

Mon, 02/27/2017 - 06:50

Feb. 27 — The first major call-for-participation deadline for the inaugural PEARC Conference is March 6, 2017. PEARC17, which will take place in New Orleans, July 9-13, 2017, is open to professionals and students in advanced research computing. Technical paper and tutorial submissions are due by March 6 and must be submitted through EasyChair, which can be found on the PEARC17 Call for Participation webpage here.

The official Call for Participation contains details about each of the four technical paper tracks as well as the tutorials. Technical track paper submissions may be full papers (strongly preferred) or extended abstracts. External Program and Workshop proposals are due March 31, and Poster, Visualization Showcase and Birds-of-a-Feather submissions are due May 1.

The PEARC (Practice & Experience in Advanced Research Computing) conference series is being ushered in with support from many organizations and will build upon earlier conferences’ success and core audiences to serve the broader community. In addition to XSEDE, organizations supporting the new conference include the Advancing Research Computing on Campuses: Best Practices Workshop (ARCC), the Science Gateways Community Institute (SGCI), the Campus Research Computing Consortium (CaRC), the ACI-REF consortium, the Blue Waters project, ESnet, Open Science Grid, Compute Canada, the EGI Foundation, the Coalition for Academic Scientific Computation (CASC), and Internet2.

Follow PEARC on Twitter (PEARC_17) and on Facebook (PEARChpc).

Source: XSEDE

The post PEARC17 Call for Participation Deadline Approaching appeared first on HPCwire.

Mellanox Introduces 6WIND-Based Router and IPsec Indigo Platform

Mon, 02/27/2017 - 06:40

BARCELONA, Spain, Feb. 27 — Mellanox Technologies, Ltd. (NASDAQ: MLNX), a leading supplier of high-performance, end-to-end smart interconnect solutions for data center servers and storage systems, today announced the IDG4400 6WIND Network Routing and IPsec platform based on the combination of Indigo, Mellanox’s newest network processor, and 6WIND’s 6WINDGate packet processing software, which includes routing and security features such as IPsec VPNs.

The IDG4400 6WIND 1U platform supports 10, 40 and 100GbE network connectivity and is capable of sustaining record rates of up to 180Gb/s of encryption/decryption while providing IPv4/IPv6 routing functions at rates of up to 400Gb/s. As a result of the strong partnership between the two companies, Mellanox’s IDG4400 6WIND delivers a price/performance advantage in a turnkey solution designed for carrier, data center, cloud and Web 2.0 applications requiring high-performance cryptographic capabilities along with IP routing. The IDG4400 6WIND complements Mellanox’s Spectrum-based Ethernet switches to provide a full solution for the data center.

“We are proud to partner with Mellanox to include our high performance networking software in the new IDG4400 6WIND appliance,” said Eric Carmès, CEO and Founder of 6WIND. “By combining the performance and flexibility of Mellanox’s Indigo network processor together with our 6WINDGate routing and security features, we bring to market a ready-to-use networking appliance with an impressive cost/performance advantage for customers.”

“The need for scalable IPsec solutions has become vital for telco, data center, hyperscale infrastructures and more,” said Yael Shenhav, vice president of product marketing at Mellanox. “As security concerns in data centers continue to rise, encrypting data by default becomes a crucial requirement. The combined Mellanox and 6WIND solution provides the required security capabilities in the most efficient manner possible to our mutual customers.”

The IDG4400 delivers an effective routing and crypto alternative to traditional networking vendors at a fraction of the cost. For carriers, it provides the ability to overcome the security exposure of today’s LTE networks. In addition, scaling to millions of routes, the IDG4400 6WIND is an ideal solution for Point of Presence (POP) applications. It can also be used to enable secure data interconnect between geographically dispersed data centers. Customers using 6WIND software on standard x86 servers can migrate to the IDG4400 and gain a cost/performance advantage while still enjoying the same software and features. The IDG4400 6WIND is a complete product backed by the extensive support capabilities of Mellanox.

For more information on the IDG4400 6WIND, come and visit us at HP’s booth no. 3E11, Hall 3.

About Mellanox

Mellanox Technologies (NASDAQ: MLNX) is a leading supplier of end-to-end InfiniBand and Ethernet smart interconnect solutions and services for servers and storage. Mellanox interconnect solutions increase data center efficiency by providing the highest throughput and lowest latency, delivering data faster to applications and unlocking system performance capability. Mellanox offers a choice of fast interconnect products: adapters, switches, software and silicon that accelerate application runtime and maximize business results for a wide range of markets including high performance computing, enterprise data centers, Web 2.0, cloud, storage and financial services. More information is available at: www.mellanox.com.

Source: Mellanox

The post Mellanox Introduces 6WIND-Based Router and IPsec Indigo Platform appeared first on HPCwire.

Thomas Sterling on CREST and Academia’s Role in HPC Research

Mon, 02/27/2017 - 06:30

The US advances in high performance computing over many decades have been a product of the combined engagement of research centers in industry, government labs, and academia. Often these have been intertwined with cross collaborations in all possible combinations and under the guidance of Federal agencies with mission critical goals. But each class of R&D environment has operated at its own pace and with differing goals, strengths, and timeframes, the superposition of which has met the short, medium, and long term needs of the nation and the field of HPC.

Different countries from the Americas, Europe, Asia, and Africa have evolved their own formulas of such combinations sometimes in cooperation with others. Many, but not all, emphasize the educational component of academic contributions for workforce development, incorporate the products of international industrial suppliers, and specialize their own government bodies to specific needs. In the US, academic involvement has provided critical long-term vision and perhaps most importantly greatly expanded the areas of pursuit.

Thomas Sterling, Director, CREST, Indiana University

The field of HPC is unique in that its success appears heavily weighted in terms of its impact on adoption by industry and community. This tight coupling sometimes works against certain classes of research, especially those that explore long term technologies, that investigate approaches outside the mainstream, or that require substantial infrastructure often beyond the capabilities or finances of academic partners. A more subtle but insidious factor is the all-important driver of legacy applications, often agency mission critical, that embody past practices constraining future possibilities.

How university research in HPC stays vibrant, advances the state of the art, and still makes useful contributions to the real world is a challenge that demands innovation in how schools and colleges organize themselves. Perhaps most importantly, real research, as opposed to important development work, not only involves but demands risk; it is the exploration of the unknown. Risk-averse strategies are appropriate when goals and approaches are already determined and time to deployment is the determining factor of success. But when honesty recognizes that, beyond a certain point, future methods lie outside the scope of certainty, then the scientific method applies, and when employed it must not just tolerate but benefit from uncertainty of outcome.

Without such research into the unknown, the field is restricted to incremental perturbations of the conventional, essentially limiting the future to the cul-de-sac of the past. This is insufficient to drive the field into areas beyond our present sight. The power and richness of the mixed and counterbalancing approaches of government labs, industry, and academia guarantee both the near-term quality of deployable hardware and software platforms and the long-term, as yet not fully understood concepts whose enabling technologies and trends are distinct from those of the present.

This is the strength of the US HPC R&D approach and was reflected in the 2015 NSCI executive order for exascale computing. How academia conducts its component of this triad is a somewhat messy and diverse matter, sensitive to the nature of the institutions involved, the priorities of their universities, funding sources, and the vision of the individual faculty and senior administrators responsible for direction, strategy, staffing, facilities, and the accomplishments by which success will be measured. This article presents one such enterprise, the Center for Research in Extreme Scale Technologies (CREST) at Indiana University (IU), which embodies one possible strategy for balancing cost, impact, and risk on the national stage.

CREST is a medium-scale research center, somewhere between the small, single-faculty-led research groups found at many universities and those few premier research environments such as the multiple large-scale academic laboratories at MIT and facilities like TACC and NCSA at UT-Austin and UIUC, respectively. While total staffing numbers are routinely in flux, a representative number is on the order of 50 people. It occupies a modern two-story building of about 20,000 square feet conveniently located within walking distance of the IU Bloomington campus and the center of the city.

CREST was established in the fall of 2011 by Prof. Andrew Lumsdaine as its founding Director, Dr. Craig Stewart as its Assistant Director, and Prof. Thomas Sterling as its Chief Scientist. Over the nearly six years of its existence, CREST has evolved with changes in responsibilities: Sterling currently serves as Director, Prof. Martin Swany as Associate Director, and Laura Pettit as Assistant Director. Overall staffing is deemed particularly important to ensure that all required operating functions are performed. This means significant engagement of administrative staff, which is not typical of academic environments. But cost effectiveness is also a goal: to maximize productivity in research and education, tasks that could be performed better, and at lower cost, by others are eliminated. An important strategy of CREST is to let everyone working as part of a team do what they are best at, resulting in the highest impact at the lowest cost.

As per IU policy, research direction is faculty led, with as many as six professors slotted for CREST augmented by another half dozen full-time research scientists, including post-docs. A small number of hardware and software engineers both expedites and enhances the quality of prototype development for experimentation and product delivery to collaborating institutions. CREST can support as many as three dozen doctoral students, with additional facilities for Masters and undergraduate students.

Organizationally, CREST has oversight by the Office of the Dean of the IU School of Informatics and Computing (SOIC) in cooperation with the Office of the VP of IT and the Office of the VP of Research. It coexists with the many departments making up SOIC and has the potential to include faculty and students from any and all of them. It also extends its contributions and collaborations to other departments within the university as research opportunities and interdisciplinary projects permit. While these details are appropriate, they are rather prosaic and more importantly do not describe either the mandate or the essence of CREST; that is about the research it enables.

CREST was established not for the purpose of creating a research center, but as an enabler for a focused area of research; specifically, to advance the state of the art in high performance computing systems beyond conventional practices. This was neither arbitrary nor naive on the part of IU senior leadership and was viewed as the missing piece of an ambitious but realizable strategy to bring HPC leadership and capability to Indiana. Already in place were strong elements of cyber-infrastructure support and HPC data center facilities for research and education (more about this shortly). CREST was created as the third pillar of this HPC thrust by bringing original research to IU in hardware and software, with a balanced portfolio of near- and long-term initiatives providing both initial computing environments of immediate value and extended exploration of alternative concepts unlikely to be undertaken by mainstream product-oriented activities. Therefore, the CREST research strategy addresses real-world challenges in HPC, including classes of applications not currently well satisfied through incremental changes to conventional practices.

One of the critical factors in the impact of CREST is its close affiliation with the Office of the Vice President for Information Technology (OVPIT), including the IU Pervasive Technology Institute (IUPTI) and University Information Technology Services (UITS). This dramatically reduces the costs and ancillary activities of CREST research by leveraging the major investments of OVPIT in support of broader facilities and services for the IU computing community, permitting CREST as a work unit to stay precisely focused on its mission research while remaining lean and mean. IU VP for IT and CIO Brad Wheeler played an instrumental role in the creation of CREST and the recruitment of Thomas Sterling and Martin Swany to IU.

The IUPTI operates supercomputers with more than 1 PetaFLOPS of aggregate processing capability, including the new Big Red II Plus, a Cray supercomputer allowing large-scale testing and performance analysis of HPX+ software. This is housed and operated in a state-of-the-art, 33,000-square-foot data center that, among its other attributes, is tornado-proof. IUPTI exists to aid the transformation of computer science innovation into tools usable by practicing scientists within IU. IUPTI creates special provisions for support of CREST software on its systems and at the same time has provided two experimental compute systems (one cluster, one very small Cray test system) for dedicated use within CREST.

CREST founding director Andrew Lumsdaine (l) and current director Thomas Sterling in front of Big Red II Plus (Cray)

IUPTI staff are engaged and active in CREST activities. For example, IUPTI Executive Director Craig Stewart gave the keynote address at the 2016 SPPEXA (Special Priority Project on EXascale Applications) event held in Munich, and discussed US exascale initiatives in general and CREST technologies in particular. IUPTI coordinates its vendor interactions with CREST so as to create opportunities for R&D partnerships and promulgation of CREST software. Last, and definitely not least, the UITS Learning Technologies Division supports CREST in the distribution of online teaching materials created by the center. All in all, CREST, SOIC, and OVPIT are partners in supporting basic research in HPC and rendering CS innovations to practical use for science and society while managing costs.

The CREST charter is one of focused research toward the common goal of advancing future generations of HPC system structures and applications; the Center is simply a vehicle for achieving IU’s goals in HPC and the associated research objectives, rather than its existence being the purpose in itself. The research premise is that a set of key factors determines ultimately delivered performance: starvation, latency, overhead, waiting for contention resolution, availability (including resilience), and the normalizing operation issue rate reflecting power (e.g., clock rate). Additional factors of performance portability and user productivity also contribute to the overall effectiveness of any particular strategy of computation.

A core postulate of CREST HPC research and experimental development is the opportunity to address these challenge parameters through dynamic adaptive techniques, using runtime resource management and task scheduling to achieve (if and when possible) dramatic improvements in computing efficiency and scalability. The specific foundational principles of the dynamic computational method used are established by the experimental ParalleX execution model, which expands computational parallelism, addresses the challenge of uncertainty caused by asynchrony, permits exploitation of heterogeneity, and exhibits a global name space to the application.

ParalleX is intended to replace prior execution models such as Communicating Sequential Processes (CSP), SMP-based multiple-threaded shared memory computing (e.g., OpenMP), vector and SIMD-array computing, and the original von Neumann derivatives. ParalleX has been formally specified through operational semantics by Prof. Jeremy Siek for verification of correctness, completeness, and compliance. As a first reduction to practice, a family of HPX runtime systems has been developed and deployed for experimentation and application. LSU has guided important extensions to the C++ standards through work led by Dr. Hartmut Kaiser. HPX+ is being used to extend the earlier HPX-5 runtime, developed by Dr. Luke D’Alessandro and others, into areas of cyber-physical systems and other diverse application domains while supporting experiments in computer architecture.
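ParalleX itself is an execution model rather than an API; the HPX runtimes expose it through lightweight tasks, futures and a global address space. As a rough flavor of that futures-based, asynchronous style, the sketch below uses only standard C++ (not the actual HPX or HPX-5 interfaces) to express work as tasks whose results are consumed as they become available rather than behind a global barrier; the problem and sizes are made up for illustration.

// A flavor of futures-based asynchronous tasking in plain standard C++.
// This is not the HPX or HPX-5 API; it only illustrates the style of expressing
// work as tasks whose results are consumed when ready rather than bulk-synchronously.
#include <cstddef>
#include <cstdio>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// One unit of work: sum a slice of the input (the "kernel" here is arbitrary).
static double partial_sum(const std::vector<double>& v, std::size_t lo, std::size_t hi) {
    return std::accumulate(v.begin() + lo, v.begin() + hi, 0.0);
}

int main() {
    std::vector<double> data(1 << 20, 1.0);   // problem size chosen arbitrarily
    const std::size_t chunks = 8;
    const std::size_t step = data.size() / chunks;

    // Launch tasks; the runtime decides when and on which thread each one runs.
    std::vector<std::future<double>> parts;
    for (std::size_t c = 0; c < chunks; ++c)
        parts.push_back(std::async(std::launch::async, partial_sum,
                                   std::cref(data), c * step, (c + 1) * step));

    // Consume each result as it becomes available instead of using a global barrier.
    double total = 0.0;
    for (auto& f : parts) total += f.get();
    std::printf("total = %.1f\n", total);
    return 0;
}

The point of the runtime-centric approach is that the scheduler, not the programmer, decides when and where such tasks run, which is what allows it to adapt to starvation, latency and overhead at execution time.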

One important area pursued by CREST in system design and operation is advanced lightweight messaging and control through the Photon communication protocol, led by Prof. Martin Swany, with additional work in low-overhead NIC design. Many application areas have been explored. Some conventional problems exhibiting static, regular data structures show little improvement through these methods. But many applications incorporating time-varying irregular data structures, such as the graphs found in adaptive mesh refinement, wavelet algorithms, N-body problems, particle-in-cell codes, and fast multipole methods among others, demonstrate improvements, sometimes significant, in the multi-dimensional performance tradeoff space. Drs. Matt Anderson, Bo Zhang, and others have driven this research while producing useful codes, including the DASHMM library.

The CREST research benefits from both internal and external sponsorship. CREST has contributed to NSF, DOE, DARPA, and NSA projects over the last half-dozen years and continues to participate in advanced research projects as appropriate. CREST represents an important experience base in advancing academic research in HPC systems for future scalable computing, employing co-design methodologies spanning applications and innovations in hardware and software system structures, and it continues to evolve. It provides a nurturing environment for mentoring graduate students and post-docs in the context of advanced research, even as the field itself continues to change under national demands and shifting technology opportunities and challenges.

The post Thomas Sterling on CREST and Academia’s Role in HPC Research appeared first on HPCwire.

Advancing Modular Supercomputing with DEEP and DEEP-ER Architectures

Fri, 02/24/2017 - 14:01

Knowing that the jump to exascale will require novel architectural approaches capable of delivering dramatic efficiency and performance gains, researchers around the world are hard at work on next-generation HPC systems.

In Europe, the DEEP project has successfully built a next-generation heterogeneous architecture based on an innovative “cluster-booster” approach. The new architecture can dynamically assign individual code parts in a simulation to different hardware components based on which component can deliver the highest computational efficiency. It also provides a foundation for a modular type of supercomputing where a variety of top-level system components, such as a memory module or a data analytics module for example, could be swapped in and out based on workload characteristics. Recently, Norbert Eicker, head of the Cluster Computing research group at Jülich Supercomputing Centre (JSC), explained how the DEEP and DEEP-ER projects are advancing the idea of “modular supercomputing” in pursuit of exascale performance.

Why go DEEP?

Eicker says that vectorization and multi-core processors have become the two main strategies for acceleration. He noted that the main advantages of general purpose multi-core processors include high single-thread performance, due to relatively high clock frequencies, along with their ability to do out-of-order processing. Their downsides include limited energy efficiency and a higher cost per FLOP. Accelerators, such as the Intel Xeon Phi coprocessor or GPUs, on the other hand, are more energy efficient but harder to program.

Given the different characteristics of general purpose processors and accelerators, it was only a matter of time before researchers began looking for ways to integrate different types of compute modules into an overall HPC system. Eicker said that most efforts have involved building heterogeneous clusters wherein standard cluster nodes are connected using a fabric and accelerators are attached to each cluster node.

Figure 1: An example of a basic architecture for a heterogeneous cluster.

Per Eicker, this heterogeneous approach has drawbacks, including the need for static assignment of accelerators to CPUs. Since some applications benefit greatly from accelerators and others not at all, getting the ratio of CPUs to accelerators right is tricky and inevitably leads to inefficiencies. Eicker explained that the idea behind the DEEP project was to combine compute resources into a common fabric and make the accelerating resources more autonomous. The goal was to not only enable dynamic assignments between cluster nodes and the accelerator, but also to enable the accelerators to run a kind of MPI so the system could offload more complex kernels to the accelerators rather than needing to always rely on the CPU.

The building blocks of a successful prototype

Work on the prototype Dynamical Exascale Entry Platform (DEEP) system began in 2011, and was mostly finalized toward the end of 2015. It took the combined efforts of 20 partners to complete the European Commission funded project. The 500 TFLOP/s DEEP prototype system includes a “cluster” component with general-purpose Intel Xeon processors and a “booster” component with Intel Xeon Phi coprocessors along with a software stack capable of dynamically separating code parts in a simulation based on concurrency levels and sending them to the appropriate hardware component. The University of Heidelberg developed the fabric, which has been commercialized by EXTOLL and dubbed the EXTOLL 3D Torus Network.

Figure 2: The DEEP cluster-booster hardware architecture. The cluster is based on an Aurora HPC system from Eurotech. The booster includes 384 Intel Xeon Phi processors interconnected by EXTOLL fabric.

Given the unusual architecture, the project team knew it would need to modify and test applications from a variety of HPC fields on the DEEP system to prove its viability. The team analyzed each selected application to determine which parts would run better on the cluster and which would run better on the booster, and modified the applications accordingly. One example is a climate application from Cyprus Institute. The standard climate model part of the application runs on the cluster side while an atmospheric chemical simulation runs on the booster side, with both sides interacting with each other from time to time to exchange data.

The new software architecture

One of the most important developments of the DEEP project is a software architecture that includes new communication protocols for transferring data between network technologies, programming model extensions and other important advancements.

Figure 3: The DEEP software architecture includes standard software stack components along with some new components developed specifically for the project.

While the left- and right-hand sides of the architecture in figure 3 are identical to the standard MPI-based software stacks of most present-day HPC architectures, the components in the middle add some important new capabilities. Eicker explained that in the DEEP software architecture, the main part of an application and its less scalable code run only on the cluster nodes, and everything starts on the cluster side. What’s different is that the cluster part of the application can collectively start a crowd of MPI processes on the right-hand side using a global MPI.

The spawn for the booster is a collective operation of cluster processes that creates an inter-communicator containing all parents on one side and all children on the other. For example, MPI_COMM_WORLD, or a subset of processes on the cluster side, can collectively call the MPI_Comm_spawn function to create a new MPI_COMM_WORLD on the booster side that is capable of standard MPI communication. Once started, the processes on the booster side can communicate amongst each other and exchange messages, making it possible to offload complex kernels to the booster.
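In concrete terms, the pattern looks roughly like the generic MPI sketch below. It is not the DEEP project's code; the process count, the single-binary structure and the integer payload are arbitrary choices made for illustration.

// Generic MPI_Comm_spawn sketch: one binary playing both "cluster" and "booster" roles.
// Not the DEEP project's code; process counts and the payload are illustrative.
// Build: mpicxx spawn_demo.cpp -o spawn_demo    Run: mpirun -np 2 ./spawn_demo
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        // "Cluster" side: collectively spawn 4 "booster" processes running this
        // same binary; the result is an inter-communicator to the new world.
        MPI_Comm booster;
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &booster, MPI_ERRCODES_IGNORE);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            // Hand a piece of work to each remote (booster-side) rank.
            for (int dst = 0; dst < 4; ++dst) {
                int work = 100 + dst;
                MPI_Send(&work, 1, MPI_INT, dst, 0, booster);
            }
        }
    } else {
        // "Booster" side: a fresh MPI_COMM_WORLD that can also talk to its parents
        // through the inter-communicator returned by MPI_Comm_get_parent.
        int rank, work;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Recv(&work, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
        std::printf("booster rank %d received work item %d\n", rank, work);
    }

    MPI_Finalize();
    return 0;
}

The spawned processes form their own MPI_COMM_WORLD, which is exactly what lets the booster side run full MPI kernels among themselves while still exchanging data with the cluster side over the inter-communicator.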

Using MPI to bridge between the different fabrics in the cluster and booster may seem like it would significantly complicate the lives of application developers. However, Barcelona Supercomputing Center invented what is basically a source-to-source compiler, called the OmpSs Offload Abstraction compiler, that does much of the work. Developers see a familiar-looking cluster side with an InfiniBand-based MPI and a booster side with an EXTOLL-based MPI. Their job is to annotate the code to tell the compiler which parts should run on the cluster versus the booster. The OmpSs compiler introduces the MPI_Comm_spawn call and the other communication calls required for sharing data between the two code parts.

Eicker explained that the flexible DEEP approach has many advantages, including options for multiple operational modes that enable much more efficient use of system resources. Beyond the specialized symmetric mode described above, the booster can be used discretely, or as a pool of accelerators. He used applications that could scale on the Blue Gene system as an example, noting they could be run entirely on the booster side with no cluster interaction.

From DEEP to DEEP-ER

Plans for the DEEP-ER (Dynamical Exascale Entry Platform – Extended Reach) phase include upgrading the booster to the latest generation of Intel Xeon Phi processors. The team is also exploring how on-node Non-Volatile Memory (NVM), network attached memory and a simplified interface can improve the overall system capabilities.

Figure 4: The DEEP-ER cluster-booster hardware architecture.

Eicker said that since Xeon Phi processors are self-booting, the upgrade will make the hardware implementation easier. The team also significantly simplified the interface by using the EXTOLL fabric throughout the entire system. The global use of the EXTOLL fabric enabled the team to eliminate the booster interface nodes and the DEEP cluster-booster protocol; the DEEP-ER system will use a standard EXTOLL protocol running across the two types of nodes. The EXTOLL interconnect also enables the system to take advantage of the network attached memory.

One of the main objectives of the DEEP-ER project is to explore scalable I/O. To that end, the project team is investigating the integration of different storage types, from disks to on-node NVM, while also making use of the network attached memory. Eicker said the team is using the BeeGFS file system along with extensions that enable smart caching to local NVMe devices in the common namespace of the file system to help improve performance, as well as SIONlib, a scalable I/O library developed by JSC for parallel access to task-local files, to enable more efficient handling of task-local I/O. Exascale10 I/O software from Seagate also sits on top of the BeeGFS file system, enabling MPI I/O to make use of the file system cache extensions.
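These layers sit on top of interfaces most HPC developers already use. As a point of reference, the generic MPI-IO sketch below shows each rank collectively writing its own contiguous block of one shared file; it is not DEEP-ER or SIONlib code, and the file name and block size are assumptions made for illustration.

// Generic MPI-IO sketch: each rank writes its own contiguous block of one shared file.
// Not DEEP-ER/SIONlib code; the file name and block size are arbitrary assumptions.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int block = 1 << 20;                 // 1,048,576 ints per rank (assumed)
    std::vector<int> data(block, rank);        // fill with this rank's id

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Collective write: rank i lands at offset i * block * sizeof(int), so the
    // underlying parallel file system sees one coordinated operation rather
    // than many uncoordinated ones, which is where caching layers can help.
    MPI_Offset offset = (MPI_Offset)rank * block * sizeof(int);
    MPI_File_write_at_all(fh, offset, data.data(), block, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}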

Beyond I/O, the DEEP-ER project is also exploring how to improve resiliency. Eicker noted that because the offloaded parts of programs are stateless in the DEEP approach, it’s possible to improve the overall resiliency of the software and make functions like checkpoint restart a lot more efficient than standard approaches.

Toward modular supercomputing

Each phase of the DEEP project is an important step toward modular supercomputing. Eicker said that the DEEP cluster-booster concept showed that it’s possible to integrate heterogeneous systems in new ways. With DEEP-ER, the combination of the network attached memory (NAM) and network attached storage adds what is essentially a memory booster module. Moving forward, there are all kinds of possibilities for new modules, according to Eicker. He mentioned an analytics module that might look like a cluster but include more memory or different types of processors, or a module that acts as a graphics cluster for online visualization.

Figure 5: The end goal of the DEEP project is to create a truly modular supercomputer, which could pave the way for increasingly specialized modules for solving different types of supercomputing challenges.

The ultimate goal of the DEEP project is to build a flexible modular supercomputer that allows users to organize applications for efficient use of the various system modules. Eicker said that the DEEP-ER team hopes to extend its JURECA cluster with the next-generation Xeon Phi processor-based booster. Then the team will begin exploring new possibilities for the system, which could include adding new modules, such as graphics, storage and data analytics modules. The next steps could even include a collaboration with the Human Brain Project on neuromorphic computing. And these ideas are only the beginning. The DEEP approach could enable scientists to dream up new modules for tackling their specific challenges. Eicker acknowledges that there is much work to be done, but he believes the co-design approach used by the DEEP team will continue to drive significant steps forward.

Watch a short video capturing highlights of Eicker’s presentation.

About the Author

Sean Thielen, the founder and owner of Sprocket Copy, is a freelance writer from Portland, Oregon who specializes in high-tech subject matter.

The post Advancing Modular Supercomputing with DEEP and DEEP-ER Architectures appeared first on HPCwire.
