NeMo LLM Service: Nvidia's First Cloud Service Makes AI Less Vague

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

Nvidia is trying to uncomplicate AI with a cloud service that makes AI and its many forms of computing less vague and more conversational.

The NeMo LLM service, which Nvidia called its first cloud service, adds a layer of intelligence and interactivity for users to harmoniously interact with complex AI models in domains such as biotechnology and medicine.

Some AI models that have been developed or are in research can be complicated, and need to be turned into useful enterprise applications that can fit in real-world commercial settings, said Ian Buck, the general manager and vice president of Accelerated Computing at Nvidia.

“We need to tailor these large language models to answer questions in certain ways, to give them context, and the domain problem to solve,” said Buck in a press briefing ahead of the company’s fall GPU Technology Conference, which is being held virtually this week.

Large language models are seen as a foundation technology to simplify user interaction with AI. The more recent DALL-E 2, which has 3.5 billion parameters, can generate images from a natural description of a few words, like one would use to describe art.

The NeMo LLM service will make large language models easier to access so enterprises can play with them, experiment, and deploy them for their specific use cases.

While DALL-E 2 is a simple example of a generic use of a large language model, Nvidia is tuning the NeMo LLM service to add a conversational element to specialized domains that include finance, technology and medicine.

“This service will help bring large language models to all sorts of different use cases – to generate profit summaries, for product reviews, to build technical Q&A, for medical use cases,” Buck said.

The cloud service takes pre-trained models such as the NeMo Megatron model (530 billion parameters), GPT-3 (5 billion and 20 billion parameter variants) or T5 (3 billion parameter variant) and builds a domain-specific framework around them. The service helps these models answer questions in the language best suited to a specific domain.

“You don’t need to train the large language model from scratch. We’ve already done that and made it easy for you,” Buck said.

The service is easy to use and doesn’t require a lot of coding. A developer enters domain prompts, examples of questions along with how they should be answered, and text or summarizations. The servers then train the model to answer questions in that particular way. The output is a cloud-based API that users can interact with directly or call from applications.
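As a rough illustration of the kind of input described above, the sketch below collects a few question-and-answer examples for a domain and serializes them one per line. The `PromptExample` class, its field names, and the JSON Lines layout are hypothetical choices made for this example; the article does not describe the format the NeMo LLM service actually expects.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PromptExample:
    """One hypothetical training example: a domain, a question, and the desired answer."""
    domain: str
    question: str
    answer: str


def to_jsonl(examples):
    """Serialize examples as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)


if __name__ == "__main__":
    examples = [
        PromptExample("finance", "Summarize the quarter's profit drivers.",
                      "Profit rose mainly because of higher datacenter sales."),
        PromptExample("medicine", "What does this lab value indicate?",
                      "It is within the normal reference range."),
    ]
    print(to_jsonl(examples))
```

A handful of examples like these is the only domain-specific material the developer supplies in this sketch; the pre-trained model itself is not part of the input.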

Nvidia is also kicking off the NeMo LLM cloud service with BioNeMo, which gives researchers access to pre-trained chemistry and biology language models. These services will help researchers interact with and manipulate proteins and data for applications like drug discovery.

“Luckily chemistry and biology have their own languages – SMILES strings for chemistry, amino acids for proteins, and nucleic acids for DNA and RNA,” said Kimberly Powell, vice president and general manager of healthcare at Nvidia.

The first of two BioNeMo protein models, the ESM-1, captures or encodes important biological features of large protein databases. The model was originally developed by Meta (Facebook’s parent company), and was retrained by Nvidia and is now being offered as a service. That model is designed for downstream use in the research or enterprise communities.

“Users of the service can input an amino acid sequence and the model will infer 1000s of representations per second. That can be used to train a task specific model like predicting a protein stability or solubility,” Powell said.
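To make the idea of an amino acid sequence as model input more concrete, here is a minimal sketch that maps a protein sequence to integer token IDs, one token per residue. The vocabulary is simply the 20 standard one-letter amino acid codes in alphabetical order; that ordering, and the absence of special tokens, are assumptions for illustration and not the vocabulary of ESM-1 or BioNeMo.

```python
# The 20 standard amino acids, by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative vocabulary: each residue letter maps to its position in the string above.
TOKEN_IDS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize_protein(sequence):
    """Convert a protein sequence into a list of integer token IDs, one per residue."""
    seq = sequence.upper()
    unknown = sorted(set(seq) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    return [TOKEN_IDS[aa] for aa in seq]


if __name__ == "__main__":
    print(tokenize_protein("MKT"))  # prints [10, 8, 16]
```

A language model would turn token IDs like these into vector representations, which is the output Powell describes being fed to a task-specific model.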

The BioNeMo service also provides a model developed by the OpenFold consortium, which predicts 3D protein structure from an amino acid sequence in just minutes.

“Otherwise, you have to use experiments to determine 3D structures. And they’re very difficult, expensive and can take years,” Powell said.

The OpenFold Consortium, which includes academics, startups and companies in the biotechnology and pharmaceutical sectors, developed the open-source protein language model. Nvidia will serve the model, but will also continue to iterate on and co-develop the models with the consortium. Unlike with ESM-1, Nvidia didn’t retrain the model.

Users will get early access to BioNeMo next month.

The NeMo LLM cloud service will be deployed in datacenters that Nvidia classifies as “AI factories.” Customers can throw raw data into the factory, with the output being a glossy end product that is ready to deploy.

The NeMo LLM service is the latest addition to a stable of software machines deployed in Nvidia’s AI factory. Other software products in Nvidia’s AI factory include Riva, which is a speech AI, and Merlin, which is a recommender system.

The NeMo LLM will take advantage of the new H100 GPUs based on the latest Hopper architecture, which Nvidia says is now in full production (although the full SXM capability is awaiting the availability of Intel’s Sapphire Rapids CPUs). Nvidia said eight H100 GPUs can match the output of 64 previous-generation A100 GPUs.

Large language models like NeMo in Nvidia’s cloud service are based on the transformer architecture, which helps AI understand what parts of a sentence, image, or disparate data points are related to each other. That is unlike convolutional neural networks, which look only at their immediate neighboring relationships.
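The contrast with local neighborhoods can be sketched with a bare-bones self-attention step, the core operation of the transformer architecture. This is a simplified illustration in plain Python: it uses the token vectors directly as queries, keys and values and omits the learned projections, multiple heads and other layers a real model has.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Returns (outputs, attention), where attention[i][j] is how strongly
    position i attends to position j. No learned weights are used.
    """
    d = len(tokens[0])
    outputs, attention = [], []
    for q in tokens:
        # Score this position against every position in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        attention.append(weights)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return outputs, attention


if __name__ == "__main__":
    outputs, attention = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    for row in attention:
        print([round(w, 3) for w in row])
```

Every row of the attention matrix has a nonzero weight for every position, so the first token can draw on the last one directly, however far apart they are.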

“Transformers can rein in the more distinct relationships and that’s important for a whole class of problems. Natural language processing is important because in order to understand the meaning of a word, you have to look at the whole sentence, and even a paragraph, and the same is the case with a number of other domains,” Paresh Kharya, senior director of product management and marketing at Nvidia, told HPCwire.

Transformers allowed Nvidia to capture more distant relationships in language, and also to train on unlabeled datasets.

“It greatly expanded the volume of data. In the case of NLP, it’s all the data on the internet. In the case of genomics and protein sequencing, the known structures and the behaviors and patterns is the data set that we have,” Kharya said.

The Hopper architecture has transformer engines that work at FP8 precision. Along with the software, Hopper is able to dynamically tune and adapt to the precision needed by the different layers in a model, and able to speed up training without changing or impacting the accuracy.

The new pretrained models offered by NeMo LLM take advantage of an emerging method called “prompt learning.”

The prompt learning method involves taking a large language model that has already been pre-trained, and adding a few examples on the type of tasks, the answers expected, and the types of responses expected when faced with a certain type of question. At the end of the learning cycle, based on the input, the main pre-trained model doesn’t change, but a prompt token is issued, which provides the context.

“The next time you’re asking a question of a similar type, you provide that question along with that prompt token. And that token gives the model the context it needs to answer that question more accurately,” Kharya said.
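The idea that the pre-trained model stays fixed while only a small prompt is learned can be sketched with a toy example. Everything here is an illustrative assumption: the "model" is a fixed weight vector applied to mean-pooled inputs, the prompt is a single learnable vector prepended to the input, and training is plain gradient descent on squared error. It is not Nvidia's P-tuning implementation.

```python
def frozen_model(weights, rows):
    """Toy stand-in for a pre-trained model: mean-pool the rows, then project with fixed weights."""
    n = len(rows)
    pooled = [sum(r[i] for r in rows) / n for i in range(len(weights))]
    return sum(w * v for w, v in zip(weights, pooled))


def tune_prompt(weights, examples, dim, steps=200, lr=0.5):
    """Learn one prompt vector; the model weights are read but never modified."""
    prompt = [0.0] * dim
    for _ in range(steps):
        for token_rows, target in examples:
            rows = [prompt] + token_rows  # prepend the learnable prompt
            err = frozen_model(weights, rows) - target
            n = len(rows)
            # Gradient of err**2 with respect to the prompt is 2 * err * weights / n.
            prompt = [p - lr * 2 * err * w / n for p, w in zip(prompt, weights)]
    return prompt


if __name__ == "__main__":
    weights = [1.0, -1.0]                      # frozen "pre-trained" parameters
    examples = [([[1.0, 0.0]], 2.0),           # (input token vectors, desired output)
                ([[0.0, 1.0]], 1.0)]
    prompt = tune_prompt(weights, examples, dim=2)
    for token_rows, target in examples:
        print(round(frozen_model(weights, [prompt] + token_rows), 4), target)
```

At inference time in this sketch, a new input is answered by prepending the same learned prompt, which mirrors Kharya's description of supplying the prompt token along with the question.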

The process is called P-tuning, which takes advantage of the new transformer cores in the Hopper GPU. The P-tuning process can provide up to a five times speed up in deployment of LLMs compared to the previous-generation A100 GPUs, Kharya said.

The models can be trained or tuned on GPUs other than Hopper, but performance improves with the higher bandwidth and connectivity of Hopper’s HBM3 memory and NVLink interconnect.

Nvidia said access to the NeMo LLM service will be offered directly to enterprises starting next month and won’t be available to the general public.

HPCwire is a registered trademark of Tabor Communications, Inc. Use of this site is governed by our Terms of Use and Privacy Policy.

Reproduction in whole or in part in any form or medium without express written permission of Tabor Communications, Inc. is prohibited.
