
AWS launches Trainium, its new custom ML training chip

Frederic Lardinois
Editor
December 1, 2020

At its annual re:Invent developer conference today, Amazon Web Services (AWS) unveiled AWS Trainium, the company's next generation of custom chips designed specifically for training machine learning models. AWS says the chip will offer better performance than competing cloud providers and will support TensorFlow, PyTorch and MXNet.

Trainium will be available through Amazon Elastic Compute Cloud (EC2) instances and inside Amazon SageMaker, AWS's machine learning platform.

The new instances built on these chips are slated to launch next year.
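For a concrete picture of how such instances would be used, here is a minimal sketch of submitting a training job with the SageMaker Python SDK. The Trainium instance type name, IAM role and S3 path below are placeholders of our own, since the actual instance names had not been announced at the time of writing.

```python
# Sketch: submitting a SageMaker training job with the SageMaker Python SDK.
# The instance type is hypothetical -- Trainium-backed instance names had not
# been published when the chip was announced.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                                 # your training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",    # placeholder IAM role
    instance_count=1,
    instance_type="ml.trn1.2xlarge",                        # hypothetical Trainium instance type
    framework_version="1.6.0",
    py_version="py3",
)

# Kick off training against data staged in S3 (placeholder bucket/prefix).
estimator.fit({"training": "s3://my-bucket/training-data/"})
```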

The main arguments for these custom chips are speed and cost. AWS promises 30% higher throughput and 45% lower cost-per-inference compared with standard AWS GPU instances.

In addition, AWS is partnering with Intel to launch EC2 instances based on Habana Gaudi chips for machine learning training. These instances, also expected next year, promise up to 40% better price/performance than the current generation of GPU-based EC2 instances used for machine learning. The chips will support both TensorFlow and PyTorch.

These chips are expected to arrive on AWS in the first half of 2021.

Today's announcements build on AWS Inferentia, which the company introduced at last year's re:Invent. Inferentia is essentially the inference-side counterpart to these training chips, and is likewise built on custom silicon.

Notably, Trainium will use the same software development kit as Inferentia, the AWS Neuron SDK.
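To give a sense of what that shared toolchain looks like, here is a minimal sketch of compiling a PyTorch model for Inferentia with the Neuron SDK's tracing API. This illustrates only the existing Inferentia inference path; the Trainium-specific training workflow had not been published at the time of writing.

```python
# Sketch: compiling a PyTorch model for Inferentia with the AWS Neuron SDK.
# Shows the existing Inferentia path only; Trainium-specific training APIs
# had not been published when this article ran.
import torch
import torch_neuron  # Neuron SDK plugin; registers the torch.neuron namespace
from torchvision import models

model = models.resnet50(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)

# Trace/compile the model for the Inferentia NeuronCore runtime.
model_neuron = torch.neuron.trace(model, example_inputs=[example])

# The compiled artifact can be saved and later loaded with torch.jit.load.
model_neuron.save("resnet50_neuron.pt")
```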

“Inferentia tackled the expense associated with inference, which can account for as much as 90% of machine learning infrastructure costs, but many development groups also face constraints due to limited machine learning training budgets,” explains the AWS team. “This restricts the extent and frequency of training required to refine their models and applications. AWS Trainium resolves this issue by offering the highest performance and lowest cost for machine learning training within the cloud. With both Trainium and Inferentia, customers will have a complete workflow for machine learning computation, from scaling training workloads to deploying accelerated inference.”

Tags: AWS, Trainium, machine learning, ML, AI, chip

Frederic Lardinois

Frederic wrote for TechCrunch from 2012 to 2025. Before that, he founded SiliconFilter and wrote for ReadWriteWeb (now ReadWrite). His reporting covers enterprise technology, cloud computing, developer tools, Google, Microsoft, consumer gadgets, transportation and anything else that catches his attention.