AWS launches its next-gen GPU instances

Amazon Web Services unveiled its latest generation of GPU-powered instances today. Named P4d, these instances arrive ten years after the initial release of AWS's first Cluster GPU instances. The new generation pairs Intel Cascade Lake processors with eight Nvidia A100 Tensor Core GPUs. According to AWS, these instances deliver up to 2.5 times the deep learning performance of the previous generation and can cut the cost of training a comparable model by about 60%.
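The 60% figure lines up with the claimed speedup: if a model trains 2.5 times faster at a roughly comparable hourly rate (an assumption for illustration; actual P4d and prior-generation prices differ), the training bill shrinks by the inverse of the speedup. A minimal sketch:

```python
# Sketch: relate AWS's claimed 2.5x speedup to the ~60% training-cost
# reduction, assuming a comparable hourly price across generations
# (an assumption; real per-hour prices differ between generations).
speedup = 2.5
relative_cost = 1 / speedup          # fraction of the old training cost
savings = 1 - relative_cost          # fraction saved
print(f"{savings:.0%}")              # → 60%
```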
Currently, a single configuration is available: the p4d.24xlarge instance, in AWS terminology. The eight A100 GPUs are interconnected via Nvidia's NVLink communication technology and also support the company's GPUDirect interface.
Boasting 320 GB of high-bandwidth GPU memory and 400 Gbps networking, this represents a substantial amount of computing power. Furthermore, it includes 96 CPU cores, 1.1 TB of system memory, and 8 TB of SSD storage, which explains the on-demand cost of $32.77 per hour. However, the price decreases to under $20/hour with one-year reserved instances and $11.57/hour with three-year reserved instances.
For particularly demanding applications, it's possible to combine 4,000 or more GPUs into an EC2 UltraCluster – AWS's designation for these exceptionally powerful machines – effectively creating a supercomputer-scale resource. While the cost likely prohibits using these clusters for training models for smaller projects, AWS has already collaborated with several enterprise clients to evaluate these instances and clusters, including Toyota Research Institute, GE Healthcare, and Aon.
“At [Toyota Research Institute], our goal is to create a future with mobility for all,” stated Mike Garrison, Technical Lead, Infrastructure Engineering at TRI. “The prior P3 instances enabled us to decrease machine learning model training time from days to hours, and we anticipate that P4d instances, with their increased GPU memory and enhanced floating-point precision, will allow our machine learning team to train even more intricate models at an accelerated pace.”