(1) For NVIDIA's classical-quantum computer integration, see CUDA-Q.
(2) (Compute Unified Device Architecture) NVIDIA's GPU platform. Every NVIDIA GPU has some number of CUDA processing cores that operate in parallel to render 3D scenes on screen as well as to perform parallel computations for cryptography, scientific applications and AI. The CUDA application programming interface (API) exposes the CUDA cores to the developer. See GPU.
CUDA was introduced in 2007, when NVIDIA's flagship GPU was the GeForce 8 and AI was hardly on the tip of everyone's tongue. Unlike a CPU, which contains a handful of cores for general data processing, a GPU can contain thousands of CUDA cores, each performing a single multiply-and-accumulate calculation in parallel for graphics and high-performance computing. See GeForce, PhysX and CUDA core.
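As a hedged sketch of that programming model (the kernel name and values below are illustrative, not from this entry), a minimal CUDA C++ program launches thousands of threads, each performing one multiply-and-accumulate on its own array element:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread computes one multiply-and-accumulate: y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;                       // one million elements
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));    // unified memory, visible to CPU and GPU
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch ~4096 blocks of 256 threads; one thread per element.
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();                     // wait for the GPU to finish

    printf("y[0] = %f\n", y[0]);                 // 3*1 + 2 = 5
    cudaFree(x); cudaFree(y);
    return 0;
}
```

Such a file is compiled with NVIDIA's CUDA compiler, e.g. `nvcc saxpy.cu -o saxpy`.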
CUDA Cores and Tensor Cores
After Google created the Tensor Processing Unit (TPU) for AI, NVIDIA added Tensor cores to its GPUs (see Tensor core). Whereas a CUDA core performs the multiply and the add as separate operations, a Tensor core fuses them into a single matrix multiply-accumulate, which speeds up neural network training and inference.
All new NVIDIA GPUs contain both CUDA and Tensor cores, as well as cores dedicated to ray tracing. Because CUDA was created for general-purpose parallel computing, CUDA programming is commonly used to control the Tensor cores. See GPU.
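A hedged sketch of controlling the Tensor cores from CUDA C++ via the warp-level `nvcuda::wmma` API (the 16x16x16 tile shape and half/float precisions are one common configuration, not the only one): each warp performs a whole tile's multiply-accumulate as one fused operation.

```cuda
#include <mma.h>
using namespace nvcuda;

// One warp multiplies a 16x16 half-precision tile of A by a 16x16 tile of B
// and accumulates into a 16x16 float tile of C in a single fused operation
// executed on the Tensor cores.
__global__ void tensor_tile(const __half *a, const __half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __half, wmma::row_major> aFrag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> bFrag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> cFrag;

    wmma::fill_fragment(cFrag, 0.0f);            // C = 0
    wmma::load_matrix_sync(aFrag, a, 16);        // load tiles from memory
    wmma::load_matrix_sync(bFrag, b, 16);
    wmma::mma_sync(cFrag, aFrag, bFrag, cFrag);  // C = A*B + C, fused
    wmma::store_matrix_sync(c, cFrag, 16, wmma::mem_row_major);
}
```

The kernel is launched with at least one full warp (32 threads), since the fragments are cooperatively held across the warp's registers.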
CUDA C/C++ and CUDA Fortran
CUDA operations are typically programmed in C++ and compiled with NVIDIA's CUDA compiler. A CUDA Fortran compiler was developed by the Portland Group (PGI), which was acquired by NVIDIA. See GPU and AI programming.