NVIDIA unveils next-gen GPU platform focused on AI inference efficiency
The new architecture targets lower latency and lower energy per token, with optimized transformer kernels and memory pipelines for LLMs.
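For context on the metric itself, energy per token is simply sustained power divided by decode throughput. A back-of-envelope sketch follows; the numbers are illustrative assumptions, not vendor figures:

```python
# Illustrative assumptions only, not vendor specs: they show how
# "energy per token" falls out of power and throughput.
board_power_w = 700.0        # assumed sustained board power (watts = J/s)
throughput_tok_s = 10_000.0  # assumed decode throughput (tokens/second)

# energy per token (joules) = power (J/s) / throughput (tokens/s)
joules_per_token = board_power_w / throughput_tok_s
print(f"{joules_per_token * 1000:.1f} mJ per token")  # 70.0 mJ per token
```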
Updates make it easier to blend text, image, and structured tools in a single flow, with expanded governance controls.
New controls for safety, cost attribution, and enterprise connectors reduce friction for production GenAI deployments.
Deeper plugin model and orchestration features help teams turn internal systems into natural-language copilots.
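The announcement gives no API details, but the general pattern behind such plugin models can be sketched as follows; every name here is invented for illustration:

```python
# Hypothetical sketch of the plugin pattern: an internal system (here a
# fake ticket lookup) is wrapped as a tool with a schema that a
# natural-language orchestrator can select and invoke. Names are invented.

def lookup_ticket(ticket_id: str) -> dict:
    """Stand-in for a call into an internal ticketing system."""
    return {"id": ticket_id, "status": "open", "assignee": "dana"}

TOOLS = {
    "lookup_ticket": {
        "description": "Fetch the current status of a support ticket.",
        "parameters": {"ticket_id": {"type": "string"}},
        "handler": lookup_ticket,
    },
}

def dispatch(tool_name: str, arguments: dict) -> dict:
    # The orchestrator chooses a tool name and arguments from the user's
    # request; dispatch simply routes to the registered handler.
    return TOOLS[tool_name]["handler"](**arguments)

print(dispatch("lookup_ticket", {"ticket_id": "T-1234"}))
```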
Faster hybrid search and guardrails improve retrieval quality for RAG while reducing infrastructure overhead.
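One common way hybrid search is realized (not necessarily this vendor's method) is reciprocal rank fusion over a keyword ranking and a vector ranking. A minimal sketch, assuming both ranked lists are already computed:

```python
from collections import defaultdict

def rrf_fuse(keyword_ranked, vector_ranked, k=60):
    """Reciprocal rank fusion: merge two ranked lists of doc IDs.

    k=60 is the constant from the original RRF paper; a larger k
    flattens the contribution of top-ranked documents.
    """
    scores = defaultdict(float)
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked results from a BM25 pass and an embedding pass.
print(rrf_fuse(["d1", "d2", "d3"], ["d3", "d1", "d4"]))
# ['d1', 'd3', 'd2', 'd4']
```

RRF needs only ranks, not raw scores, which is why it is a popular fusion choice: BM25 scores and cosine similarities live on incompatible scales and would otherwise need calibration.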
New neural blocks and memory bandwidth targets cut energy cost per inference for private, offline experiences.
Improved evaluations and distribution tooling aim to make open models easier to adopt responsibly.
Build policies, provenance, and software bill of materials (SBOM) integration move into the default developer workflow to reduce supply-chain risk.
Tighter parity between online serving features and offline training features simplifies production ML and lowers operational overhead.
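What online/offline parity means in practice can be shown with a minimal sketch: a single feature function (names illustrative, not from the announcement) serves both the offline training pipeline and the online request path, so the two cannot drift:

```python
# A minimal sketch of online/offline parity: one feature function is
# shared by the offline training pipeline and the online serving path,
# so there is no hand-maintained duplicate to drift apart.

def session_features(raw: dict) -> dict:
    """Single source of truth for the feature logic."""
    return {
        "clicks_per_min": raw["clicks"] / max(raw["minutes"], 1),
        "is_returning": int(raw["visits"] > 1),
    }

# Offline: applied row by row over a historical batch to build training data.
training_rows = [session_features(r) for r in [
    {"clicks": 12, "minutes": 3, "visits": 4},
    {"clicks": 2, "minutes": 5, "visits": 1},
]]

# Online: the exact same function runs on a live request at serving time.
live_features = session_features({"clicks": 7, "minutes": 2, "visits": 2})
print(training_rows, live_features)
```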
Edge caching for embeddings and retrieved chunks reduces tail latency for AI assistants worldwide.
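A minimal sketch of the edge-caching idea, assuming a semantic cache keyed by query embedding; the similarity threshold and the tiny vectors are illustrative, not from the announcement:

```python
import math

# Sketch of a semantic cache for retrieved chunks: reuse a previous
# query's chunks when a new query embedding is close enough, avoiding
# a round trip to the origin retrieval service.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class EdgeChunkCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold  # assumed similarity cutoff
        self.entries = []           # (query_embedding, retrieved_chunks)

    def get(self, query_emb):
        # Linear scan is fine for a small per-edge cache; a real
        # deployment would use an ANN index plus TTL-based eviction.
        for emb, chunks in self.entries:
            if cosine(emb, query_emb) >= self.threshold:
                return chunks  # cache hit: skip the origin round trip
        return None

    def put(self, query_emb, chunks):
        self.entries.append((query_emb, chunks))

cache = EdgeChunkCache()
cache.put([1.0, 0.0], ["chunk-a", "chunk-b"])
print(cache.get([0.99, 0.05]))  # near-duplicate query -> cache hit
```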