This hands-on session is designed for developers and architects building and scaling generative AI services. We will provide a practical look at Google Kubernetes Engine (GKE) as the foundation for high-performance large language model (LLM) inference. The session will feature a live demo of the GKE Inference Gateway, highlighting its model-aware routing and serving-priority features. We will then delve into the open-source llm-d project, showcasing its vLLM-aware scheduling and disaggregated serving capabilities. To cap it off, we'll explore the performance gains of running vLLM on Cloud TPUs for maximum throughput and efficiency. You will leave with actionable insights and code examples to optimize your LLM serving stack.
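As a taste of the model-aware routing and serving-priority features the demo covers, a minimal configuration sketch might look like the following. This assumes the Kubernetes Gateway API Inference Extension resource types (`InferencePool`, `InferenceModel`); the pool, label, and model names here are hypothetical placeholders, not the exact manifests shown in the session:

```yaml
# Hypothetical sketch: expose a set of vLLM serving Pods as an inference
# pool, then register a model on that pool with Critical priority so its
# traffic is favored over lower-criticality requests under load.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama-pool            # hypothetical pool name
spec:
  targetPortNumber: 8000           # port the vLLM server listens on
  selector:
    app: vllm-llama                # matches the vLLM serving Pods
  extensionRef:
    name: vllm-llama-endpoint-picker   # endpoint-picker extension service
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: llama-chat
spec:
  modelName: meta-llama/Llama-3.1-8B-Instruct   # hypothetical model
  criticality: Critical            # prioritized over Standard/Sheddable
  poolRef:
    name: vllm-llama-pool
```

Requests naming `modelName` are routed to the pool's endpoints, and the `criticality` field is what drives the serving-priority behavior highlighted in the demo.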

Nathan Beach
Nathan Beach is Director of Product Management for Google Kubernetes Engine (GKE). He leads the product team working to make GKE a great platform on which to run AI workloads. He received his MBA from Harvard Business School and, prior to Google, led his own startup. He is a builder and creator passionate about making products that superbly meet user needs. He enjoys career coaching and mentoring, and he is eager to help others transition into product management and excel in their careers.
Google Cloud provides leading infrastructure, platform capabilities, and industry solutions. We deliver enterprise-grade cloud solutions that leverage Google’s cutting-edge technology to help companies operate more efficiently and adapt to changing needs, giving customers a foundation for the future. Customers in more than 150 countries use Google Cloud as their trusted partner to solve their most critical business problems.