How to Use AMD GAIA: A Complete Guide to Running LLMs Locally

Last update: 09/10/2025
Author: Isaac
  • GAIA runs LLMs locally on Windows, with hybrid NPU+iGPU support on Ryzen AI.
  • It uses the Lemonade SDK and RAG (LlamaIndex) for contextualized, accurate answers.
  • Two installers: hybrid (Ryzen AI 300) and a generic one with Ollama for any PC.


Generative AI is booming, and with it the need to run large language models at home has skyrocketed. In this context, AMD GAIA offers an easy way to run LLMs locally, without relying on the cloud and while reinforcing the privacy of your data. This open-source project is designed for Windows, works on ordinary computers and, when Ryzen AI hardware is present, takes advantage of the NPU and even the iGPU to accelerate inference.

If you are worried about what you send to external servers, or you are fed up with waiting, this project will be music to your ears: GAIA offers lower latency, greater control, and highly optimized performance on laptops with AMD Ryzen AI 300 Series processors. It also relies on the Lemonade SDK to expose an OpenAI-compatible web service, integrates a RAG pipeline to contextualize responses, and ships with agents ready to work from minute one.

What is AMD GAIA and what exactly does it offer?

GAIA is an open-source project aimed at installing and running generative AI applications directly on your Windows PC. It's designed so anyone can run LLMs—such as the Llama families and derivatives—without setting up complex infrastructure or sending sensitive information to the cloud.

Its great advantage is that it takes full advantage of the Ryzen AI Neural Processing Unit (NPU) and, in hybrid mode, combines that NPU with the integrated GPU (iGPU) to distribute loads and further accelerate inference. On compatible machines, the Ryzen AI 300 NPU delivers up to 50 TOPS, resulting in smooth, power-efficient natural language tasks.

At the same time, the project contemplates a universal path: a generic installer that works on any Windows PC, regardless of whether it's AMD or not. This mode uses Ollama as the backend to run the models, so you can try GAIA even if your computer doesn't have dedicated accelerator hardware.

To enrich its answers, GAIA incorporates Retrieval-Augmented Generation (RAG). This allows it to retrieve relevant information, reason with additional context, plan, and deploy external tools within a truly interactive chat experience. Today the project includes four agents out of the box, with more on the way with community support.

AMD GAIA Step-by-Step Guide

Technical architecture: Lemonade SDK, RAG, and GAIA components

The technical foundation is the Lemonade SDK (TurnkeyML/ONNX), which provides utilities for LLM-specific tasks: prompting, accuracy measurement, and serving across multiple runtimes (e.g., Hugging Face or the ONNX Runtime GenAI API) and hardware targets (CPU, iGPU, and NPU).

In this scheme, Lemonade exposes an LLM web service with an OpenAI-compatible REST API, and GAIA consumes that service to orchestrate the experience. Within GAIA we find three key blocks that fit like a glove with the RAG pipeline:

  • LLM Connector: bridges the NPU service's web API with the LlamaIndex-based RAG pipeline, managing calls and prompt formatting.
  • RAG Pipeline with LlamaIndex: includes the query engine and a vector memory, responsible for processing and storing relevant context from external sources.
  • Agent Web Server: connects to the GAIA interface via WebSocket, allowing real-time interaction with the user.

The workflow is clear and improves accuracy: your query is vectorized, relevant context is retrieved from local indexes, that context is injected into the LLM prompt, and finally the response is streamed to the UI. Each request thus reaches the model enriched, improving the quality of the answers.
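That flow can be sketched in a few lines of Python. This is a deliberately simplified illustration—a bag-of-words counter stands in for real embeddings, and the documents, query, and helper names are invented for the example; GAIA's actual pipeline uses LlamaIndex and a vector store:

```python
import math
import re
from collections import Counter

# Toy "vectorizer": word counts stand in for real embedding vectors.
def vectorize(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Local "index" (in GAIA, a LlamaIndex vector store built from your sources).
documents = [
    "GAIA runs LLMs locally on Windows using the Ryzen AI NPU.",
    "The generic installer uses Ollama as the backend.",
]
index = [(doc, vectorize(doc)) for doc in documents]

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    qv = vectorize(query)
    ranked = sorted(index, key=lambda d: cosine(qv, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query):
    # The retrieved context is injected ahead of the user's question.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("Which backend does the generic installer use?")
print(prompt)
```

The enriched prompt is then what gets sent to the model, which is why answers stay grounded in your local sources.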


Installers and operating modes

GAIA is offered in two variants to suit your hardware and needs: a hybrid installer and a generic installer. The idea is that you can use it on a laptop with the latest-generation Ryzen AI or on a standard Windows PC.

  • Hybrid Mode (Ryzen AI 300 Series): combines NPU + iGPU to maximize performance and efficiency. In inference workloads, each unit executes what it does best (for example, quantized operations and specific kernels), achieving faster responses and lower power consumption.
  • Generic Mode (any Windows PC): the universal path. It uses Ollama as the backend to serve LLMs and makes it easy for anyone to start with GAIA without special hardware requirements.

A practical detail: both modes use the LLM web service exposed by Lemonade and communicate with the application via an OpenAI-compatible REST API. This makes integrating GAIA into your workflows (or migrating from legacy tools) incredibly simple.
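As a sketch of what that looks like from the client side, the snippet below builds a standard OpenAI-style chat-completions request. The base URL, path, and model name are assumptions for illustration (check the Lemonade/GAIA documentation for the actual address of your local service); the send itself is commented out so the snippet runs without a live server.

```python
import json

# Assumed local endpoint; verify the real host/port in the Lemonade docs.
BASE_URL = "http://localhost:8000/api/v0"

def build_chat_request(model, user_message):
    """Return the URL and OpenAI-compatible JSON body for a chat completion."""
    url = f"{BASE_URL}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": True,  # GAIA streams responses to the UI
    }
    return url, payload

url, payload = build_chat_request("llama-3.2-1b", "Hello from GAIA!")
print(url)
print(json.dumps(payload))

# To actually send it (requires the local service to be running):
# import urllib.request
# req = urllib.request.Request(url, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

Because the body is the same shape any OpenAI-compatible client expects, existing tooling usually only needs its base URL pointed at the local service.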

System requirements and compatibility

For hybrid mode, you will need a computer with an AMD Ryzen AI 300 Series processor, along with the appropriate drivers for the Radeon iGPU (e.g., 890M) and the NPU. This mode unlocks maximum performance and the lowest latency.

As for memory, 16 GB of RAM is the recommended minimum, with 32 GB being a more comfortable figure when working with long contexts or more demanding models. At the operating-system level, the focus is on Windows 11 (Home/Pro), although the generic installer is also compatible with Windows 10/11.

If you don't meet those requirements, that's okay: you can install GAIA in generic mode and experiment with local LLMs using your CPU/GPU and Ollama as the backend. The difference will be in performance compared with the hybrid option.

Step-by-step installation

The setup process is straightforward. Download the installer from the official GitHub repository and choose the version that fits your machine (hybrid for Ryzen AI 300, generic for the rest).

Once you have the file, unzip it and run the .exe. If Windows displays a security warning (SmartScreen), go to "More info" and click "Run anyway." Installation usually takes 5 to 10 minutes, depending on your connection.

When finished, you will see two shortcuts on your desktop: GAIA-GUI and GAIA-CLI. The first run may take a little longer because the necessary models and dependencies are downloaded. In some cases, the wizard will ask for a Hugging Face token to download certain LLMs.


If you prefer the console, open GAIA-CLI and run "gaia-cli -h" to see the available options. The CLI provides fine-grained control over parameters (model, quantization, context, etc.) and allows you to automate tests or integrate GAIA with scripts.

Graphical interface (GUI) and command line (CLI)

The GUI is designed for users who want to move fast without complications: open GAIA-GUI and start chatting with agents, uploading documents, indexing repositories, and leveraging RAG with a couple of clicks.

The CLI offers complete flexibility. You can select models, adjust quantization, or define context sizes explicitly. It's ideal for evaluating performance, comparing parameters, and orchestrating GAIA within development pipelines.

Also, because the LLM service is compatible with the OpenAI API, integrating GAIA into existing tools or testing prompts you already used with other services is a matter of adapting an endpoint and little else.

Available Agents and RAG Technology

Today GAIA includes four agents oriented to different uses, and the team—along with the community—is developing more. Each agent leverages the RAG pipeline to retrieve context from local vector indexes and improve the LLM's responses.

  • Simple Prompt Completion: Direct interaction with the model for testing and evaluating prompts; perfect for fine-tuning before deployment.
  • Chaty: the conversational chat agent that manages dialogue history and supports more natural conversation.
  • Clip: integrates YouTube search and Q&A capabilities; it can vectorize external content and use it as context.
  • Joker: a joke generator that humanizes the experience and is used to test output styles.

In combination with RAG, agents can also use external tools, reason, and plan tasks, opening the door to interactive and productive workflows without leaving the local environment.

Performance: NPU vs iGPU and Hybrid Mode

The Ryzen AI NPU is designed for AI inference workloads and shines in efficiency and latency. Starting with the Ryzen AI 1.3 software release, GAIA can deploy quantized LLMs in hybrid mode, using both the NPU and iGPU and assigning to each component the operations it handles best.

What do you gain from this? Faster responses, lower power consumption, and a smooth experience even with heavier models or longer contexts. And if your computer doesn't have an NPU, GAIA is still useful in generic mode, with performance tailored to the available hardware.
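"Quantized" here means the model's weights are stored at reduced precision (for example, 4- or 8-bit integers instead of 32-bit floats), which shrinks memory traffic and suits the NPU's integer units. The following toy int8 round-trip illustrates the idea only; it is not AMD's actual quantization scheme:

```python
def quantize_int8(weights):
    """Map floats to int8 values in [-127, 127] with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.01]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip is close but not exact: that small error is the price
# paid for a roughly 4x smaller representation than float32.
error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
```

Real deployments apply this per layer or per channel and pick the bit width per operation, which is exactly the kind of assignment hybrid mode distributes between the NPU and iGPU.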

Advantages of running LLMs locally

The first big benefit is privacy: no data is sent to external servers, which is critical in sensitive areas or when handling confidential information.

Low latency also stands out. By not relying on the network, responses arrive faster and interaction feels immediate, which is key to productivity and a good user experience.

Finally, performance is more predictable. Optimizing for the NPU (and iGPU) lets you make the most of your computer's hardware, with lower energy consumption and less heat during long sessions.

Uninstallation and maintenance

If you need to uninstall GAIA, the process is very simple. Close all instances (CLI and GUI) to avoid file locks before deleting anything.


Then delete the GAIA folder in AppData and the model folders stored in .cache. Finally, remove the desktop shortcuts and you're done.

This manual method makes up for the fact that there is no automatic uninstaller yet. In a few minutes you will have a clean system, with no traces of local models or indexes.
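The manual cleanup above can be scripted. A cautious sketch in Python follows; the folder names are assumptions (verify where your installation actually placed its data before deleting anything), and the deletion is wrapped in a function so you can test it against a throwaway directory first:

```python
import shutil
from pathlib import Path

def remove_if_exists(path: Path) -> bool:
    """Delete a directory tree if present; return True when something was removed."""
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False

def uninstall_gaia(home: Path) -> list:
    # Assumed locations; confirm them on your own system first.
    candidates = [
        home / "AppData" / "Local" / "GAIA",
        home / ".cache" / "huggingface",  # downloaded model weights
    ]
    return [str(p) for p in candidates if remove_if_exists(p)]

# Example (run deliberately, not by accident):
#   removed = uninstall_gaia(Path.home())
#   print("Removed:", removed or "nothing found")
```

Keeping the home directory as a parameter makes the routine easy to dry-run against a temporary folder before pointing it at the real one.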

Use cases and industries where it fits

GAIA is especially interesting where privacy is key: healthcare, finance, and corporate environments have a lot to gain by running AI locally and reducing exposure to third parties.

It also shines in scenarios without a stable connection: sites with limited or no connectivity can run AI workflows without relying on the cloud.

For content creation, customer service, and internal assistants, agents with RAG provide contextualized answers consistent with your local sources (repositories, documents, videos, etc.).

Comparison with other local solutions

Compared with alternatives such as LM Studio or ChatRTX, GAIA focuses on deep integration with AMD hardware, especially Ryzen AI NPUs, and on a robust RAG pipeline designed to retrieve and use local knowledge.

Furthermore, the project is open and extensible. You can create your own agents and use cases without dealing with black boxes, and the OpenAI-compatible REST API makes integration with existing apps straightforward.

News and momentum for 2025

The team behind GAIA has been adding improvements that expand the range of scenarios, among them improved support for NVIDIA Tensor Cores, which speeds up execution when working with GPUs of that brand in certain flows.

There is also talk of integration with cloud platforms such as GCP and AWS to facilitate large-scale work and synchronization when you need to combine local and cloud environments in a controlled manner.

Another notable line is the improved ONNX support, which increases interoperability between AI frameworks and makes it easier to move models between platforms. There are even tools for experimenting with quantum AI, opening the door to cutting-edge research and testing.

License, community and roadmap

GAIA is distributed under the MIT license, and its GitHub repository invites collaboration: report issues, propose improvements, and create new agents that cover more real needs.

On the horizon, the roadmap mentions more supported models and architectures, new agents for vertical use cases, a possible expansion to other operating systems, and continuous improvements in NPU efficiency.

GAIA brings together everything needed for anyone looking for a serious local setup: privacy, performance, and an architecture that integrates well with your workflow. If you have a Ryzen AI 300 Series laptop, hybrid mode will give you a clear advantage; if not, generic mode lets you start today and grow from there.
