- Requirements and hardware: 40 TOPS NPU, CPU/GPU/NPU split and supported platforms.
- Access and development: Windows ML with ONNX Runtime, automatic EPs (QNN/OpenVINO) and safe fallback.
- Models and performance: INT8 quantization, Olive for optimization, and profiling with WPR/WPA/ORT.
- Real-world use: Copilot+ productivity, energy efficiency, and privacy with local processing.
If you're wondering how to get the most out of your Copilot+ PC's NPU, you've come to the right place. In this guide, I explain in detail what you need to take advantage of it, how to access it from Windows 11 and from your apps, which APIs to use, which model formats are supported, and how to measure performance to make sure your AI acceleration works as it should.
Beyond the technical part, you'll see what real-life experiences Copilot+ enables (productivity, battery life, privacy), how the NPU compares to the CPU and GPU, and what happens if your device doesn't reach the 40 TOPS Microsoft requires for the full Copilot+ experience. This gives you a practical overview of the entire ecosystem without getting lost in unnecessary technicalities.
What is a Copilot+ PC, and why does the NPU change the rules?
Copilot+ PC defines a new class of laptops and desktops with Windows 11 designed around a high-performance neural processing unit (NPU). These NPUs are specialized for deep learning workloads—real-time translation, image generation/editing, AI-powered video effects—and are capable of exceeding 40 TOPS (trillions of operations per second), enabling models to run locally with low latency and very low power consumption.
The key is that the NPU works in tandem with the CPU and GPU; Windows 11 allocates each task to the most appropriate resource to balance speed and efficiency: CPU for general logic, GPU for parallelizable graphics/ML, and NPU for sustained AI inference with the best performance per watt. This allocation is what enables smooth AI features without draining your battery mid-morning.
In the Copilot+ ecosystem, Microsoft delivers native Windows AI experiences and APIs bundled in Windows AI Foundry, with support for optimized models running on the NPU. These capabilities are progressively integrated into modern versions of Windows 11 and the Windows App SDK, reducing friction for developers and end users.
The difference compared to a traditional PC? While a powerful GPU can accelerate AI, the NPU is tuned to sustain long inference loads quietly and efficiently, resulting in improved battery life and consistent responsiveness in background tasks (live captions, smart background blur, noise suppression, etc.).

Device Requirements and Compatibility
The full Copilot+ experience requires a compatible device. Microsoft has set a benchmark of 40 TOPS for the NPU's performance to ensure fluidity and efficiency. This bar is already met by computers with the latest generation of Arm SoCs, and Intel and AMD platforms are gradually adding support for Windows 11.
If you work in a professional environment, Copilot+ variants are available for businesses (e.g., IT-focused Surface computers) with enterprise-grade security and the same local AI computing benefits. Regardless of the manufacturer, it's essential that your device includes a high-performance NPU and up-to-date firmware/drivers so Windows can enable the correct acceleration paths.
What if your PC doesn't reach 40 TOPS? You'll be able to use AI features, but not everything under the Copilot+ seal. Windows can use a GPU or CPU as an alternative, albeit with higher power consumption and higher latency. This means you'll still get AI, but not the optimized end-to-end experience that distinguishes Copilot+.
Snapdragon X Elite and other platforms: how the workload is divided
The Snapdragon X Elite SoC, Arm-based and manufactured by Qualcomm, embodies this philosophy: it integrates a class-leading NPU capable of processing huge batches of data in parallel with far better power efficiency than the CPU/GPU for AI. In practice, this means more battery life and less heat with everyday AI workloads.
Windows 11 manages the orchestration between CPU, GPU, and NPU. When you open an app with AI features, the system decides whether to send operations to the NPU (preferred), the GPU, or the CPU, switching based on the availability and stability of the execution providers. This happens transparently; you just notice that everything runs smoothly and your battery lasts longer.
On other fronts, Intel and AMD are also advancing with integrated NPUs and compatible execution providers (OpenVINO on Intel, specific EPs on AMD) that Windows can activate via Windows ML. The shared goal is to offer native acceleration while maintaining compatibility with previous hardware and drivers whenever possible.
Exclusive AI features in Copilot+ and available APIs
Copilot+ PCs include AI experiences built into Windows 11 and accessible via Windows AI Foundry/Windows Runtime APIs for NPU-optimized models. This encompasses everything from video call effects (framing, blurring, noise cancellation) to summarization, translation, and local generation capabilities, all designed to run independently of the cloud.
For developers, the preferred inference path is Windows Machine Learning (Windows ML). Microsoft is migrating recommended access from DirectML to Windows ML to simplify deployment, automatically manage execution providers (EPs), and keep ONNX Runtime as the underlying inference engine without you having to deal with binaries and dependencies.
Access to the NPU on Copilot+ PC for users and developers
As a user, you can check that your NPU is being used in real time from Task Manager: open Performance and you'll see the NPU graph alongside CPU, GPU, memory, disk, and network. Activate, for example, the webcam's studio effects and you'll see modest, sustained NPU activity, which is perfect for video calls without draining your battery.
As a developer, the NPU is a hardware resource that you must address through the appropriate APIs. NPUs are designed for the operations of modern neural networks (convolutions, activations, attention, etc.), and on Windows their access is now channeled through Windows ML with ONNX Runtime underneath, ensuring the best acceleration path available on each machine.
Programmatic Access with Windows ML: EPs, ORT, and Fallback
Windows ML introduces integrated discovery and delivery of execution providers (EPs). You no longer need to manually package Qualcomm's QNNExecutionProvider, Intel's OpenVINO EP, or others: Windows includes them or serves them via Windows Update, reducing app size and headaches with dependencies.
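Before relying on a specific EP, it helps to check which providers your ONNX Runtime build actually exposes. A minimal sketch with the ONNX Runtime Python API (the exact list depends on the installed packages and hardware):

```python
import onnxruntime as ort

# Lists the execution providers registered with this ONNX Runtime build.
# On a Qualcomm-based Copilot+ PC with the QNN package you would expect
# "QNNExecutionProvider"; on Intel, "OpenVINOExecutionProvider" may show up.
print(ort.get_available_providers())
# e.g. ['QNNExecutionProvider', 'CPUExecutionProvider']
```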
Underneath, ONNX Runtime (ORT) remains the open-source inference engine that runs your ONNX models. Windows ML abstracts the complexity: it queries the available hardware, selects the ideal EP (QNN if there's a Qualcomm NPU; OpenVINO if applicable; GPU/CPU as a backup), downloads/loads the provider, and launches inference. If the preferred EP fails or is missing, it automatically falls back to another path without breaking your app.
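To illustrate this preference-plus-fallback pattern at the ONNX Runtime level, here is a minimal Python sketch; the model file and input name are placeholders you would adapt to your own model:

```python
import numpy as np
import onnxruntime as ort

# Prefer the NPU path (QNN on Qualcomm), then OpenVINO, then CPU.
# Filtering against the available providers avoids errors on machines
# where a preferred EP is not installed.
preferred = ["QNNExecutionProvider", "OpenVINOExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in preferred if p in ort.get_available_providers()]

session = ort.InferenceSession("model.onnx", providers=providers)
print("Session will try, in order:", session.get_providers())

# Hypothetical input: adjust the name and shape to your model.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": x})
```

Note that ORT also falls back per operator: nodes the selected EP cannot run are assigned to the CPU provider, which matches the "never break the app" behavior described above.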
This is supported by Microsoft's direct work with manufacturers (Qualcomm, Intel, AMD, etc.) to ensure backward driver compatibility and support for new silicon (e.g., Snapdragon X Elite, Intel Core Ultra), allowing you to focus on the experience rather than low-level integration.
Compatible models, quantization, and integration with Olive
Many models are trained at high precision such as FP32, but most NPUs perform best with smaller integer formats, typically INT8. That's why a model is often converted or quantized to run on the NPU, increasing performance and efficiency without losing too much quality.
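As a simple illustration of the quantization step, ONNX Runtime ships its own quantization toolkit. The sketch below uses dynamic quantization for brevity (file names are placeholders); note that NPU EPs generally prefer static QDQ quantization with calibration data:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Converts FP32 weights to INT8. Dynamic quantization quantizes weights
# offline and activations on the fly; for NPU targets, static (QDQ)
# quantization with a calibration dataset is usually the recommended route.
quantize_dynamic(
    model_input="model_fp32.onnx",    # placeholder path
    model_output="model_int8.onnx",   # placeholder path
    weight_type=QuantType.QInt8,
)
```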
If you don't start from an already optimized model, you can bring your own model (BYOM) and run it through the Olive toolchain, which compresses, optimizes, and compiles for ONNX Runtime with NPU acceleration. Olive simplifies steps that previously required scripting and per-EP tuning, and accelerates time-to-production with automatic performance tuning.
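Olive workflows are typically described in a config file (input model, target EP, passes such as quantization and graph tuning). A minimal sketch, assuming Olive's documented Python entry point; the config file name is a placeholder:

```python
from olive.workflows import run as olive_run

# The config describes the input model, the target (e.g., the QNN EP),
# and the optimization passes to apply; Olive produces an optimized
# ONNX model ready for Windows ML / ONNX Runtime.
olive_run("olive_config.json")  # placeholder config
```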
How to measure the performance of NPU and AI models
To validate that your integration is flying, you need metrics and traces. Windows offers a powerful set of tools that record NPU activity, measure inference times, and break down bottlenecks by operator, session, or execution provider.
Among the key capabilities: you can record a system trace while using your app, view NPU usage and call stacks, correlate CPU/GPU/NPU workloads, analyze load and initialization times (model loading and ORT session creation), review EP configuration parameters, and profile individual operators to understand their contribution to overall time.
Furthermore, ONNX Runtime events in Windows Performance Analyzer (as of ORT 1.17, with improvements in 1.18.1) let you view model load times, EP settings, inference times, specific subcomponents (such as QNN), and operator profiles. It's a precise snapshot of what your model is doing at each layer.
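You can also emit ORT's own operator-level profile from inside your app, complementing the ETW events WPA consumes. A minimal sketch with the standard ONNX Runtime API ("model.onnx" is a placeholder):

```python
import onnxruntime as ort

opts = ort.SessionOptions()
opts.enable_profiling = True  # records session, run, and operator timings

session = ort.InferenceSession("model.onnx", sess_options=opts)
# ... run your inference scenario here ...

profile_path = session.end_profiling()  # writes onnxruntime_profile_*.json
print("Operator-level profile written to:", profile_path)
```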
Recommended tools for diagnosis and profiling
- Task Manager: A quick, real-time overview of your system (CPU, memory, disk, network, GPU, and now NPU), with usage percentages, shared memory, driver version, and more. Ideal for verifying that your function actually turns on the NPU.
- Windows Performance Recorder (WPR): Now includes a 'Neural Processing' profile that records the Microsoft Compute Driver Model (MCDM) interaction with the NPU, letting you identify which processes are using the NPU and which calls are submitting work. Useful for isolating regressions or validating EPs.
- Windows Performance Analyzer (WPA): Transforms ETW traces into graphs and timeline tables to analyze CPU, disk, network, ORT events, and an NPU-specific chart, all on the same time scale. It's the central tool for correlating phases (pre-fetch/post-fetch) and viewing the big picture of performance.
- GPUView: Reads kernel and video events from .etl files and presents them visually. Supports GPU and NPU operations and the visualization of DirectX events for MCDM devices such as the NPU. Very useful if your pipeline mixes graphics and ML.
- Qualcomm Snapdragon Profiler (qprof): A system-wide profiling solution that details the NPU's sub-HW (bandwidth, counters), as well as the CPU/GPU/DSP. If you're working on the Snapdragon X Elite, it offers essential signals for fine-tuning.
TOPS Performance: What it means and how to find out your device's number
TOPS (trillions of operations per second) quantifies how many operations a processor can perform per unit of time in a specific numerical format. Microsoft uses 40 TOPS as the benchmark for Copilot+ certification, which gives you an idea of the type of models and effects you'll be able to run comfortably on-device.
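To put the number in perspective, here is a back-of-the-envelope calculation (illustrative figures, not a benchmark):

```python
# Hypothetical model costing ~5 GOPs (5e9 INT8 operations) per inference
# on a 40 TOPS NPU, the Copilot+ baseline.
npu_ops_per_second = 40e12
ops_per_inference = 5e9

ideal_latency_ms = ops_per_inference / npu_ops_per_second * 1000
print(f"Theoretical best case: {ideal_latency_ms:.3f} ms per inference")
# ~0.125 ms in theory; real latency is higher due to memory bandwidth,
# operator coverage, and scheduling overheads.
```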
To know the capacity of your NPU, first identify your computer's processor in 'Settings > System > About'. With that information, search the manufacturer's website for the official TOPS number. If you want more technical comparisons, tools like Procyon AI Benchmark let you measure and compare it with other NPUs, although they are aimed more at professionals.
Use Task Manager to see the NPU in action
Beyond the theoretical TOPS, you can watch the actual usage of the NPU in Task Manager: open 'Performance' and go to the 'NPU' section. You'll see a graph of activity and associated metrics. Enable features like the webcam's studio effects to confirm your stream is being handled by the NPU and not the CPU/GPU.
For an in-depth diagnosis, combine WPR/WPA with the appropriate profiles. A typical workflow would be: download the 'ort.wprp' and 'etw_provider.wprp' profiles, start capturing along with the 'NeuralProcessing' and CPU profiles, replay the scenario, stop capturing, and open the .etl file in WPA.
As an example, from the console you can run: wpr -start ort.wprp -start etw_provider.wprp -start NeuralProcessing -start CPU, play your scenario and finish with wpr -stop onnx_NPU.etl -compress. Then open the file in WPA and check 'Neural Processing: NPU Utilization' and 'Generic Events for ONNX' to cross-reference activity, timing, and threads.
Copilot+ in everyday life: productivity, battery life, and privacy
In everyday life, Copilot+ acts as a personal assistant integrated into Windows: compose emails and documents, summarize long texts, adjust settings, find files, and automate tasks using natural language. The dedicated Copilot key on some devices (such as ASUS laptops) offers instant access without having to open menus.
The NPU also elevates the multimedia and collaboration experience: real-time background blurring and refocusing, automatic subtitles, intelligent noise suppression, and video enhancements that previously required heavy software. Everything happens locally, smoothly, and quietly thanks to the NPU offloading the CPU/GPU.
Another clear advantage is battery life: intelligent power management and the NPU maximize autonomy. Models like the ASUS Zenbook A14 have posted extended usage figures that let you work all day with AI running in the background without being tied to a charger. The system knows when to push and when to hold back.
On privacy and security, running AI on the device reduces cloud dependence: features like voice recognition and personal document scanning can stay on your PC. Devices like the ASUS Vivobook add physical webcam shutters and fingerprint login to complete the protection.
Compatibility, rumors, and the Intel/AMD/Qualcomm case
Part of the current conversation revolves around the 40 TOPS requirement for Copilot+. There have been rumors that some desktop CPUs (e.g., upcoming Intel Core Ultra 'Arrow Lake-S Refresh') would keep NPUs around 13 TOPS, which would make it impossible to meet the Copilot+ label without external help.
That doesn't mean you 'can't use AI': a desktop with a dedicated GPU can supply the power and run heavy AI loads, well exceeding 300 TOPS in INT8 with cards like an RTX 4070. The difference is that Copilot+ prioritizes sustained NPU efficiency for always-on, low-power experiences.
For its part, Qualcomm now offers NPUs aligned with Copilot+ on Arm platforms like the Snapdragon X Elite, and Microsoft is working with Intel and AMD to make Windows ML and EPs (OpenVINO, etc.) work stably on both current and future hardware. The ecosystem is rapidly becoming more standardized.
Useful resources and where to learn more
If you want a clear introduction to key concepts such as CPU, GPU, NPU, and TOPS, as well as Copilot+ productivity, performance, and security, we recommend reviewing training materials and data sheets from manufacturers. For example, you can check out this visual from AMD: download PDF.
For developers: delve deeper into Windows ML and ONNX Runtime to run models locally taking advantage of the NPU, and if you bring your own model, rely on the Olive toolchain for quantization and optimization. And don't forget to instrument: WPR/WPA, GPUView, and ORT events (starting with version 1.17) are your allies to go from "just working" to "flying."
If you're coming from the Copilot preview on an older PC (e.g., a Skylake-era machine): yes, the NPU can be used as an accelerator just like the GPU, only better at sustained loads. Copilot+ adds the layer of efficiency and continuity provided by the NPU and expands local experiences; on computers without a 40 TOPS NPU, Windows will fall back to the CPU/GPU, although battery life and latency won't be the same.
If you're going to invest in new equipment, look beyond the CPU/GPU and pay attention to the NPU and its TOPS. If you already have a Copilot+ PC, enable the AI features, monitor the NPU in Task Manager, and if you're developing, migrate to Windows ML with ORT so the system chooses the best path (QNN/OpenVINO) and you get reliable profiles in WPR/WPA.