Reading and Processing Excel with Pandas and OpenPyxl: A Complete Guide


Pandas is ideal for processing and transforming large-scale data; OpenPyXL excels at formatting, styling, and workbook control.
Combining both libraries allows you to automate reports: calculations with pandas and layout with openpyxl.
Optimize performance by reading only necessary columns and using read_only/write_only modes when appropriate.

Working with Excel in Python

If you work in data analysis or need to automate repetitive spreadsheet tasks, combining Python with Excel is an effective way to speed up your workflow. Excel is still the most popular tool in many companies, while Python gives you power, flexibility, and an ecosystem of libraries designed for data. In this guide, you'll see in detail how to read and process Excel sheets with pandas and openpyxl, when to use each one, and how to get the most out of them in real-life situations.

Beyond opening a file and looking at a couple of cells, here you will learn how to load specific sheets and ranges, filter, transform, and save results, format cells with advanced styles, create new workbooks and sheets, generate automated reports, and even create charts or small dashboards. We'll go from the basics to practical examples, with code ready to adapt, and with performance recommendations and best practices to avoid bottlenecks and typical errors.

Preparing the environment and necessary libraries

Before you get started, make sure you have a recent version of Python installed. Python 3.7 or higher is recommended for compatibility with the libraries used here, although recent pandas releases require newer Python versions, so check the requirements of the versions you install. To check your version, run the following command in the terminal.

python --version

To manipulate Excel in Python, the key libraries are pandas and openpyxl; each one covers different needs. With pip you can install them in a moment and start experimenting.

pip install pandas openpyxl

If you prefer to manage dependencies with a tool like Poetry, you can install both packages with simple commands, for example poetry add pandas and poetry add openpyxl, which helps you maintain a reproducible environment per project without headaches.


Reading workbooks, sheets, and cells with openpyxl

The openpyxl library works directly on .xlsx files, allowing you to open workbooks, manipulate sheets, and read and write individual cells. It's ideal when you need fine control over formatting, styles, and formulas, and want to work with the Excel structure itself.

from openpyxl import load_workbook

# Load an Excel file
workbook = load_workbook("example.xlsx")

# List the available sheet names
print(workbook.sheetnames)

Once the workbook is open, you can select a sheet by name and read specific values. This approach is useful when you want to inspect individual cells or traverse ranges without converting everything to a tabular structure.

# Select a sheet by name (here the first name in the workbook; replace it with your own)
sheet = workbook[workbook.sheetnames[0]]

# Read the value of a cell
valor = sheet["A1"].value
print(f"Valor de A1: {valor}")

To iterate through rows or ranges, iter_rows is your ally. You can delimit rows and columns and process each cell. If you only read, enabling read-only mode reduces memory and improves speed on large files.

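# Iterate over the first 10 rows
for fila in sheet.iter_rows(min_row=1, max_row=10):
    # Convert each value to text; empty cells become an empty string
    valores = ["" if celda.value is None else str(celda.value) for celda in fila]
    print(" ".join(valores))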
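The read-only mode mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the file name, sheet contents, and variable names are made up for illustration, and the workbook is created on the fly so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook

# Build a small throwaway workbook so the sketch is self-contained.
ruta = Path(tempfile.mkdtemp()) / "demo_read_only.xlsx"
wb_demo = Workbook()
ws_demo = wb_demo.active
ws_demo.append(["Producto", "Unidades"])
ws_demo.append(["Teclado", 12])
ws_demo.append(["Raton", 30])
wb_demo.save(ruta)

# read_only=True streams rows instead of loading the whole sheet into memory.
wb_ro = load_workbook(ruta, read_only=True)
ws_ro = wb_ro.active

# values_only=True yields plain tuples of values rather than cell objects.
filas = list(ws_ro.iter_rows(min_row=2, values_only=True))

# Read-only workbooks keep the file handle open until closed explicitly.
wb_ro.close()

print(filas)
```

Read-only worksheets cannot be modified, so this mode only fits pure extraction steps.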


Read and write data efficiently with pandas

If your goal is data analysis and manipulation, pandas is your best bet. It turns Excel spreadsheets into DataFrames (very powerful tables) that you can filter, aggregate, transform, and export faster than with manual loops.

import pandas as pd

# Read an Excel file into a DataFrame
df = pd.read_excel("datos.xlsx")

# Show the first rows
print(df.head())

The read_excel function lets you load specific sheets, specific columns, or skip initial rows (very useful for files with complex headers or notes). This gives you control and performance because you avoid loading data that you don't need.

# Specific sheet
df = pd.read_excel("datos.xlsx", sheet_name="Hoja2")

# Import only certain columns (by Excel letters or by column name)
# "A:C" is an illustrative range; the original value is not known
df = pd.read_excel("datos.xlsx", usecols="A:C")  # By letters

# Skip rows at the top of the sheet
df = pd.read_excel("datos.xlsx", skiprows=4)

When you're done transforming your DataFrame, you can write it back to Excel with a single method call. Passing index=False avoids writing the index as an additional column, which is usually what you want when preparing business reports.

# Save the DataFrame to Excel
df.to_excel("datos_procesados.xlsx", index=False)


When to use pandas and when to use openpyxl

Although they complement each other, they do not address the same problem: pandas shines at bulk processing (filtering, aggregations, joins, cleaning), while openpyxl handles the formatting side (styles, borders, widths, formulas, charts, sheet creation/deletion, etc.). Choosing wisely saves you time.


If you need to modify thousands of cells with a simple rule (for example, adding 10% to a column), pandas does it in one line; with openpyxl you need to traverse cells with loops and manage references. However, if you want to apply corporate formatting, styles, or add charts to the final Excel spreadsheet, openpyxl is the way to go.

A very useful strategy is to combine both: process with pandas and, once the final table is generated, use openpyxl to polish the professional finish (bold, centered headers, colors, number formats, etc.). This way, you have performance and a result ready to present.

Common operations with pandas: selection, filtering, and modifications

With pandas, column selection and conditional filtering are a piece of cake. This allows you to transform large datasets with vectorized operations without writing loops, achieving more readable and faster code.

import pandas as pd

archivo_excel = "ejemplo_excel.xlsx"
df = pd.read_excel(archivo_excel)

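# Column selection ("Columna1" and "Columna2" are placeholder names; use your own)
df_col = df[["Columna1", "Columna2"]]

# Filter by condition
filtrado = df[df["Columna1"] > 10]

# New columns and transformations
df["Columna1_doble"] = df["Columna1"] * 2
df["Columna2"] = df["Columna2"].apply(lambda x: x + 5)

# Save the result
df.to_excel("resultado_excel.xlsx", index=False)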

To inspect only part of the data, you can use .head(), or .iloc indexing when you need rows and columns by position. When exporting, pandas also supports multiple formats (CSV, Parquet, etc.), which opens up options beyond Excel.
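As a quick illustration of positional inspection and a non-Excel export, here is a small sketch. The DataFrame contents and the output path are invented for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data standing in for a sheet loaded with read_excel.
df_demo = pd.DataFrame(
    {
        "Producto": ["A", "B", "C"],
        "Unidades": [5, 12, 20],
        "Precio": [1.5, 2.0, 3.25],
    }
)

# .head(n) returns the first n rows.
primeras = df_demo.head(2)

# .iloc selects by position: first two rows, columns from the second one onward.
recorte = df_demo.iloc[:2, 1:]

# Export to CSV instead of Excel; index=False omits the index column.
ruta_csv = Path(tempfile.mkdtemp()) / "resultado.csv"
df_demo.to_csv(ruta_csv, index=False)

print(recorte)
```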

Reading and editing with openpyxl: from cells to ranges

If the focus is the Excel document itself, openpyxl allows you to create workbooks, add sheets, rename them, and delete unnecessary ones. This detailed control is essential when you need to adapt to the layout of a template or maintain existing formulas.

from openpyxl import Workbook, load_workbook

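# Create a new workbook
ewb = Workbook()
ws = ewb.active
ws.title = "Hoja Principal"
ewb.save("nuevo.xlsx")

# Load and manipulate an existing workbook
wb = load_workbook("datos_openpyxl.xlsx")
wb.create_sheet("Nueva Hoja")
# Delete a sheet by name ("Hoja Antigua" is a placeholder; the original name is not known)
if "Hoja Antigua" in wb.sheetnames:
    del wb["Hoja Antigua"]
wb.save("datos_openpyxl.xlsx")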

Accessing individual cells or ranges is straightforward. You can read values, change them, and write them back. For bulk changes, working with ranges and keeping a clear Excel structure will make your work easier.

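ws = wb.active  # or wb["SheetName"] to pick a specific sheet
celda = ws["A1"]
print(celda.value)

# Modify values
ws["A1"] = "Nuevo Nombre"

# Iterate over a range ("A1:C3" is an illustrative range)
for fila in ws["A1:C3"]:
    for c in fila:
        print(c.value)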

Apply styles, formats, and numbers with OpenPyxl

One of the advantages of openpyxl is that you can dress up your report with fonts, borders, fills, and alignments, as well as number formats (such as two decimal places). This is key for reports that non-technical people will see.

from openpyxl.styles import Font, Border, Side, PatternFill, Alignment

ws = wb.active  # or wb["SheetName"] to pick a specific sheet

# Styles
fuente = Font(name="Arial", size=12, bold=True, color="FF000000")
borde = Border(left=Side(style="thin"), right=Side(style="thin"),
               top=Side(style="thin"), bottom=Side(style="thin"))
relleno = PatternFill(start_color="FFFF0000", end_color="FFFF0000", fill_type="solid")

# Apply them to a cell ("A1" is an illustrative target)
c = ws["A1"]
c.font = fuente
c.border = borde
c.fill = relleno
c.alignment = Alignment(horizontal="center", vertical="center")
c.number_format = "0.00"  # Two decimal places

wb.save("datos_openpyxl.xlsx")

You can also write Excel formulas directly into cells; they are recalculated when the file is opened in Excel. Keep in mind the most common errors in Excel formulas, which often appear when mixing code-generated data and spreadsheet logic.

# "C1" is an illustrative target cell for the formula
ws["C1"] = "=SUM(A1:B1)"
wb.save("datos_openpyxl.xlsx")

Simple charts and visualizations in Excel with OpenPyxl

To complete a report, you sometimes need to include a chart within the workbook itself. With openpyxl you can create bar, line, or other charts from ranges of data and place them in a specific location.

from openpyxl.chart import BarChart, Reference

chart = BarChart()
datos = Reference(ws, min_col=1, min_row=1, max_col=2, max_row=5)
chart.add_data(datos, titles_from_data=False)
ws.add_chart(chart, "E1")
wb.save("datos_openpyxl.xlsx")
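A slightly fuller, self-contained variant is sketched below. It assumes a small made-up table with a header row, takes the series name from that header with titles_from_data=True, and uses the first column as category labels. The data, title, and file name are all hypothetical.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb_g = Workbook()
ws_g = wb_g.active
for fila in [("Mes", "Ventas"), ("Ene", 120), ("Feb", 95), ("Mar", 140)]:
    ws_g.append(fila)

grafico = BarChart()
grafico.title = "Ventas por mes"

# Values: column B including the header row, so the series is named after B1.
valores = Reference(ws_g, min_col=2, min_row=1, max_row=4)
# Categories: the month labels in column A, without the header.
categorias = Reference(ws_g, min_col=1, min_row=2, max_row=4)

grafico.add_data(valores, titles_from_data=True)
grafico.set_categories(categorias)

# Anchor the chart's top-left corner at cell D2.
ws_g.add_chart(grafico, "D2")

ruta_g = Path(tempfile.mkdtemp()) / "grafico.xlsx"
wb_g.save(ruta_g)
```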

Report Automation: Combining Pandas and OpenPyxl

A very useful recipe is to process data with pandas (totals, averages, groupings) and dump the results into a new workbook, which you then decorate with openpyxl to deliver a well-formatted report. This pattern scales well for periodic reports.

import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment

# Read the sales data
ventas = pd.read_excel("ventas.xlsx")

# Aggregations ("Ventas" is an assumed name for the amount column; adjust it to your file)
por_producto = ventas.groupby("Producto")["Ventas"].sum()
promedio = ventas["Ventas"].mean()

# Create the report workbook
wb = Workbook()
ws = wb.active
ws.title = "Reporte de Ventas"

# Headers
ws["A1"] = "Producto"
ws["B1"] = "Total Ventas"
ws["C1"] = "Promedio Ventas Mensual"

enc = Font(bold=True)
for celda in ("A1", "B1", "C1"):
    ws[celda].font = enc
    ws[celda].alignment = Alignment(horizontal="center")

# Data: one row per product, starting below the headers
fila = 2
for producto, total in por_producto.items():
    ws[f"A{fila}"] = producto
    ws[f"B{fila}"] = float(total)
    ws[f"C{fila}"] = float(promedio)
    fila += 1

wb.save("reporte_ventas.xlsx")

For visual coherence, add thin borders and number formats to the amount columns. This way, your report is ready to be shared without any manual tweaking in Excel.
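The border and number-format touch-up could look like the sketch below. The two-column layout, sample values, and the "#,##0.00" format are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook
from openpyxl.styles import Border, Side

# Hypothetical report content standing in for the generated sales report.
wb_f = Workbook()
ws_f = wb_f.active
ws_f.append(["Producto", "Total Ventas"])
ws_f.append(["Teclado", 1234.5])
ws_f.append(["Raton", 987.126])

fino = Side(style="thin")
borde_fino = Border(left=fino, right=fino, top=fino, bottom=fino)

# Thin border around every cell of the table (headers included).
for fila in ws_f.iter_rows(min_row=1, max_row=ws_f.max_row, max_col=2):
    for celda in fila:
        celda.border = borde_fino

# Thousands separator and two decimals on the amount column only (column B, data rows).
for (celda,) in ws_f.iter_rows(min_row=2, min_col=2, max_col=2):
    celda.number_format = "#,##0.00"

ruta_f = Path(tempfile.mkdtemp()) / "reporte_formateado.xlsx"
wb_f.save(ruta_f)
```

The number format only changes how Excel displays the value; the stored number keeps its full precision.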


Typical processes: filtering, transforming, and saving with Pandas

A common case is to filter by a condition and save the result to a new file. This takes a few lines with pandas and is ideal for cleaning pipelines or data preparation for business teams.

import pandas as pd

df = pd.read_excel("example.xlsx")

# Filter sales > 1000 ("Ventas" is an assumed column name; adjust it to your file)
filtrado = df[df["Ventas"] > 1000]

# Save without the index
filtrado.to_excel("filtered.xlsx", index=False)
print("Archivo guardado")

If you need to keep an original column and create a modified one (for example, add 10% to "Total Area"), the vectorized operation avoids loops and leaves you with a DataFrame that is clean and easy to inspect.

df = pd.read_excel("cultivos.xlsx")

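# Copy the original column, then add 10% to the copy
# ("Area Total" is the assumed name of the source column)
df["Area Total Modificada"] = df["Area Total"]
df["Area Total Modificada"] = df["Area Total Modificada"] * 1.1

# Show the first 10 rows, starting from the third column
print(df.iloc[:10, 2:])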

If you prefer to temporarily hide results in the final file instead of deleting them, consider rules that hide rows based on cell value, which makes team review easier.

Deleting rows and cleaning data

Another common task is deleting records by position or condition. With Pandas, deleting the first 10 rows is straightforward using the index; for more complex filters, use Boolean expressions without for-loops.

df = pd.read_excel("cultivos.xlsx")

# Drop the first 10 rows by index
df.drop(df.index[:10], inplace=True)
print(df.head(10))

If your cleanup depends on rules like "delete rows with even Total Area", you can build a condition and apply it, which keeps the code more expressive and maintainable than loops and manual counters. And if the sheet is password-protected, remember to unprotect it before modifying it.
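The "even Total Area" rule can be expressed as a Boolean mask. In this sketch the column name "Area Total" and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical crop data standing in for cultivos.xlsx.
df_c = pd.DataFrame(
    {
        "Cultivo": ["Trigo", "Maiz", "Arroz", "Cebada"],
        "Area Total": [10, 15, 22, 7],
    }
)

# True for rows whose area is an even number.
es_par = df_c["Area Total"] % 2 == 0

# Keep only the rows where the condition is False, then renumber the index.
df_limpio = df_c[~es_par].reset_index(drop=True)

print(df_limpio)
```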

Editing and deleting with openpyxl: cells and rows

When working with OpenPyxl, modifying an entire column involves looping through cells. The advantage is that you can insert new columns, respect the original layout and save to another file without breaking the format.

from openpyxl import load_workbook

wb = load_workbook("cultivos.xlsx")
ws = wb.active  # or wb["SheetName"] to pick a specific sheet

# Insert a new column G and give it a header
ws.insert_cols(7)
ws["G1"] = "Area Total Modificada"

# Keep the original value in column F and write F + 10% into column G
for celda_f, celda_g in ws.iter_rows(min_row=2, min_col=6, max_col=7):
    # Skip empty or non-numeric cells instead of failing on them
    if isinstance(celda_f.value, (int, float)):
        celda_g.value = celda_f.value * 1.1

wb.save("cultivos_modify.xlsx")

To delete rows by position, openpyxl also does it with a single call, although for advanced conditions you will have to iterate and decide what to delete based on the contents of each row.

wb = load_workbook("cultivos.xlsx")
ws = wb.active  # or wb["SheetName"] to pick a specific sheet

# Delete the first 10 rows (adjust idx if there are headers)
ws.delete_rows(idx=1, amount=10)
wb.save("cultivos_modify.xlsx")
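For the conditional case, one possible approach is to walk the sheet from the bottom up, so that deleting a row does not shift the rows that still have to be checked. The data and the "even area" condition below are assumptions for illustration.

```python
from openpyxl import Workbook

# Hypothetical sheet: a header row plus four data rows.
wb_d = Workbook()
ws_d = wb_d.active
ws_d.append(["Cultivo", "Area Total"])
for registro in [("Trigo", 10), ("Maiz", 15), ("Arroz", 22), ("Cebada", 7)]:
    ws_d.append(registro)

# Go from the last row up to row 2 (row 1 holds the headers).
for n_fila in range(ws_d.max_row, 1, -1):
    area = ws_d.cell(row=n_fila, column=2).value
    # Delete rows whose area is an even number; ignore non-numeric cells.
    if isinstance(area, (int, float)) and area % 2 == 0:
        ws_d.delete_rows(n_fila)

restantes = list(ws_d.iter_rows(min_row=2, values_only=True))
print(restantes)
```

For large sheets this row-by-row deletion can be slow; filtering with pandas and rewriting the sheet is often the simpler route.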

Consolidate multiple Excel files into one

A classic: you have dozens of .xlsx files with the same schema and you want to join them into a single table to analyze them. With pandas and glob, you can do it in a couple of lines, without manually opening each file.

import pandas as pd
import glob

excel_files = glob.glob("*.xlsx")

# Read every file and concatenate them into a single DataFrame
todos = pd.concat([pd.read_excel(f) for f in excel_files], ignore_index=True)

todos.to_excel("consolidated_data.xlsx", index=False)

This approach is perfect for monthly integrations, reporting across multiple branches, or any process where you receive several workbooks with a homogeneous structure.
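A variation worth considering is to record which file each row came from and to skip the output file if it lives in the same folder, so that a second run does not re-ingest its own result. The folder, file names, and the archivo_origen column are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create two small sample workbooks so the sketch runs on its own.
carpeta = Path(tempfile.mkdtemp())
pd.DataFrame({"Producto": ["A"], "Ventas": [100]}).to_excel(carpeta / "enero.xlsx", index=False)
pd.DataFrame({"Producto": ["B"], "Ventas": [250]}).to_excel(carpeta / "febrero.xlsx", index=False)

salida = carpeta / "consolidated_data.xlsx"

partes = []
for ruta in sorted(carpeta.glob("*.xlsx")):
    if ruta == salida:
        continue  # do not read a previous consolidated output
    parte = pd.read_excel(ruta)
    parte["archivo_origen"] = ruta.name  # keep track of the source file
    partes.append(parte)

consolidado = pd.concat(partes, ignore_index=True)
consolidado.to_excel(salida, index=False)

print(consolidado)
```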

Generate reports by department automatically

From a file with global sales, you can segment by department and generate a personalized report for each one automatically. Each file is ready to be shared with your team.

import pandas as pd

sales = pd.read_excel("sales_data.xlsx")
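# "Department" is an assumed column name; use the one in your file
departamentos = sales["Department"].unique()

for dpto in departamentos:
    df_dpto = sales[sales["Department"] == dpto]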
    df_dpto.to_excel(f"{dpto}_report.xlsx", index=False)

If you also need to format each report, you can load the resulting file with openpyxl to style the headers, adjust column widths, and add a corporate color. You can also automate file organization by creating a folder for each department.
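A minimal sketch of that formatting pass follows. It assumes a report with "Department" and "Amount" columns, a made-up dark blue as the corporate color, and a rough width estimate based on the longest value in each column.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Alignment, Font, PatternFill
from openpyxl.utils import get_column_letter

# Hypothetical per-department report, as produced by the loop above.
ruta_rep = Path(tempfile.mkdtemp()) / "Marketing_report.xlsx"
pd.DataFrame(
    {"Department": ["Marketing", "Marketing"], "Amount": [1200.0, 830.5]}
).to_excel(ruta_rep, index=False)

wb_r = load_workbook(ruta_rep)
ws_r = wb_r.active

# Header style: bold white text on a solid (assumed) corporate color.
cabecera = Font(bold=True, color="FFFFFFFF")
fondo = PatternFill(start_color="FF1F4E78", end_color="FF1F4E78", fill_type="solid")
for celda in ws_r[1]:
    celda.font = cabecera
    celda.fill = fondo
    celda.alignment = Alignment(horizontal="center")

# Approximate column widths: longest text in the column plus a small margin.
for indice, columna in enumerate(ws_r.iter_cols(), start=1):
    ancho = max((len(str(c.value)) for c in columna if c.value is not None), default=0)
    ws_r.column_dimensions[get_column_letter(indice)].width = ancho + 2

wb_r.save(ruta_rep)
```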


Small interactive dashboard with Tkinter and pandas

For rapid prototyping, you can create a simple window that displays columns and calculates the average of the selected one. It's not a complete BI, but it works for express validations without leaving Python.

import tkinter as tk
import pandas as pd
from tkinter import messagebox

file = "data.xlsx"
data = pd.read_excel(file)

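def calcular_media():
    seleccion = listbox.curselection()
    if not seleccion:
        return  # nothing selected yet
    col = listbox.get(seleccion[0])
    # Coerce to numbers so text columns yield NaN instead of raising
    media = pd.to_numeric(data[col], errors="coerce").mean()
    messagebox.showinfo("Resultado", f"Promedio en {col}: {media:.2f}")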

root = tk.Tk()
root.title("Dashboard interactivo")

listbox = tk.Listbox(root)
listbox.pack()
for c in data.columns:
    listbox.insert(tk.END, c)

btn = tk.Button(root, text="Calcular promedio", command=calcular_media)
btn.pack()

root.mainloop()

For more complex reporting projects you may want to move this to a web app with Streamlit or Dash, but for a quick local utility, Tkinter can get you out of a bind with very little code.

Exploratory Analysis: Statistics and Graphics with Pandas + Matplotlib

When your Excel file contains customer or sales information, it's a good idea to take a quick look at distributions and relationships. With pandas you can obtain descriptive statistics, and with Matplotlib you can generate useful histograms and scatter plots.

import pandas as pd
import matplotlib.pyplot as plt

clientes = pd.read_excel("clientes.xlsx")

# General statistics
print(clientes.describe())

# Histogram of ages ("Edad" is the assumed name of the age column)
clientes["Edad"].hist(bins=20)
plt.xlabel("Edad")
plt.ylabel("Frecuencia")
plt.title("Distribución de edades")
plt.show()

# Scatter plot: income vs. satisfaction
clientes.plot.scatter(x="Ingresos", y="Satisfaccion")
plt.xlabel("Ingresos")
plt.ylabel("Satisfacción")
plt.title("Ingresos vs. Satisfacción")
plt.show()

With this, you can detect outliers, biases, or interesting relationships to dig deeper into later. If you want to report it in Excel, export summary tables with pandas and create in-workbook charts with openpyxl for a well-rounded deliverable. For specific statistical analyses, Excel's QUARTILE function can also serve as a reference when migrating indicators.

Saving results with pandas and openpyxl

With pandas, writing to Excel takes a single call; plus, you can export to other formats like CSV or Parquet. If the file is for business users, add sheets with different levels of detail or data filtered by segment in the same workbook.

df = pd.read_excel("cultivos.xlsx")

# Add a constant "reviewed" flag column
df["Revisado"] = "SI"
df.to_excel("cultivos_modify_pandas.xlsx", index=False)
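The multi-sheet idea mentioned above can be sketched with pd.ExcelWriter. The segment names, column names, and output file are invented for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sales data with a segment column.
ventas_demo = pd.DataFrame(
    {"Segmento": ["Norte", "Sur", "Norte"], "Importe": [100, 250, 75]}
)
ruta_multi = Path(tempfile.mkdtemp()) / "informe_segmentos.xlsx"

# One writer, several sheets: full detail first, then one sheet per segment.
with pd.ExcelWriter(ruta_multi, engine="openpyxl") as writer:
    ventas_demo.to_excel(writer, sheet_name="Detalle", index=False)
    for segmento, grupo in ventas_demo.groupby("Segmento"):
        grupo.to_excel(writer, sheet_name=str(segmento), index=False)

# Check which sheets ended up in the workbook.
with pd.ExcelFile(ruta_multi) as libro:
    hojas = libro.sheet_names

print(hojas)
```

Keep in mind that Excel limits sheet names to 31 characters, so long segment names may need shortening.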

If you're working with openpyxl, remember that you can create new columns and fill them by range. It's a straightforward way to mark records reviewed in reports with a fixed structure.

from openpyxl import load_workbook

wb = load_workbook("cultivos.xlsx")
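ws = wb.active  # or wb["SheetName"] to pick a specific sheet

# Add a "Revisado" column right after the last used column
col = ws.max_column + 1
ws.cell(row=1, column=col, value="Revisado")
for n_fila in range(2, ws.max_row + 1):
    ws.cell(row=n_fila, column=col, value="SI")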

wb.save("cultivos_modify_openpyxl.xlsx")

Performance and good practices

For large files, it's a good idea to limit the amount of data you fetch into memory. In pandas, read only the necessary columns with usecols and avoid processing irrelevant sheets. In openpyxl, use read_only=True to read and write_only=True to write.
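Both ideas can be combined in a short sketch: a streaming write with write_only=True, followed by a pandas read restricted with usecols. Row counts, column names, and the file name are assumptions.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook

ruta_grande = Path(tempfile.mkdtemp()) / "grande.xlsx"

# write_only=True creates a streaming workbook: it has no default sheet,
# and rows can only be appended in order.
wb_wo = Workbook(write_only=True)
ws_wo = wb_wo.create_sheet("Datos")
ws_wo.append(["id", "importe", "comentario"])
for i in range(1, 1001):
    ws_wo.append([i, i * 1.5, "texto que no necesitamos"])
wb_wo.save(ruta_grande)

# usecols keeps only the columns the analysis actually needs.
df_min = pd.read_excel(ruta_grande, usecols=["id", "importe"])

print(df_min.shape)
```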

Watch out for data types: sometimes Excel saves numbers as text, and that can break filters and aggregations. When loading with pandas, you can specify dtypes or normalize columns after reading to avoid surprises with dates, amounts, or IDs.
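A sketch of both options, declaring a dtype at read time and normalizing afterwards, could look like this. The column names and sample values are hypothetical; the IDs carry leading zeros to show why they should stay as text.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sheet where everything was stored as text.
ruta_tipos = Path(tempfile.mkdtemp()) / "tipos.xlsx"
pd.DataFrame(
    {
        "id_cliente": ["00123", "00456"],
        "importe": ["10.5", "20"],
        "fecha": ["2024-01-15", "2024-02-01"],
    }
).to_excel(ruta_tipos, index=False)

# Force the ID column to stay as text so leading zeros are preserved.
df_tipos = pd.read_excel(ruta_tipos, dtype={"id_cliente": str})

# Normalize after reading; errors="coerce" turns unparseable values into NaN/NaT.
df_tipos["importe"] = pd.to_numeric(df_tipos["importe"], errors="coerce")
df_tipos["fecha"] = pd.to_datetime(df_tipos["fecha"], errors="coerce")

print(df_tipos.dtypes)
```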

With Excel formulas, remember that not all functions are supported equally by the libraries; in most cases they are only recalculated when the file is opened in Excel. If you need fixed values, calculate them with pandas first and write the pre-calculated numbers.
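The difference can be seen in a small sketch. A formula written by openpyxl has no cached result until Excel opens and saves the file, so reading it back with data_only=True yields nothing, whereas a value computed in pandas is stored as a plain number. File and column names are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook, load_workbook

carpeta_fx = Path(tempfile.mkdtemp())

# Option 1: write a formula. openpyxl stores the text and does not evaluate it.
ruta_formula = carpeta_fx / "formulas.xlsx"
wb_fx = Workbook()
ws_fx = wb_fx.active
ws_fx.append([10, 20])
ws_fx["C1"] = "=SUM(A1:B1)"
wb_fx.save(ruta_formula)

# data_only=True returns the result cached by Excel; this file has none yet.
cacheado = load_workbook(ruta_formula, data_only=True).active["C1"].value

# Option 2: compute in pandas and write the fixed number.
ruta_valores = carpeta_fx / "valores.xlsx"
df_fx = pd.DataFrame({"a": [10], "b": [20]})
df_fx["total"] = df_fx["a"] + df_fx["b"]
df_fx.to_excel(ruta_valores, index=False)

valor_fijo = load_workbook(ruta_valores).active["C2"].value

print(cacheado, valor_fijo)
```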

Regarding compatibility, prioritize .xlsx files (the modern format) and test your scripts with different versions of Excel when your users have heterogeneous environments. This reduces issues caused by functions or features that are unsupported in older versions.

As a final note, think about reproducible pipelines: pin dependencies (Poetry, requirements.txt), document your parameters (sheet_name, usecols, skiprows), and add minimal logging for debugging. This will save you time when scaling your project or changing your data source.
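One way to put this into practice is a small loader function that takes the documented parameters explicitly and logs what it does. The function name cargar_excel, the logger name, and the sample data are hypothetical.

```python
import logging
import tempfile
from pathlib import Path

import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("excel_pipeline")


def cargar_excel(ruta, sheet_name=0, usecols=None, skiprows=0):
    """Load one sheet, logging the parameters used so runs can be reproduced."""
    log.info(
        "Reading %s (sheet_name=%r, usecols=%r, skiprows=%r)",
        ruta, sheet_name, usecols, skiprows,
    )
    df = pd.read_excel(ruta, sheet_name=sheet_name, usecols=usecols, skiprows=skiprows)
    log.info("Loaded %d rows and %d columns", *df.shape)
    return df


# Sample file so the sketch runs on its own.
ruta_p = Path(tempfile.mkdtemp()) / "fuente.xlsx"
pd.DataFrame(
    {"id": [1, 2, 3], "importe": [5.0, 7.5, 1.0], "nota": ["a", "b", "c"]}
).to_excel(ruta_p, index=False)

df_p = cargar_excel(ruta_p, usecols=["id", "importe"])
```

Keeping the parameters in the function signature makes it easy to move them into a config file later.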

If you've made it this far, you've covered reading, processing, and writing Excel with pandas and openpyxl, and you know when each tool is appropriate and how to combine them for professional reports. From now on, you'll see your spreadsheets as sources and destinations within a robust, automated, and much faster workflow.


Isaac
