- The density of DNA allows for up to 455 exabytes per gram, and when encapsulated in silica and kept in cold storage, it can be preserved on timescales of millions of years.
- Binary encoding in A/C/G/T with Reed-Solomon correction makes it possible to store and retrieve data accurately and with error tolerance.
- Experiments (EBI, ETH Zurich, Microsoft+UW) and prototypes such as the DNA “tape” demonstrate automation and hard drive-like operations.
- Costs are still high, but they are falling with genomics; DNA aims for massive, sustainable archiving in the face of data center limitations.
The digital world is constantly growing, and although we talk about "the cloud," that cloud lives on solid ground, inside gigantic data centers. Enormous warehouses with endless corridors, rows of servers, and colossal electricity bills now store our photos, videos, emails, and science data. In this scenario, an idea is gaining traction: using DNA as a medium for information, a tiny molecule with an overwhelming storage density.
The promise is powerful: encode bits into sequences of A, C, G, and T to save everything from a historical document to a video file, and do so stably for centuries, or even more than a million years if well preserved. The fact that sparks the imagination is well known: in theory, a single gram of DNA could store up to 455 exabytes of data (455 billion GB), a figure that dwarfs current hard drives and silicon memory.
What is DNA storage and why it matters
DNA is the instruction manual for life, and its language is written with four “letters”: adenine (A), cytosine (C), guanine (G), and thymine (T). For computing purposes, we can translate zeros and ones into combinations of these bases to create a synthetic sequence that, when read, retrieves the original file. This translation has proven viable since 2012, when the contents of a megabyte were successfully encoded and read, opening the door to a new paradigm of digital archiving.
The underlying reason is density: by packing information into molecules, the physical space required collapses. In numbers, we are talking about those 455 exabytes per gram. To grasp it without a calculator: a small test tube could hold all of Wikipedia together with Facebook, and at a larger scale, the knowledge of our civilization would occupy only a few cubic meters, nothing like the thousands of square meters of data centers.
This vision is not just aesthetic. On a practical level, DNA does not need electricity to remain readable: in cool, dry, and dark conditions, the data persists. We know this from molecular archaeology, which allows us to read genetic material from remains hundreds of thousands of years old. This behavior makes DNA an ideal candidate for very long-term archiving.
Capacity, comparisons, and the challenge of today's data centers
Data centers are the "cathedrals of bits." There are more than 2,000 of them worldwide, and each one occupies an average of about 5 hectares (50,000 m²). If we compare that footprint with the compactness of DNA, the impact is evident. It's not just the land and the building: cooling and energy consumption are also massive, and the associated environmental footprint is far from minor.
In parallel, data volumes are skyrocketing. Google processes around 4.65 billion searches daily; YouTube serves nearly 4.7 billion video views daily; Facebook receives over 350 million photos every 24 hours; and Twitter handles around 600 million tweets. The global outlook suggests that by 2025 around 463 exabytes of data will be generated every day, with a large part of humanity yet to connect to the internet. That is no small figure.
This avalanche is putting pressure on current technologies just as they are approaching physical limits. Backblaze's experience, which monitors 25,000 hard drives in service, provides clues: after four years, nearly 22% of the units show wear and tear or failure. Some last more than a decade, others fail very quickly. The conclusion is simple: conventional hardware is not eternal, and constant replacement carries economic and operational costs.
DNA as memory, on the other hand, shifts part of the problem to the chemical and conservation level. With a stable medium and without the need for energy to "keep" the information alive, the deep archive—the one we review very occasionally but which must endure—could change its paradigm and relieve pressure on giant infrastructures.
How to code, preserve, and fix errors
Encoding data in DNA consists of translating bits into bases. A simple scheme maps A and C to "0" and G and T to "1." This mapping writes short fragments that, when combined, reconstruct any digital file. To enhance robustness, error-correcting codes such as Reed-Solomon are used, which add intelligent redundancy: if some pieces are damaged, the system can still recover the original information.
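As a toy illustration of the mapping just described (A/C for "0", G/T for "1"), here is a minimal sketch in Python. The alternation between the two letters for each bit value is an illustrative choice of ours to avoid long runs of identical bases; real systems layer Reed-Solomon redundancy on top of the bare mapping rather than using it alone.

```python
# Didactic sketch of the bit-to-base mapping: A/C -> 0, G/T -> 1.
# Not a production codec; real systems add Reed-Solomon redundancy
# and carefully avoid homopolymer runs and extreme GC content.

ZERO, ONE = "AC", "GT"  # two candidate bases per bit value

def bits_to_dna(bits):
    # Alternate between the two bases for each bit value so a repeated
    # bit does not produce a long run of identical letters.
    return "".join((ZERO if b == 0 else ONE)[i % 2] for i, b in enumerate(bits))

def dna_to_bits(seq):
    return [0 if base in ZERO else 1 for base in seq]

def text_to_bits(text):
    return [(byte >> k) & 1 for byte in text.encode() for k in range(7, -1, -1)]

def bits_to_text(bits):
    data = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        data.append(byte)
    return data.decode()

seq = bits_to_dna(text_to_bits("hi"))
print(seq)                              # 16 bases encode 2 bytes
print(bits_to_text(dna_to_bits(seq)))   # round-trips back to "hi"
```

Reading the sequence back simply inverts the mapping, which is why any base-calling error corrupts a bit and why the redundancy layer is not optional in practice.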
The big leap in durability was made by the ETH Zurich team, led by Robert Grass and Reinhard Heckel. Inspired by how DNA is preserved in fossils, they encapsulated DNA molecules in silica (glass) spheres. Why glass? Because it's a chemically inert material that protects against the elements that most degrade DNA: primarily water and oxygen.
To accelerate the “timeline,” they subjected the encapsulated DNA to temperatures of 60, 65, and 70°C, simulating decades or centuries of deterioration in a matter of weeks. The stability of the DNA inside the glass was remarkable. Extrapolating, storing it at -18°C would allow information to be preserved for more than a million years, a figure that changes the mental framework of what we understand by a durable archive.
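The extrapolation behind this accelerated-aging approach is typically an Arrhenius fit: measure decay rates at several elevated temperatures, fit the activation energy, and project the rate at storage temperature. The sketch below uses entirely hypothetical rate constants (the article does not give the ETH team's measured values), so only the method, not the numbers, is meaningful.

```python
import math

# Hypothetical decay-rate constants (fraction of strand damage per day)
# at the three test temperatures from the article. These values are
# illustrative only -- the real measurements are not given in the text.
observed = {343.15: 0.10, 338.15: 0.05, 333.15: 0.025}  # kelvin -> k

R = 8.314  # gas constant, J/(mol*K)

# Arrhenius law: ln k = ln A - Ea / (R * T).
# Fit a straight line to (1/T, ln k) by least squares.
xs = [1 / T for T in observed]
ys = [math.log(k) for k in observed.values()]
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

def half_life_years(T_kelvin):
    """Extrapolated half-life of the stored data at a given temperature."""
    k = math.exp(intercept + slope / T_kelvin)  # per day
    return math.log(2) / k / 365.25

print(f"fitted Ea ~ {-slope * R / 1000:.0f} kJ/mol")
print(f"half-life at -18 C ~ {half_life_years(255.15):,.0f} years")
```

A few weeks of measurements at 60-70°C thus translate, via the fitted line, into a projection spanning thousands to millions of years at freezer temperature, which is exactly the kind of extrapolation the article describes.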
The contrast with a test tube exposed to open air is striking: there, DNA survives only two or three years before becoming unreadable. By encapsulating it in glass and keeping it in a cool, dry, and dark environment, its survival is multiplied many times over. Furthermore, sol-gel technology makes it easy to create this glass "shell" around the molecules, making the process technically affordable at laboratory scale.
Experiments and results: from EBI and ETH Zurich to Microsoft and the UW
The proofs keep piling up. In 2012, DNA content was successfully encoded and read, and soon after, the European Bioinformatics Institute (EBI) in England took the idea further: it stored text, images, and audio—including Shakespearean sonnets, excerpts from Martin Luther King’s “I have a dream” speech, an image of the institute itself, and the landmark double helix article—and then retrieved the information with 100% accuracy.
Their methodology combined overlapping fragments, position indexes, and redundancy to ensure reconstruction even if some copies were damaged. The total volume was around 760 KB, and the equivalent DNA was smaller than a speck of dust. On the biosecurity front, they clarified that this synthetic DNA uses a different "code" and cannot accidentally be incorporated into the genome of a living organism; if it entered a body, it would degrade and be eliminated without any functional effect.
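The fragment-and-index idea can be sketched in a few lines: split the file into short overlapping pieces, tag each with its position, and rebuild the whole even though sequencing returns the fragments in arbitrary order. Fragment and overlap sizes below are arbitrary illustrative choices, not the EBI's actual parameters.

```python
import random

# Sketch of overlapping fragments with position indexes: the index lets
# us restore order, and the overlap gives each byte redundant coverage.

def fragment(data, size=8, overlap=4):
    """Split data into overlapping (position, chunk) pairs."""
    step = size - overlap
    return [(i, data[i:i + size]) for i in range(0, len(data), step)]

def reassemble(fragments):
    """Rebuild the original bytes from fragments in any order."""
    out = bytearray()
    for pos, chunk in sorted(fragments):    # the position index restores order
        out[pos:pos + len(chunk)] = chunk   # overlapping regions overwrite
    return bytes(out)

message = b"to be or not to be"
pieces = fragment(message)
random.shuffle(pieces)                      # sequencing order is effectively random
print(reassemble(pieces))                   # b'to be or not to be'
```

Because every byte except those at the edges appears in two fragments, losing a single fragment still leaves most positions recoverable, which is the point of the overlap.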
Harvard also tested the idea with a book of more than 53,000 words and 11 images, synthesizing thousands of short fragments on a glass chip that were then read using standard sequencing techniques, the same ones used to study ancient genomes or archaeological samples. This reinforced the idea that the "molecular library" can be consulted with widely available molecular biology equipment.
Back at ETH Zurich, Grass and his team subjected two historical documents (about 83 KB in total, including the Swiss Federal Pact of 1291 and passages from the Archimedes Palimpsest) to thermal stress. After a week at 60-70°C, the texts were still legible. Their calculations put the durability at about 2,000 years at 10°C, extending to million-year scales if stored at -18°C. The cost at the time was high—around €1,350 ($2,000) for just 83 KB—but cost trends in genomics are favorable.
Proof of this is the dramatic drop in the price of sequencing a human genome: from several million dollars years ago to just a few hundred today. Along these lines, researchers from Microsoft and the University of Washington built the first device that automates the end-to-end process of writing and reading DNA. With it, they encoded the word "hello" and retrieved it, an engineering feat that—although it took around 21 hours for 5 bytes—illustrates that automation is already underway.
Prototypes that look to the future: from DNA cassette tape to industry
A recent proposal, published in Science Advances by a team from the Southern University of Science and Technology, revives a very familiar format: the "cassette tape." Their device integrates a nylon and polyester membrane with laser-printed barcode patterns. The white areas house compartments for storing synthetic DNA with encoded files; the black stripes act as hydrophobic barriers to prevent mixing.
Each partition has a unique "address," allowing for many-to-many (DMRM) operations. That is, we can store multiple files, retrieve them, delete them, and rewrite them in the same area, emulating the behavior of an HDD but on a molecular substrate. In figures, a 1,000-meter cassette can hold more than 500,000 partitions and achieve up to 362 petabytes per kilometer, enough to store, according to the authors, several times the content of YouTube in a size smaller than a paperback novel.
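In software terms, the addressed-partition idea behaves like a fixed map from addresses to rewritable slots. The sketch below is an abstract model of those semantics only; the class and method names (`TapeArchive`, `write`, `read`, `erase`) are our own illustrative choices, not the paper's API.

```python
# Minimal software model of addressed partitions on the DNA "tape":
# each partition has a unique address whose contents can be written,
# read, erased, and rewritten, HDD-style.

class TapeArchive:
    def __init__(self, partitions):
        # One slot per physical partition; None marks an empty slot.
        self._slots = {addr: None for addr in range(partitions)}

    def write(self, addr, payload):
        if addr not in self._slots:
            raise KeyError(f"no partition at address {addr}")
        self._slots[addr] = payload   # rewriting an occupied slot is allowed

    def read(self, addr):
        return self._slots[addr]

    def erase(self, addr):
        self._slots[addr] = None

# The paper reports >500,000 partitions; a small number keeps the demo light.
tape = TapeArchive(partitions=1_000)
tape.write(42, b"holiday_photos.tar")
print(tape.read(42))   # b'holiday_photos.tar'
tape.erase(42)
print(tape.read(42))   # None
```

The fixed address space is the key design point: unlike a one-shot DNA pool, a slot can be emptied and refilled, which is what makes the device behave like a drive rather than an archive-once medium.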
This line coexists with other initiatives. Microsoft is also working on Project Silica, which explores quartz as an archival medium: lasers permanently alter the crystal structure, and machine learning algorithms then read those marks. It's not DNA, but it illustrates the same search for ultra-stable, compact archival media.
The biotech industry is also pushing. Catalog, a Boston startup, has developed a system for rearranging prefabricated DNA blocks and writing data without having to synthesize them from scratch, on the way to what they call the first "machine" that uses DNA as if it were a physical operating system. In San Diego, Iridia combines DNA and nanotechnology to build drives capable of operating in parallel, the seed of a "living hard drive."
From the public sector, IARPA —the US intelligence advanced research projects agency— is promoting the MIST program, whose goal is to write a terabyte of DNA per day and read it at a speed ten times faster. “We want to replace current hard drives with denser, safer, and more resistant molecular media,” they explain, aligned with the idea that silicon is reaching its physical limits.
Density comparisons provide context: a hard drive holds around 10^9 bits per cubic centimeter, while DNA reaches 10^18. It's no wonder that some reports—such as the one from the Potomac Institute for Policy Studies—suggest that everything digital on the planet could fit into approximately one kilogram of DNA. This may sound grandiose, but the physical (and biological) basis is solid, and cold, dry storage offers preservation windows that far outperform conventional magnetic and optical technologies.
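The headline figures above are easy to sanity-check with plain arithmetic, using only numbers quoted in this article:

```python
# Sanity-checking the quoted figures.
EXABYTE = 10**18  # bytes

dna_per_gram = 455 * EXABYTE        # theoretical capacity per gram
dna_per_kg = dna_per_gram * 1_000
print(dna_per_kg / 10**21, "zettabytes per kilogram")   # 455.0 ZB/kg

# Volumetric density comparison: hard drive vs. DNA, bits per cm^3.
hdd_bits_cm3 = 10**9
dna_bits_cm3 = 10**18
print(dna_bits_cm3 // hdd_bits_cm3)   # DNA is ~10^9 times denser
```

At 455 zettabytes per kilogram, and with the article's own projection of roughly 463 exabytes of new data per day by 2025, about a gram of DNA per day would in theory suffice, which is why the "one kilogram for everything digital" claim is not as wild as it sounds.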
Beyond stability and density lies another question: what do we save? For Robert Grass himself, the focus should be on selecting "truly important" information that deserves to be archived neutrally for the future. Just as our vision of the Middle Ages depends on what was preserved, a faithful photograph of our time will require criteria, curation, and open standards that make reading easier centuries from now.
Challenges remain: today, large-scale synthesis and sequencing remain expensive and relatively slow. However, the cost curve in genomics is stubbornly downward, and automation is already demonstrating technical feasibility. Error-correction algorithms, addressing formats, and architectures such as the "molecular cassette tape" point to systems that make writing and erasing more practical.
As if there weren't enough evidence that DNA is a durable medium, paleogenetics continues to deliver records: DNA has been sequenced from a polar bear dating back some 110,000 years and from a horse dating back some 700,000 years, and 400,000-year-old human mitochondrial DNA has been recovered from the Sima de los Huesos (Spain). Although conditions matter—cold helps—cases from relatively temperate caves broaden the map of where real preservation occurs.
The less glamorous but crucial side is everyday chemistry: water and oxygen are the greatest enemies. Encapsulating the molecules in glass and storing them in cold chambers therefore minimizes reactions and strand breaks. At laboratory scale, creating silica spheres using sol-gel techniques has simplified the process, and thermal-stress experiments show that deterioration follows predictable patterns, comparable to those observed in fossils.
To put it into perspective, it's worth remembering the contrast in size and cost: the test set with 83 KB of documents cost around €1,350 ($2,000) a few years ago. That is expensive when you think in terms of terabytes and petabytes, but not so long ago sequencing a human genome cost millions of dollars, and today it's in the hundreds. If this trend continues, DNA will go from being an experimental medium to a competitive one for mass archiving and "cold" backups.
The union of biology and technology is no longer science fiction. From barcoded DNA "tapes" to molecular libraries that don't require electricity, and alternatives like quartz crystal, the race for a durable and compact medium is on. If one thing seems clear, it's that DNA—with its impossible density and its vocation for survival—is emerging as the prime candidate to store, at scale, the digital memory of our species with a temporal robustness that no current medium can promise.