Storing image files in mixtures containing custom-synthesized small molecules is a milestone for molecular data storage, researchers report.

In all, the researchers stored more than 200 kilobytes of data, which they say is the most stored to date using small molecules. That’s not a lot of data compared to traditional means of storage, but it is significant progress in terms of small molecule storage, they say.

“The large numbers of unique small molecules, the amount of data we can store, and the reliability of the data readout shows real promise for scaling this up even further,” says coauthor Jacob Rosenstein, an assistant professor in the School of Engineering at Brown University.

More and more data

As the data universe continues to expand, researchers are working to find new and more compact means of storage. By encoding data in molecules, it may be possible to store the equivalent of terabytes of data in just a few millimeters of space.

Most research on molecular storage has focused on long-chain polymers like DNA, a well-known carrier of biological data. But there are potential advantages to using small molecules as opposed to long polymers. Small molecules are potentially easier and cheaper to produce than synthetic DNA, and in theory have an even higher storage capacity.

The researchers have been working to find ways of making small-molecule data storage feasible and scalable.

To store data, the team uses small metal plates arrayed with 1,500 tiny spots less than a millimeter in diameter. Each spot contains a mixture of molecules. The presence or absence of each molecule in a mixture indicates one bit of digital data, so the number of bits in each mixture can be as large as the library of distinct molecules available for mixing. The data can then be read out using a mass spectrometer, which can identify the molecules present in each spot.
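
The presence-or-absence idea can be illustrated with a minimal sketch. It is not the team's software: the molecule names, the function names, and the assumption that the readout returns a clean set of detected molecules are all hypothetical.

```python
from typing import List, Set


def encode_spot(bits: List[int], library: List[str]) -> Set[str]:
    """Pick the molecules to deposit in one spot: molecule i is present iff bit i is 1."""
    if len(bits) > len(library):
        raise ValueError("a spot cannot hold more bits than the library has molecules")
    return {library[i] for i, bit in enumerate(bits) if bit}


def decode_spot(detected: Set[str], library: List[str], n_bits: int) -> List[int]:
    """Rebuild the bits from the molecules found in the spot (idealized, error-free readout)."""
    return [1 if library[i] in detected else 0 for i in range(n_bits)]


# Hypothetical 8-molecule library; the names are placeholders, not real compounds.
library = [f"mol_{i}" for i in range(8)]
bits = [1, 0, 1, 1, 0, 0, 0, 1]

mixture = encode_spot(bits, library)
recovered = decode_spot(mixture, library, len(bits))
print(recovered == bits)
```

In this toy model one spot holds one bit per library molecule, which is why a larger library translates directly into more data per spot.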

In a paper from last year, the team showed that they could store image files in the kilobyte range using some common metabolites, the molecules that organisms use to regulate metabolism.

For this new work, the researchers were able to vastly expand the size of their library—and thereby the sizes of the files they could encode—by synthesizing their own molecules.

Scaling up molecular data storage

The team made their molecules using Ugi reactions—a technique often used in the pharmaceutical industry to quickly produce large numbers of different compounds. Ugi reactions combine four broad classes of reagents (an amine, an aldehyde or a ketone, a carboxylic acid, and an isocyanide) into one new molecule.

By using different reagents from each class, the researchers could quickly produce a wide array of distinct molecules. For this work, the team used five different amines, five aldehydes, 12 carboxylic acids, and five isocyanides in different combinations to create 1,500 distinct compounds.

“The advantage here is the potential scalability of the library,” Rosenstein says. “We use just 27 different components to make a 1,500-molecule library in one day. That means we don’t have to go out and find 1,500 unique molecules.”
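
The arithmetic behind those figures is easy to check: choosing one reagent from each of the four classes gives 5 × 5 × 12 × 5 = 1,500 combinations from 5 + 5 + 12 + 5 = 27 reagents. The short sketch below enumerates the combinations; the reagent names are placeholders, not the compounds the team used.

```python
from itertools import product

# Placeholder reagent names; only the counts per class come from the article.
amines = [f"amine_{i}" for i in range(5)]
aldehydes = [f"aldehyde_{i}" for i in range(5)]
carboxylic_acids = [f"acid_{i}" for i in range(12)]
isocyanides = [f"isocyanide_{i}" for i in range(5)]

# One Ugi product per choice of (amine, aldehyde, carboxylic acid, isocyanide).
ugi_library = list(product(amines, aldehydes, carboxylic_acids, isocyanides))
n_reagents = len(amines) + len(aldehydes) + len(carboxylic_acids) + len(isocyanides)

print(len(ugi_library))  # 1500
print(n_reagents)        # 27
```

Because the library size is a product while the reagent count is a sum, adding a few more reagents to any class grows the library much faster than it grows the shopping list.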

From there, the team used sub-libraries of compounds to encode their images. They used a 32-compound library to store a binary image of the Egyptian god Anubis. And they used a 575-compound library to encode a 0.88-megapixel Picasso drawing of a violin.

The large number of molecules available for the chemical libraries also enabled the researchers to explore alternate encoding schemes that made the readout of data more robust. While mass spectrometry is highly precise, it’s not perfect. So as with any system used to store or transmit data, this system will need some form of error correction.

“The way we design the libraries and read out the data includes extra information that lets us correct some errors,” says first author Chris Arcadia, a graduate student. “That helped us streamline the experimental workflow and still get accuracy rates as high as 99%.”
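
The article does not spell out the team's error-correction scheme. As a generic illustration of how extra stored information lets a reader fix mistakes, the sketch below uses a simple repetition code with majority voting. This is an assumed stand-in technique, not the researchers' method, and the function names are hypothetical.

```python
from typing import List


def encode_repetition(bits: List[int], copies: int = 3) -> List[int]:
    """Store each bit `copies` times in a row (assumed scheme, for illustration only)."""
    return [bit for bit in bits for _ in range(copies)]


def decode_repetition(received: List[int], copies: int = 3) -> List[int]:
    """Recover each bit by majority vote over its group of copies."""
    if len(received) % copies != 0:
        raise ValueError("received length must be a multiple of the copy count")
    decoded = []
    for start in range(0, len(received), copies):
        group = received[start:start + copies]
        decoded.append(1 if sum(group) * 2 > copies else 0)
    return decoded


message = [1, 0, 1, 1]
stored = encode_repetition(message)
stored[4] ^= 1  # simulate one misread bit in the readout

print(decode_repetition(stored) == message)
```

With three copies per bit, any single misread within a group is outvoted by the other two copies, at the cost of storing three times as many bits.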

There’s still more work to be done to bring this idea up to a useful scale, the researchers say. But the ability to create large chemical libraries and use them for encoding ever larger files suggests the approach can indeed scale up.

“We’re no longer limited by the size of our chemical library, which is really important,” Rosenstein says. “That’s the biggest step forward here. When we started this project a few years ago, we had some debates about whether something of this scale was even experimentally feasible. So it’s really encouraging that we’ve been able to do this.”

The study appears in Nature Communications. Funding for the work came from DARPA and the National Science Foundation.