Publications
bioRxiv, May 2020
DOI: 10.1101/2020.05.25.115477

Content-Based Similarity Search in Large-Scale DNA Data Storage Systems

Bee, Callista; Chen, Yuan-Jyue; Ward, David; Liu, Xiaomeng; Seelig, Georg; Strauss, Karin; Ceze, Luis
Product Used
Oligo Pools
Abstract
Synthetic DNA has the potential to store the world’s continuously growing amount of data in an extremely dense and durable medium. Current proposals for DNA-based digital storage systems include the ability to retrieve individual files by their unique identifier, but not by their content. Here, we demonstrate content-based retrieval from a DNA database by learning a mapping from images to DNA sequences such that an encoded query image will retrieve visually similar images from the database via DNA hybridization. We encoded and synthesized a database of 1.6 million images and queried it with a variety of images, showing that each query retrieves a sample of the database containing visually similar images at a rate much greater than chance. We compare our results with several algorithms for similarity search in electronic systems, and demonstrate that our molecular approach is competitive with state-of-the-art electronics.
One Sentence Summary
Learned encodings enable content-based image similarity search from a database of 1.6 million images encoded in synthetic DNA.
