The idea of recording, storing and retrieving information in DNA molecules, along with its general principles, belongs to the Soviet physicist Mikhail Neiman. In 1964, the journal “Radiotekhnika” published an article describing the technology of the process and a data storage device, the Neiman oligonucleotides (MNeimON).
In 2012, geneticists at Harvard University encoded a draft of a book of 53,400 words, 11 images and one program in DNA. They found that each cubic millimeter of DNA can store 5.5 petabits of data. A year later, researchers at the European Bioinformatics Institute (EBI) managed to store, and then fully retrieve and reproduce, about 0.6 megabytes of files: 154 Shakespeare sonnets, a 26-second excerpt from Martin Luther King's famous “I Have a Dream” speech, the paper on the structure of DNA by James Watson and Francis Crick, a photo of the EBI headquarters in Hinxton, and a file describing the data conversion methods. All of the files were reproduced from DNA with an accuracy between 99.99% and 100%.
Yaniv Erlich and his colleague Dina Zielinski, a researcher at the New York Genome Center (NYGC), chose six files to encode and write into DNA: the KolibriOS computer operating system, the 1896 French film “Arrival of a Train at La Ciotat Station”, the code of a $50 Amazon gift card, a computer virus, an image of the Pioneer plaque, and Claude Shannon's 1948 study on information theory.
The scientists combined these files into one and then split the data into short binary strings. Using fountain codes, they randomly packed the strings into “droplets” (blocks) and converted the combinations 00, 01, 10 and 11 into the four nucleotide bases: adenine (A), cytosine (C), guanine (G) and thymine (T). So that the blocks could later be reassembled, the team added a tag to each droplet.
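The two-bit mapping and the tagged droplets described above can be illustrated with a minimal Python sketch. It is not the researchers' software: the function names, the one-byte seed tag and the uniform choice of how many strings go into a droplet are assumptions made for illustration only, and the published scheme is more elaborate.

```python
import random

# Mapping taken from the text: 00 -> A, 01 -> C, 10 -> G, 11 -> T
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Translate every pair of bits into one nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna: four nucleotides become one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def make_droplet(segments: list, seed: int) -> str:
    """Build one hypothetical droplet from equal-length binary strings.

    The seed (0..255, an assumption of this sketch) picks a random subset
    of segments, which are XORed together. The seed is prepended as a tag
    so a decoder could regenerate the same subset.
    """
    rng = random.Random(seed)
    count = rng.randint(1, len(segments))
    chosen = rng.sample(range(len(segments)), count)
    payload = bytearray(len(segments[0]))
    for index in chosen:
        for pos, value in enumerate(segments[index]):
            payload[pos] ^= value
    return bytes_to_dna(bytes([seed]) + bytes(payload))
```

For example, `bytes_to_dna(b"\x1b")` returns `"ACGT"`, because the byte 0x1B is 00 01 10 11 in binary. A decoder that reads the tag from the first four nucleotides of a droplet can reseed the same generator and learn which strings were combined.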
In total, the researchers generated approximately 72,000 DNA strands, each about 200 bases long. They collected this information in a text file and sent it to San Francisco, where Twist Bioscience, a startup specializing in DNA synthesis, turned the digital data into biological data. Two weeks later, Erlich's team received a vial of DNA molecules.
Using sequencing technology to read the DNA strands and special software to translate the genetic code back into binary, they successfully recovered the files. The scientists have not yet said how long reading and writing take. Erlich's group also demonstrated that when the DNA sample is multiplied using the polymerase chain reaction, its algorithm makes it possible to create and accurately recover a virtually unlimited number of copies of the sample, and even copies of copies. Erlich booted the recovered operating system in a virtual machine and played “Minesweeper”.
The most impressive feature of the algorithm, however, was its ability to fit 215 petabytes of data into one gram of DNA, up to 100 times more than other methods and algorithms had achieved.
The storage capacity of DNA is theoretically limited to two bits per nucleotide, and is further constrained by the biological properties of DNA itself. In addition, assembling and reading the recorded fragments requires extra information, which reduces the capacity to 1.8 bits per nucleotide. The “DNA Fountain” algorithm packs an average of 1.6 bits into each nucleotide, which is 60% more than was previously possible and close to the 1.8-bit limit.
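The relationship between these density figures can be checked with a few lines of Python. This is only an arithmetic sketch using the numbers quoted above; the constant and function names are illustrative, and the "implied previous density" is derived from the 60% claim rather than stated in the source.

```python
import math

RAW_LIMIT = math.log2(4)   # four bases carry log2(4) = 2 bits per nucleotide
PRACTICAL_LIMIT = 1.8      # bits per nucleotide, figure quoted in the text
DNA_FOUNTAIN = 1.6         # bits per nucleotide, figure quoted in the text
IMPROVEMENT = 0.60         # "60% more than was previously possible"


def fraction_of_practical_limit() -> float:
    """Share of the 1.8-bit limit that DNA Fountain reaches."""
    return DNA_FOUNTAIN / PRACTICAL_LIMIT


def implied_previous_density() -> float:
    """Density that earlier methods must have had if 1.6 is 60% more."""
    return DNA_FOUNTAIN / (1 + IMPROVEMENT)


if __name__ == "__main__":
    print(f"Raw limit: {RAW_LIMIT:.1f} bits per nucleotide")
    print(f"Share of practical limit: {fraction_of_practical_limit():.1%}")
    print(f"Implied previous density: {implied_previous_density():.2f} bits per nucleotide")
```

Under these figures, DNA Fountain reaches roughly 89% of the practical limit, and the 60% claim implies that earlier approaches stored about one bit per nucleotide.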
The main obstacle to widespread adoption of the technology is its cost. The researchers spent $7,000 to synthesize the DNA that archives 2 megabytes of data, and another $2,000 to read it back. Although the cost of DNA sequencing is gradually falling, synthesis is still expensive, and investors are not willing to put in large sums of money merely to bring the price of synthesis down.
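To put those costs in perspective, the per-megabyte price can be worked out directly from the figures above. The short Python sketch below is plain arithmetic on the quoted numbers; the names are illustrative and not taken from the source.

```python
SYNTHESIS_COST_USD = 7000   # cost to write the archive, from the text
SEQUENCING_COST_USD = 2000  # cost to read it back, from the text
ARCHIVE_SIZE_MB = 2         # archive size, from the text


def cost_per_megabyte(total_cost_usd: float, size_mb: float = ARCHIVE_SIZE_MB) -> float:
    """Return the cost in US dollars of handling one megabyte."""
    return total_cost_usd / size_mb


if __name__ == "__main__":
    write_cost = cost_per_megabyte(SYNTHESIS_COST_USD)
    read_cost = cost_per_megabyte(SEQUENCING_COST_USD)
    print(f"Writing: ${write_cost:,.0f} per MB")
    print(f"Reading: ${read_cost:,.0f} per MB")
    print(f"Round trip: ${write_cost + read_cost:,.0f} per MB")
```

At these prices, writing accounts for $3,500 of the $4,500 needed to store and recover one megabyte, which is why synthesis is singled out as the bottleneck.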
Erlich and his team propose a different way to solve the problem: DNA synthesis can be made cheaper by producing lower-quality molecules and then using a “DNA Fountain”-type coding strategy to correct the molecular errors. Their paper was published in the journal Science on 3 March 2017.