
DNA sequence minimization using Huffman coding algorithm

S. Gomathi, G. Ignisha Rajathi, C. Gopala Krishnan

Abstract


Storing, processing, and transmitting full genomic DNA or RNA sequences in databases has recently become a serious challenge because of the growth of this data. Maintaining genetic data therefore calls for data compression. Existing standard compression tools are insufficient for DNA sequences, so the methodology here is adapted from the Computer One-Bit Compression method (OBComp), which compresses repeated as well as non-repeated elements/sequences. In the direct coding technique, each nucleotide is denoted with a fixed bit pattern, resulting in a compressed size of 2 bits per base (bpb). In OBComp, a single bit, 0 or 1, is used to code the two most frequently occurring nucleotides, and these two positions are saved. To further improve compression, a modified version of the Run-Length Encoding technique and the Huffman coding algorithm is then applied. The suggested technique successfully reduced the initial size of the DNA sequences. Its simple methodology and remarkable compression ratio make it interesting.
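
As a rough illustration of the building blocks named in the abstract, the following Python sketch compares the 2 bits per base baseline of direct coding with textbook Run-Length Encoding and textbook Huffman coding on a toy sequence. It is not the authors' method: OBComp and the modified versions of the two algorithms are not reproduced here, and the function names and the sample sequence are assumptions made purely for illustration.

```python
import heapq
from collections import Counter
from itertools import groupby


def direct_code_bits(seq):
    """Size in bits under direct coding: a fixed 2 bits per nucleotide."""
    return 2 * len(seq)


def run_length_encode(seq):
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(sym, len(list(run))) for sym, run in groupby(seq)]


def huffman_code(symbols):
    """Build a standard Huffman prefix code (symbol -> bit string)."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Prefix "0" to the lighter subtree's codes and "1" to the other's.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(symbols, code):
    """Concatenate the code word of every symbol into one bit string."""
    return "".join(code[s] for s in symbols)


def huffman_decode(bits, code):
    """Invert huffman_encode by matching prefix-free code words greedily."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical toy sequence with a skewed nucleotide distribution.
seq = "AAAAAAAACCCCGGTT"
code = huffman_code(seq)
bits = huffman_encode(seq, code)

print("direct coding :", direct_code_bits(seq), "bits")
print("Huffman coding:", len(bits), "bits")
print("run lengths   :", run_length_encode(seq))
```

Huffman coding only beats the 2-bit baseline when the nucleotide frequencies are skewed, as in this toy sequence; for a uniform distribution over four symbols it also assigns 2 bits to each nucleotide.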


