Abstract:
DNA sequencing is the process of determining the order of the nucleotide bases
adenine, guanine, cytosine, and thymine (represented as A, G, C, and T, respectively)
in a piece of DNA. Today, with the right equipment and materials, sequencing a short
piece of DNA is relatively straightforward. The advent of rapid DNA sequencing
methods has greatly accelerated biological and medical research and discovery.
Knowledge of DNA sequences has become indispensable for basic biological research and
for numerous applied fields such as medical diagnosis, biotechnology, forensic biology,
virology, and biological systematics. The speed of sequencing and searching attained
with modern DNA pattern-searching technology has been instrumental in this progress.
Nowadays, analyzing large biological datasets by searching DNA for a pattern requires
faster and more cost-effective machines to obtain more accurate results within a short
time. Handling such large sets of biological data is very difficult. Searching for a
specified pattern is now automated and fast for short pieces of sequenced DNA. In this
paper, a modified version of clustering and the Knuth-Morris-Pratt (KMP) algorithm is
used to search for a specific pattern in a large DNA dataset. The overall process also
reports the total number of matching patterns found in the DNA dataset.
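The abstract does not describe the authors' modifications. For reference only, here is a minimal Python sketch of the standard, unmodified KMP search. It assumes the DNA is held as a plain string over A, G, C, and T. The clustering step is not shown, and the function names `build_failure` and `kmp_search` are hypothetical. The search returns the start position of every match, so the total number of matches is the length of that list.

```python
def build_failure(pattern: str) -> list[int]:
    """KMP failure (prefix) function: fail[i] is the length of the longest
    proper prefix of pattern[:i+1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the current matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter prefix
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start positions of every (possibly overlapping)
    occurrence of pattern in text."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    positions = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # reuse the partial match instead of rescanning
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 1)
            k = fail[k - 1]  # keep going so overlapping matches are found
    return positions


if __name__ == "__main__":
    dna = "AGCTAGCTAGCAGCT"  # illustrative sequence, not data from the thesis
    hits = kmp_search(dna, "AGCT")
    print(hits, len(hits))  # prints [0, 4, 11] 3
```

KMP never moves backward in the text. It therefore runs in O(n + m) time for a text of length n and a pattern of length m, which makes it a reasonable baseline for long DNA sequences.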
Description:
This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering of East West University, Dhaka, Bangladesh.