The Sequence object is one of the basic topics in Biopython. In this chapter, we study what a Sequence object is and use it to handle a target gene sequence.
For practice, we use a 'TATA box' sequence.
import Bio
# 1. Create Sequence Object
from Bio.Seq import Seq
tatabox_seq = Seq("tataaaggcAATATGCAGTAG")
print(tatabox_seq)
print(type(tatabox_seq))
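A Sequence object can also carry information about what type of sequence it holds (DNA, RNA or protein). In older Biopython releases this was added with the Alphabet module. Note that Bio.Alphabet was removed in Biopython 1.78, so the next cell only runs on earlier versions.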
#2. Alphabet Module
from Bio.Alphabet import IUPAC  # raises ImportError on Biopython 1.78 and later
tatabox_seq = Seq("tataaaggcAATATGCAGTAG", IUPAC.unambiguous_dna)
print(tatabox_seq)
print(type(tatabox_seq))
Besides the DNA alphabets, the IUPAC module contains alphabet objects for RNA and protein sequences (for example IUPAC.unambiguous_rna and IUPAC.protein).
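Since Bio.Alphabet no longer exists in current Biopython, one way to keep the kind of constraint that IUPAC.unambiguous_dna described is a small check of your own. The sketch below is an assumption, not a Biopython API: the function name `is_unambiguous_dna` is hypothetical, it accepts only the four unambiguous DNA letters in either case, and it treats an empty sequence as invalid.

```python
# Hypothetical helper, not a Biopython API: checks the kind of constraint that
# IUPAC.unambiguous_dna described (only A, C, G and T allowed).
UNAMBIGUOUS_DNA = frozenset("ACGT")


def is_unambiguous_dna(seq) -> bool:
    """Return True if seq is non-empty and contains only A, C, G, T (any case)."""
    text = str(seq).upper()  # works for plain str and for Bio.Seq.Seq objects
    return len(text) > 0 and set(text) <= UNAMBIGUOUS_DNA


print(is_unambiguous_dna("tataaaggcAATATGCAGTAG"))  # True
print(is_unambiguous_dna("ATGNNNTAG"))              # False: N is an ambiguity code
print(is_unambiguous_dna("AUGCAGUAG"))              # False: U belongs to RNA
```

In Biopython 1.78 and later, the molecule type is instead recorded on a SeqRecord through its `molecule_type` annotation rather than on the Seq itself.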
Now that we have a Sequence object, we can call its methods.
#3. Count Bases in a Sequence
from Bio.Seq import Seq
my_seq = Seq("ATGCAGTAG")
count_a = my_seq.count("A")
print(count_a)  # number of adenine (A) bases
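count() answers one base at a time. To tally every base in a single pass, the standard library's `collections.Counter` can be applied to the sequence text. This is a minimal sketch: `base_counts` is a hypothetical helper name, and it upper-cases the text first because counting is case-sensitive.

```python
from collections import Counter


def base_counts(seq) -> Counter:
    """Count each base in seq, ignoring case (A and a are counted together)."""
    return Counter(str(seq).upper())  # str() also accepts a Bio.Seq.Seq object


counts = base_counts("ATGCAGTAG")
print(counts["A"], counts["C"], counts["G"], counts["T"])  # 3 1 3 2
```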
You can also calculate the GC content (%), the percentage of bases in the sequence that are G or C:
GC content (%) = (count_G + count_C) / count_totalbase × 100
#4. Calculate the GC Content of the Sequence Object
count_c = my_seq.count("C")
count_g = my_seq.count("G")
count_totalbase = len(my_seq)
GC_contents = ((count_c + count_g)/count_totalbase)*100
print(GC_contents)
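count() is case-sensitive, so applying the formula above directly to a mixed-case sequence such as tatabox_seq would skip the lowercase bases. The sketch below (the name `gc_content` is hypothetical) upper-cases the sequence first and refuses an empty one. Recent Biopython releases also provide `Bio.SeqUtils.gc_fraction`, which returns a fraction rather than a percentage; check that it exists in your installed version before relying on it.

```python
def gc_content(seq) -> float:
    """Return the GC content of a DNA sequence as a percentage (0-100).

    Hypothetical helper; works on plain strings and on Bio.Seq.Seq objects.
    """
    text = str(seq).upper()  # count 'g'/'c' in lowercase regions as well
    if not text:
        raise ValueError("cannot compute GC content of an empty sequence")
    gc = text.count("G") + text.count("C")
    return gc / len(text) * 100


print(gc_content("ATGCAGTAG"))              # 4 of 9 bases are G/C
print(gc_content("tataaaggcAATATGCAGTAG"))  # 7 of 21 bases are G/C
```

A case-sensitive count on tatabox_seq would find only 4 of its 7 G/C bases, because the `ggc` near the start is lowercase.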
#5. Convert a Sequence Object to Upper- or Lowercase
tatabox_seq = Seq("tataaaggcAATATGCAGTAG")
print(tatabox_seq.upper())
print(tatabox_seq.lower())
DNA is transcribed into mRNA, and mRNA is translated into protein. This is known as the central dogma of molecular biology.
#6. Transcribing and Translating Sequence Objects
my_dna = Seq("ATGCAGTAGACT")
my_mrna = my_dna.transcribe()
my_protein = my_dna.translate()
print(my_mrna)
print(my_protein)
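transcribe() treats the DNA you give it as the coding (sense) strand, so it simply replaces T with U. In the cell, RNA polymerase reads the template strand, and the mRNA is then the reverse complement of that strand with T swapped for U. The plain-Python sketch below shows both cases; the function names are hypothetical and not part of Biopython (with a Seq object, the template case corresponds to calling reverse_complement() and then transcribe()).

```python
# Map each DNA base to its complement, keeping the original case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
# Transcribing the coding strand only swaps thymine for uracil.
_T_TO_U = str.maketrans("Tt", "Uu")


def transcribe_coding(coding_strand: str) -> str:
    """mRNA from the coding (sense) strand: same sequence with T -> U."""
    return coding_strand.translate(_T_TO_U)


def transcribe_template(template_strand: str) -> str:
    """mRNA from the template (antisense) strand: reverse complement, then T -> U."""
    coding_strand = template_strand.translate(_COMPLEMENT)[::-1]
    return transcribe_coding(coding_strand)


print(transcribe_coding("ATGCAGTAGACT"))    # AUGCAGUAGACU
print(transcribe_template("AGTCTACTGCAT"))  # AUGCAGUAGACU
```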
In the cell, translation ends at a stop codon, but translate() by default keeps going and writes each stop codon as '*'. Passing to_stop=True ends the translation at the first stop codon.
#7. Stop Translation at the First Stop Codon
my_mrna = Seq("AUGAACUAAGUUUAGAAU")
my_protein = my_mrna.translate()
my_protein_stop = my_mrna.translate(to_stop=True)
print(my_protein)
print(my_protein_stop)
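To make the default and to_stop=True behaviours concrete, here is a plain-Python sketch of translation with the standard genetic code (NCBI translation table 1). It is an illustration, not Biopython's implementation: the function name `translate` and its `to_stop` argument only mirror the translate() call above, U is treated as T so the same code handles DNA and mRNA, and trailing bases that do not form a full codon are ignored (Biopython emits a warning in that case).

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated with the bases in
# T, C, A, G order, first position varying slowest; '*' marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(seq: str, to_stop: bool = False) -> str:
    """Translate DNA or mRNA codon by codon, starting at the first base.

    With to_stop=True, translation ends at the first stop codon (not included);
    otherwise each stop is written as '*'. Ambiguous bases such as N raise KeyError.
    """
    text = seq.upper().replace("U", "T")  # read mRNA with the DNA codon table
    protein = []
    for i in range(0, len(text) - len(text) % 3, 3):  # skip an incomplete last codon
        aa = STANDARD_CODE[text[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGAACUAAGUUUAGAAU"))                # MN*V*N
print(translate("AUGAACUAAGUUUAGAAU", to_stop=True))  # MN
```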
#8. Split the Translation at Stop Codons
my_mrna = Seq("AUGAACUAAGUUUAGAAU")
my_protein = my_mrna.translate()
print(my_protein)
for seq in my_protein.split('*'):
    print(seq)  # each peptide fragment between stop codons
In double-stranded DNA, adenine pairs with thymine through two hydrogen bonds, and guanine pairs with cytosine through three hydrogen bonds. This is called complementary base pairing.
#9-1. Create the Complement and Reverse Complement of a DNA Sequence in Plain Python
my_dna = "TATAAAGGCAATATGCAGTAG"
comp_dic = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}  # map each base to its complementary base
comp_seq = ""
for base in my_dna:
    comp_seq += comp_dic[base]
revcomp_seq = comp_seq[::-1]  # reverse the complement to get the reverse complement
print(comp_seq)
print(revcomp_seq)
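The dictionary loop above raises a KeyError on lowercase bases such as the 'tataaa' part of tatabox_seq. A more idiomatic plain-Python variant uses str.maketrans() and str.translate(), which handle both cases in one pass; the function names below are illustrative, and characters outside ACGT (for example N) are passed through unchanged.

```python
# Translation table mapping each base to its complement, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna: str) -> str:
    """Return the base-by-base complement of dna (unknown characters unchanged)."""
    return dna.translate(_COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Return the complement read in reverse, i.e. the other strand read 5' -> 3'."""
    return complement(dna)[::-1]


my_dna = "TATAAAGGCAATATGCAGTAG"
print(complement(my_dna))               # ATATTTCCGTTATACGTCATC
print(reverse_complement(my_dna))       # CTACTGCATATTGCCTTTATA
print(reverse_complement("tataaaggc"))  # gcctttata
```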
#9-2. Create the Complement and Reverse Complement of a DNA Sequence with Biopython
my_dna = Seq("TATAAAGGCAATATGCAGTAG")
comp_seq = my_dna.complement()
revcomp_seq = my_dna.reverse_complement()
print(comp_seq)
print(revcomp_seq)
Transcription of DNA produces mRNA. Translation reads the mRNA three bases (one codon) at a time and adds the corresponding amino acid according to a codon table. You can print codon tables with Biopython.
#10-1. Standard Codon Table
from Bio.Data import CodonTable
codon_table = CodonTable.unambiguous_dna_by_name["Standard"] #standard codon table
print(codon_table)
#10-2. Vertebrate Mitochondrial Codon Table
mito_codon_table = CodonTable.unambiguous_dna_by_name["Vertebrate Mitochondrial"]
print(mito_codon_table)
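The printed tables are easier to compare as data. The sketch below builds the standard code (NCBI table 1) as a plain dict and derives the vertebrate mitochondrial code (NCBI table 2) from it by applying its four codon reassignments; the dict representation is only an illustration. In Biopython, the same information is available on the table objects through attributes such as `forward_table` (codon to amino acid) and `stop_codons`.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in T, C, A, G order; '*' = stop.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
standard = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)}

# Vertebrate mitochondrial code (NCBI table 2): four codons are reassigned.
vertebrate_mito = dict(standard, TGA="W", ATA="M", AGA="*", AGG="*")

differences = sorted(c for c in standard if standard[c] != vertebrate_mito[c])
for codon in differences:
    print(codon, standard[codon], "->", vertebrate_mito[codon])
# AGA R -> *
# AGG R -> *
# ATA I -> M
# TGA * -> W

mito_stops = sorted(c for c, aa in vertebrate_mito.items() if aa == "*")
print(mito_stops)  # ['AGA', 'AGG', 'TAA', 'TAG']
```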
An ORF (Open Reading Frame) is a stretch of sequence that may be translated into a protein: it starts with the start codon ATG and ends with a stop codon.
Therefore, finding an ORF means finding the sequence from a start codon up to and including a stop codon in the same reading frame.
#11. Find Open Reading Frame
tatabox_seq = Seq("tataaaggcAATATGCAGTAG")
start_idx = tatabox_seq.find("ATG")
end_idx = tatabox_seq.find("TAG", start_idx)  # TAG is one of three stop codons (TAA, TAG, TGA); find() does not check the reading frame
orf = tatabox_seq[start_idx:end_idx+3]  # +3 so the slice includes the stop codon
print(orf)
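The find()-based cell is a quick demonstration, but it only looks for TAG and does not check that the stop codon lies in the same reading frame as the ATG; here the result happens to be correct because the TAG sits exactly two codons after the start. A slightly more careful plain-Python sketch, using a hypothetical `find_orfs` function, walks codon by codon from each ATG and stops at the first in-frame TAA, TAG or TGA, ignoring case. It scans only the forward strand and also reports ORFs that start at internal ATGs.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str) -> List[str]:
    """Return every ATG...stop stretch on the forward strand (stop codon included).

    Hypothetical helper: each ATG is followed in its own reading frame until the
    first in-frame stop codon; ATGs with no in-frame stop are skipped.
    """
    text = seq.upper()
    orfs = []
    for start in range(len(text) - 2):
        if text[start:start + 3] != "ATG":
            continue
        for pos in range(start, len(text) - 2, 3):  # step one codon at a time
            if text[pos:pos + 3] in STOP_CODONS:
                orfs.append(text[start:pos + 3])
                break
    return orfs


print(find_orfs("tataaaggcAATATGCAGTAG"))  # ['ATGCAGTAG']
# find("TAG") would stop at index 4 here, which is out of frame;
# the in-frame stop codon is the final TAA.
print(find_orfs("ATGCTAGCTTAA"))           # ['ATGCTAGCTTAA']
```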