Using the Bio.SeqUtils module

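The Bio.SeqUtils module provides several helpers:

- the simple GC-content calculation we did by hand in Chapter 4-1
- the molecular weight of a sequence
- every amino acid sequence that can be translated from a DNA sequence, shown together in one layout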

from Bio.Seq import Seq
from Bio.SeqUtils import GC  # newer Biopython releases deprecate GC in favour of gc_fraction (returns 0-1, not %)

#1. Calculate GC content with Bio.SeqUtils
exon_seq = Seq("ATGCAGTAG")
gc_contents = GC(exon_seq)  # GC content in percent, computed by Bio.SeqUtils
# Without Bio.SeqUtils you would compute (exon_seq.count("G") + exon_seq.count("C")) / len(exon_seq) * 100

print(gc_contents)

44.44444444444444
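For comparison, here is the same calculation written out by hand in plain Python, with no Biopython dependency. `gc_percent` is a hypothetical helper name. This sketch only counts the letters G and C, so ambiguity codes are not handled.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    # Add the two counts first, then divide by the length
    return (seq.count("G") + seq.count("C")) / len(seq) * 100


print(gc_percent("ATGCAGTAG"))  # 4 of 9 bases are G/C, about 44.44
```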

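#2. Calculate the molecular weight of a sequence with Bio.SeqUtils
# Bio.Alphabet's IUPAC declares what kind of sequence it is -> molecular_weight from Bio.SeqUtils weighs it accordingly
# (Bio.Alphabet was removed in Biopython 1.78; on newer versions pass seq_type="protein" etc. to molecular_weight instead)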
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
from Bio.SeqUtils import molecular_weight

seq1 = Seq("ATGCAGTAG")  # no alphabet given: weighed as DNA, same as seq2
seq2 = Seq("ATGCAGTAG", IUPAC.unambiguous_dna)
seq3 = Seq("ATGCAGTAG", IUPAC.protein)

print(molecular_weight(seq1))
print(molecular_weight(seq2))
print(molecular_weight(seq3))

2842.8206999999993
2842.8206999999993
707.7536
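To see where these numbers come from, the sketch below sums an average mass per monomer. It then subtracts one water molecule for each bond formed between neighbouring monomers in a linear chain.

The weight tables are an assumption: they are average monomer masses that reproduce the three outputs above. The protein table only lists the four letters needed for this example, and `linear_weight` is a hypothetical helper name.

```python
# Average monomer masses in daltons (assumed; they reproduce the outputs above)
DNA_WEIGHTS = {"A": 331.2218, "C": 307.1971, "G": 347.2212, "T": 322.2085}
# Only the residues needed for "ATGCAGTAG" read as a protein
PROTEIN_WEIGHTS = {"A": 89.0932, "C": 121.1582, "G": 75.0666, "T": 119.1192}
WATER = 18.0153  # average mass of one H2O molecule


def linear_weight(seq: str, table: dict) -> float:
    """Sum monomer masses and remove one water per bond in a linear chain."""
    seq = seq.upper()
    total = sum(table[letter] for letter in seq)
    return total - (len(seq) - 1) * WATER


print(linear_weight("ATGCAGTAG", DNA_WEIGHTS))      # about 2842.82
print(linear_weight("ATGCAGTAG", PROTEIN_WEIGHTS))  # about 707.75
```

The same nine letters read as amino acids instead of nucleotides have much smaller monomer masses. That is why seq3 comes out at roughly a quarter of the DNA weight.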

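The six_frame_translations function in Bio.SeqUtils gives all six possible translations of a DNA sequence:

- three on the forward strand, shifting the reading frame by one base each time
- three on the reverse strand, shifted the same way

It returns a formatted text block, so print the result.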

#3. Get all possible translations with Bio.SeqUtils
from Bio.Seq import Seq
from Bio.SeqUtils import six_frame_translations

my_seq = Seq("ATGCCTTGAAATGTATAG")
print(six_frame_translations(my_seq))

GC_Frame: a:6 t:6 g:4 c:2 
Sequence: atgccttgaaatgtatag, 18 nt, 33.33 %GC


1/1
  A  L  K  C  I
 C  L  E  M  Y
M  P  *  N  V  *
atgccttgaaatgtatag   33 %
tacggaactttacatatc
G  Q  F  T  Y 
 H  R  S  I  Y  L
  A  K  F  H  I
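The six frames can also be derived by hand. Translate the forward strand starting at offsets 0, 1 and 2, then do the same on the reverse complement.

Below is a minimal sketch using the standard genetic code; the helper names are hypothetical. Note that six_frame_translations prints the reverse frames right to left under the complementary strand. So the frame returned here as "LYISRH" shows up in the output above as "H  R  S  I  Y  L".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons in the order TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_frame(seq: str, offset: int) -> str:
    """Translate whole codons starting at offset; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


def six_frames(seq: str) -> list:
    """Three forward frames followed by three reverse-strand frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [translate_frame(strand, k) for strand in (seq, rc) for k in range(3)]


print(six_frames("ATGCCTTGAAATGTATAG"))
# ['MP*NV*', 'CLEMY', 'ALKCI', 'LYISRH', 'YTFQG', 'IHFKA']
```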

The MeltingTemp module of Bio.SeqUtils calculates the melting temperature (Tm) of DNA. Tm is the temperature at which half of a DNA double helix has separated into single strands.

*G-C base pairs bond more strongly than A-T pairs (three hydrogen bonds instead of two), so the higher the GC content, the higher the Tm.

#4. calculate melting temperature

from Bio.Seq import Seq
from Bio.SeqUtils import MeltingTemp as mt

my_seq = Seq("AGTCTGGGACGGCGGCGCGGCAATCGCA")
print(mt.Tm_Wallace(my_seq))

96.0
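The Wallace rule behind Tm_Wallace is simple enough to check by hand: Tm = 4 °C × (G + C) + 2 °C × (A + T). The sequence above has 20 G/C and 8 A/T bases, giving 4 × 20 + 2 × 8 = 96.

Below is a plain-Python sketch; `tm_wallace` is a hypothetical name. The rule is a rough estimate generally meant for short oligonucleotides. MeltingTemp also offers Tm_GC and Tm_NN for more detailed models.

```python
def tm_wallace(seq: str) -> float:
    """Wallace rule: 4 degC per G or C plus 2 degC per A or T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4.0 * gc + 2.0 * at


print(tm_wallace("AGTCTGGGACGGCGGCGCGGCAATCGCA"))  # 96.0
```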

Bio.SeqUtils also converts between amino acid notations:

- seq1 turns three-letter codes into one-letter codes.
- seq3 turns one-letter codes into three-letter codes.

#5-1. Convert amino acid 3-letter codes to 1-letter codes (seq1)
from Bio.SeqUtils import seq1, seq3

essential_amino_acid_3 = "LeuLysMetValIleThrTrpPhe"
print(seq1(essential_amino_acid_3))

LKMVITWF

#5-2. Convert amino acid 1-letter codes to 3-letter codes (seq3)
from Bio.SeqUtils import seq1, seq3

essential_amino_acid_1 = "LKMVITWF"
print(seq3(essential_amino_acid_1))

LeuLysMetValIleThrTrpPhe
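Conceptually, seq1 and seq3 are lookups in a three-letter/one-letter table. Below is a minimal sketch with hypothetical helper names. It covers only the 20 standard amino acids, so anything else raises KeyError.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
# Invert the table for the one-letter -> three-letter direction
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(seq: str) -> str:
    """'LeuLys...' -> 'LK...': read the string three characters at a time."""
    return "".join(THREE_TO_ONE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def to_three_letter(seq: str) -> str:
    """'LK...' -> 'LeuLys...': expand each one-letter code."""
    return "".join(ONE_TO_THREE[aa] for aa in seq)


print(to_one_letter("LeuLysMetValIleThrTrpPhe"))  # LKMVITWF
print(to_three_letter("LKMVITWF"))                 # LeuLysMetValIleThrTrpPhe
```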

Practice (exercises)

#### 1. Create a Seq object from the following sequence and convert it to uppercase

from Bio.Seq import Seq

my_seq = Seq("aagtGACAGggatTG")
print(my_seq.upper())

AAGTGACAGGGATTG

#### 2. Translate the following sequence up to the first stop codon
from Bio.Seq import Seq

my_seq = Seq("AAGTGACAGGGATTG")
my_protein = my_seq.translate(to_stop=True)  # stop at the first stop codon (AAG TGA -> K, then stop)

print(my_protein)

K

#### 3. Calculate the GC content and melting temperature (Tm) of the reverse complement of the following sequence
from Bio.Seq import Seq
from Bio.SeqUtils import GC, MeltingTemp

my_seq = Seq("AAGTGACAGGGATTG")
my_reverse_seq = my_seq.reverse_complement()

print(GC(my_reverse_seq))
print(MeltingTemp.Tm_Wallace(my_reverse_seq))

46.666666666666664
44.0
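Why do the answers come out this way? Taking the reverse complement swaps G with C and A with T, so the G+C count and the A+T count are unchanged. Both the GC content and the Wallace Tm therefore equal those of the original sequence:

- GC content: 7 G/C out of 15 bases gives 46.67 %.
- Wallace Tm: 4 × 7 + 2 × 8 = 44.

Below is a plain-Python check with hypothetical helper names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) * 100


def tm_wallace(seq: str) -> float:
    """Wallace rule: 4 degC per G/C plus 2 degC per A/T."""
    seq = seq.upper()
    return 4.0 * (seq.count("G") + seq.count("C")) + 2.0 * (seq.count("A") + seq.count("T"))


seq = "AAGTGACAGGGATTG"
rc = reverse_complement(seq)
print(rc)                                 # CAATCCCTGTCACTT
print(gc_percent(rc) == gc_percent(seq))  # True: G+C count is unchanged
print(tm_wallace(rc), tm_wallace(seq))    # 44.0 44.0
```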

Grace's Tech Blog

6. Chapter4-2. Gene Sequences - Sequence object
