Common Biological Databases and Data Formats

Contents
  Background on biological databases
  Common data formats
    FASTA, FASTQ, GFF, GenBank
  Common sequence databases
    National Center for Biotechnology Information (NCBI)
    European Bioinformatics Institute (EBI)
    DDBJ
  Common gene function databases
    Gene Ontology (GO)
    Kyoto Encyclopedia of Genes and Genomes (KEGG)
    InterPro protein function database
  Common genome databases
    UCSC Genome Browser
    Ensembl genome annotation database

[Figure labels: Sequence, InterPro]

There is a lot of data, there are many data formats, and there are many databases. How do we find the database we want?
The latest list of biological databases is published in Nucleic Acids Research.

Common data formats
  FASTA format
  FASTQ format
  GenBank format
  EMBL format
  GFF format

FASTA format
  The description line starts with the ">" delimiter.
  Sequence lines usually hold 50-100 characters each.
  There is no standard file extension.

FASTQ sequence format
  Similar to the FASTA format.
  One sequence record usually occupies four lines.
  The sequence and the quality values each take one line.

GenBank File Format

LOCUS       SCU49845     5028 bp    DNA     linear   PLN 21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds; and Axl2p
            (AXL2) and Rev7p (REV7) genes, complete cds.
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
KEYWORDS    .
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
            Saccharomycetales; Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 5028)
  AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
  TITLE     Cloning and sequence of REV7, a gene whose function is required for
            DNA damage-induced mutagenesis in Saccharomyces cerevisiae
  JOURNAL   Yeast 10 (11), 1503-1509 (1994)
  PUBMED    7871890
...
FEATURES             Location/Qualifiers
     CDS             <1..206
                     /codon_start=3
                     /product="TCP1-beta"
                     /protein_id="AAA98665.1"
                     /db_xref="GI:1293614"
                     /translation="SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEA
                     AEVLLRVDNIIRARPRTANRQHM"
     gene            687..3158
                     /gene="AXL2"
...
ORIGIN
        1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
       61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
...
     4981 tgccatgact cagattctaa ttttaagcta ttcaatttct ctttgatc
//
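Returning to the FASTA and FASTQ conventions listed earlier (a ">" description line followed by wrapped sequence lines, and four lines per FASTQ record), a minimal Python sketch of readers for both formats might look like the following. The function names and the demo records are illustrative assumptions, not part of the slides, and the FASTQ reader assumes the strict four-line layout with no wrapped sequence lines.

```python
from typing import Iterator, List, Tuple

# Hypothetical demo data, made up for illustration only.
FASTA_DEMO = [
    ">seq1 example record",
    "gatcctccat",
    "atacaacggt",
    ">seq2",
    "ccgacatgag",
]
FASTQ_DEMO = ["@read1", "GATC", "+", "IIII"]


def read_fasta(lines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence may be wrapped over several lines; join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from strict four-line FASTQ records."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(rows) - 3, 4):
        name, seq, plus, qual = rows[i:i + 4]
        if not name.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at row {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {name}")
        yield name[1:], seq, qual


if __name__ == "__main__":
    for description, sequence in read_fasta(FASTA_DEMO):
        print(description, len(sequence))
    for name, sequence, quality in read_fastq(FASTQ_DEMO):
        print(name, sequence, quality)
```

Reading FASTQ in fixed groups of four lines, rather than looking for "@" at the start of a line, avoids misreading a quality line that happens to begin with "@".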
A GBFF (GenBank flat file) record is divided into three parts:
  The header contains information about the record as a whole (the descriptors).
  The second part contains the features that annotate the record.
  The third part is the nucleotide sequence itself.
Every sequence database record ends with "//" on its last line.
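The three-part layout can be sketched in Python by switching sections at the FEATURES and ORIGIN keywords and stopping at the "//" terminator. This is only an illustration: `split_gbff` and `sequence_string` are hypothetical helper names, and `RECORD` is an abridged copy of the U49845 record (shortened DEFINITION, one feature, two sequence lines), not the full entry.

```python
from typing import Dict, List

# Abridged from the U49845 sample record; shortened for illustration.
RECORD = """\
LOCUS       SCU49845     5028 bp    DNA     linear   PLN 21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds.
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
FEATURES             Location/Qualifiers
     gene            687..3158
                     /gene="AXL2"
ORIGIN
        1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
       61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
//
"""


def split_gbff(text: str) -> Dict[str, List[str]]:
    """Split one flat-file record into header, features and sequence lines."""
    parts: Dict[str, List[str]] = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("//"):
            break  # record terminator
        if line.startswith("FEATURES"):
            current = "features"
        elif line.startswith("ORIGIN"):
            current = "sequence"
        parts[current].append(line)
    return parts


def sequence_string(sequence_lines: List[str]) -> str:
    """Drop the ORIGIN keyword line, position numbers and spaces."""
    return "".join(
        ch for line in sequence_lines[1:] for ch in line if ch.isalpha()
    )


if __name__ == "__main__":
    parts = split_gbff(RECORD)
    print(len(parts["header"]), "header lines")
    print(len(parts["features"]), "feature lines")
    print(len(sequence_string(parts["sequence"])), "bases in the abridged sequence")
```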
GBFF format description

GBFF: LOCUS
Every GBFF record starts with the LOCUS line.
  The first item is the LOCUS name (SCU49845). Its only remaining role is to be unique within the database; it no longer carries any practical meaning. In most cases the accession number is simply reused to satisfy the requirement for a LOCUS name.
  The second item is the sequence length (5028 bp). A single database record may not exceed 350 kb. Apart from historical cases, GenBank now rarely accepts sequences shorter than 50 bp.
  The third item is the molecule type (DNA). A sequence must consist of a single molecule type.
  The fourth item is the GenBank division code (PLN), made up of three letters. Today it serves only as a simple classification when downloading the database.
  The last item is the date of last modification (21-JUN-1999). Sometimes it only indicates the date on which the data was first made public.

GenBank division codes
[table not preserved]
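The LOCUS items described above can be pulled apart with a few lines of Python. This is a rough sketch under a simplifying assumption: it splits on whitespace and expects exactly the eight tokens seen in the sample line (including the topology token "linear", which the item list does not discuss), whereas real flat files are laid out in fixed columns and may omit some fields. `Locus` and `parse_locus` are hypothetical names.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str        # LOCUS name, e.g. SCU49845
    length_bp: int   # sequence length in base pairs
    molecule: str    # molecule type, e.g. DNA
    topology: str    # e.g. linear
    division: str    # three-letter GenBank division code, e.g. PLN
    date: str        # date of last modification


def parse_locus(line: str) -> Locus:
    """Parse a LOCUS line that carries all eight whitespace-separated tokens."""
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS line layout: {line!r}")
    _, name, length, _, molecule, topology, division, date = tokens
    return Locus(name, int(length), molecule, topology, division, date)


if __name__ == "__main__":
    sample = "LOCUS       SCU49845     5028 bp    DNA     linear   PLN 21-JUN-1999"
    print(parse_locus(sample))
```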
GBFF: DEFINITION
The line after the LOCUS line is the DEFINITION line. It summarizes the biological meaning of the GenBank record. Its content includes the source organism and the gene or protein name. If the sequence is a non-coding region, the line gives a brief description of the sequence's function; if it is a coding region, the line states whether the sequence is partial (partial cds) or complete (complete cds).
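As a small illustration of the partial/complete convention, the following Python sketch scans a DEFINITION text for the phrases "partial cds" and "complete cds". The helper name `cds_status` is hypothetical, and the approach is only a plain text search; it does not attempt to tie each status to a particular gene.

```python
import re
from typing import List

# DEFINITION text of the U49845 record, joined onto one line.
DEFINITION = (
    "Saccharomyces cerevisiae TCP1-beta gene, partial cds; and Axl2p "
    "(AXL2) and Rev7p (REV7) genes, complete cds."
)


def cds_status(definition: str) -> List[str]:
    """Return 'partial' / 'complete' markers in the order they appear."""
    return re.findall(r"\b(partial|complete) cds\b", definition)


if __name__ == "__main__":
    print(cds_status(DEFINITION))  # ['partial', 'complete']
```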
GBFF: ACCESSION
The accession number is the unique pointer to a sequence record. It usually consists of one letter followed by five digits (U12345) or two letters followed by six digits (AF123456). It is unique within the database and never changes. Sometimes several accession numbers appear on the ACCESSION line, possibly because the submitter deposited a new record related to the original one, or because a newly submitted record replaced the old one. The first accession number is called the primary accession number; the rest are collectively called secondary accession numbers.
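The two accession patterns named above, and the split into primary and secondary accession numbers, can be sketched in Python as follows. The names `ACCESSION_RE`, `is_classic_accession` and `split_accessions` are hypothetical, and the pattern deliberately covers only the two layouts mentioned here; other accession layouts exist and would be rejected by this check.

```python
import re
from typing import List, Tuple

# One letter + five digits, or two letters + six digits.
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")


def is_classic_accession(token: str) -> bool:
    """True if the token matches one of the two patterns described above."""
    return ACCESSION_RE.fullmatch(token) is not None


def split_accessions(line: str) -> Tuple[str, List[str]]:
    """Return (primary, secondary accessions) from an ACCESSION line."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "ACCESSION":
        raise ValueError(f"not an ACCESSION line: {line!r}")
    return tokens[1], tokens[2:]


if __name__ == "__main__":
    print(is_classic_accession("U49845"))
    print(split_accessions("ACCESSION   U49845"))
    # The second line below is a made-up example with a secondary accession.
    print(split_accessions("ACCESSION   AF123456 U12345"))
```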
GBFF: VERSION
The VERSION line gives the sequence version in the format accession.version. The version identifies one specific nucleotide sequence in the database. If the sequence data changes, even by a single base, the version number is incremented while the accession number stays the same. The version system runs in parallel with the GI (GenInfo Identifier) number system that follows it on the line: when a sequence changes, it is given a new GI number and its version number is incremented. Protein translations carry GI numbers of their own, and a translation that changes in any way is assigned a new GI number.
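The accession.version layout and the trailing GI number can be separated with a short Python sketch. `Version` and `parse_version` are hypothetical names; the parser assumes a whitespace-separated line like the one in the sample record and treats the GI token as optional.

```python
from typing import NamedTuple, Optional


class Version(NamedTuple):
    accession: str      # stays fixed across sequence updates
    version: int        # incremented whenever the sequence changes
    gi: Optional[int]   # GI number, if present on the line


def parse_version(line: str) -> Version:
    """Parse a VERSION line such as 'VERSION     U49845.1  GI:1293613'."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "VERSION":
        raise ValueError(f"not a VERSION line: {line!r}")
    accession, _, version = tokens[1].rpartition(".")
    if not accession or not version.isdigit():
        raise ValueError(f"expected accession.version, got {tokens[1]!r}")
    gi = None
    for token in tokens[2:]:
        if token.startswith("GI:"):
            gi = int(token[3:])
    return Version(accession, int(version), gi)


if __name__ == "__main__":
    print(parse_version("VERSION     U49845.1  GI:1293613"))
```

Keeping the accession and the version as separate fields makes it easy to check that two records share an accession but differ in sequence version.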
GBFF: KEYWORDS
LOCUS       SCU49845