I need to create a program that takes a text file in FASTA format and, for each sequence, reports the sequence name, the domain names, and the start and end of each domain.

A FASTA file is just a text file that looks like this:

>MICE_8
ATTCGATCGATCGATTTCGATCGATCGATCGATCGGGATCGATCGATCGATCGATC
>MICE_59 
ATTTTTCGGCATCGATAGCTAGCTAGCTAG

My program needs to take one command-line argument, the name of the FASTA file, and produce output like this:

MICE_8 gnl|CDD|256537 819 923 gnl|CDD|260076 111 189 gnl|CDD|260056 4 93                                          
MICE_59

Here is a description of the output for more information:

  • MICE_8 is the name of the first sequence in the FASTA file
  • gnl|CDD|256537 is the name of the first protein domain
  • 819 is where the domain starts
  • 923 is where it ends
  • gnl|CDD|260076 is the name of the second protein domain for the first sequence; it starts at position 111 and ends at position 189, and so on.
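
For reference, the fields described above can be read straight from one line of tabular BLAST output. This is a minimal sketch, assuming the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). `parse_hit` is a hypothetical helper name, and the sample line is made up for illustration: only the names and the start/end positions come from the example above.

```python
def parse_hit(line):
    """Return (sequence_name, domain_name, start, end) from one tabular BLAST line."""
    fields = line.split()
    # In the default -outfmt 6 layout, columns 0, 1, 6 and 7 are
    # qseqid, sseqid, qstart and qend.
    return fields[0], fields[1], int(fields[6]), int(fields[7])


# Illustrative line only: the identity, length, e-value, etc. are invented.
sample = "MICE_8\tgnl|CDD|256537\t30.5\t105\t60\t3\t819\t923\t1\t98\t1e-10\t55.1"
print(parse_hit(sample))
```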

Also, since the last sequence did not get a hit, the program still needs to display the name of that sequence.

OK, so here is my code so far:

import sys
import os

fastaname = sys.argv[1]
rpsblastname = "rpsblast.out"

cmd = "rpsblast+ -db /home/bryan/data/cdd/cdd -query %s -outfmt 6 -evalue 0.05 > %s" % (fastaname,rpsblastname)
os.system(cmd)

handle = open(rpsblastname, "r")
seqname = ""
for line in handle:
    linearr = line.split()
    # seqname = linearr [0]
    domain = linearr[1]
    start = linearr[6]
    end = linearr[7]
    # If sequence name is the same as last time, don't print it
    if seqname == linearr[0]:
        sys.stdout.write("%s %s %s" % (domain, start, end))
    # Otherwise do print the sequence name, and update seqname
    else:
        seqname = linearr[0]
        print
        sys.stdout.write("%s %s %s %s" % (seqname,domain,start,end))

Here is what my output looks like so far:

mel@roswald:~$ ./Domainfinder.py bioinformation.fasta 

MICE_8 gnl|CDD|256537 819 923gnl|CDD|260076 111 189gnl|CDD|260056 4 93                                                                                                         

The program I created is almost up to the required specification. I only have 3 problems that I need to address:

  1. There is an extra blank line between where I run the program and the result.
  2. My program does not write out the name of a sequence that has zero hits.
  3. My program does not separate the domain names with a space.

The correct output should look like this:

mel@roswald:~$ ./Domainfinder.py bioinformation.fasta                                        
MICE_8 gnl|CDD|256537 819 923 gnl|CDD|260076 111 189 gnl|CDD|260056 4 93                                          
MICE_59

1 Solution

Solved the issue. Mainly, what needed to be done was to use a dictionary with the sequence names as keys instead of using lists. Then, since dictionaries are unordered in this version of Python, we need to build a list from the dictionary so that the sequence names are printed in the order they were read. We also extract the sequence names from the rpsblast output. If anyone has any questions, feel free to PM me.
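
The answer does not include its code, so the following is only a sketch of the approach it describes, not the poster's actual solution. It is written for Python 3 and makes one labeled assumption that differs from the answer: the ordered list of sequence names is read from the FASTA headers rather than from the rpsblast output, so that sequences with zero hits are still listed. The function names (`fasta_names`, `collect_hits`, `main`) are hypothetical; the `rpsblast+` command and database path are copied from the question.

```python
import subprocess
import sys


def fasta_names(lines):
    """Return sequence names (first word after '>') in the order they appear."""
    names = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.append(parts[0])
    return names


def collect_hits(names, blast_lines):
    """Return one output line per sequence: name, then domain/start/end triples."""
    order = list(names)                 # list that remembers the reading order
    hits = {name: [] for name in order}  # dictionary keyed by sequence name
    for line in blast_lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        seqname = fields[0]
        if seqname not in hits:  # hit for a name not seen in the FASTA file
            hits[seqname] = []
            order.append(seqname)
        hits[seqname].extend([fields[1], fields[6], fields[7]])
    # Joining with a single space separates every field, and a sequence
    # with no hits yields just its name.
    return [" ".join([name] + hits[name]) for name in order]


def main(fasta_path):
    # Command and database path as given in the question; adjust as needed.
    cmd = ["rpsblast+", "-db", "/home/bryan/data/cdd/cdd",
           "-query", fasta_path, "-outfmt", "6", "-evalue", "0.05"]
    blast_output = subprocess.check_output(cmd, universal_newlines=True)
    with open(fasta_path) as handle:
        names = fasta_names(handle)
    # Printing the joined lines avoids the leading blank line from the
    # bare print in the original loop.
    print("\n".join(collect_hits(names, blast_output.splitlines())))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        sys.stderr.write("usage: Domainfinder.py <fasta file>\n")
```

Run as `python3 Domainfinder.py bioinformation.fasta`. Building each line in memory and joining it with spaces addresses all three problems from the question at once: no leading blank line, a space between consecutive domains, and a line for every sequence even when it has no hits.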
