The Bio::Blast class contains methods for running local or remote BLAST searches, as well as for parsing their output (i.e. the BLAST reports). For more information on similarity searches and the BLAST program, see www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html.
require 'bio'

# To run an actual BLAST analysis:
# 1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'SWISS', '-e 0.0001', 'genomenet')
# or:
local_blast_factory = Bio::Blast.local('blastn', '/path/to/db')

# 2. run the actual BLAST by querying the factory
report = remote_blast_factory.query(sequence_text)

# Then, to parse the report, see Bio::Blast::Report
Attribute | Access | Description |
blastall | [RW] | Full path for blastall (default: ‘blastall’). |
db | [RW] | Database name (_-d_ option for blastall). |
filter | [RW] | Filter option for blastall -F (T or F). |
format | [RW] | Output report format for blastall -m [Integer]: 0, pairwise; 1; 2; 3; 4; 5; 6; 7, XML BLAST output; 8, tabular; 9, tabular with comment lines; 10, ASN text; 11, ASN binary. |
matrix | [RW] | Substitution matrix for blastall -M. |
options | [R] | Options for blastall. |
output | [R] | Returns a String containing the BLAST execution output as is, in the format given by Bio::Blast#format. |
parser | [W] | |
program | [RW] | Program name (_-p_ option for blastall): blastp, blastn, blastx, tblastn or tblastx. |
server | [R] | Server to submit the BLASTs to. |
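Most of the attributes above correspond to a single blastall flag. The sketch below makes that mapping concrete with a hypothetical helper, `blastall_argv`, which is not part of Bio::Blast (the library builds its command line internally via `set_options`). The database name `swissprot` and the option values are illustrative assumptions only.

```ruby
# Hypothetical helper (NOT Bio::Blast's own code): shows which blastall
# flag each factory attribute from the table above corresponds to.
def blastall_argv(program:, db:, matrix: nil, filter: nil, format: nil,
                  blastall: 'blastall', extra: [])
  argv = [blastall, '-p', program, '-d', db]   # program name and database
  argv.concat(['-M', matrix]) if matrix        # substitution matrix
  argv.concat(['-F', filter]) if filter        # filter: 'T' or 'F'
  argv.concat(['-m', format.to_s]) if format   # report format code (0..11)
  argv.concat(extra)                           # any remaining options
  argv
end

argv = blastall_argv(program: 'blastp', db: 'swissprot',
                     matrix: 'BLOSUM62', filter: 'F', format: 7,
                     extra: ['-e', '0.0001'])
puts argv.join(' ')
```

Returning an Array rather than a single String avoids quoting problems when the arguments are later handed to a process-spawning call.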
This is a shortcut for Bio::Blast.new:
Bio::Blast.local(program, database, options)
is equivalent to
Bio::Blast.new(program, database, options, 'local')
Arguments:
Returns: | Bio::Blast factory object |
# File lib/bio/appl/blast.rb, line 79
def self.local(program, db, options = '', blastall = nil)
  f = self.new(program, db, options, 'local')
  if blastall then
    f.blastall = blastall
  end
  f
end
Creates a Bio::Blast factory object.
To run any BLAST search, a factory has to be created that describes a certain BLAST pipeline: the program to use, the database to search, any options, and the server to use. E.g.
blast_factory = Bio::Blast.new('blastn','dbsts', '-e 0.0001 -r 4', 'genomenet')
Arguments:
Returns: | Bio::Blast factory object |
# File lib/bio/appl/blast.rb, line 317
def initialize(program, db, opt = [], server = 'local')
  @program = program
  @db = db

  @blastall = 'blastall'
  @matrix = nil
  @filter = nil

  @output = ''
  @parser = nil
  @format = nil

  @options = set_options(opt, program, db)
  self.server = server
end
Bio::Blast.remote does exactly the same as Bio::Blast.new, but sets the remote server ‘genomenet’ as its default.
Arguments:
Returns: | Bio::Blast factory object |
# File lib/bio/appl/blast.rb, line 97
def self.remote(program, db, option = '', server = 'genomenet')
  self.new(program, db, option, server)
end
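Bio::Blast.local and Bio::Blast.remote are thin wrappers around Bio::Blast.new that only differ in which server (and, for local, which blastall path) they set. The following self-contained sketch reproduces that constructor pattern with a hypothetical `ToyBlast` class so it can run without BioRuby; unlike the real initializer, it stores the options unparsed instead of calling `set_options`.

```ruby
# Hypothetical stand-in for Bio::Blast: mirrors only the constructor
# defaults shown above (options are kept raw, no set_options call).
class ToyBlast
  attr_accessor :blastall
  attr_reader :program, :db, :options, :server

  def initialize(program, db, opt = [], server = 'local')
    @program  = program
    @db       = db
    @blastall = 'blastall'   # default executable name
    @options  = opt
    @server   = server
  end

  # Same shape as Bio::Blast.local: server is always 'local',
  # and an explicit blastall path overrides the default.
  def self.local(program, db, options = '', blastall = nil)
    f = new(program, db, options, 'local')
    f.blastall = blastall if blastall
    f
  end

  # Same shape as Bio::Blast.remote: server defaults to 'genomenet'.
  def self.remote(program, db, option = '', server = 'genomenet')
    new(program, db, option, server)
  end
end

local  = ToyBlast.local('blastn', '/path/to/db', '', '/usr/local/bin/blastall')
remote = ToyBlast.remote('blastp', 'SWISS', '-e 0.0001')
puts local.server    # → local
puts local.blastall  # → /usr/local/bin/blastall
puts remote.server   # → genomenet
```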
Bio::Blast.reports parses the given data and returns an array of report objects (Bio::Blast::Report or Bio::Blast::Default::Report), or yields each report object when a block is given.
Supported formats: NCBI default (-m 0), XML (-m 7), tabular (-m 8).
Arguments:
Returns: | Undefined when a block is given. Otherwise, an Array containing report (Bio::Blast::Report or Bio::Blast::Default::Report) objects. |
# File lib/bio/appl/blast.rb, line 114
def self.reports(input, parser = nil)
  begin
    istr = input.to_str
  rescue NoMethodError
    istr = nil
  end
  if istr then
    input = StringIO.new(istr)
  end
  raise 'unsupported input data type' unless input.respond_to?(:gets)

  # if proper parser is given, emulates old behavior.
  case parser
  when :xmlparser, :rexml
    ff = Bio::FlatFile.new(Bio::Blast::Report, input)
    if block_given? then
      ff.each do |e|
        yield e
      end
      return []
    else
      return ff.to_a
    end
  when :tab
    istr = input.read unless istr
    rep = Report.new(istr, parser)
    if block_given? then
      yield rep
      return []
    else
      return [ rep ]
    end
  end

  # preparation of the new format autodetection rule if needed
  # (the same class variable name is used for the check and the assignment,
  # so the rule is built only once)
  if !defined?(@@reports_format_autodetection_rule) or
      !@@reports_format_autodetection_rule then
    regrule = Bio::FlatFile::AutoDetect::RuleRegexp
    blastxml = regrule[ 'Bio::Blast::Report',
                        /\<\!DOCTYPE BlastOutput PUBLIC / ]
    blast = regrule[ 'Bio::Blast::Default::Report',
                     /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tblast = regrule[ 'Bio::Blast::Default::Report_TBlast',
                      /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tab = regrule[ 'Bio::Blast::Report_tab',
                   /^([^\t]*\t){11}[^\t]*$/ ]
    auto = Bio::FlatFile::AutoDetect[ blastxml,
                                      blast,
                                      tblast,
                                      tab
                                    ]
    # sets priorities
    blastxml.is_prior_to blast
    blast.is_prior_to tblast
    tblast.is_prior_to tab
    # rehash
    auto.rehash
    @@reports_format_autodetection_rule = auto
  end

  # Creates a FlatFile object with dummy class
  ff = Bio::FlatFile.new(Object, input)
  ff.dbclass = nil

  # file format autodetection
  3.times do
    break if ff.eof? or
      ff.autodetect(31, @@reports_format_autodetection_rule)
  end
  # If format detection failed, assumed to be tabular (-m 8)
  ff.dbclass = Bio::Blast::Report_tab unless ff.dbclass

  if block_given? then
    ff.each do |entry|
      yield entry
    end
    ret = []
  else
    ret = ff.to_a
  end
  ret
end
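The autodetection above tries four regular expressions in a fixed priority order (XML, then default BLAST, then TBLAST, then tabular) and falls back to tabular when nothing matches. The sketch below reuses those same regular expressions in plain Ruby to show the idea; it is a simplification that scans the first 31 lines of a String line by line, and it is not how Bio::FlatFile::AutoDetect is actually implemented. The function name `detect_blast_format` is hypothetical.

```ruby
# Same regular expressions as Bio::Blast.reports, listed in priority order.
RULES = [
  ['Bio::Blast::Report',                 /<!DOCTYPE BlastOutput PUBLIC /],
  ['Bio::Blast::Default::Report',        /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/],
  ['Bio::Blast::Default::Report_TBlast', /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/],
  ['Bio::Blast::Report_tab',             /^([^\t]*\t){11}[^\t]*$/]
].freeze

# Hypothetical, simplified detector: the first rule (in priority order)
# that matches any of the first max_lines lines wins.
def detect_blast_format(text, max_lines = 31)
  lines = text.each_line.first(max_lines)
  RULES.each do |klass, regexp|
    return klass if lines.any? { |line| regexp.match?(line) }
  end
  'Bio::Blast::Report_tab' # fallback: assume tabular (-m 8)
end

header = "BLASTP 2.2.18 [Mar-02-2008]\n\nReference: ...\n"
puts detect_blast_format(header) # → Bio::Blast::Default::Report
```

Checking the rules per line keeps the tabular pattern from matching tab characters spread over several lines.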
Note that this is the old implementation of Bio::Blast.reports. Its purpose is to keep compatibility with older BLAST XML documents that might be parsed by neither the new Bio::Blast.reports nor Bio::FlatFile (though we are not sure whether such documents exist).
Bio::Blast.reports_xml parses given data, and returns an array of Bio::Blast::Report objects, or yields each Bio::Blast::Report object when a block is given.
It can be used only for the XML format. For the default (-m 0) format, consider using Bio::FlatFile or Bio::Blast.reports.
Arguments:
Returns: | Undefined when a block is given. Otherwise, an Array containing Bio::Blast::Report objects. |
# File lib/bio/appl/blast.rb, line 220
def self.reports_xml(input, parser = nil)
  ary = []
  input.each_line("</BlastOutput>\n") do |xml|
    xml.sub!(/[^<]*(<?)/, '\1') # skip before <?xml> tag
    next if xml.empty?          # skip trailing no hits
    rep = Report.new(xml, parser)
    if rep.reports then
      if block_given?
        rep.reports.each { |r| yield r }
      else
        ary.concat rep.reports
      end
    else
      if block_given?
        yield rep
      else
        ary.push rep
      end
    end
  end
  return ary
end
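The splitting step of reports_xml relies only on core Ruby: `each_line` with a custom separator yields one chunk per `</BlastOutput>\n`, and the substitution drops whatever precedes the first `<`. The sketch below isolates that step, without Bio::Blast::Report, to show how concatenated XML documents are separated; `split_blast_xml` is a hypothetical name.

```ruby
# Hypothetical helper isolating the splitting logic of reports_xml.
def split_blast_xml(input)
  docs = []
  input.each_line("</BlastOutput>\n") do |chunk|
    xml = chunk.sub(/[^<]*(<?)/, '\1') # drop text before the first '<'
    next if xml.empty?                 # trailing whitespace-only chunk
    docs << xml
  end
  docs
end

two_docs = "<?xml version=\"1.0\"?>\n<BlastOutput>a</BlastOutput>\n" \
           "  \n<?xml version=\"1.0\"?>\n<BlastOutput>b</BlastOutput>\n\n"
puts split_blast_xml(two_docs).size # → 2
```

Because the separator includes the trailing newline, a closing tag that is not followed by a newline would not split the input; the original method has the same behavior.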
Returns the options for blastall as a single command-line String (kept for backward compatibility).
# File lib/bio/appl/blast.rb, line 374
def option
  # backward compatibility
  Bio::Command.make_command_line(options)
end
Sets the options for blastall from a String, which is split into words with Shellwords (kept for backward compatibility).
# File lib/bio/appl/blast.rb, line 380
def option=(str)
  # backward compatibility
  self.options = Shellwords.shellwords(str)
end
Sets the options for blastall from an Array of arguments.
# File lib/bio/appl/blast.rb, line 255
def options=(ary)
  @options = set_options(ary)
end
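The String-based #option / #option= pair and the Array-based #options= differ only in how the arguments are represented. The sketch below shows that round trip with Ruby's standard Shellwords module. `options_from_string` does what #option= does; `options_to_string` uses `Shellwords.join` as an assumed stand-in for Bio::Command.make_command_line, whose exact quoting may differ. Both helper names are hypothetical, and the quoted `-F` value merely demonstrates that a quoted value stays one argument.

```ruby
require 'shellwords'

# What #option= does: split a command-line String into an Array.
def options_from_string(str)
  Shellwords.shellwords(str)
end

# Roughly what #option does. ASSUMPTION: Shellwords.join stands in for
# Bio::Command.make_command_line; the real quoting rules may differ.
def options_to_string(ary)
  Shellwords.join(ary)
end

ary = options_from_string(%q{-e 0.0001 -F 'm S'})
p ary # → ["-e", "0.0001", "-F", "m S"]
puts options_to_string(ary)
```

Keeping options as an Array internally means a value containing spaces never has to be re-quoted by hand.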
This method submits a sequence to a BLAST factory, which performs the actual BLAST.
# example 1
seq = Bio::Sequence::NA.new('agggcattgccccggaagatcaagtcgtgctcctg')
report = blast_factory.query(seq)

# example 2
str = <<END_OF_FASTA
>lcl|MySequence
MPPSAISKISNSTTPQVQSSSAPNLTMLEGKGISVEKSFRVYSEEENQNQHKAKDSLGF
KELEKDAIKNSKQDKKDHKNWLETLYDQAEQKWLQEPKKKLQDLIKNSGDNSRVILKDS
END_OF_FASTA
report = blast_factory.query(str)
Bug note: When multi-FASTA is given and the format is 7 (XML) or 8 (tab), it should return an array of Bio::Blast::Report objects, but it returns a single Bio::Blast::Report object. This is a known bug and should be fixed in the future.
Arguments:
Returns: | A Bio::Blast::Report (or Bio::Blast::Default::Report) object when a single query is given. When multiple sequences are given as the query, an Array of Bio::Blast::Report (or Bio::Blast::Default::Report) objects. If the result cannot be parsed, nil is returned. |
# File lib/bio/appl/blast.rb, line 358
def query(query)
  case query
  when Bio::Sequence
    query = query.output(:fasta)
  when Bio::Sequence::NA, Bio::Sequence::AA, Bio::Sequence::Generic
    query = query.to_fasta('query', 70)
  else
    query = query.to_s
  end

  @output = self.__send__("exec_#{@server}", query)
  report = parse_result(@output)
  return report
end
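#query does two things before parsing: it normalizes its argument to a FASTA String (wrapping sequence objects at 70 residues per line) and then calls the method named `exec_<server>` via `__send__`. The runnable sketch below reproduces just those two steps. `ToySeq`, `ToyFactory` and `exec_local` are hypothetical stand-ins, not BioRuby classes or methods, and result parsing is omitted.

```ruby
# Hypothetical sequence type with a to_fasta(header, width) method.
ToySeq = Struct.new(:seq) do
  def to_fasta(header, width)
    ">#{header}\n" + seq.scan(/.{1,#{width}}/).join("\n") + "\n"
  end
end

class ToyFactory
  attr_reader :output

  def initialize(server)
    @server = server
  end

  # Normalize the query to FASTA text, then dispatch on the server name.
  def query(query)
    query = query.is_a?(ToySeq) ? query.to_fasta('query', 70) : query.to_s
    @output = __send__("exec_#{@server}", query)
  end

  private

  # Hypothetical executor: only counts the FASTA records it received.
  def exec_local(fasta)
    "local run on #{fasta.count('>')} sequence(s)"
  end
end

puts ToyFactory.new('local').query(ToySeq.new('a' * 150))
puts ToyFactory.new('local').query(">a\nAC\n>b\nGT\n")
```

Because the executor is looked up by name at call time, an unknown server only fails when a query is run, raising NoMethodError.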
Sets the server to submit the BLASTs to. The corresponding exec_xxxx method should be defined either in Bio::Blast itself or in a Bio::Blast::Remote::Xxxx module, which is then included lazily.
# File lib/bio/appl/blast.rb, line 265
def server=(str)
  @server = str
  begin
    m = Bio::Blast::Remote.const_get(@server.capitalize)
  rescue NameError
    m = nil
  end
  if m and !(self.is_a?(m)) then
    # lazy include Bio::Blast::Remote::XXX module
    self.class.class_eval { include m }
  end
  return @server
end
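The setter above uses a lazy-include pattern: it turns the server name into a constant name with `capitalize`, looks it up with `const_get`, and mixes the module into the class only when one exists. The self-contained sketch below reproduces that pattern with a hypothetical `ToyRemoteBlast` class and a hypothetical `Remote::Genomenet` module; the real Bio::Blast::Remote modules are not used here.

```ruby
# Hypothetical class reproducing the lazy-include pattern of server=.
class ToyRemoteBlast
  module Remote
    # Hypothetical module providing the executor for 'genomenet'.
    module Genomenet
      def exec_genomenet(query)
        "sent #{query.size} bytes to genomenet"
      end
    end
  end

  attr_reader :server

  def server=(str)
    @server = str
    begin
      m = Remote.const_get(@server.capitalize) # 'genomenet' -> Genomenet
    rescue NameError
      m = nil # no module: exec_xxx must already be defined on the class
    end
    self.class.class_eval { include m } if m && !is_a?(m)
    @server
  end
end

b = ToyRemoteBlast.new
b.server = 'genomenet'
puts b.exec_genomenet('ACGT') # → sent 4 bytes to genomenet
```

Including the module into the class (not just the instance) means every later factory of that class also gains the executor, which matches the `self.class.class_eval` call in the original.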