Class Bio::Blast
In: lib/bio/appl/bl2seq/report.rb
lib/bio/appl/blast.rb
lib/bio/appl/blast/format0.rb
lib/bio/appl/blast/format8.rb
lib/bio/appl/blast/ncbioptions.rb
lib/bio/appl/blast/remote.rb
lib/bio/appl/blast/report.rb
lib/bio/appl/blast/rexml.rb
lib/bio/appl/blast/rpsblast.rb
lib/bio/appl/blast/wublast.rb
lib/bio/appl/blast/xmlparser.rb
lib/bio/io/fastacmd.rb
Parent: Object

Description

The Bio::Blast class contains methods for running local or remote BLAST searches, as well as for parsing of the output of such BLASTs (i.e. the BLAST reports). For more information on similarity searches and the BLAST program, see www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html.

Usage

  require 'bio'

  # To run an actual BLAST analysis:
  #   1. create a BLAST factory
  remote_blast_factory = Bio::Blast.remote('blastp', 'SWISS',
                                           '-e 0.0001', 'genomenet')
  #or:
  local_blast_factory = Bio::Blast.local('blastn','/path/to/db')

  #   2. run the actual BLAST by querying the factory
  report = remote_blast_factory.query(sequence_text)

  # Then, to parse the report, see Bio::Blast::Report

Methods

local   new   option   option=   options=   query   remote   reports   reports_xml   server=  

Classes and Modules

Module Bio::Blast::RPSBlast
Module Bio::Blast::Remote
Class Bio::Blast::Bl2seq
Class Bio::Blast::Fastacmd
Class Bio::Blast::NCBIOptions
Class Bio::Blast::Report
Class Bio::Blast::Report_tab

Attributes

blastall  [RW]  Full path for blastall. (default: ‘blastall’).
db  [RW]  Database name (-d option for blastall)
filter  [RW]  Filter option for blastall -F (T or F).
format  [RW]  Output report format for blastall -m

0, pairwise; 1; 2; 3; 4; 5; 6; 7, XML BLAST output; 8, tabular; 9, tabular with comment lines; 10, ASN text; 11, ASN binary [Integer].

matrix  [RW]  Substitution matrix for blastall -M
options  [R]  Options for blastall
output  [R]  Returns a String containing the BLAST execution output as is, in the format selected by Bio::Blast#format.
parser  [W] 
program  [RW]  Program name (-p option for blastall): blastp, blastn, blastx, tblastn or tblastx
server  [R]  Server to submit the BLASTs to

Public Class methods

This is a shortcut for Bio::Blast.new:

 Bio::Blast.local(program, database, options)

is equivalent to

 Bio::Blast.new(program, database, options, 'local')

Arguments:

  • program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’
  • db (required): name of the local database
  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)
  • blastall: full path to blastall program (e.g. "/opt/bin/blastall"; DEFAULT: "blastall")
Returns: Bio::Blast factory object

[Source]

    # File lib/bio/appl/blast.rb, line 79
    def self.local(program, db, options = '', blastall = nil)
      f = self.new(program, db, options, 'local')
      if blastall then
        f.blastall = blastall
      end
      f
    end

Creates a Bio::Blast factory object.

To run any BLAST searches, a factory has to be created that describes a certain BLAST pipeline: the program to use, the database to search, any options and the server to use. E.g.

  blast_factory = Bio::Blast.new('blastn','dbsts', '-e 0.0001 -r 4', 'genomenet')

Arguments:

  • program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’
  • db (required): name of the (local or remote) database
  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)
  • server: server to use (e.g. ‘genomenet’; DEFAULT = ‘local’)
Returns: Bio::Blast factory object

[Source]

    # File lib/bio/appl/blast.rb, line 317
    def initialize(program, db, opt = [], server = 'local')
      @program  = program
      @db       = db

      @blastall = 'blastall'
      @matrix   = nil
      @filter   = nil

      @output   = ''
      @parser   = nil
      @format   = nil

      @options = set_options(opt, program, db)
      self.server = server
    end

Bio::Blast.remote does exactly the same as Bio::Blast.new, but sets the remote server ‘genomenet’ as its default.


Arguments:

  • program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’
  • db (required): name of the remote database
  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)
  • server: server to use (DEFAULT = ‘genomenet’)
Returns: Bio::Blast factory object

[Source]

    # File lib/bio/appl/blast.rb, line 97
    def self.remote(program, db, option = '', server = 'genomenet')
      self.new(program, db, option, server)
    end

Bio::Blast.reports parses the given data and returns an array of report (Bio::Blast::Report or Bio::Blast::Default::Report) objects, or yields each report object when a block is given.

Supported formats: NCBI default (-m 0), XML (-m 7), tabular (-m 8).


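Arguments:

  • input (required): report data as a String, or an IO-like object that responds to gets
  • parser: :xmlparser, :rexml or :tab to force a parser; with the default (nil) the format is autodetected
Returns: Undefined when a block is given. Otherwise, an Array containing report (Bio::Blast::Report or Bio::Blast::Default::Report) objects.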

[Source]

    # File lib/bio/appl/blast.rb, line 114
    def self.reports(input, parser = nil)
      begin
        istr = input.to_str
      rescue NoMethodError
        istr = nil
      end
      if istr then
        input = StringIO.new(istr)
      end
      raise 'unsupported input data type' unless input.respond_to?(:gets)

      # if proper parser is given, emulates old behavior.
      case parser
      when :xmlparser, :rexml
        ff = Bio::FlatFile.new(Bio::Blast::Report, input)
        if block_given? then
          ff.each do |e|
            yield e
          end
          return []
        else
          return ff.to_a
        end
      when :tab
        istr = input.read unless istr
        rep = Report.new(istr, parser)
        if block_given? then
          yield rep
          return []
        else
          return [ rep ]
        end
      end

      # preparation of the new format autodetection rule if needed
      if !defined?(@@reports_format_autodetection_rule) or
          !@@reports_format_autodetection_rule then
        regrule = Bio::FlatFile::AutoDetect::RuleRegexp
        blastxml = regrule[ 'Bio::Blast::Report',
                            /\<\!DOCTYPE BlastOutput PUBLIC / ]
        blast    = regrule[ 'Bio::Blast::Default::Report',
                            /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
        tblast   = regrule[ 'Bio::Blast::Default::Report_TBlast',
                            /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
        tab      = regrule[ 'Bio::Blast::Report_tab',
                            /^([^\t]*\t){11}[^\t]*$/ ]
        auto = Bio::FlatFile::AutoDetect[ blastxml,
                                          blast,
                                          tblast,
                                          tab
                                        ]
        # sets priorities
        blastxml.is_prior_to blast
        blast.is_prior_to tblast
        tblast.is_prior_to tab
        # rehash
        auto.rehash
        @@report_format_autodetection_rule = auto
      end

      # Creates a FlatFile object with dummy class
      ff = Bio::FlatFile.new(Object, input)
      ff.dbclass = nil

      # file format autodetection
      3.times do
        break if ff.eof? or
          ff.autodetect(31, @@report_format_autodetection_rule)
      end
      # If format detection failed, assumed to be tabular (-m 8)
      ff.dbclass = Bio::Blast::Report_tab unless ff.dbclass

      if block_given? then
        ff.each do |entry|
          yield entry
        end
        ret = []
      else
        ret = ff.to_a
      end
      ret
    end
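The input handling and format autodetection in the listing above can be sketched without BioRuby. This is a simplified illustration, not the library's implementation: the regular expressions are copied from the listing, while FORMAT_RULES, to_line_reader, detect_blast_format and the 31-line window as a parameter are names and choices made up for this sketch. It checks respond_to? instead of rescuing NoMethodError and returns a symbol instead of selecting a report class.

```ruby
require 'stringio'

# Patterns copied from the listing above. The symbols are labels for this
# sketch only; the library maps each pattern to a report class instead.
FORMAT_RULES = [
  [:xml,     /\<\!DOCTYPE BlastOutput PUBLIC /],
  [:default, /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/],
  [:tblast,  /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/],
  [:tabular, /^([^\t]*\t){11}[^\t]*$/]
].freeze

# Accept a String (wrapped in StringIO) or anything that responds to gets.
def to_line_reader(input)
  input = StringIO.new(input.to_str) if input.respond_to?(:to_str)
  raise ArgumentError, 'unsupported input data type' unless input.respond_to?(:gets)
  input
end

# Read the first few lines and return the label of the first matching rule.
# Falls back to :tabular, as the listing does when detection fails.
def detect_blast_format(input, max_lines = 31)
  io = to_line_reader(input)
  head = Array.new(max_lines) { io.gets }.compact.join
  rule = FORMAT_RULES.find { |_label, pattern| pattern =~ head }
  rule ? rule.first : :tabular
end

# A tabular (-m 8) line has 12 tab-separated columns.
tab_line = %w[q1 s1 98.5 200 3 0 1 200 5 204 1e-50 370].join("\t") + "\n"
p detect_blast_format(tab_line)
p detect_blast_format("BLASTP 2.2.18 [Mar-02-2008]\n")
```

The rule order mirrors the priorities set in the listing (XML before default, default before TBLAST, TBLAST before tabular).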

Note that this is the old implementation of Bio::Blast.reports. Its purpose is to keep compatibility with older BLAST XML documents that might not be parsed by either the new Bio::Blast.reports or Bio::FlatFile. (We are not sure whether such documents exist.)

Bio::Blast.reports_xml parses given data, and returns an array of Bio::Blast::Report objects, or yields each Bio::Blast::Report object when a block is given.

It can be used only for XML format. For default (-m 0) format, consider using Bio::FlatFile, or Bio::Blast.reports.


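Arguments:

  • input (required): BLAST XML data; any object that responds to each_line (e.g. a String or IO)
  • parser: parser type passed on to Bio::Blast::Report.new (DEFAULT: nil)
Returns: Undefined when a block is given. Otherwise, an Array containing Bio::Blast::Report objects.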

[Source]

    # File lib/bio/appl/blast.rb, line 220
    def self.reports_xml(input, parser = nil)
      ary = []
      input.each_line("</BlastOutput>\n") do |xml|
        xml.sub!(/[^<]*(<?)/, '\1') # skip before <?xml> tag
        next if xml.empty?          # skip trailing no hits
        rep = Report.new(xml, parser)
        if rep.reports then
          if block_given?
            rep.reports.each { |r| yield r }
          else
            ary.concat rep.reports
          end
        else
          if block_given?
            yield rep
          else
            ary.push rep
          end
        end
      end
      return ary
    end
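The splitting step in the listing above relies on each_line accepting a custom record separator, so that several concatenated XML documents are handed over one at a time. A minimal stand-alone sketch of just that step follows; split_blast_xml is a hypothetical helper name, and the sketch collects the raw XML strings instead of building Bio::Blast::Report objects.

```ruby
# Split concatenated BLAST XML output into one string per document.
# Works on anything that responds to each_line (String, IO, StringIO).
def split_blast_xml(input)
  docs = []
  input.each_line("</BlastOutput>\n") do |chunk|
    # Drop anything before the first "<" (blank lines, stray text).
    xml = chunk.sub(/[^<]*(<?)/, '\1')
    next if xml.empty? # trailing whitespace after the last document
    docs << xml
  end
  docs
end

two_reports = <<XML
<?xml version="1.0"?>
<BlastOutput>first</BlastOutput>

<?xml version="1.0"?>
<BlastOutput>second</BlastOutput>

XML

docs = split_blast_xml(two_reports)
p docs.size
```

Each element of the result could then be passed to a parser on its own, which is what the listing does with Report.new.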

Public Instance methods

Returns options of blastall

[Source]

    # File lib/bio/appl/blast.rb, line 374
    def option
      # backward compatibility
      Bio::Command.make_command_line(options)
    end

Set options for blastall

[Source]

    # File lib/bio/appl/blast.rb, line 380
    def option=(str)
      # backward compatibility
      self.options = Shellwords.shellwords(str)
    end
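The option= setter splits its String argument with Shellwords.shellwords before storing it. The effect of that call can be seen with the Ruby standard library alone; split_blast_options is a hypothetical wrapper used only for this illustration, and the -o file name is made up.

```ruby
require 'shellwords'

# Split a blastall option string into an argument array using the same
# standard-library call as the option= listing above.
def split_blast_options(str)
  Shellwords.shellwords(str)
end

p split_blast_options('-e 0.0001 -r 4')
# => ["-e", "0.0001", "-r", "4"]

# Shell-style quoting keeps an argument containing spaces in one piece.
p split_blast_options("-e 0.0001 -o 'my report.txt'")
# => ["-e", "0.0001", "-o", "my report.txt"]
```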

Sets options for blastall

[Source]

    # File lib/bio/appl/blast.rb, line 255
    def options=(ary)
      @options = set_options(ary)
    end

This method submits a sequence to a BLAST factory, which performs the actual BLAST.

  # example 1
  seq = Bio::Sequence::NA.new('agggcattgccccggaagatcaagtcgtgctcctg')
  report = blast_factory.query(seq)

  # example 2
  str = <<END_OF_FASTA
  >lcl|MySequence
  MPPSAISKISNSTTPQVQSSSAPNLTMLEGKGISVEKSFRVYSEEENQNQHKAKDSLGF
  KELEKDAIKNSKQDKKDHKNWLETLYDQAEQKWLQEPKKKLQDLIKNSGDNSRVILKDS
  END_OF_FASTA
  report = blast_factory.query(str)

Bug note: When multi-FASTA is given and the format is 7 (XML) or 8 (tab), it should return an array of Bio::Blast::Report objects, but it returns a single Bio::Blast::Report object. This is a known bug and should be fixed in the future.


Arguments:

  • query (required): single- or multiple-FASTA formatted sequence(s)
Returns: a Bio::Blast::Report (or Bio::Blast::Default::Report) object when a single query is given. When multiple sequences are given as the query, it returns an array of Bio::Blast::Report (or Bio::Blast::Default::Report) objects. If the result cannot be parsed, nil is returned.

[Source]

    # File lib/bio/appl/blast.rb, line 358
    def query(query)
      case query
      when Bio::Sequence
        query = query.output(:fasta)
      when Bio::Sequence::NA, Bio::Sequence::AA, Bio::Sequence::Generic
        query = query.to_fasta('query', 70)
      else
        query = query.to_s
      end

      @output = self.__send__("exec_#{@server}", query)
      report = parse_result(@output)
      return report
    end
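As the listing shows, anything that is not a Bio::Sequence object is converted with to_s and submitted unchanged, so a plain String query has to be FASTA-formatted by the caller. The helper below is a hypothetical stand-alone sketch of building such text with lines wrapped at 70 characters (the width passed to to_fasta in the listing); it is not BioRuby's to_fasta and makes no claim to reproduce its output exactly.

```ruby
# Build FASTA text from a name and a residue string, wrapping the
# sequence every +width+ characters. Hypothetical helper for illustration.
def fasta_text(name, residues, width = 70)
  body = residues.scan(/.{1,#{width}}/).join("\n")
  ">#{name}\n#{body}\n"
end

puts fasta_text('query', 'agggcattgccccggaagatcaagtcgtgctcctg')
```

The resulting String could then be passed to blast_factory.query in the same way as str in example 2.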

Sets server to submit the BLASTs to. The exec_xxxx method should be defined in Bio::Blast or Bio::Blast::Remote::Xxxx class.

[Source]

    # File lib/bio/appl/blast.rb, line 265
    def server=(str)
      @server = str
      begin
        m = Bio::Blast::Remote.const_get(@server.capitalize)
      rescue NameError
        m = nil
      end
      if m and !(self.is_a?(m)) then
        # lazy include Bio::Blast::Remote::XXX module
        self.class.class_eval { include m }
      end
      return @server
    end
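The interplay between server= and query (lazy inclusion of a backend module, then dispatch to an exec_xxxx method by name) can be sketched in isolation. Everything below is a hypothetical stand-in and not part of BioRuby: RemoteBackends plays the role of Bio::Blast::Remote, Demo is an invented backend, and MiniFactory is a stripped-down factory that returns strings instead of running BLAST.

```ruby
# Namespace that holds one module per remote backend (stand-in only).
module RemoteBackends
  module Demo
    def exec_demo(query)
      "demo result for #{query.length} residues"
    end
  end
end

class MiniFactory
  attr_reader :server

  # Same pattern as the listing above: look up a module named after the
  # server and include it on first use.
  def server=(str)
    @server = str
    begin
      m = RemoteBackends.const_get(@server.capitalize)
    rescue NameError
      m = nil
    end
    self.class.class_eval { include m } if m && !is_a?(m)
    @server
  end

  # Built-in backend, so no module lookup is needed for 'local'.
  def exec_local(query)
    "local result for #{query.length} residues"
  end

  # Dispatch to exec_<server> by name, as Bio::Blast#query does.
  def query(query)
    __send__("exec_#{@server}", query)
  end
end

factory = MiniFactory.new
factory.server = 'demo'
puts factory.query('ACGT')   # prints "demo result for 4 residues"
factory.server = 'local'
puts factory.query('ACGT')   # prints "local result for 4 residues"
```

One consequence of this design, visible in the sketch, is that the module is included into the class rather than the single instance, so every factory gains the backend's exec method once any factory has selected that server.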
