Module | Bio::Sequence::Common |
In: |
lib/bio/sequence/common.rb
lib/bio/sequence/compat.rb |
Bio::Sequence::Common is a Mixin implementing methods common to Bio::Sequence::AA and Bio::Sequence::NA. All of these methods are available to either Amino Acid or Nucleic Acid sequences, and by encapsulation are also available to Bio::Sequence objects.
  # Create a sequence
  dna = Bio::Sequence.auto('atgcatgcatgc')

  # Splice out a subsequence using a GenBank-style location string
  puts dna.splice('complement(1..4)')

  # What is the base composition?
  puts dna.composition

  # Create a random sequence with the composition of a current sequence
  puts dna.randomize
Create a new sequence by adding to an existing sequence. The existing sequence is not modified.
  s = Bio::Sequence::NA.new('atgc')
  s2 = s + 'atgc'
  puts s2                                 #=> "atgcatgc"
  puts s                                  #=> "atgc"
The new sequence is of the same class as the existing sequence if the new data was added to an existing sequence,
puts s2.class == s.class #=> true
but if an existing sequence is added to a String, the result is a String
  s3 = 'atgc' + s
  puts s3.class                           #=> String
Returns: | new Bio::Sequence::NA/AA or String object |
  # File lib/bio/sequence/common.rb, line 121
  def +(*arg)
    self.class.new(super(*arg))
  end
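The class-preserving behaviour follows from re-wrapping the result of String#+ in the receiver's own class. The following is a minimal sketch of that pattern only; `DemoSeq` is a hypothetical String subclass standing in for Bio::Sequence::NA so that the example runs without BioRuby installed.

```ruby
# DemoSeq is a hypothetical stand-in for Bio::Sequence::NA/AA.
class DemoSeq < String
  # Same pattern as the source above: delegate to String#+ and
  # re-wrap the result in the receiver's own class.
  def +(*arg)
    self.class.new(super(*arg))
  end
end

s  = DemoSeq.new('atgc')
s2 = s + 'atgc'   # receiver is a DemoSeq, so DemoSeq#+ runs
s3 = 'atgc' + s   # receiver is a plain String, so String#+ runs

puts s2.class     # DemoSeq
puts s3.class     # String
puts s            # atgc (the original is unchanged)
```

Because Ruby dispatches `+` on the left-hand operand, only the first form passes through the overridden method.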
Returns a hash of the occurrence counts for each residue or base.
  s = Bio::Sequence::NA.new('atgc')
  puts s.composition              #=> {"a"=>1, "c"=>1, "g"=>1, "t"=>1}
Returns: | Hash object |
  # File lib/bio/sequence/common.rb, line 215
  def composition
    count = Hash.new(0)
    self.scan(/./) do |x|
      count[x] += 1
    end
    return count
  end
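A composition hash is a convenient starting point for simple statistics. The sketch below re-implements the counting loop as a standalone function on a plain String, so it does not need BioRuby. The `gc_fraction` helper is hypothetical and is not part of Bio::Sequence::Common.

```ruby
# Count each character, mirroring the Hash.new(0) pattern in the source.
def composition(seq)
  count = Hash.new(0)
  seq.scan(/./) { |x| count[x] += 1 }
  count
end

# Hypothetical helper: fraction of g and c among all counted symbols.
def gc_fraction(seq)
  comp  = composition(seq)
  total = comp.values.sum
  return 0.0 if total.zero?
  (comp['g'] + comp['c']).to_f / total   # missing keys count as 0
end

puts composition('atgcatgcaa').inspect
puts gc_fraction('atgcatgcaa')   # 0.4
```

Using `Hash.new(0)` means symbols that never occur read as zero rather than nil, which keeps the arithmetic in `gc_fraction` safe for sequences without g or c.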
Add new data to the end of the current sequence. The original sequence is modified.
  s = Bio::Sequence::NA.new('atgc')
  s << 'atgc'
  puts s                                  #=> "atgcatgc"
  s << s
  puts s                                  #=> "atgcatgcatgcatgc"
Returns: | current Bio::Sequence::NA/AA object (modified) |
  # File lib/bio/sequence/common.rb, line 94
  def concat(*arg)
    super(self.class.new(*arg))
  end
Normalize the current sequence, removing all whitespace and transforming all positions to uppercase if the sequence is AA or transforming all positions to lowercase if the sequence is NA. The original sequence is modified.
  s = Bio::Sequence::NA.new('atgc')
  s.normalize!
Returns: | current Bio::Sequence::NA/AA object (modified) |
  # File lib/bio/sequence/common.rb, line 78
  def normalize!
    initialize(self)
    self
  end
Returns a randomized sequence. The default is to retain the same base/residue composition as the original. If a hash of base/residue counts is given, the new sequence is built from that composition instead. If a block is given, each randomly selected position is passed to the block, and the method returns an empty sequence rather than the shuffled one. In all cases, the original sequence is not modified.
  s = Bio::Sequence::NA.new('atgc')
  puts s.randomize                        #=> "tcag"  (for example)

  new_composition = {'a' => 2, 't' => 2}
  puts s.randomize(new_composition)       #=> "ttaa"  (for example)

  count = 0
  s.randomize { |x| count += 1 }
  puts count                              #=> 4
Arguments:
* (optional) hash: Hash object mapping each base/residue to its count
Returns: | new Bio::Sequence::NA/AA object |
  # File lib/bio/sequence/common.rb, line 243
  def randomize(hash = nil)
    if hash
      tmp = ''
      hash.each {|k, v|
        tmp += k * v.to_i
      }
    else
      tmp = self
    end
    seq = self.class.new(tmp)
    # Reference: http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
    seq.length.downto(2) do |n|
      k = rand(n)
      c = seq[n - 1]
      seq[n - 1] = seq[k]
      seq[k] = c
    end
    if block_given? then
      (0...seq.length).each do |i|
        yield seq[i, 1]
      end
      return self.class.new('')
    else
      return seq
    end
  end
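The shuffle in the source is a Fisher-Yates pass over the sequence. As a rough illustration, the same two steps (expanding a composition hash, then shuffling) can be written against plain Strings. The function names `expand_composition` and `shuffle_sequence` are assumptions made for this sketch and do not exist in BioRuby.

```ruby
# Expand a {residue => count} hash into an unshuffled string.
def expand_composition(hash)
  hash.map { |k, v| k * v.to_i }.join
end

# Fisher-Yates shuffle on a copy; the input string is left untouched.
def shuffle_sequence(str, rng = Random.new)
  seq = str.dup
  seq.length.downto(2) do |n|
    k = rng.rand(n)                            # random index in 0...n
    seq[n - 1], seq[k] = seq[k], seq[n - 1]    # swap the two positions
  end
  seq
end

original = 'atgcatgc'
shuffled = shuffle_sequence(original)
puts shuffled
puts shuffled.chars.sort == original.chars.sort   # true

from_hash = shuffle_sequence(expand_composition({ 'a' => 2, 't' => 2 }))
puts from_hash
```

A shuffle only permutes positions, so the residue counts of the result always match those of the input, which is what the sorted comparison checks.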
Create a new sequence based on the current sequence. The original sequence is unchanged.
  s = Bio::Sequence::NA.new('atgc')
  s2 = s.seq
  puts s2                                 #=> 'atgc'
Returns: | new Bio::Sequence::NA/AA object |
  # File lib/bio/sequence/common.rb, line 65
  def seq
    self.class.new(self)
  end
Return a new sequence extracted from the original using a GenBank-style position string. See also documentation for the Bio::Location class.
  s = Bio::Sequence::NA.new('atgcatgcatgcatgc')
  puts s.splice('1..3')                           #=> "atg"
  puts s.splice('join(1..3,8..10)')               #=> "atgcat"
  puts s.splice('complement(1..3)')               #=> "cat"
  puts s.splice('complement(join(1..3,8..10))')   #=> "atgcat"
Note that the complement operator in GenBank position strings has no effect on Bio::Sequence::AA objects.
Arguments:
* position: GenBank-style position String or Locations object
Returns: | Bio::Sequence::NA/AA object |
  # File lib/bio/sequence/common.rb, line 285
  def splice(position)
    unless position.is_a?(Locations) then
      position = Locations.new(position)
    end
    s = ''
    position.each do |location|
      if location.sequence
        s << location.sequence
      else
        exon = self.subseq(location.from, location.to)
        begin
          exon.complement! if location.strand < 0
        rescue NameError
        end
        s << exon
      end
    end
    return self.class.new(s)
  end
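To make the position-string semantics concrete, here is a deliberately simplified sketch on plain Strings. It is not how Bio::Locations parses positions: the hypothetical `simple_splice` function only understands `a..b` ranges, an optional `join(...)`, and an optional outermost `complement(...)`, and it assumes a lowercase DNA alphabet of a, t, g and c.

```ruby
# Simplified illustration: collect one-based, inclusive ranges in order,
# then reverse-complement the joined result if the whole position string
# is wrapped in complement(...).
def simple_splice(seq, position)
  joined = position.scan(/(\d+)\.\.(\d+)/).map { |from, to|
    seq[(from.to_i - 1)..(to.to_i - 1)]   # one-based to zero-based
  }.join
  if position.start_with?('complement(')
    joined = joined.reverse.tr('atgc', 'tacg')
  end
  joined
end

s = 'atgcatgcatgcatgc'
puts simple_splice(s, '1..3')                           # atg
puts simple_splice(s, 'join(1..3,8..10)')               # atgcat
puts simple_splice(s, 'complement(1..3)')               # cat
puts simple_splice(s, 'complement(join(1..3,8..10))')   # atgcat
```

The last two results coincide with the documented examples because "atgcat" happens to be its own reverse complement.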
Returns a new sequence containing the subsequence identified by the start and end positions given as parameters. *Important:* Biological sequence numbering conventions (one-based) are used rather than Ruby's zero-based conventions, and both positions are inclusive.
  s = Bio::Sequence::NA.new('atggaatga')
  puts s.subseq(1,3)                      #=> "atg"
Start defaults to 1 and end defaults to the length of the sequence, so subseq called without any parameters simply returns a new sequence identical to the existing sequence.
  puts s.subseq                           #=> "atggaatga"
Arguments:
* (optional) s: start position, a positive Integer (default 1)
* (optional) e: end position, a positive Integer (default: length of the sequence)
Returns: | new Bio::Sequence::NA/AA object |
  # File lib/bio/sequence/common.rb, line 143
  def subseq(s = 1, e = self.length)
    raise "Error: start/end position must be a positive integer" unless s > 0 and e > 0
    s -= 1
    e -= 1
    self[s..e]
  end
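The one-based convention is easiest to see next to ordinary Ruby indexing. The standalone function below is a sketch that mirrors the arithmetic in the source on a plain String. Taking the sequence as an explicit first argument is an assumption made so the example runs without BioRuby.

```ruby
# One-based, inclusive subsequence of a plain String.
def subseq(seq, s = 1, e = seq.length)
  unless s > 0 && e > 0
    raise ArgumentError, 'start/end position must be a positive integer'
  end
  seq[(s - 1)..(e - 1)]   # shift both ends to Ruby's zero-based indices
end

dna = 'atggaatga'
puts subseq(dna, 1, 3)    # atg
puts dna[0..2]            # atg (the equivalent zero-based slice)
puts subseq(dna, 4, 6)    # gaa
puts subseq(dna)          # atggaatga
```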
Bio::Sequence#to_fasta is DEPRECATED. Do not use Bio::Sequence#to_fasta! Use Bio::Sequence#output instead. Note that Bio::Sequence::NA#to_fasta, Bio::Sequence::AA#to_fasta, and Bio::Sequence::Generic#to_fasta can still be used, because there are no alternative methods.
Output the FASTA format string of the sequence. The 1st argument is used as the comment string. If the 2nd argument is given, the output sequence is folded into lines of at most that many characters.
Arguments:
* (optional) header: String used as the comment line (default '')
* (optional) width: Integer line width for folding (default nil, no folding)
Returns: | String |
  # File lib/bio/sequence/compat.rb, line 54
  def to_fasta(header = '', width = nil)
    warn "Bio::Sequence#to_fasta is obsolete. Use Bio::Sequence#output(:fasta) instead" if $DEBUG
    ">#{header}\n" +
    if width
      self.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n")
    else
      self.to_s + "\n"
    end
  end
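The folding step relies on a regular expression that matches up to `width` characters at a time and appends a newline to each match. A minimal sketch of the same idea on a plain String follows. The function name `fasta_string` is made up for this example and is not a BioRuby method.

```ruby
# Build a FASTA record from a plain String, optionally folding the body.
def fasta_string(seq, header = '', width = nil)
  body = if width
           # Each chunk of 1..width characters gets a trailing newline.
           seq.gsub(/.{1,#{width}}/, "\\0\n")
         else
           seq + "\n"
         end
  ">#{header}\n" + body
end

puts fasta_string('atgcatgcat', 'demo', 4)
# >demo
# atgc
# atgc
# at
```

Because the pattern is `{1,width}` rather than exactly `width`, the final short chunk is still matched and terminated with a newline.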
Return sequence as String. The original sequence is unchanged.
  s = Bio::Sequence::NA.new('atgc')
  puts s.to_s                             #=> 'atgc'
  puts s.to_s.class                       #=> String
  puts s                                  #=> 'atgc'
  puts s.class                            #=> Bio::Sequence::NA
Returns: | String object |
  # File lib/bio/sequence/common.rb, line 52
  def to_s
    String.new(self)
  end
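Returns a float total value for the sequence given a hash of base or residue values. Bases or residues missing from the hash contribute 0.0, unless the hash already defines its own default value.
  values = {'a' => 0.1, 't' => 0.2, 'g' => 0.3, 'c' => 0.4}
  s = Bio::Sequence::NA.new('atgc')
  puts s.total(values)                    #=> 1.0
Arguments:
* hash: Hash object mapping each base/residue to a numeric value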
Returns: | Float object |
  # File lib/bio/sequence/common.rb, line 198
  def total(hash)
    hash.default = 0.0 unless hash.default
    sum = 0.0
    self.each_byte do |x|
      begin
        sum += hash[x.chr]
      end
    end
    return sum
  end
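The summation can be sketched on a plain String as follows. One difference from the source is intentional and should be read as a design choice of this example only: the source sets `hash.default` on the caller's hash, whereas the sketch uses `fetch` with a fallback so the hash passed in is not modified. The name `sequence_total` and the weights are hypothetical.

```ruby
# Sum per-symbol values over a sequence; unknown symbols count as 0.0.
def sequence_total(seq, values)
  seq.each_char.sum(0.0) { |x| values.fetch(x, 0.0) }
end

weights = { 'a' => 1.0, 't' => 2.0, 'g' => 3.0, 'c' => 4.0 }
puts sequence_total('atgc', weights)    # 10.0
puts sequence_total('atgcn', weights)   # 10.0 ('n' has no entry)
puts weights.default.inspect            # nil (the hash was not altered)
```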
This method steps through a sequence in steps of ‘step_size’, yielding subsequences of ‘window_size’. Typically used with a block. Any remaining sequence at the terminal end will be returned.
Prints the average GC% of each 100 bp window
  s.window_search(100) do |subseq|
    puts subseq.gc
  end
Prints every translated peptide (length 5aa) in the same frame
  s.window_search(15, 3) do |subseq|
    puts subseq.translate
  end
Split genome sequence by 10000bp with 1000bp overlap in fasta format
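  i = 1
  remainder = s.window_search(10000, 9000) do |subseq|
    puts subseq.to_fasta("segment #{i}", 60)
    i += 1
  end
  puts remainder.to_fasta("segment #{i}", 60)
Arguments:
* window_size: Integer length of each subsequence
* (optional) step_size: Integer distance between window starts (default 1)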
Returns: | new Bio::Sequence::NA/AA object |
  # File lib/bio/sequence/common.rb, line 179
  def window_search(window_size, step_size = 1)
    last_step = 0
    0.step(self.length - window_size, step_size) do |i|
      yield self[i, window_size]
      last_step = i
    end
    return self[last_step + window_size .. -1]
  end
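To see which windows are yielded and what is returned as the remainder, the loop can be run on a short plain String. This sketch copies the logic of the source into a standalone function. Passing the sequence as the first argument is an assumption made so the example needs no BioRuby classes.

```ruby
# Yield each window of window_size, advancing by step_size, and return
# whatever follows the last complete window.
def window_search(seq, window_size, step_size = 1)
  last_step = 0
  0.step(seq.length - window_size, step_size) do |i|
    yield seq[i, window_size]
    last_step = i
  end
  seq[(last_step + window_size)..-1]
end

windows = []
rest = window_search('atgcatgcat', 4, 4) { |w| windows << w }
puts windows.inspect   # ["atgc", "atgc"]
puts rest              # at

overlapping = []
window_search('atgcatgcat', 4, 3) { |w| overlapping << w }
puts overlapping.inspect   # ["atgc", "catg", "gcat"]
```

With a step smaller than the window, consecutive windows overlap by `window_size - step_size` positions, which is how the 10000/9000 example in the text produces a 1000 bp overlap.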