Class Bio::Sequence::NA
In: lib/bio/sequence/compat.rb
lib/bio/sequence/na.rb
Parent: String

DESCRIPTION

Bio::Sequence::NA represents a bare Nucleic Acid sequence in bioruby.

USAGE

  # Create a Nucleic Acid sequence.
  dna = Bio::Sequence.auto('atgcatgcATGCATGCAAAA')
  rna = Bio::Sequence.auto('augcaugcaugcaugcaaaa')

  # What are the names of all the bases?
  puts dna.names
  puts rna.names

  # What is the GC percentage?
  puts dna.gc_percent
  puts rna.gc_percent

  # What is the molecular weight?
  puts dna.molecular_weight
  puts rna.molecular_weight

  # What is the reverse complement?
  puts dna.reverse_complement
  puts dna.complement

  # Is this sequence DNA or RNA?
  puts dna.rna?

  # Translate my sequence (see method docs for many options)
  puts dna.translate
  puts rna.translate

Included Modules

Bio::Sequence::Common

Public Class methods

new(str)

Generate a nucleic acid sequence object from a string.

  s = Bio::Sequence::NA.new("aagcttggaccgttgaagt")

or, if the nucleic acid sequence is stored in a file:

  s = Bio::Sequence::NA.new(File.read('dna.txt'))

Nucleic acid sequences are always stored in lowercase in bioruby:

  s = Bio::Sequence::NA.new("AAGcTtGG")
  puts s                                  #=> "aagcttgg"

Whitespace is stripped from the sequence:

  s = Bio::Sequence::NA.new("atg\nggg\ttt\r  gc")
  puts s                                  #=> "atggggttgc"

Arguments:

  • (required) str: String
Returns: Bio::Sequence::NA object

    # File lib/bio/sequence/na.rb, line 77
    def initialize(str)
      super
      self.downcase!
      self.tr!(" \t\n\r",'')
    end
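The cleanup the constructor applies can be sketched on a bare String. `normalize_na` is a hypothetical helper, not part of bioruby, and it uses String#delete where the source above uses tr! with an empty replacement; both remove the listed characters.

```ruby
# Hypothetical helper mirroring the cleanup done by Bio::Sequence::NA.new:
# lowercase the input, then delete spaces, tabs, newlines and carriage returns.
def normalize_na(str)
  str.downcase.delete(" \t\n\r")
end

puts normalize_na("AAGcTtGG")             # prints aagcttgg
puts normalize_na("atg\nggg\ttt\r  gc")   # prints atggggttgc
```
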

randomize(*arg, &block)

Generate a new random sequence containing each base as many times as given in the hash of base counts, so the sequence length is the sum of the counts. (See also Bio::Sequence::Common#randomize, which creates a new randomized sequence object using the base composition of an existing sequence instance.)

  counts = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
  puts Bio::Sequence::NA.randomize(counts)  #=> "ggcttgttac" (for example)

You may also feed the output of randomize into a block, which receives one base at a time:

  actual_counts = {'a'=>0, 'c'=>0, 'g'=>0, 't'=>0}
  Bio::Sequence::NA.randomize(counts) {|x| actual_counts[x] += 1}
  actual_counts                     #=> {"a"=>1, "c"=>2, "g"=>3, "t"=>4}

Arguments:

  • (optional) hash: Hash object
Returns: Bio::Sequence::NA object

    # File lib/bio/sequence/compat.rb, line 87
    def self.randomize(*arg, &block)
      self.new('').randomize(*arg, &block)
    end
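The idea behind randomizing from a hash, a shuffle that keeps the given base counts, can be sketched without bioruby. `random_na` is hypothetical and uses Array#shuffle; bioruby's own shuffling code may differ, and only the composition of the result (not its order) is meaningful.

```ruby
# Hypothetical sketch: build a sequence holding each base exactly n times,
# then shuffle it. Passing a seeded Random makes the output reproducible.
def random_na(counts, rng = Random.new)
  bases = counts.flat_map { |base, n| [base] * n }
  bases.shuffle(random: rng).join
end

counts = { 'a' => 1, 'c' => 2, 'g' => 3, 't' => 4 }
seq = random_na(counts, Random.new(42))
puts seq.length        # prints 10
puts seq.count('t')    # prints 4
```
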

Public Instance methods

at_content()

Calculate the ratio of AT / ATGC bases. U is regarded as T.

  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.at_content                       #=> 0.444444444444444

Returns: Float (a Rational under Ruby 1.9 and later, because Integer#quo is used)

    # File lib/bio/sequence/na.rb, line 319
    def at_content
      count = self.composition
      at = count['a'] + count['t'] + count['u']
      gc = count['g'] + count['c']
      return 0.0 if at + gc == 0
      return at.quo(at + gc)
    end
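Because the division uses Integer#quo, newer Rubies return a Rational (here 4/9) rather than a Float. The plain-Ruby sketch below computes AT and GC content with the same counting rules and converts explicitly with to_f; `base_content` is a hypothetical helper, not a bioruby method.

```ruby
# Hypothetical sketch of the AT/GC content formulas.
# 'u' is counted as 't'; characters other than a, t, u, g, c are ignored.
def base_content(seq)
  s = seq.downcase
  at = s.count('atu')   # String#count counts every character in the set
  gc = s.count('gc')
  total = at + gc
  # Integer#quo yields a Rational on Ruby >= 1.9, hence the explicit to_f.
  at_frac = total.zero? ? 0.0 : at.quo(total).to_f
  gc_frac = total.zero? ? 0.0 : gc.quo(total).to_f
  { at: at_frac, gc: gc_frac }
end

content = base_content('atggcgtga')
puts content[:at].round(4)   # prints 0.4444
puts content[:gc].round(4)   # prints 0.5556
```
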

at_skew()

Calculate the ratio of (A - T) / (A + T) bases. U is regarded as T.

  s = Bio::Sequence::NA.new('atgttgttgttc')
  puts s.at_skew                          #=> -0.75

Returns: Float (a Rational under Ruby 1.9 and later, because Integer#quo is used)

    # File lib/bio/sequence/na.rb, line 347
    def at_skew
      count = self.composition
      a = count['a']
      t = count['t'] + count['u']
      return 0.0 if a + t == 0
      return (a - t).quo(a + t)
    end

codon_usage()

Returns a hash counting each codon in the sequence.

  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.codon_usage                #=> {"gcg"=>1, "tga"=>1, "atg"=>1}

This method does not validate codons: any three-letter group counts as a ‘codon’. So:

  s = Bio::Sequence::NA.new('atggNNtga')
  puts s.codon_usage                #=> {"tga"=>1, "gnn"=>1, "atg"=>1}

  s = Bio::Sequence::NA.new('atgg--tga')
  puts s.codon_usage                #=> {"tga"=>1, "g--"=>1, "atg"=>1}

Also, there is no option to work in any frame other than the first.
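Since codon_usage always reads from the first base, a hypothetical helper that takes a starting frame (1, 2 or 3) might look like the sketch below. It works on a plain String, counts non-overlapping three-character groups without validating them (as codon_usage does), and drops any trailing partial codon; it is an illustration, not a bioruby API.

```ruby
# Hypothetical helper, not part of bioruby: codon counts from a given frame.
def codon_counts(seq, frame = 1)
  counts = Hash.new(0)
  # Skip frame - 1 leading bases; scan then yields non-overlapping triplets.
  (seq.downcase[(frame - 1)..-1] || '').scan(/.{3}/) { |codon| counts[codon] += 1 }
  counts
end

p codon_counts('atggcgtga').keys      # prints ["atg", "gcg", "tga"]
p codon_counts('atggcgtga', 2).keys   # prints ["tgg", "cgt"]
```
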


Returns: Hash object

    # File lib/bio/sequence/na.rb, line 275
    def codon_usage
      hash = Hash.new(0)
      self.window_search(3, 3) do |codon|
        hash[codon] += 1
      end
      return hash
    end

complement()

Alias for reverse_complement

complement!()

Alias for reverse_complement!

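cut_with_enzyme(*args)

Cuts the sequence with the given restriction enzyme(s) by passing it, together with args, to Bio::RestrictionEnzyme::Analysis.cut.

Example: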

  seq = Bio::Sequence::NA.new('gaattc')
  cuts = seq.cut_with_enzyme('EcoRI')

or

  seq = Bio::Sequence::NA.new('gaattc')
  cuts = seq.cut_with_enzyme('g^aattc')

See Bio::RestrictionEnzyme::Analysis.cut
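In the second form, the caret marks where the enzyme cuts within its recognition site (EcoRI cuts between g and aattc). The single-strand sketch below only illustrates that notation on a plain String; `top_strand_fragments` is hypothetical, ignores the complementary strand and ambiguity codes, and is not a substitute for Bio::RestrictionEnzyme::Analysis.cut.

```ruby
# Hypothetical sketch: split the top strand wherever the site occurs,
# cutting at the position marked by '^' (expects exactly one '^').
def top_strand_fragments(seq, site_with_cut)
  site   = site_with_cut.sub('^', '')
  offset = site_with_cut.index('^')
  cuts = []
  pos = seq.index(site)
  while pos
    cuts << pos + offset
    pos = seq.index(site, pos + 1)
  end
  bounds = [0] + cuts + [seq.length]
  bounds.each_cons(2).map { |from, to| seq[from...to] }
end

p top_strand_fragments('aagaattcgg', 'g^aattc')   # prints ["aag", "aattcgg"]
```
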

    # File lib/bio/sequence/na.rb, line 481
    def cut_with_enzyme(*args)
      Bio::RestrictionEnzyme::Analysis.cut(self, *args)
    end

cut_with_enzymes(*args)

Alias for cut_with_enzyme

dna()

Returns a new sequence object with any ‘u’ bases changed to ‘t’. The original sequence is not modified.

  s = Bio::Sequence::NA.new('augc')
  puts s.dna                              #=> 'atgc'
  puts s                                  #=> 'augc'

Returns: new Bio::Sequence::NA object

    # File lib/bio/sequence/na.rb, line 425
    def dna
      self.tr('u', 't')
    end

dna!()

Changes any ‘u’ bases in the original sequence to ‘t’. The original sequence is modified.

  s = Bio::Sequence::NA.new('augc')
  puts s.dna!                             #=> 'atgc'
  puts s                                  #=> 'atgc'

Returns: current Bio::Sequence::NA object (modified), or nil if no ‘u’ was present, since String#tr! returns nil when nothing changes

    # File lib/bio/sequence/na.rb, line 437
    def dna!
      self.tr!('u', 't')
    end
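dna! (and likewise rna!) returns whatever String#tr! returns, and String#tr! returns nil when it changes nothing. The snippet below shows this on a plain String, together with a hypothetical `to_dna!` wrapper (not part of bioruby) that always returns the receiver so calls can be chained safely.

```ruby
# String#tr! returns the receiver only if something was replaced, else nil.
s = 'augc'.dup
p s.tr!('u', 't')        # prints "atgc"
p s.tr!('u', 't')        # prints nil (no 'u' is left)

# Hypothetical wrapper: mutate in place, then always return the receiver.
def to_dna!(seq)
  seq.tr!('u', 't')
  seq
end

p to_dna!('atgc'.dup)    # prints "atgc"
```
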

forward_complement()

Returns a new complementary sequence object (without reversing). The original sequence object is not modified.

  s = Bio::Sequence::NA.new('atgc')
  puts s.forward_complement               #=> 'tacg'
  puts s                                  #=> 'atgc'

Returns: new Bio::Sequence::NA object

    # File lib/bio/sequence/na.rb, line 102
    def forward_complement
      s = self.class.new(self)
      s.forward_complement!
      s
    end

forward_complement!()

Converts the current sequence into its complement (without reversing). The original sequence object is modified.

  s = Bio::Sequence::NA.new('atgc')
  puts s.forward_complement!              #=> 'tacg'
  puts s                                  #=> 'tacg'

Returns: current Bio::Sequence::NA object (modified)

    # File lib/bio/sequence/na.rb, line 116
    def forward_complement!
      if self.rna?
        self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn')
      else
        self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
      end
      self
    end

gc_content()

Calculate the ratio of GC / ATGC bases. U is regarded as T.

  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_content                       #=> 0.555555555555556

Returns: Float (a Rational under Ruby 1.9 and later, because Integer#quo is used)

    # File lib/bio/sequence/na.rb, line 305
    def gc_content
      count = self.composition
      at = count['a'] + count['t'] + count['u']
      gc = count['g'] + count['c']
      return 0.0 if at + gc == 0
      return gc.quo(at + gc)
    end

gc_percent()

Calculate the ratio of GC / ATGC bases as a percentage, truncated (not rounded) to a whole number by integer division. U is regarded as T.

  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_percent                       #=> 55

Returns: Fixnum

    # File lib/bio/sequence/na.rb, line 290
    def gc_percent
      count = self.composition
      at = count['a'] + count['t'] + count['u']
      gc = count['g'] + count['c']
      return 0 if at + gc == 0
      gc = 100 * gc / (at + gc)
      return gc
    end
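The integer division above truncates, so a GC content of 55.56% is reported as 55. If a rounded percentage is wanted instead, the plain-Ruby sketch below contrasts the two; both helpers are hypothetical (not bioruby methods) and count u as t like gc_percent.

```ruby
# Hypothetical helpers contrasting truncation with rounding.
def gc_at_counts(seq)
  s = seq.downcase
  [s.count('gc'), s.count('atu')]   # [G + C, A + T + U]
end

def gc_percent_truncated(seq)
  gc, at = gc_at_counts(seq)
  return 0 if gc + at == 0
  100 * gc / (gc + at)              # Integer division drops the fraction
end

def gc_percent_rounded(seq)
  gc, at = gc_at_counts(seq)
  return 0 if gc + at == 0
  (100.0 * gc / (gc + at)).round    # Float division, then nearest integer
end

puts gc_percent_truncated('atggcgtga')   # prints 55
puts gc_percent_rounded('atggcgtga')     # prints 56
```
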

gc_skew()

Calculate the ratio of (G - C) / (G + C) bases.

  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.gc_skew                          #=> 0.6

Returns: Float (a Rational under Ruby 1.9 and later, because Integer#quo is used)

    # File lib/bio/sequence/na.rb, line 333
    def gc_skew
      count = self.composition
      g = count['g']
      c = count['c']
      return 0.0 if g + c == 0
      return (g - c).quo(g + c)
    end
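A plain-Ruby sketch of both skew formulas, returning Floats via Integer#fdiv instead of the Integer#quo used in the library source. `skews` is a hypothetical helper; following at_skew, it counts u as t for the AT skew.

```ruby
# Hypothetical sketch: GC skew = (G - C) / (G + C), AT skew = (A - T) / (A + T).
# 'u' is counted as 't'; an empty denominator gives 0.0, as in the methods above.
def skews(seq)
  s = seq.downcase
  g, c = s.count('g'), s.count('c')
  a, t = s.count('a'), s.count('tu')
  {
    gc: (g + c).zero? ? 0.0 : (g - c).fdiv(g + c),
    at: (a + t).zero? ? 0.0 : (a - t).fdiv(a + t)
  }
end

puts skews('atggcgtga')[:gc]      # prints 0.6
puts skews('atgttgttgttc')[:at]   # prints -0.75
```
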

illegal_bases()

Returns an alphabetically sorted array of the distinct non-standard bases (anything other than ‘atgcu’).

  s = Bio::Sequence::NA.new('atgStgQccR')
  puts s.illegal_bases                    #=> ["q", "r", "s"]

Returns: Array object

    # File lib/bio/sequence/na.rb, line 362
    def illegal_bases
      self.scan(/[^atgcu]/).sort.uniq
    end

molecular_weight()

Estimate the molecular weight (using the values from BioPerl's SeqStats.pm module).

  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.molecular_weight                 #=> 2841.00708

RNA and DNA do not have the same molecular weights:

  s = Bio::Sequence::NA.new('auggcguga')
  puts s.molecular_weight                 #=> 2956.94708

Returns: Float object

    # File lib/bio/sequence/na.rb, line 378
    def molecular_weight
      if self.rna?
        Bio::NucleicAcid.weight(self, true)
      else
        Bio::NucleicAcid.weight(self)
      end
    end

names()

Generate the list of the full names of each nucleotide in the sequence, in order. The names are looked up in the table returned by Bio::NucleicAcid.names.

  s = Bio::Sequence::NA.new('atg')
  puts s.names                    #=> ["Adenine", "Thymine", "Guanine"]

Returns: Array object

    # File lib/bio/sequence/na.rb, line 409
    def names
      array = []
      self.each_byte do |x|
        array.push(Bio::NucleicAcid.names[x.chr.upcase])
      end
      return array
    end

reverse_complement()

Returns a new sequence object with the reverse complement of the original sequence. The original sequence is not modified.

  s = Bio::Sequence::NA.new('atgc')
  puts s.reverse_complement               #=> 'gcat'
  puts s                                  #=> 'atgc'

Returns: new Bio::Sequence::NA object

    # File lib/bio/sequence/na.rb, line 133
    def reverse_complement
      s = self.class.new(self)
      s.reverse_complement!
      s
    end

reverse_complement!()

Converts the original sequence into its reverse complement. The original sequence is modified.

  s = Bio::Sequence::NA.new('atgc')
  puts s.reverse_complement!              #=> 'gcat'
  puts s                                  #=> 'gcat'

Returns: current Bio::Sequence::NA object (modified)

    # File lib/bio/sequence/na.rb, line 147
    def reverse_complement!
      self.reverse!
      self.forward_complement!
    end
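The complement logic above (a tr over an IUPAC-aware alphabet, with the presence of ‘u’ choosing between the DNA and RNA tables) can be reproduced on plain Strings. The helper names below are hypothetical; the translation tables are copied from the forward_complement! source shown above.

```ruby
# Translation tables copied from forward_complement! (IUPAC codes included).
DNA_FROM = 'atgcrymkdhvbswn'
DNA_TO   = 'tacgyrkmhdbvswn'
RNA_FROM = 'augcrymkdhvbswn'
RNA_TO   = 'uacgyrkmhdbvswn'

# Hypothetical helpers; as with rna?, any 'u' marks the sequence as RNA.
def complement_na(seq)
  seq.include?('u') ? seq.tr(RNA_FROM, RNA_TO) : seq.tr(DNA_FROM, DNA_TO)
end

def reverse_complement_na(seq)
  complement_na(seq.reverse)
end

puts complement_na('atgc')           # prints tacg
puts reverse_complement_na('atgc')   # prints gcat
puts reverse_complement_na('augc')   # prints gcau
```
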

rna()

Returns a new sequence object with any ‘t’ bases changed to ‘u’. The original sequence is not modified.

  s = Bio::Sequence::NA.new('atgc')
  puts s.rna                              #=> 'augc'
  puts s                                  #=> 'atgc'

Returns: new Bio::Sequence::NA object

    # File lib/bio/sequence/na.rb, line 449
    def rna
      self.tr('t', 'u')
    end

rna!()

Changes any ‘t’ bases in the original sequence to ‘u’. The original sequence is modified.

  s = Bio::Sequence::NA.new('atgc')
  puts s.rna!                             #=> 'augc'
  puts s                                  #=> 'augc'

Returns: current Bio::Sequence::NA object (modified), or nil if no ‘t’ was present, since String#tr! returns nil when nothing changes

    # File lib/bio/sequence/na.rb, line 461
    def rna!
      self.tr!('t', 'u')
    end

to_re()

Create a Ruby regular expression (Regexp) from the sequence.

  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.to_re                            #=> /atggcgtga/

Returns: Regexp object

    # File lib/bio/sequence/na.rb, line 393
    def to_re
      if self.rna?
        Bio::NucleicAcid.to_re(self.dna, true)
      else
        Bio::NucleicAcid.to_re(self)
      end
    end

translate(frame = 1, table = 1, unknown = 'X')

Translate into an amino acid sequence.

  s = Bio::Sequence::NA.new('atggcgtga')
  puts s.translate                        #=> "MA*"

By default, translate starts in reading frame position 1, but you can start in either 2 or 3 as well:

  puts s.translate(2)                     #=> "WR"
  puts s.translate(3)                     #=> "GV"

You may also translate the reverse complement in one step by using frame values of -1, -2, and -3 (or 4, 5, and 6):

  puts s.translate(-1)                    #=> "SRH"
  puts s.translate(4)                     #=> "SRH"
  puts s.reverse_complement.translate(1)  #=> "SRH"

The default codon table is the Standard (Eukaryote) table, NCBI table 1. The table argument accepts either a table number or a Bio::CodonTable object. The available (NCBI) tables are:

  1. "Standard (Eukaryote)"
  2. "Vertebrate Mitochondrial"
  3. "Yeast Mitochondrial"
  4. "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma"
  5. "Invertebrate Mitochondrial"
  6. "Ciliate Macronuclear and Dasycladacean"
  9. "Echinoderm Mitochondrial"
  10. "Euplotid Nuclear"
  11. "Bacteria"
  12. "Alternative Yeast Nuclear"
  13. "Ascidian Mitochondrial"
  14. "Flatworm Mitochondrial"
  15. "Blepharisma Macronuclear"
  16. "Chlorophycean Mitochondrial"
  21. "Trematode Mitochondrial"
  22. "Scenedesmus obliquus mitochondrial"
  23. "Thraustochytrium Mitochondrial"

To use a table other than the default, pass its number as the second argument; the frame must then be given explicitly:

  puts s.translate                #=> "MA*"  (using defaults)
  puts s.translate(1,1)           #=> "MA*"  (same as above, but explicit)
  puts s.translate(1,2)           #=> "MAW"  (different codon table)

A Bio::CodonTable instance can also be passed as the table argument:

  mt_table = Bio::CodonTable[2]
  puts s.translate(1, mt_table)           #=> "MAW"

By default, any invalid or unknown codons (as could happen if the sequence contains ambiguities) will be represented by ‘X’ in the translated sequence. You may change this to any character of your choice.

  s = Bio::Sequence::NA.new('atgcNNtga')
  puts s.translate                        #=> "MX*"
  puts s.translate(1,1,'9')               #=> "M9*"

The translate method treats gap characters as unknown (it does not remove gaps before translating), so:

  s = Bio::Sequence::NA.new('atgc--tga')
  puts s.translate                        #=> "MX*"

Arguments:

  • (optional) frame: one of 1,2,3,4,5,6,-1,-2,-3 (default 1)
  • (optional) table: Fixnum table number (one of those listed above) or Bio::CodonTable object (default 1)
  • (optional) unknown: Character (default ‘X’)
Returns: Bio::Sequence::AA object

    # File lib/bio/sequence/na.rb, line 234
    def translate(frame = 1, table = 1, unknown = 'X')
      if table.is_a?(Bio::CodonTable)
        ct = table
      else
        ct = Bio::CodonTable[table]
      end
      naseq = self.dna
      case frame
      when 1, 2, 3
        from = frame - 1
      when 4, 5, 6
        from = frame - 4
        naseq.complement!
      when -1, -2, -3
        from = -1 - frame
        naseq.complement!
      else
        from = 0
      end
      nalen = naseq.length - from
      nalen -= nalen % 3
      aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
      return Bio::Sequence::AA.new(aaseq)
    end
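To make the frame arithmetic above concrete, here is a self-contained sketch that mirrors it on plain Strings, with the standard code (NCBI table 1) hard-coded in the common 64-letter form ordered by t, c, a, g. `translate_na` and its helpers are hypothetical; unlike the real method, the sketch supports only table 1 and returns a String rather than a Bio::Sequence::AA.

```ruby
# Standard genetic code (NCBI table 1); codons enumerated in t, c, a, g order,
# so "ttt" is index 0, "ttc" is 1, ..., "ggg" is 63.
STANDARD_AA = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
BASE_ORDER  = 'tcag'

# Amino acid for a lowercase DNA codon, or nil if it has any other character.
def codon_to_aa(codon)
  index = 0
  codon.each_char do |base|
    i = BASE_ORDER.index(base)
    return nil if i.nil?
    index = index * 4 + i
  end
  STANDARD_AA[index]
end

def reverse_complement_dna(seq)
  seq.reverse.tr('acgt', 'tgca')
end

# Hypothetical sketch following the same frame rules as translate above.
def translate_na(seq, frame = 1, unknown = 'X')
  na = seq.downcase.tr('u', 't')
  case frame
  when 1, 2, 3
    from = frame - 1
  when 4, 5, 6
    from = frame - 4
    na = reverse_complement_dna(na)
  when -1, -2, -3
    from = -1 - frame
    na = reverse_complement_dna(na)
  else
    from = 0
  end
  len = na.length - from
  return '' if len < 3
  len -= len % 3
  na[from, len].scan(/.{3}/).map { |codon| codon_to_aa(codon) || unknown }.join
end

s = 'atggcgtga'
puts translate_na(s)                     # prints MA*
puts translate_na(s, 2)                  # prints WR
puts translate_na(s, -1)                 # prints SRH
puts translate_na('atgcNNtga', 1, '9')   # prints M9*
```
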

Protected Instance methods

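rna?()

Returns a true value (the index of the first ‘u’) if the sequence contains a ‘u’ base, and nil otherwise. forward_complement!, molecular_weight and to_re use it to decide whether to treat the sequence as RNA.

    # File lib/bio/sequence/na.rb, line 465
    def rna?
      self.index('u')
    end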
