Class Bio::GFF::GFF3::Record::Gap
In: lib/bio/db/gff.rb
Parent: Object

Bio::GFF::GFF3::Record::Gap is a class that stores the data of a "Gap" attribute.

Classes and Modules

Class Bio::GFF::GFF3::Record::Gap::Code

Constants

Code = Struct.new(:code, :length)   Code is a class that stores a single-letter code and its length.

Attributes

data  [R]  Internal data. Users must not use it.

Public Class methods

Creates a new Gap object.


Arguments:

  • str: a formatted string, or nil.
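
The accepted string format can be illustrated without loading BioRuby. The following is a minimal standalone sketch, assuming space-separated tokens made of one uppercase letter followed by a decimal length; `GapCode` and `parse_gap` are hypothetical names used only for illustration and are not part of the library.

```ruby
# Hypothetical stand-in for Gap::Code: one operation letter plus a run length.
GapCode = Struct.new(:code, :length)

# Split a Gap attribute value into codes; tokens that do not match are dropped.
def parse_gap(str)
  return [] if str.nil?
  str.split(/ +/).map do |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token.strip)
    m ? GapCode.new(m[1].to_sym, m[2].to_i) : nil
  end.compact
end

codes = parse_gap("M8 D3 M6 I1 M6")
codes.each { |c| puts "#{c.code} x #{c.length}" }
```

Passing nil yields an empty list, which mirrors the default argument of the constructor.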

[Source]

      # File lib/bio/db/gff.rb, line 1246
      def initialize(str = nil)
        if str then
          @data = str.split(/ +/).collect do |x|
            if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
              Code.new($1.intern, $2.to_i)
            else
              warn "ignored unknown token: #{x.inspect}" if $VERBOSE
              nil
            end
          end
          @data.compact!
        else
          @data = []
        end
      end

Creates a new Gap object from given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.


Arguments:

  • reference: reference sequence (nucleotide sequence)
  • target: target sequence (nucleotide sequence)
  • gap_regexp: regexp to identify gap
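
How an alignment maps to a Gap string can be sketched in plain Ruby. This is not the library's implementation; it assumes two aligned strings of equal length and the GFF3 convention that M is a match, I is a gap in the reference, and D is a gap in the target. The method name `gap_string_from_alignment` is hypothetical.

```ruby
# Derive a Gap-style string from two aligned nucleotide sequences.
# Assumption: both strings have the same length (one column per character).
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  runs = []
  reference.each_char.zip(target.each_char) do |r, t|
    r_gap = gap_regexp.match?(r)
    t_gap = gap_regexp.match?(t)
    next if r_gap && t_gap # both are gaps: the column is silently dropped
    op = if r_gap then :I elsif t_gap then :D else :M end
    if runs.last && runs.last[0] == op
      runs.last[1] += 1    # extend the current run
    else
      runs << [op, 1]      # start a new run
    end
  end
  runs.map { |op, len| "#{op}#{len}" }.join(" ")
end

puts gap_string_from_alignment("atgc--atgc", "atgcttat-c")
```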

[Source]

      # File lib/bio/db/gff.rb, line 1362
      def self.new_from_sequences_na(reference, target,
                                     gap_regexp = /[^a-zA-Z]/)
        gap = self.new
        gap.instance_eval { 
          __initialize_from_sequences_na(reference, target,
                                         gap_regexp)
        }
        gap
      end

Creates a new Gap object from given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.

For incorrect alignments that break the 3:1 rule (three nucleotides per amino acid), gap positions are moved inside codons, unwanted gaps are removed, and forward or reverse frameshifts are inserted.

For example,

   atgg-taagac-att
   M  V  K  -  I

is treated as:

   atggt<aagacatt
   M  V  K  >>I

An incorrect combination of a frameshift with another frameshift or with a gap may cause undefined behavior.

Forward frameshifts are recommended to be indicated in the target sequence. Reverse frameshifts can be indicated in either the reference sequence or the target sequence.

Priority of regular expressions:

  space > forward/reverse frameshift > gap

Arguments:

  • reference: reference sequence (nucleotide sequence)
  • target: target sequence (amino acid sequence)
  • gap_regexp: regexp to identify gap
  • space_regexp: regexp to identify space character which is completely ignored
  • forward_frameshift_regexp: regexp to identify forward frameshift
  • reverse_frameshift_regexp: regexp to identify reverse frameshift
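
The priority order matters because the default gap pattern also matches whitespace and the frameshift characters. The sketch below shows one way to classify a single character in that order; it is an illustration only, and the constant names and the `classify` method are hypothetical.

```ruby
# Default patterns, equivalent to those in the method signature.
SPACE_RE   = /\s/
FORWARD_RE = />/
REVERSE_RE = /</
GAP_RE     = /[^a-zA-Z]/

# Check patterns in priority order: space > forward/reverse frameshift > gap.
def classify(char)
  case char
  when SPACE_RE   then :space
  when FORWARD_RE then :forward_frameshift
  when REVERSE_RE then :reverse_frameshift
  when GAP_RE     then :gap
  else :residue
  end
end

"M >-<".each_char { |c| puts "#{c.inspect} => #{classify(c)}" }
```

Testing the gap pattern first would label every space and frameshift character as a gap, which is why it comes last.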

[Source]

      # File lib/bio/db/gff.rb, line 1558
      def self.new_from_sequences_na_aa(reference, target,
                                        gap_regexp = /[^a-zA-Z]/,
                                        space_regexp = /\s/,
                                        forward_frameshift_regexp = /\>/,
                                        reverse_frameshift_regexp = /\</)
        gap = self.new
        gap.instance_eval { 
          __initialize_from_sequences_na_aa(reference, target,
                                            gap_regexp,
                                            space_regexp,
                                            forward_frameshift_regexp,
                                            reverse_frameshift_regexp)
        }
        gap
      end

Same as new(str).

[Source]

      # File lib/bio/db/gff.rb, line 1263
      def self.parse(str)
        self.new(str)
      end

Public Instance methods

Returns true if other is of the same class and holds equal data; otherwise, returns false.

[Source]

      # File lib/bio/db/gff.rb, line 1586
      def ==(other)
        if other.class == self.class and
            @data == other.data then
          true
        else
          false
        end
      end

Processes nucleotide sequences and returns gapped sequences as an array of sequences.

Note on forward/reverse frameshifts: a forward frameshift is simply treated as a gap inserted into the target sequence, and a reverse frameshift as a gap inserted into the reference sequence.


Arguments:

  • reference: reference sequence (nucleotide sequence)
  • target: target sequence (nucleotide sequence)
  • gap_char: gap character
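
The effect of applying a Gap string to two ungapped nucleotide sequences can be sketched as follows. This is a simplified standalone illustration, not the library's code: it handles only the M, I, and D codes, assumes the Gap string is consistent with the sequence lengths, and uses the hypothetical method name `apply_gap`.

```ruby
# Insert gap characters into ungapped sequences according to a Gap string.
# Assumption: M consumes both sequences, I puts a gap in the reference,
# D puts a gap in the target. Other codes are ignored in this sketch.
def apply_gap(reference, target, gap, gap_char = "-")
  ref = String.new
  tgt = String.new
  r = t = 0 # read positions in the ungapped inputs
  gap.split(/ +/).each do |token|
    op = token[0]
    len = token[1..-1].to_i
    case op
    when "M"
      ref << reference[r, len].to_s
      tgt << target[t, len].to_s
      r += len
      t += len
    when "I"
      ref << gap_char * len
      tgt << target[t, len].to_s
      t += len
    when "D"
      ref << reference[r, len].to_s
      tgt << gap_char * len
      r += len
    end
  end
  [ref, tgt]
end

s_ref, s_tgt = apply_gap("atgcatgc", "atgcttatc", "M4 I2 M2 D1 M1")
puts s_ref
puts s_tgt
```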

[Source]

      # File lib/bio/db/gff.rb, line 1686
      def process_sequences_na(reference, target, gap_char = '-')
        s_ref, s_tgt = dup_seqs(reference, target)

        s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                           gap_char, gap_char,
                                           1, 1,
                                           gap_char, gap_char)

        if $VERBOSE and s_ref.length != s_tgt.length then
          warn "returned sequences not equal length"
        end
        return s_ref, s_tgt
      end

Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.

Note on reverse frameshifts: reverse frameshift characters are inserted into the reference sequence. For example, the alignment for "Gap=M3 R1 M2" is:

    atgaagat<aatgtc
    M  K  I  N  V

Alignment of "Gap=M3 R3 M3" is:

    atgaag<<<attaatgtc
    M  K  I  I  N  V

Arguments:

  • reference: reference sequence (nucleotide sequence)
  • target: target sequence (amino acid sequence)
  • gap_char: gap character
  • space_char: space character inserted into the amino acid sequence so that each residue lines up with its codon in the na-aa alignment
  • forward_frameshift: forward frameshift character
  • reverse_frameshift: reverse frameshift character
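
The role of space_char can be seen in isolation: each amino acid is followed by two space characters so that it occupies three columns, the width of one codon. The sketch below reproduces only that padding step with a hypothetical helper name, `pad_amino_acids`.

```ruby
# Pad every residue with two space characters so one amino acid
# spans the same three columns as its codon.
def pad_amino_acids(target, space_char = " ")
  target.gsub(/./, "\\0#{space_char}#{space_char}")
end

padded = pad_amino_acids("MKI")
puts "atgaagatt"
puts padded
```

With a one-character space_char, the padded target is exactly three times as long as the original, matching a step of 3 on the nucleotide side.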

[Source]

      # File lib/bio/db/gff.rb, line 1723
      def process_sequences_na_aa(reference, target,
                                  gap_char = '-',
                                  space_char = ' ',
                                  forward_frameshift = '>',
                                  reverse_frameshift = '<')
        s_ref, s_tgt = dup_seqs(reference, target)
        s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
        ref_increment = 3
        tgt_increment = 1 + space_char.length * 2
        ref_gap = gap_char * 3
        tgt_gap = "#{gap_char}#{space_char}#{space_char}"
        return __process_sequences(s_ref, s_tgt,
                                   ref_gap, tgt_gap,
                                   ref_increment, tgt_increment,
                                   forward_frameshift,
                                   reverse_frameshift)
      end

Returns the string representation.

[Source]

      # File lib/bio/db/gff.rb, line 1575
      def to_s
        @data.collect { |x| x.to_s }.join(" ")
      end
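
The joining step can be sketched on its own. This example assumes each code prints as its letter followed by its length; `GapToken` is a hypothetical stand-in for the Code struct, defined here with its own to_s so the snippet runs without BioRuby.

```ruby
# Hypothetical stand-in for Gap::Code, with a to_s that prints e.g. "M8".
GapToken = Struct.new(:code, :length) do
  def to_s
    "#{code}#{length}"
  end
end

tokens = [GapToken.new(:M, 8), GapToken.new(:D, 3), GapToken.new(:M, 6)]
text = tokens.collect { |x| x.to_s }.join(" ")
puts text
```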
