Module Bio::Alignment::EnumerableExtension
In: lib/bio/alignment.rb

The module Bio::Alignment::EnumerableExtension is a set of useful methods for multiple sequence alignments. It can be included in any class or used to extend any object. The class or object must provide the methods defined in Enumerable, and must have an each method that iterates over the sequences and yields each sequence (or string) object.

Optionally, if an each_seq method is defined that iterates over the sequences and yields each sequence (or string) object, it is used instead of each.

Note that each or each_seq may be called multiple times, so the module is not suitable for IO objects. In addition, break may be used inside the given block, and destructive methods may be applied to the sequences.

For Array or Hash objects, it is better to use the ArrayExtension or HashExtension modules, respectively. They provide a built-in each_seq method and/or redefine some methods.

Included Modules

PropertyMethods, Output

Public Instance methods

Iterates over each sequence, collects the results of running the block, and returns them as a new alignment (a Bio::Alignment::SequenceArray object).

Redefine this method if you want to change the class of the return value.

[Source]

     # File lib/bio/alignment.rb, line 445
     def alignment_collect
       a = SequenceArray.new
       a.set_all_property(get_all_property)
       each_seq do |str|
         a << yield(str)
       end
       a
     end
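
The collect pattern can be tried without BioRuby. The sketch below is only an illustration: the class name TinyAlignment and the method name collect_rows are hypothetical, a plain Array stands in for SequenceArray, and property copying is omitted.

```ruby
# Hypothetical stand-in for an alignment container; not part of BioRuby.
class TinyAlignment
  include Enumerable

  def initialize(seqs)
    @seqs = seqs
  end

  # The only method the container itself must supply.
  def each(&block)
    @seqs.each(&block)
  end

  # Same shape as alignment_collect: yield every sequence and gather
  # the block results into a new container (a plain Array here).
  def collect_rows
    result = []
    each { |seq| result << yield(seq) }
    result
  end
end

aln = TinyAlignment.new(%w[ATG-C AT--C])
lowered = aln.collect_rows { |s| s.downcase }
```

Swapping the Array for another container inside collect_rows changes the class of everything built on top of it, which is why the real method is the one to redefine.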

Concatenates the given alignment to self, sequence by sequence. align must have an each_seq or each method.

Returns self.

Note that it is a destructive method.

For Hash, use it carefully because the order of the sequences is not guaranteed and key information is completely ignored.

[Source]

     # File lib/bio/alignment.rb, line 849
     def alignment_concat(align)
       flag = nil
       a = []
       each_seq { |s| a << s }
       i = 0
       begin
         align.each_seq do |seq|
           flag = true
           a[i].concat(seq) if a[i] and seq
           i += 1
         end
         return self
       rescue NoMethodError, ArgumentError => evar
         raise evar if flag
       end
       align.each do |seq|
         a[i].concat(seq) if a[i] and seq
         i += 1
       end
       self
     end
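
The row-wise behaviour can be sketched with plain Strings. This is an illustration rather than the library code: the variable names are hypothetical, and Arrays stand in for both alignments.

```ruby
rows  = ["ATG".dup, "A-G".dup]   # existing alignment (mutable strings)
extra = ["CC", "C-"]             # alignment to append

# Append the i-th incoming sequence to the i-th existing sequence,
# skipping positions where either side is missing.
extra.each_with_index do |seq, i|
  rows[i].concat(seq) if rows[i] && seq
end
```

Because sequences are paired purely by position, a container without a stable iteration order pairs them unpredictably.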

Returns the alignment length, that is, the length of the longest sequence in the alignment.

[Source]

     # File lib/bio/alignment.rb, line 366
     def alignment_length
       maxlen = 0
       each_seq do |s|
         x = s.length
         maxlen = x if x > maxlen
       end
       maxlen
     end

Removes excess gaps at the head of the sequences. If nothing is removed, returns nil. Otherwise, returns self.

Note that it is a destructive method.

[Source]

     # File lib/bio/alignment.rb, line 752
     def alignment_lstrip!
       #(String-like)
       pos = 0
       each_site do |a|
         a.remove_gaps!
         if a.empty?
           pos += 1
         else
           break
         end
       end
       return nil if pos <= 0
       each_seq { |s| s[0, pos] = '' }
       self
     end

Pads the tail of each sequence with gap characters if the sequence is shorter than the alignment length.

Note that it is a destructive method.

[Source]

     # File lib/bio/alignment.rb, line 712
     def alignment_normalize!
       #(original)
       len = alignment_length
       each_seq do |s|
         s << (gap_char * (len - s.length)) if s.length < len
       end
       self
     end
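
A minimal sketch of the same padding in plain Ruby, assuming '-' as the gap character (the variable names are hypothetical):

```ruby
gap_char = "-"                           # assumed gap character
rows = ["ATGC".dup, "AT".dup, "ATG".dup]

len = rows.map(&:length).max             # the alignment length
rows.each do |s|
  # Pad only the sequences that are shorter than the longest one.
  s << gap_char * (len - s.length) if s.length < len
end
```

After the loop every sequence has the same length, so column-wise methods no longer run off the end of short rows.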

Removes excess gaps at the tail of the sequences. If nothing is removed, returns nil. Otherwise, returns self.

Note that it is a destructive method.

[Source]

     # File lib/bio/alignment.rb, line 727
     def alignment_rstrip!
       #(String-like)
       len = alignment_length
       newlen = len
       each_site_step(len - 1, 0, -1) do |a|
         a.remove_gaps!
         if a.empty? then
           newlen -= 1
         else
           break
         end
       end
       return nil if newlen >= len
       each_seq do |s|
         s[newlen..-1] = '' if s.length > newlen
       end
       self
     end

Gets the site at the given position. Returns a Bio::Alignment::Site object.

If the position is out of range, it returns a site consisting entirely of gaps.

[Source]

     # File lib/bio/alignment.rb, line 403
     def alignment_site(position)
       site = _alignment_site(position)
       site.set_all_property(get_all_property)
       site
     end

Returns the specified range of the alignment. For each sequence, the 'slice' method (typically String#slice, which is the same as String#[]) is called, and the results are returned as a new alignment (a Bio::Alignment::SequenceArray object).

Unlike the alignment_window method, the resulting alignment might contain nil.

If you want to change the class of the return value, redefine the alignment_collect method.

[Source]

     # File lib/bio/alignment.rb, line 807
     def alignment_slice(*arg)
       #(String-like)
       #(BioPerl) AlignI::slice like method
       alignment_collect do |s|
         s.slice(*arg)
       end
     end

Removes excess gaps at both ends of the sequences. If nothing is removed, returns nil. Otherwise, returns self.

Note that it is a destructive method.

[Source]

     # File lib/bio/alignment.rb, line 774
     def alignment_strip!
       #(String-like)
       r = alignment_rstrip!
       l = alignment_lstrip!
       (r or l)
     end
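
The idea behind the three strip methods is to drop columns that contain only gaps, from one or both ends. The sketch below is a plain-Ruby illustration, not the library code: the helper names are hypothetical and '-' is assumed to be the only gap character.

```ruby
# Assumption for this sketch: '-' is the only gap character, and a
# position past the end of a short sequence also counts as a gap.
def gap_column?(rows, i)
  rows.all? { |s| s[i].nil? || s[i] == "-" }
end

# Removes all-gap columns from both ends of the rows, in place.
# Returns nil when nothing was removed, otherwise the rows.
def strip_gap_columns!(rows)
  len = rows.map(&:length).max || 0
  left = 0
  left += 1 while left < len && gap_column?(rows, left)
  right = len
  right -= 1 while right > left && gap_column?(rows, right - 1)
  return nil if left == 0 && right == len
  rows.each do |s|
    s[right..-1] = "" if s.length > right   # trim the tail first
    s[0, left] = ""                         # then the head
  end
  rows
end

rows = ["--ATG--".dup, "-CATG--".dup]
result = strip_gap_columns!(rows)
```

Only the first column is removed at the head, because the second column holds a residue in one of the sequences.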

For each sequence, the 'subseq' method (Bio::Sequence::Common#subseq is expected) is called, and the results are returned as a new alignment (a Bio::Alignment::SequenceArray object).

All sequences in the alignment are expected to be Bio::Sequence::NA or Bio::Sequence::AA objects.

Unlike the alignment_window method, the resulting alignment might contain nil.

If you want to change the class of the return value, redefine the alignment_collect method.

[Source]

     # File lib/bio/alignment.rb, line 829
     def alignment_subseq(*arg)
       #(original)
       alignment_collect do |s|
         s.subseq(*arg)
       end
     end

Returns the specified range of the alignment. For each sequence, the '[]' method (typically String#[]) is called, and the results are returned as a new alignment (a Bio::Alignment::SequenceArray object).

Unlike the alignment_slice method, the resulting alignment is guaranteed to contain sequence (String) objects even if the specified range is out of range; an empty sequence is used in that case.

If you want to change the class of the return value, redefine the alignment_collect method.

[Source]

     # File lib/bio/alignment.rb, line 466
     def alignment_window(*arg)
       alignment_collect do |s|
         s[*arg] or seqclass.new('')
       end
     end
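
The difference from alignment_slice shows up when a sequence is shorter than the requested range. A plain-Ruby sketch, using Arrays of Strings in place of alignments and an empty String in place of seqclass.new(''):

```ruby
rows = ["ATGC", "AT"]

# slice-style: an out-of-range request yields nil for the short row.
sliced = rows.map { |s| s.slice(3, 2) }

# window-style: nil is replaced by an empty sequence.
windowed = rows.map { |s| s[3, 2] || "" }
```

Here sliced holds a nil for the second row, while windowed holds an empty String in the same position.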

Iterates over each site of the alignment, collects the results of running the block, and returns them as an array. It yields a Bio::Alignment::Site object.

[Source]

     # File lib/bio/alignment.rb, line 503
     def collect_each_site
       ary = []
       each_site do |site|
         ary << yield(site)
       end
       ary
     end

Helper method for calculating a consensus sequence. It iterates over each site of the alignment and yields a Bio::Alignment::Site object. Depending on opt, gaps are removed from each site before it is yielded. The results of running the block (String objects are expected) are joined into a single string, which is returned. When the block returns nil or false, the missing character is used instead.

 opt[:gap_mode] ==> 0 -- gaps are regarded as normal characters
                    1 -- a site containing gaps is regarded as a gap
                   -1 -- gaps are eliminated from consensus calculation
     default: 0

[Source]

     # File lib/bio/alignment.rb, line 523
     def consensus_each_site(opt = {})
       mchar = (opt[:missing_char] or self.missing_char)
       gap_mode = opt[:gap_mode]
       case gap_mode
       when 0, nil
         collect_each_site do |a|
           yield(a) or mchar
         end.join('')
       when 1
         collect_each_site do |a|
           a.has_gap? ? gap_char : (yield(a) or mchar)
         end.join('')
       when -1
         collect_each_site do |a|
           a.remove_gaps!
           a.empty? ? gap_char : (yield(a) or mchar)
         end.join('')
       else
         raise ':gap_mode must be 0, 1 or -1'
       end
     end
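
The three gap modes can be compared side by side in a small plain-Ruby sketch. Everything named here is hypothetical: consensus_by_column is not a BioRuby method, columns are plain Arrays of one-character Strings, and '-' and '?' are assumed as the gap and missing characters.

```ruby
# Hypothetical helper mirroring the three :gap_mode branches.
def consensus_by_column(columns, gap_mode: 0, gap_char: "-", missing_char: "?")
  columns.map do |col|
    case gap_mode
    when 0    # gaps are ordinary characters
      yield(col) || missing_char
    when 1    # any gap turns the whole site into a gap
      col.include?(gap_char) ? gap_char : (yield(col) || missing_char)
    when -1   # gaps are dropped before the block sees the site
      residues = col.reject { |c| c == gap_char }
      residues.empty? ? gap_char : (yield(residues) || missing_char)
    else
      raise ArgumentError, ":gap_mode must be 0, 1 or -1"
    end
  end.join
end

columns = [%w[A A A], %w[T - T], %w[- - -], %w[G C G]]

# Example block: report a character only when the site is unanimous.
unanimous = proc { |col| col.uniq.size == 1 ? col.first : nil }

mode0  = consensus_by_column(columns, gap_mode: 0,  &unanimous)
mode1  = consensus_by_column(columns, gap_mode: 1,  &unanimous)
mode_1 = consensus_by_column(columns, gap_mode: -1, &unanimous)
```

The second column is where the modes diverge: it is a disagreement in mode 0, a gap in mode 1, and a unanimous T in mode -1.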

Returns the IUPAC consensus string of the alignment of nucleic-acid sequences.

It resembles BioPerl's AlignI::consensus_iupac method.

Please refer to the consensus_each_site method for opt.

[Source]

     # File lib/bio/alignment.rb, line 565
     def consensus_iupac(opt = {})
       consensus_each_site(opt) do |a|
         a.consensus_iupac
       end
     end

Returns the consensus string of the alignment. 0.0 <= threshold <= 1.0 is expected.

It resembles BioPerl's AlignI::consensus_string method.

Please refer to the consensus_each_site method for opt.

[Source]

     # File lib/bio/alignment.rb, line 552
     def consensus_string(threshold = 1.0, opt = {})
       consensus_each_site(opt) do |a|
         a.consensus_string(threshold)
       end
     end

This method is similar to BioPerl's AlignI::match.

In the second through last sequences, replaces each site with match_char (default: '.') when it is equal to the corresponding site of the first sequence and is not a gap.

Note that it is a destructive method.

For Hash, use it carefully because the order of the sequences is not guaranteed.

[Source]

     # File lib/bio/alignment.rb, line 662
     def convert_match(match_char = '.')
       #(BioPerl) AlignI::match like method
       len = alignment_length
       firstseq = nil
       each_seq do |s|
         unless firstseq then
           firstseq = s
         else
           (0...len).each do |i|
             if s[i] and firstseq[i] == s[i] and !is_gap?(firstseq[i..i])
               s[i..i] = match_char
             end
           end
         end
       end
       self
     end

This method is similar to BioPerl's AlignI::unmatch.

In the second through last sequences, replaces each match_char (default: '.') with the character at the corresponding site of the first sequence.

Note that it is a destructive method.

For Hash, use it carefully because the order of the sequences is not guaranteed.

[Source]

     # File lib/bio/alignment.rb, line 690
     def convert_unmatch(match_char = '.')
       #(BioPerl) AlignI::unmatch like method
       len = alignment_length
       firstseq = nil
       each_seq do |s|
         unless firstseq then
           firstseq = s
         else
           (0...len).each do |i|
             if s[i..i] == match_char then
               s[i..i] = (firstseq[i..i] or match_char)
             end
           end
         end
       end
       self
     end
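
convert_match and convert_unmatch are meant to be inverses of each other. A plain-Ruby sketch of the round trip, with hypothetical helper names and '-' assumed as the only gap character:

```ruby
# Replace residues identical to the first row with match_char.
def mark_matches!(rows, match_char = ".", gap_char = "-")
  first, *rest = rows
  rest.each do |s|
    (0...s.length).each do |i|
      s[i] = match_char if first[i] == s[i] && s[i] != gap_char
    end
  end
  rows
end

# Restore the original residues from the first row.
def unmark_matches!(rows, match_char = ".")
  first, *rest = rows
  rest.each do |s|
    (0...s.length).each do |i|
      s[i] = (first[i] || match_char) if s[i] == match_char
    end
  end
  rows
end

rows = ["ATG-C".dup, "ATC-C".dup, "TTG-A".dup]
marked   = mark_matches!(rows).map(&:dup)    # snapshot after marking
restored = unmark_matches!(rows).map(&:dup)  # snapshot after restoring
```

The first row is the reference throughout and is never modified, which is why the iteration order of the container matters.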

Iterates over each sequence and yields it. By default it behaves the same as each.

Redefine this method as appropriate for the class or object.

[Source]

     # File lib/bio/alignment.rb, line 340
     def each_seq(&block) #:yields: seq
       each(&block)
     end

Iterates over each site of the alignment. It yields a Bio::Alignment::Site object (which inherits from Array). It returns self.

[Source]

     # File lib/bio/alignment.rb, line 412
     def each_site
       cp = get_all_property
       (0...alignment_length).each do |i|
         site = _alignment_site(i)
         site.set_all_property(cp)
         yield(site)
       end
       self
     end
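
A site is one column of the alignment. A minimal plain-Ruby sketch of column extraction, assuming '-' as the gap character and using plain Arrays in place of Bio::Alignment::Site objects:

```ruby
rows = ["ATG", "A-", "CTG"]
len  = rows.map(&:length).max

# Build one Array per column; positions past the end of a short
# sequence are filled with the assumed gap character.
columns = (0...len).map do |i|
  rows.map { |s| s[i] || "-" }
end
```

The second sequence is one character short, so its entry in the last column is filled with a gap.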

Iterates over the sites of the alignment from start to stop in increments of step. It yields a Bio::Alignment::Site object (which inherits from Array). It returns self. It is equivalent to start.step(stop, step) { |i| yield alignment_site(i) }.

[Source]

     # File lib/bio/alignment.rb, line 428
     def each_site_step(start, stop, step = 1)
       cp = get_all_property
       start.step(stop, step) do |i|
         site = _alignment_site(i)
         site.set_all_property(cp)
         yield(site)
       end
       self
     end

Iterates over each sliding window of the alignment. window_size is the size of the sliding window and step_size is the distance between successive windows. It yields a Bio::Alignment::SequenceArray object containing each window, and returns a Bio::Alignment::SequenceArray object containing the remainder of the alignment at the terminal end. If window_size is smaller than 0, it returns nil.

[Source]

     # File lib/bio/alignment.rb, line 481
     def each_window(window_size, step_size = 1)
       return nil if window_size < 0
       if step_size >= 0 then
         last_step = nil
         0.step(alignment_length - window_size, step_size) do |i|
           yield alignment_window(i, window_size)
           last_step = i
         end
         alignment_window((last_step + window_size)..-1)
       else
         i = alignment_length - window_size
         while i >= 0
           yield alignment_window(i, window_size)
           i += step_size
         end
         alignment_window(0...(i-step_size))
       end
     end
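
The forward (positive step) case can be sketched in plain Ruby. The helper name sliding_windows is hypothetical, Arrays of Strings stand in for alignments, and step is assumed to be greater than zero:

```ruby
# Returns [windows, remainder] for a positive step.
def sliding_windows(rows, window_size, step = 1)
  len = rows.map(&:length).max || 0
  starts = 0.step(len - window_size, step).to_a
  windows = starts.map do |i|
    rows.map { |s| s[i, window_size] || "" }
  end
  # Whatever follows the last full window is the remainder.
  tail_from = starts.empty? ? 0 : starts.last + window_size
  remainder = rows.map { |s| s[tail_from..-1] || "" }
  [windows, remainder]
end

rows = ["ATGCAT", "AT-CAT"]
windows, remainder = sliding_windows(rows, 3, 2)
```

With a window of 3 and a step of 2, windows start at positions 0 and 2, and the single trailing column is returned as the remainder.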

lstrip!()

Alias for alignment_lstrip!

Returns the match line string of the alignment of nucleic- or amino-acid sequences. The type of the sequences is determined automatically, or you can specify it with opt[:type].

It resembles BioPerl's AlignI::match_line method.

  opt[:type] ==> :na or :aa (or determined by sequence class)
  opt[:match_line_char]   ==> 100% equal    default: '*'
  opt[:strong_match_char] ==> strong match  default: ':'
  opt[:weak_match_char]   ==> weak match    default: '.'
  opt[:mismatch_char]     ==> mismatch      default: ' '
    :strong_ and :weak_match_char are used only in amino mode (:aa)

Other options are also accepted. Please refer to the consensus_each_site method for opt.

[Source]

     # File lib/bio/alignment.rb, line 624
     def match_line(opt = {})
       case opt[:type]
       when :aa
         amino = true
       when :na, :dna, :rna
         amino = false
       else
         if seqclass == Bio::Sequence::AA then
           amino = true
         elsif seqclass == Bio::Sequence::NA then
           amino = false
         else
           amino = nil
           self.each_seq do |x|
             if /[EFILPQ]/i =~ x
               amino = true
               break
             end
           end
         end
       end
       if amino then
         match_line_amino(opt)
       else
         match_line_nuc(opt)
       end
     end
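
When neither opt[:type] nor the sequence class settles the question, the listing falls back to a character test: E, F, I, L, P and Q are amino-acid letters that are not IUPAC nucleotide codes. A plain-Ruby sketch of that fallback, with a hypothetical helper name:

```ruby
# True when any sequence contains a letter that can only be an
# amino-acid code.
def looks_like_amino?(rows)
  rows.any? { |s| /[EFILPQ]/i =~ s }
end

nucleic = looks_like_amino?(["ATGC", "AT-C"])
protein = looks_like_amino?(["MKVLE", "MKV-E"])
```

A short peptide made only of letters shared with the nucleotide alphabet would be treated as nucleic acid by this test, so passing opt[:type] explicitly is the safer choice for such input.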

Returns the match line string of the alignment of amino-acid sequences.

It resembles BioPerl's AlignI::match_line method.

  opt[:match_line_char]   ==> 100% equal    default: '*'
  opt[:strong_match_char] ==> strong match  default: ':'
  opt[:weak_match_char]   ==> weak match    default: '.'
  opt[:mismatch_char]     ==> mismatch      default: ' '

Other options are also accepted. Please refer to the consensus_each_site method for opt.

[Source]

     # File lib/bio/alignment.rb, line 584
     def match_line_amino(opt = {})
       collect_each_site do |a|
         a.match_line_amino(opt)
       end.join('')
     end

Returns the match line string of the alignment of nucleic-acid sequences.

It resembles BioPerl's AlignI::match_line method.

  opt[:match_line_char]   ==> 100% equal    default: '*'
  opt[:mismatch_char]     ==> mismatch      default: ' '

Other options are also accepted. Please refer to the consensus_each_site method for opt.

[Source]

     # File lib/bio/alignment.rb, line 601
     def match_line_nuc(opt = {})
       collect_each_site do |a|
         a.match_line_nuc(opt)
       end.join('')
     end

normalize!()

Alias for alignment_normalize!

Returns the number of sequences in this alignment.

[Source]

     # File lib/bio/alignment.rb, line 1315
     def number_of_sequences
       i = 0
       self.each_seq { |s| i += 1 }
       i
     end

Completely removes ALL gaps in the sequences. If nothing is removed, returns nil. Otherwise, returns self.

Note that it is a destructive method.

[Source]

     # File lib/bio/alignment.rb, line 787
     def remove_all_gaps!
       ret = nil
       each_seq do |s|
         x = s.gsub!(gap_regexp, '')
         ret ||= x
       end
       ret ? self : nil
     end
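
The nil-or-self return value follows from String#gsub!, which returns nil when it makes no substitution. A plain-Ruby sketch, where the gap pattern (any non-alphabetic character) is an assumption of this example:

```ruby
gap_regexp = /[^a-zA-Z]/          # assumed gap pattern for this sketch

def remove_gaps_from!(rows, gap_regexp)
  changed = nil
  rows.each do |s|
    x = s.gsub!(gap_regexp, "")   # nil when s had no gaps
    changed ||= x                 # remember whether any row changed
  end
  changed ? rows : nil
end

gapped   = ["AT-G".dup, "ATGC".dup]
ungapped = ["ATG".dup]
first  = remove_gaps_from!(gapped, gap_regexp)
second = remove_gaps_from!(ungapped, gap_regexp)
```

Keeping the gsub! call on its own line matters: folding it into the ||= expression would stop processing rows after the first one that changed.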

rstrip!()

Alias for alignment_rstrip!

seq_length()

Alias for alignment_length

Returns the class of the sequences. If the instance variable @seqclass (which can be set by the 'seqclass=' method) is set, simply returns its value. Otherwise, returns the class of the first sequence. If no sequences are found, returns String.

[Source]

     # File lib/bio/alignment.rb, line 349
     def seqclass
       if (defined? @seqclass) and @seqclass then
         @seqclass
       else
         klass = nil
         each_seq do |s|
           if s then
             klass = s.class
             break if klass
           end
         end
         (klass or String)
       end
     end

Returns an array of sequence names. The order of the names must be the same as the order of each_seq.

[Source]

     # File lib/bio/alignment.rb, line 1324
     def sequence_names
       (0...(self.number_of_sequences)).to_a
     end

slice(*arg)

Alias for alignment_slice

strip!()

Alias for alignment_strip!

subseq(*arg)

Alias for alignment_subseq

window(*arg)

Alias for alignment_window
