Class Bio::FlatFileIndex
In: lib/bio/io/flatfile/bdb.rb
lib/bio/io/flatfile/index.rb
lib/bio/io/flatfile/indexer.rb
Parent: Object

Bio::FlatFileIndex is a class for creating, opening, and searching OBDA flatfile indexes.

Classes and Modules

Module Bio::FlatFileIndex::BDB_1
Module Bio::FlatFileIndex::BDBdefault
Module Bio::FlatFileIndex::DEBUG
Module Bio::FlatFileIndex::Flat_1
Module Bio::FlatFileIndex::Indexer
Module Bio::FlatFileIndex::Template
Class Bio::FlatFileIndex::BDBwrapper
Class Bio::FlatFileIndex::DataBank
Class Bio::FlatFileIndex::FileID
Class Bio::FlatFileIndex::FileIDs
Class Bio::FlatFileIndex::NameSpaces
Class Bio::FlatFileIndex::Results

Constants

MAGIC_FLAT = 'flat/1'   magic string for flat/1 index
MAGIC_BDB = 'BerkeleyDB/1'   magic string for BerkeleyDB/1 index

Public Class methods

    # File lib/bio/io/flatfile/indexer.rb, line 734
    def self.formatstring2class(format_string)
      case format_string
      when /genbank/i
        dbclass = Bio::GenBank
      when /genpept/i
        dbclass = Bio::GenPept
      when /embl/i
        dbclass = Bio::EMBL
      when /sptr/i
        dbclass = Bio::SPTR
      when /fasta/i
        dbclass = Bio::FastaFormat
      else
        # Report the argument that was actually passed in.
        raise "Unsupported format : #{format_string}"
      end
    end
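The dispatch above can be tried without BioRuby installed. The following is a minimal sketch, not the library's code: format_name_for is a hypothetical helper that returns class names as strings instead of the Bio:: classes. Only the regular expressions and their order are taken from the source.

```ruby
# Hypothetical stand-in for formatstring2class. It returns the class name as
# a string so that the sketch runs without BioRuby loaded.
def format_name_for(format_string)
  case format_string
  when /genbank/i then 'Bio::GenBank'
  when /genpept/i then 'Bio::GenPept'
  when /embl/i    then 'Bio::EMBL'
  when /sptr/i    then 'Bio::SPTR'
  when /fasta/i   then 'Bio::FastaFormat'
  else
    # A bare string raises RuntimeError, as in the original method.
    raise "Unsupported format : #{format_string}"
  end
end

puts format_name_for('GenBank')
puts format_name_for('fasta')
```

Because the patterns are unanchored and case-insensitive, any string containing one of the keywords is accepted, and the first matching branch wins.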

    # File lib/bio/io/flatfile/indexer.rb, line 751
    def self.makeindex(is_bdb, dbname, format, options, *files)
      if format then
        dbclass = formatstring2class(format)
      else
        dbclass = Bio::FlatFile.autodetect_file(files[0])
        raise "Cannot determine format" unless dbclass
        DEBUG.print "file format is #{dbclass}\n"
      end

      options = {} unless options
      pns = options['primary_namespace']
      sns = options['secondary_namespaces']

      parser = Indexer::Parser.new(dbclass, pns, sns)

      #if /(EMBL|SPTR)/ =~ dbclass.to_s then
        #a = [ 'DR' ]
        #parser.add_secondary_namespaces(*a)
      #end
      if sns = options['additional_secondary_namespaces'] then
        parser.add_secondary_namespaces(*sns)
      end

      if is_bdb then
        Indexer::makeindexBDB(dbname, parser, options, *files)
      else
        Indexer::makeindexFlat(dbname, parser, options, *files)
      end
    end

Opens an existing databank. A databank is a directory that contains indexed files and configuration files. The type of the databank (flat or BerkeleyDB) is determined automatically.

Unlike FlatFileIndex.open, this method does not accept a block.

    # File lib/bio/io/flatfile/index.rb, line 113
    def initialize(name)
      @db = DataBank.open(name)
    end

Opens an existing databank. A databank is a directory that contains indexed files and configuration files. The type of the databank (flat or BerkeleyDB) is determined automatically.

If a block is given, the databank object is passed to the block, and the databank is closed automatically when the block terminates. The value of the block is returned. Without a block, the opened databank object is returned.

    # File lib/bio/io/flatfile/index.rb, line 88
    def self.open(name)
      if block_given? then
        begin
          i = self.new(name)
          r = yield i
        ensure
          if i then
            begin
              i.close
            rescue IOError
            end
          end
        end
      else
        r = self.new(name)
      end
      r
    end
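The block form follows the same open/yield/ensure-close shape as File.open. The sketch below illustrates that shape with FakeIndex, a hypothetical stand-in class, so that it runs without BioRuby or an existing databank. It is not the library's implementation.

```ruby
# FakeIndex is a hypothetical stand-in for Bio::FlatFileIndex.
class FakeIndex
  def initialize(name)
    @name = name
    @closed = false
  end

  def closed?
    @closed
  end

  def close
    @closed = true
    nil
  end

  # Without a block, return the open object. With a block, yield the object
  # and close it afterwards, even if the block raises.
  def self.open(name)
    return new(name) unless block_given?

    index = new(name)
    begin
      yield index
    ensure
      index.close
    end
  end
end

seen = nil
result = FakeIndex.open('my_databank') do |index|
  seen = index      # keep a reference to inspect after the block
  :block_value      # the block's value becomes the return value of open
end

puts result.inspect
puts seen.closed?
```

With the block form the caller never has to call close explicitly. With the non-block form the caller is responsible for closing the returned object.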

    # File lib/bio/io/flatfile/indexer.rb, line 781
    def self.update_index(dbname, format, options, *files)
      if format then
        # Resolve the format string to a database class before building the parser.
        dbclass = formatstring2class(format)
        parser = Indexer::Parser.new(dbclass)
      else
        parser = nil
      end
      Indexer::update_index(dbname, parser, options, *files)
    end

Public Instance methods

Returns the current setting. If true, a consistency check is performed every time the flatfiles are accessed. If nil or false, no checks are performed.

By default, always_check_consistency is true.

    # File lib/bio/io/flatfile/index.rb, line 297
    def always_check_consistency(bool)
      # The bool argument is not used; the current setting is returned.
      @db.always_check
    end

Sets whether consistency checks are performed. If true is given, a consistency check is performed every time the flatfiles are accessed. If nil or false is given, no checks are performed.

By default, always_check_consistency is true.

    # File lib/bio/io/flatfile/index.rb, line 288
    def always_check_consistency=(bool)
      @db.always_check=(bool)
    end

Checks consistency between the databank (index) and the original flat files.

Raises RuntimeError if the original flat files have changed since the databank was created.

Note that this check only compares file sizes, as described in the OBDA specification.

    # File lib/bio/io/flatfile/index.rb, line 278
    def check_consistency
      check_closed?
      @db.check_consistency
    end
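To make the size-only comparison concrete, here is a minimal sketch of the idea. It does not use BioRuby: snapshot_sizes and verify_sizes are hypothetical helpers, and the real check is carried out inside the databank object.

```ruby
require 'tempfile'

# Hypothetical helper: remember the size of each flat file at indexing time.
def snapshot_sizes(paths)
  paths.map { |path| [path, File.size(path)] }.to_h
end

# Hypothetical helper: raise RuntimeError if any file's size has changed.
def verify_sizes(snapshot)
  snapshot.each do |path, recorded|
    current = File.size(path)
    next if current == recorded
    raise "file size mismatch: #{path} (recorded #{recorded}, now #{current})"
  end
  true
end

flatfile = Tempfile.new('entries')
flatfile.write("ID   X1\n//\n")
flatfile.flush

snapshot = snapshot_sizes([flatfile.path])
verify_sizes(snapshot)               # passes: nothing has changed yet

flatfile.write("ID   X2\n//\n")      # append an entry after "indexing"
flatfile.flush

error = begin
  verify_sizes(snapshot)
  nil
rescue RuntimeError => e
  e
end
puts error.message
```

A check of this kind cannot detect an edit that leaves the file size unchanged, which is the limitation the note above points out.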

Closes the databank. Returns nil.

    # File lib/bio/io/flatfile/index.rb, line 132
    def close
      check_closed?
      @db.close
      @db = nil
    end

Returns true if already closed. Otherwise, returns false.

    # File lib/bio/io/flatfile/index.rb, line 139
    def closed?
      if @db then
        false
      else
        true
      end
    end

Returns default namespaces. Returns an array of strings or nil. nil means all namespaces.

    # File lib/bio/io/flatfile/index.rb, line 172
    def default_namespaces
      @names
    end

Sets the default namespaces. default_namespaces = nil means all namespaces in the databank.

default_namespaces = [ str1, str2, … ] sets the default namespaces to str1, str2, …

Default namespaces set by this method affect only the get_by_id, search, and include? methods.

The default is nil (that is, all namespaces are searched).

    # File lib/bio/io/flatfile/index.rb, line 160
    def default_namespaces=(names)
      if names then
        @names = []
        names.each { |x| @names.push(x.dup) }
      else
        @names = nil
      end
    end
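The setter copies each name with dup, so later changes to the caller's array or strings do not leak into the stored defaults. The sketch below shows that effect using NamespaceHolder, a hypothetical stand-in that reproduces only this accessor pair and runs without BioRuby.

```ruby
# NamespaceHolder is a hypothetical stand-in, not part of the library.
class NamespaceHolder
  def default_namespaces
    @names
  end

  # Store a copy of every name, or nil to mean "all namespaces".
  def default_namespaces=(names)
    @names = names ? names.map(&:dup) : nil
  end
end

holder = NamespaceHolder.new
mine = [String.new('ACCESSION'), String.new('VERSION')]
holder.default_namespaces = mine

mine << 'LOCUS'          # the caller's array grows
mine[0] << '_EDITED'     # and one of its strings is modified in place

p holder.default_namespaces   # the stored copy is unaffected
```

Assigning nil afterwards resets the holder to the "all namespaces" state.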

Common interface defined in registry.rb. Searches the databank and returns the entry (or entries) as a string. Multiple entries may be returned, concatenated into one string. Returns an empty string if nothing is found.

    # File lib/bio/io/flatfile/index.rb, line 122
    def get_by_id(key)
      search(key).to_s
    end

Searches the databank. If entries are found, returns an array of unique IDs (primary identifiers). If nothing is found, returns nil.

This method is useful when the search result would be very large and the search method would be very slow, because only the IDs are returned.

    # File lib/bio/io/flatfile/index.rb, line 210
    def include?(key)
      check_closed?
      if @names then
        r = @db.search_namespaces_get_unique_id(key, *@names)
      else
        r = @db.search_all_get_unique_id(key)
      end
      if r.empty? then
        nil
      else
        r
      end
    end
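Despite the question mark in its name, include? returns an array or nil rather than true or false. Both values still work in a conditional, as the following sketch suggests. TinyIndex is a hypothetical stand-in backed by a Hash; it mirrors only the return convention, not the real lookup.

```ruby
# TinyIndex is a hypothetical stand-in: a Hash maps each key to primary IDs.
class TinyIndex
  def initialize(table)
    @table = table
  end

  # Same return convention as include?: array of unique IDs, or nil.
  def include?(key)
    ids = @table.fetch(key, []).uniq
    ids.empty? ? nil : ids
  end
end

index = TinyIndex.new(
  'P12345' => ['ENTRY_A'],
  'kinase' => ['ENTRY_A', 'ENTRY_B', 'ENTRY_A']
)

# A non-empty array is truthy, so the result can be tested and used at once.
if (ids = index.include?('kinase'))
  puts "found #{ids.size} entries: #{ids.join(', ')}"
end

# nil is falsy, so a missing key falls through.
puts 'not found' unless index.include?('missing')
```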

Same as include?, but searches only the specified namespaces.

    # File lib/bio/io/flatfile/index.rb, line 226
    def include_in_namespaces?(key, *names)
      check_closed?
      r = @db.search_namespaces_get_unique_id(key, *names)
      if r.empty? then
        nil
      else
        r
      end
    end

Same as include?, but searches only the primary namespace.

    # File lib/bio/io/flatfile/index.rb, line 238
    def include_in_primary?(key)
      check_closed?
      r = @db.search_primary_get_unique_id(key)
      if r.empty? then
        nil
      else
        r
      end
    end

Returns the names of the namespaces defined in the databank (example: [ 'LOCUS', 'ACCESSION', 'VERSION' ]).

    # File lib/bio/io/flatfile/index.rb, line 251
    def namespaces
      check_closed?
      r = secondary_namespaces
      r.unshift primary_namespace
      r
    end

Returns the name of the primary namespace as a string.

    # File lib/bio/io/flatfile/index.rb, line 259
    def primary_namespace
      check_closed?
      @db.primary.name
    end

Searches the databank and returns a Bio::FlatFileIndex::Results object.

    # File lib/bio/io/flatfile/index.rb, line 177
    def search(key)
      check_closed?
      if @names then
        @db.search_namespaces(key, *@names)
      else
        @db.search_all(key)
      end
    end

Searches only the specified namespaces. Returns a Bio::FlatFileIndex::Results object.

    # File lib/bio/io/flatfile/index.rb, line 189
    def search_namespaces(key, *names)
      check_closed?
      @db.search_namespaces(key, *names)
    end

Searches only the primary namespace. Returns a Bio::FlatFileIndex::Results object.

    # File lib/bio/io/flatfile/index.rb, line 197
    def search_primary(key)
      check_closed?
      @db.search_primary(key)
    end

Returns the names of the secondary namespaces as an array of strings.

    # File lib/bio/io/flatfile/index.rb, line 265
    def secondary_namespaces
      check_closed?
      @db.secondary.names
    end
