Class Bio::NCBI::REST
In: lib/bio/io/ncbirest.rb
Parent: Object

Description

The Bio::NCBI::REST class provides a REST client for the NCBI E-Utilities.

Entrez utilities index:

Methods

efetch   einfo   esearch   esearch_count

Classes and Modules

Class Bio::NCBI::REST::EFetch
Class Bio::NCBI::REST::ESearch

Constants

NCBI_INTERVAL = 1.0 / 3.0   Minimum wait between requests, in seconds (1/3 second). NCBI's restriction is: "Make no more than 3 requests every 1 second."

NCBI's guidelines also say: run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time on weekdays for any series of more than 100 requests. This restriction is not implemented yet in BioRuby.
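As an illustration of the interval rule, here is a minimal sketch of a throttle that keeps consecutive requests at least NCBI_INTERVAL seconds apart. RequestThrottle, its clock: and sleeper: options, and the constant copied into it are hypothetical names for this example only; this is not how BioRuby implements the wait internally.

```ruby
# Hypothetical helper (not part of BioRuby): spaces out calls so that
# consecutive requests are at least `interval` seconds apart.
class RequestThrottle
  # Same value as Bio::NCBI::REST::NCBI_INTERVAL, copied here so the
  # sketch runs without BioRuby installed.
  NCBI_INTERVAL = 1.0 / 3.0

  # clock and sleeper are injectable so the behaviour can be checked
  # without really sleeping (an assumption of this sketch).
  def initialize(interval = NCBI_INTERVAL,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @interval = interval
    @clock = clock
    @sleeper = sleeper
    @last = nil
  end

  # Waits out the remainder of the interval, then runs the block and
  # returns its value.
  def call
    if @last
      remaining = @interval - (@clock.call - @last)
      @sleeper.call(remaining) if remaining > 0
    end
    @last = @clock.call
    yield
  end
end
```

A caller would wrap each request in the block, for example throttle.call { ncbi.einfo }.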

Public Class methods

efetch(*args)

[Source]

     # File lib/bio/io/ncbirest.rb, line 351
     def self.efetch(*args)
       self.new.efetch(*args)
     end

einfo()

[Source]

     # File lib/bio/io/ncbirest.rb, line 339
     def self.einfo
       self.new.einfo
     end

esearch(*args)

[Source]

     # File lib/bio/io/ncbirest.rb, line 343
     def self.esearch(*args)
       self.new.esearch(*args)
     end

esearch_count(*args)

[Source]

     # File lib/bio/io/ncbirest.rb, line 347
     def self.esearch_count(*args)
       self.new.esearch_count(*args)
     end

Public Instance methods

efetch(ids, hash = {}, step = 100)

Retrieves database entries for the given IDs using the E-Utils (efetch) service.

For information on the possible arguments, see

Usage

 ncbi = Bio::NCBI::REST.new
 ncbi.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
 ncbi.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"})
 ncbi.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

 Bio::NCBI::REST.efetch("185041", {"db"=>"nucleotide", "rettype"=>"gb", "retmode" => "xml"})
 Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
 Bio::NCBI::REST.efetch("AAA52805", {"db"=>"protein", "rettype"=>"gb"})

Arguments:

  • ids: list of NCBI entry IDs, given as an Array or a comma-separated String (required)
  • hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
    • db: "sequences", "nucleotide", "protein", "pubmed", "omim", …
    • retmode: "text", "xml", "html", … (default "text")
    • rettype: "gb", "gbc", "medline", "count", …
  • step: maximum number of entries retrieved per request (default 100)
Returns: String

[Source]

     # File lib/bio/io/ncbirest.rb, line 315
     def efetch(ids, hash = {}, step = 100)
       serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
       opts = default_parameters.merge({ "retmode" => "text" })
       opts.update(hash)

       case ids
       when Array
         list = ids
       else
         list = ids.to_s.split(/\s*,\s*/)
       end

       result = ""
       0.step(list.size, step) do |i|
         opts["id"] = list[i, step].join(',')
         unless opts["id"].empty?
           response = ncbi_post_form(serv, opts)
           result += response.body
         end
       end
       return result.strip
       #return result.strip.split(/\n\n+/)
     end
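
The ID batching controlled by the step argument can be sketched in isolation. The helper below is hypothetical (id_batches is not a BioRuby method) and uses each_slice rather than the 0.step loop in the listing; it only shows which "id" parameter values would be sent for a given input, without making any request.

```ruby
# Hypothetical helper, not part of BioRuby: accepts an Array or a
# comma-separated String of IDs and returns one comma-joined "id"
# value per request, with at most `step` IDs in each.
def id_batches(ids, step = 100)
  list = ids.is_a?(Array) ? ids : ids.to_s.split(/\s*,\s*/)
  list.each_slice(step).map { |chunk| chunk.join(",") }
end

# Three IDs with step = 2 are split into two requests.
batches = id_batches("185041, J00231,AAA52805", 2)
puts batches.inspect
```

An empty input produces no batches, which corresponds to the listing skipping requests whose "id" value is empty.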

einfo()

Lists the NCBI database names using the E-Utils (einfo) service, for example:

 pubmed protein nucleotide nuccore nucgss nucest structure genome
 books cancerchromosomes cdd gap domains gene genomeprj gensat geo
 gds homologene journals mesh ncbisearch nlmcatalog omia omim pmc
 popset probe proteinclusters pcassay pccompound pcsubstance snp
 taxonomy toolkit unigene unists

Usage

 ncbi = Bio::NCBI::REST.new
 ncbi.einfo

 Bio::NCBI::REST.einfo

Returns: array of strings (database names)

[Source]

     # File lib/bio/io/ncbirest.rb, line 179
     def einfo
       serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
       opts = default_parameters.merge({})
       response = ncbi_post_form(serv, opts)
       result = response.body
       list = result.scan(/<DbName>(.*?)<\/DbName>/m).flatten
       return list
     end
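
The extraction step in the listing can be tried offline. In this sketch the XML is a shortened, illustrative response body (an assumption, not captured output), and database_names is a hypothetical helper that applies the same regular expression as the listing.

```ruby
# Illustrative response body; a real einfo response lists many more
# databases. The element layout here is an assumption of this sketch.
SAMPLE_EINFO_XML = <<~XML
  <eInfoResult>
    <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
    </DbList>
  </eInfoResult>
XML

# Hypothetical helper: same scan/flatten step as in einfo.
# scan returns one single-element array per match; flatten merges them.
def database_names(xml)
  xml.scan(/<DbName>(.*?)<\/DbName>/m).flatten
end

puts database_names(SAMPLE_EINFO_XML).join(" ")
```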

esearch(str, hash = {}, limit = nil, step = 10000)

Searches the NCBI database for the given keywords using the E-Utils (esearch) service and returns an array of entry IDs.

For information on the possible arguments, see

Usage

 ncbi = Bio::NCBI::REST.new
 ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
 ncbi.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
 ncbi.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

 Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"count"})
 Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nucleotide", "rettype"=>"gb"})
 Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmax"=>5})

Arguments:

  • str: query string (required)
  • hash: hash of E-Utils options, e.g. {"db" => "nuccore", "rettype" => "gb"}
    • db: "sequences", "nucleotide", "protein", "pubmed", "taxonomy", …
    • retmode: "text", "xml", "html", …
    • rettype: "gb", "medline", "count", …
    • retmax: integer (default 100)
    • retstart: integer
    • field:
      • "titl": Title [TI]
      • "tiab": Title/Abstract [TIAB]
      • "word": Text words [TW]
      • "auth": Author [AU]
      • "affl": Affiliation [AD]
      • "jour": Journal [TA]
      • "vol": Volume [VI]
      • "iss": Issue [IP]
      • "page": First page [PG]
      • "pdat": Publication date [DP]
      • "ptyp": Publication type [PT]
      • "lang": Language [LA]
      • "mesh": MeSH term [MH]
      • "majr": MeSH major topic [MAJR]
      • "subh": MeSH subheadings [SH]
      • "mhda": MeSH date [MHDA]
      • "ecno": EC/RN Number [RN]
      • "si": Secondary source ID [SI]
      • "uid": PubMed ID (PMID) [UI]
      • "fltr": Filter [FILTER] [SB]
      • "subs": Subset [SB]
    • reldate: 365
    • mindate: 2001
    • maxdate: 2002/01/01
    • datetype: "edat"
  • limit: maximum number of entries to return (0 for unlimited; nil to use the "retmax" value in the hash, or the internal default of 100 if "retmax" is not set)
  • step: maximum number of entries retrieved per request (default 10000)
Returns: array of entry IDs, or the number of results when "rettype" is "count"
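
The rules for the limit argument can be sketched as a small stand-alone function. effective_limit is a hypothetical name, and total_count stands in for the value the real method obtains from esearch_count; both are assumptions of this example.

```ruby
# Hypothetical helper, not part of BioRuby: resolves the number of IDs
# to fetch from the limit argument and the option hash.
# total_count is passed in here; the real method queries it on demand.
def effective_limit(limit, hash, total_count)
  limit ||= hash["retmax"].to_i if hash["retmax"]  # nil: fall back to "retmax"
  limit ||= 100                                    # still nil: default of 100
  limit = total_count if limit == 0                # 0 means "unlimited"
  limit
end

puts effective_limit(nil, { "retmax" => 5 }, 500)
```

An explicit non-zero limit takes precedence over "retmax", since "retmax" is consulted only when limit is nil.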

[Source]

     # File lib/bio/io/ncbirest.rb, line 246
     def esearch(str, hash = {}, limit = nil, step = 10000)
       serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
       opts = default_parameters.merge({ "term" => str })
       opts.update(hash)

       case opts["rettype"]
       when "count"
         count = esearch_count(str, opts)
         return count
       else
         retstart = 0
         retstart = hash["retstart"].to_i if hash["retstart"]

         limit ||= hash["retmax"].to_i if hash["retmax"]
         limit ||= 100 # default limit is 100
         limit = esearch_count(str, opts) if limit == 0   # unlimit

         list = []
         0.step(limit, step) do |i|
           retmax = [step, limit - i].min
           opts.update("retmax" => retmax, "retstart" => i + retstart)
           response = ncbi_post_form(serv, opts)
           result = response.body
           list += result.scan(/<Id>(.*?)<\/Id>/m).flatten
         end
         return list
       end
     end
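
The paging arithmetic in the loop above can be examined without contacting NCBI. The sketch below computes the "retstart" and "retmax" values for each request; search_pages is a hypothetical helper, and unlike the listing it omits a page whose "retmax" would be 0, which is a choice made for this example.

```ruby
# Hypothetical helper, not part of BioRuby: returns the paging
# parameters for each request needed to collect `limit` IDs in
# pages of at most `step`, starting at offset `retstart`.
def search_pages(limit, step = 10_000, retstart = 0)
  pages = []
  0.step(limit, step) do |i|
    retmax = [step, limit - i].min
    # Skip the zero-length page that appears when limit is a multiple
    # of step (a deviation from the listing, chosen for this sketch).
    pages << { "retstart" => i + retstart, "retmax" => retmax } if retmax > 0
  end
  pages
end

# 25 IDs in pages of 10, starting from offset 5.
search_pages(25, 10, 5).each { |page| puts page.inspect }
```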
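
esearch_count(str, hash = {})

Returns the number of entries matching the query, using the E-Utils (esearch) service with "rettype" set to "count".

Arguments: str and hash, as for the esearch method
Returns: number of results (Integer)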

[Source]

     # File lib/bio/io/ncbirest.rb, line 277
     def esearch_count(str, hash = {})
       serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
       opts = default_parameters.merge({ "term" => str })
       opts.update(hash)
       opts.update("rettype" => "count")
       response = ncbi_post_form(serv, opts)
       result = response.body
       count = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
       return count
     end
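
The two regular expressions used by esearch and esearch_count can be checked against a small document. The XML below is illustrative only (the second ID and the element layout are assumptions, not captured output), and result_count and result_ids are hypothetical helper names.

```ruby
# Illustrative esearch response body, not real output.
SAMPLE_ESEARCH_XML = <<~XML
  <eSearchResult>
    <Count>42</Count>
    <RetMax>2</RetMax>
    <RetStart>0</RetStart>
    <IdList>
      <Id>185041</Id>
      <Id>185042</Id>
    </IdList>
  </eSearchResult>
XML

# Hypothetical helper: same expression as esearch_count. The first
# <Count> element is used; a body without one yields 0 (nil.to_i).
def result_count(xml)
  xml.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i
end

# Hypothetical helper: same expression as esearch. IDs stay Strings.
def result_ids(xml)
  xml.scan(/<Id>(.*?)<\/Id>/m).flatten
end

puts result_count(SAMPLE_ESEARCH_XML)
puts result_ids(SAMPLE_ESEARCH_XML).inspect
```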
