xper_doc
returns an entity object ('XPER entity') created from an XML document
xper_doc
(in document varchar,
[in parser_mode integer],
[in base_uri varchar],
[in content_encoding varchar],
[in content_language varchar],
[in dtd_validator_config varchar],
[in index_attrs integer]);
Description
This parses the argument, which is expected to be a well-formed XML
fragment, and returns a parse tree as a special object with an underlying disk structure, named
"persistent XML" or "XPER".
While the result of xml_tree is a memory-resident array of vectors,
the XPER object consumes only a small amount of memory, and almost all of its data is disk-resident.
XPERs are better than "XML trees" for large documents and
for "write once -- read many" stores such as a table with
one XML document per row used as a "library" of documents.
When saved in a LONG VARCHAR column, an "XML tree" entity is
converted back to plain XML text, but an "XPER" entity
is saved as a ready-to-use disk structure.
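The "library of documents" case described above might look like the following minimal sketch. The table name doc_library, its columns, and the file name doc.xml are hypothetical, and file_to_string is assumed to be allowed to read that path on the server.

```sql
-- Hypothetical "library" table: one XML document per row.
create table doc_library (
  id  integer primary key,
  doc long varchar
);

-- Parse the file once and store the resulting XPER entity.
-- Per the description above, the XPER is saved as a ready-to-use
-- disk structure rather than being converted back to XML text.
insert into doc_library (id, doc)
  values (1, xper_doc (file_to_string ('doc.xml')));
```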
Parameters
document –
well formed XML or HTML document
parser_mode –
0, 1 or 2; 0 - XML parser mode, 1 - HTML parser mode, 2 - 'dirty HTML'
mode (with quiet recovery after any syntax error)
base_uri –
in HTML parser mode, changes all absolute references to relative ones with respect to the
given base_uri (http://<host>:<port>/<path>)
content_encoding –
string with the content encoding type of <document>; valid values are
'ASCII', 'ISO', 'UTF8', 'ISO8859-1', 'LATIN-1' etc.; the defaults are 'UTF-8' for
XML mode and 'LATIN-1' for HTML mode.
content_language –
string with the language tag of the content of <document>; valid names
are listed in IETF RFC 1766; the default is 'x-any' (meaning 'mix of words from various
human languages').
index_attrs –
1 or 0, indicating if additional free-text indexing information must be
stored for all attributes of the document. It is 1 by default. If set to '0', it will
produce a disk structure compatible with old versions of Virtuoso and will give a small
benefit in disk usage but it will disable some important optimizations in free-text
search operations.
Return Types
XML entity with underlying
parse tree of source document; the tree will be a special sort of BLOB.
Examples
Xper_Doc
declare tree any;
tree := xper_doc (file_to_string ('doc.html'), 1,
'http://localhost.localdomain/', 'ISO');
...
tree := xper_doc (file_to_string ('doc.xml'));
...
-- String cannot be longer than 10 megabytes. String session can.
tree := xper_doc (file_to_string_session ('huge_doc.xml'));
...
-- A special way to read local files.
-- Strings starting with 'file://'
-- are treated as local filesystem URIs.
tree := xper_doc ('file://doc.xml');
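As a further illustration, the entity returned by xper_doc can be passed to XPath functions like any other XML entity. The sketch below assumes the standard Virtuoso function xpath_eval; the procedure name xper_title_demo and the sample document are hypothetical.

```sql
-- Hypothetical helper: build an XPER from an inline string
-- and extract the text of one element from it.
create procedure xper_title_demo ()
{
  declare tree, title any;
  -- Short strings work as well as files; XML parser mode is the default.
  tree := xper_doc ('<doc><title>Sample</title><body>text</body></doc>');
  -- Evaluate an XPath expression against the XPER entity.
  title := xpath_eval ('string (/doc/title)', tree);
  return title;
}
;

select xper_title_demo ();
```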