Author: | Kirill Simonov |
---|---|
Contact: | xi@resolvent.net |
Web site: | http://pyyaml.org/wiki/PySyck |
YAML is a data serialization format designed for human readability and interaction with scripting languages.
Syck is an extension for reading and writing YAML in scripting languages. Syck provides bindings to the Python programming language, but they are somewhat limited and leak memory.
PySyck aims to update the current Python bindings for Syck. The new bindings provide a wrapper for the Syck emitter and give access to YAML representation graphs.
PySyck may be used for various tasks, in particular as a replacement for the pickle module.
PySyck requires Syck 0.55 or higher and Python 2.3 or higher.
Please note that Syck 0.55 or higher must be installed. We recommend using Syck from the Syck SVN repository together with my Syck patches. For your convenience, a tarball is provided: http://pyyaml.org/download/pysyck/syck-0.61+svn232+patches.tar.gz.
If you install PySyck from source, unpack the source tarball and type:
$ python setup.py install
Windows binaries for Python 2.3 and 2.4 are provided. Windows binaries are linked against Syck statically.
The documentation is still rough and incomplete. See the source code for more information.
    >>> from syck import *
    >>> print load("""
    ... - foo
    ... - bar
    ... - baz
    ... """)
    ['foo', 'bar', 'baz']
    >>> print dump(['foo', 'bar', 'baz'])
    ---
    - foo
    - bar
    - baz
Important notice: Do not load a YAML stream from any untrusted source. Like pickle.load, syck.load may call an arbitrary Python function.
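To make the warning concrete, here is a minimal sketch that uses only the standard pickle module, not PySyck. The class name Echo is hypothetical, and the code is written for a current Python 3 interpreter. It shows that loading a stream can run a callable chosen by whoever produced the stream; here the callable is the harmless built-in sorted.

```python
import pickle


class Echo(object):
    """Hypothetical class whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # Instead of object state, pickle records "call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Echo())

# Loading does not rebuild an Echo instance; it calls the recorded function.
# A hostile stream could name any importable callable in the same way.
result = pickle.loads(payload)
print(result)
```

The same reasoning applies to any loader that can construct Python objects from tags, which is why streams from untrusted sources should not be loaded.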
We do not describe the YAML syntax here. Please check http://yaml.org/ for the reference.
In addition to the tags defined in the YAML types repository, PySyck understands the following Python-specific tags:
Most of these tags are self-explanatory. The tags !python/name:..., !python/object:..., !python/new:..., and !python/apply:... are used for constructing Python functions, classes, and objects. See the sections "Use Python-specific tags in YAML documents" and "Use Python-specific tags to construct Python objects" for some examples.
    >>> source = "..."
    >>> object = load(source)

    >>> source = file(..., 'r')
    >>> object = load(source)

    >>> object = ...
    >>> document = dump(object)

    >>> object = ...
    >>> output = file(..., 'w')
    >>> dump(object, output)

    >>> object = ...
    >>> output = file(..., 'w')
    >>> dump(object, output,
    ...     headless=False, use_header=False, use_version=False,
    ...     explicit_typing=True, style=None, best_width=80, indent=2)

    >>> source = ...
    >>> objects = load_documents(source)
    >>> for object in objects:
    ...     # ...

    >>> objects = [...]
    >>> output = file(..., 'w')
    >>> dump_documents(objects, output)

    >>> source = ...
    >>> root_node = parse(source)

    >>> scalar_node = Scalar('...', tag='tag:...',
    ...     style='...', indent=.., width=..)
    >>> sequence_node = Seq(list_of_nodes, tag='tag:...', inline=..)
    >>> mapping_node = Map(dictionary_of_nodes, tag='tag:...', inline=..)
    >>> root_node = ...
    >>> output = file(..., 'w')
    >>> emit(root_node, output)

    >>> object = ...
    >>> stream = ...
    >>> dump(object, stream)

    >>> stream = ...
    >>> object = load(stream)

    >>> object = ...
    >>> print dump(object)

    >>> source = ...
    >>> node = parse(source)
    >>> print dump(node)
    --- %YAML:1.0
    - !python/none ''           # You may also use '!null'.
    - !python/bool 'False'      # You may also use '!bool'.
    - !python/int '123'         # You may also use '!int'.
    - !python/long '1234567890'
    - !python/float '123.456789'  # Also '!float'.
    - !python/str 'a string'    # Also '!str'.
    - !python/unicode 'a unicode string encoded in utf-8'
    - !python/list [1, 2, 3]    # The same as '!seq' or no tag.
    - !python/tuple [1, 2, 3]
    - !python/dict { 1: foo, 2: bar }  # The same as '!map' or no tag.

    --- %YAML:1.0
    - !python/name:package.module.function_name ''
    - !python/name:package.module.class_name ''

    --- %YAML:1.0
    - !python/object:package.module.type
      attribute1: value1
      attribute2: value2
      # ...
    - !python/new:package.module.type
      - parameter1
      - parameter2
      # ...
    - !python/new:package.module.type
      args: [parameter1, parameter2, ...]
      kwds: {kwd1: val1, kwd2: val2, ...}
      state: {attr1: val1, attr2: val2, ...}
      # ...
    - !python/apply:package.module.function
      - parameter1
      - parameter2
      # ...
    - !python/apply:package.module.function
      args: [parameter1, parameter2, ...]
      kwds: {kwd1: val1, kwd2: val2, ...}
      state: {attr1: val1, attr2: val2, ...}
      # ...
    >>> class MyClass:
    ...     # ...

    >>> class MyLoader(Loader):
    ...     def construct_private_my_tag(self, node):
    ...         # ...
    ...         return MyClass(...)

    >>> class MyDumper(Dumper):
    ...     def represent_MyDumper(self, object):
    ...         # ...
    ...         return Map(...)

    >>> source = """--- !!my_tag { ... }"""
    >>> my_instance = load(source, Loader=MyLoader)

    >>> my_instance = MyClass(...)
    >>> output = dump(my_instance, Dumper=MyDumper)
load(source, Loader=Loader, **parameters)
The function load() returns a Python object corresponding to the first document in the source. If the source is empty, load() returns None. source must be a string or a file-like object that has the method read(max_length).
By default, the function load() uses an instance of the class Loader for parsing. You may use another class or pass additional parameters to the class constructor. See the section Parser for more details.
Example:
    >>> load("""
    ... - foo
    ... - bar
    ... - baz
    ... """)
    ['foo', 'bar', 'baz']
parse(source, Loader=Loader, **parameters)
The function parse() parses the source and returns a representation tree of the first document. source must be a string or a file-like object that has the method read(max_length).
By default, the function parse() uses an instance of the class Loader for parsing. You may use another class or pass additional parameters to the class constructor. See the section Parser for more details.
Example:
    >>> parse("""
    ... - foo
    ... - bar
    ... - baz
    ... """)
    <_syck.Seq object at 0xb7a3f2fc>
load_documents(source, Loader=Loader, **parameters)
The function load_documents() parses the source and returns an iterator. The iterator produces Python objects corresponding to the documents of the source stream. source must be a string or a file-like object that has the method read(max_length).
By default, the function load_documents() uses an instance of the class Loader for parsing. You may use another class or pass additional parameters to the class constructor. See the section Parser for more details.
Example:
    >>> source = """
    ... --- >
    ... This is the
    ... first document.
    ... --- >
    ... This is the
    ... next document.
    ... --- >
    ... This is the
    ... last document.
    ... """
    >>> for object in load_documents(source): print object
    ...
    This is the first document.

    This is the next document.

    This is the last document.
parse_documents(source, Loader=Loader, **parameters)
The function parse_documents() is similar to load_documents(), but produces representation graphs for all documents in the source.
dump(object, output=None, Dumper=Dumper, **parameters)
The function dump() converts object to a representation graph and writes it to output. output must be None or a file-like object that has the method write(data). If output is None, dump() returns the generated document.
By default, the function dump() uses an instance of the class Dumper for emitting. You may use another class or pass additional parameters to the class constructor. See the section Emitter for more details.
Example:
    >>> import sys
    >>> object = ['foo', 'bar', ['baz']]
    >>> dump(object, sys.stdout)
    ---
    - foo
    - bar
    - - baz
    >>> print dump(object)
    ---
    - foo
    - bar
    - - baz
    >>> print dump(object, use_version=True, indent=5)
    --- %YAML:1.0
    - foo
    - bar
    - - baz
emit(node, output=None, Dumper=Dumper, **parameters)
The function emit() writes the representation graph to the output stream. output must be None or a file-like object that has the method write(data). If output is None, emit() returns the generated document.
By default, the function emit() uses an instance of the class Dumper for emitting. You may use another class or pass additional parameters to the class constructor. See the section Emitter for more details.
Example:
    >>> foo = Scalar('a string')
    >>> bar = Scalar('a unicode string', tag="tag:python.yaml.org,2002:unicode")
    >>> baz = Scalar('12345', tag="tag:yaml.org,2002:int")
    >>> seq = Seq([foo, bar, baz], tag="tag:python.yaml.org,2002:tuple")
    >>> print emit(seq, use_version=True)
    --- %YAML:1.0 !python/tuple
    - a string
    - !python/unicode a unicode string
    - 12345
dump_documents(objects, output=None, Dumper=Dumper, **parameters)
The function dump_documents() takes a list of objects and converts each object to a YAML document. If output is None, it returns the produced documents. Otherwise it writes them to output, which must be a file-like object with the method write(data).
By default, the function dump_documents() uses an instance of the class Dumper for emitting. You may use another class or pass additional parameters to the class constructor. See the section Emitter for more details.
Example:
    >>> print dump_documents(['foo', 'bar', 'baz'])
    --- foo
    --- bar
    --- baz
emit_documents(nodes, output=None, Dumper=Dumper, **parameters)
The function emit_documents() is similar to dump_documents(), but it requires a list of representation graphs.
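The four output functions above share one convention: with output=None the generated text is returned, otherwise it is written to the given file-like object. The following sketch shows that convention in plain Python 3. The helper write_documents is hypothetical and is not part of PySyck; it takes ready-made document strings instead of objects or nodes.

```python
import io


def write_documents(documents, output=None):
    """Hypothetical helper: return the text if output is None, else write it."""
    buffer = io.StringIO() if output is None else output
    for document in documents:
        # Only the write(data) method of the output object is used.
        buffer.write(document)
    if output is None:
        return buffer.getvalue()
    return None


# Without an output object, the generated text is returned.
returned = write_documents(["--- foo\n", "--- bar\n"])

# With an output object, the text is written there and nothing is returned.
sink = io.StringIO()
nothing = write_documents(["--- baz\n"], sink)
```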
This exception is raised by the Syck parser when it detects a syntax error.
The attribute args of the exception is a triple: message, row, column.
Example:
    >>> load("""---
    ... - foo
    ... - '''
    ... - bar
    ... """)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "build/lib.linux-i686-2.3/syck/loaders.py", line 384, in load
      File "build/lib.linux-i686-2.3/syck/loaders.py", line 42, in load
    _syck.error: ('syntax error', 4, 2)
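Since args is a triple, a caller can unpack it to build a readable report. The sketch below uses a stand-in exception class, StandInError, so that it runs without PySyck; the helper name describe is also hypothetical. It makes no assumption about whether row and column count from zero or one.

```python
class StandInError(Exception):
    """Stand-in for the parser exception; not part of PySyck."""


def describe(exc):
    # args is expected to be the triple (message, row, column).
    message, row, column = exc.args
    return "%s at row %d, column %d" % (message, row, column)


try:
    # Same triple as in the traceback shown above.
    raise StandInError("syntax error", 4, 2)
except StandInError as exc:
    report = describe(exc)

print(report)
```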
The following four classes represent nodes in the YAML representation graph:
All instances of Scalar, Seq, and Map have the following attributes:
Scalar instances have additional attributes:
Seq and Map instances have an additional attribute:
For example, let us create a representation graph and transform it into a YAML stream:
    >>> # Create three scalar nodes:
    >>> foo = Scalar('foo', tag="tag:example.com,2005:foo", style='fold',
    ...     indent=5)
    >>> bar = Scalar('bar', style='1quote')
    >>> baz = Scalar('baz')

    >>> # Create a sequence node:
    >>> seq = Seq([foo, bar, baz], tag="x-private:seq")

    >>> # Emit it into a YAML stream:
    >>> print emit(seq)
    --- !!seq
    - !example.com,2005/foo >-
         foo
    - 'bar'
    - baz
Now let us construct a representation graph from a YAML document:
    >>> # The function 'parse' generates a representation graph:
    >>> root = parse("""
    ... - foo
    ... - bar
    ... - baz
    ... """)

    >>> # The object 'root' is a sequence node:
    >>> root
    <_syck.Seq object at 0xb7e124b4>

    >>> # We can transform 'root' back into a YAML stream:
    >>> print emit(root)
    ---
    - foo
    - bar
    - baz

    >>> # We can also display the structure of the representation tree using a
    >>> # clever trick:
    >>> print dump(root)
    --- !python/object:_syck.Seq
    value:
    - !python/object:_syck.Scalar
      value: foo
      tag: tag:yaml.org,2002:str
    - !python/object:_syck.Scalar
      value: bar
      tag: tag:yaml.org,2002:str
    - !python/object:_syck.Scalar
      value: baz
      tag: tag:yaml.org,2002:str
The class Parser is a low-level wrapper of a Syck YAML parser. It can generate a representation graph from a YAML stream.
The class constructor has the following arguments:
The parameter source is a YAML stream. It must be a string or a file-like object. If it is not a string, it should have a method named read(max_length) that returns a string.
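As an illustration of the file-like requirement, here is a minimal sketch of an object that offers nothing but read(max_length). The class name ChunkedSource is hypothetical, and the sketch assumes the usual file convention that an empty string signals the end of the input. The loop at the end only simulates a consumer; it does not call PySyck.

```python
class ChunkedSource(object):
    """Hypothetical file-like wrapper exposing only read(max_length)."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self, max_length):
        # Return at most max_length characters; '' means the input is exhausted.
        chunk = self._text[self._pos:self._pos + max_length]
        self._pos += len(chunk)
        return chunk


# Simulate a consumer that pulls the stream in pieces of at most 8 characters.
source = ChunkedSource("- foo\n- bar\n- baz\n")
chunks = []
while True:
    chunk = source.read(8)
    if not chunk:
        break
    chunks.append(chunk)
```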
It is not recommended to change the default values of the parameters implicit_typing and taguri_expansion. See the Syck documentation for more details about them.
The class defines a single method:
It parses the source and returns the root node of the corresponding representation graph. If the stream is finished, it returns None and sets the eof flag.
The subclass GenericLoader defines two additional methods:
The method load() parses the source and constructs the corresponding Python object. To generate an object by a node, load() uses the construct() method. The construct() method defined in GenericLoader just returns the value of the node: a string, a list, or a dictionary.
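The behaviour described above can be sketched in plain Python 3. The classes Node, Scalar, Seq, and Map below are simplified stand-ins defined for this sketch, not the PySyck classes, and the recursive walk over child nodes is an assumption about how the values are collected.

```python
class Node(object):
    """Stand-in node carrying only a value and an optional tag."""

    def __init__(self, value, tag=None):
        self.value = value
        self.tag = tag


class Scalar(Node):
    pass


class Seq(Node):
    pass


class Map(Node):
    pass


def construct(node):
    # Tags are ignored on purpose: only the plain value of each node is used.
    if isinstance(node, Seq):
        return [construct(child) for child in node.value]
    if isinstance(node, Map):
        return dict((construct(k), construct(v)) for k, v in node.value.items())
    return node.value


root = Seq([
    Scalar('a string'),
    Scalar('a unicode string', tag='tag:python.yaml.org,2002:unicode'),
    Scalar('12345', tag='tag:yaml.org,2002:int'),
], tag='tag:python.yaml.org,2002:tuple')

plain = construct(root)
mapping = construct(Map({Scalar('key'): Scalar('value')}))
```

Because the tags are ignored, the tuple tag yields a list and the integer stays a string.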
Loader : subclass of GenericLoader
Loader redefines the method
- Loader.construct(node),
defines an additional method:
- Loader.find_constructor(node),
and adds many other auxiliary methods for constructing Python objects.
Loader.construct() calls find_constructor() for the given node, and uses the returned constructor to generate a Python object.
Loader.find_constructor() determines the constructor of a node by the following rules:
- If the node tag has the form tag:yaml.org,2002:type_id, returns the method Loader.construct_type_id.
- If the node tag has the form tag:python.yaml.org,2002:type_id, returns the method Loader.construct_python_type_id.
- If the node tag has the form x-private:type_id, returns Loader.construct_private_type_id.
- If the node tag has the form tag:domain.tld,year:type_id, returns Loader.construct_domain_tld_year_type_id.
See the source for more details.
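The four naming rules can be sketched as a small function that maps a tag to a method name. This is an illustration in plain Python 3, not the code of Loader.find_constructor(). The function name constructor_name is hypothetical, and replacing every non-identifier character with an underscore is an assumption.

```python
import re

# One pattern per rule; the specific yaml.org patterns are tried first.
YAML_TAG = re.compile(r"^tag:yaml\.org,2002:(?P<type_id>.+)$")
PYTHON_TAG = re.compile(r"^tag:python\.yaml\.org,2002:(?P<type_id>.+)$")
PRIVATE_TAG = re.compile(r"^x-private:(?P<type_id>.+)$")
DOMAIN_TAG = re.compile(r"^tag:(?P<domain>[^,]+),(?P<year>[^:]+):(?P<type_id>.+)$")

PREFIXED = (
    (YAML_TAG, "construct_"),
    (PYTHON_TAG, "construct_python_"),
    (PRIVATE_TAG, "construct_private_"),
)


def _identifier(text):
    # Assumption: characters that cannot appear in a method name become "_".
    return re.sub(r"\W", "_", text)


def constructor_name(tag):
    """Return the method name the rules above would select, or None."""
    for pattern, prefix in PREFIXED:
        match = pattern.match(tag)
        if match:
            return prefix + _identifier(match.group("type_id"))
    match = DOMAIN_TAG.match(tag)
    if match:
        parts = (match.group("domain"), match.group("year"), match.group("type_id"))
        return "construct_" + "_".join(_identifier(part) for part in parts)
    return None


names = [constructor_name(tag) for tag in (
    "tag:yaml.org,2002:int",
    "tag:python.yaml.org,2002:tuple",
    "x-private:my_tag",
    "tag:example.com,2005:foo",
)]
```

A loader could then look the name up with getattr(self, name, None) and fall back to a default constructor when no such method exists.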
Let us show how Parser, GenericLoader, and Loader parse the same document:
    >>> # The source stream includes PySyck specific tags '!python/tuple'
    >>> # and '!python/unicode'. It also includes implicitly typed integer
    >>> # '12345'
    >>> source = """--- !python/tuple
    ... - a string
    ... - !python/unicode a unicode string
    ... - 12345
    ... """

    >>> # 'Parser.parse()' returns the root node of the representation tree:
    >>> p = Parser(source)
    >>> print p.parse()
    <_syck.Seq object at 0xb7a33f54>

    >>> # 'GenericLoader.load()' returns a Python object, but ignores the tags:
    >>> gl = GenericLoader(source)
    >>> print gl.load()
    ['a string', 'a unicode string', '12345']

    >>> # 'Loader.load()' is aware of the tags:
    >>> l = Loader(source)
    >>> print l.load()
    ('a string', u'a unicode string', 12345)
The class Emitter is a low-level wrapper of a Syck YAML emitter. It can generate a YAML stream from a representation graph.
The class constructor has the following signature:
The parameter output must be a file-like object that provides a method write(data). The other parameters describe the formatting of the output document.
The class defines a single method:
The parameter node must be the root node of a YAML representation graph. The method emit() writes the generated YAML document to the output stream.
The subclass GenericDumper adds the following methods:
The method dump() converts the given object into a representation graph, generates a YAML document, and writes it to the output stream. It uses the method represent() to convert an object to a representation node. The method represent() defined in GenericDumper generates a sequence node for a list object and a mapping node for a dictionary object. Otherwise it generates a scalar node with the value equal to str(object).
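The conversion rule of represent() can be sketched as follows in plain Python 3. Scalar, Seq, and Map are simplified stand-ins defined for this sketch, not the PySyck classes, and converting the items of a list or dictionary recursively is an assumption.

```python
class Node(object):
    """Stand-in node carrying only a value."""

    def __init__(self, value):
        self.value = value


class Scalar(Node):
    pass


class Seq(Node):
    pass


class Map(Node):
    pass


def represent(obj):
    if isinstance(obj, list):
        # A list becomes a sequence node.
        return Seq([represent(item) for item in obj])
    if isinstance(obj, dict):
        # A dictionary becomes a mapping node.
        return Map(dict((represent(k), represent(v)) for k, v in obj.items()))
    # Everything else, tuples included, becomes a scalar holding str(obj).
    return Scalar(str(obj))


node = represent(['foo', ('a', 1), 12345])
kinds = [type(child).__name__ for child in node.value]
values = [child.value for child in node.value]
```

Note that the tuple is not treated as a sequence: it ends up as a single scalar.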
The Syck YAML emitter automatically detects when the same object is referred to from different parts of the graph and generates aliases for it. Unfortunately, this does not work well with immutable Python objects such as strings, numbers, and tuples. To prevent generating unnecessary aliases, the method allow_aliases() is used. If allow_aliases() returns False for a given object, an alias will never be generated for it.
The allow_aliases() method defined in GenericDumper always returns True.
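A subclass could override allow_aliases() to suppress aliases for immutable values. The sketch below uses a stand-in base class, BaseDumper, so that it runs without PySyck; both class names and the list of types are hypothetical choices for illustration, written for Python 3.

```python
class BaseDumper(object):
    """Stand-in for the default behaviour: every object may be aliased."""

    def allow_aliases(self, obj):
        return True


class NoImmutableAliasDumper(BaseDumper):
    # Types treated as immutable in this sketch; the list is an assumption.
    IMMUTABLE = (str, bytes, int, float, complex, bool, tuple, type(None))

    def allow_aliases(self, obj):
        # Aliases stay allowed for mutable containers such as lists and dicts.
        return not isinstance(obj, self.IMMUTABLE)


dumper = NoImmutableAliasDumper()
decisions = [dumper.allow_aliases(obj) for obj in ('text', 12345, (1, 2), [1, 2], {})]
```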
The subclass Dumper redefines the methods:
defines the method
and adds many other auxiliary methods for representing objects as nodes.
Dumper.find_representer() finds a method that can represent the given object as a node in a representation tree. find_representer() checks the class of the object. If the class has the form package.module.type, find_representer() returns the method Dumper.represent_package_module_type if it exists. If this method does not exist, find_representer() checks the base class of that class, and so on.
Dumper.represent() calls Dumper.find_representer() for the given object and uses the returned method to generate a representation node.
See the source for more details.
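The lookup through the class and its bases can be sketched with the method resolution order of the class. This is an illustration in plain Python 3, not the code of Dumper.find_representer(). The class SketchDumper and its two representer methods are hypothetical, and including the module name builtins for built-in types is an assumption of this sketch.

```python
from collections import OrderedDict


class SketchDumper(object):
    """Hypothetical dumper that resolves representers by class name."""

    def find_representer(self, obj):
        # Walk the class and then its base classes, most specific first.
        for cls in type(obj).__mro__:
            qualified = "%s.%s" % (cls.__module__, cls.__name__)
            name = "represent_" + qualified.replace(".", "_")
            method = getattr(self, name, None)
            if method is not None:
                return method
        return None

    def represent_builtins_dict(self, obj):
        return ("map", sorted(obj.items()))

    def represent_builtins_object(self, obj):
        return ("scalar", str(obj))


dumper = SketchDumper()
# No represent_collections_OrderedDict exists, so the base class dict is used.
for_ordered = dumper.find_representer(OrderedDict(a=1)).__name__
# int has no representer of its own, so the lookup ends at object.
for_int = dumper.find_representer(12345).__name__
```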
Let us show how Emitter, GenericDumper, and Dumper work:
    >>> # For our demonstration, we define a representation tree named 'seq'
    >>> # and a Python tuple named 'object':
    >>> import sys
    >>> foo = Scalar('a string')
    >>> bar = Scalar('a unicode string', tag="tag:python.yaml.org,2002:unicode")
    >>> baz = Scalar('12345', tag="tag:yaml.org,2002:int")
    >>> seq = Seq([foo, bar, baz], tag="tag:python.yaml.org,2002:tuple")
    >>> object = ('a string', u'a unicode string', 12345)

    >>> # An 'Emitter' instance can dump a representation tree into a stream,
    >>> # but obviously fails to dump a Python object:
    >>> e = Emitter(sys.stdout)
    >>> e.emit(seq)
    --- !python/tuple
    - a string
    - !python/unicode a unicode string
    - 12345
    >>> e.emit(object)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: Node instance is required

    >>> # A 'GenericDumper' instance dumps almost everything as a scalar:
    >>> gd = GenericDumper(sys.stdout)
    >>> gd.dump(seq)
    --- <_syck.Seq object at 0xb7a3c2fc>
    >>> gd.dump(object)
    --- ('a string', u'a unicode string', 12345)

    >>> # Finally, a 'Dumper' instance dumps a representation tree as a complex
    >>> # Python object:
    >>> d = Dumper(sys.stdout)
    >>> d.dump(seq)
    --- !python/object:_syck.Seq
    value:
    - !python/object:_syck.Scalar
      value: a string
    - !python/object:_syck.Scalar
      value: a unicode string
      tag: tag:python.yaml.org,2002:unicode
    - !python/object:_syck.Scalar
      value: "12345"
      tag: tag:yaml.org,2002:int
    tag: tag:python.yaml.org,2002:tuple

    >>> # It also dumps the 'object' object as expected:
    >>> d.dump(object)
    --- !python/tuple
    - a string
    - !python/unicode a unicode string
    - 12345
You may check out the PySyck source code from the PySyck SVN repository.
If you find a bug in PySyck, please file a bug report in the PySyck BTS. You may review open bugs on the list of active tickets.
You may use the YAML-core mailing list for discussions of PySyck.
PySyck does not really support Unicode. This is a limitation of Syck.
The PySyck module was written by Kirill Simonov.