XmlConfigFile Tutorial

by Maik Schmidt

Version 0.9.0

Introduction

A lot of modern software can be customized by configuration files, and more and more applications use XML as the format for these files. This makes sense for at least the following reasons:

  • XML files can be edited easily and it will become even easier in the future.
  • XML provides everything you need in typical configuration files: hierarchical data, comments, ...
  • XML can be processed easily by a lot of modern tools and nearly all programming languages.
  • Usually, accessing a configuration file is not performance critical, because many configuration parameters are read only once.

So, creating and reading XML configuration files seems to be easy, but what about accessing the content of such a file? Many applications use "DOM tree traversing" or convert the XML document into a simpler internal structure (e.g. Hashes). Some advantages of XML get lost by doing so. If you want to change and write back a configuration, for example, you will have to write code that converts your internal structure back to XML.

But there is a better, easier, and standardized way for accessing elements in an XML document: XPath. With XmlConfigFile you can access configuration parameters via XPath expressions. This tutorial will show you how to do this.

Installation

You can download XmlConfigFile here. XmlConfigFile depends on REXML, so you will have to install it first. Then run

            ruby install.rb config
            ruby install.rb setup
            ruby install.rb install

A Simple Example

For our first example, we assume that we have a configuration file called example.xml that looks like this:

            <?xml version="1.0" encoding="iso-8859-1"?>

            <!--
              A sample configuration file.
            -->

            <config>
              <version>1.7</version>
              <splash-screen enabled='yes' delay='5000' />
              <greeting lang="en">Hello, world!</greeting>
              <greeting lang="de">Hallo, Welt!</greeting>
              <base-dir>${BASEDIR}</base-dir>
              <db env="test">
                Standard connection.
                <name>addresses</name>
                <user role="admin">scott</user>
                <pwd>tiger</pwd>
                <host>example.com</host>
                <driver>
                  <vendor>MySql</vendor>
                  <version>3.23</version>
                </driver>
              </db>
              <db env="prod">
                <name>addresses</name>
                <user>production</user>
                <pwd>secret</pwd>
                <host>example.com</host>
                <driver>
                  <vendor>Oracle</vendor>
                  <version>8.1</version>
                </driver>
              </db>
            </config>

To load and parse this file, you have to do the following:

            require 'xmlconfigfile'
            config = XmlConfigFile.new('example.xml')

Now you can access all the configuration file's entries via XPath. To get the content of the version element as a String, simply call

            version = config.get_string('/config/version') # -> '1.7'
or, even shorter,
            version = config['/config/version'] # -> '1.7'

To get the version element as a Float value, call

            version = config.get_float('/config/version') # -> 1.7
This works similarly for integer values:
            splash_delay = config.get_int('/config/splash-screen/@delay') # -> 5000
Of course, all Ruby literals for integer and float values (hex, octal, exponential notation, etc.) are supported.
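
As a rough illustration of which notations this covers, the sketch below converts strings with Ruby's built-in Kernel#Integer and Kernel#Float. It is not taken from XmlConfigFile's source: the helper names to_int and to_float are hypothetical, and it only assumes that the library accepts the same notations as these conversion methods.

```ruby
# Hypothetical helpers (not part of XmlConfigFile): convert raw parameter
# text using Ruby's own literal rules, falling back to a default for nil.
def to_int(text, default = nil)
  text.nil? ? default : Integer(text.strip)
end

def to_float(text, default = nil)
  text.nil? ? default : Float(text.strip)
end

p to_int('5000')     # -> 5000   (decimal)
p to_int('0x1F')     # -> 31     (hexadecimal)
p to_int('017')      # -> 15     (octal, leading zero)
p to_float('1.7')    # -> 1.7
p to_float('1.5e3')  # -> 1500.0 (exponential notation)
p to_int(nil, 42)    # -> 42     (missing value, default returned)
```

Text that is not a valid literal, such as 'abc', makes Integer and Float raise an ArgumentError in this sketch.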

Boolean values are a bit different. The following table shows which values mean true and which mean false by default in configuration files handled by XmlConfigFile:

            true    false
            ----    -----
            1       0
            yes     no
            on      off
            true    false

It does not matter whether they occur as element or attribute values; both whitespace and case are ignored.
            splash_enabled = config.get_boolean('/config/splash-screen/@enabled') # -> true

If you want to provide your own values for true and false, just do this:

            config.true_values  = ["HIja'", "HISlaH"]
            config.false_values = ["ghobe'"]
Now the Klingon phrases HIja' and HISlaH mean true, and ghobe' means false. Whitespace and case are still ignored. Keep in mind, though, that truth always beats falseness: if true_values and false_values share common elements, these elements mean true.
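
The lookup rules can be sketched in a few lines of plain Ruby. This is an illustration only, not XmlConfigFile's implementation: to_boolean and the two constants are hypothetical names, "ignoring whitespace" is assumed to mean leading and trailing whitespace, and returning nil for unknown text is an assumption as well.

```ruby
DEFAULT_TRUE_VALUES  = %w[1 yes on true].freeze
DEFAULT_FALSE_VALUES = %w[0 no off false].freeze

# Hypothetical helper, not XmlConfigFile's code. Returns true, false, or nil
# (nil for unknown text is an assumption; the tutorial does not say).
def to_boolean(text, true_values = DEFAULT_TRUE_VALUES, false_values = DEFAULT_FALSE_VALUES)
  normalize = ->(value) { value.strip.downcase }
  wanted = normalize.call(text)
  # True values are checked first, so an entry in both lists means true.
  return true  if true_values.any?  { |value| normalize.call(value) == wanted }
  return false if false_values.any? { |value| normalize.call(value) == wanted }

  nil
end

p to_boolean(' YES ')                                   # -> true
p to_boolean('Off')                                     # -> false
p to_boolean("hija'", ["HIja'", 'HISlaH'], ["ghobe'"])  # -> true
p to_boolean('maybe', ['maybe'], ['maybe'])             # -> true
```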

Default Values

All the get methods described above allow you to provide a default value that will be returned in case the parameter requested does not exist:

            config['/config/version', '1.0'] # -> '1.7'
            config['/unknown/parameter', 'frodo'] # -> 'frodo'

            config.get_int('/unknown/parameter', 42) # -> 42
If you do not provide a default value, nil will be returned:
            config['/unknown/parameter'] # -> nil

Handling Sets of Configuration Parameters

Configuration files often contain different variants of the same parameters, for example for different countries, languages, or environments. With XPath, it is simple to keep them all in a single configuration file. If you want to get the German version of our friendly greeting element, just call

            greeting = config["/config/greeting[@lang='de']"] # -> 'Hallo, Welt!'

To get the name of your production database user, call

            user = config["/config/db[@env='prod']/user"] # -> 'production'

You will often need a bunch of related configuration parameters at the same time. XmlConfigFile offers different ways to achieve this. The simplest is a so-called path prefix, which is put in front of each XPath. Instead of calling

            name = config["/config/db[@env='prod']/name"] # -> 'addresses'
            user = config["/config/db[@env='prod']/user"] # -> 'production'
            pwd  = config["/config/db[@env='prod']/pwd"]  # -> 'secret'
you could use the slightly simpler version
            config.path_prefix = "/config/db[@env='prod']/"
            name = config["name"] # -> 'addresses'
            user = config["user"] # -> 'production'
            pwd  = config["pwd"]  # -> 'secret'
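
To show what happens underneath, here is a minimal sketch that evaluates the same kind of XPath expressions directly with REXML, the library XmlConfigFile builds on. The lookup helper and its prefix and default keywords are hypothetical and only imitate the behaviour described above; they are not XmlConfigFile's API.

```ruby
require 'rexml/document'

SAMPLE = <<~XML
  <config>
    <greeting lang="en">Hello, world!</greeting>
    <greeting lang="de">Hallo, Welt!</greeting>
    <db env="test"><name>addresses</name><user>scott</user></db>
    <db env="prod"><name>addresses</name><user>production</user></db>
  </config>
XML

DOC = REXML::Document.new(SAMPLE)

# Hypothetical helper: text of the first node matching prefix + path,
# or the given default when nothing matches.
def lookup(path, prefix: '', default: nil)
  node = REXML::XPath.first(DOC, prefix + path)
  node ? node.text : default
end

puts lookup("/config/greeting[@lang='de']")           # Hallo, Welt!

prod = "/config/db[@env='prod']/"
puts lookup('name', prefix: prod)                     # addresses
puts lookup('user', prefix: prod)                     # production
puts lookup('pwd',  prefix: prod, default: '(none)')  # (none)
```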

But what if you need a set of parameters as a whole? That is what the get_parameters method is for. It converts a node list into a Hash. The keys of this Hash are the paths to the individual elements, with the tag names separated by the '.' character by default. The root element (config in our case) is excluded.

So, to get all database configuration parameters for your test environment, you have to call

            dbParams = config.get_parameters("/config/db[@env='test']/*")

The resulting Hash looks like this:

            dbParams = {
              'db.driver.vendor' => 'MySql',
              'db.driver.version' => '3.23',
              'db.host' => 'example.com',
              'db.pwd' => 'tiger',
              'db.user' => 'scott',
              'db.name' => 'addresses'
            }

If you want to expand attributes, too, you have to do the following:

            config.expand_attributes = true
Now dbParams = config.get_parameters("/config/db[@env='test']/*") returns:
            dbParams = {
              'db.driver.vendor' => 'MySql',
              'db.driver.version' => '3.23',
              'db.host' => 'example.com',
              'db.pwd' => 'tiger',
              'db.user' => 'scott',
              'db.user.role' => 'admin',     # Yes! Attributes will be returned, too!
              'db.name' => 'addresses'
            }

By default, the individual path elements are separated by a '.' character. If you want to, you can specify an arbitrary string as the path separator:

            dbParams = config.get_parameters("/config/db[@env='test']/*", "-silly-")

This will result in the following Hash:

            dbParams = {
              'db-silly-driver-silly-vendor' => 'MySql',
              'db-silly-driver-silly-version' => '3.23',
              'db-silly-host' => 'example.com',
              'db-silly-pwd' => 'tiger',
              'db-silly-user' => 'scott',
              'db-silly-user-silly-role' => 'admin',
              'db-silly-name' => 'addresses'
            }

To convert your whole configuration file into a Hash call:

            hashConfig = config.get_parameters('//*')
But be careful: in the example configuration file above, many elements have the same 'path name', e.g. there are two elements that will be converted to 'db.user'. Only the last entry will survive! Also note that it is not possible to access "orphaned" text nodes: the text 'Standard connection.' in the db element of our example configuration file will be ignored.
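
The flattening and the collision can be reproduced with a small stand-alone sketch. It does not use XmlConfigFile or REXML: Node, leaf, branch, and flatten_nodes are hypothetical names, and a Struct stands in for an XML element. The point is only how the path keys are built and why a later element with the same path overwrites an earlier one.

```ruby
# Hypothetical stand-in for an XML element: a name plus either text or children.
Node = Struct.new(:name, :text, :children)

def leaf(name, text)
  Node.new(name, text, [])
end

def branch(name, *children)
  Node.new(name, nil, children)
end

# Flatten nodes into { 'a.b.c' => text }. A later node with the same
# path overwrites an earlier one, because Hash keys are unique.
def flatten_nodes(nodes, separator = '.', prefix = nil, result = {})
  nodes.each do |node|
    key = prefix ? "#{prefix}#{separator}#{node.name}" : node.name
    if node.children.empty?
      result[key] = node.text
    else
      flatten_nodes(node.children, separator, key, result)
    end
  end
  result
end

dbs = [
  branch('db', leaf('user', 'scott'), branch('driver', leaf('vendor', 'MySql'))),
  branch('db', leaf('user', 'production'), branch('driver', leaf('vendor', 'Oracle')))
]

p flatten_nodes(dbs)                      # 'db.user' ends up as 'production'
p flatten_nodes(dbs.first(1), '-silly-')  # only the first db, custom separator
```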

If you really need access to a bunch of elements sharing the same name, you should try the following:

            dbParams = config.get_string_array("/config/db")
The result looks like this:
            dbParams = [
              { 
                'db.name' => 'addresses',
                'db.user' => 'scott',
                'db.pwd' => 'tiger',
                'db.host' => 'example.com',
                'db.driver.vendor' => 'MySql',
                'db.driver.version' => '3.23'
              },
              {
                'db.name' => 'addresses',
                'db.user' => 'production',
                'db.pwd' => 'secret',
                'db.host' => 'example.com',
                'db.driver.vendor' => 'Oracle',
                'db.driver.version' => '8.1'
              }
            ]

Advanced Features

In addition to the features described in the last section, you will find some advanced features, like referencing environment variables from your configuration files or an automatic reloading and observing mechanism.

Using Environment Variables

It is often useful to combine the usage of configuration files and environment variables. XmlConfigFile makes this task easy: You can put references to environment variables into your configuration files and they will get expanded to their actual values as the file is loaded. The syntax for such references is

            ${Name of environment variable}

So, if you have an environment variable that specifies a base directory, as referenced in our example configuration file, you can use it like this:

            baseDir = config['/config/base-dir'] # -> Current value of $BASEDIR
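
The substitution itself can be sketched with a single regular expression. This is an illustration, not XmlConfigFile's code: expand_env is a hypothetical helper, and leaving references to unset variables untouched is an assumption, since the tutorial does not say what happens in that case.

```ruby
# Hypothetical helper: replace each ${NAME} with the value of the
# environment variable NAME. Assumption: unset variables are left as-is.
def expand_env(text, env = ENV)
  text.gsub(/\$\{([^}]+)\}/) { |ref| env.fetch(Regexp.last_match(1), ref) }
end

ENV['BASEDIR'] = '/opt/app'
puts expand_env('<base-dir>${BASEDIR}</base-dir>')     # <base-dir>/opt/app</base-dir>
puts expand_env('<tmp>${NO_SUCH_VARIABLE_XYZ}</tmp>')  # <tmp>${NO_SUCH_VARIABLE_XYZ}</tmp>
```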

Reloading the Configuration Periodically

It is often convenient to reconfigure a running system without stopping and restarting it. XmlConfigFile supports such a mechanism. Simply provide the length of the reload period (in seconds) when creating a new object of class XmlConfigFile:

            config = XmlConfigFile.new('example.xml', 300)
The configuration file 'example.xml' will be checked for changes every five minutes now. If the file's modification timestamp has changed, it will be reloaded automatically. If the modified file is invalid or does not exist any longer, the last working version will be used and an error message will be sent to $stderr.

Every time a configuration file is reloaded, references to environment variables are replaced by their current values. Please note that a configuration file is only reloaded if its modification timestamp has changed. So, if you only change the environment, nothing will happen until you touch the file.

If you no longer need the reloading thread, you should be a good citizen and call config.close.
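
The mechanism described in this section can be sketched roughly as follows. ReloadingFile is a hypothetical class, not XmlConfigFile's implementation: it compares modification timestamps, keeps the last good content when the file cannot be read, and runs the check in a background thread only if a period is given. Unlike the real library, it does not validate the new content.

```ruby
# Hypothetical class, not XmlConfigFile's code: re-read a file when its
# modification time changes and keep the last good content otherwise.
class ReloadingFile
  attr_reader :content

  def initialize(path, period = nil)
    @path = path
    @mtime = File.mtime(path)
    @content = File.read(path)
    # Start the periodic check only when a period (in seconds) is given.
    @thread = period && Thread.new { loop { sleep(period); reload_if_changed } }
  end

  # Returns true if the file was re-read, false otherwise.
  def reload_if_changed
    mtime = File.mtime(@path)
    return false if mtime == @mtime

    @content = File.read(@path)
    @mtime = mtime
    true
  rescue SystemCallError => e
    # Missing or unreadable file: report it and keep the old content.
    warn "Could not reload #{@path}: #{e.message}"
    false
  end

  # Stop the background thread, if one was started.
  def close
    @thread&.kill
    @thread = nil
  end
end
```

Calling reload_if_changed by hand, without a period, is convenient for tests; with a period such as 300, the thread performs the same check every five minutes until close is called.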

Observing a Configuration File

In many cases it is not sufficient to reload a configuration file periodically. Some parts of your application probably need to be informed immediately about such changes. For example, you may have to re-initialize a database connection if the corresponding configuration parameters have changed. For this purpose, you can add so-called observers to an instance of class XmlConfigFile:

            class Orwell
              def initialize(config)
                config.add_observer(self)
                @config = config
              end

              def update(*args)
                filename = args[0]
                puts "Configuration file #{filename} has changed."
                sample = @config['/just/another/xpath']
              end
            end

            config = XmlConfigFile.new('config.xml', 300)
            orwell = Orwell.new(config)
In the example above, the configuration file 'config.xml' will be checked for changes every five minutes. If something has changed, the instance of class Orwell will be informed, i.e., its update method will be called with the configuration file's name. So, the two things you have to do to become an observer are:
  1. Call add_observer on the instance of XmlConfigFile you want to observe.
  2. Implement the method update in your observing class.

The observers of a configuration file will also be notified if the configuration is changed using the set_parameter method described in the next section.
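
For readers who want to see the other side of this contract, the following sketch shows a minimal subject that notifies its observers. ObservedFile and ChangeRecorder are hypothetical classes; how XmlConfigFile implements add_observer internally is not described here, so this only illustrates the pattern.

```ruby
# Hypothetical minimal subject: keeps a list of observers and calls
# update(filename) on each one when told that something changed.
class ObservedFile
  def initialize(filename)
    @filename = filename
    @observers = []
  end

  def add_observer(observer)
    unless observer.respond_to?(:update)
      raise ArgumentError, 'observer must implement update'
    end

    @observers << observer
  end

  def notify_observers
    @observers.each { |observer| observer.update(@filename) }
  end
end

# Hypothetical observer that simply remembers what it was told.
class ChangeRecorder
  attr_reader :seen

  def initialize(subject)
    @seen = []
    subject.add_observer(self)
  end

  def update(*args)
    @seen << args[0]
  end
end

subject = ObservedFile.new('config.xml')
recorder = ChangeRecorder.new(subject)
subject.notify_observers
p recorder.seen  # the recorder was told about 'config.xml' once
```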

Changing and Storing a Configuration File

The ability to change and store a configuration file turns class XmlConfigFile into a preferences package. Currently it is only possible to change existing parameters, but future versions of XmlConfigFile will be able to create new nodes automatically where necessary. To change an existing attribute or element, do the following:

            config.set_parameter('/config/splash-screen/@enabled', 'no')
The statement above sets the enabled attribute of the splash-screen element to 'no', i.e. it disables the splash screen. Of course, it would be very useful to make this user decision persistent:
            config.store
This will overwrite the original configuration file with the current configuration held in memory. If you want to store the current configuration into another file, you have to provide a filename:
            config.store('another_file.xml')
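
The same change-and-store round trip can be sketched with REXML alone. The helper disable_splash is hypothetical and is not how XmlConfigFile implements set_parameter or store; it only shows that changing a node in the parsed tree and serializing the tree again is enough to persist the new value.

```ruby
require 'rexml/document'

# Hypothetical helper: set the enabled attribute of /config/splash-screen
# to 'no' and return the modified document as an XML string.
def disable_splash(xml)
  doc = REXML::Document.new(xml)
  splash = REXML::XPath.first(doc, '/config/splash-screen')
  splash.attributes['enabled'] = 'no'
  doc.to_s
end

original = "<config><splash-screen enabled='yes' delay='5000'/></config>"
stored = disable_splash(original)

# Parsing the stored text again shows that the change survived.
reloaded = REXML::Document.new(stored)
puts REXML::XPath.first(reloaded, '/config/splash-screen').attributes['enabled']  # no
```

Writing the returned string with File.write would correspond to storing the configuration in a file.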

Further Reading

If you are interested in absolute truth, you will have to look at the source code or the API docs.

Acknowledgements

A big "Thank you!" (in no particular order) goes to

  • Yukihiro Matsumoto for Ruby.
  • Frank Tewissen for the Java™ reference implementation.
  • Sean Russell for REXML.
  • Dave Thomas for Rdoc.
  • Nathaniel Talbott for Test::Unit.
  • Minero Aoki for his setup package.
  • Sandra Silcot for tests and bug fixes.
  • Nigel Ball for contributing code.

Contact

If you have any suggestions or want to report bugs, please contact me (contact@maik-schmidt.de).


Copyright © 2003 by Maik Schmidt.