
Unofficial OpenOffice Hacker's guide for 2.0

This guide is becoming obsolete; see the wiki for better-maintained information.

Building and hacking on OpenOffice.org (OO.o) entails climbing a fairly lengthy incline. Hopefully this document will make the learning curve somewhat steeper and more abrupt, and will give you a walking stick to help you out. Older hacker's guides targeting the 1.1 tree are available (also in German & Japanese).

This document assumes that you'll be using a reasonably current Linux system, as a time-saving feature. Real hackers use Free software, and don't have time to read about non-Free stuff.

We aim to answer at least the following questions:

If you need help getting OO.o built, and you intend to hack on it, please join the mailing list here and ask questions there.


1. Getting OO.o

There are loads of versions of OO.o, and several choices of branch, with multiple outstanding patch sets. I recommend you build from up-stream CVS HEAD milestones (SRC680 milestones), with patch sets to make them easier to build from here.

The very latest ooo-build (a small ~1.5Mb build wrapper) can be got from CVS thus:

    export CVSROOT=':pserver:anonymous@anoncvs.gnome.org:/cvs/gnome'
    cvs login
    cvs -z3 checkout -P ooo-build
    

Note: You are going to need to download an additional ~170Mb of compressed source, and have ~3Gb of disk space to unpack and build it in.

2. Building OO.o

2.1. configure

The build process is pretty complicated. You now have a choice of commands, although running both won't actually hurt:

      ./autogen.sh # only for the CVS version
      ./configure  # the packaged version
    

This will guess which branch snapshot you want to build; if you have other ideas use the --with-tag option; eg. --with-tag=src680-m65 for a legacy branch.

If for some reason you have a 31337 multi-threaded computer with great slabs of RAM, you'll want to use --with-num-cpus=8 etc. NB. it's not clever to force the build to swap like a demented pawnbroker by using an artificially high number; C++ compilation is seriously memory hungry.

Note that building SRC680 requires a recent JDK & a version of apache-ant. If you use a Novell system, just do: sudo rug in apache-ant; alternatively, download a package from rpmfind.net, or failing that see Ant download & set the ANT environment variable appropriately before configuring.

2.2 download

By the time you've upgraded your system to the point that it has all the packages you need to start building OO.o (mozilla, recent libart etc. etc.) you're almost at the point that you can download the bulk of the source. To do this, after a successful configure simply type: ./download and wait.

If for whatever reason this fails, you can verify your download by fetching the equivalent .md5 file & comparing it to the result of md5sum <archive>. The source archives are here - put the source in ooo-build/src.

2.3. make

This is the taxing bit - type make and don't forget to press enter. Quite possibly you want to log the output, so why not make 2>&1 | tee /tmp/log.

Since ooo-build wraps the actual OO.o configuration & build process, there are a number of internal config checks that also need to pass. For a first-time build it's well worth staying near the console while everything unpacks and the internal configure runs; if that completes without incident, you're usually into the heavy-duty thumb twiddling.

3. Installing OO.o

When everything has finished building, you should get some happy-looking message. The easiest way to install is: bin/ooinstall -l <path-to-install-to>; I often use /opt/OOInstall.

If you are a packager, you'll want to run make install which honours DESTDIR & does other packager-like things.

Note: The '-l' option to ooinstall runs linkoo on the installed result.

4. Running OO.o

Now wander into /opt/OOInstall/program and do: source ./ooenv; this will set up your (bash) shell for running OO.o directly. Then simply run ./soffice.bin -writer. This is better than running soffice or a wrapper script, since it makes it very easy to use the debugger: gdb soffice.bin.

Note: ooenv was formerly known as env. It was renamed to avoid conflicting with /usr/bin/env.

5. Hacking OO.o

5.0. My first hack

So - we've built and run OO.o, and we want to prove to ourselves that it is in fact possible to hack on it. So in a new terminal do this:

      cd build/src680-m66
      . ./LinuxIntelEnv.Set.sh
      cd vcl
    

Now have a hack at vcl/source/window/toolbox2.cxx; I suggest adding (eg.) an nPos = 0 anywhere before the m_aItems.insert in the 4th InsertItem method: void ToolBox::InsertItem( USHORT nItemId, const XubString& rText, ToolBoxItemBits nBits, USHORT nPos ). Then save.

You're still in vcl/, yes? Then type 'build debug=true' and wait for the scrolling text to stop (5 seconds?). Now re-run soffice -writer. You should notice the effect. If not, ensure the previous soffice.bin is dead with killall -9 soffice.bin.

You can find more things to hack in the tutorials.

Note: for day to day hacking you want to just run 'build' inside the source tree. It is also highly recommended to work inside a copy of the build tree, and generate / test patches in an un-hacked version. To copy just the build/src680-m66 directory elsewhere, you need to use the relocate tool.

5.1. Read the Fine manual

With the power of C++ comes the ability to shoot yourself in the foot all the more easily (and implicitly); cf. Holub, Rules for C and C++ Programming, McGraw-Hill, 1995.

The best way to prepare yourself for battle is to read the OpenOffice coding guidelines here. For the easily confused, c'tor / d'tor is short for constructor / destructor.

5.2. Sending patches

It is seldom clear which Bugzilla module a patch belongs to. A quick way to try and work this out is to do: cvs status <somefile> | head. This should give a 'Repository revision:' line with a path; the 2nd fragment of that path is the project name. ooo-build/bin/owner automates that process for you.
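For the curious, that lookup can be sketched in C++. This is a hypothetical helper, not what bin/owner actually does: it assumes the 'Repository revision:' line ends with a repository path of the form /cvs/<project>/<module>/..., and the sample output in the test is made up.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: pull the project name out of `cvs status` output.
// Assumes the "Repository revision:" line ends with the repository path,
// eg. "/cvs/<project>/<module>/.../file,v"; returns "" if there is none.
std::string projectFromCvsStatus(const std::string& statusOutput) {
    std::istringstream in(statusOutput);
    for (std::string line; std::getline(in, line);) {
        // Match the label case-insensitively.
        std::string lower = line;
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (lower.find("repository revision:") == std::string::npos)
            continue;
        // The path is the last whitespace-separated token on the line.
        std::istringstream tokens(line);
        std::string tok, path;
        while (tokens >> tok)
            path = tok;
        // Split it on '/' and return the 2nd non-empty fragment.
        std::istringstream parts(path);
        int n = 0;
        for (std::string part; std::getline(parts, part, '/');)
            if (!part.empty() && ++n == 2)
                return part;
        return "";
    }
    return "";
}
```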

In addition, since the mapping of module names to IssueZilla tickets is rather contorted & un-documented, if you know what module the bug is in, use this page to file it.

5.3. Starting the right app

As you start soffice.bin, there are several useful parameters to use to accelerate your debugging experience; particularly -writer, -calc, -draw, and (the wizardly painful) -impress arguments.

5.4. Understanding D' make (man)

While the build system is similar to many other systems, it is also perhaps slightly different. The overview is that each module is built, and then the results are delivered into the solver. Each module builds against the headers in the solver. Thus there are a few intricacies.

5.4.1 Standard directories

There are various standard directories and files in most of the modules that make up OO.o; here are some of the more useful:

The 'build' tool works by invoking 'dmake' in each of a project's directories, in dependency order; dmake then executes the rules in that directory's makefile.mk.

5.4.2 build.lst

On first view build.lst looks scary:

      vc      vcl : NAS:nas FREETYPE:freetype psprint rsc sot ucbhelper unotools sysui NULL
      vc      vcl                      usr1   -       all     vc_mkout NULL
      vc      vcl\source\unotypes      nmake  -       all     vc_unot NULL
      vc      vcl\source\glyphs        nmake  -       all     vc_glyphs vc_unot NULL
      
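So we need to try and unpack what's going on here, which is in fact not as odd as it might seem at first glance. Firstly, lists are terminated by the 'NULL' string. Every line is prefixed by a shortcut, which is irrelevant. The first line lists the modules vcl depends on; each following line names a directory and gives it a target name (eg. vc_unot), followed by the targets that directory depends on.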

So we see in the vcl case that vcl\source\unotypes (vc_unot) has to be built before vcl\source\glyphs (vc_glyphs). It is important to understand that the order of the list is ~immaterial: instead of a simple ordered list, we have a more complex internal dependency system, which contrasts with most other make systems.
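To make the dependency idea concrete, here is a minimal C++ sketch that reads the directory lines of a build.lst and orders them so that each directory comes after the targets it names. The field layout (shortcut, directory, action, '-', platform, target, dependencies, NULL) is inferred from the vcl example above rather than taken from the build tool's source, and the ordering is a plain depth-first topological sort of our own, not build's implementation.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One directory line of a build.lst, eg.
//   vc  vcl\source\glyphs  nmake  -  all  vc_glyphs vc_unot NULL
struct BuildDir {
    std::string dir;                // vcl\source\glyphs
    std::string target;             // vc_glyphs
    std::vector<std::string> deps;  // { "vc_unot" }
};

// Assumed layout: shortcut, directory, action, '-', platform, target, then
// dependencies up to NULL. The module header line (3rd token ":") is skipped.
std::vector<BuildDir> parseBuildLst(const std::string& text) {
    std::vector<BuildDir> dirs;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line);) {
        std::istringstream ts(line);
        std::vector<std::string> tok;
        for (std::string t; ts >> t;) tok.push_back(t);
        if (tok.size() < 7 || tok[2] == ":") continue;
        BuildDir d{tok[1], tok[5], {}};
        for (std::size_t i = 6; i < tok.size() && tok[i] != "NULL"; ++i)
            d.deps.push_back(tok[i]);
        dirs.push_back(d);
    }
    return dirs;
}

// Order directories so each comes after the targets it depends on,
// regardless of where it appears in the file.
std::vector<std::string> buildOrder(const std::vector<BuildDir>& dirs) {
    std::map<std::string, const BuildDir*> byTarget;
    for (const auto& d : dirs) byTarget[d.target] = &d;
    std::map<std::string, int> state;  // 0 = unvisited, 1 = in progress, 2 = done
    std::vector<std::string> order;
    std::function<void(const BuildDir&)> visit = [&](const BuildDir& d) {
        if (state[d.target] == 2) return;
        if (state[d.target] == 1)
            throw std::runtime_error("dependency cycle at " + d.target);
        state[d.target] = 1;
        for (const auto& dep : d.deps) {
            auto it = byTarget.find(dep);
            if (it != byTarget.end()) visit(*it->second);  // unknown targets ignored
        }
        state[d.target] = 2;
        order.push_back(d.dir);
    };
    for (const auto& d : dirs) visit(d);
    return order;
}
```

Even with vc_glyphs listed before vc_unot in the input, the result puts vcl\source\unotypes first, which is the point: position in the file does not decide the order.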

There is also some documentation on it here.

5.4.3 d.lst

The syntax of d.lst is more comprehensible than that of build.lst. It also omits some default actions, such as copying build.lst into inc/<module>/build.lst, which happen without being listed.

A line is of the form:

[action]: [arguments]
mkdir:    %_DEST%\inc%_EXT%\external
      
where if '[action]:' is omitted, it defaults to the 'copy' action. Typical actions are copy, mkdir, touch, hedabu, dos and linklib.
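A d.lst line can be split into its action and arguments along these lines. This is a hypothetical C++ sketch: it assumes an action is a single alphabetic word directly followed by ':' at the start of the line, and falls back to 'copy' otherwise, as described above.

```cpp
#include <cctype>
#include <string>

struct DlstRule {
    std::string action;     // copy, mkdir, touch, hedabu, dos, linklib, ...
    std::string arguments;  // everything after the action
};

// Parse one "[action]: [arguments]" line; without an action prefix the
// action defaults to "copy". Blank lines give an empty rule.
DlstRule parseDlstLine(const std::string& line) {
    const std::string ws = " \t";
    std::string::size_type b = line.find_first_not_of(ws);
    if (b == std::string::npos) return {"", ""};
    std::string::size_type e = line.find_first_of(ws, b);
    std::string first = line.substr(b, e == std::string::npos ? std::string::npos : e - b);
    // Assumed rule: an action is letters followed by a single trailing ':'.
    bool isAction = first.size() > 1 && first.back() == ':';
    for (std::string::size_type i = 0; isAction && i + 1 < first.size(); ++i)
        isAction = std::isalpha(static_cast<unsigned char>(first[i])) != 0;
    if (!isAction)
        return {"copy", line.substr(b)};   // default action
    std::string::size_type a =
        (e == std::string::npos) ? std::string::npos : line.find_first_not_of(ws, e);
    return {first.substr(0, first.size() - 1),
            a == std::string::npos ? "" : line.substr(a)};
}
```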

The 'hedabu' action is particularly interesting, inasmuch as it cosmetically re-formats the header to shrink it on install (otherwise it's much like the copy action).

During the action, various macro variables are expanded, some of which are:

Typically then, if indeed you need to add a rule (cf. implicit directory copies), it will be of the form:
..\%__SRC%\inc\sal\*.h %_DEST%\inc%_EXT%\sal\*.h
      
NB. relative paths are relative to the 'prj/' directory.
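Macro expansion during an action essentially amounts to substituting %NAME% tokens. The C++ sketch below is illustrative only: the expansion rule is assumed, and the values used in the test (a solver path for %_DEST%, an empty %_EXT%) are made up.

```cpp
#include <map>
#include <string>

// Replace every %NAME% in `text` with its value from `vars`.
// Unknown names are left untouched, '%' and all.
std::string expandMacros(const std::string& text,
                         const std::map<std::string, std::string>& vars) {
    std::string out;
    std::string::size_type i = 0;
    while (i < text.size()) {
        std::string::size_type open = text.find('%', i);
        if (open == std::string::npos) { out += text.substr(i); break; }
        std::string::size_type close = text.find('%', open + 1);
        if (close == std::string::npos) { out += text.substr(i); break; }
        out += text.substr(i, open - i);
        auto it = vars.find(text.substr(open + 1, close - open - 1));
        if (it != vars.end()) {
            out += it->second;   // known macro: substitute its value
            i = close + 1;
        } else {
            out += '%';          // not a macro: keep the '%' and move on
            i = open + 1;
        }
    }
    return out;
}
```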

5.6 Can I get a char *, please?

Just barely. OO.o has at least six string wrappers, although the C implementations are of little interest:

A couple of conversion functions are really useful here, particularly:

rtl::OString aOString = ::rtl::OUStringToOString (aOUString, RTL_TEXTENCODING_UTF8);

And the reverse:

rtl::OUString aOUString = ::rtl::OStringToOUString (aOString, RTL_TEXTENCODING_UTF8);
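The rtl converters can't be compiled outside the OO.o tree, so here is a standalone sketch in standard C++ of what a UTF-8 conversion involves: an OUString holds UTF-16 code units, and RTL_TEXTENCODING_UTF8 turns each code point into 1 to 4 bytes (and back). This is not rtl's implementation; the error handling (U+FFFD for unpaired surrogates and bad lead bytes, no checking of continuation bytes) is this sketch's own choice.

```cpp
#include <string>

// UTF-16 (as in an OUString) -> UTF-8 bytes.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// UTF-8 bytes -> UTF-16. Continuation bytes are not validated here.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)              { cp = c;        len = 1; }
        else if ((c >> 5) == 0x6)  { cp = c & 0x1F; len = 2; }
        else if ((c >> 4) == 0xE)  { cp = c & 0x0F; len = 3; }
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; len = 4; }
        else { out += u'\uFFFD'; ++i; continue; }            // bad lead byte
        if (i + len > in.size()) { out += u'\uFFFD'; break; } // truncated
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {  // split into a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}
```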

Once you have an rtl::OString, aOString.getStr() gives you the const char * itself. If you just want to programmatically print out a string for debugging purposes, you probably want to see this.

5.7. Linkoo & Limitations

Linkoo is the tool that implements the -l functionality of bin/ooinstall. It essentially sym-links files of similar names into your local tree, allowing a fast development iteration.

It is, however, slightly limited: some of the modules cannot be linked for various reasons, namely cppuhelper and configmgr. Thus, in the rare case that these are altered, they must be copied manually into /opt/OOInstall/program.

In addition, a symlink cannot be used for soffice.bin, which is more commonly altered: it has to be installed by copying desktop/unxlngi4.pro/bin/soffice, NB. with '.bin' appended to the name.

6. Debugging OO.o

This section assumes use of gdb, from the console.

6.1. Building with debugging symbols

OO.o includes a per-module way to add debugging code, via the build debug=true command in each module. In addition to debug symbols, this adds lots of runtime assertions, churning warnings etc., which can be useful. To do just a plain build with debug symbols, though, use build debug=true dbg_build_only=true.

You can also configure OO.o with --enable-symbols to build with debugging symbols.

6.2. Starting at the beginning

We start in 'main' with a sal wrapper that calls vcl/source/app/svmain.cxx (SVMain). It invokes Main on pSVData->mpApp, but pSVData is an in-line local; to debug this, use the pImplSVData global variable, eg:

      p pImplSVData->maAppData
      
This 'Main' method is typically: desktop/source/app/app.cxx (Main).

6.3. Examining strings

We have already seen that OO.o has its own set of string classes, none of which gdb understands. Use (gdb) print dbg_dump(sWhatEver) to print the contents of a UniString/ByteString/rtl::OUString/rtl::OString, regardless of the type, when debugging C++ code. See Caolan's write-up here for details.

6.4 Getting the build order right

The build dependencies of the modules are clearly crucial to getting a clean build. When you type 'build' in a module, build first examines prj/build.lst, eg. neon/prj/build.lst:

xh      neon  :  soltools external expat NULL
      
this specifies that 'soltools', 'external' and 'expat' have to be satisfactorily built and delivered before neon can be built. Occasionally these rules get broken, and people don't notice for a while.
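A check of that rule can be sketched as follows: given a module's header line and the set of modules already delivered, list what is still missing. This is a hypothetical C++ helper, not part of the build tool; it also assumes that entries like 'NAS:nas' (seen in the vcl line earlier) are conditional and only count when their guard ('NAS') is enabled.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Modules named on a build.lst header line, eg.
//   "xh      neon  :  soltools external expat NULL"
// that are not yet in `delivered`. Assumption: "GUARD:module" entries are
// conditional and only count when GUARD is in `enabledGuards`.
std::vector<std::string> missingDeps(const std::string& headerLine,
                                     const std::set<std::string>& delivered,
                                     const std::set<std::string>& enabledGuards = {}) {
    std::istringstream ts(headerLine);
    std::string tok;
    while (ts >> tok && tok != ":") {}   // skip "<shortcut> <module> :"
    std::vector<std::string> missing;
    while (ts >> tok && tok != "NULL") {
        std::string::size_type colon = tok.find(':');
        if (colon != std::string::npos) {
            if (!enabledGuards.count(tok.substr(0, colon)))
                continue;                 // guard not enabled: ignore entry
            tok = tok.substr(colon + 1);
        }
        if (!delivered.count(tok))
            missing.push_back(tok);
    }
    return missing;
}
```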

6.5 It crashes, but only in gdb

What fun: you symlinked desktop/unxlngi4.pro/bin/soffice to soffice.bin in your install tree, didn't you? That works fine if you just run it, but gdb appears to resolve the symlink and pass the fully qualified path of its target as argv[0]. That defeats the hunt for the binary's location, so OO.o assigns the program base path as /opt/OpenOffice/OOO_STABLE_1/desktop/unxlngi4.pro/bin and starts looking there for its setup files (eg. applicat.rdb). Of course, when it fails to find any setup information, it silently crashes somewhere else, yards away from the original problem.
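A simplified model of the failure, not OO.o's actual osl code: if the base path is just the directory part of argv[0], then whichever path the debugger passes decides where setup files are searched for. programBaseDir is a hypothetical name, and a real launcher would search $PATH for a bare program name rather than assume the current directory.

```cpp
#include <string>

// Directory part of argv[0]; setup files (eg. applicat.rdb) would then be
// looked for relative to this directory.
std::string programBaseDir(const std::string& argv0) {
    std::string::size_type slash = argv0.rfind('/');
    if (slash == std::string::npos) return ".";  // bare name: a real launcher searches $PATH
    if (slash == 0) return "/";
    return argv0.substr(0, slash);
}
```

Run normally, argv[0] is the install-tree path and the base dir is /opt/OOInstall/program; under gdb with a symlinked binary it becomes the build tree's bin directory, where no setup files live.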

6.6 It crashes, but doesn't crash

For various reasons, OO.o traps signals with its own handlers, and life can get rather confusing; thus it's best for builders to apply something like this:

--- sal/osl/unx/signal.c
+++ sal/osl/unx/signal.c
@@ -188,6 +188,8 @@ static sal_Bool InitSignal()
             bSetILLHandler = sal_True;
        }
 
+       bSetSEGVHandler = bSetWINCHHandler = bSetILLHandler = bDoHardKill = sal_False;
+
        SignalListMutex = osl_createMutex();
 
        act.sa_handler = SignalHandlerFunction;

NB. trailing space.
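The idea behind that patch, reduced to a standalone POSIX sketch: install a SIGSEGV handler only when a flag asks for it. Clearing the flag (the equivalent of bSetSEGVHandler = sal_False) leaves the default disposition in place, so a segfault simply kills the process instead of being intercepted. crashHandler, initSignals and segvIsDefault are hypothetical names; only the SEGV case is shown.

```cpp
#include <signal.h>
#include <unistd.h>

// Stand-in for a crash handler like the one sal installs; it just exits.
static void crashHandler(int) {
    _exit(1);
}

// Install the SIGSEGV handler only when asked to.
bool initSignals(bool installSegvHandler) {
    if (!installSegvHandler)
        return true;                       // leave the default disposition alone
    struct sigaction act {};
    act.sa_handler = crashHandler;
    sigemptyset(&act.sa_mask);
    return sigaction(SIGSEGV, &act, nullptr) == 0;
}

// True if SIGSEGV still has its default action.
bool segvIsDefault() {
    struct sigaction current {};
    sigaction(SIGSEGV, nullptr, &current);
    return current.sa_handler == SIG_DFL;
}
```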

6.7 I can't find the code from the trace

Some methods are declared with a special linkage so that they can be used in callbacks; these typically have the prefix 'LinkStub', so search for the latter part of the identifier in a free-text search, eg.

      IMPL_LINK( Window, ImplHandlePaintHdl, void*, EMPTYARG )
      
builds the 'LinkStubImplHandlePaintHdl' method.
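Turning a trace symbol into the name to search for is mechanical, so here is a hypothetical C++ helper: drop the argument list and the class qualifier, then strip the 'LinkStub' prefix. The argument list in the test symbol is only an example of how such a frame might look.

```cpp
#include <string>

// "Window::LinkStubImplHandlePaintHdl(void*, void*)" -> "ImplHandlePaintHdl",
// the name used in IMPL_LINK( Window, ImplHandlePaintHdl, ... ).
std::string linkStubSearchName(const std::string& symbol) {
    std::string name = symbol.substr(0, symbol.find('('));  // drop arguments
    std::string::size_type scope = name.rfind("::");
    if (scope != std::string::npos)
        name = name.substr(scope + 2);                      // drop the class
    const std::string prefix = "LinkStub";
    if (name.compare(0, prefix.size(), prefix) == 0)
        name = name.substr(prefix.size());                  // drop LinkStub
    return name;
}
```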

6.8 How can I re-build just the files I see in the trace

Often when you run gdb on a build without debugging symbols, you get an unhelpful trace, yet you can't afford the time/space to recompile all of OO.o with debugging symbols. Thus we have created a small perl helper, ootouch, which will hunt for & touch files containing the symbols from your trace. This subset can then be rebuilt with debugging enabled for a better trace next time around:

    gdb ./soffice.bin
    ...
    bt
    #0  0x40b4e0a1 in kill () from /lib/libc.so.6
    #1  0x409acfe6 in raise () from /lib/libpthread.so.0
    #2  0x447bcdbd in SfxMedium::DownLoad(Link const&) () from ./libsfx641li.so
    #3  0x447be151 in SfxMedium::SfxMedium(String const&, unsigned short, unsigned char, SfxFilter const*, SfxItemSet*) ()
       from ./libsfx641li.so
    #4  0x448339d3 in getCppuType(com::sun::star::uno::Reference const*) () from ./libsfx641li.so
    ...
    quit
    cd base/OOO_STABLE_1/sfx2
    ootouch SfxMedium
    build debug=true
    

Thus, all files referencing / implementing anything with SfxMedium will be touched, and hence rebuilt with debugging symbols.
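ootouch itself is a perl script that we don't reproduce here. The hypothetical C++ helper below shows only the first step any such tool needs: pulling the class names (the 'SfxMedium' you pass to ootouch) out of the bt frames.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// First scope qualifier (normally the class) of the function in one frame:
//   "#2  0x447bcdbd in SfxMedium::DownLoad(Link const&) () from ./libsfx641li.so"
// gives "SfxMedium"; unqualified functions (kill, getCppuType) give "".
std::string frameClass(const std::string& frame) {
    std::string::size_type in = frame.find(" in ");
    if (in == std::string::npos) return "";
    std::string func = frame.substr(in + 4);
    func = func.substr(0, func.find('('));   // drop the argument list
    std::string::size_type scope = func.find("::");
    return scope == std::string::npos ? "" : func.substr(0, scope);
}

// Unique class names from a whole backtrace, in order of appearance;
// "std" is skipped since there is nothing of ours to rebuild there.
std::vector<std::string> traceClasses(const std::string& trace) {
    std::vector<std::string> classes;
    std::istringstream lines(trace);
    for (std::string line; std::getline(lines, line);) {
        std::string c = frameClass(line);
        if (c.empty() || c == "std") continue;
        if (std::find(classes.begin(), classes.end(), c) == classes.end())
            classes.push_back(c);
    }
    return classes;
}
```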

6.9 How can I re-build all the files in one source directory

If you want to recompile the code in just your current directory, you can use the killobj dmake target to remove the object files:

    dmake killobj
    dmake
    

6.10 It always crashes in sal_XErrorHdl

You are a victim of asynchronous X error reporting. export SAL_SYNCHRONIZE=1 will make all the X traffic synchronous, so that the error is reported in the method that caused it; it will also make OO.o far slower, and change the timing.

6.11 It silently fails to load my word file

Caolan suggests: put breakpoints at the top and tail of SwWW8ImplReader::LoadDoc in ww8par.cxx, and confirm that the document gets as far as the import filter.

A handy, human-friendly place to put a breakpoint is SwWW8ImplReader::ReadPlainChars, where you can see chunks of text as they are read in. Alternatively, use SwWW8ImplReader::AppendTxtNode, which is hit as each paragraph is inserted.

6.12 How do I use the debug console ?

OO.o contains some hefty debugging infrastructure, pictured here. Unfortunately, enabling it is not altogether trivial. Firstly, none of it is built into a product build, so we need to re-build some core parts of OO.o as non-product builds; then we need to re-run linkoo to link those new builds into our set.

First create a debug Environment file; I call it LinuxIntelEnv.Set.debug:

      TMPFILE=~/.Env.Set.debug

      # Purge .pro bits
      sed 's/\.pro//g' LinuxIntelEnv.Set.sh > $TMPFILE
      . $TMPFILE
      rm $TMPFILE

      # Clobber product parts
      unset PRODUCT PROSWITCH PROFULLSWITCH

Now do source ./LinuxIntelEnv.Set.debug; this sets up your environment for a non-product build.

cd vcl; build dbgutil=true --all linkoo

Now just run OO.o, and when it's in full flow, press <Alt>-<Shift>-<Control> 'D' in that order; this should pop up a debugging options window. The debugging options are subsequently saved to the .dbgsv.init file for the next run; you can control its location with eg. export DBGSV_INIT=$HOME/.dbgsv.init. It is (unfortunately) a binary file.

6.13 Excel Interop debugging

This is fairly easy; edit sc/source/filter/inc/biffdump.hxx, define EXC_INCL_DUMPER to 1, and re-build 'sc'. Also, copy sc/source/filter/excel/biffrecdumper.ini to ~. Then run soffice.bin foo.xls and you should get a foo.txt with the debug data in it.

6.14 The trace shows a crash in 'poll'

OO.o is a fairly threaded program, so you're prolly just looking at the wrong thread: there are not likely to be bugs in poll. Use thread apply all backtrace to get a backtrace of all threads; this will most likely fail. When it does, do thread 1 then bt, since most crashers occur in the 'main' thread.

6.15 What does this trace mean ?

There are several typical stack-traces that come up again and again, one would be:

#15 0x4164a501 in raise () from /lib/tls/libc.so.6
#16 0x4164bcd9 in abort () from /lib/tls/libc.so.6
#17 0x415fb5a5 in std::set_unexpected ()
   from /home/mnagashree/m72install/program/libstdc++.so.5
#18 0x415fb5e2 in std::terminate ()
   from /home/mnagashree/m72install/program/libstdc++.so.5
#19 0x415fb69c in __cxa_rethrow ()
    

This section of trace means (essentially) that an exception was thrown - but there was no-one trying to catch it. Often this means there was a missing 'try {} catch()' clause in one of the calling frames.

A great way to debug exceptions is to add a breakpoint on catch/throw; do this with catch throw or catch catch in gdb.
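To see why a missing try/catch ends in that trace: in standard C++, an exception that finds no matching handler makes the runtime call std::terminate(), whose default behaviour is to call abort(), hence the raise()/abort() frames. This generic sketch (runGuarded is our own name, not an OO.o function) just shows the kind of catch clause that was missing; catch (...) also catches exception types not derived from std::exception.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Run `task`, stopping any exception from escaping to std::terminate().
// Returns true if the task completed without throwing.
bool runGuarded(const std::function<void()>& task) {
    try {
        task();
        return true;
    } catch (const std::exception& e) {
        std::cerr << "caught: " << e.what() << '\n';
    } catch (...) {
        std::cerr << "caught an exception of unknown type\n";
    }
    return false;
}
```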

7. Contributing patches

7.1. Diff style

Always use unified diffs ('cvs -z3 diff -u'), since they are the most readable (and sensible) type of diff to read and apply.

7.2. Some interaction

It tends to be a good idea to work out how best to implement your fix, and/or discuss it with a developer or two, beforehand. Some of the best ways to do this are to post to dev@openoffice.org or lurk on IRC at irc.freenode.net in the #OpenOffice.org channel. IRC is an awfully poor communication medium, but better than no communication. See here to unwind who is whom.

7.3. ooo-build patch creation

See here for more information on our patching infrastructure.

7.4 filing bugs

See here for a sane / hackers interface to OpenOffice's IssueZilla.

Since we can often extract the owner of a module by checking for the ADMIN_FILE_OWNER tag, there is a little tool in ooo-build, bin/owner <file-name>, that helps you find out whom to E-mail / interact with about a given module; it's worth assigning very specifically located bugs to that person.

8. Misc. tips

8.1. Getting an OO.o CVS account

This is the process for getting accounts on the up-stream CVS server; ooo-build accounts are handled differently. To see how the issue-raising process works, see eg. issue #7270. Having got the account set up, you need to tunnel to the secure CVS server with something like:

ssh -f -2 -P -L 2401:localhost:2401 tunnel@openoffice.org sleep 1400 < /dev/null > /dev/null

Then you need to change your CVSROOT to point at your local machine, since this is the endpoint of the tunnel:

:pserver:mmeeks@localhost:/cvs

Your account name and password will be the same as the ones you use for filing bugs etc. in the SourceCast system. Log in, and ... you'll soon notice that you need to migrate your CVS settings to the new server; to do this without wasting bandwidth on duplicate checkouts, do:

bin/re-root /path/to/checkout ":pserver:<account-name-here>@localhost:/cvs"
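We haven't reproduced how bin/re-root is implemented, but the job it has to do follows from how CVS works: every directory of a checkout holds a CVS/Root file naming the repository, so switching servers means rewriting each of them. A hypothetical C++17 sketch of that (reRoot is our own name):

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Rewrite every CVS/Root file under `checkout` to contain `newRoot`.
// Returns how many Root files were rewritten.
int reRoot(const fs::path& checkout, const std::string& newRoot) {
    int rewritten = 0;
    for (const auto& entry : fs::recursive_directory_iterator(checkout)) {
        if (!entry.is_regular_file()) continue;
        const fs::path& p = entry.path();
        // Only files named Root that live inside a CVS/ directory.
        if (p.filename() == "Root" && p.parent_path().filename() == "CVS") {
            std::ofstream out(p, std::ios::trunc);
            out << newRoot << '\n';
            ++rewritten;
        }
    }
    return rewritten;
}
```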

Of course, to commit anything, you'll need various project privileges, and to battle the bureaucracy.

8.2 Using patch / diff

Patch and diff are wonderful tools; however, people often provide data that confuses them in a messy and difficult-to-untangle sort of way. Here are some hints on untangling the mess:

Before committing a patch to ooo-build, test it with make patch.apply in the top-level directory. NB. it really pays to have 2 copies of the tree: 1 hacked, 1 pristine.

8.3 Make clean

Just use dmake clean in the build/src680 directory. Or, for a more destructive version, try rm -Rf build in ooo-build.

8.4 CVS setup

In order to make efficient use of bandwidth, generate sensible diffs by default, and follow the trend, you need this in your ~/.cvsrc.

cvs -z3 -q
diff -upN 
update -dP
checkout -P
status -v 

8.5. Adding header files to the build

Adding header files to the OO.o build is notoriously clunky. To add header files under external/, make sure you list them in external/prj/d.lst so that they get copied under the solver/680/unxlngi4.pro/inc/external directory when building.

8.6. Finding where to hack

Often there is some GUI element used near the thing you're trying to locate / fix. So, find some sufficiently unusual string and search for it in LXR's text search; this should reveal an identifier related to that string; eg. SID_AUTOFORMAT, or FN_NUM_BULLET_ON. Having obtained that, do a new text search for that string, and you'll find the usage [ or a chained define to something else ]. For eg. menus/toolbar buttons the functionality is usually in a case statement eg. case SID_AUTOFORMAT: ...
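The code you end up at usually has the shape below: a dispatch method switching on the slot ID of the incoming request. This is a hypothetical C++ reduction for orientation only; in OO.o the SID_* / FN_* IDs are #defines in the real headers, and the numeric values here are made up.

```cpp
#include <string>

// Made-up values standing in for the real #defines.
constexpr unsigned short SID_AUTOFORMAT   = 1;
constexpr unsigned short FN_NUM_BULLET_ON = 2;

// Dispatch on the slot ID; each case is where that command's work happens.
std::string executeSlot(unsigned short nSlot) {
    switch (nSlot) {
        case SID_AUTOFORMAT:
            return "run autoformat";     // the code you actually want to hack
        case FN_NUM_BULLET_ON:
            return "toggle bullets";
        default:
            return "not handled here";   // some other shell handles it
    }
}
```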

8.9. Adding an UNO interface

This is slightly more complex build-wise than you might expect.

This should result in your type information being built into types.rdb & installed. This is however only part of the mix: the module 'offuh' builds & installs the .hdl/.hpp files we need (for C++), so if 'wherever' is a new path we need to update offuh/prj/d.lst to install those files too.

Finally, check that the types.rdb in the install set has your types; a regview types.rdb / | grep 'whatever' -i would work well for that. If not, copy it in from the solver.

9. Useful links

9.1. www.OpenOffice.org

While much of the initial openoffice.org structure seems not to be orientated towards hackers, there is much useful documentation if you dig for it.

For OO.o news, and a distinctive perspective on OO.o see ooodocs.org.

Other related pages are: OOExtras provides extra templates, macros, and clip art (curiously licensed under the LGPL). Quickstart applet for GNOME (and KDE). Dictionaries & Docs from Kevin Hendricks.

And an interesting portal.

9.2. Patch archives

While productising various releases of OpenOffice, different projects have come up with (quite huge) patch sets against OO.o. These have mostly been folded back into 2.0, but there are still a few outstanding. The separate packaging efforts can be found here:

10. (Infrequently) FAQ

So no-one ever asked me these, I just made them up to astro-turf a bit (safer, wipe-clean, more durable questions).

10.1. Why do branches like 'mws_srx645' have odd numbers in them ?

By consulting various oracles, entrails etc. it transpires that in theory this number once incremented weekly, there being weekly freezes and hence solvers, development environments. The 'mws' stands for 'Master Workspace'. The latest 2.0 development is done with the SRC680 stem; with auto-incrementing milestones; hence tags like SRC680_m66 would be common.

10.2. Why does the build require Java ?

Essentially it seems there are a lot of XML files involved in component registration, and various other services. Also, the person who designed the XML files fell in love with trendy XML-things and used not-very-standard, very complicated bits gratuitously. It turns out that using Java is the best/only way to get this manipulation done. Also, Java can be used nicely at run-time if it's on the machine.

But from tag SRC680_m44 onwards, an alternative python script is included to process these registration XML files, so it should be possible to build later versions without java, though your mileage may vary, as the default build uses java.

10.3. Why is [t]csh so broken ?

This is rather inscrutable; some particularly curious brokenness would be the way piping commands on stdin is crucially different to inputting them from the tty thus: echo 'echo #define DLL_NAME "libsch641li.so" >./foo.hxx' | /bin/tcsh -s fails to do anything whereas typing the same thing into the shell works just fine. Even more oddly: tcsh -fc 'echo #define DLL_NAME "libsch641li.so" >./foo.hxx' does do the right thing. See also csh.

10.4. I just tried to re-locate the build, why doesn't it work ?

The simple answer is: you need to run relocate /path/to/new/build; another more complex answer is:

Well, assuming you have re-configured things (LinuxIntelEnv.Set will need its paths tweaking too, and re-importing into your shell), then it's most likely down to the ubiquitous non-relative paths coded in lots of generated / built files, particularly '.dpc*' (dependency) files. Try: find -name '*.dpc*' -exec rm {} \;

The stlport does some really broken things, so you will also need to edit the 'stl_gcc.h' inside the solver/, and replace the two path instances there (see inc/stl/config/stl_gcc.h).

10.5 CVS says 'Fatal Error aborting. [acc] no such user', why ?

While of course it's possible that your user name is not registered; often this just means your ~/.cvspass got lost and/or that you haven't logged in. cvs login, and repeat the command.

10.6 What does '.pro' in 'unxlngi4.pro' mean ?

Product. Isn't it obvious?

10.7 What does OpenOffice really look like ?

Today I found a photograph of it on my system, so I stuck it in here:

Abstraction Layers

10.8 How do I take a screenshot of OO.o ?

OOo does some very odd things with X resources, so some conventional screenshot apps fail to take accurate shots. ImageMagick's 'import', however, does a good job; use import foo.png from the console, or sleep 2; import -window root foo.png instead. NB. unless you want your world to look tiny, you need to turn on large toolbar icons first.

10.9 Why does the code look so ugly?

The authors must be using a really strange editor. It thinks tab stops are on every fourth column. Of course, the files come out ugly in Unix editors which know that tabs are eight characters wide.

If you happen to use a Real Editor, we have some pink glasses to sell to you. Paste the contents of http://go-oo.org/emacs.el into your .emacs, or load it with a line like this: (load "/path/to/that/file.el"). Don't forget to adapt my-openoffice-path-regexp to your needs.

Henceforth emacs will use 4-column tabs for your OOo source files (and use C++-Mode for sdi-, hrc-, and src-files). Alternatively, if you are sufficiently set in your ways that you can't cope with investing these few seconds, do: M-x set-variable RET tab-width RET 4 RET & learn to love change.

Apparently if you use vi you can do: :set ts=4, and good luck to you.

11. Working with us

See the About ooo-build document.


If you have more hacking tips, corrections, a grip of correct spelling etc. please do mail me, at michael.meeks@novell.com.