
2. Using distcc

2.1 Invoking distcc

distcc is prefixed to compiler command lines and acts as a wrapper to invoke the compiler either on the local client machine, or on a remote volunteer host.

For example, to compile the source file hello.c into the object file hello.o:

distcc gcc -o hello.o -c hello.c

Standard Makefiles, including those using the GNU autoconf/automake system, use the $CC variable as the name of the compiler to run. In most cases it is sufficient to override this variable, either on the command line or, if you wish to use distcc for all compilation, in your login script. For example:

make CC='distcc'

2.2 Options

Options to distcc must precede the compiler name. Any arguments or options following the name of the compiler are passed through to the compiler.

--help

Print a detailed usage message and exit.

--version

Show distcc version and exit.

2.3 Environment Variables

The way in which distcc runs the compiler is controlled by a few environment variables.

NOTE:

Some versions of make do not export Make variables as environment variables by default. Also, assignments to variables within the Makefile may override their definitions in the environment that calls make. The most reliable method seems to be to set DISTCC_* variables in the environment of Make, and to set CC on the right-hand-side of the Make command line. For example:

$ DISTCC_HOSTS='localhost wistful toey'
$ export DISTCC_HOSTS
$ CC='distcc' ./configure
$ make CC='distcc' all
          

Some Makefiles may, contrary to convention, explicitly call gcc or some other compiler, in which case overriding $CC will not be enough to call distcc. This should be harmless, however: those jobs will just run locally. The best solution is to update the Makefile to compile and link using $(CC) to promote future maintainability.

DISTCC_HOSTS

Space-separated list of volunteer host specifications.

DISTCC_VERBOSE

If set to 1, distcc produces explanatory messages on the standard error stream. This can be helpful in debugging problems. Bug reports should include verbose output.

DISTCC_LOG

Log file to receive messages from distcc itself, rather than stderr.

DISTCC_SAVE_TEMPS

If set to 1, temporary files are not deleted after use. Good for debugging, or if your disks are too empty.

DISTCC_TCP_CORK

If set to 0, disable use of "TCP corks", even if they're present on this system. Using corks normally helps pack requests into fewer packets and aids performance.
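As a sketch of the NOTE above, the following Python snippet builds an environment that carries the DISTCC_* variables, together with a make command line that sets CC on its right-hand side. It is an illustration only, not part of distcc: the function name build_make_invocation is hypothetical, and the snippet constructs the command without running it.

```python
import os
from typing import Dict, List, Optional, Tuple


def build_make_invocation(
    hosts: List[str],
    target: str = "all",
    verbose: bool = False,
    log_file: Optional[str] = None,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for running make with distcc (hypothetical helper).

    DISTCC_* settings go into the environment of make, while CC is passed
    on the right-hand side of the make command line so that assignments
    inside the Makefile cannot override it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DISTCC_HOSTS"] = " ".join(hosts)  # space-separated host list
    if verbose:
        env["DISTCC_VERBOSE"] = "1"  # explanatory messages on stderr
    if log_file is not None:
        env["DISTCC_LOG"] = log_file  # distcc's own messages go here instead
    argv = ["make", "CC=distcc", target]
    return argv, env
```

For example, build_make_invocation(["localhost", "wistful", "toey"]) returns the argv ["make", "CC=distcc", "all"] and an environment with DISTCC_HOSTS set to "localhost wistful toey"; passing both to subprocess.run(argv, env=env) would start the build.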

2.4 Which Jobs are Distributed?

Building a C or C++ program on Unix involves several phases: preprocessing, compilation, assembly, and linking.

distcc only ever runs the compiler and assembler remotely. The preprocessor must always run locally because it needs to access various header files on the local machine which may not be present, or may not be the same, on the volunteer. The linker similarly needs to examine libraries and object files, and so must run locally.

The compiler and assembler take only a single input file, the preprocessed source, and produce a single output, the object file. distcc ships these two files across the network and can therefore run the compiler and assembler remotely.

Fortunately, for most programs, running the preprocessor is relatively cheap and the linker is called relatively infrequently, so most of the work can be distributed.

distcc examines its command line to determine which of these phases are being invoked, and whether the job can be distributed. Here is an example of a typical command that can be preprocessed locally and compiled remotely:

distcc gcc -o hello.o -DGREETING="hello" -c hello.c

The command-line scanner is intended to behave in the same way as gcc. In case of doubt, distcc runs the job locally.

In particular, this means that commands that compile and link in one go cannot be distributed. These are quite rare in realistic projects. Here is one example of a command that cannot be distributed, because it calls both the compiler and the linker:

distcc gcc -o hello hello.c
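The rules above can be illustrated with a deliberately simplified Python sketch of a command-line check. This is not distcc's actual scanner: the function is_distributable, the list of source extensions, and the set of "local only" options are assumptions chosen for illustration, and many gcc options that take a separate argument are not handled.

```python
from typing import List

# Extensions treated as C/C++ sources in this sketch (an assumption;
# distcc's real scanner has its own rules).
SOURCE_EXTENSIONS = (".c", ".cc", ".cpp", ".cxx", ".C", ".i", ".ii")

# Preprocess-only or dependency-only options: keep such jobs local here.
LOCAL_ONLY_OPTIONS = {"-E", "-M", "-MM"}


def is_distributable(argv: List[str]) -> bool:
    """Guess whether a compiler command can be preprocessed locally and
    compiled remotely. argv is the command after the word "distcc",
    e.g. ["gcc", "-o", "hello.o", "-c", "hello.c"]."""
    args = argv[1:]  # skip the compiler name
    if "-c" not in args:
        return False  # without -c the command would also link
    if any(a in LOCAL_ONLY_OPTIONS for a in args):
        return False
    sources = []
    skip_next = False
    for a in args:
        if skip_next:
            skip_next = False
            continue
        if a == "-o":
            skip_next = True  # next word is the output file, not an input
            continue
        if not a.startswith("-") and a.endswith(SOURCE_EXTENSIONS):
            sources.append(a)
    # Exactly one input file; in case of doubt, run locally.
    return len(sources) == 1
```

With this sketch, is_distributable(["gcc", "-o", "hello.o", "-DGREETING=hello", "-c", "hello.c"]) is True, while is_distributable(["gcc", "-o", "hello", "hello.c"]) is False because the command also links.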

2.5 Running Jobs in Parallel

Moving source across the network is less efficient than compiling it locally. If you have access to a machine much faster than your workstation, the performance gain may outweigh the cost of transferring the source code, and it may be quicker to ship all your source across the network to compile it there.

In general, it is even better to compile on two or more machines in parallel. Any number of invocations of distcc can run at the same time, and they will distribute their work across the available hosts.

distcc does not manage parallelization, but relies on Make or some other build system to invoke compiles in parallel.

With GNU Make, you should use the -j option to specify a number of parallel tasks slightly higher than the number of available hosts. For example:

$ export DISTCC_HOSTS='angry toey wistful localhost'
$ make -j5
            

2.6 Choosing a Host

The $DISTCC_HOSTS variable tells distcc which volunteer machines are available to run jobs. This is a space-separated list of host specifications, each of which has the syntax:

HOSTNAME[:PORT]       

A numeric TCP port may optionally be specified after a colon. If no port is specified, it uses the default, which is currently 4200.

If only one invocation of distcc runs at a time, it will always execute on the first host in the list. (This behaviour is not absolutely guaranteed, however, and may change in future versions.)

The name localhost is handled specially: distcc runs the compiler directly on the client, without contacting a daemon.

The daemon may be tested on localhost by setting:

DISTCC_HOSTS=127.0.0.1

Although localhost causes distcc to execute the job directly, using an IP address will cause it to make a TCP connection to a daemon on localhost. This is slower, but useful for testing.
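A minimal Python sketch of parsing the HOSTNAME[:PORT] syntax, assuming the default port given above and the special treatment of the name localhost. The names HostSpec and parse_hosts are hypothetical. Unlike distcc, which falls back to building locally when the list is unusable, this sketch raises ValueError on a malformed entry.

```python
from typing import List, NamedTuple, Optional

DEFAULT_PORT = 4200  # default port as stated in this manual


class HostSpec(NamedTuple):
    hostname: str
    port: Optional[int]  # None means "run the compiler in place"

    @property
    def is_local(self) -> bool:
        return self.port is None


def parse_hosts(value: str) -> List[HostSpec]:
    """Parse a DISTCC_HOSTS value such as 'localhost wistful:4201 toey'."""
    specs = []
    for word in value.split():  # space-separated list
        name, sep, port_text = word.partition(":")
        if not name or (sep and not port_text.isdigit()):
            raise ValueError(f"bad host specification: {word!r}")
        if name == "localhost":
            # Special name: compile directly, no TCP connection.
            specs.append(HostSpec(name, None))
        else:
            # Any other name, including 127.0.0.1, means "connect to a daemon".
            port = int(port_text) if sep else DEFAULT_PORT
            specs.append(HostSpec(name, port))
    return specs
```

Note that parse_hosts("127.0.0.1") yields a remote spec on port 4200, matching the testing trick above, while parse_hosts("localhost") yields a local one.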

2.7 Load Distribution Algorithm

When distcc is invoked, it needs to decide which of the volunteers in DISTCC_HOSTS should be used to compile a job. It uses a simple heuristic to try to spread load across machines appropriately.

You can imagine all of the compile machines as being leaky buckets, some with larger holes (faster CPUs) than others. The distcc client tries to keep water at the same level on each one (the same number of jobs running), preferring hosts occurring earlier in DISTCC_HOSTS. Over the course of a build, the faster machines will complete jobs more quickly, and therefore be topped up more quickly and do more work overall, but without the client ever actually needing to know which one is fastest.

This design has the advantage of not requiring the client to know in advance the speeds of the volunteers, and being quite simple to implement. It copes quite well with machines that are temporarily slowed down: they are just topped-up more slowly in the future.
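The leaky-bucket heuristic can be modelled in a few lines of Python. This is an illustration, not distcc's code: choose_host and simulate are hypothetical names, the in-memory running counts stand in for the lockfile coordination distcc actually uses, and each host is assumed to take a fixed time per job regardless of its load.

```python
import heapq
from typing import Dict, List


def choose_host(hosts: List[str], running: Dict[str, int]) -> str:
    """Pick the host with the fewest jobs in flight.

    min() returns the first of several equal minima, so hosts listed
    earlier in DISTCC_HOSTS win ties.
    """
    return min(hosts, key=lambda h: running[h])


def simulate(hosts: List[str], durations: Dict[str, int],
             jobs: int, parallel: int) -> Dict[str, int]:
    """Toy model of a build run with make -j<parallel> (parallel >= 1).

    Returns the number of jobs each host completed.
    """
    running = {h: 0 for h in hosts}
    done = {h: 0 for h in hosts}
    in_flight = []  # heap of (finish_time, sequence_number, host)
    now = seq = started = 0
    while started < jobs or in_flight:
        # Top up: keep `parallel` jobs running, as make -jN would.
        while started < jobs and len(in_flight) < parallel:
            host = choose_host(hosts, running)
            running[host] += 1
            heapq.heappush(in_flight, (now + durations[host], seq, host))
            seq += 1
            started += 1
        # Jump to the next completion; that host's bucket drains by one.
        now, _, host = heapq.heappop(in_flight)
        running[host] -= 1
        done[host] += 1
    return done
```

With a fast host taking 1 time unit per job and a slow one taking 3, simulate(["fast", "slow"], {"fast": 1, "slow": 3}, jobs=12, parallel=2) returns {"fast": 9, "slow": 3}: the faster bucket drains sooner and is topped up more often, even though choose_host never looks at speed.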

Scheduling is coordinated between different invocations of the distcc client by lockfiles in the temporary directory. There is no coordination between clients running as different users, on different hosts, or with different TMPDIR paths.

On Linux, scheduling slightly too many jobs on any machine is quite harmless, as long as the number is not so high that the machine begins thrashing. So it's OK to provide a -j number substantially higher than the number of available processors.

The biggest problem with this design is that it handles multiprocessor machines poorly: they probably ought to have jobs scheduled proportional to the number of processors. At the moment, the best thing is to run with a -j factor equal to the product of the maximum number of CPUs in any machine (MAX_CPUS) and the number of machines. This should make sure that roughly MAX_CPUS tasks run on every machine at all times, and will therefore keep all CPUs loaded, but will cause excessive task-switching on machines with fewer CPUs. Task switching is not very expensive on Linux so it is not a big problem, but it does lose a few percentage points of speed. This should be fixed in a future release.
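The -j rule above is simple arithmetic; the following Python sketch spells it out. The function name suggested_jobs and the CPU counts in the example are made up for illustration.

```python
from typing import Dict


def suggested_jobs(cpus_per_host: Dict[str, int]) -> int:
    """-j value per the rule above: MAX_CPUS x number of machines."""
    if not cpus_per_host:
        raise ValueError("need at least one host")
    max_cpus = max(cpus_per_host.values())
    return max_cpus * len(cpus_per_host)
```

For example, with hypothetical counts {"angry": 2, "toey": 1, "wistful": 1, "localhost": 4}, MAX_CPUS is 4 and there are 4 machines, so the suggestion is make -j16. When every machine has a single CPU the rule reduces to the number of hosts, which section 2.5 suggests rounding up slightly.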

2.8 Diagnostic Messages

Error messages or warnings from local or remote compilers are passed through to diagnostic output on the client. The compiler takes all file names and line numbers from the line markers in the preprocessed output, so error messages will always have the correct pathnames for files on the client.
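GCC's preprocessed output records the original location of each line with linemarkers of the form # LINENUM "FILENAME" FLAGS. The Python sketch below (the function source_location is hypothetical) shows how a line in preprocessed text maps back to a file and line on the client, which is why a remote compiler can report client pathnames. Escaped characters in file names are returned as-is, and asking for a marker line itself returns None.

```python
import re
from typing import Optional, Tuple

# GCC linemarker: '# LINENUM "FILENAME"' optionally followed by flags.
LINEMARKER = re.compile(r'^# (\d+) "((?:[^"\\]|\\.)*)"')


def source_location(preprocessed: str, out_line: int) -> Optional[Tuple[str, int]]:
    """Map a 1-based line of preprocessed output to (file, line)."""
    current_file, current_line = None, 0
    for n, text in enumerate(preprocessed.splitlines(), start=1):
        m = LINEMARKER.match(text)
        if m:
            # The line after the marker has line number LINENUM.
            current_file, current_line = m.group(2), int(m.group(1))
            continue
        if n == out_line:
            return (current_file, current_line) if current_file else None
        current_line += 1
    return None
```

For the text '# 1 "hello.c"\nint x;\n# 5 "hello.c"\nint y = ;\n', line 4 of the preprocessed output maps back to ("hello.c", 5).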

distcc prints a message when it runs a command locally or remotely. For more information, set $DISTCC_VERBOSE to 1 and look at the server's log file.

By default, distcc prints diagnostic messages to stderr. Sometimes these are too intrusive into the output of the regular compiler, and so they may be selectively redirected by setting the $DISTCC_LOG environment variable to a filename.

The current version of the distcc daemon writes diagnostic messages only to files on its own machine. (By default, it uses the syslog daemon channel.) If compilation is failing, please examine the log file on the relevant volunteer machine.

2.9 distcc Exit Codes

The exit code of distcc is normally that of the compiler: zero for successful compilation and non-zero otherwise.

If distcc fails to distribute a job to a selected volunteer machine, it will try to run the compiler locally on the client. distcc only tries a single remote machine for each job.

distcc tries to distinguish between a failure to distribute the job, and a "genuine" failure of the compiler on the remote machine, for example because of a syntax error in the program. In the second case, distcc does not re-run the compiler locally, and returns the same exit code as the remote compiler.
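The fallback policy described above can be sketched in Python as follows. The names compile_with_fallback, try_remote and run_local are hypothetical; try_remote is assumed to return the remote compiler's exit code, or None when the job could not be distributed at all.

```python
from typing import Callable, Optional


def compile_with_fallback(
    try_remote: Callable[[], Optional[int]],
    run_local: Callable[[], int],
) -> int:
    """Return the exit code distcc would report under the policy above."""
    status = try_remote()  # only a single remote machine is tried
    if status is None:
        # Distribution failed (e.g. no daemon listening): compile locally.
        return run_local()
    # The remote compiler really ran; pass its result through, even a
    # failure such as a syntax error, without re-running locally.
    return status
```

The design keeps a genuine compile error from being reported twice: only a failure to distribute triggers the local retry.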

If distcc fails to run the compiler, it may return one of the following error codes. These are also used by distccd.

100 EXIT_DISTCC_FAILED

Generic or unspecified failure in distcc.

102 EXIT_BIND_FAILED

Failed to bind and listen on network socket. Port may already be in use.

103 EXIT_CONNECT_FAILED

Failed to establish network connection or listen on socket. The host may be invalid or unreachable, or there may be no daemon listening.

104 EXIT_COMPILER_CRASHED

The underlying compiler exited because of a signal. This probably indicates a compiler bug, or a problem with the hardware or OS on the server.

105 EXIT_OUT_OF_MEMORY

The process ran out of memory.

106 EXIT_BAD_HOSTSPEC

$DISTCC_HOSTS was undefined, empty, or syntactically invalid. (At the moment, you should never see this code because distcc will fall back to building locally. Let me know if you would prefer a hard error.)
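A build script that runs distcc might want to label these codes. The Python sketch below mirrors the table above; DistccExit and describe_exit are hypothetical names. A compiler could in principle also exit with one of these values, and this sketch cannot tell the two cases apart.

```python
from enum import IntEnum


class DistccExit(IntEnum):
    """Exit codes as listed in this manual."""
    DISTCC_FAILED = 100
    BIND_FAILED = 102
    CONNECT_FAILED = 103
    COMPILER_CRASHED = 104
    OUT_OF_MEMORY = 105
    BAD_HOSTSPEC = 106


def describe_exit(code: int) -> str:
    """Return a short label for an exit status from distcc or distccd."""
    if code == 0:
        return "success"
    try:
        return "EXIT_" + DistccExit(code).name
    except ValueError:
        # Not in the table: assume it is the compiler's own exit status.
        return f"compiler exit status {code}"
```

For example, describe_exit(103) returns "EXIT_CONNECT_FAILED", and describe_exit(1) returns "compiler exit status 1".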

2.10 Cross-Compilation

Cross compilation means building programs to run on a machine with a different processor, architecture, or operating system from the one on which they were compiled. distcc supports cross compilation, including teams of mixed-architecture machines, although some changes to the compilation commands may be required.

The compilation command passed to distcc must be one that will execute properly on every volunteer machine to produce an object file of the appropriate type. If the machines have different processors, then simply using distcc cc will probably not work, because that will normally invoke the volunteer's native compiler.

Machines with the same instruction set but different operating systems may not necessarily generate compatible .o files. Empirically it seems that the native FreeBSD compiler generates object files compatible with Linux for C programs, but not for C++. It may be a good idea to install a Linux cross compiler on BSD volunteers.

Different versions of the compiler may generate incompatible object files. This seems to be much more of a problem with C++ than with C, because the C++ ABI (application binary interface) has changed in recent years. If you will be building C++ programs, it may be a good idea to install the same version of g++ on all machines.

gcc has two options to select at run time the target platform (-b) and the gcc version (-V) to be used. Several different gcc configurations can be installed side-by-side on any machine, and these options are used by the top-level "driver" program to switch between them. For more information, see Specifying Target Machine and Compiler Version in the gcc manual.

For example, adding -b i386-linux to $CFLAGS ought to make sure the correct compiler is invoked to build Linux/x86 programs. This has no particular effect if all the volunteers are natively of that type, but is very useful if some of the volunteer machines are different: either the correct compiler will be used, or, if it is not installed, you will see an error message like this:

gcc: installation problem, cannot exec `cpp0': No such file or directory
gcc: file path prefix `/usr/lib/gcc-lib/i386-freebsd/2.95.4/' never used

The parts of gcc particular to target machines and versions are normally kept in the directory /usr/local/lib/gcc-lib/MACHINE/VERSION.

Alternatively, you might specify as the compiler command the name of a script or symbolic link that calls the appropriate version of gcc on each machine. For example:

CC='distcc gcc-i386-linux'

In general, using the -b option is probably better, because it does not require any special creation of scripts on the volunteer machines beyond installing the appropriate gcc configuration. However, using a special compiler name may be useful if you need to make sure that a particular version of gcc's driver program is used, perhaps because you are testing gcc. This approach might also be useful with compilers other than gcc that have no built-in mechanism for choosing a target.

Suggestions for other ways to support cross-compilation or automatically detecting incompatibilities are welcome.

2.11 distcc Compatibility

distcc with ccache

distcc works well with the ccache tool for caching compilation results. To use the two of them together, simply set

CC='ccache distcc'

distcc with autoconf

distcc works quite well with autoconf.

DISTCC_VERBOSE can give autoconf trouble because autoconf tries to parse error messages from the compiler. If you redirect distcc's diagnostics using DISTCC_LOG then it seems to be fine.

Some autoconf-based systems "freeze" the compiler name used for configure into their Makefiles. To make them use distcc, you must set $CC when running ./configure, override $CC on the right-hand side of the Make command line, or both.

Some poorly-written shell scripts may assume that $CC is a single word. At the moment the best fix is to write a small shell script that calls distcc, and set $CC to the name of that script.

distcc with libtool

Some versions of libtool seem not to cope well when CC is set to more than one word, such as "distcc gcc". Setting CC=distcc, which is supported in 0.10 and later, seems to work well.

distcc with MOC

MOC is the Qt meta-object compiler.

2.12 File Metadata

distcc transfers only the binary contents of source, error, and object files, without any concern for metadata, attributes, character sets or end-of-line conventions.

distcc never transmits file times across the network or modifies them, and so should not care whether the clocks on the client and volunteer machines are synchronized or not. When an object file is received onto the client, its modification time will be the current time on the client machine.


distcc User Manual