Feel free to skip this page if you already have a good grasp of what the compilation commands do.
One thing I find distressingly few people are taught in their programming classes is how to go about compiling programs once they've written them. Novices rely either on a single memorized command or on the built-in rules in make. I have been surprised by extremely computer-literate people who never learned to compile with optimization simply because no one told them how important it is. Rudimentary knowledge of how compilation commands work may make your programs run twice as fast or more, so it's worth at least five minutes. This page describes just about everything you'll need to know to compile C or C++ programs on just about any variant of unix.
The examples will be mostly for C, since C++ compilation is
identical except that the name of the compiler is different.
Suppose you're compiling source code in a file called xyz.c and you want to build a program called xyz. What must happen?
You may know that you can build your program in one step, using a command like this:
cc -g xyz.c -o xyz
This will work, but it conceals a two-step process that you must understand if you are writing makefiles. (Actually, there are more than two steps, but you only have to understand two of them.) For a program of more than one module, the two steps are usually explicitly separated.
The first step is the translation of your C or C++ source code into a binary file called an object file. Object files usually have an extension of .o. (For some more recent projects, .lo is also used for a slightly different kind of object file.)
The command to produce an object file on unix looks something like this:
cc -g -c xyz.c -o xyz.o
cc is the C compiler. Sometimes alternate C compilers are used; a very common one is called gcc. A common C++ compiler is the GNU compiler, usually called g++. Virtually all C and C++ compilers on unix have the same syntax for the rest of the command (at least for basic operations), so the only difference would be the first word.
We'll explain what the -g option does later.
The -c option tells the C compiler to produce a .o file as output. (If you don't specify -c, then it performs the second compilation step automatically.)
The -o xyz.o option tells the compiler what the name of the object file is. You can omit this, as long as the name of the object file is the same as the name of the source file except for the .o extension.
For the most part, the order of the options and the file names does not matter. One important exception is that the output file must immediately follow -o.
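As an illustration, here is a hypothetical module (the file name square.c and the function square are invented for this sketch) that defines a helper function but no main(). A command like cc -g -c square.c -o square.o compiles it to an object file without complaint, but that object file cannot be run, and linking it on its own would fail because there is no main() for the program to start in.

```c
/* square.c (hypothetical example module).
   There is no main() here, so this file can be compiled to an object
   file with "cc -g -c square.c -o square.o", but it must be linked
   together with another module that supplies main() before it can run. */
int square(int n)
{
    return n * n;
}
```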
The second step of building a program is called linking. An object file cannot be run directly; it's an intermediate form that must be linked to other components in order to produce a program. Other components might include library functions: for example, if your program calls the printf function, then the definition of the printf function must be included from the system C library. Some libraries are automatically linked into your program (e.g., the one containing printf), so you never need to worry about them.
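For example, the hypothetical module below (file and function names invented for this sketch) calls printf but does not define it. Its object file only records that printf is needed; running nm on that object file typically lists printf with a U (undefined) marker. The linker fills in the definition from the system C library when the program is linked.

```c
#include <stdio.h>

/* greet.c (hypothetical): calls printf but does not define it.
   The object file records printf as an unresolved reference, which
   the linker later satisfies from the system C library. */
int print_greeting(const char *name)
{
    /* printf returns the number of characters it wrote. */
    return printf("Hello, %s!\n", name);
}
```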
The linker is the program responsible for taking a collection of object files and libraries and linking them together to produce an executable file. The executable file is the program you actually run.
The command to link the program looks something like this:
cc -g xyz.o -o xyz
It may seem odd, but we usually run the same program (cc) to perform the linking. What happens under the surface is that the cc program immediately passes off control to a different program (the linker, sometimes called the loader, or ld) after adding a number of complex pieces of information to the command line. For example, cc tells ld where to find the system library that defines functions like printf. Until you start writing shared libraries, you usually do not need to deal directly with ld.
If you do not specify -o xyz, then the output file will be called a.out, which seems to me to be a completely useless and confusing convention. So always specify -o on the linking step.
If your program has more than one object file, you should specify all the object files on the link command.
Why not just use the simple, one-step command, like this:
cc -g xyz.c -o xyz
instead of the more complicated two-stage compilation
cc -g -c xyz.c -o xyz.o
cc -g xyz.o -o xyz
if internally the first is converted into the second? The difference is important only if there is more than one module in your program. Suppose we have an additional module, abc.c. Now our compilation looks like this:
# One-stage command.
cc -g xyz.c abc.c -o xyz
or
# Two-stage command.
cc -g -c xyz.c -o xyz.o
cc -g -c abc.c -o abc.o
cc -g xyz.o abc.o -o xyz
The first method, of course, is converted internally into the second method. This means that both xyz.c and abc.c are recompiled each time the command is run. But if you only changed xyz.c, there's no need to recompile abc.c, so the second command of the two-stage version (the one that compiles abc.c) does not need to be run. This can make a huge difference in compilation time, especially if you have many modules. For this reason, virtually all makefiles keep the two compilation steps separate.
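To make the two-module case concrete, here is a sketch of what the hypothetical abc.c might contain (the function name abc_sum is invented for illustration). xyz.c would see only the declaration, normally through a header. As long as that declaration doesn't change, editing the body of abc_sum means recompiling only abc.c and relinking; xyz.o can be reused as is.

```c
/* abc.c (hypothetical second module).
   xyz.c only needs the declaration, normally placed in a header:
       int abc_sum(const int *values, int count);
   If only this function body changes, rebuild abc.o and relink;
   xyz.o does not need to be recompiled. */
int abc_sum(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}
```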
That's pretty much the basics, but there are a few more little details you really should know about.
Usually programmers compile a program either for debugging or for speed. Compilation for speed is called optimization; compiling with optimization can make your code run as much as five times faster, sometimes even more, depending on your code, your processor, and your compiler.
With such dramatic gains possible, why would you ever not want to optimize? The most important answer is that optimization makes use of a debugger much more difficult (sometimes impossible). (If you don't know anything about debuggers, it's time to learn. The half hour or hour you'll spend learning the basics will be repaid many times over in the time you'll save later when debugging. I'd recommend starting with a GUI debugger like kdbg, ddd, or gdb run from within emacs (see the info pages on gdb for instructions on how to do this).)
Optimization reorders and combines statements, removes unnecessary
temporary variables, and generally rearranges your code so that it's
very tough to follow inside a debugger. The usual procedure is to
write your code, compile it without optimization, debug it, and then
turn on optimization.
In order for the debugger to work, the compiler has to cooperate not only by not optimizing, but also by putting information about the names of the symbols into the object file so the debugger knows what things are called. This is what the -g compilation option does.
If you're done debugging and you want to optimize your code, simply replace -g with -O. For many compilers, you can specify increasing levels of optimization by appending a number after -O. You may also be able to specify other options that increase the speed under some circumstances (possibly trading off with increased memory usage). See your compiler's man page for details. For example, here is an optimizing compile command that I use frequently with the gcc compiler:
gcc -O6 -malign-double -c xyz.c -o xyz.o
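You may have to experiment with different optimization options for the absolute best performance, and you may need different options for different pieces of code. Generally speaking, a simple optimization flag like -O works with many compilers and usually produces pretty good results. (With gcc, any level above -O3 behaves the same as -O3, so the -O6 above is equivalent to -O3. Also, -malign-double is an x86-only option that changes how structures containing double values are laid out, so code compiled with it is not binary-compatible with code compiled without it.)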
Warning: on rare occasions, your program doesn't do exactly the same thing when it is compiled with optimization. This may be due to (1) an invalid assumption you made in your code that was harmless without optimization, but causes problems because the compiler takes the liberty of rearranging things when you optimize; or (2) sadly, compilers have bugs too, including bugs in their optimizers. For a stable compiler like gcc on a common platform like a Pentium, optimization bugs are seldom a problem (as of the year 2000; there were problems a few years earlier).
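One common example of case (1) is assuming that signed integer arithmetic wraps around on overflow. In C, signed overflow is undefined behavior, so an optimizing compiler such as gcc may assume that a + 1 < a is always false and remove a check written that way, even though the check appeared to work without optimization. The sketch below (function name hypothetical) tests for overflow before doing the addition, so it behaves the same with or without optimization.

```c
#include <limits.h>

/* Unreliable: signed overflow is undefined behavior, so an optimizing
   compiler is allowed to treat this comparison as always false.
       int next_would_overflow(int a) { return a + 1 < a; }
*/

/* Portable: returns nonzero if a + b would overflow an int.
   The comparison is made against INT_MAX/INT_MIN before adding,
   so no overflow ever actually happens. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* sum would exceed INT_MAX */
    return a < INT_MIN - b;       /* b <= 0: sum would go below INT_MIN */
}
```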
If you don't specify either -g or -O in your compilation command, the resulting object file is suitable neither for debugging nor for running fast. For some reason, this is the default. So always specify either -g or -O.
On some systems, you must supply -g on both the compilation and linking steps; on others (e.g., linux), it needs to be supplied only on the compilation step. On some systems, -O actually does something different in the linking phase, while on others, it has no effect. In any case, it's always harmless to supply -g or -O for both commands.
Most compilers are capable of catching a number of common programming errors (e.g., forgetting to return a value from a function that's supposed to return a value). Usually, you'll want to turn on warnings. How you do this depends on your compiler (see the man page), but with the gcc compiler, I usually use something like this:
gcc -g -Wall -c xyz.c -o xyz.o
(Sometimes I also add -Wno-uninitialized after -Wall to suppress a warning that crops up when optimizing and is usually wrong.)
These warnings have saved me many hours of debugging.
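As an example of the kind of mistake these warnings catch, consider the hypothetical function below. The commented-out first version falls off the end without returning a value when n is negative; gcc -Wall reports this with a warning along the lines of "control reaches end of non-void function". The second version returns a value on every path.

```c
/* Buggy version: for negative n, control reaches the closing brace
   without a return statement, so the caller gets a garbage value.
   gcc -Wall warns about this.

int sign_of_bad(int n)
{
    if (n > 0)
        return 1;
    else if (n == 0)
        return 0;
}
*/

/* Fixed version: every path through the function returns a value. */
int sign_of(int n)
{
    if (n > 0)
        return 1;
    if (n == 0)
        return 0;
    return -1;
}
```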
Often, necessary include files are stored in some directory other than the current directory or the system include directory (/usr/include). This frequently happens when you are using a library that comes with include files to define the functions or classes.
Suppose, for example, you are writing an application that uses the Qt libraries. You've installed a local copy of the Qt library in /home/users/joe/qt, which means that the include files are stored in the directory /home/users/joe/qt/include. In your code, you want to be able to do things like this:
#include <qwidget.h>
instead of
#include "/home/users/joe/qt/include/qwidget.h"
You can tell the compiler to look for include files in a different directory by using the -I compilation option:
g++ -I/home/users/joe/qt/include -g -c mywidget.cpp -o mywidget.o
There is no space between the -I and the directory name.
When the C++ compiler is looking for the file qwidget.h, it will look in /home/users/joe/qt/include before looking in the system include directory. You can specify as many -I options as you want.
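For illustration, here is a hypothetical header, mylib.h, assumed to be installed in /home/users/joe/mylib/include (the directory, file, and function names are all invented for this sketch). A source file could then say #include <mylib.h> and be compiled with -I/home/users/joe/mylib/include. The include guard keeps the contents from being processed twice if the header ends up included more than once.

```c
/* mylib.h (hypothetical), installed in /home/users/joe/mylib/include */
#ifndef MYLIB_H
#define MYLIB_H

/* A small helper defined in the header itself, so no library needs to
   be linked for it. "static inline" gives every file that includes the
   header its own copy, avoiding duplicate-symbol errors at link time. */
static inline int mylib_max(int a, int b)
{
    return a > b ? a : b;
}

#endif /* MYLIB_H */
```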
You will often have to tell the linker to link with specific external libraries, if you are calling any functions that aren't part of the standard C library. The -l (lowercase L) option says to link with a specific library:
cc -g xyz.o -o xyz -lm
-lm says to link with the system math library, which you will need if you are using functions like sqrt.
Beware: if you specify more than one -l option, the order can make a difference on some systems. If you are getting undefined symbols when you know you have included the library that defines them, you might try moving that library to the end of the command line, or even including it a second time at the end of the command line.
Sometimes the libraries you will need are not stored in the default place for system libraries. -labc searches for a file called libabc.a or libabc.so or libabc.sa in the system library directories (/usr/lib and usually a few other places too, depending on what kind of unix you're running). The -L option specifies an additional directory to search for libraries. To take the above example again, suppose you've installed the Qt libraries in /home/users/joe/qt, which means that the library files are in /home/users/joe/qt/lib. Your link step for your program might look something like this:
g++ -g test_mywidget.o mywidget.o -o test_mywidget -L/home/users/joe/qt/lib -lqt
(On some systems, if you link in Qt you will need to add other libraries as well, e.g., -L/usr/X11R6/lib -lX11 -lXext. What you need to do will depend on your system.)
Note that there is no space between -L and the directory name. The -L option usually goes before any -l options it's supposed to affect.
How do you know which libraries you need? In general, this is a hard question, and the answer varies depending on what kind of unix you are running. The documentation for the functions or classes you are using should say what libraries you need to link with. If you are using functions or classes from an external package, there is usually a library you need to link with; if you need to add a -labc option, the library will usually be a file called libabc.a, libabc.so, or libabc.sa.
You may have noticed that it is possible to specify options which normally apply to compilation on the linking step, and options which normally apply to linking on the compilation step. For example, the following commands are valid:
cc -g -I/somewhere/include xyz.o -o xyz
cc -g -L/usr/X11R6/lib -c xyz.c -o xyz.o
The irrelevant options are ignored; the above commands are exactly equivalent to this:
cc -g xyz.o -o xyz
cc -g -c xyz.c -o xyz.o