BuildLectern - building or editing Lectern files
BuildLectern options... dest
This program constructs in the file named dest a Lectern document composed of page images, attributes and other data, suitable for viewing with Lectern(1), and suitable for indexing with BuildLecternIndex(1). The page images come from files each containing a single image, or from a file containing a PostScript job, or from pre-existing Lectern documents. The resulting documents include computed copies of the images at reduced scales, and derived OCR data.
Note: the arguments to BuildLectern are processed sequentially (as described below): an argument such as -PScolor affects only input files later in the argument list.
Caution: it is surprisingly easy to forget the dest argument, and this can cause BuildLectern to think that the name of your last input file is the intended destination, and so to write the output there.
The program constructs the document in a temporary file in the same directory as the final destination; on success the temporary file is renamed as the destination, and on failure the temporary file is deleted. So it is acceptable to use this program to modify a file in-place by naming the same destination file in an -images, -include or -rescale option.
Uncolored Lectern documents generally occupy about 100 KBytes per page, if you use the -noUnscaled option. The unscaled images (not normally included for PostScript jobs) add about 100 KBytes per page, and color roughly doubles the size. If you use the -only 3 option, the size is reduced to about 35 KBytes per page.
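As a rough planning aid, the per-page figures above can be turned into a back-of-the-envelope estimate. The sketch below is not part of BuildLectern; the function name estimate_kbytes is hypothetical, and it assumes that color doubles the whole per-page total and that the 35 KByte figure for -only 3 applies to uncolored pages without unscaled images.

```python
def estimate_kbytes(pages, color=False, include_unscaled=False, only3=False):
    """Estimate document size in KBytes from the approximate per-page figures."""
    # About 100 KBytes per uncolored page with -noUnscaled, or about 35 with -only 3.
    per_page = 35 if only3 else 100
    # The unscaled images add about 100 KBytes per page.
    if include_unscaled:
        per_page += 100
    # Assumption: "color roughly doubles the size" applies to the whole page total.
    if color:
        per_page *= 2
    return pages * per_page


if __name__ == "__main__":
    # A 200-page uncolored document built with -noUnscaled.
    print(estimate_kbytes(200), "KBytes")
```

These numbers are only as good as the approximations they come from; real sizes depend on the page contents.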
This program constructs various temporary files in the same directory as the final destination file. These include a temporary copy of the output, and a temporary copy of one image if the image is color or in TIFF format or comes from standard input or PostScript. The temporary output file is the same size as the final output, and the temporary image copy can be up to 32 MBytes (for 300 DPI 24-bit color plus the corresponding black and white). You need to have this much disk space available!
For the purposes of describing the behavior of this program, the images within a Lectern document should be viewed as being numbered sequentially from 1, regardless of the user's notion of how pages are numbered. The -page1 option allows you to specify the user's notion of page numbering, but does not affect the image numbers considered by this program. (Programmers should note that the internal format of a Lectern document uses yet another numbering system for pages, starting from 0; but that need not concern users of this program.)
The program constructs the document by processing its arguments sequentially. While doing so, it maintains a current image number (initially 1), a mode (initially simplex), a current gamma value (initially 1.0), a current resolution (initially 300 DPI), a flag indicating whether to include the unscaled images in the output file (initially set), and an OCR flag (initially clear unless BuildLectern was linked with an OCR library, in which case it is real). The destination document initially has no images, no attributes and no original; its contents is 0 (meaning undefined), its page1 is 1 (meaning the first image), and its index is 0 (meaning undefined).
If an argument is not one of the options described below, it should be a file's pathname. The named file should contain either the image of a single page at the current resolution, in PBM, PGM, PPM, or TIFF format, a Lectern file, or a PostScript job. It is also possible to read a PostScript job from standard input, by specifying - as the file name on the command line.
When the program encounters an image file, it copies the image into the document at the current image number, and updates the current image number: if the mode is simplex it adds 1, if the mode is recto it adds 2 and if the mode is verso it subtracts 2 (subject to a minimum of 1). While copying the image, the program creates copies of the image at reduced scales (using the current gamma value), and applies an OCR algorithm to the image if the OCR flag is real. The reduced images are scaled down by the integer nearest to (current resolution * n / 300), for n equal to 2, 3, and 4 (but see -only). The original image is copied to the output only if the include unscaled flag is set. Note the -stdin option, which provides an alternative way to process a sequence of images without keeping them all in files.
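The two rules in this paragraph (how the reduction factors follow from the resolution, and how the current image number moves in each mode) can be sketched as follows. This is an illustration only, not BuildLectern's code: the names scale_factors, advance and place_duplex are hypothetical, and rounding exact halves upward is an assumption, since the text says only "nearest". The simulation also applies the rule, described under -verso, that setting verso mode immediately subtracts 1 from the current image number.

```python
import math

# Change applied to the current image number after each image, per mode.
STEP = {"simplex": 1, "recto": 2, "verso": -2}


def scale_factors(resolution_dpi, ns=(2, 3, 4)):
    """Integer reduction factors: nearest integer to resolution * n / 300."""
    # Assumption: exact halves round upward.
    return [math.floor(resolution_dpi * n / 300 + 0.5) for n in ns]


def advance(image_number, mode):
    """Update the current image number, subject to a minimum of 1."""
    return max(1, image_number + STEP[mode])


def place_duplex(fronts, backs):
    """Simulate '-recto fronts... -verso backs...'; return {image number: file}."""
    placed = {}
    current = 1
    for name in fronts:
        placed[current] = name
        current = advance(current, "recto")
    current -= 1  # -verso subtracts 1 as soon as the mode is set
    for name in backs:
        placed[current] = name
        current = advance(current, "verso")
    return placed


if __name__ == "__main__":
    print(scale_factors(300))  # default resolution
    print(scale_factors(600))
    # Backs arrive in descending page order, as from a flipped stack of sheets.
    print(sorted(place_duplex(["p1", "p3", "p5"], ["p6", "p4", "p2"]).items()))
```

Note that the verso files must be supplied in descending page order for the pages to interleave correctly.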
When the program encounters a Lectern file, its contents are included as if -include file 1 999999 had been specified.
When the program encounters a PostScript job, the Ghostscript interpreter is run to produce a sequence of images and OCR data (if the OCR flag is not cleared), which are copied into the document in the normal way. While processing a PostScript job, the program consults these additional variables: an image type (initially black&white), an orientation (initially portrait), a scale (initially 1.0), and a flag indicating whether to include the PostScript in the final document (initially set). Also, different defaults are used for several normal variables: gamma is 0.454, the include unscaled flag is cleared, and the OCR flag is fake. The image type enforces an upper bound on the kind of images that will be produced by the PostScript interpreter, with black&white < grayscale < color.
Whenever the program is placing a color image into dest, it reduces the set of colors in the image to a small set, chosen from a 4 by 4 by 4 color cube (i.e., 64 colors), so that the image will be suitable for color mapped monitors. Similarly, it reduces the set of grays in a grayscale image to 16 gray levels. Note that this means that any subsequent rescaling or other image processing will be based on an image from which information has been lost, so the image quality might then suffer. There is no loss of information for black-and-white images.
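To make the information loss concrete, here is a minimal sketch of reducing samples to a 4 by 4 by 4 color cube and to 16 gray levels. It assumes plain nearest-level quantization of 8-bit samples; the text does not say how BuildLectern chooses colors (it may, for instance, dither), and the function names are hypothetical.

```python
def quantize_sample(value, levels):
    """Snap an 8-bit sample (0..255) to the nearest of `levels` evenly spaced values."""
    index = (value * (levels - 1) + 127) // 255  # nearest level index
    return index * 255 // (levels - 1)           # back to the 0..255 range


def quantize_rgb(pixel):
    """Reduce an (r, g, b) pixel to the 4 x 4 x 4 cube (64 possible colors)."""
    return tuple(quantize_sample(channel, 4) for channel in pixel)


def quantize_gray(value):
    """Reduce a gray sample to one of 16 gray levels."""
    return quantize_sample(value, 16)


if __name__ == "__main__":
    print(quantize_rgb((200, 100, 30)))  # each channel becomes 0, 85, 170 or 255
    print(quantize_gray(128))
```

Because many distinct input values collapse to the same output, rescaling an already-reduced image cannot recover the original detail.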
The simplest use is to invoke the program with a PostScript job constituting the entire document.
This will construct in dest a Lectern document consisting of the pages produced by the PostScript job, each scaled to the various resolutions, with no imaging adjustments and no attributes. When you view such a document with Lectern, the pages will be numbered from 1 and the Contents and Index commands will not work. For example:
BuildLectern a.ps a.lect
You can assemble a document from several PostScript files:
BuildLectern cover.ps chap1.ps chap2.ps chap3.ps index.ps book.lect
Note that concatenating PostScript files is non-trivial, so BuildLectern doesn't try; instead it just keeps the first PostScript file it encounters as the document's original. In this case, it may be better to specify -PSnoOriginal before the first PostScript argument.
If you have a set of image files resulting from scanning a document, you can use BuildLectern to build a document from them like this:
BuildLectern *.tif a.lect
Alternatively, if the document was scanned from two-sided originals you might use a command line such as the following:
BuildLectern -recto *.fronts.tif -verso *.backs.tif a.lect
If you just want to preview or display images, and the images were created at 300 DPI and you want to display at 100 DPI, you could use:
BuildLectern -only 3 *.pbm a.lect
A third common usage pattern is to construct a new document from an existing one, adding attributes, specifying the location of contents and index pages, or modifying the images' gamma adjustments. You could do these in a single run of BuildLectern, or incrementally in several separate runs. For example:
BuildLectern a.lect -contents 3 -index 57 b.lect
or, equivalently:
BuildLectern -include a.lect 1 9999 -contents 3 -index 57 b.lect
or:
BuildLectern b.lect -author: "Andrew Birrell" c.lect
or:
BuildLectern -gamma 0.45 -rescale a.lect 1 9999 -index 59 d.lect
There are a few additional options, described below. You can also use the options to perform detailed re-arrangements, such as replacing a single page of a document, or constructing a single document from multiple documents, or selectively modifying the gamma adjustments of individual pages.
The complete set of options is as follows. (BuildLectern ignores case when checking for options.)
- -contents integer
- Specifies that the image containing the start of the document's table of contents is the image numbered integer, counting from 1 (regardless of the user's notion of page numbering).
- -gamma number
- Sets the current gamma value to the given number, which should be in the range [0.1 .. 10.0]. The current gamma value affects the appearance of the reduced scale images that the program creates. Gamma adjustment alters the mid-tones of an image, leaving pure white and pure black unchanged. Gamma values greater than 1.0 lighten the image, and values less than 1.0 darken it. See also the -rescale option, which lets you iterate on the choice of gamma values. The current gamma value has no effect on the unscaled images of the document. On most documents a gamma value somewhere in the range 0.4 to 1.0 is satisfactory.
- -image integer
- Sets the current image number to be the given integer (which must be no less than 1).
- -images file from for
- This option is the same as the -include option, except that only the images (and their scaled versions and OCR data) are copied. None of the source file's attributes and miscellaneous data (page1, contents, index and original values) is copied, only the images.
- -include file from for
- Includes the contents of a pre-existing Lectern document residing in file into the current document, by copying. All the miscellaneous data are copied: the page1 value, the contents value, the index value, the original, and the attributes, unless the corresponding datum is already defined for the destination document. (In other words, any miscellaneous data specified by an explicit command line option, or by an earlier -include or -rescale will dominate data from file).
The images in file starting at the image numbered from (counting from 1) are copied into the current document starting at the current image number (even if there is already such an image); after each image is copied, the current image number is modified in the same way as it is after processing an image file, with due regard to the current mode (recto, verso, or simplex). Images are copied until for images have been copied or until the last image in file has been copied. The relevant scaled images and OCR data are copied from file intact along with the original unscaled images (except that the unscaled images are omitted if the include unscaled flag is not currently set). See also the -rescale option, which is similar, but recreates the scaled images using the current gamma value; and the -images option, which is similar but does not copy the miscellaneous data.
- -includeUnscaled
- Sets the include unscaled flag, which controls whether original, unscaled, images are written to dest.
- -index integer
- Specifies that the image containing the start of the document's index pages is the image numbered integer, in the user's notion of page numbering. For example, if you have used the option -page1 3, and the document hardcopy has the first index page with a page number of 17 printed on it, you would say -index 17 (in this example, the index starts on the 19th image, counting from 1, since page 1 is the 3rd image). Note that the meaning of this option is affected by any previous use of the -page1, -include or -rescale options.
- -key: value
- For any string key, adds a key-value pair to the document with the given key and value. For example, the option -author: Andrew specifies that the attribute author has the value Andrew. Adding an attribute with a given key implicitly removes any previous attribute with the identical key (case is significant).
- -noAttribute key
- If there is an attribute in the document with a key identical to key (case is significant), remove it. This is useful primarily to remove attributes that were copied in by using the -include option.
- -noOCR
- Clears the OCR flag, thus preventing any OCR data from being generated (either by running the real OCR algorithm or by doing fake OCR as part of interpreting PostScript) for subsequent images.
- -noUnscaled
- Clears the include unscaled flag, which controls whether original, unscaled, images are written to dest.
- -only n
- For subsequent pages, BuildLectern will compute only one version of the image, being the original image scaled down by a factor of n. (By default, BuildLectern includes three versions: the original image scaled down by factors of 4, 3, and 2.) With this option, OCR processing is disabled. This option also affects images processed by the -rescale option, but not those copied by the -include or -images options. The purpose of this option is to permit fast document construction when only one scale and no OCR is needed, for example to proof PostScript files or to present slides. If both -only and -resolution are used, -only must come first, since -resolution alters the exact scale values used.
- -original file
- Copy the contents of file into the document as the document's original. This is intended to be the PostScript that was used to create the document, and if it is available it will be used by Lectern(1), for printing the document, in preference to printing the images themselves. (Potentially it could also be used to generate images at other resolutions, but certainly not today and probably never.)
- -page1 integer
- Specifies that the image which the user thinks of as the document's page 1 (i.e. the image whose hardcopy page has the digit "1" printed on it) is the image numbered integer, counting from 1.
- -PSblackAndWhite
- Sets the image type used for processing a PostScript job to black&white.
- -PScolor
- Sets the image type used for processing a PostScript job to color.
- -PSfakeOCR
- Sets the OCR flag to fake, which means that when PostScript is being interpreted, OCR data will be derived as a side-effect, rather than by running the actual OCR algorithm.
- -PSgray
- Sets the image type used for processing a PostScript job to grayscale.
- -PSgs
- Specifies the pathname of the Ghostscript executable (by default, gs).
- -PSincludeOriginal
- Sets the include original flag, which controls whether a PostScript job is included as the original in dest.
- -PSlandscape
- Sets the orientation to landscape, which causes output from a PostScript job to be rotated 90 degrees clockwise. If this results in upside-down images, use -PSlandscapeOther instead.
- -PSlandscapeOther
- Sets the orientation to upside-down landscape, which causes output from a PostScript job to be rotated 90 degrees counter-clockwise. If this results in upside-down images, use -PSlandscape instead.
- -PSportrait
- Sets the orientation to portrait, which causes output from a PostScript job to be used as-is.
- -PSnoOriginal
- Clears the include original flag, which controls whether a PostScript job is included as the original in dest.
- -PSscale number
- Sets the scale factor to number, which causes PostScript jobs to generate images of number * 300 DPI (which are then reduced in the normal way). For example, -PSscale 1.1 increases the final image size by 10%.
- -realOCR
- Sets the OCR flag to real, which causes OCR data for subsequent images to be acquired by running the OCR algorithm (even when processing PostScript files).
- -recto
- Sets the mode to recto. In this mode the current image number increases by 2 after processing an image file or while processing an -include option. The expectation is that while in recto mode the program will be processing the recto (odd-numbered or front) pages of the document, in ascending order, and that recto mode will be followed by an equal number of images in verso mode.
- -resolution integer
- Sets the current resolution of image files (PNM and TIFF) to integer instead of the default, 300 DPI. If both -only and -resolution are used, -only must come first, since -resolution alters the exact scale values used.
- -rescale file from for
- This option is the same as the -include option, except that while copying images from file, the existing scaled images are discarded and replaced by new ones created with the current gamma value. This allows you to iterate on the gamma value for a document, or for particular images in a document.
- -simplex
- Sets the mode to simplex. In this mode the current image number increases by 1 after processing an image file or while processing an -include option. This contrasts with recto and verso modes.
- -stdin
- Processes a sequence of images from standard input, as if the images had been presented in separate files. The images should be in PPM, PGM or PBM raw (binary) format, and should appear sequentially on standard input, optionally separated by white space. The sequence is terminated by end of file on standard input. Note that TIFF images and non-raw PNM images are not yet supported in this option. This option is intended primarily for passing images produced by gs(1) into BuildLectern through a pipe. This avoids the use of a large amount of temporary disk storage for the complete set of images, since this option keeps only one image on disk at a time. Note that you can't just pipe the standard output of gs(1) into BuildLectern, since gs writes status messages on its standard output. Instead, give gs(1) an option such as
"-sOutputFile=|BuildLectern foo.lect -stdin"
including the quotes.
- -verbose
- Writes details of the image scaling and OCR operations to the standard error stream.
- -verso
- Sets the mode to verso and immediately subtracts 1 from the current image number (recall that the current image number will have been increased by 2 after processing the final recto image). In this mode the current image number decreases by 2 after processing an image file or while processing an -include option. You will get an error message if this process would result in an image having a number less than 1. The expectation is that verso mode will be set immediately after recto mode, and that while in verso mode the program will be processing the verso (even-numbered or back) pages of the document, in descending order.
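The effect described under -gamma (mid-tones shift while pure black and pure white stay fixed) can be illustrated with a conventional power-law curve. This is a minimal sketch, assuming output = input ^ (1 / gamma) on samples normalised to the range 0 to 1, where 0 is black and 255 is white; the text does not state the formula BuildLectern actually uses, and apply_gamma is a hypothetical name.

```python
def apply_gamma(value, gamma):
    """Map an 8-bit sample (0 = black, 255 = white) through a power-law curve."""
    if not 0.1 <= gamma <= 10.0:
        raise ValueError("gamma should be in the range [0.1 .. 10.0]")
    # Assumed curve: normalise to 0..1, raise to 1/gamma, scale back to 0..255.
    return round(255 * (value / 255) ** (1 / gamma))


if __name__ == "__main__":
    for gamma in (0.45, 1.0, 2.0):
        # Black and white are unchanged; only the mid-tone moves.
        print(gamma, [apply_gamma(v, gamma) for v in (0, 128, 255)])
```

Under this assumed curve, gamma values above 1.0 raise the mid-tone sample (lighter) and values below 1.0 lower it (darker), matching the behaviour the option describes.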
gs(1), pgmtopbm(1), ppmtopgm(1), tifftopnm(1).
Andrew D. Birrell and Paul McJones
Copyright 1994 Digital Equipment Corporation.
Distributed only by permission.
Last modified on Fri Jun 7 14:16:12 PDT 1996 by mcjones
modified on Wed Jun 7 17:09:13 PDT 1995 by birrell
modified on Sun Jan 1 16:18:32 PST 1995 by glassman