DataparkSearch Engine 4.38 reference manual

The Web searching software


Table of Contents
1. Introduction
1.1. DataparkSearch Features
1.2. Where to get DataparkSearch.
1.3. Disclaimer
1.4. Authors
1.4.1. Contributors
2. Installation
2.1. SQL database requirements
2.2. Supported operating systems
2.3. Tools required for installation
2.4. Installing DataparkSearch
2.5. Possible installation problems
2.6. Installation registration
3. Indexing
3.1. Indexing in general
3.1.1. Configuration
3.1.2. Running indexer
3.1.3. How to create SQL table structure
3.1.4. How to drop SQL table structure
3.1.5. Subsection control
3.1.6. How to clear database
3.1.7. Database Statistics
3.1.8. Link validation
3.1.9. Parallel indexing
3.2. Supported HTTP response codes
3.3. Content-Encoding support
3.4. indexer configuration
3.4.1. Specifying WEB space to be indexed
3.4.2. Aliases
3.4.3. ServerTable
3.4.4. FlushServerTable
3.4.5. External parsers
3.4.6. Other commands used in indexer.conf
3.5. Extended indexing features
3.5.1. Indexing SQL database tables (htdb: virtual URL scheme)
3.5.2. Indexing binaries output (exec: and cgi: virtual URL schemes)
3.5.3. Mirroring
3.6. Using syslog
3.7. Storing compressed document copies
3.7.1. Configure stored
3.7.2. How stored works
3.7.3. Using stored during search
4. DataparkSearch HTML parser
4.1. Tag parser
4.2. Special characters
4.3. META tags
4.4. Links
4.5. Comments
4.6. Body patterns
5. Storing data
5.1. SQL storage types
5.1.1. General storage information
5.1.2. Various modes of words storage
5.1.3. Storage mode - single
5.1.4. Storage mode - multi
5.1.5. Storage mode - crc
5.1.6. Storage mode - crc-multi
5.1.7. Storage mode - cache
5.1.8. SQL structure notes
5.1.9. Additional features of non-CRC storage modes
5.2. Cache mode storage
5.2.1. Introduction
5.2.2. Cache mode word indexes structure
5.2.3. Cache mode tools
5.2.4. Starting cache mode
5.2.5. Optional usage of several splitters
5.2.6. Using run-splitter script
5.2.7. Doing search
5.2.8. Using search limits
5.3. DataparkSearch performance issues
5.3.1. searchd usage recommendation
5.3.2. Memory based filesystem (mfs) usage recommendation
5.3.3. MySQL performance
5.3.4. Post-indexing optimization
5.4. SearchD support
5.4.1. Why use searchd
5.4.2. Starting searchd
5.5. Oracle notes
5.5.1.
5.5.2. Compilation, Installation and Configuration
6. Subsections
6.1. Tags
6.1.1. Tags in SQL version
6.2. Categories
7. Languages support
7.1. Character sets
7.1.1. Supported character sets
7.1.2. Character sets aliases
7.1.3. Recoding
7.1.4. Recoding at search time
7.1.5. Document charset detection
7.1.6. Automatic charset guesser
7.1.7. Default charset
7.1.8. Default Language
7.1.9. Recoding during search
7.2. Making multi-language search pages
7.2.1. How does it work?
7.2.2. Possible troubles
7.3. Segmenters for Chinese, Japanese, Korean and Thai languages
7.3.1. Japanese language phrase segmenter
7.3.2. Chinese language phrase segmenter
7.3.3. Thai language phrase segmenter
7.3.4. Korean language phrase segmenter
7.4. Multilingual servers support
8. Searching documents
8.1. Using search front-ends
8.1.1. Performing search
8.1.2. Search parameters
8.1.3. Changing different document parts weights at search time
8.1.4. Using front-end with an shtml page
8.1.5. Using several templates
8.1.6. Advanced boolean search
8.1.7. How search handles expired documents
8.2. mod_dpsearch module for Apache httpd
8.2.1. Why use mod_dpsearch
8.2.2. Configuring mod_dpsearch
8.3. How to write search result templates
8.3.1. Template sections
8.3.2. Variables section
8.3.3. Includes in templates
8.3.4. Conditional template operators
8.3.5. Security issues
8.4. Designing search.html
8.4.1. How the results page is created
8.4.2. Your HTML
8.4.3. Forms considerations
8.4.4. Relative links in search.htm
8.4.5. Adding Search form to other pages
8.5. Relevance
8.5.1. Ordering documents
8.5.2. Relevance calculation
8.5.3. Popularity rank
8.5.4. Boolean search
8.5.5. Crosswords
8.5.6. The Summary Extraction Algorithm (SEA)
8.6. Search queries tracking
8.7. Search results cache
8.8. Fuzzy search
8.8.1. Ispell
8.8.2. Aspell
8.8.3. Synonyms
8.8.4. Accent insensitive search
8.8.5. Acronyms and abbreviations
9. Miscellaneous
9.1. Reporting bugs
9.1.1. Core dump reports
9.2. Using libdpsearch library
9.2.1. dps-config script
9.2.2. DataparkSearch API
9.3. Database schema
A. Donations
Index
List of Tables
3-1. Verbose levels
5-1. Cache limit types
7-1. Language groups
7-2. Character set aliases
8-1. Available search parameters
8-2. Configure-time parameters to tune relevance calculation (switches for configure)
9-1. server table schema
9-2. Values of several server parameters in the srvinfo table