DataparkSearch Engine 4.38 reference manual
The Web searching software
Copyright © 2003-2006 by Datapark corp.
Copyright © 2001-2003 by Lavtech.com corp.
Table of Contents
1.
Introduction
1.1.
DataparkSearch Features
1.2.
Where to get DataparkSearch
1.3.
Disclaimer
1.4.
Authors
1.4.1.
Contributors
2.
Installation
2.1.
SQL database requirements
2.2.
Supported operating systems
2.3.
Tools required for installation
2.4.
Installing DataparkSearch
2.5.
Possible installation problems
2.6.
Installation registration
3.
Indexing
3.1.
Indexing in general
3.1.1.
Configuration
3.1.2.
Running indexer
3.1.3.
How to create SQL table structure
3.1.4.
How to drop SQL table structure
3.1.5.
Subsection control
3.1.6.
How to clear database
3.1.7.
Database Statistics
3.1.8.
Link validation
3.1.9.
Parallel indexing
3.2.
Supported HTTP response codes
3.3.
Content-Encoding support
3.4.
indexer configuration
3.4.1.
Specifying WEB space to be indexed
3.4.2.
Aliases
3.4.3.
ServerTable
3.4.4.
FlushServerTable
3.4.5.
External parsers
3.4.6.
Other commands used in indexer.conf
3.5.
Extended indexing features
3.5.1.
Indexing SQL database tables (htdb: virtual URL scheme)
3.5.2.
Indexing binaries output (exec: and cgi: virtual URL schemes)
3.5.3.
Mirroring
3.6.
Using syslog
3.7.
Storing compressed document copies
3.7.1.
Configure stored
3.7.2.
How stored works
3.7.3.
Using stored during search
4.
DataparkSearch HTML parser
4.1.
Tag parser
4.2.
Special characters
4.3.
META tags
4.4.
Links
4.5.
Comments
4.6.
Body patterns
5.
Storing data
5.1.
SQL storage types
5.1.1.
General storage information
5.1.2.
Various modes of words storage
5.1.3.
Storage mode - single
5.1.4.
Storage mode - multi
5.1.5.
Storage mode - crc
5.1.6.
Storage mode - crc-multi
5.1.7.
Storage mode - cache
5.1.8.
SQL structure notes
5.1.9.
Additional features of non-CRC storage modes
5.2.
Cache mode storage
5.2.1.
Introduction
5.2.2.
Cache mode word indexes structure
5.2.3.
Cache mode tools
5.2.4.
Starting cache mode
5.2.5.
Optional usage of several splitters
5.2.6.
Using run-splitter script
5.2.7.
Doing search
5.2.8.
Using search limits
5.3.
DataparkSearch performance issues
5.3.1.
searchd usage recommendation
5.3.2.
Memory based filesystem (mfs) usage recommendation
5.3.3.
MySQL performance
5.3.4.
Post-indexing optimization
5.4.
SearchD support
5.4.1.
Why use searchd
5.4.2.
Starting searchd
5.5.
Oracle notes
5.5.1.
5.5.2.
Compilation, Installation and Configuration
6.
Subsections
6.1.
Tags
6.1.1.
Tags in SQL version
6.2.
Categories
7.
Languages support
7.1.
Character sets
7.1.1.
Supported character sets
7.1.2.
Character sets aliases
7.1.3.
Recoding
7.1.4.
Recoding at search time
7.1.5.
Document charset detection
7.1.6.
Automatic charset guesser
7.1.7.
Default charset
7.1.8.
Default Language
7.1.9.
Recoding during search
7.2.
Making multi-language search pages
7.2.1.
How does it work?
7.2.2.
Possible troubles
7.3.
Segmenters for Chinese, Japanese, Korean and Thai languages
7.3.1.
Japanese language phrase segmenter
7.3.2.
Chinese language phrase segmenter
7.3.3.
Thai language phrase segmenter
7.3.4.
Korean language phrase segmenter
7.4.
Multilingual servers support
8.
Searching documents
8.1.
Using search front-ends
8.1.1.
Performing search
8.1.2.
Search parameters
8.1.3.
Changing different document parts weights at search time
8.1.4.
Using front-end with an shtml page
8.1.5.
Using several templates
8.1.6.
Advanced boolean search
8.1.7.
How search handles expired documents
8.2.
mod_dpsearch module for Apache httpd
8.2.1.
Why use mod_dpsearch
8.2.2.
Configuring mod_dpsearch
8.3.
How to write search result templates
8.3.1.
Template sections
8.3.2.
Variables section
8.3.3.
Includes in templates
8.3.4.
Conditional template operators
8.3.5.
Security issues
8.4.
Designing search.html
8.4.1.
How the results page is created
8.4.2.
Your HTML
8.4.3.
Forms considerations
8.4.4.
Relative links in search.htm
8.4.5.
Adding Search form to other pages
8.5.
Relevance
8.5.1.
Ordering documents
8.5.2.
Relevance calculation
8.5.3.
Popularity rank
8.5.4.
Boolean search
8.5.5.
Crosswords
8.5.6.
The Summary Extraction Algorithm (SEA)
8.6.
Search queries tracking
8.7.
Search results cache
8.8.
Fuzzy search
8.8.1.
Ispell
8.8.2.
Aspell
8.8.3.
Synonyms
8.8.4.
Accent insensitive search
8.8.5.
Acronyms and abbreviations
9.
Miscellaneous
9.1.
Reporting bugs
9.1.1.
Core dump reports
9.2.
Using libdpsearch library
9.2.1.
dps-config script
9.2.2.
DataparkSearch API
9.3.
Database schema
A.
Donations
Index
List of Tables
3-1.
Verbose levels
5-1.
Cache limit types
7-1.
Language groups
7-2.
Charsets aliases
8-1.
Available search parameters
8-2.
Configure-time parameters to tune relevance calculation (switches for configure)
9-1.
server table schema
9-2.
Values of several server parameters in the srvinfo table