This document describes how to operate the Harvest Tool for PDS4 data product registration. The tool combines functionality from the Harvest Tool and Search Core applications and takes configuration files from both tools as input. For information on configuring harvesting or search index generation, see the respective Operation guides for those tools. The following topics can be found in this document:
Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.
This section describes how to execute the Harvest Tool.
The following table describes the command-line options available:
Command-Line Option | Description |
---|---|
-c, --harvest-config <file> | Specify a harvest policy configuration file to set the tool behavior. (This flag is required) |
-C, --doc-config <dir> | Specify the directory location where the document generation configuration files reside. The default is to look in the 'search-conf' directory that resides in the tool package. |
-D, --ignore-dir | Specify patterns matching subdirectories to ignore when traversing a target directory. Each pattern must be surrounded by quotes (e.g., -D "CATALOG"). |
-e, --regexp | Specify file patterns to match when registering products from a target directory. Each pattern must be surrounded by quotes (e.g., -e "*.xml"). |
-h, --help | Display harvest usage. |
-l, --log-file | Specify a log file name. The default is standard output. |
-o, --output-dir | Specify the directory where the tool writes the Solr documents. The default is the current working directory. |
-P, --port | Specify a port number to use if running the tool in persistence mode. |
-pds3, --is-pds3-dir | Indicate that the target passed on the command line is a PDS3 directory. By default, the tool assumes that any targets passed on the command line are PDS4 directories. |
-v, --verbose | Specify the message severity level and above to include in the log (0=Debug, 1=Info, 2=Warning, 3=Error). Default is Info and above (level 1). |
-V, --version | Display the release number and copyright information. |
-w, --wait | Specify the time, in seconds, to wait between crawls when running the tool in persistence mode. |
The following command demonstrates the recommended way to run Harvest:
%> harvest -c ../harvest-conf/harvest-policy-example.xml -C ../search-conf/defaults/pds/pds4
In the example above, the -c option specifies the example harvest policy configuration file, and the -C option specifies the location of the default search policy configuration files. The following command is a MAVEN-specific example:
% ./harvest -c ../harvest-conf/harvest-policy-maven-ngims.xml -C ../search-conf/maven-ngims \
-o ../ -l ../harvest-maven-ngims.log
This command writes the search index files for the MAVEN NGIMS bundle to a solr-docs directory one level above the bin directory and writes its log to ../harvest-maven-ngims.log. In an environment where multiple bundles will be indexed, rename that directory, copy it to the local Search Service installation, and then run the search-core index and post commands there to make the content available from the Search Service:
% mv ../solr-docs ../solr-docs-maven-ngims
% cp -r ../solr-docs-maven-ngims /usr/local/search-service/pds/solr-docs
% search-core -P -H /usr/local/search-service/pds
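When several bundles are indexed, the harvest, rename, and copy steps above repeat once per bundle. The bash script below is a minimal sketch of that loop, not part of the Harvest Tool distribution. It assumes it is run from the Harvest Tool's bin directory and that each bundle follows the naming pattern of the MAVEN NGIMS example (../harvest-conf/harvest-policy-<name>.xml and ../search-conf/<name>). The function names run, harvest_bundle, and publish_bundles, along with the DRY_RUN and SEARCH_HOME variables, are hypothetical. The harvest, mv, cp, and search-core invocations are copied from the examples above.

```shell
#!/usr/bin/env bash
# Sketch: index several bundles with Harvest, then publish them to a local
# Search Service. Run from the Harvest Tool's bin directory.
# ASSUMPTIONS (hypothetical, adjust to your installation):
#   - each bundle <name> has ../harvest-conf/harvest-policy-<name>.xml and
#     ../search-conf/<name>, following the MAVEN NGIMS example;
#   - SEARCH_HOME points at the Search Service installation.

SEARCH_HOME="${SEARCH_HOME:-/usr/local/search-service/pds}"

# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
  printf '%s\n' "$*"
  if [[ "${DRY_RUN:-0}" != "1" ]]; then
    "$@"
  fi
}

# Harvest one bundle into ../solr-docs, then rename that directory
# after the bundle, as in the MAVEN NGIMS example.
harvest_bundle() {
  local name="$1"
  run ./harvest -c "../harvest-conf/harvest-policy-${name}.xml" \
    -C "../search-conf/${name}" -o ../ -l "../harvest-${name}.log" || return 1
  run mv ../solr-docs "../solr-docs-${name}" || return 1
}

# Harvest and copy every named bundle, then run search-core once at the end.
publish_bundles() {
  local name
  for name in "$@"; do
    harvest_bundle "$name" || return 1
    run cp -r "../solr-docs-${name}" "${SEARCH_HOME}/solr-docs" || return 1
  done
  run search-core -P -H "$SEARCH_HOME"
}

# Only run when executed directly with at least one bundle name.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  publish_bundles "$@"
fi
```

For example, saving the script under a name of your choosing (say, publish-bundles.sh) and running `DRY_RUN=1 ./publish-bundles.sh maven-ngims` prints the four commands shown above without executing them. search-core is invoked once after all bundles are copied rather than once per bundle. Whether your Search Service installation expects that order is an assumption to verify against the Search Core operation guide.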