Operation

This document describes how to operate the Harvest Tool software for use with PDS4 data product registration. The tool incorporates functionality from the Harvest Tool and the Search Core applications and takes configuration files from both tools as input. For information on how to configure harvesting or search index generation, see the respective Operation guides for those tools. This document covers tool execution, including the available command-line options and example invocations.

Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.
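In a POSIX shell, a backslash at the very end of a line continues the command onto the next line, so an example written with trailing backslashes can usually be pasted as-is. Only examples broken without a backslash need to be rejoined by hand. The sketch below illustrates this with a hypothetical helper, count_args, which is not part of the tool and only reports how many arguments it received. The file names are placeholders.

```shell
# count_args is a hypothetical helper: it prints the number of arguments it was given.
count_args() {
    echo "$#"
}

# The same command written on one line...
one_line=$(count_args -c policy.xml -l harvest.log)

# ...and split across two lines with a trailing backslash.
multi_line=$(count_args -c policy.xml \
    -l harvest.log)

# Both forms pass the same four arguments.
echo "one line: $one_line, multi-line: $multi_line"
```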

Tool Execution

This section describes how to execute the Harvest Tool.

Command-Line Options

The following table describes the command-line options available:

Command-Line Option           Description
-c, --harvest-config <file>   Specify a harvest policy configuration file to set the tool behavior. (This flag is required.)
-C, --doc-config <dir>        Specify the directory location where the document generation configuration files reside. The default is to look in the 'search-conf' directory that resides in the tool package.
-D, --ignore-dir              Specify patterns to look for when traversing a target directory for sub-directories to ignore. Each pattern must be surrounded by quotes (e.g. -D "CATALOG").
-e, --regexp                  Specify file patterns to look for when registering products from a target directory. Each pattern must be surrounded by quotes (e.g. -e "*.xml").
-h, --help                    Display harvest usage.
-l, --log-file                Specify a log file name. The default is standard output.
-o, --output-dir              Specify the directory where the tool writes the Solr documents. The default is the current working directory.
-P, --port                    Specify a port number to use if running the tool in persistence mode.
-pds3, --is-pds3-dir          Indicate that the target passed on the command line is a PDS3 directory. By default, any targets passed on the command line are assumed to be PDS4 directories.
-v, --verbose                 Specify the minimum message severity level to include in the log (0=Debug, 1=Info, 2=Warning, 3=Error). The default is Info and above (level 1).
-V, --version                 Display the release number and copyright information.
-w, --wait                    Specify the time, in seconds, to wait between crawls if running the tool in persistence mode.
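
Since -c is required and -C is expected to point at a directory, a small wrapper can check both before launching the tool. The following is a minimal sketch under stated assumptions, not part of the Harvest distribution: the function name run_harvest and the HARVEST_BIN and DRY_RUN variables are hypothetical, and the sketch always passes -C explicitly rather than relying on the default.

```shell
#!/bin/sh
# Hypothetical wrapper: validate the -c and -C arguments before running harvest.
# HARVEST_BIN and DRY_RUN are illustrative names, not options of the tool.
HARVEST_BIN="${HARVEST_BIN:-harvest}"

run_harvest() {
    config="$1"      # harvest policy file, passed to -c (required by the tool)
    doc_config="$2"  # document generation configuration directory, passed to -C

    if [ ! -f "$config" ]; then
        echo "error: harvest policy file not found: $config" >&2
        return 1
    fi
    if [ ! -d "$doc_config" ]; then
        echo "error: doc-config directory not found: $doc_config" >&2
        return 1
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of executing it.
        echo "$HARVEST_BIN -c $config -C $doc_config"
    else
        "$HARVEST_BIN" -c "$config" -C "$doc_config"
    fi
}
```

Setting DRY_RUN=1 prints the assembled command without executing it, which can help confirm the paths before a long crawl.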

Execute Harvest Tool

The following command demonstrates the recommended way to run Harvest:

% harvest -c ../harvest-conf/harvest-policy-example.xml -C ../search-conf/defaults/pds/pds4

In the example above, the -c option specifies the example harvest policy configuration file, while the -C option specifies the location of the default search policy configuration files. The following command is a MAVEN-specific example:

% ./harvest -c ../harvest-conf/harvest-policy-maven-ngims.xml -C ../search-conf/maven-ngims \
-o ../ -l ../harvest-maven-ngims.log

The above command writes the search index files for the MAVEN NGIMS bundle into a solr-docs directory one level above the bin directory. In an environment where multiple bundles will be indexed, rename that directory and copy it to the local Search Service installation, where the search-core index and post commands can be executed to make the content available from the Search Service:

% mv ../solr-docs ../solr-docs-maven-ngims
% cp -r ../solr-docs-maven-ngims /usr/local/search-service/pds/solr-docs
% search-core -P -H /usr/local/search-service/pds
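
When several bundles are indexed, the rename and copy steps above can be wrapped in a small helper. The following is a minimal sketch, not part of the tool: the function name stage_solr_docs is hypothetical, and the sketch assumes the solr-docs directory under the Search Service home holds one sub-directory per bundle, which this guide does not state explicitly. The search-core step is left to be run separately, as shown above.

```shell
#!/bin/sh
# Hypothetical helper: rename the generated solr-docs directory after a bundle
# and copy it under the Search Service home.
stage_solr_docs() {
    bundle="$1"        # bundle label, e.g. maven-ngims
    output_dir="$2"    # directory that was passed to harvest -o
    service_home="$3"  # Search Service home, e.g. /usr/local/search-service/pds

    src="$output_dir/solr-docs"
    renamed="$output_dir/solr-docs-$bundle"

    if [ ! -d "$src" ]; then
        echo "error: $src not found" >&2
        return 1
    fi
    if [ -e "$renamed" ]; then
        # Refuse to overwrite the output of an earlier run.
        echo "error: $renamed already exists" >&2
        return 1
    fi

    mv "$src" "$renamed" || return 1
    # Assumption: one sub-directory per bundle under <service_home>/solr-docs.
    mkdir -p "$service_home/solr-docs" || return 1
    cp -r "$renamed" "$service_home/solr-docs/"
}
```

For the MAVEN NGIMS example, a call might look like: stage_solr_docs maven-ngims .. /usr/local/search-service/pds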