NASA - National Aeronautics and Space Administration


Operation

This section covers the following topics: Tool Setup, Tool Execution, Harvest Policy File, and Report Format.

Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.

Tool Setup

Before the Harvest Tool can be executed, the user's environment must be configured appropriately. This section describes how to set up the user environment on UNIX-based and Windows machines.

UNIX-Based Setup

This section details the environment setup for UNIX-based machines. The preferred method is to specify the shell script, Harvest, on the command-line. Adding the location of the script to the PATH environment variable enables the shell script to be executed from any location on the user's machine.

The following command demonstrates how to set the PATH environment variable, by appending to its current setting:

% setenv PATH ${PATH}:$HOME/harvest-0.2.0/bin
        

The tool can now be executed via the shell script as demonstrated in the following example:

% Harvest <policy file> <command-line arguments>
        

Additional methods for setting up a UNIX-based environment can be found in the UNIX Setup Options section. If viewing this document in PDF form, see the appendix for details.

Windows Setup

This section details the environment setup for Windows machines. The preferred method is to specify the batch file, Harvest.bat, on the command-line. Adding the location of the file to the PATH environment variable enables the batch file to be executed from any location on the user's machine.

The following command demonstrates how to set the PATH environment variable, by appending to its current setting:

C:\> set PATH=%PATH%;C:\harvest-0.2.0\bin
        

The tool can now be executed via the batch file as demonstrated in the following example:

C:\> Harvest <policy file> <command-line arguments>
        

Additional methods for setting up a Windows environment can be found in the Windows Setup Options section. If viewing this document in PDF form, see the appendix for details.

Additional Tool Setup

This section details how to reconfigure the Harvest Tool to interface with another instance of the PDS Registry Service. This is useful when a user wants to test the registration of data products on a local instance of the Registry before registering the products on the operational release of the PDS Registry. Users who do not need to do this can skip this section.

The Harvest Tool points to the PDS Registry and Security Service through the following Java System Properties:

System Property Name    Description
pds.registry            Specify the URL to the PDS Registry Service. This property is required.
pds.security            Specify the URL to the PDS Security Service. If the PDS Registry being pointed to is not secured (e.g., a local instance), then this property can be omitted.

By default, the Harvest shell script and batch file point to the operational releases of the PDS Registry and Security Services. The sections below detail how to modify these scripts to point to another instance of these services.

UNIX-Based Users

Open the Harvest shell script and go to the last line in the file. It should look like the following:

% java \
-Dpds.registry="http://pdsops2.jpl.nasa.gov/registry-service" \
-Dpds.security="http://pdsops.jpl.nasa.gov/openam/identity" \
-jar ${HARVEST_JAR} "$@"
        

Replace the URL values of pds.registry and/or pds.security with URLs to the desired instance of the PDS Registry and/or Security. For example, making the following change to the script will have Harvest pointing to a non-secured, local instance of the PDS Registry on port 8080:

% java -Dpds.registry="http://localhost:8080/registry-service" -jar ${HARVEST_JAR} "$@"
        

Windows-Based Users

Open the Harvest batch file and go to the last line in the file. It should look like the following:

% java \
-Dpds.registry="http://pdsops2.jpl.nasa.gov/registry-service" \
-Dpds.security="http://pdsops.jpl.nasa.gov/openam/identity" \
-jar "%HARVEST_JAR%" %*
        

Replace the URL values of pds.registry and/or pds.security with URLs to the desired instance of the PDS Registry and/or Security. For example, making the following change to the batch file will have Harvest pointing to a non-secured, local instance of the PDS Registry on port 8080:

% java -Dpds.registry="http://localhost:8080/registry-service" -jar "%HARVEST_JAR%" %*
        

Tool Execution

Harvest Tool can be executed in various ways. This section describes how to run the tool, as well as its behaviors and caveats.

Command-Line Options

The following table describes the command-line options available:

Command-Line Option    Description
-u, --username         Specify a username for authentication with the PDS Security Service.
-p, --password         Specify a password associated with the username.
-l, --log-file         Specify a log file name. Default is standard out.
-V, --version          Display the release number and copyright information.
-h, --help             Display Harvest usage.

Execute Harvest Tool

This section demonstrates execution of the tool using the command-line options. The examples below execute the tool via the batch/shell script. Alternate methods for executing the tool can be found in the Tool Setup section.

The Harvest Tool operates with a policy file to register product metadata. Details on how to create this policy file can be found in the Harvest Policy File section.

The following command demonstrates how to run the Harvest Tool against a policy file, policy.xml, with the output going to standard out. Replace {username} and {password} with a valid username and password, for example pds and mypwd:

% Harvest policy.xml -u {username} -p {password}
        

The following command demonstrates how to run the Harvest Tool with the output going to a log file, log.txt, instead of standard out:

% Harvest policy.xml -u {username} -p {password} -l log.txt
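
When several policy files need to be processed, the command above can be assembled by a small script. The following Python sketch is only an illustration: the function names harvest_argv and run_harvest are hypothetical, it assumes the Harvest shell script is on the PATH of a UNIX-based machine, and it uses only the options listed in the Command-Line Options table. The meaning of the tool's exit status is not documented here, so the sketch simply returns it.

```python
import subprocess
from typing import List, Optional


def harvest_argv(policy_file: str, username: str, password: str,
                 log_file: Optional[str] = None) -> List[str]:
    """Build the argument list for one Harvest run (hypothetical helper)."""
    argv = ["Harvest", policy_file, "-u", username, "-p", password]
    if log_file is not None:
        # -l sends the report to a file instead of standard out.
        argv += ["-l", log_file]
    return argv


def run_harvest(policy_file: str, username: str, password: str,
                log_file: Optional[str] = None) -> int:
    """Run Harvest and return its exit status. Requires Harvest on the PATH."""
    completed = subprocess.run(harvest_argv(policy_file, username, password, log_file))
    return completed.returncode
```

Calling harvest_argv("policy.xml", "pds", "mypwd", "log.txt") produces the same arguments as the log file example above.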
        

Harvest Policy File

The Harvest policy file is an XML file that the tool uses to find products and register their metadata. The schema for the policy file can be found in the Harvest Policy Schema section. If viewing this document in PDF form, see the appendix for details.

The following is an example of a policy file:

<?xml version="1.0" encoding="UTF-8"?>
<policy>
   <bundles>
      <file>/home/pds4/context-bundle/bundle.xml</file>
   </bundles>
   <collections>
      <file>/home/pds4/insthost/collection_instrument_host.xml</file>
   </collections>
   <directories>
      <path>/home/user/pds4/geo/product_files</path>
      <filePattern>*.xml</filePattern>
   </directories>
   <candidates>
      <namespace prefix="geo" uri="http://pds.nasa.gov/schema/pds4/geo"/>
      <productMetadata objectType="character_table">
         <xPath>//geo:Product_Identification_Area/geo:creation_date_time</xPath>
         <xPath>//geo:Subject_Area/geo:instrument_name</xPath>
         <xPath>//Subject_Area/observing_system_name</xPath>
      </productMetadata>
      <productMetadata objectType="Product_Target">
         <xPath>//alternate_title</xPath>
         <xPath>//creation_date_time</xPath>
         <xPath>//identifier</xPath>
         <xPath>//Subject_Area/target_name</xPath>
      </productMetadata>
   </candidates>
</policy>
        

The policy file is made up of the following complex type elements: bundles, collections, directories, candidates, and productMetadata.
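
Because the policy file is plain XML, its contents can be inspected with any XML parser before running the tool. The following Python sketch is an illustration only and is not part of the Harvest Tool: the function name summarize_policy is hypothetical, and the embedded policy is a trimmed copy of the example above (xPath and namespace elements omitted).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example policy file, for illustration only.
POLICY_XML = """<policy>
   <bundles>
      <file>/home/pds4/context-bundle/bundle.xml</file>
   </bundles>
   <collections>
      <file>/home/pds4/insthost/collection_instrument_host.xml</file>
   </collections>
   <directories>
      <path>/home/user/pds4/geo/product_files</path>
      <filePattern>*.xml</filePattern>
   </directories>
   <candidates>
      <productMetadata objectType="character_table"/>
      <productMetadata objectType="Product_Target"/>
   </candidates>
</policy>"""


def summarize_policy(policy_xml: str) -> dict:
    """List what a policy file asks for (hypothetical helper)."""
    root = ET.fromstring(policy_xml)
    return {
        "bundles": [e.text for e in root.findall("bundles/file")],
        "collections": [e.text for e in root.findall("collections/file")],
        "paths": [e.text for e in root.findall("directories/path")],
        "file_patterns": [e.text for e in root.findall("directories/filePattern")],
        # objectType is the attribute naming each product type to register.
        "object_types": [e.get("objectType")
                         for e in root.findall("candidates/productMetadata")],
    }


summary = summarize_policy(POLICY_XML)
```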

bundles

Specify this element to tell the Harvest Tool to register and crawl a bundle file. The following table describes the elements that are allowed:

Element Name    Description
file            Specify a bundle file. Specify this element tag more than once to point to multiple bundle files.

In the example above, the Harvest Tool will register the bundle file named /home/pds4/context-bundle/bundle.xml. It will then crawl the bundle file, looking for collection files to register and process.

collections

Specify this element to tell the Harvest Tool to register and crawl a collection file. Crawling only occurs when the collection file is a primary collection. This is indicated by a value of true in the is_primary_collection element tag within the collection.

The following table describes the elements that are allowed:

Element Name    Description
file            Specify a collection file. Specify this element tag more than once to point to multiple collection files.

In the example above, the Harvest Tool will register the collection file named /home/pds4/insthost/collection_instrument_host.xml. It will then crawl the file, looking for products to register if it is a primary collection.

directories

Specify this element to tell the Harvest Tool where to crawl for data products. The following table describes the elements that are allowed:

Element Name    Description
path            Specify a directory path to start crawling. Specify this element tag more than once to point to multiple directories to crawl.
filePattern     Specify a file pattern to look for specific files. If omitted, the default is to get all files within a directory.

In the example above, the Harvest Tool will crawl the directory /home/user/pds4/geo/product_files, looking for files that have a .xml file extension. If the filePattern element is omitted from the policy file, the tool processes all files in the directory.
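
To preview which files a path and filePattern pair might select, the crawl can be approximated outside the tool. The following Python sketch is not the Harvest Tool's implementation: the function name crawl is hypothetical, and it assumes that the pattern uses shell-style wildcards and that subdirectories are visited, neither of which is confirmed by this document.

```python
import fnmatch
import os
from typing import Iterator, Optional


def crawl(path: str, file_pattern: Optional[str] = None) -> Iterator[str]:
    """Yield files under path whose names match file_pattern.

    Assumption: shell-style wildcards and a recursive walk.
    When file_pattern is None, every file is yielded.
    """
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in sorted(filenames):
            if file_pattern is None or fnmatch.fnmatch(name, file_pattern):
                yield os.path.join(dirpath, name)
```

For example, list(crawl("/home/user/pds4/geo/product_files", "*.xml")) would list the candidate files for the policy shown earlier, provided that directory exists.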

candidates

Specify this element to tell the Harvest Tool what product types to register and what metadata to extract from a data product. This is a required element in the policy file. The following table describes the elements that are allowed:

Element Name       Description
namespace          Specify to allow the Harvest Tool to extract metadata that is in a namespace other than the default PDS namespace.
productMetadata    Specify to tell the tool what object types and what metadata to register.

By default, the Harvest Tool defines the default namespace to be the PDS namespace, http://pds.nasa.gov/schema/pds4/pds. To override this default, specify the default attribute in the namespace element and give it a value of true. The following makes the geo namespace the default namespace:

<candidates>
   <namespace prefix="geo" uri="http://pds.nasa.gov/schema/pds4/geo" default="true"/>
   ...
        

Namespaces need to be defined in the Harvest policy file only if the metadata to be extracted exists in a namespace other than the PDS namespace. In the example above, a namespace with the prefix geo and the URI http://pds.nasa.gov/schema/pds4/geo has been defined. Any xPath expression in the policy file can therefore use the geo prefix to extract metadata from the geo namespace. xPaths are explained in greater detail in the productMetadata section.
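
The effect of a prefix-to-URI mapping can be seen with a general-purpose XML library. The following Python sketch uses the standard xml.etree.ElementTree module rather than the Harvest Tool; the product document and the function name extract are hypothetical, and ElementTree requires a leading "." before "//".

```python
import xml.etree.ElementTree as ET

# Same prefix and URI as the namespace element in the example policy file.
NAMESPACES = {"geo": "http://pds.nasa.gov/schema/pds4/geo"}

# Hypothetical product label, for illustration only.
PRODUCT_XML = """
<Product xmlns:geo="http://pds.nasa.gov/schema/pds4/geo">
  <geo:Product_Identification_Area>
    <geo:creation_date_time>2010-09-29T14:02:27</geo:creation_date_time>
  </geo:Product_Identification_Area>
</Product>
"""


def extract(product_xml: str, xpath: str) -> list:
    """Return the text of every element matched by xpath."""
    root = ET.fromstring(product_xml)
    return [e.text for e in root.findall(xpath, NAMESPACES)]


values = extract(
    PRODUCT_XML,
    ".//geo:Product_Identification_Area/geo:creation_date_time",
)
```

The geo prefix in the expression resolves only because the mapping is supplied, which mirrors the role of the namespace element in the policy file.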

productMetadata

Specify this element to tell the Harvest Tool what metadata to register. It requires an attribute called objectType that tells the Harvest Tool what product types to register. The following table describes the elements that are allowed:

Element Name    Description
xPath           Specify an XPath expression to extract metadata.

In the example above, the policy file tells the Harvest Tool to look for and register the character_table and Product_Target object types.

The example also contains a set of xPath elements under each productMetadata element. These define what metadata to extract from the different products. XPath is a query language that uses path expressions to select nodes in an XML document. These path expressions look very much like paths in a traditional computer file system. In its simplest form, prefixing a name with // finds the element no matter where it is in the XML file.

The following XPath expression will find the creation_date_time element within the default namespace, no matter where this element is located in the file:

//creation_date_time
        

The following XPath expression will find the creation_date_time element within the geo namespace, no matter where this element is located in the file:

//geo:creation_date_time
        

The following XPath expression will find all target_name elements that are children of Subject_Area within the default namespace:

//Subject_Area/target_name
        

The following XPath expression will find all target_name elements that are children of Subject_Area and that have a value of MARS:

//Subject_Area/target_name[text()="MARS"]
        

For a more detailed explanation of XPath, search online for an XPath tutorial.
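
The expressions above can be tried against a small document before they are placed in a policy file. The following Python sketch uses the standard xml.etree.ElementTree module, which supports only a subset of XPath: paths must start with "." and the text()="MARS" predicate is written as .='MARS'. The product document is hypothetical, and the Harvest Tool's own XPath engine may behave differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical product label, for illustration only.
PRODUCT_XML = """
<Product>
  <creation_date_time>2010-09-29</creation_date_time>
  <Subject_Area>
    <target_name>MARS</target_name>
    <target_name>PHOBOS</target_name>
  </Subject_Area>
  <Other_Area>
    <target_name>MARS</target_name>
  </Other_Area>
</Product>
"""

root = ET.fromstring(PRODUCT_XML)


def values(xpath: str) -> list:
    """Return the text of every element matched by xpath."""
    return [e.text for e in root.findall(xpath)]


# //creation_date_time: the element anywhere in the file.
anywhere = values(".//creation_date_time")

# //Subject_Area/target_name: only children of Subject_Area.
children = values(".//Subject_Area/target_name")

# //Subject_Area/target_name[text()="MARS"]: children whose value is MARS.
mars_only = values(".//Subject_Area/target_name[.='MARS']")
```

In this sample the target_name under Other_Area is not selected by the second or third expression, because both restrict the match to children of Subject_Area.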

Report Format

This section describes the contents of the Harvest Tool report. At this time, the Harvest Tool only outputs a series of log messages. The log reports whether each discovered product was registered successfully. Additionally, any syntactical errors in a discovered product are reported. Each log message consists of a severity level, a file name, and a message. The following is an example of some of the log messages that can be expected from the Harvest Tool:

PDS Harvest Tool Log

Version             Version 0.2.0-dev
Time                Wed, Sep 29 2010 at 02:02:27 PM
Registry Location   http://localhost:8080/registry-service

INFO:   [C:\pds4\geo\BUGLAB_Archive_Bundle.xml] Begin processing.
SKIP:   [C:\pds4\geo\BUGLAB_Archive_Bundle.xml] 'archive bundle' is not 
an object type found in the policy file.
INFO:   [C:\pds4\geo\schema\BUGLAB_Archive_Bundle.xml] Begin processing.
SKIP:   [C:\pds4\geo\schema\BUGLAB_Archive_Bundle.xml] 'XML_Schema' is not 
an object type found in the policy file.
INFO:   [C:\pds4\geo\schema\BUGLAB_Collection.xml] Begin processing.
SKIP:   [C:\pds4\geo\schema\BUGLAB_Collection.xml] 'XML_Schema' is not an 
object type found in the policy file.
INFO:   [C:\pds4\geo\schema\BUGLAB_Schema_Collection.xml] Begin processing.
SKIP:   [C:\pds4\geo\schema\BUGLAB_Schema_Collection.xml] 'collection' is 
not an object type found in the policy file.
INFO:   [C:\pds4\geo\schema\BUG_BDRF_product.xml] Begin processing.
SKIP:   [C:\pds4\geo\schema\BUG_BDRF_product.xml] 'XML_Schema' is not an 
object type found in the policy file.
INFO:   [C:\pds4\geo\schema\BUG_Document_Set.xml] Begin processing.
SKIP:   [C:\pds4\geo\schema\BUG_Document_Set.xml] 'XML_Schema' is not an 
object type found in the policy file.
INFO:   [C:\pds4\geo\schema\Data_Dict_2010-04-22f.xml] Begin processing.
SKIP:   [C:\pds4\geo\schema\Data_Dict_2010-04-22f.xml] 'XML_Schema' is not 
an object type found in the policy file.
INFO:   [C:\pds4\geo\schema\Data_Dict_commpds3_2010-04-22f.xml] Begin processing.
SKIP:   [C:\pds4\geo\schema\Data_Dict_commpds3_2010-04-22f.xml] 'XML_Schema' 
is not an object type found in the policy file.
INFO:   [C:\pds4\geo\schema\Data_Types_2010-04-22f.xml] Begin processing.
SKIP:   [C:\pds4\geo\schema\Data_Types_2010-04-22f.xml] 'XML_Schema' is not an 
object type found in the policy file.
INFO:   [C:\pds4\geo\schema\Product_XML_Schema.xml] Begin processing.
SKIP:   [C:\pds4\geo\schema\Product_XML_Schema.xml] 'XML_Schema' is not an 
object type found in the policy file.
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_450.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_450.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_450::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_480.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_480.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_480::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_530.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_530.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_530::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_600.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_600.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_600::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_670.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_670.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_670::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_750.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_750.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_750::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_800.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_800.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_800::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_860.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_860.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_860::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_900.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_900.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_900::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_930.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_930.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_930::1.0
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_990.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_990.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_990::1.0
INFO:   [C:\pds4\geo\mars_analog_data\MAS_Data_Collection.xml] Begin processing.
SKIP:   [C:\pds4\geo\mars_analog_data\MAS_Data_Collection.xml] 'collection' is not 
an object type found in the policy file.
INFO:   [C:\pds4\geo\geometry\BUGLAB_Geometry_Collection.xml] Begin processing.
SKIP:   [C:\pds4\geo\geometry\BUGLAB_Geometry_Collection.xml] 'collection' is not 
an object type found in the policy file.
INFO:   [C:\pds4\geo\geometry\geominfo.xml] Begin processing.
SKIP:   [C:\pds4\geo\geometry\geominfo.xml] 'document_set' is not an object type 
found in the policy file.
INFO:   [C:\pds4\geo\context\BUGLAB_Context_Collection.xml] Begin processing.
SKIP:   [C:\pds4\geo\context\BUGLAB_Context_Collection.xml] 'collection' is not an 
object type found in the policy file.
INFO:   [C:\pds4\geo\context\bug_instrument.xml] Begin processing.
SKIP:   [C:\pds4\geo\context\bug_instrument.xml] 'document_set' is not an object 
type found in the policy file.
INFO:   [C:\pds4\geo\context\bug_laboratory.xml] Begin processing.
SKIP:   [C:\pds4\geo\context\bug_laboratory.xml] 'document_set' is not an object 
type found in the policy file.
INFO:   [C:\pds4\geo\context\bug_mars_data_set.xml] Begin processing.
SKIP:   [C:\pds4\geo\context\bug_mars_data_set.xml] 'document_set' is not an object 
type found in the policy file.
INFO:   [C:\pds4\geo\about\aareadme.xml] Begin processing.
SKIP:   [C:\pds4\geo\about\aareadme.xml] 'document_set' is not an object type found 
in the policy file.
INFO:   [C:\pds4\geo\about\BUGLAB_About_Collection.xml] Begin processing.
SKIP:   [C:\pds4\geo\about\BUGLAB_About_Collection.xml] 'collection' is not an object 
type found in the policy file.

Summary:

11 of 30 files are candidate products, 19 skipped
11 of 11 candidate products registered.
0 of 0 associations registered.

End of Log
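
Because each message starts with a severity level followed by a bracketed file name, a log can be tallied with a short script. The following Python sketch is an illustration only: the function name count_severities and the regular expression are assumptions based on the sample above, not a documented log grammar. Wrapped continuation lines and the summary lines do not match the pattern and are ignored.

```python
import re
from collections import Counter

# Assumed line shape, taken from the sample log: SEVERITY: [file] message
LOG_LINE = re.compile(r"^(?P<severity>[A-Z]+):\s+\[(?P<file>[^\]]+)\]\s*(?P<message>.*)$")

# A few lines copied from the sample log above.
SAMPLE_LOG = r"""
INFO:   [C:\pds4\geo\BUGLAB_Archive_Bundle.xml] Begin processing.
SKIP:   [C:\pds4\geo\BUGLAB_Archive_Bundle.xml] 'archive bundle' is not
an object type found in the policy file.
INFO:   [C:\pds4\geo\mars_analog_data\aref_235_450.xml] Begin processing.
SUCCESS:   [C:\pds4\geo\mars_analog_data\aref_235_450.xml] Succesfully registered product: \
URN:NASA:PDS:BUGLAB-GB:BUGLAB-GB:MARS-ANALOG-SAMPLE-DATA:AREF_235_450::1.0
"""


def count_severities(log_text: str) -> Counter:
    """Count log messages per severity level."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("severity")] += 1
    return counts


counts = count_severities(SAMPLE_LOG)
```

The SKIP and SUCCESS counts produced this way can be compared with the figures in the Summary portion of the log.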
      

Curator: Emily.S.Law
Webmaster: Maryia Sauchanka-Davis
NASA Official: William Knopf