Operation

This document describes how to operate the Harvest-PDAP Tool software to ingest product metadata into the PDS Registry Service. The following topics can be found in this document:

  • Tool Execution
  • Policy File
  • Report Format

Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.

Tool Execution

Harvest-PDAP Tool can be executed in various ways. This section describes how to run the tool, as well as its behaviors and caveats.

Command-Line Options

The following table describes the command-line options available:

Command-Line Option   Description
-c, --config          Specify a policy configuration file to set the tool behavior. (This flag is required.)
-u, --username        Specify a username for authentication with the PDS Security Service.
-p, --password        Specify a password associated with the username.
-k, --keystore-pass   Specify a keystore password associated with the keystore file being passed into the tool.
-l, --log-file        Specify a log file name. Default is standard out.
-b, --batch-mode      Tells the tool to perform batch registration. Optionally specify an integer value that represents how many products to ingest at one time. The default is to register 50 products at a time if no value is specified.
-v, --verbose         Specify the message severity level and above to include in the log (0=Debug, 1=Info, 2=Warning, 3=Error). Default is Info and above (level 1).
-V, --version         Display the release number and copyright information.
-h, --help            Display harvest usage.
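
When the tool is launched from a script, the options above can be assembled programmatically. The following is a minimal Python sketch, not part of the tool itself: the helper name build_harvest_command and its parameters are hypothetical, and it assumes that the batch size follows -b as a separate argument. Only the flags themselves are taken from the table.

```python
import shlex


def build_harvest_command(config, log_file=None, batch_size=None, verbose=None,
                          username=None, password=None, keystore_pass=None):
    """Assemble a harvest-pdap argument list from the documented flags.

    Hypothetical helper: it only arranges flags and does not run the tool.
    """
    if verbose is not None and verbose not in (0, 1, 2, 3):
        raise ValueError("verbose must be 0=Debug, 1=Info, 2=Warning, or 3=Error")
    if batch_size is not None and batch_size < 1:
        raise ValueError("batch_size must be a positive integer")

    args = ["harvest-pdap", "-c", config]  # -c is the only required flag
    if username is not None:
        args += ["-u", username]
    if password is not None:
        args += ["-p", password]
    if keystore_pass is not None:
        args += ["-k", keystore_pass]
    if batch_size is not None:
        # Assumption: the batch size follows -b as a separate argument.
        args += ["-b", str(batch_size)]
    if verbose is not None:
        args += ["-v", str(verbose)]
    if log_file is not None:
        args += ["-l", log_file]
    return args


command = build_harvest_command("policy.xml", log_file="output.log",
                                batch_size=100, verbose=2)
print(shlex.join(command))
```

With these inputs the sketch prints harvest-pdap -c policy.xml -b 100 -v 2 -l output.log. The resulting list could be handed to subprocess.run on a host where the tool is installed.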

Execute Harvest-PDAP Tool

The Harvest-PDAP Tool operates with a policy file to register product metadata. Details on how to create this policy file can be found in the Policy File section.

This section demonstrates some of the ways that the tool can be executed:

  • Registering Products from a Configured Source
  • Registering Products to a Secured Registry Instance

Registering Products from a Configured Source

The following command demonstrates how to register products to a non-secured registry instance from a source specified in the policy file and to direct the output to a log file:

% harvest-pdap -c policy.xml -l output.log
        

Registering Products to a Secured Registry Instance

The following command repeats the previous scenario, but registers products to a secured registry instance:

% harvest-pdap -c policy.xml -u {username} -p {password} -k {keystorePassword} \
-l output.log
        

Policy File

The Harvest-PDAP policy file is an XML-based configuration file that the tool uses to find products and register their metadata. This section details how to set up the policy file to do PDS product registration. The following is an example of a policy file to perform registration of ESA/PSA products:

<policy>
   <pdsRegistry url="http://localhost:8080/registry">
     <packageName>Harvest-PDAP Package</packageName>
     <packageDescription>
       This is a run that includes registration of products from the PSA.
     </packageDescription>
   </pdsRegistry>
   <pdapServices>
     <!-- Currently, the only valid value for 'agency' is 'esa'. -->
     <!-- Can optionally specify a 'startDate' to only get back data sets from the given date. -->
     <pdapService agency="esa" url="https://archives.esac.esa.int/psa"/>
   </pdapServices>
   <productMetadata>
     <!--
       All context product metadata definitions are captured in the
       global policy.
     -->
   </productMetadata>
   <resourceMetadata>
     <title>The Planetary Science Archive METADATA Query Service</title>
     <type>System.Browse</type>
     <slot name="resource_name">
       <value>The Planetary Science Archive METADATA Query Service</value>
     </slot>
     <slot name="resource_description">
       <value>The Planetary Science Archive METADATA Query Service</value>
     </slot>
   </resourceMetadata>
</policy>
      

The policy file is made up of the following complex type elements: pdsRegistry, pdapServices, productMetadata and resourceMetadata.

pdsRegistry

Each time the Harvest Tool runs, it creates a package in the registry to group the product registrations together. Specify this element to give a registry package a name and/or description. The pdsRegistry element has a required attribute named url, which must be populated with the endpoint of the Registry Service. The following table describes the child elements that are allowed:

Element Name         Description
packageName          Specify a package name. If this element is not specified, the default is to create a package with the name Harvest-Package_<current datetimestamp>.
packageDescription   Specify a package description. If this element is not specified, the default is to create a description that lists the targets that were specified in the policy config file.
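
As an illustration of how these settings are laid out, the following Python sketch reads the pdsRegistry element from a policy string. It is not part of the tool: the function name read_registry_settings is hypothetical, and treating a missing pdsRegistry element as an error is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def read_registry_settings(policy_xml):
    """Return the registry url, package name, and description from a policy string."""
    root = ET.fromstring(policy_xml.strip())
    registry = root.find("pdsRegistry")
    if registry is None:
        # Assumption: this sketch treats the element as mandatory.
        raise ValueError("policy has no pdsRegistry element")
    url = registry.get("url")
    if not url:
        raise ValueError("pdsRegistry requires a url attribute")
    # findtext returns None when an optional child is absent; in that case
    # the tool applies the defaults described in the table above.
    name = registry.findtext("packageName")
    description = registry.findtext("packageDescription")
    return {
        "url": url,
        "packageName": name.strip() if name else None,
        # Collapse the line breaks and indentation used inside the XML.
        "packageDescription": " ".join(description.split()) if description else None,
    }


policy = """
<policy>
  <pdsRegistry url="http://localhost:8080/registry">
    <packageName>Harvest-PDAP Package</packageName>
  </pdsRegistry>
</policy>
"""
settings = read_registry_settings(policy)
print(settings)
```

A value of None for packageName or packageDescription means that the element was omitted and the tool default applies.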

pdapServices

Specify this element to indicate the PDAP service endpoint for accessing product metadata for registration with the Registry Service. The following table describes the child elements that are allowed:

Element Name   Description
pdapService    Specify the PDAP service endpoint by populating the two required attributes. The attributes are named agency and url. The only valid value for agency at this time is "esa" with the corresponding url value of "https://archives.esac.esa.int/psa".

An optional attribute, startDate, can be specified within the pdapService element to fine-tune the query. This attribute represents the date at which the data set was released to the public. The format of the date value should be YYYY-MM-DD. As an example, the following specification will return ESA data sets starting from 2015-01-01:

<pdapService agency="esa" url="https://archives.esac.esa.int/psa" startDate="2015-01-01"/>
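
The attribute rules above can be checked before running the tool. The following Python sketch is a hedged illustration only: the function name check_pdap_service is hypothetical, and the tool's own validation may behave differently.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_pdap_service(element_xml):
    """Return a list of problems found in a single pdapService element."""
    service = ET.fromstring(element_xml)
    problems = []
    if service.get("agency") != "esa":
        problems.append('agency must be "esa"')
    if not service.get("url"):
        problems.append("url is required")
    start = service.get("startDate")  # optional attribute
    if start is not None:
        if not DATE_PATTERN.fullmatch(start):
            problems.append("startDate must use the form YYYY-MM-DD")
        else:
            try:
                date.fromisoformat(start)  # rejects dates such as 2015-02-30
            except ValueError:
                problems.append("startDate is not a valid calendar date")
    return problems


good = '<pdapService agency="esa" url="https://archives.esac.esa.int/psa" startDate="2015-01-01"/>'
bad = '<pdapService agency="esa" url="https://archives.esac.esa.int/psa" startDate="2015-1-1"/>'
print(check_pdap_service(good))
print(check_pdap_service(bad))
```

An empty list means that no problems were found. The second element is flagged because its startDate does not use two-digit month and day fields.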
      

productMetadata

Specify this element to include additional metadata in the form of registry slots for every product. The following table describes the child elements that are allowed:

Element Name      Description
staticMetadata    Specify static metadata for every product.
dynamicMetadata   Specify dynamic metadata for every product.

staticMetadata

Specify this element to include static metadata for every product. The following table describes the child elements that are allowed:

Element Name   Description
slot           This element contains a required name attribute to specify the name of the slot to use in the registry. The value child element specifies the slot value. This child element may be repeated multiple times to indicate multiple values.

dynamicMetadata

Specify this element to include dynamic metadata for every product. The following table describes the child elements that are allowed:

Element Name   Description
element        This element contains a required name attribute to specify the keyword in the target metadata. The slotName child element specifies the slot name to populate with the value from the keyword. This child element may be repeated multiple times to indicate multiple slots.
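
To make the nesting of staticMetadata and dynamicMetadata concrete, the following Python sketch parses a small productMetadata fragment built from the element descriptions above. The fragment is hypothetical: the slot name archive_source, its values, the keyword INSTRUMENT_ID, and the slot name instrument_id are illustrative and are not taken from a real policy. The function name summarize_product_metadata is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: slot names, keyword, and values are illustrative only.
FRAGMENT = """
<productMetadata>
  <staticMetadata>
    <slot name="archive_source">
      <value>PSA</value>
      <value>ESA</value>
    </slot>
  </staticMetadata>
  <dynamicMetadata>
    <element name="INSTRUMENT_ID">
      <slotName>instrument_id</slotName>
    </element>
  </dynamicMetadata>
</productMetadata>
"""


def summarize_product_metadata(xml_text):
    """Map static slot names to values and dynamic keywords to slot names."""
    root = ET.fromstring(xml_text.strip())
    # Each static slot carries one or more fixed values.
    static = {
        slot.get("name"): [value.text for value in slot.findall("value")]
        for slot in root.findall("staticMetadata/slot")
    }
    # Each dynamic element names a keyword and the slot(s) it populates.
    dynamic = {
        element.get("name"): [slot.text for slot in element.findall("slotName")]
        for element in root.findall("dynamicMetadata/element")
    }
    return static, dynamic


static, dynamic = summarize_product_metadata(FRAGMENT)
print(static)
print(dynamic)
```

In this fragment, every product would receive the same two archive_source values, while the instrument_id slot would be filled per product from the INSTRUMENT_ID keyword in the target metadata.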

resourceMetadata

Specify this element to include metadata for every resource product. A corresponding resource product is registered for every data set product registered. The following table describes the child elements that are allowed:

Element Name   Description
title          Specify a title for the resource product.
type           Specify the type of resource product. The most common value is "System.Browse".
slot           Specify additional metadata for the resource product. This element contains a required name attribute to specify the name of the slot to use in the registry. The value child element specifies the slot value. This child element may be repeated multiple times to indicate multiple values.

Report Format

This section describes the contents of the Harvest-PDAP Tool report. At this time, the tool only outputs a series of log messages. The log reports whether each discovered product was registered successfully. A log consists of a severity level, file name, and a message. The following is an example of some of the log messages that can be expected from the Harvest Tool:

PDS Harvest-PDAP Tool Log

Version                     Version 0.1.1
Time                        Sun, Jun 09 2013 at 01:40:00 PM
Severity Level              INFO
PDAP Target(s)              [https://archives.esac.esa.int/psa]
Registry Location           http://localhost:8080/registry-psa
Registry Package Name       Harvest-PDAP Package
Registration Package GUID   urn:uuid:c554631c-a353-4b0f-9643-4796a921d523

INFO:   Connecting to PDAP Service: https://archives.esac.esa.int/psa
** AdaptiveByteStore default memory limit = 986M * 0.125 = 123M **
** malloc 2778306 bytes **
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Processing dataset.
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Additional metadata needed. Getting dataset catalog file.
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Retrieving VOLDESC.CAT to look up the data set catalog file name.
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Retrieving catalog file: DATASET.CAT
SUCCESS:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Successfully ingested product: \
urn:nasa:pds:...AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0::1.0
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Product guid is urn:uuid:e7bba690-c946-4a80-b911-2a3a47fdde14
SUCCESS:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Successfully ingested product: \
urn:nasa:pds:...AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0::1.0
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Product guid is urn:uuid:5147a7a6-dced-4402-8dd7-0b401c56b1f3

...

Summary:

2712 dataset(s) processed
64 error(s), 7290 warning(s)

2712 of 2712 datasets registered.

2712 of 2712 resources registered.

End of Log
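
Because the report is plain text, the summary counts at the end of the log can be extracted with a short script. The following Python sketch assumes that the summary wording matches the example above exactly. The function name parse_summary and the dictionary keys are hypothetical.

```python
import re


def parse_summary(log_text):
    """Extract the counts from the summary block of a Harvest-PDAP log."""
    patterns = {
        "processed": r"(\d+) dataset\(s\) processed",
        "errors": r"(\d+) error\(s\)",
        "warnings": r"(\d+) warning\(s\)",
        "datasets_registered": r"(\d+) of \d+ datasets registered",
        "resources_registered": r"(\d+) of \d+ resources registered",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        # None marks a count that was not found in the log text.
        summary[key] = int(match.group(1)) if match else None
    return summary


sample = """Summary:

2712 dataset(s) processed
64 error(s), 7290 warning(s)

2712 of 2712 datasets registered.

2712 of 2712 resources registered.

End of Log
"""
print(parse_summary(sample))
```

A script could compare the errors count against zero to decide whether a run needs manual review.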