This document describes how to operate the Harvest-PDAP Tool to ingest product metadata into the PDS Registry Service. It covers running the tool, configuring its policy file, and interpreting its report.
Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.
The Harvest-PDAP Tool can be executed in various ways. This section describes how to run the tool, as well as its behaviors and caveats.
The following table describes the command-line options available:
Command-Line Option | Description |
---|---|
-c, --config | Specify a policy configuration file to set the tool behavior. (This flag is required) |
-u, --username | Specify a username for authentication with the PDS Security Service. |
-p, --password | Specify a password associated with the username. |
-k, --keystore-pass | Specify a keystore password associated with the keystore file being passed into the tool. |
-l, --log-file | Specify a log file name. Default is standard out. |
-b, --batch-mode | Tells the tool to perform batch registration. Optionally specify an integer value that represents how many products to ingest at one time. The default is to register 50 products at a time if no value is specified. |
-v, --verbose | Specify the message severity level and above to include in the log (0=Debug, 1=Info, 2=Warning, 3=Error). Default is Info and above (level 1). |
-V, --version | Display the release number and copyright information. |
-h, --help | Display harvest usage. |
The Harvest-PDAP Tool uses a policy file to control how it registers product metadata. Details on how to create this policy file can be found in the Policy File section.
This section demonstrates some of the ways that the tool can be executed:
Registering Products from a Configured Source
The following command demonstrates how to register products to a non-secured registry instance from a source specified in the policy file and to direct the output to a log file:
% harvest-pdap -c policy.xml -l output.log
Registering Products to a Secured Registry Instance
The following command performs the same registration as the previous example, but against a secured registry instance:
% harvest-pdap -c policy.xml -u {username} -p {password} -k {keystorePassword} \
    -l output.log
The Harvest-PDAP policy file is an XML-based configuration file that the tool uses to find products and register their metadata. This section details how to set up the policy file for PDS product registration. The following is an example of a policy file that registers ESA/PSA products:
<policy>
  <pdsRegistry url="http://localhost:8080/registry">
    <packageName>Harvest-PDAP Package</packageName>
    <packageDescription>
      This is a run that includes registration of products from the PSA.
    </packageDescription>
  </pdsRegistry>
  <pdapServices>
    <!-- Currently, the only valid value for 'agency' is 'esa'. -->
    <!-- Can optionally specify a 'startDate' to only get back data sets from the given date. -->
    <pdapService agency="esa" url="https://archives.esac.esa.int/psa"/>
  </pdapServices>
  <productMetadata>
    <!-- All context product metadata definitions are captured in the global policy. -->
  </productMetadata>
  <resourceMetadata>
    <title>The Planetary Science Archive METADATA Query Service</title>
    <type>System.Browse</type>
    <slot name="resource_name">
      <value>The Planetary Science Archive METADATA Query Service</value>
    </slot>
    <slot name="resource_description">
      <value>The Planetary Science Archive METADATA Query Service</value>
    </slot>
  </resourceMetadata>
</policy>
The policy file is made up of the following complex type elements: pdsRegistry, pdapServices, productMetadata and resourceMetadata.
pdsRegistry
Each time the Harvest-PDAP Tool runs, it creates a package in the registry to group the product registrations together. Specify this element to give the registry package a name and/or description. The pdsRegistry element also has a required attribute, url, which must be set to the endpoint of the Registry Service. The following table describes the child elements that are allowed:
Element Name | Description |
---|---|
packageName | Specify a package name. If this element is not specified, the default is to create a package with the name Harvest-Package_<current datetimestamp>. |
packageDescription | Specify a package description. If this element is not specified, the default is to create a description that lists the targets that were specified in the policy config file. |
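When the defaults in the table above are acceptable, the child elements can be left out. The following fragment is a sketch that assumes the tool accepts a pdsRegistry element carrying only its required url attribute; the endpoint is the same localhost URL used in the example policy file and should be replaced with the address of your Registry Service.

```xml
<!-- Sketch: only the required url attribute is set. -->
<!-- packageName and packageDescription are omitted, so the defaults described above apply. -->
<pdsRegistry url="http://localhost:8080/registry"/>
```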
pdapServices
Specify this element to indicate the PDAP service endpoint for accessing product metadata for registration with the Registry Service. The following table describes the child elements that are allowed:
Element Name | Description |
---|---|
pdapService | Specify the PDAP service endpoint by populating the two required attributes. The attributes are named agency and url. The only valid value for agency at this time is "esa" with the corresponding url value of "https://archives.esac.esa.int/psa". |
An optional attribute, startDate, can be specified within the pdapService element to fine-tune the query. This attribute represents the date on which the data set was released to the public, and its value should use the format YYYY-MM-DD. As an example, the following specification will return ESA data sets starting from 2015-01-01:
<pdapService agency="esa" url="https://archives.esac.esa.int/psa" startDate="2015-01-01"/>
productMetadata
Specify this element to include additional metadata in the form of registry slots for every product. The following table describes the child elements that are allowed:
Element Name | Description |
---|---|
staticMetadata | Specify static metadata for every product. |
dynamicMetadata | Specify dynamic metadata for every product. |
staticMetadata
Specify this element to include static metadata for every product. The following table describes the child elements that are allowed:
Element Name | Description |
---|---|
slot | This element contains a required name attribute to specify the name of the slot to use in the registry. The value child element specifies the slot value; repeat the value element to assign multiple values to the slot. |
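As an illustration, the following fragment is a sketch of a productMetadata element that attaches one static slot with two values to every registered product. The slot name archive_source and its values are hypothetical, chosen only to show the structure described in the table above.

```xml
<productMetadata>
  <staticMetadata>
    <!-- Hypothetical slot name and values, for illustration only. -->
    <slot name="archive_source">
      <!-- Repeating <value> assigns multiple values to the same slot. -->
      <value>ESA PSA</value>
      <value>PDAP</value>
    </slot>
  </staticMetadata>
</productMetadata>
```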
dynamicMetadata
Specify this element to include dynamic metadata for every product. The following table describes the child elements that are allowed:
Element Name | Description |
---|---|
element | This element contains a required name attribute to specify the keyword in the target metadata. The slotName child element specifies the slot name to populate with the value from the keyword; repeat the slotName element to populate multiple slots. |
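Similarly, the following fragment sketches a dynamicMetadata element that copies the value of one keyword in each product's target metadata into two registry slots. The keyword DATA_SET_ID and the slot names data_set_id and pds3_data_set_id are hypothetical placeholders; confirm which keywords are actually present in the PDAP metadata before relying on them.

```xml
<productMetadata>
  <dynamicMetadata>
    <!-- Hypothetical keyword: its value is read from each product's target metadata. -->
    <element name="DATA_SET_ID">
      <!-- Hypothetical slot names: repeating <slotName> populates several slots with the same value. -->
      <slotName>data_set_id</slotName>
      <slotName>pds3_data_set_id</slotName>
    </element>
  </dynamicMetadata>
</productMetadata>
```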
resourceMetadata
Specify this element to include metadata for every resource product. A corresponding resource product is registered for every data set product registered. The following table describes the child elements that are allowed:
Element Name | Description |
---|---|
title | Specify a title for the resource product. |
type | Specify the type of resource product. The most common value is "System.Browse". |
slot | Specify additional metadata for the resource product. This element contains a required name attribute to specify the name of the slot to use in the registry. The value child element specifies the slot value; repeat the value element to assign multiple values to the slot. |
This section describes the contents of the Harvest-PDAP Tool report. At this time, the tool only outputs a series of log messages, which report whether each discovered product was registered successfully. Each log message consists of a severity level, the data set being processed, and a message. The following is an example of some of the log messages that can be expected from the Harvest-PDAP Tool:
PDS Harvest-PDAP Tool Log
Version                     Version 0.1.1
Time                        Sun, Jun 09 2013 at 01:40:00 PM
Severity Level              INFO
PDAP Target(s)              [https://archives.esac.esa.int/psa]
Registry Location           http://localhost:8080/registry-psa
Registry Package Name       Harvest-PDAP Package
Registration Package GUID   urn:uuid:c554631c-a353-4b0f-9643-4796a921d523
INFO: Connecting to PDAP Service: https://archives.esac.esa.int/psa
** AdaptiveByteStore default memory limit = 986M * 0.125 = 123M **
** malloc 2778306 bytes **
INFO: [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
  Processing dataset.
INFO: [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
  Additional metadata needed. Getting dataset catalog file.
INFO: [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
  Retrieving VOLDESC.CAT to look up the data set catalog file name.
INFO: [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
  Retrieving catalog file: DATASET.CAT
SUCCESS: [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
  Successfully ingested product: \
  urn:nasa:pds:...AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0::1.0
INFO: [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
  Product guid is urn:uuid:e7bba690-c946-4a80-b911-2a3a47fdde14
SUCCESS: [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
  Successfully ingested product: \
  urn:nasa:pds:...AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0::1.0
INFO: [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
  Product guid is urn:uuid:5147a7a6-dced-4402-8dd7-0b401c56b1f3
...
Summary:
  2712 dataset(s) processed
  64 error(s), 7290 warning(s)
  2712 of 2712 datasets registered.
  2712 of 2712 resources registered.
End of Log