Operation

This document describes how to operate the Harvest-PDAP Tool software to ingest product metadata into the PDS Search Service. It covers the following topics: Tool Execution, Policy File, and Report Format.

Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.

Tool Execution

Harvest-PDAP Tool can be executed in various ways. This section describes how to run the tool, as well as its behaviors and caveats.

Command-Line Options

The following table describes the command-line options available:

Command-Line Option    Description
-c, --config           Specify a policy configuration file to set the tool behavior. (This flag is required.)
-l, --log-file         Specify a log file name. Default is standard out.
-v, --verbose          Specify the message severity level and above to include in the log (0=Debug, 1=Info, 2=Warning, 3=Error). Default is Info and above (level 1).
-V, --version          Display the release number and copyright information.
-h, --help             Display harvest usage.
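
As a hedged illustration of how the options above combine, the following Python sketch assembles an argument list for the tool. The helper name build_harvest_args and the file names are hypothetical; only the flags and severity levels come from the table. Passing the severity level as a separate numeric argument after -v is an assumption.

```python
from typing import List, Optional

# Severity levels as listed in the options table (0=Debug ... 3=Error).
SEVERITY_LEVELS = {"debug": 0, "info": 1, "warning": 2, "error": 3}


def build_harvest_args(config: str,
                       log_file: Optional[str] = None,
                       severity: str = "info") -> List[str]:
    """Assemble a harvest-pdap argument list (hypothetical helper)."""
    if severity not in SEVERITY_LEVELS:
        raise ValueError(f"unknown severity: {severity!r}")
    args = ["harvest-pdap", "-c", config]  # -c is the only required flag
    if log_file is not None:
        args += ["-l", log_file]  # without -l the log goes to standard out
    # Assumption: the level is passed as a separate numeric argument.
    args += ["-v", str(SEVERITY_LEVELS[severity])]
    return args


if __name__ == "__main__":
    # Print the assembled command as a single line.
    print(" ".join(build_harvest_args("policy.xml", "output.log", "warning")))
```

Building the command as a list rather than a single string avoids quoting problems if the list is later handed to a process launcher.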

Execute Harvest-PDAP Tool

The Harvest-PDAP Tool operates with a policy file to register product metadata. Details on how to create this policy file can be found in the Policy File section.

This section demonstrates some of the ways that the tool can be executed:

The following example demonstrates how to ingest the ESA datasets into the PDS Search Service:

%> harvest-pdap -c ../harvest-conf/examples/harvest-pdap-policy-esa-search.xml -C ../search-conf/defaults -l output.log
        

In the example above, the -c flag option specifies the example harvest policy configuration file, while the -C flag option specifies the location of the default search policy configuration files. The following command is a MAVEN-specific example:

The above command registers the full product label into a Solr collection index named .system, where it can be looked up by its Logical Identifier, with period (.) characters in place of colon (:) characters due to limitations in Solr. Additionally, Harvest writes the search index files for the target bundle into a solr-docs directory under the current working directory. In an environment where multiple bundles will be indexed, rename that directory and keep it in a location where it can be retrieved later, in case the Search Service needs to be re-indexed.
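
To make the identifier mapping concrete, here is a minimal Python sketch of the colon-to-period substitution described above. The function name and the sample identifier are illustrative assumptions and are not part of the tool.

```python
def lid_to_solr_id(lid: str) -> str:
    """Replace ':' with '.' as described for lookups in the .system collection."""
    return lid.replace(":", ".")


if __name__ == "__main__":
    # Hypothetical Logical Identifier used only for illustration.
    print(lid_to_solr_id("urn:nasa:pds:example_bundle"))
```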

Once the Harvest run is complete, use the Solr Post Tool to ingest the Search documents. Depending on the deployment setup of the Search Service, run the appropriate command below:

For Non-Dockerized Search Service Instances

%> $SOLR_HOME/bin/post -c pds -params tr="add-hierarchy.xsl" $HOME/harvest-pdap-2.0.0/bin/solr-docs
        

The above command assumes that you have SOLR_HOME defined in your environment. The last parameter assumes that the solr-docs directory was created in the harvest-pdap-2.0.0/bin directory.

For Dockerized Search Service Instances

%> docker exec -it search-service post -c pds -params "tr=add-hierarchy.xsl" /data/solr-docs
        

The /data/solr-docs path refers to a location inside the Search Service Docker container that is bind-mounted to the solr-docs directory on the host machine, so this path should always be passed to the Docker post command.

In both scenarios above, the Search Documents get ingested into a Solr collection named pds.
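
The two post commands above differ only in how the Post Tool is reached and which path is passed. A minimal Python sketch of that choice follows; the helper name build_post_command and its parameters are hypothetical, while the flags, the pds collection name, the search-service container name, and the /data/solr-docs path are taken from the commands above.

```python
import os
import shlex
from typing import List, Optional


def build_post_command(dockerized: bool,
                       solr_docs_dir: str = "solr-docs",
                       solr_home: Optional[str] = None) -> List[str]:
    """Return the Solr post command for the given deployment (hypothetical helper)."""
    params = "tr=add-hierarchy.xsl"
    if dockerized:
        # Inside the container the documents are always found at /data/solr-docs.
        return ["docker", "exec", "-it", "search-service",
                "post", "-c", "pds", "-params", params, "/data/solr-docs"]
    # Non-Dockerized: fall back to the SOLR_HOME environment variable.
    home = solr_home or os.environ["SOLR_HOME"]
    return [os.path.join(home, "bin", "post"),
            "-c", "pds", "-params", params, solr_docs_dir]


if __name__ == "__main__":
    # Print the Dockerized command as a single shell-quoted line.
    print(shlex.join(build_post_command(dockerized=True)))
```

The sketch only builds and prints the command; it does not run it, since the -it flags expect an interactive terminal.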

Policy File

The Harvest-PDAP policy file is an XML-based configuration file that the tool uses to find products and register their metadata. This section details how to set up the policy file for PDS product registration. The following is an example of a policy file that registers ESA/PSA products into the Search Service:

<policy>

   <pdsSearch url="http://localhost:8983/solr">
     <!-- Specify this element to point to the location of the Search Core configuration files.
          Default is to point to the harvest-pdap/search-conf/defaults location.
          
     <searchConfigDirectory>/path/to/search-conf</searchConfigDirectory>
     -->
     
     <!-- Specify this element to point to a location where Harvest-PDAP will write the Solr
          document files. These document files will be written to a directory called solr-docs.
          Default is to write to the current working directory.
          
     <solrDocsDirectory>/path/to/solr-docs</solrDocsDirectory>
     -->
   </pdsSearch>
   <pdapServices>
     <!-- Currently, the only valid value for 'agency' is 'esa'. -->
     <!-- Can optionally specify a 'startDate' to only get data sets from the given date. -->
     <pdapService agency="esa" url="https://archives.esac.esa.int/psa"/>
   </pdapServices>
   <resourceMetadata>
     <title>The Planetary Science Archive METADATA Query Service</title>
     <type>System.Browse</type>
     <slot name="resource_name">
       <value>The Planetary Science Archive METADATA Query Service</value>
     </slot>
     <slot name="resource_description">
       <value>The Planetary Science Archive METADATA Query Service</value>
     </slot>
   </resourceMetadata>
</policy>

      

The policy file is made up of the following complex type elements: pdsSearch, pdapServices, productMetadata and resourceMetadata.

pdsSearch

Specify this element to register products into the Search Service. The following table describes the child elements that are allowed:

Element Name             Description
searchConfigDirectory    Specify to point to the top level directory of the Search Core configuration files. If this is not specified, then the default is to point to the search-conf/ folder found in the Harvest-PDAP package.
solrDocsDirectory        Specify this to set the directory location to output the Solr Document files. If this is not specified, it will write these files to a solr-docs/ folder in the current working directory.

pdapServices

Specify this element to indicate the PDAP service endpoint for accessing product metadata for registration with the Registry Service. The following table describes the child elements that are allowed:

Element Name    Description
pdapService     Specify the PDAP service endpoint by populating the two required attributes. The attributes are named agency and url. The only valid value for agency at this time is "esa" with the corresponding url value of "https://archives.esac.esa.int/psa".

An optional attribute, startDate, can be specified within the pdapService element to fine-tune the query. This attribute represents the date on which the data set was released to the public. The format of the date value should be YYYY-MM-DD. As an example, the following specification will return ESA data sets starting from 2015-01-01:

<pdapService agency="esa" url="https://archives.esac.esa.int/psa" startDate="2015-01-01"/>
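
Because startDate must follow the YYYY-MM-DD format, it can be worth checking the value before placing it in a policy file. The following is a minimal sketch of such a check; the function name is hypothetical and the tool itself may validate the value differently.

```python
from datetime import datetime


def is_valid_start_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded values such as 2015-1-1, so compare
    # against the canonical zero-padded form to require the exact format.
    return parsed.strftime("%Y-%m-%d") == value


if __name__ == "__main__":
    for candidate in ("2015-01-01", "2015-1-1", "01/01/2015"):
        print(candidate, is_valid_start_date(candidate))
```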
      

resourceMetadata

Specify this element to include metadata for every resource product. A corresponding resource product is registered for every data set product registered. The following table describes the child elements that are allowed:

Element Name    Description
title           Specify a title for the resource product.
type            Specify the type of resource product. The most common value is "System.Browse".
slot            Specify additional metadata for the resource product. This element contains a required name attribute to specify the name of the slot to use in the registry. The value child element specifies the slot value. This child element may be repeated multiple times to indicate multiple values.
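
The elements described in this section can also be assembled programmatically. The sketch below uses Python's standard xml.etree.ElementTree module to build a policy document with the same element and attribute names as the example policy file; the function name build_policy, its parameters, and the slot values in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def build_policy(solr_url: str,
                 pdap_url: str,
                 title: str,
                 slots: Dict[str, List[str]],
                 start_date: Optional[str] = None) -> ET.Element:
    """Build a policy element tree mirroring the example policy file."""
    policy = ET.Element("policy")
    ET.SubElement(policy, "pdsSearch", {"url": solr_url})

    services = ET.SubElement(policy, "pdapServices")
    attrs = {"agency": "esa", "url": pdap_url}  # "esa" is the only valid agency
    if start_date is not None:
        attrs["startDate"] = start_date  # optional, YYYY-MM-DD
    ET.SubElement(services, "pdapService", attrs)

    resource = ET.SubElement(policy, "resourceMetadata")
    ET.SubElement(resource, "title").text = title
    ET.SubElement(resource, "type").text = "System.Browse"
    for name, values in slots.items():
        slot = ET.SubElement(resource, "slot", {"name": name})
        for value in values:  # value may repeat to give multiple values
            ET.SubElement(slot, "value").text = value
    return policy


if __name__ == "__main__":
    policy = build_policy(
        "http://localhost:8983/solr",
        "https://archives.esac.esa.int/psa",
        "The Planetary Science Archive METADATA Query Service",
        {"resource_name": ["Example value one", "Example value two"]},
        start_date="2015-01-01",
    )
    print(ET.tostring(policy, encoding="unicode"))
```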

Report Format

This section describes the contents of the Harvest-PDAP Tool report. At this time, the tool only outputs a series of log messages. The log reports the success or failure of each attempt to register a discovered product. Each log message consists of a severity level, file name, and a message. The following is an example of some of the log messages that can be expected from the Harvest-PDAP Tool:

PDS Harvest-PDAP Tool Log

Version                     Version 2.0.0-dev
Time                        Fri, Feb 22 2019 at 08:59:35 AM
Severity Level              INFO
PDAP Target(s)              [https://archives.esac.esa.int/psa]
Search Configuration        /Users/mcayanan/harvest-pdap-2.0.0-dev/search-conf
Solr Docs Output Directory  /Users/mcayanan/harvest-pdap-2.0.0-dev/bin/solr-docs
INFO:   Connecting to PDAP Service: https://archives.esac.esa.int/psa
******* AdaptiveByteStore default memory limit = 2048M * 0.125 = 256M ************
******* malloc 4861928 bytes ************
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Processing dataset.
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Additional metadata needed. Getting dataset catalog file.
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Attempting to retrieve the catalog file using the file name: DATASET.CAT
INFO:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Successfully parsed file: \
https://archives.esac.esa.int/psa/pdap/fileaccess?ID=EARTH/AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0/CATALOG/DATASET.CAT
SUCCESS:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Successfully created Solr Document for dataset.
SUCCESS:   [AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0] \
Successfully created Solr Document for resource.
INFO:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Processing dataset.
INFO:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Additional metadata needed. Getting dataset catalog file.
INFO:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Attempting to retrieve the catalog file using the file name: DATASET.CAT
INFO:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Could not retrieve catalog file using the file name DATASET.CAT
INFO:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Retrieving VOLDESC.CAT to look up the data set catalog file name.
INFO:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Successfully parsed file: \
https://archives.esac.esa.int/psa/pdap/fileaccess?ID=EARTH/ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0/VOLDESC.CAT
INFO:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Retrieved the catalog file name from the VOLDESC.CAT: ESO_C_DFOSC_3_RSA_WIRT_DS.CAT
INFO:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Retrieving the catalog file 'ESO_C_DFOSC_3_RSA_WIRT_DS.CAT'
INFO:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Successfully parsed file: \
https://archives.esac.esa.int/psa/pdap/fileaccess?ID=EARTH/ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0/CATALOG/ESO_C_DFOSC_3_RSA_WIRT_DS.CAT
SUCCESS:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Successfully created Solr Document for dataset.
SUCCESS:   [ESO-C-DFOSC-3-RSA-WIRTANEN-IMG-V1.0] \
Successfully created Solr Document for resource.
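
A log like the one above can be summarized per data set with a short script. The following Python sketch counts messages by severity for each data set identifier. It assumes the line layout shown in the example (a severity label, a colon, and an identifier in square brackets) and treats a trailing backslash as a line continuation; the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Iterator

# Matches lines such as "INFO:   [DATASET-ID] message text".
LINE_RE = re.compile(
    r"^(?P<severity>[A-Z]+):\s+\[(?P<dataset>[^\]]+)\]\s*(?P<message>.*)$"
)


def join_continuations(lines: Iterable[str]) -> Iterator[str]:
    """Merge lines ending in a backslash with the line that follows."""
    buffer = ""
    for raw in lines:
        line = raw.rstrip("\n").rstrip()
        if line.endswith("\\"):
            buffer += line[:-1]  # drop the backslash, keep the text
        else:
            yield buffer + line
            buffer = ""
    if buffer:
        yield buffer


def summarize(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count log messages per data set identifier and severity."""
    summary: Dict[str, Counter] = {}
    for line in join_continuations(lines):
        match = LINE_RE.match(line)
        if match:  # header lines without an identifier are skipped
            counts = summary.setdefault(match["dataset"], Counter())
            counts[match["severity"]] += 1
    return summary


if __name__ == "__main__":
    sample = [
        "INFO:   [EXAMPLE-DATASET-V1.0] \\",
        "Processing dataset.",
        "SUCCESS:   [EXAMPLE-DATASET-V1.0] \\",
        "Successfully created Solr Document for dataset.",
    ]
    for dataset, counts in summarize(sample).items():
        print(dataset, dict(counts))
```

A data set with no SUCCESS entries in the summary is a reasonable candidate for closer inspection of its log messages.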