Operation (PDS4)

This document describes how to operate the Harvest Tool software for PDS4 data product registration. It covers the following topics: Quick Start, Command-Line Options, Advanced Usage, and Harvest Policy File.

Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.

Quick Start

This section is intended to give a quick and easy way to set up the Harvest policy configuration file and run the tool. For a more detailed explanation on other ways to set up the policy configuration file as well as other ways of running the tool, go to the Harvest Policy File and Advanced Usage sections.

Setup

Included in the Harvest package is an example policy configuration file for PDS4 product registration. Go to the examples/ folder, make a copy of the harvest-policy-example.xml file and modify it as necessary. The following breaks down the policy file example and indicates what modifications are to be performed.

Registry Package

<registryPackage>
  <name>Harvest Package Example Bundle Run</name>
  <description>This is a Harvest run of the example bundle.</description>
</registryPackage>
        

Specify a Registry package name and description. This allows the Registry to associate registered products of a Harvest run to a particular package.

Collections

<collections>
  <file>$HOME/dph_example_archive_VG2PLS/browse/Collection_browse.xml</file>
  <file>$HOME/dph_example_archive_VG2PLS/context/Collection_context.xml</file>
  <file>$HOME/dph_example_archive_VG2PLS/data/Collection_data.xml</file>
  <file>$HOME/dph_example_archive_VG2PLS/document/Collection_document.xml</file>
  <file>$HOME/dph_example_archive_VG2PLS/xml_schema/Collection_xml_schema.xml</file>
</collections>
        

Specify collection products here. This allows Harvest to distinguish between primary and secondary members of a collection while traversing a target directory.

Directories

<directories>
  <path>$HOME/dph_example_archive_VG2PLS</path>
  <fileFilter>
    <include>*.xml</include>
  </fileFilter>
</directories>
        

Specify the top level directory for Harvest to crawl for products to register.

Access URLs

<accessUrls registerFileUrls="false">
  <accessUrl>
    <baseUrl>http://starbase.jpl.nasa.gov</baseUrl>
    <offset>$HOME</offset>
  </accessUrl>
</accessUrls>
        

Specify the base URL of where the physical data products are located. This allows Harvest to provide links to the physical data products in the slots of each registered product in the Registry.

Checksums

<checksums generate="true">
  <manifest>$HOME/dph_example_archive_VG2PLS/bundle_checksums.txt</manifest>
</checksums>
        

Specify a Checksum Manifest file. With this configuration, Harvest will generate checksums for each file object to be registered and compare against the supplied checksums in the data product label as well as the manifest file.

Candidate Products

<candidates>

...

  <productMetadata objectType="Product_Browse">
    <xPath slotName="information_model_version">
      //Identification_Area/information_model_version
    </xPath>

...

</candidates>
        

Specify any additional product types for Harvest to register here. Note that Harvest reads in an internal global policy config which already includes the following product types:

  • Product_Attribute_Definition
  • Product_Bundle
  • Product_Class_Definition
  • Product_Collection
  • Product_Context
  • Product_Data_Set_PDS3
  • Product_Instrument_Host_PDS3
  • Product_Instrument_PDS3
  • Product_Mission_PDS3
  • Product_Subscription_PDS3
  • Product_Target_PDS3
  • Product_Volume_PDS3
  • Product_Volume_Set_PDS3

Additionally, the candidates section in this policy config example already specifies the following product types:

  • Product_Browse
  • Product_Document
  • Product_File_Text
  • Product_Observational
  • Product_XML_Schema

For the Product_Observational area:

<productMetadata objectType="Product_Observational">
  <!-- Identification_Area -->
  <xPath slotName="information_model_version">
    //Identification_Area/information_model_version
  </xPath>
  <xPath slotName="product_class">
    //Identification_Area/product_class
  </xPath>
  <xPath slotName="alternate_id">
    //Identification_Area/Alias_List/Alias/alternate_id
  </xPath>
  <xPath slotName="alternate_title">
    //Identification_Area/Alias_List/Alias/alternate_title
  </xPath>
  <xPath slotName="citation_author_list">
    //Identification_Area/Citation_Information/author_list
  </xPath>
...
</productMetadata>
        

Add XPaths, each with a meaningful slot name, for the <Discipline_Area> and <Mission_Area> metadata of interest to be included in every Product_Observational registration.

Note that the XPaths already defined in the example policy should not be modified, as they allow the Registry to be populated with a consistent set of metadata for every product that is registered.

Namespaces

It is highly encouraged to define namespaces in the policy when specifying XPaths to metadata that reside in a namespace other than the default PDS namespace. Doing so will make the policy more readable.

As an example, consider a PDS4 data product label that contains metadata within the <Discipline_Area> and <Mission_Area> sections that reside in namespaces other than the default PDS namespace. To extract metadata from these sections, do the following:

Set the namespaces in the candidates section of the policy config file:

<candidates>
  <namespace prefix="disp" uri="http://pds.nasa.gov/pds4/disp/v1"/>
  <namespace prefix="sp" uri="http://pds.nasa.gov/pds4/sp/v1"/>
  <namespace prefix="geom" uri="http://pds.nasa.gov/pds4/geom/v0"/>
  <namespace prefix="sbn" uri="http://pds.nasa.gov/pds4/sbn/v0"/>
  <namespace prefix="epoxi" uri="http://pds.nasa.gov/pds4/mission/epoxi/v0"/>

...

</candidates>
        

Then add the additional XPaths to extract metadata within the <Discipline_Area> and <Mission_Area> sections from Product_Observational products:

  <productMetadata objectType="Product_Observational">

...

    <!-- Mission_Area metadata -->
    <xPath slotName="spacecraft_clock_start_count">
      //Mission_Area/epoxi:EPOXI_Attributes/epoxi:Observation_Parameters/epoxi:spacecraft_clock_start_count
    </xPath>
    <xPath slotName="spacecraft_clock_stop_count">
      //Mission_Area/epoxi:EPOXI_Attributes/epoxi:Observation_Parameters/epoxi:spacecraft_clock_stop_count
    </xPath>
    <xPath slotName="total_integration_time">
      //Mission_Area/epoxi:EPOXI_Attributes/epoxi:Observation_Parameters/epoxi:total_integration_time
    </xPath>

    <!-- Discipline_Area metadata -->
    <xPath slotName="start_julian_date">
      //Discipline_Area/sbn:Times/sbn:start_Julian_date
    </xPath>
    <xPath slotName="display_settings_local_reference_type">
      //Discipline_Area/disp:Display_Settings/disp:Local_Internal_Reference/disp:local_reference_type
    </xPath>
    <xPath slotName="special_characteristics_local_identifier_reference">
      //Discipline_Area/sp:Spectral_Characteristics/sp:local_identifier_reference
    </xPath>
    <xPath slotName="geometry_vertical_display_direction">
      //Discipline_Area/geom:Geometry/geom:Image_Display_Geometry/geom:Display_Direction/disp:vertical_display_direction
    </xPath>
  </productMetadata>

</candidates>
        

Note that the XPaths are utilizing the prefix values from the namespaces defined in the policy.
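
To illustrate how a prefix declared in the policy maps onto elements in a label, the following minimal Python sketch resolves a prefixed path with the standard library. It is not how Harvest itself evaluates XPaths. The trimmed label, the "pds" prefix for the default namespace, and the extract helper are assumptions made for this example; Python's ElementTree also requires a relative path (.//) and a prefix for default-namespace elements, unlike the policy file.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map mirroring one <namespace> entry from the policy.
# The "pds" prefix and its URI stand in for the default namespace (assumption).
NAMESPACES = {
    "pds": "http://pds.nasa.gov/pds4/pds/v1",
    "sbn": "http://pds.nasa.gov/pds4/sbn/v0",
}

# Hypothetical, heavily trimmed label used only for this illustration.
LABEL = """
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1"
                       xmlns:sbn="http://pds.nasa.gov/pds4/sbn/v0">
  <Observation_Area>
    <Discipline_Area>
      <sbn:Times>
        <sbn:start_Julian_date>2455505.5</sbn:start_Julian_date>
      </sbn:Times>
    </Discipline_Area>
  </Observation_Area>
</Product_Observational>
""".strip()


def extract(root, path):
    """Return the text of every element matched by a prefixed path."""
    return [el.text.strip() for el in root.findall(path, NAMESPACES) if el.text]


root = ET.fromstring(LABEL)
values = extract(root, ".//pds:Discipline_Area/sbn:Times/sbn:start_Julian_date")
print(values)
```

The prefix in the path only has to match the prefix in the namespace map; it is the URI that ties the path to the elements in the label.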

Running The Tool

The following command demonstrates the recommended way to run Harvest:

% harvest -c harvest-policy-example.xml -l harvest.log
        

The -c flag option specifies the policy configuration file while the -l flag option specifies the log file for the Harvest output.

For large registrations (e.g., a million products), it is recommended to run Harvest with the batch mode flag for faster performance:

% harvest -c harvest-policy-example.xml -l harvest.log -b
        

Command-Line Options

The following table describes the command-line options available:

Command-Line Option   Description
-c, --config          Specify a policy configuration file to set the tool behavior. (This flag is required.)
-u, --username        Specify a username for authentication with the PDS Security Service.
-p, --password        Specify a password associated with the username.
-k, --keystore-pass   Specify a keystore password associated with the keystore file being passed into the tool.
-l, --log-file        Specify a log file name. Default is standard out.
-b, --batch-mode      Tells the tool to perform batch registration. Optionally specify an integer value that represents how many products to ingest at one time. The default is to register 50 products at a time if no value is specified.
-v, --verbose         Specify the message severity level and above to include in the log (0=Debug, 1=Info, 2=Warning, 3=Error). Default is Info and above (level 1).
-e, --regexp          Specify file patterns to look for when registering products from a target directory. Each pattern must be surrounded by quotes (e.g., -e "*.xml").
-D, --ignore-dir      Specify patterns to look for when traversing a target directory for sub-directories to ignore. Each pattern must be surrounded by quotes (e.g., -D "CATALOG").
-P, --port            Specify a port number to use if running the tool in persistence mode. See the Persistence Mode section for more details.
-w, --wait            Specify the time, in seconds, to wait in between the crawls if running the tool in persistence mode. See the Persistence Mode section for more details.
-V, --version         Display the release number and copyright information.
-h, --help            Display harvest usage.

Advanced Usage

This section describes more advanced ways to run the tool, as well as its behaviors and caveats.

Tool Execution

The Harvest Tool operates with a policy file to register product metadata. Details on how to create this policy file can be found in the Harvest Policy File section.

This section demonstrates some of the other ways that the tool can be executed:

  • Registering Products From a Single Target
  • Registering Products From Multiple Targets
  • Registering Products From Targets Specified In The Policy File
  • Registering a Million Products
  • Registering Products To A Secured Registry Instance
  • Excluding Sub-Directories To Traverse From a Target

Registering Products From a Single Target

The following command demonstrates how to register products to a non-secured registry instance from a target directory, $HOME/directory, using the policy file, policy.xml. Only files that end with a .xml file extension will be processed:

% harvest $HOME/directory -e "*.xml" -c policy.xml
        

Registering Products From Multiple Targets

The following command demonstrates how to register products to a non-secured registry instance from two target directories, $HOME/directory1 and $HOME/directory2, using the policy file, policy.xml. Only files that end with a .xml file extension will be processed. The output will go to a log file, log-file.txt:

% harvest $HOME/directory1, $HOME/directory2 -e "*.xml" -c policy.xml -l log-file.txt
        

Registering Products From Targets Specified In The Policy File

Targets can be specified either on the command-line or in the policy file. Any targets specified on the command-line will override any targets specified in the policy file. The following command demonstrates registering products based on targets specified in the policy file, policy.xml:

% harvest -c policy.xml
        

Registering a Million Products

When registering a million products in a single run, it is recommended to run Harvest with the -b, --batch-mode flag option for faster performance. The following command demonstrates running Harvest in batch mode:

% harvest -c policy.xml -b
        

In batch mode, the default is to register 50 products at a time. To modify this behavior, specify an integer value after the -b, --batch-mode flag. The following command demonstrates running Harvest in batch mode, where it will register 100 products at a time.

% harvest -c policy.xml -b 100
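
The effect of the batch size can be pictured with a short Python sketch that groups a list of products into batches of a given size. This is only an illustration of the grouping described above, not Harvest's implementation; the batches function and the product file names are hypothetical.

```python
from typing import Iterator, List, Sequence


def batches(products: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` products (50 mirrors the -b default)."""
    if size < 1:
        raise ValueError("batch size must be a positive integer")
    for start in range(0, len(products), size):
        yield list(products[start:start + size])


# Hypothetical product list used only for this illustration.
products = [f"product_{i}.xml" for i in range(120)]
sizes = [len(batch) for batch in batches(products)]
print(sizes)
```

With 120 products, the default size yields two full batches and one partial batch, while a size of 100 yields one full batch and one of 20.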
        

Registering Products To A Secured Registry Instance

The following command demonstrates how to register products to a secured registry instance from a target directory, $HOME/directory, using the policy file, policy.xml:

% harvest $HOME/directory -c policy.xml -u {username} -p {password} \
  -k {keystorePassword}
        

Excluding Sub-Directories To Traverse From a Target

The following command demonstrates registering products from a target directory, $HOME/directory, where the tool will not traverse the sub-directory CONTEXT:

% harvest $HOME/directory -c policy.xml -D "CONTEXT"
        

Persistence Mode

The Harvest Tool can be run in persistence mode through an XML-RPC accessible web service called a daemon. Under this scenario, the Harvest Tool wakes up periodically, inspects a target directory or directories, and registers the latest products. This section details how to set this up.

In order to run the tool through the daemon, the command-line option flags -P and -w need to be used. This tells the Harvest Tool the port number to use and how long to sleep in between crawls, respectively. When the daemon is running, it can be accessed through the following url: http://localhost:{port number}/xmlrpc. The following command demonstrates launching the Harvest Tool through the daemon on port 9001, where it will wait 120 seconds in between crawls:

% harvest -c policy.xml -u {username} -p {password} \
  -k {keystorePassword} -l log.txt -P 9001 -w 120
        

After running the above command, the daemon will be accessible at http://localhost:9001/xmlrpc.

In order to stop the daemon from running, a daemon controller is needed. The bin/ directory of the Harvest Tool release package contains a shell script, harvest-ctrl, and a batch file, harvest-ctrl.bat, which are used to gracefully shut down the daemon service on a UNIX-like and Windows system, respectively. In addition, they can provide a few additional statistics about the crawling.

The following table describes the command-line options available for harvest-ctrl:

Command-Line Option   Description
--url                 Specify the URL of the daemon service running the Harvest Tool.
--operation           Specify a single operation to perform. List of valid operations can be found in the next table.

The following table describes the operation names available to pass with the --operation command-line option:

Operation Option        Description
--stop                  Specify this operation to shut down the daemon service.
--isRunning             Gives an indication whether the daemon service is running.
--getNumCrawls          Returns the number of crawls that have occurred.
--getWaitInterval       Returns the time, in seconds, that the crawler has to wait in between crawls.
--getMilisCrawling      Returns the number of milliseconds spent crawling.
--getAverageCrawlTime   Returns the average amount of time, in milliseconds, spent during each crawl.

The following examples demonstrate how to run harvest-ctrl using a few of the different operations. For demonstration purposes, assume that the daemon service is located at the following url: http://localhost:9001/xmlrpc.

Determine the Status of the Daemon Service

The following command is used to find out if the daemon service is still running:

% harvest-ctrl --url http://localhost:9001/xmlrpc --operation --isRunning
        

Shut Down the Daemon Service

The following command demonstrates shutting down the daemon service:

% harvest-ctrl --url http://localhost:9001/xmlrpc --operation --stop
        

Harvest Policy File

The Harvest policy file is an XML-based configuration file that the tool uses to find products and register their metadata. The schema for the policy file can be found in the Harvest Policy Schema section. This section describes the valid elements that are available to set up the policy file for PDS4 data product registration.

PDS4 Data Product Registration

The following is an example of a policy file to perform registration of PDS4 data products:

<policy>
  <registryPackage>
    <name>Harvest Package Example</name>
    <description>This is an example of a Harvest run.</description>
  </registryPackage>

  <collections>
    <file>$HOME/VG2PLS_archive/data/Collection_Data.xml</file>
    <file>$HOME/VG2PLS_archive/document/Collection_document.xml</file>
  </collections>

  <directories>
    <path>$HOME/VG2PLS_archive</path>
    <fileFilter>
      <include>*.xml</include>
    </fileFilter>
    <directoryFilter>
      <exclude>CONTEXT</exclude>
    </directoryFilter>
  </directories>

  <accessUrls registerFileUrls="true">
    <accessUrl>
      <baseUrl>http://pds.nasa.gov/pds4</baseUrl>
      <offset>$HOME</offset>
    </accessUrl>
  </accessUrls>

  <checksums generate="true">
    <manifest>$HOME/VG2PLS_archive/vg2pls_archive.md5</manifest>
  </checksums>

  <storageIngestion>
    <serverUrl>http://localhost:9000</serverUrl>
  </storageIngestion>

  <candidates>
    <namespace prefix="dph" uri="http://pds.nasa.gov/schema/pds4/dph/v01"/>

    <productMetadata objectType="Product_Document">
      <xPath slotName="information_model_version">
        //Identification_Area/information_model_version
      </xPath>
      <xPath slotName="product_class">
        //Identification_Area/product_class
      </xPath>
      <xPath slotName="alternate_id">
        //Identification_Area/Alias_List/Alias/alternate_id
      </xPath>
      <xPath slotName="alternate_title">
        //Identification_Area/Alias_List/Alias/alternate_title
      </xPath>
      <xPath slotName="citation_author_list">
        //Identification_Area/Citation_Information/author_list
      </xPath>
      <xPath slotName="citation_editor_list">
        //Identification_Area/Citation_Information/editor_list
      </xPath>
      <xPath slotName="citation_publication_year">
        //Identification_Area/Citation_Information/publication_year
      </xPath>
      <xPath slotName="citation_keywords">
        //Identification_Area/Citation_Information/keywords
      </xPath>
      <xPath slotName="citation_description">
        //Identification_Area/Citation_Information/description
      </xPath>
      <xPath slotName="modification_date">
        //Identification_Area/Modification_History/Modification_Detail/modification_date
      </xPath>
      <xPath slotName="modification_version_id">
        //Identification_Area/Modification_History/Modification_Detail/version_id
      </xPath>
      <xPath slotName="modification_description">
        //Identification_Area/Modification_History/Modification_Detail/description
      </xPath>
      <xPath slotName="external_reference_description">
        //Reference_List/External_Reference/description
      </xPath>
      <xPath slotName="document_revision_id">
        //Document_Description/revision_id
      </xPath>
      <xPath slotName="document_name">
        //Document_Description/document_name
      </xPath>
      <xPath slotName="document_doi">
        //Document_Description/doi
      </xPath>
      <xPath slotName="document_author_list">
        //Document_Description/author_list
      </xPath>
      <xPath slotName="document_editor_list">
        //Document_Description/editor_list
      </xPath>
      <xPath slotName="document_acknowledgement_text">
        //Document_Description/acknowledgement_text
      </xPath>
      <xPath slotName="document_copyright">
        //Document_Description/copyright
      </xPath>
      <xPath slotName="document_description">
        //Document_Description/description
      </xPath>
      <xPath slotName="document_publication_date">
        //Document_Description/publication_date
      </xPath>
    </productMetadata>

    <productMetadata objectType="Product_Observational">
      <xPath slotName="information_model_version">
        //Identification_Area/information_model_version
      </xPath>
      <xPath slotName="product_class">
        //Identification_Area/product_class
      </xPath>
      <xPath slotName="alternate_id">
        //Identification_Area/Alias_List/Alias/alternate_id
      </xPath>
      <xPath slotName="alternate_title">
        //Identification_Area/Alias_List/Alias/alternate_title
      </xPath>
      <xPath slotName="citation_author_list">
        //Identification_Area/Citation_Information/author_list
      </xPath>
      <xPath slotName="citation_editor_list">
        //Identification_Area/Citation_Information/editor_list
      </xPath>
      <xPath slotName="citation_publication_year">
        //Identification_Area/Citation_Information/publication_year
      </xPath>
      <xPath slotName="citation_keywords">
        //Identification_Area/Citation_Information/keywords
      </xPath>
      <xPath slotName="citation_description">
        //Identification_Area/Citation_Information/description
      </xPath>
      <xPath slotName="modification_date">
        //Identification_Area/Modification_History/Modification_Detail/modification_date
      </xPath>
      <xPath slotName="modification_version_id">
        //Identification_Area/Modification_History/Modification_Detail/version_id
      </xPath>
      <xPath slotName="modification_description">
        //Identification_Area/Modification_History/Modification_Detail/description
      </xPath>
      <xPath slotName="observation_comment">
        //Observation_Area/comment
      </xPath>
      <xPath slotName="observation_start_date_time">
        //Observation_Area/Time_Coordinates/start_date_time
      </xPath>
      <xPath slotName="observation_stop_date_time">
        //Observation_Area/Time_Coordinates/stop_date_time
      </xPath>
      <xPath slotName="observation_local_mean_solar_time">
        //Observation_Area/Time_Coordinates/local_mean_solar_time
      </xPath>
      <xPath slotName="observation_local_true_solar_time">
        //Observation_Area/Time_Coordinates/local_true_solar_time
      </xPath>
      <xPath slotName="observation_solar_longitute">
        //Observation_Area/Time_Coordinates/solar_longitude
      </xPath>
      <xPath slotName="primary_result_type">
        //Observation_Area/Primary_Result_Description/type
      </xPath>
      <xPath slotName="primary_result_purpose">
        //Observation_Area/Primary_Result_Description/purpose
      </xPath>
      <xPath slotName="primary_result_data_regime">
        //Observation_Area/Primary_Result_Description/data_regime
      </xPath>
      <xPath slotName="primary_result_reduction_level">
        //Observation_Area/Primary_Result_Description/reduction_level
      </xPath>
      <xPath slotName="primary_result_description">
        //Observation_Area/Primary_Result_Description/description
      </xPath>
      <xPath slotName="investigation_name">
        //Observation_Area/Investigation_Area/name
      </xPath>
      <xPath slotName="investigation_type">
        //Observation_Area/Investigation_Area/type
      </xPath>
      <xPath slotName="observing_system_name">
        //Observation_Area/Observing_System/name
      </xPath>
      <xPath slotName="observing_system_description">
        //Observation_Area/Observing_System/description
      </xPath>
      <xPath slotName="observing_system_component_name">
        //Observation_Area/Observing_System/Observing_System_Component/name
      </xPath>
      <xPath slotName="observing_system_component_description">
        //Observation_Area/Observing_System/Observing_System_Component/description
      </xPath>
      <xPath slotName="observing_system_component_type">
        //Observation_Area/Observing_System/Observing_System_Component/observing_system_component_type
      </xPath>
      <xPath slotName="target_name">
        //Observation_Area/Target_Identification/name
      </xPath>
      <xPath slotName="target_alternate_designation">
        //Observation_Area/Target_Identification/alternate_designation
      </xPath>
      <xPath slotName="target_type">
        //Observation_Area/Target_Identification/type
      </xPath>
      <xPath slotName="target_description">
        //Observation_Area/Target_Identification/description
      </xPath>
      <xPath slotName="spacecraft_clock_start_count">
        //Observation_Area/Mission_Area/dph:spacecraft_clock_start_count
      </xPath>
      <xPath slotName="spacecraft_clock_stop_count">
        //Observation_Area/Mission_Area/dph:spacecraft_clock_stop_count
      </xPath>
      <xPath slotName="external_reference_description">
        //Reference_List/External_Reference/description
      </xPath>
    </productMetadata>

  </candidates>
</policy>
       

The policy file is made up of the following complex type elements: registryPackage, collections, directories, checksums, storageIngestion, accessUrls, candidates, and productMetadata.

registryPackage

Each time the Harvest Tool runs, it creates a package in the registry to group the product registrations together. Specify this element to give a registry package a name and/or description. The following table describes the elements that are allowed:

Element Name   Description
name           Specify a package name. If this element is not specified, the default is to create a package with the name Harvest-Package_<current datetimestamp>.
description    Specify a package description. If this element is not specified, the default is to create a description that lists the targets that were specified in the policy config file.

collections

Specify this element to tell the Harvest Tool to register the collections first before crawling a target directory. This is required if the target directory contains collections that are co-located with their members, so that the tool can distinguish primary from secondary members.

The following table describes the elements that are allowed:

Element Name   Description
file           Specify a collection file. Specify this element tag more than once to point to multiple collection files.

In the example above, the Harvest Tool will register the following collections before crawling the target directory:

  • $HOME/VG2PLS_archive/data/Collection_Data.xml
  • $HOME/VG2PLS_archive/document/Collection_document.xml

Once these collections are registered, the primary and secondary members are cached in memory. As the Harvest Tool crawls through a target directory, any secondary members will be identified and will not be registered. In addition, a SKIP message will be issued in the log report to indicate that the tool has identified a non-primary member.

In the case where the target directory has a hierarchical structure in which the collection product is located one level above its members, much like the PDS context bundle, there is no need to specify the collections in the Harvest policy config file. Under this scenario, the collections will be registered first before the Harvest Tool traverses down the sub-directory containing the members.

directories

Specify this element to tell the Harvest Tool where to crawl for data products. The following table describes the elements that are allowed:

Element Name      Description
path              Specify a directory path to start crawling. Specify this element tag more than once to point to multiple directories to crawl.
fileFilter        Specify one or more include elements, where each element value contains a pattern to look for when crawling a target directory for files to register. If omitted, the default is to get all files within a directory.
directoryFilter   Specify one or more exclude elements, where each element value contains a pattern to look for when crawling a target directory for sub-directories to ignore.

In the example above, the Harvest Tool will crawl the directory location, $HOME/VG2PLS_archive, looking for files that have a .xml file extension. The default is to touch all files in the directory if the fileFilter element is omitted from the policy file. In addition, the CONTEXT directory will be ignored while traversing the target directory.
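
The combined effect of the include and exclude patterns can be sketched in Python as follows. This is a rough illustration rather than Harvest's crawler: the crawl function is hypothetical, and it assumes that the patterns behave like shell-style wildcards, which the source does not state explicitly.

```python
import fnmatch
import os
import tempfile


def crawl(root, include=("*",), exclude_dirs=()):
    """Return paths (relative to root) of files matching an include pattern,
    skipping any sub-directory whose name matches an exclude pattern."""
    matches = []
    for current, dirnames, filenames in os.walk(root):
        # Prune excluded sub-directories in place so os.walk never enters them.
        dirnames[:] = sorted(
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, p) for p in exclude_dirs)
        )
        for name in sorted(filenames):
            if any(fnmatch.fnmatch(name, p) for p in include):
                matches.append(os.path.relpath(os.path.join(current, name), root))
    return matches


# Build a small throwaway directory tree to demonstrate the filters.
with tempfile.TemporaryDirectory() as top:
    for rel in ("data/a.xml", "data/a.tab", "CONTEXT/c.xml"):
        path = os.path.join(top, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = crawl(top, include=("*.xml",), exclude_dirs=("CONTEXT",))

print(found)
```

Only the label under data/ is returned: the .tab file fails the include pattern, and the CONTEXT directory is never entered.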

accessUrls

Specify this element to provide links to the physical data products. The links will be placed in the registry as slots under the slot name accessUrl. An optional attribute can be specified named registerFileUrls, which if set to true, will create file url links.

The accessUrls element can contain multiple accessUrl element tags. The following table describes the elements that are allowed within the accessUrl tag:

Element Name   Description
baseUrl        Specify a base url.
offset         Optionally specify an offset to strip from the absolute path of each product before appending it to the base url. Can be specified more than once.

In the policy example above, the Harvest Tool will strip the leading $HOME from the absolute path of any product before appending the remainder to the base url of http://pds.nasa.gov/pds4. The following example demonstrates what the resulting access url will be for a registered product located at $HOME/VG2PLS_archive/browse/Collection_Browse.xml:

http://pds.nasa.gov/pds4/VG2PLS_archive/browse/Collection_Browse.xml
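
The construction of the access url can be expressed as a small Python sketch. The build_access_url function and the /home/pds value standing in for $HOME are assumptions for illustration; the actual Harvest logic may differ in how it handles multiple offsets or trailing slashes.

```python
def build_access_url(base_url, product_path, offsets=()):
    """Strip the first matching offset from product_path and append the
    remainder to base_url."""
    relative = product_path
    for offset in offsets:
        if product_path.startswith(offset):
            relative = product_path[len(offset):]
            break
    # Join with exactly one slash between the base url and the remaining path.
    return base_url.rstrip("/") + "/" + relative.lstrip("/")


home = "/home/pds"  # hypothetical value of $HOME
url = build_access_url(
    "http://pds.nasa.gov/pds4",
    home + "/VG2PLS_archive/browse/Collection_Browse.xml",
    offsets=[home],
)
print(url)
```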
       

checksums

Checksum generation is turned off by default in the Harvest Tool. In order to turn this on, set the generate attribute to true. The following table describes the elements that are allowed within the checksums tag:

Element Name   Description
manifest       Specify a checksum manifest file. Can be specified more than once.

The following describes the tool behavior based on the different checksum settings:

Checksum Manifest File Provided and Generate Flag Set To True

Harvest will generate a checksum for each file encountered and verify it against the supplied checksum file. If the data file checksum was supplied in the label, Harvest will verify it as well. A warning message will be issued in the log report if a mismatch occurs. In any case, the generated checksum value is included in the associated Product_File_Repository product.

Checksum Manifest File Provided and Generate Flag Set To False (or not set at all)

Harvest will not generate checksums, but will use the value from the checksum manifest file to populate the associated Product_File_Repository product. If a data file checksum was supplied in the label, Harvest will compare the value from the manifest against the value supplied in the label and issue a warning if there is a mismatch.

Checksum Manifest File Not Provided and Generate Flag Set To True

Harvest will generate a checksum for each file encountered and verify it against an optional checksum supplied in the label. If there is a mismatch, a warning message will be issued in the log report. The generated value is included in the associated Product_File_Repository product.

Checksum Manifest File Not Provided and Generate Flag Set To False

Harvest will not generate checksums. If the data file checksum was supplied in the label, Harvest will populate the associated Product_File_Repository product with that value.
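
The four cases above can be summarized in a short Python sketch. It is a reading of the described behavior, not Harvest's code: the resolve_checksum function is hypothetical, and the use of MD5 is an assumption suggested by the .md5 manifest file name in the example policy.

```python
import hashlib


def resolve_checksum(data, generate=False, manifest_value=None, label_value=None):
    """Return (checksum to register, list of warning messages)."""
    warnings = []
    if generate:
        # Generated value is registered and verified against any supplied values.
        value = hashlib.md5(data).hexdigest()  # MD5 is an assumption
        for source, supplied in (("manifest", manifest_value), ("label", label_value)):
            if supplied is not None and supplied.lower() != value:
                warnings.append(f"generated checksum does not match {source}")
        return value, warnings
    if manifest_value is not None:
        # Manifest value is registered; the label value is only cross-checked.
        if label_value is not None and label_value.lower() != manifest_value.lower():
            warnings.append("manifest checksum does not match label")
        return manifest_value, warnings
    # No generation and no manifest: fall back to the label value, if any.
    return label_value, warnings


data = b"example file contents"
good = hashlib.md5(data).hexdigest()
bad = "0" * 32

case_generate = resolve_checksum(data, generate=True, manifest_value=good, label_value=bad)
case_manifest = resolve_checksum(data, manifest_value=good, label_value=bad)
case_label = resolve_checksum(data, label_value=bad)
print(case_generate, case_manifest, case_label)
```

In every case a mismatch produces a warning rather than an error, which matches the description above that registration proceeds with the chosen value.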

storageIngestion

Specify this element to tell the Harvest Tool to ingest data products to the PDS Storage Service. The following table describes the elements that are allowed:

Element Name   Description
serverUrl      Specify the url to the PDS Storage Service.

In the example above, the Harvest Tool will ingest data products to the PDS Storage Service at http://localhost:9000. When a data product is ingested, the PDS Storage Service returns a product id, which is a reference to the ingested product. This id is placed as a slot in the registry under the slot name storageServiceProductId.

candidates

Specify this element to tell the Harvest Tool what product types to register and what metadata to extract from a data product. This is a required element in the policy file. The following table describes the elements that are allowed:

Element Name      Description
namespace         Specify to allow the Harvest Tool to extract metadata that is in a namespace other than the default PDS namespace.
productMetadata   Specify to tell the tool what object types and what metadata to register.

By default, the Harvest Tool defines the default namespace to be the PDS namespace. To override this default, specify the default attribute in the namespace element and give it a value of true. The following sets the dph namespace to be the default namespace in Harvest:

<candidates>
  <namespace prefix="dph" uri="http://pds.nasa.gov/schema/pds4/dph/v01" default="true"/>
          ...
       

Namespaces need to be defined in the Harvest policy file only if the metadata to be extracted exists in a namespace other than the PDS namespace. In the example above, a namespace with the prefix dph and uri http://pds.nasa.gov/schema/pds4/dph/v01 has been defined. Any XPath expression defined in the policy file can therefore use the dph prefix to extract metadata within the dph namespace. XPath expressions are explained in greater detail in the productMetadata section.

productMetadata

Specify this element to tell the Harvest Tool what metadata to register. It requires an attribute called objectType that tells the Harvest Tool what product types to register. The following table describes the elements that are allowed:

Element Name    Description
xPath           Specify an XPath expression to extract metadata.

In the example above, the policy file tells the Harvest Tool to look for and register the Product_Document and Product_Observational object types.

Also in the example is a set of xPath elements found under each productMetadata element. These define what metadata to extract from the different products. XPath is a query language that uses path expressions to select nodes in an XML document. These path expressions look very much like paths in a traditional computer file system. In its simplest form, prefixing a name with // finds the element no matter where it is in the XML file.

The following XPath expression will find the start_date_time element within the default namespace, no matter where this element is located in the file:

//start_date_time
       

The following XPath expression will find the spacecraft_clock_start_count element within the dph namespace, no matter where this element is located in the file:

//dph:spacecraft_clock_start_count
       

The following XPath expression will find all information_model_version elements that are children of Identification_Area within the default namespace:

//Identification_Area/information_model_version
       

The following XPath expression will find all name elements that are children of Target_Identification and that have a value of MARS:

//Target_Identification/name[text()="MARS"]
       

For a more detailed explanation of XPath, search the web for an XPath tutorial.
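
To see what the four expressions above select, they can be tried against a small label outside of Harvest. The sketch below uses Python's standard xml.etree.ElementTree module, which is not what Harvest uses and supports only a subset of XPath. Because of that, three things differ from the policy file syntax: paths start with .// instead of //, the default PDS namespace must be given an explicit prefix (pds here, an arbitrary choice), and the unsupported text()="MARS" test is replaced with the equivalent .='MARS' predicate. The label content and its values are made up for illustration.

```python
import xml.etree.ElementTree as ET

PDS_NS = "http://pds.nasa.gov/pds4/pds/v1"
DPH_NS = "http://pds.nasa.gov/schema/pds4/dph/v01"
NS = {"pds": PDS_NS, "dph": DPH_NS}

# Made-up label fragment; element values are placeholders.
LABEL = f"""<Product_Observational xmlns="{PDS_NS}" xmlns:dph="{DPH_NS}">
  <Identification_Area>
    <information_model_version>1.0.0.0</information_model_version>
  </Identification_Area>
  <Observation_Area>
    <Time_Coordinates>
      <start_date_time>2000-01-01T00:00:00Z</start_date_time>
    </Time_Coordinates>
    <Target_Identification><name>JUPITER</name></Target_Identification>
    <Target_Identification><name>MARS</name></Target_Identification>
    <dph:spacecraft_clock_start_count>0</dph:spacecraft_clock_start_count>
  </Observation_Area>
</Product_Observational>"""

root = ET.fromstring(LABEL)


def texts(expression: str) -> list:
    """Return the text of every element selected by the expression."""
    return [element.text for element in root.findall(expression, NS)]


# //start_date_time
starts = texts(".//pds:start_date_time")
# //dph:spacecraft_clock_start_count
clocks = texts(".//dph:spacecraft_clock_start_count")
# //Identification_Area/information_model_version
versions = texts(".//pds:Identification_Area/pds:information_model_version")
# //Target_Identification/name[text()="MARS"]
mars = texts(".//pds:Target_Identification/pds:name[.='MARS']")

print(starts, clocks, versions, mars)
```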

The slotName attribute within the xPath element allows metadata elements to be renamed when they are registered as slots in the registry. By default, the slot name is the name of the element selected by the XPath expression. For example, for the XPath expression //Target_Identification/name, the slot name will be name.

The following demonstrates setting the policy file to find any name elements that are children of Target_Identification and setting the slot name to target_identification_name:

<xPath slotName="target_identification_name">//Target_Identification/name</xPath>
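
The naming rule can be summarized in a few lines. This is a sketch of the rule as described above, not Harvest code: slot_name_for is a hypothetical helper, and it assumes that the default slot name is the selected element's local name with any namespace removed.

```python
import xml.etree.ElementTree as ET


def slot_name_for(xpath_element: ET.Element, matched_tag: str) -> str:
    """Pick the slot name for an element selected by an xPath policy entry."""
    override = xpath_element.get("slotName")
    if override:
        # An explicit slotName attribute always wins.
        return override
    # Assumption: drop a leading '{namespace-uri}' and keep the local name.
    return matched_tag.rsplit("}", 1)[-1]


renamed = ET.fromstring(
    '<xPath slotName="target_identification_name">'
    "//Target_Identification/name</xPath>"
)
default = ET.fromstring("<xPath>//Target_Identification/name</xPath>")
tag = "{http://pds.nasa.gov/pds4/pds/v1}name"

print(slot_name_for(renamed, tag))
print(slot_name_for(default, tag))
```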
       

Report Format

This section describes the contents of the Harvest Tool report. At this time, the Harvest Tool only outputs a series of log messages. The log reports whether each discovered product was registered successfully, along with any syntactical errors found in a discovered product. Each log entry consists of a severity level, a file name, and a message. The following is an example of some of the log messages that can be expected from the Harvest Tool:

PDS Harvest Tool Log

Version                     Version 1.7.0-dev
Time                        Thu, Sep 18 2014 at 03:18:45 PM
Target(s)                   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS]
File Inclusions             [*.xml]
Severity Level              INFO
Registry Location           http://localhost:8080/registry
Registry Package Name       Harvest Package Example
Registration Package GUID   urn:uuid:e421ff5f-1e2a-447f-b035-2e2b6d01b507

INFO:   XML extractor set to the following default namespace: \
http://pds.nasa.gov/pds4/pds/v1
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/bundle_checksums.txt] \
Processing checksum manifest.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Begin processing.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 58: Mapping reference type 'bundle_to_investigation' to 'investigation_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 69: Mapping reference type 'is_instrument_host' to 'instrument_host_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 78: Mapping reference type 'is_instrument' to 'instrument_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 88: Mapping reference type 'bundle_to_target' to 'target_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 134: Mapping reference type 'bundle_has_browse_collection' to 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 140: Mapping reference type 'bundle_has_context_collection' to 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 146: Mapping reference type 'bundle_has_data_collection' to 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 152: Mapping reference type 'bundle_has_document_collection' to 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 158: Mapping reference type 'bundle_has_schema_collection' to 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Setting LIDVID-based association, 'urn:nasa:pds:context:investigation:mission.voyager::1.0', under slot name 'investigation_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Setting LID-based association, 'urn:nasa:pds:context:instrument_host:instrument_host.vg2', under slot name 'instrument_host_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Setting LID-based association, 'urn:nasa:pds:context:instrument:instrument.pls__vg2', under slot name 'instrument_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Setting LIDVID-based association, 'urn:nasa:pds:context:target:planet.jupiter::1.0', under slot name 'target_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Setting LID-based association, 'urn:nasa:pds:example.dph.sample_archive_bundle:browse', under slot name 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Setting LID-based association, 'urn:nasa:pds:example.dph.sample_archive_bundle:context', under slot name 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Setting LID-based association, 'urn:nasa:pds:example.dph.sample_archive_bundle:data', under slot name 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Setting LID-based association, 'urn:nasa:pds:example.dph.sample_archive_bundle:document', under slot name 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Setting LID-based association, 'urn:nasa:pds:example.dph.sample_archive_bundle:xml_schema', under slot name 'collection_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Created access url: http://starbase.jpl.nasa.gov/pds4/1300/dph_example_archive_VG2PLS/Product_Bundle.xml
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Successfully registered product: urn:nasa:pds:example.dph.sample_archive_bundle::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Product has the following GUID: urn:uuid:244f8300-2b2a-4eaf-8747-7d0b9ba6301d
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Capturing file information for Product_Bundle.xml
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Found checksum in the manifest for file object '/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml': e19987082da554aea2f91e8206b34c11
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Capturing file object metadata for README.TXT
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Found checksum in the manifest for file object '/Users/mcayanan/pds4/dph_example_archive_VG2PLS/README.TXT': e7da7276dd553f30496b868a2007bf5b
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
No checksum to compare against in the product label for file object '/Users/mcayanan/pds4/dph_example_archive_VG2PLS/README.TXT'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
line 102: Setting file type for the file object 'README.TXT' to 'Text'
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Created access url: http://starbase.jpl.nasa.gov/pds4/1300/dph_example_archive_VG2PLS/Product_Bundle.xml
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Successfully registered product: urn:nasa:pds:example.dph.sample_archive_bundle:product_bundle.xml::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Product has the following GUID: urn:uuid:f416c468-3f6c-451a-ad4c-2a5ed2730ce4
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Created access url: http://starbase.jpl.nasa.gov/pds4/1300/dph_example_archive_VG2PLS/README.TXT
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Successfully registered product: urn:nasa:pds:example.dph.sample_archive_bundle:readme.txt::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/Product_Bundle.xml] \
Product has the following GUID: urn:uuid:be482995-780c-4b41-8d04-fde0e23176c2
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Begin processing.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Mapping reference type 'inventory_has_member_product' to 'member_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 61: Mapping reference type 'collection_to_investigation' to 'investigation_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 72: Mapping reference type 'is_instrument_host' to 'instrument_host_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 81: Mapping reference type 'is_instrument' to 'instrument_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 91: Mapping reference type 'collection_to_target' to 'target_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 99: Mapping reference type 'collection_curated_by_node' to 'node_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 101: Mapping reference type 'collection_to_context' to 'context_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 101: Mapping reference type 'collection_to_context' to 'context_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 101: Mapping reference type 'collection_to_context' to 'context_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 101: Mapping reference type 'collection_to_context' to 'context_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Setting LIDVID-based association, 'urn:nasa:pds:context:investigation:mission.voyager::1.0', under slot name 'investigation_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Setting LID-based association, 'urn:nasa:pds:context:instrument_host:instrument_host.vg2', under slot name 'instrument_host_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Setting LID-based association, 'urn:nasa:pds:context:instrument:instrument.pls__vg2', under slot name 'instrument_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Setting LIDVID-based association, 'urn:nasa:pds:context:target:planet.jupiter::1.0', under slot name 'target_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Setting LID-based association, 'urn:nasa:pds:context:node:node.ppi-ucla', under slot name 'node_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Setting LID-based association, 'urn:nasa:pds:context:instrument_host:instrument_host.vg2', under slot name 'context_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Setting LID-based association, 'urn:nasa:pds:context:instrument:instrument.pls__vg2', under slot name 'context_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Setting LID-based association, 'urn:nasa:pds:context:investigation:mission.voyager', under slot name 'context_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Setting LID-based association, 'urn:nasa:pds:context:target:planet.jupiter', under slot name 'context_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Created access url: http://starbase.jpl.nasa.gov/pds4/1300/dph_example_archive_VG2PLS/data/Collection_data.xml
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Successfully registered product: urn:nasa:pds:example.dph.sample_archive_bundle:data::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Product has the following GUID: urn:uuid:bf5eccd3-3f0c-4fe9-9eaf-3517d5d7a21d
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Capturing file information for Collection_data.xml
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Found checksum in the manifest for file object '/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml': dd32522a6d225ce3fdacf716fe10dafd
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Capturing file object metadata for Collection_data_inventory.tab
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Found checksum in the manifest for file object '/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data_inventory.tab': 2cef640ecac021083ef1fe3f03f4c6f6
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Checksum in the manifest '2cef640ecac021083ef1fe3f03f4c6f6' matches the checksum in the product label '2cef640ecac021083ef1fe3f03f4c6f6' for file object \
'/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data_inventory.tab'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
line 118: Setting file type for the file object 'Collection_data_inventory.tab' to 'Inventory'
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Created access url: http://starbase.jpl.nasa.gov/pds4/1300/dph_example_archive_VG2PLS/data/Collection_data.xml
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Successfully registered product: urn:nasa:pds:example.dph.sample_archive_bundle:data:collection_data.xml::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Product has the following GUID: urn:uuid:2da0dcda-63b9-49d6-8e29-20a2028d6e7e
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Created access url: http://starbase.jpl.nasa.gov/pds4/1300/dph_example_archive_VG2PLS/data/Collection_data_inventory.tab
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Successfully registered product: urn:nasa:pds:example.dph.sample_archive_bundle:data:collection_data_inventory.tab::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/Collection_data.xml] \
Product has the following GUID: urn:uuid:3dc660c6-ff43-4851-97eb-aa92ad00e2b3
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Begin processing.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
line 54: Mapping reference type 'data_to_investigation' to 'investigation_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
line 65: Mapping reference type 'is_instrument_host' to 'instrument_host_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
line 74: Mapping reference type 'is_instrument' to 'instrument_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
line 84: Mapping reference type 'data_to_target' to 'target_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
line 103: Mapping reference type 'data_curated_by_node' to 'node_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Setting LID-based association, 'urn:nasa:pds:context:investigation:mission.voyager', under slot name 'investigation_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Setting LID-based association, 'urn:nasa:pds:context:instrument_host:instrument_host.vg2', under slot name 'instrument_host_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Setting LID-based association, 'urn:nasa:pds:context:instrument:instrument.pls__vg2', under slot name 'instrument_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Setting LID-based association, 'urn:nasa:pds:context:target:planet.jupiter', under slot name 'target_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Setting LID-based association, 'urn:nasa:pds:context:node:node.ppi-ucla', under slot name 'node_ref'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Created access url: http://starbase.jpl.nasa.gov/pds4/1300/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Successfully registered product: urn:nasa:pds:example.dph.sample_archive_bundle:data:tablechar.vg2-j-pls-5-summ-ele-mom-96.0sec-v1.0::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Product has the following GUID: urn:uuid:020b4ac7-9c03-4419-9385-e1103376834d
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Capturing file information for ele_mom_tblChar.xml
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Found checksum in the manifest for file object '/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml': 34842f734ac4df865b2a0a78163a6aab
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Capturing file object metadata for ELE_MOM.TAB
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Found checksum in the manifest for file object '/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ELE_MOM.TAB': 2b555c42a7e7b4981407c9a824237f4a
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Checksum in the manifest '2b555c42a7e7b4981407c9a824237f4a' matches the checksum in the product label '2b555c42a7e7b4981407c9a824237f4a' for file object \
'/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ELE_MOM.TAB'.
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
line 108: Setting file type for the file object 'ELE_MOM.TAB' to 'Observation'
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Created access url: http://starbase.jpl.nasa.gov/pds4/1300/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Successfully registered product: urn:nasa:pds:example.dph.sample_archive_bundle:data:tablechar.vg2-j-pls-5-summ-ele-mom-96.0sec-v1.0:ele_mom_tblchar.xml::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Product has the following GUID: urn:uuid:0668a9d6-1d1a-4208-ac7a-f7cbee3cdd08
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Created access url: http://starbase.jpl.nasa.gov/pds4/1300/dph_example_archive_VG2PLS/data/ELE_MOM.TAB
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Successfully registered product: urn:nasa:pds:example.dph.sample_archive_bundle:data:tablechar.vg2-j-pls-5-summ-ele-mom-96.0sec-v1.0:ele_mom.tab::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml] \
Product has the following GUID: urn:uuid:f976c4d3-04f7-4d59-b272-1a6bea2855a8

...

SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/browse/Collection_browse.xml] \
Successfully registered association to urn:nasa:pds:example.dph.sample_archive_bundle:browse:collection_browse.xml::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/browse/Collection_browse.xml] \
Association has the following GUID: urn:uuid:7dd4f202-3b9a-41eb-a2a8-633b202e93c4
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/browse/Collection_browse.xml] \
Successfully registered association to urn:nasa:pds:example.dph.sample_archive_bundle:browse:collection_browse_inventory.tab::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/browse/Collection_browse.xml] \
Association has the following GUID: urn:uuid:2ab90a3f-58f8-4778-b648-129803cf07f3
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/browse/ele_mom_browse.xml] \
Successfully registered association to urn:nasa:pds:example.dph.sample_archive_bundle:browse:ele_mom:ele_mom_browse.xml::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/browse/ele_mom_browse.xml] \
Association has the following GUID: urn:uuid:5d9bc31f-9315-4ec9-857b-df0c4d0ef0e2
SUCCESS:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/browse/ele_mom_browse.xml] \
Successfully registered association to urn:nasa:pds:example.dph.sample_archive_bundle:browse:ele_mom:ele_mom.pdf::1.0
INFO:   [/Users/mcayanan/pds4/dph_example_archive_VG2PLS/browse/ele_mom_browse.xml] \
Association has the following GUID: urn:uuid:226d96bf-cecd-42a7-913a-db76f33861df

Summary:

13 of 13 file(s) processed, 5 other file(s) skipped
0 error(s), 7 warning(s)

13 of 13 products registered.
24 of 24 ancillary products registered.

Product Types Registered:
2 Product_Document
1 Product_Browse
1 Product_Observational
3 Product_File_Text
1 Product_Bundle
5 Product_Collection
24 Product_File_Repository

9 of 9 checksums in the manifest matched the supplied value in their product label, 1 value(s) not checked.

35 of 35 associations registered.


End of Log
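
Because each entry follows the same severity, file name, message layout, a report can be filtered with a short script. The following is a minimal sketch, not part of the Harvest Tool: LogEntry and parse_log are hypothetical names, and the pattern is inferred only from the sample above. It assumes the trailing backslashes are line breaks added for readability and joins them back together before matching. Header and summary lines do not match the pattern and are skipped.

```python
import re
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    severity: str
    file: Optional[str]  # None for entries without a [file] part
    message: str


# SEVERITY:   [optional file] message
ENTRY = re.compile(
    r"^(?P<severity>[A-Z]+):\s+(?:\[(?P<file>[^\]]+)\]\s+)?(?P<message>.*)$"
)


def parse_log(text: str) -> List[LogEntry]:
    """Parse log text into entries, ignoring lines that are not entries."""
    # Rejoin entries that were split with a trailing backslash.
    joined = re.sub(r"\s*\\\n", " ", text)
    entries = []
    for line in joined.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(
                LogEntry(match["severity"], match["file"], match["message"])
            )
    return entries
```

For example, [e for e in parse_log(report_text) if e.severity == "SUCCESS"] would list only the successful registrations, where report_text holds the contents of a report file.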

     

Common Errors

Execution of the Harvest Tool may result in the following message appearing in the log:

INFO:   XML extractor set to the following default namespace: \
http://pds.nasa.gov/schema/pds4/pds
INFO:   [/pds4/VG2PLS_archive/Product_Bundle.xml] Begin processing.
SKIP:   [/pds4/VG2PLS_archive/Product_Bundle.xml] No product_class element found.
      

The message above is normally the result of a namespace mismatch between the Harvest Tool configuration and the product labels being registered. See the PDS4 Data Product Registration section above for details on specifying the namespace in the configuration file. Alternatively, the message may be accurate: the product label may not contain the <product_class> element, in which case the file is not a valid PDS product label.
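
To tell the two causes apart, compare the namespace declared on the label's root element with the default namespace reported in the log. The sketch below is a standalone check, not a Harvest feature; root_namespace and diagnose are hypothetical names, and the sample label is made up. The two namespace URIs are the ones that appear in the log excerpts in this document.

```python
import xml.etree.ElementTree as ET


def root_namespace(label_xml: str) -> str:
    """Return the namespace URI of the label's root element, or ''."""
    root = ET.fromstring(label_xml)
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def diagnose(label_xml: str, configured_ns: str) -> str:
    """Explain why a 'No product_class element found' skip may occur."""
    label_ns = root_namespace(label_xml)
    if label_ns != configured_ns:
        return (
            f"namespace mismatch: label uses '{label_ns}', "
            f"Harvest is set to '{configured_ns}'"
        )
    root = ET.fromstring(label_xml)
    if root.find(f".//{{{configured_ns}}}product_class") is None:
        return "label has no product_class element"
    return "ok"


# Made-up label that declares a different namespace than the configuration.
label = (
    '<Product_Bundle xmlns="http://pds.nasa.gov/pds4/pds/v1">'
    "<Identification_Area>"
    "<product_class>Product_Bundle</product_class>"
    "</Identification_Area>"
    "</Product_Bundle>"
)
print(diagnose(label, "http://pds.nasa.gov/schema/pds4/pds"))
```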