Operation

This document describes how to operate the Validate Tool.

Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.

Quick Start

This section is intended to give a quick and easy way to run the Validate Tool. For a more detailed explanation on other ways to run the tool, go to the Advanced Usage section.

Validating a Product

The command below shows the recommended way to validate a single product:

 % validate product.xml -r validate-report.txt
      

This validates the given product against the latest core schema and schematron packaged with the tool and writes the results to a file.

Validating a Bundle

The command below shows the recommended way to validate a bundle:

% validate $HOME/pds/bundle -M checksum-manifest.txt -r validate-report.txt -R pds4.bundle
      

The -R flag tells the tool to apply bundle validation rules to the target bundle. Validation is then performed at the bundle level, which includes referential integrity checking, among other things. See the Validation Rules section for more details. The -M flag performs additional checksum validation.

Command-Line Options

The following table describes the command-line options available:

Command-Line Option                 Description
-t, --target <files,directories>    Explicitly specify the targets (product files, directories) to validate. Targets can be specified implicitly as well (example: validate product.xml). For more details on target specification, see the Specifying Targets section.
-R, --rule <validation rule name>   Specify the validation rules to apply. Valid values are "pds4.bundle", "pds4.collection", "pds4.folder", "pds4.label", or "pds3.volume". Default is to use "pds4.label" for target file inputs and to use "pds4.folder" for target directory inputs. See Validation Rules for more details.
-M, --checksum-manifest <file>      Specify a Checksum Manifest file to perform additional checksum validation. For more details on checksum validation, see the Checksum Manifest File Validation section.
-B, --base-path <file>              Specify a base path in order for the tool to properly resolve relative file references found in a checksum manifest file. For more details on checksum validation, see the Checksum Manifest File Validation section.
-x, --schema <schemas>              Specify XML Schema files to use during validation. Using this flag overrides the PDS XML Schemas packaged with the tool. When passing in multiple schemas, specify the schema for the pds namespace (otherwise known as the core schema) first, followed by the other schemas. For more details on passing in multiple schemas, see the Passing in Multiple Schemas section.
-S, --schematron <schematrons>      Specify Schematron files to use during validation. Using this flag overrides the PDS Schematron files packaged with the tool.
-C, --catalog <xml-catalogs>        Specify XML Catalog files to use during validation.
-r, --report-file <file>            Specify the report file name. Default is to output results to standard out.
-s, --report-style <full|json|xml>  Specify the report format. Valid values are "full" for a full view, "xml" for an XML view, or "json" for a JSON view. Default is to generate a full report if this option is not specified. For more details on these report styles, see the Report Format section.
-D, --no-data-check                 Disable data content validation.
--spot-check-data <num>             Check only every nth record (or line in Arrays) during data content validation, skipping the records in between.
--allow-unlabeled-files             Tell the tool not to check for unlabeled files in a bundle or collection.
-v, --verbose <1|2|3>               Specify the minimum severity level to include in the human-readable report: (1=Info, 2=Warning, 3=Error). Default is Warning and above.
-e, --regexp <file-patterns>        Specify file patterns to look for when validating a target directory. This flag is ignored when validating under the pds4.collection or pds4.bundle rules. Each pattern must be surrounded by quotes (example: "*.xml"). Pattern matching is case-insensitive in Windows, but case-sensitive for other systems. The default behavior is to search for files ending in "*.xml" or "*.XML". Using this flag overrides the default behavior.
-m, --model-version <model>         Specify a model version to use during validation. The default is to use the latest PDS4 data model (1C00). Refer to the PDS4 Information Model Page for older models that are supported. The version number should be specified without periods, for example 1000 for version 1.0.0.0.
-L, --local                         Validate files only in the target directory instead of recursively traversing down the sub-directories.
-f, --force                         Force the tool to validate against the schemas and schematrons specified in a label.
-c, --config                        Specify a configuration file to set the tool behavior.
-E, --max-errors                    Specify the maximum number of errors that the tool will report before gracefully exiting a validation run. Default is 100,000.
-V, --version                       Display the release number and copyright information.
-h, --help                          Display Validate usage.

Advanced Usage

This section describes more advanced ways to run the tool, as well as its behaviors and caveats.

Tool Execution

This section demonstrates some of the ways that the tool can be executed using the command-line option flags:

  • Validating a Target Directory
  • Validating Against User-Specified Schemas and Schematrons
  • Validating Against User-Specified XML Catalogs
  • Validating Against Label Specified Schemas and Schematrons
  • Spot Checking of Data
  • Validating Against an Older Version of the PDS4 Data Model
  • Ignoring Sub-Directories During Validation
  • Changing Tool Behaviors With The Configuration File

Validating a Target Directory

The following command demonstrates the validation of a target directory against the core PDS schemas:

% validate /home/pds/collection
        

Validating Against User-Specified Schemas and Schematrons

Specifying XML Schemas and schematrons on the command line will allow the Validate Tool to validate against the user-specified schemas and schematrons instead of those packaged with the tool. The following command demonstrates the validation of a single product label against a user-specified schema and schematron:

% validate /home/pds/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml -x /home/pds/dph_example_archive_VG2PLS/xml_schema/PDS4_PDS_1700.xsd, /home/pds/dph_example_archive_VG2PLS/xml_schema/PDS4_DPH_1700.xsd -S /home/pds/dph_example_archive_VG2PLS/xml_schema/PDS4_PDS_1700.sch        
        

The following command demonstrates the validation of a set of target files against a set of user-specified schemas:

% validate producta.xml, productb.xml -x producta.xsd, productb.xsd
        

Validating Against User-Specified XML Catalogs

The following command demonstrates the validation of a single data product against a user-specified XML Catalog:

% validate product.xml -C catalog.xml
        

Validating Against Label Specified Schemas and Schematrons

The following command demonstrates forcing the tool to validate against the schemas and schematrons specified in a given label.

% validate product.xml -f
        

Note that validating with the force flag option versus passing in the label-specified schemas using the -x flag may yield different validation results. This is due to how the underlying Xerces library treats these two scenarios. With the force flag, namespaces are resolved by matching the element namespace in the label to one of the namespaces declared in the schemaLocation attribute of the label. When schemas are passed in using the -x flag, those user-supplied schemas override any schemas specified in the schemaLocation attribute of the label. They are read in and cached in memory prior to the validation step. Namespaces are then resolved by matching the element namespace to one defined in one of the cached schemas.

Spot Checking of Data

By default, the tool performs a byte-by-byte validation of the data content. This can lead to long validation run times for large tables and arrays. For a quicker validation of the data content, use the --spot-check-data flag option. The command below demonstrates running the Validate Tool with spot checking turned on:

% validate product.xml --spot-check-data 100
        

In the above example, assuming the data is a table, the tool will perform content validation on every 100th record in the table. If the data content is an array, the tool will perform content validation on every 100th line in the array.
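As a rough illustration of the sampling described above, the following Python sketch lists which records would be checked for a given interval. The function name is hypothetical, and the numbering convention (records counted from 1, with every record whose number is a multiple of the interval being checked) is an assumption made for illustration; the tool's actual selection of records may differ.

```python
def spot_checked_records(total_records, every_nth):
    """Return the 1-based record numbers that would be checked.

    Assumption for illustration only: records are numbered from 1 and
    every record whose number is a multiple of every_nth is checked.
    """
    if every_nth < 1:
        raise ValueError("every_nth must be a positive integer")
    return list(range(every_nth, total_records + 1, every_nth))


# A 1000-record table checked with an interval of 100.
checked = spot_checked_records(total_records=1000, every_nth=100)
print(len(checked), checked[:3])  # 10 [100, 200, 300]
```

Under these assumptions, only 10 of the 1000 records are examined, which is where the reduction in run time comes from.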

Validating Against an Older Version of the PDS4 Data Model

The following command demonstrates the validation of a single data product label against version 1000 (1.0.0.0) of the PDS4 data model:

% validate product.xml -m 1000
        

Ignoring Sub-Directories During Validation

By default, the Validate Tool will recursively traverse a target directory during validation. The local flag option is used to tell the Validate Tool to not perform recursion. The following command demonstrates the validation of a target directory without directory recursion:

% validate /home/pds/collection -L
        

Changing Tool Behaviors With The Configuration File

A configuration file can be passed in on the command line to change the default behaviors of the tool and to give users a way to perform validation with a single flag. For more details on how to set up the configuration file, see the Using a Configuration File section.

The following command demonstrates performing validation using a configuration file:

% validate -c config.txt
        

Specifying Targets

Targets are validated in the order in which they are specified on the command-line. They can be specified implicitly and explicitly.

To specify targets implicitly, it is best to specify them first on the command-line before any other command-line option flags. The following command demonstrates the validation of an implicitly defined, single target product label:

% validate product.xml
        

The following command demonstrates the validation of implicitly defined, multiple targets:

% validate product.xml, /home/pds/collection
        

Note: Implicit targets should not be specified after option flags that allow multiple arguments (see example below). Unexpected results can occur.

% validate -x product.xsd product.xml
        

In this example, the Validate Tool will inadvertently treat the implicit target, product.xml, as a schema file.

Targets can be specified both implicitly and explicitly at the same time. Targets specified implicitly are validated first, followed by those that are specified explicitly with the target flag.

The following command demonstrates the validation of multiple product labels, specified both implicitly and explicitly:

% validate producta.xml, productb.xml -t productc.xml, /home/pds/collection
        

In this example, producta.xml and productb.xml will get validated first, then productc.xml and the product labels in /home/pds/collection will get validated next.

In each scenario above, the target product(s) were the equivalent of observational or document products. The data model also consists of bundle and collection products, which in turn reference other products. When the Validate Tool encounters one of these products, it traverses the inventory associated with that product and validates each product referenced as well as the target product.

Validation Rules

The Validate Tool can perform various additional checks beyond the usual PDS4 label validation. This is done through the -R, --rule flag option. The valid values are the following:

  • pds4.label
  • pds4.folder
  • pds4.collection
  • pds4.bundle
  • pds3.volume

If the -R flag is not specified on the command-line, the default behavior is to use the pds4.label rule for target file inputs and to use the pds4.folder rule for target directory inputs. This section details the checks that are made for each of the rule types. The command-line run examples below reference the example PDS4 Archive Bundle that can be found on the PDS4 web site.
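The default rule selection described above can be sketched in Python as follows. The function name default_rule is hypothetical and is not part of the tool; the sketch only mirrors the stated behavior of choosing a rule based on whether the target is a file or a directory.

```python
import os


def default_rule(target):
    """Pick the rule that applies when -R is not given.

    Directories get pds4.folder; anything else is treated as a
    single label and gets pds4.label.
    """
    return "pds4.folder" if os.path.isdir(target) else "pds4.label"


print(default_rule(os.getcwd()))     # pds4.folder
print(default_rule("product.xml"))   # pds4.label
```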

pds4.label

The pds4.label validation rule will:

  • Check that a product label conforms to a given schema and schematron.
  • Check that files referenced in a product label exist.
  • Check that the file references in a product label and the file on the file system match case.
  • Check that file reference checksums in a product label match their actual checksums, if present.
  • Perform checksum validation against a Checksum Manifest file, if supplied.

The following example demonstrates validating a product label using the default pds4.label validation rule:

 
% validate /home/pds/dph_example_archive_VG2PLS/data/ele_mom_tblChar.xml
        

pds4.folder

The pds4.folder validation rule allows the tool to do a file-by-file validation using the pds4.label rule on each file found in the target directory.

The following example demonstrates performing a file-by-file validation on a target directory:

% validate /home/pds/dph_example_archive_VG2PLS
        

pds4.collection

The pds4.collection validation rule applies the pds4.label validation rule on each product label found within a target collection. In addition, it will:

  • Check that file names follow the file-naming rules as defined in section 6C.1.1 of the PDS4 Standards Reference.
  • Check that file names do not match the prohibited file names as defined in section 6C.1.2 of the PDS4 Standards Reference.
  • Check that file names do not contain the prohibited base names as defined in section 6C.1.4 of the PDS4 Standards Reference.
  • Check that directory names follow directory-naming rules as defined in section 6C.2.1 of the PDS4 Standards Reference.
  • Check that directory names do not match prohibited directory names as defined in section 6C.2.3 of the PDS4 Standards Reference.
  • Check that all files in the target collection are referenced by some label.
  • Perform referential integrity checking among the target collection and its members.
  • Verify that the LID of the collection is used as the base of the LIDs of products that are members of the collection.
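The last check in the list above can be pictured with a short Python sketch. The helper names are hypothetical, and the comparison shown (the member LID must begin with the collection LID followed by a colon) is an assumption about how "used as the base" could be tested, not the tool's actual implementation. The identifiers are taken from the example report later in this document.

```python
def strip_version(lidvid):
    """Drop the '::version' suffix from a LIDVID, leaving the LID."""
    return lidvid.split("::", 1)[0]


def uses_collection_base(collection_lidvid, member_lidvid):
    """True if the member LID starts with the collection LID plus ':'."""
    base = strip_version(collection_lidvid) + ":"
    return strip_version(member_lidvid).startswith(base)


collection = "urn:nasa:pds:example.dph.sample_archive_bundle:browse::1.0"
member = "urn:nasa:pds:example.dph.sample_archive_bundle:browse:ele_mom::1.0"
print(uses_collection_base(collection, member))  # True
```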

The following example demonstrates applying a PDS4 Collection validation rule on a target collection:

% validate /home/pds/dph_example_archive_VG2PLS/data -R pds4.collection
        

Note that the -e flag option to filter on specific files is ignored when validating under the pds4.collection rules.

pds4.bundle

The pds4.bundle validation rule applies the pds4.collection validation rule on each collection found within a target bundle. In addition, it will:

  • Check that files and directories at the root of the target bundle are valid as defined in section 2B.2.2.1 of the PDS4 Standards Reference.
  • Perform referential integrity checking among the target bundle and its members.
  • Verify that the LID of the bundle is used as the base of the LIDs of collections that are members of the bundle.

The following example demonstrates running the tool using the PDS4 Bundle validation rules on a target bundle:

% validate /home/pds/dph_example_archive_VG2PLS -R pds4.bundle
        

Note that the -e flag option to filter on specific files is ignored when validating under the pds4.bundle rules.

pds3.volume

The pds3.volume validation rule allows the Validate Tool to perform PDS3 volume validation on a target directory. Currently, only local targets are supported for this validation rule.

The following example demonstrates running the tool against a PDS3 volume:

% validate /home/pds/VG2-J-PLS-5-SUMM-ELE-MOM-96.0SEC-V1.0 -R pds3.volume
        

Referential Integrity Checking

The Validate Tool performs referential integrity checking within a given target directory when the pds4.collection or pds4.bundle validation rules are applied to a validation run (via the -R, --rule flag option). Assuming that the target directory points to a full PDS4 bundle, the following referential integrity checks are made as the tool validates each product:

For Bundle products,

  • Verify that each Collection member of a Bundle is present within the given target.
  • Verify that each Collection member of a Bundle is referenced only once within the given target.

For Collection products,

  • Verify that each Collection product is referenced by a Bundle product within the given target.
  • Verify that each Product member of a Collection is present within the given target.
  • Verify that each Product member of a Collection is referenced only once within the given target.

All other products are assumed to be members of a Collection. So for these, the tool will verify that each product is referenced by a Collection product within the given target.
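The checks listed above can be summarized in a small Python sketch. This is only an illustration under simplifying assumptions: the function name, the data structures, and the shortened identifiers (such as "bundle:browse") are hypothetical, and the real tool works from the product labels and inventories rather than from pre-built lists.

```python
from collections import Counter


def check_references(bundle_members, collections, products_present):
    """Return a list of referential integrity problems.

    bundle_members:   collection identifiers referenced by the bundle
    collections:      {collection identifier: member product identifiers}
                      for the collections found in the target
    products_present: identifiers of the other products found in the target
    """
    problems = []
    for lid, count in Counter(bundle_members).items():
        if lid not in collections:
            problems.append(f"collection {lid} is not present in the target")
        if count > 1:
            problems.append(f"collection {lid} is referenced {count} times")
    for lid in collections:
        if lid not in bundle_members:
            problems.append(f"collection {lid} is not referenced by the bundle")
    referenced = Counter(m for members in collections.values() for m in members)
    for lid, count in referenced.items():
        if lid not in products_present:
            problems.append(f"product {lid} is not present in the target")
        if count > 1:
            problems.append(f"product {lid} is referenced {count} times")
    for lid in sorted(set(products_present) - set(referenced)):
        problems.append(f"product {lid} is not referenced by any collection")
    return problems


problems = check_references(
    bundle_members=["bundle:browse", "bundle:context"],
    collections={"bundle:browse": ["bundle:browse:a", "bundle:browse:b"]},
    products_present={"bundle:browse:a", "bundle:browse:c"},
)
for problem in problems:
    print(problem)
```

With this made-up input, the sketch reports a missing collection (bundle:context), a missing member product (bundle:browse:b), and a product that no collection references (bundle:browse:c).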

It is important to note that the tool only performs referential integrity checking on the targets that the tool is given. As an example, it is fine to specify a target directory that contains just a Collection product and its members:

% validate ${HOME}/bundle/collection -R pds4.collection
        

In this case, the tool will not report that the given Collection is not referenced by a Bundle product.

Another thing to note is that the report will record verified references under the INFO message severity level. So in order to see these messages, run the tool with the verbose level set to 1 to produce a report with the INFO messages. The following command demonstrates running the tool with the severity level set to INFO and above:

% validate ${HOME}/bundle1 -R pds4.bundle -v1
        

Below is a snippet of an example report of a validation run using the pds4.bundle rules and the severity level set to INFO and above:

        ...

PDS4 Bundle Level Validation Results

  PASS: file:/home/pds4/dph_example_archive_VG2PLS/browse/Collection_browse.xml
      INFO  The member 'urn:nasa:pds:example.dph.sample_archive_bundle:browse:ele_mom::1.0' is identified in the following product: \
      file:/home/pds4/dph_example_archive_VG2PLS/browse/ele_mom_browse.xml
      INFO  Identifier 'urn:nasa:pds:example.dph.sample_archive_bundle:browse::1.0' is a member of \
      'file:/home/pds4/dph_example_archive_VG2PLS/bundle.xml'

  PASS: file:/home/pds4/dph_example_archive_VG2PLS/context/Collection_context.xml
      INFO  The member 'urn:nasa:pds:context:instrument_host:spacecraft.vg2::1.0' is identified in the following product: \
      file:/home/pds4/dph_example_archive_VG2PLS/context/PDS4_host_VG2_1.0.xml
      INFO  The member 'urn:nasa:pds:context:instrument:pls.vg2::1.0' is identified in the following product: \
      file:/home/pds4/dph_example_archive_VG2PLS/context/PDS4_inst_PLS_VG2_1.0.xml
      INFO  The member 'urn:nasa:pds:context:investigation:mission.voyager::1.0' is identified in the following product: \
      file:/home/pds4/dph_example_archive_VG2PLS/context/PDS4_mission_VOYAGER_1.0.xml
      INFO  The member 'urn:nasa:pds:context:target:planet.jupiter::1.0' is identified in the following product: \
      file:/home/pds4/dph_example_archive_VG2PLS/context/PDS4_target_JUPITER_1.0.xml
      INFO  Identifier 'urn:nasa:pds:example.dph.sample_archive_bundle:context::1.0' is a member of \
      'file:/home/pds4/dph_example_archive_VG2PLS/bundle.xml'
        ...
        
        

In the case that two targets are specified on the command line, the tool will treat each target as a separate entity when performing referential integrity checking. As an example, the following command demonstrates performing referential integrity checking on two different bundles:

% validate ${HOME}/bundle1, ${HOME}/bundle2 -R pds4.bundle
        

The tool will perform referential integrity checking within the products located in the ${HOME}/bundle1 target, then will perform referential integrity checking within the products located in the ${HOME}/bundle2 target. In other words, the tool will not cross over to the ${HOME}/bundle2 directory to find LID/LIDVID references of a product in the ${HOME}/bundle1 directory.

Checksum Manifest File Validation

When a Checksum Manifest file is passed into the tool, the generated checksum value of a file is compared against the supplied value in the Manifest file.

The dph example archive bundle, which can be downloaded from the PDS web site, contains an example of what a Checksum Manifest file could look like:

97a7569daeaacf57b5abceca57bdca43  .\Product_Bundle.xml
e7da7276dd553f30496b868a2007bf5b  .\README.TXT
84f68c7706f379c401e2f8e08a82edea  .\browse\Collection_browse.xml
bcb852a6a9292e304a93668e6cc2068c  .\browse\Collection_browse_inventory.tab
3ff61c98cbed11e99690f76b5f6831b0  .\browse\ELE_MOM.PDF
f2e5969b0f1a0f54e530e416b7f5d54a  .\browse\ele_mom_browse.xml
76d6463510bc48233d65b46d08087ef8  .\context\Collection_context.xml
b8f5301ad7c868c76f0e627f0da19aed  .\context\Collection_context_inventory.tab
c226a6a0867e003696a752b8c24e56f3  .\context\PDS4_host_VG2_1.0.xml
...
        

It is important to note that the tool supports either absolute or relative file references specified in a Checksum Manifest file. In the event that the file references are relative paths, the tool assumes that the target root is the base path of these file references. The Parameter section of the Validate Tool Report will indicate the base path that the tool uses to resolve relative file references in a Manifest file. This is found under the setting Manifest File Base Path.

The following command demonstrates performing Checksum Manifest file validation against the dph example archive bundle using its manifest file bundle_checksums.txt:

% validate /home/pds4/dph_example_archive_VG2PLS -M /home/pds4/dph_example_archive_VG2PLS/bundle_checksums.txt
        

In the example above, the base path of the file references in the bundle_checksums.txt file is assumed to be /home/pds4/dph_example_archive_VG2PLS.
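To make the comparison concrete, here is a minimal Python sketch of checking a manifest against files on disk. It is not the tool's implementation. It assumes the manifest lines have the form shown earlier (a checksum, whitespace, then a path), that the checksums are MD5 (suggested by the 32-character hexadecimal values in the example), and that relative paths are resolved against a base path. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_manifest(text):
    """Return {path: checksum} from lines of '<checksum>  <path>'."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, path = line.split(None, 1)
        # Normalize Windows-style separators such as '.\browse\x.xml'.
        entries[path.replace("\\", "/")] = checksum.lower()
    return entries


def check_manifest(entries, base_path):
    """Compare each supplied checksum with the MD5 of the file on disk."""
    results = {}
    for rel_path, expected in entries.items():
        target = Path(base_path) / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
            continue
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        results[rel_path] = "match" if actual == expected else "mismatch"
    return results
```

For example, check_manifest(parse_manifest(text), "/home/pds4/dph_example_archive_VG2PLS") would resolve an entry such as .\README.TXT relative to the bundle root, which corresponds to the role the base path plays in the tool.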

Another use case is doing checksum manifest file validation on a sub-directory of a bundle. Using the dph example archive bundle example, we would need to tell the tool to use a different base than the target root since the file references in the bundle_checksums.txt file would not match up with the target sub-directory. This can be done using the -B flag option. The following command demonstrates performing Checksum Manifest file validation on a sub-directory of the dph example archive bundle:

% validate /home/pds4/dph_example_archive_VG2PLS/browse -M /home/pds4/dph_example_archive_VG2PLS/bundle_checksums.txt -B /home/pds4/dph_example_archive_VG2PLS
        

In the command line example above, the base path is set to /home/pds4/dph_example_archive_VG2PLS.

In the event that multiple targets are specified when performing checksum manifest file validation, the -B flag option must be specified. Continuing with the dph example archive bundle example, the following command demonstrates Checksum Manifest file validation on multiple sub-directories of the bundle:

% validate /home/pds4/dph_example_archive_VG2PLS/browse, /home/pds4/dph_example_archive_VG2PLS/context \
-M /home/pds4/dph_example_archive_VG2PLS/bundle_checksums.txt \
-B /home/pds4/dph_example_archive_VG2PLS
        

Using an XML Catalog

An XML Catalog allows the user to describe a mapping between external entity references in their products and locally-available XML Schema and Schematron documents. This section details some of the ways that the Catalog file can be set up that should support most use-case scenarios. The examples can be used to validate the ele_mom_tblChar.xml product label that can be found in the Data Provider's Handbook (DPH) example bundle, which is made available at the PDS website.

Mapping Label-Defined Schemas and Schematrons to Local Copies

The following is an example of an XML Catalog that maps schemas and schematrons defined in the product label to local copies:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <group xml:base="file:///home/pds/dph_example_archive_VG2PLS/xml_schema/">
        <uri name="https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.xsd"
            uri="PDS4_PDS_1700.xsd"/>
        <uri name="https://pds.nasa.gov/pds4/dph/v1/PDS4_DPH_1700.xsd"
            uri="PDS4_DPH_1700.xsd"/>
        <uri name="https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.sch"
            uri="PDS4_PDS_1700.sch"/>
    </group>
</catalog>
        

The xml:base attribute sets the top-level directory where the local copies of the schemas and schematrons can be found. So, in this example, the https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.xsd schema maps to the local schema at file:///home/pds/dph_example_archive_VG2PLS/xml_schema/PDS4_PDS_1700.xsd. Note that the xml:base attribute value must end in the forward slash '/' character in order for the tool to properly resolve the URI references set in the Catalog file.
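The effect of the trailing '/' character can be seen with standard relative URI resolution, shown here using Python's urllib.parse.urljoin. This is an illustration of general URI resolution rules; it assumes the tool's catalog resolver behaves the same way and does not show the tool's own code.

```python
from urllib.parse import urljoin

with_slash = "file:///home/pds/dph_example_archive_VG2PLS/xml_schema/"
without_slash = "file:///home/pds/dph_example_archive_VG2PLS/xml_schema"

# With the trailing slash, the relative name is appended to the directory.
print(urljoin(with_slash, "PDS4_PDS_1700.xsd"))
# file:///home/pds/dph_example_archive_VG2PLS/xml_schema/PDS4_PDS_1700.xsd

# Without it, the last path segment (xml_schema) is replaced instead.
print(urljoin(without_slash, "PDS4_PDS_1700.xsd"))
# file:///home/pds/dph_example_archive_VG2PLS/PDS4_PDS_1700.xsd
```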

Mapping Namespaces to Local Copies of Schemas

The following is an example of an XML Catalog that maps namespaces defined in the product label to local copies of the Schemas:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <group xml:base="file:///home/pds/dph_example_archive_VG2PLS/xml_schema/">
    <uri name="http://pds.nasa.gov/pds4/pds/v1"
         uri="PDS4_PDS_1700.xsd"/>
    <uri name="http://pds.nasa.gov/pds4/dph/v1"
         uri="PDS4_DPH_1700.xsd"/>
    <uri name="https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.sch"
         uri="PDS4_PDS_1700.sch"/>
  </group>
</catalog>
        

In this example, the namespace http://pds.nasa.gov/pds4/pds/v1 maps to the schema file:///home/pds/dph_example_archive_VG2PLS/xml_schema/PDS4_PDS_1700.xsd.

Mapping Many Schemas At Once

The following is an example of an XML Catalog that maps many Schemas and Schematrons at once based on a common initial substring in those URIs:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <rewriteURI uriStartString="https://pds.nasa.gov/pds4/pds/v1" 
                   rewritePrefix="file:///home/pds/dph_example_archive_VG2PLS/xml_schema"/>
    <rewriteURI uriStartString="https://pds.nasa.gov/pds4/dph/v1" 
                   rewritePrefix="file:///home/pds/dph_example_archive_VG2PLS/xml_schema"/>  
</catalog>
        

The uriStartString represents the prefix to look for in the product label. The rewritePrefix is the replacement string in order to resolve the URI reference. So in this example, the first entry maps the https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.xsd schema in the product label to file:///home/pds/dph_example_archive_VG2PLS/xml_schema/PDS4_PDS_1700.xsd and maps the https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.sch schematron to file:///home/pds/dph_example_archive_VG2PLS/xml_schema/PDS4_PDS_1700.sch.
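The prefix replacement described above can be sketched in Python as follows. The function and variable names are hypothetical. When several entries match, the sketch picks the longest matching uriStartString, which is how the OASIS XML Catalogs specification describes rewrite entries; the tool's resolver is assumed to follow the same rule.

```python
# (uriStartString, rewritePrefix) pairs taken from the catalog above.
REWRITES = [
    ("https://pds.nasa.gov/pds4/pds/v1",
     "file:///home/pds/dph_example_archive_VG2PLS/xml_schema"),
    ("https://pds.nasa.gov/pds4/dph/v1",
     "file:///home/pds/dph_example_archive_VG2PLS/xml_schema"),
]


def rewrite_uri(uri, rewrites=REWRITES):
    """Replace the longest matching start string with its rewrite prefix.

    Returns None when no entry matches.
    """
    matches = [(start, prefix) for start, prefix in rewrites
               if uri.startswith(start)]
    if not matches:
        return None
    start, prefix = max(matches, key=lambda pair: len(pair[0]))
    return prefix + uri[len(start):]


print(rewrite_uri("https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.xsd"))
# file:///home/pds/dph_example_archive_VG2PLS/xml_schema/PDS4_PDS_1700.xsd
```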

Passing References to Another Catalog

The following example of an XML Catalog file passes the DPH URI reference off to another XML Catalog named XMLCatalog_dph.xml for resolution:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <delegateURI uriStartString="https://pds.nasa.gov/pds4/dph/v1" 
                 catalog="file:///home/pds/catalog/dph/XMLCatalog_dph.xml"/>
    <rewriteURI uriStartString="https://pds.nasa.gov/pds4/pds/v1" 
                rewritePrefix="file:///home/pds/dph_example_archive_VG2PLS/xml_schema"/> 
</catalog>
        

The XMLCatalog_dph.xml would contain the following:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <rewriteURI uriStartString="https://pds.nasa.gov/pds4/dph/v1" 
                rewritePrefix="file:///home/pds/dph_example_archive_VG2PLS/xml_schema"/>  
</catalog>
        

In the first catalog, the uriStartString value in the delegateURI element matches the https://pds.nasa.gov/pds4/dph/v1/PDS4_DPH_1700.xsd DPH schema in the product label, so the tool passes this off to the XMLCatalog_dph.xml for resolution.

Using a Configuration File

A configuration file is an alternative way to set the different behaviors of the tool instead of the command-line option flags. It consists of a text file made up of keyword/value pairs. The configuration file follows the syntax of the stream parsed by the Java Properties.load(java.io.InputStream) method.

Some of the important syntax rules are as follows:

  • Blank lines and lines which begin with the hash character "#" are ignored.
  • Values may be separated on different lines if a backslash is placed at the end of the line that continues below.
  • Escape sequences for special characters like a line feed, a tabulation or a unicode character, are allowed in the values and are specified in the same notation as those used in Java strings (e.g. \n, \t, \r).

Since backslashes (\) have special meanings in a configuration file, keyword values that contain this character will not be interpreted properly by the Validate Tool even if it is surrounded by quotes. A common example would be a Windows path name (e.g. c:\pds\collection). Use the forward slash character instead (c:/pds/collection) or escape the backslash character (c:\\pds\\collection).
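The syntax rules above can be pictured with a simplified Python sketch. It handles only a small subset of the Java properties format (blank lines, "#" comments, key = value pairs, and a trailing backslash as a line continuation) and does not interpret other escape sequences. Splitting values on commas is an assumption based on the multi-value examples later in this section, and the function name load_config is hypothetical.

```python
def load_config(text):
    """Parse a simplified subset of the Java properties syntax."""
    settings = {}
    pending = ""  # text carried over from a line ending in a backslash
    for raw in text.splitlines():
        line = raw.strip()
        if not pending and (not line or line.startswith("#")):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            pending += line[:-1]  # drop the backslash, keep reading
            continue
        key, _, value = (pending + line).partition("=")
        pending = ""
        # Assumption: multiple values are separated by commas.
        settings[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return settings


example = """\
# This is a Validate Tool configuration file
validate.target = product.xml, \\
                  ./collection
validate.report = report.txt
"""
print(load_config(example))
# {'validate.target': ['product.xml', './collection'], 'validate.report': ['report.txt']}
```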

Note: Any flag specified on the command-line takes precedence over any equivalent settings placed in the configuration file.

The following table contains valid keywords that can be specified in the configuration file:

Property Keyword              Associated Command-Line Option
validate.target               -t, --target
validate.rule                 -R, --rule
validate.checksum             -M, --checksum-manifest
validate.basePath             -B, --base-path
validate.catalog              -C, --catalog
validate.schema               -x, --schema
validate.schematron           -S, --schematron
validate.noDataCheck          -D, --no-data-check
validate.spotCheckData        --spot-check-data
validate.allowUnlabeledFiles  --allow-unlabeled-files
validate.report               -r, --report-file
validate.verbose              -v, --verbose
validate.reportStyle          -s, --report-style
validate.regexp               -e, --regexp
validate.local                -L, --local
validate.model                -m, --model-version
validate.force                -f, --force
validate.maxErrors            -E, --max-errors

The following example demonstrates how to set a configuration file:

# This is a Validate Tool configuration file

validate.target = ./collection
validate.report = report.txt
validate.regexp = "*.xml"
        

This is equivalent to running the tool with the following flags:

-t ./collection -e "*.xml" -r report.txt
        

The following example demonstrates how to set a configuration file with multiple values for a keyword:

# This is a Validate Tool configuration file with multiple values

validate.target = product.xml, ./collection
validate.regexp = "*.xml", "Mars*"
        

This is equivalent to running the tool with the following flags:

-t product.xml, ./collection -e "*.xml", "Mars*"
        

The following example demonstrates how to set a configuration file with multiple values that span across multiple lines:

# This is a Validate configuration file with multiple values
# that span across multiple lines

validate.target = product.xml, \
                  ./collection
validate.regexp = "*.xml", \
                  "Mars*"
        

As previously mentioned, any flag options set on the command-line will override the settings in the configuration file. The following example demonstrates how to override a setting in the configuration file.

Suppose the configuration file named config.txt is defined as follows:

validate.target = ./collection
validate.regexp = "*.xml"
        

This configuration allows the tool to validate files with a .xml extension in the collection directory. To validate all files instead of just files ending in .xml, specify the regexp flag option on the command-line to override the validate.regexp property:

% validate -c config.txt -e "*"
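
The precedence rule can be pictured as layering the command-line values on top of the configuration file values. The Python sketch below only illustrates that rule and is not how the tool is implemented; the effective_settings function and the dictionary representation of the settings are assumptions.

```python
def effective_settings(config, cli):
    """Return config-file settings with command-line values layered on top."""
    merged = dict(config)
    merged.update(cli)  # a flag given on the command line replaces the file value
    return merged


# Values from config.txt, keyed by property name.
config = {"validate.target": ["./collection"], "validate.regexp": ["*.xml"]}
# '-e "*"' on the command line corresponds to the validate.regexp property.
cli = {"validate.regexp": ["*"]}

merged = effective_settings(config, cli)
print(merged)
```

Only validate.regexp changes; validate.target keeps the value from the configuration file because no target was given on the command line.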
        

Passing in Multiple Schemas

The Validate Tool allows multiple XML Schemas to be passed in through the command line via the -x flag option. When multiple schemas are passed in, the definitions found in each file are merged internally. If two schemas define the same element under the same namespace, the definition in the schema passed in first takes precedence over the one in the second schema. As an example, suppose a schema file, schema1.xsd, contains the following definition:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://pds.nasa.gov/pds4/pds/v1"
  xmlns:pds="http://pds.nasa.gov/pds4/pds/v1"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
  version="1.1.0.0">

...

<xs:complexType name="File_Area_Browse">
  <xs:annotation>
    <xs:documentation> The File Area Browse class describes a file and one or more tagged_data_objects contained within the file. </xs:documentation>
  </xs:annotation>
  <xs:complexContent>
    <xs:extension base="pds:File_Area">
      <xs:sequence>
        <xs:element name="File" type="pds:File" minOccurs="1" maxOccurs="1"> </xs:element>
        <xs:choice minOccurs="1" maxOccurs="unbounded">
          <xs:element name="Array_1D" type="pds:Array_1D"> </xs:element>
          <xs:element name="Array_2D" type="pds:Array_2D"> </xs:element>
          <xs:element name="Array_2D_Image" type="pds:Array_2D_Image"> </xs:element>
          <xs:element name="Array_2D_Map" type="pds:Array_2D_Map"> </xs:element>
          <xs:element name="Array_2D_Spectrum" type="pds:Array_2D_Spectrum"> </xs:element>
          <xs:element name="Array_3D" type="pds:Array_3D"> </xs:element>
          <xs:element name="Array_3D_Image" type="pds:Array_3D_Image"> </xs:element>
          <xs:element name="Array_3D_Movie" type="pds:Array_3D_Movie"> </xs:element>
          <xs:element name="Array_3D_Spectrum" type="pds:Array_3D_Spectrum"> </xs:element>
          <xs:element name="Encoded_Header" type="pds:Encoded_Header"> </xs:element>
          <xs:element name="Encoded_Image" type="pds:Encoded_Image"> </xs:element>
          <xs:element name="Header" type="pds:Header"> </xs:element>
          <xs:element name="Stream_Text" type="pds:Stream_Text"> </xs:element>
          <xs:element name="Table_Binary" type="pds:Table_Binary"> </xs:element>
          <xs:element name="Table_Character" type="pds:Table_Character"> </xs:element>
          <xs:element name="Table_Delimited" type="pds:Table_Delimited"> </xs:element>
        </xs:choice>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

...
        

Suppose the other schema, schema2.xsd, defines the same File_Area_Browse type under the same namespace, but without the Array_1D element:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://pds.nasa.gov/pds4/pds/v1"
  xmlns:pds="http://pds.nasa.gov/pds4/pds/v1"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
  version="1.0.0.0">

...

<xs:complexType name="File_Area_Browse">
  <xs:annotation>
    <xs:documentation> The File Area Browse class describes a file and one or more tagged_data_objects contained within the file. </xs:documentation>
  </xs:annotation>
  <xs:complexContent>
    <xs:extension base="pds:File_Area">
      <xs:sequence>
        <xs:element name="File" type="pds:File" minOccurs="1" maxOccurs="1"> </xs:element>
        <xs:choice minOccurs="1" maxOccurs="unbounded">
          <xs:element name="Array_2D" type="pds:Array_2D"> </xs:element>
          <xs:element name="Array_2D_Image" type="pds:Array_2D_Image"> </xs:element>
          <xs:element name="Array_2D_Map" type="pds:Array_2D_Map"> </xs:element>
          <xs:element name="Array_2D_Spectrum" type="pds:Array_2D_Spectrum"> </xs:element>
          <xs:element name="Array_3D" type="pds:Array_3D"> </xs:element>
          <xs:element name="Array_3D_Image" type="pds:Array_3D_Image"> </xs:element>
          <xs:element name="Array_3D_Movie" type="pds:Array_3D_Movie"> </xs:element>
          <xs:element name="Array_3D_Spectrum" type="pds:Array_3D_Spectrum"> </xs:element>
          <xs:element name="Encoded_Header" type="pds:Encoded_Header"> </xs:element>
          <xs:element name="Encoded_Image" type="pds:Encoded_Image"> </xs:element>
          <xs:element name="Header" type="pds:Header"> </xs:element>
          <xs:element name="Stream_Text" type="pds:Stream_Text"> </xs:element>
          <xs:element name="Table_Binary" type="pds:Table_Binary"> </xs:element>
          <xs:element name="Table_Character" type="pds:Table_Character"> </xs:element>
          <xs:element name="Table_Delimited" type="pds:Table_Delimited"> </xs:element>
        </xs:choice>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

...
        

If the schemas are passed into the Validate Tool as follows:

% validate product.xml -x schema1.xsd, schema2.xsd
        

then the File_Area_Browse definition from the schema1.xsd file takes precedence over the one in the schema2.xsd file. If the schemas were passed into the tool in the reverse order, the File_Area_Browse definition in the schema2.xsd file would take precedence over the one in the schema1.xsd file.
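
The "first definition wins" rule can be sketched in Python as follows. This is an illustration of the rule only, not the tool's merging code. The merge_definitions and make_schema helpers are hypothetical, and the two schemas are tiny stand-ins for the much larger files shown above.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def merge_definitions(schema_texts):
    """Collect top-level complexTypes; the first schema to define a name wins."""
    merged = {}
    for index, text in enumerate(schema_texts):
        root = ET.fromstring(text)
        namespace = root.get("targetNamespace")
        for ctype in root.findall(XS + "complexType"):
            key = (namespace, ctype.get("name"))
            merged.setdefault(key, (index, ctype))  # keep the earlier definition
    return merged


def make_schema(version, members):
    """Build a small stand-in schema with one File_Area_Browse definition."""
    elements = "".join('<xs:element name="%s"/>' % m for m in members)
    return (
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
        'targetNamespace="http://pds.nasa.gov/pds4/pds/v1" version="%s">'
        '<xs:complexType name="File_Area_Browse"><xs:choice>%s</xs:choice>'
        "</xs:complexType></xs:schema>" % (version, elements)
    )


schema1 = make_schema("1.1.0.0", ["Array_1D", "Array_2D"])
schema2 = make_schema("1.0.0.0", ["Array_2D"])

key = ("http://pds.nasa.gov/pds4/pds/v1", "File_Area_Browse")
winner, definition = merge_definitions([schema1, schema2])[key]
names = [e.get("name") for e in definition.iter(XS + "element")]
print(winner, names)
```

Reversing the list passed to merge_definitions makes the schema2 definition the one that is kept, mirroring the effect of reversing the order of the files on the command line.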

Context Product Reference Validation

The resources/ folder in the Validate Tool Release Package contains a JSON-formatted file with a list intended to represent a snapshot of the Context Product LIDVIDs (Logical Identifier/Version Identifier) currently registered at the PDS Engineering Node. This file is read at execution time so that the tool can verify that Context Products referenced in a product label exist in the supplied list. To have the tool accept additional Context Products that are not part of this list, edit the resources/registered_context_products.json file and add them to the existing list.

Report Format

This section describes the contents of the Validate Tool report. The links below detail the validation results of the same run for each format.

The tool can produce a validation report in three formats: full, XML, or JSON. The report style option (-s, --report-style) is used to change the format. When this option is not specified on the command line, the default is to generate a full report.

Full Report

In a full report, the location, severity, message type, and textual description of each detected anomaly are reported. A 'PASS', 'FAIL', or 'SKIP' keyword is displayed next to each file to indicate whether the file passed, failed, or was skipped by PDS validation, respectively. A summary is printed at the end indicating the total number of warnings and errors, followed by a summary of each of the different message types that were found in the validation run.
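
As a rough illustration of how the per-file keywords could be tallied from a full report, consider the Python sketch below. The sample text is hypothetical, and the layout of the PASS and SKIP lines is assumed to mirror the FAIL lines shown elsewhere in this document. The tally_statuses function is not part of the tool.

```python
from collections import Counter


def tally_statuses(report_text):
    """Count PASS, FAIL and SKIP keywords at the start of report lines."""
    counts = Counter()
    for line in report_text.splitlines():
        status, sep, _ = line.strip().partition(":")
        if sep and status in ("PASS", "FAIL", "SKIP"):
            counts[status] += 1
    return counts


# Hypothetical report excerpt used only for illustration.
sample = """\
PASS: file:/data/bundle/product_a.xml
FAIL: file:/data/bundle/product_b.xml
    ERROR  No checksum found in the manifest for 'file:/data/bundle/product_b.xml'.
SKIP: file:/data/bundle/product_c.xml
"""
counts = tally_statuses(sample)
print(dict(counts))
```

Indented message lines are ignored because the text before their first colon is not one of the three keywords.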

XML Report

In an XML report, the contents are the same as the full report.

JSON Report

In a JSON report, the contents are the same as the full report.

Common Errors

White spaces are required error

Execution of the Validate Tool may result in the following message appearing in the log:

FAIL: file:/Users/.../hi0173794441_9080000_001_r.xml
    FATAL_ERROR  line 1, 55: White spaces are required between publicId and systemId.
      

The message above is generated by the underlying Xerces library that the Validate Tool uses for XML Schema validation. Although not very intuitive, the message normally indicates that the XML Schema for the default namespace of the target label is missing. In the example above, the default namespace was "http://pds.nasa.gov/pds4/pds/v03", but the XML Schema file describing that namespace (PDS4_PDS_0300a.xsd) was not provided to the tool at runtime.
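
One way to diagnose this situation before running the tool is to compare the namespace of the label's root element with the targetNamespace of each schema being supplied. The Python sketch below assumes the root element is unprefixed, so that its namespace is the default namespace. The label and schema snippets are hypothetical, heavily trimmed stand-ins, and both helper functions are illustrative only.

```python
import xml.etree.ElementTree as ET


def root_namespace(label_text):
    """Return the namespace URI of the label's root element ('' if none)."""
    tag = ET.fromstring(label_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def target_namespaces(schema_texts):
    """Return the targetNamespace declared by each supplied schema."""
    return {ET.fromstring(text).get("targetNamespace") for text in schema_texts}


# Hypothetical, trimmed label and schema used only for illustration.
label = '<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v03"/>'
schema = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="http://pds.nasa.gov/pds4/pds/v1"/>'
)

namespace = root_namespace(label)
missing = namespace not in target_namespaces([schema])
print(namespace, "missing schema" if missing else "schema supplied")
```

Here the label uses the v03 namespace while the only schema supplied describes v1, which is the kind of mismatch that leads to the error described above.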

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

Running the tool against a large bundle may result in an OutOfMemoryError exception, similar to the following, appearing on standard error:

Sep 19, 2017 12:02:39 PM gov.nasa.pds.tools.label.LocationValidator validate
INFO: Using validation style 'PDS4 Directory' for location file:/home/atmos7/anonymous/PDS/data/PDS4/MAVEN/iuvs_calibrated_bundle/
Sep 19, 2017 12:02:39 PM gov.nasa.pds.tools.validate.task.ValidationTask execute
INFO: Starting validation task for location 'file:/home/atmos7/anonymous/PDS/data/PDS4/MAVEN/iuvs_calibrated_bundle/'
Sep 22, 2017 7:07:31 AM gov.nasa.pds.tools.validate.task.ValidationTask execute
INFO: Validation complete for location 'file:/home/atmos7/anonymous/PDS/data/PDS4/MAVEN/iuvs_calibrated_bundle/'
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded      
      

When this OutOfMemoryError exception is thrown, no report is generated. The issue is caused by the tool caching the validation results in memory until the end of the validation run. To resolve it, increase the JVM heap space allocation. The recommended settings are -Xms4096m -Xmx8192m. The following details how to update these settings in the tool depending on the target platform.

For Unix-Based Environments,

Update the JVM heap space allocation settings in the validate shell script to the following:

"${JAVA_HOME}"/bin/java -Xms4096m -Xmx8192m -jar ${VALIDATE_JAR} "$@"
      

For Windows Environments,

Update the JVM heap space allocation settings in the validate.bat batch file to the following:

"%JAVA_HOME%"\bin\java -Xms4096m -Xmx8192m -jar "%VALIDATE_JAR%" %*
      

No checksum found in the manifest errors

When performing Checksum Manifest file validation, having the wrong base path setting will result in multiple errors like the following:

FAIL: file:/home/pds4/dph_example_archive_VG2PLS/browse/Collection_browse.xml
    ERROR  No checksum found in the manifest for 'file:/home/pds4/dph_example_archive_VG2PLS/browse/Collection_browse.xml'.     
      

To resolve this issue, check that the base path setting correctly resolves the relative file references (if present) in the Checksum Manifest file by looking at the Manifest File Base Path parameter specified at the top of the Validate Tool report. If the base path is incorrect, specify the correct one on the command line using the -B, --base-path flag option.
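
The effect of the base path can be seen in a short Python sketch. It assumes each manifest line holds a checksum followed by a file path relative to the base path. The manifest entry, its checksum value, and the load_manifest function are hypothetical and are not taken from the tool.

```python
import posixpath


def load_manifest(manifest_text, base_path):
    """Map each file path, resolved against base_path, to its checksum."""
    checksums = {}
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        checksum, relative = line.split(None, 1)
        resolved = posixpath.normpath(posixpath.join(base_path, relative.strip()))
        checksums[resolved] = checksum
    return checksums


# Hypothetical manifest entry: "<checksum>  <path relative to the base path>".
manifest = "d41d8cd98f00b204e9800998ecf8427e  browse/Collection_browse.xml\n"
target = "/home/pds4/dph_example_archive_VG2PLS/browse/Collection_browse.xml"

wrong = load_manifest(manifest, "/home/pds4")
right = load_manifest(manifest, "/home/pds4/dph_example_archive_VG2PLS")
print(target in wrong, target in right)
```

With the wrong base path, the manifest entry resolves to a different location than the file being validated, so no checksum is found for it. With the correct base path, the lookup succeeds.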