This document describes how to operate the Validate Tool.
Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.
This section is intended to give a quick and easy way to run the Validate Tool. For a more detailed explanation on other ways to run the tool, go to the Advanced Usage section.
Validating a Product
The command below shows the recommended way to validate a single product:
% validate product.xml -r validate-report.txt
This validates the given product against the latest core schema and schematron packaged with the tool and writes the results to a file.
Validating a Bundle
The command below shows the recommended way to validate a bundle:
% validate $HOME/pds/bundle -i -M checksum-manifest.txt -r validate-report.txt
The -i flag tells the tool to perform referential integrity checking, while the -M flag performs additional checksum validation against the given Checksum Manifest file. For more details on what referential integrity checking involves, see the Referential Integrity Checking section.
The following table describes the command-line options available:
Command-Line Option | Description |
---|---|
-t, --target <files,directories> | Explicitly specify the targets (product files, directories) to validate. Targets can be specified implicitly as well (example: validate product.xml). For more details on target specification, see the Specifying Targets section. |
-i, --integrity-check | Perform referential integrity checking on a target directory. The tool will not accept a target file specification when this feature is enabled. For more details, see the Referential Integrity Checking section. |
-M, --checksum-manifest <file(s)> | Specify a Checksum Manifest file to perform additional checksum validation. |
-x, --schema <schemas> | Specify XML Schema files to use during validation. Using this flag overrides the PDS XML Schemas packaged with the tool. When passing in multiple schemas, specify the schema for the pds namespace (otherwise known as the core schema) first, followed by the other schemas. For more details on passing in multiple schemas, see the Passing in Multiple Schemas section. |
-S, --schematron <schematrons> | Specify Schematron files to use during validation. Using this flag overrides the PDS Schematron files packaged with the tool. |
-C, --catalog <xml-catalogs> | Specify XML Catalog files to use during validation. |
-r, --report-file <file> | Specify the report file name. Default is to output results to standard out. |
-s, --report-style <full|xml|json> | Specify the report format. Valid values are "full" for a full view, "xml" for an XML view, or "json" for a JSON view. Default is to generate a full report if this option is not specified. For more details on these report styles, see the Report Format section. |
-v, --verbose <1|2|3> | Specify the severity level and above to include in the human-readable report: (1=Info, 2=Warning, 3=Error). Default is Warning and above. |
-e, --regexp <file-patterns> | Specify file patterns to look for when validating a target directory. Each pattern must be surrounded by quotes (example: "*.xml"). Pattern matching is case-insensitive on Windows, but case-sensitive on other systems. The default behavior is to search for files ending in "*.xml" or "*.XML". Specifying this flag overrides the default behavior. |
-m, --model-version <model> | Specify a model version to use during validation. The default is to use the latest PDS4 data model (1300). The other models supported in this release include: 1000, 1100, 1101, 1200 and 1201. |
-L, --local | Validate files only in the target directory instead of recursively traversing down the sub-directories. |
-f, --force | Force the tool to validate against the schemas and schematrons specified in a label. |
-c, --config | Specify a configuration file to set the tool behavior. |
-V, --version | Display the release number and copyright information. |
-h, --help | Display Validate Tool usage. |
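The file-pattern behavior described for the -e option can be illustrated with a short Python sketch. This is not the tool's implementation: the function name `select_files` and the `case_insensitive` switch are hypothetical, and glob-style matching is assumed from the "*.xml" example in the table.

```python
from fnmatch import fnmatchcase

# Default patterns named in the option table above.
DEFAULT_PATTERNS = ["*.xml", "*.XML"]

def select_files(names, patterns=None, case_insensitive=False):
    """Return the names that match at least one glob-style pattern."""
    patterns = patterns or DEFAULT_PATTERNS
    selected = []
    for name in names:
        for pattern in patterns:
            if case_insensitive:
                # Windows-like behavior: compare lower-cased name and pattern.
                matched = fnmatchcase(name.lower(), pattern.lower())
            else:
                matched = fnmatchcase(name, pattern)
            if matched:
                selected.append(name)
                break
    return selected

names = ["product.xml", "PRODUCT.XML", "Label.Xml", "data.tab"]
print(select_files(names))                  # default patterns, case-sensitive
print(select_files(names, ["*.xml"], True)) # one pattern, case-insensitive
```

With case-sensitive matching, Label.Xml is skipped by the default patterns, which is one reason to pass an explicit pattern on systems other than Windows.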
This section describes more advanced ways to run the tool, as well as its behaviors and caveats.
This section demonstrates some of the ways that the tool can be executed using the command-line option flags:
Validating a Target Directory
The following command demonstrates the validation of a target directory against the core PDS schemas:
% validate /home/pds/collection
Validating Against User-Specified Schemas
Specifying XML Schemas on the command line will allow the Validate Tool to validate against the user-specified schemas instead of the schemas packaged with the tool. The following command demonstrates the validation of a single product label against a user-specified schema:
% validate product.xml -x product.xsd
The following command demonstrates the validation of a set of target files against a set of user-specified schemas:
% validate producta.xml, productb.xml -x producta.xsd, productb.xsd
Validating with Referential Integrity Checking
The following command demonstrates validating a bundle target with referential integrity checking enabled:
% validate ${HOME}/bundle -i
Validating Against a Checksum Manifest File
By default, the tool performs checksum validation against the file references specified in a product label. It generates an MD5 checksum for each file reference found and verifies that the generated checksum value matches the supplied checksum in the product label. When a Checksum Manifest file is passed into the tool, the tool will additionally verify that each generated checksum matches the supplied checksum in the given manifest file.
The following command demonstrates validating a bundle target against a Checksum Manifest file:
% validate ${HOME}/bundle -M ${HOME}/bundle_checksums.txt
Note that the tool expects the file references specified in the Checksum Manifest file to either be absolute or relative paths.
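The checksum comparison described above can be sketched in Python. The sketch makes several assumptions that the text does not confirm: the manifest is taken to hold one `<md5-hex> <path>` pair per line (md5sum style), relative paths are resolved against the directory holding the manifest, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def md5_of(path):
    """Generate the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def read_manifest(manifest_path):
    """Parse '<checksum> <path>' lines (assumed format) into {path: checksum}."""
    base = Path(manifest_path).parent
    entries = {}
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        checksum, name = line.split(None, 1)
        path = Path(name.strip())
        # Assumption: relative references are resolved against the manifest's directory.
        entries[path if path.is_absolute() else base / path] = checksum.lower()
    return entries

def verify(manifest_path):
    """Return the files whose generated checksum differs from the manifest value."""
    return [str(path) for path, expected in read_manifest(manifest_path).items()
            if md5_of(path) != expected]
```

An empty list from `verify` means every generated checksum matched the supplied value.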
Validating Against User-Specified XML Catalogs
The following command demonstrates the validation of a single data product against a user-specified XML Catalog:
% validate product.xml -C catalog.xml
Validating Against User-Specified Schematron Files
Specifying Schematron files on the command line will allow the Validate Tool to validate against the user-specified Schematron files instead of the Schematron files packaged with the tool. The following command demonstrates the validation of a single data product against a user-specified Schematron:
% validate product.xml -S product.sch
Validating Against Label Specified Schemas and Schematrons
The following command demonstrates forcing the tool to validate against the schemas and schematrons specified in a given label:
% validate product.xml -f
Validating Against an Older Version of the PDS4 Data Model
The following command demonstrates the validation of a single data product label against version 1000 of the PDS4 data model:
% validate product.xml -m 1000
Ignoring Sub-Directories During Validation
By default, the Validate Tool will recursively traverse a target directory during validation. The local flag option is used to tell the Validate Tool to not perform recursion. The following command demonstrates the validation of a target directory without directory recursion:
% validate /home/pds/collection -L
Changing Tool Behaviors With The Configuration File
A configuration file can be passed in on the command line to change the default behaviors of the tool and to give users a way to perform validation with a single flag. For more details on how to set up the configuration file, see the Using a Configuration File section.
The following command demonstrates performing validation using a configuration file:
% validate -c config.txt
Targets are validated in the order in which they are specified on the command-line. They can be specified implicitly and explicitly.
To specify targets implicitly, it is best to specify them first on the command-line before any other command-line option flags. The following command demonstrates the validation of an implicitly defined, single target product label:
% validate product.xml
The following command demonstrates the validation of implicitly defined, multiple targets:
% validate product.xml, /home/pds/collection
Note: Implicit targets should not be specified after option flags that allow multiple arguments (see example below). Unexpected results can occur.
% validate -x product.xsd product.xml
In this example, the Validate Tool will inadvertently treat the implicit target, product.xml, as a schema file.
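The same pitfall can be reproduced with any parser that lets an option take several values. The Python sketch below uses the standard argparse module purely as an analogy; it is not the parser used by the Validate Tool, and the program name is made up.

```python
import argparse

parser = argparse.ArgumentParser(prog="validate-sketch")
parser.add_argument("targets", nargs="*")         # implicit targets
parser.add_argument("-x", "--schema", nargs="+")  # option that accepts several values

def parse(argv):
    """Return (targets, schemas) as the parser understood them."""
    args = parser.parse_args(argv)
    return args.targets, args.schema

# Target placed after the multi-valued option: it is consumed as a schema.
print(parse(["-x", "product.xsd", "product.xml"]))

# Target placed first: it is recognized as a target.
print(parse(["product.xml", "-x", "product.xsd"]))
```

In the first call the target list comes back empty and both file names are reported as schemas, which mirrors the behavior described above.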
Targets can be specified both implicitly and explicitly at the same time. Targets specified implicitly are validated first, followed by those that are specified explicitly with the target flag.
The following command demonstrates the validation of multiple product labels, specified both implicitly and explicitly:
% validate producta.xml, productb.xml -t productc.xml, /home/pds/collection
In this example, producta.xml and productb.xml are validated first, followed by productc.xml and the product labels in /home/pds/collection.
In each scenario above, the target product(s) were the equivalent of observational or document products. The data model also consists of bundle and collection products, which in turn reference other products. When the Validate Tool encounters one of these products, it traverses the inventory associated with that product and validates each product referenced as well as the target product.
The Validate Tool performs referential integrity checking within a given target directory when the -i, --integrity-check flag option is specified on the command line. The tool makes one pass through a given target and gathers the LIDVIDs of each product, members within a Bundle product, and members within a Collection product. Assuming that the target directory points to a full PDS4 bundle, the following referential integrity checks are made as the tool validates each product:
For Bundle products,
For Collection products,
All other products are assumed to be members of a Collection. So for these, the tool will verify that each product is referenced by a Collection product within the given target.
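The check for ordinary products can be sketched in Python as follows. The record layout (`lidvid`, `type`, `members`) and the function name are hypothetical, and the sketch works on an in-memory list rather than reproducing the tool's single pass over the target.

```python
def unreferenced_products(products):
    """Return LIDVIDs of non-Bundle, non-Collection products that no Collection lists as a member."""
    collection_members = set()
    for product in products:
        if product["type"] == "Collection":
            collection_members.update(product["members"])
    return [p["lidvid"] for p in products
            if p["type"] not in ("Bundle", "Collection")
            and p["lidvid"] not in collection_members]

# Hypothetical contents of one target directory.
target = [
    {"lidvid": "urn:nasa:pds:example:browse::1.0", "type": "Collection",
     "members": ["urn:nasa:pds:example:browse:ele_mom::1.0"]},
    {"lidvid": "urn:nasa:pds:example:browse:ele_mom::1.0", "type": "Product", "members": []},
    {"lidvid": "urn:nasa:pds:example:browse:orphan::1.0", "type": "Product", "members": []},
]
print(unreferenced_products(target))
```

Only the product that no Collection lists is returned. Because the function sees a single target's products, references are never resolved across targets.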
It is important to note that the tool only performs referential integrity checking on the targets that the tool is given. As an example, it is fine to specify a target directory that contains just a Collection product and its members:
% validate ${HOME}/bundle/collection -i
In this case, the tool will not report that the given Collection is not referenced by a Bundle product.
Another thing to note is that the report will record verified references under the INFO message severity level. So in order to see these messages, run the tool with the verbose level set to 1 to produce a report with the INFO messages. The following command demonstrates running the tool with the severity level set to INFO and above:
% validate ${HOME}/bundle1 -i -v1
Below is a snippet of an example report of a validation run with integrity checking turned on and the severity level set to INFO and above:
...
PASS: https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/browse/Collection_browse.xml
  INFO The lidvid 'urn:nasa:pds:example.dph.sample_archive_bundle:browse::1.0' is a member of the following bundle: \
    https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/Product_Bundle.xml
  INFO The member 'urn:nasa:pds:example.dph.sample_archive_bundle:browse:ele_mom::1.0' is referenced in the following product: \
    https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/browse/ele_mom_browse.xml
PASS: https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/browse/ele_mom_browse.xml
  INFO The lidvid 'urn:nasa:pds:example.dph.sample_archive_bundle:browse:ele_mom::1.0' is a member of the following collection: \
    https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/browse/Collection_browse.xml
PASS: https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/context/Collection_context.xml
  INFO The lidvid 'urn:nasa:pds:example.dph.sample_archive_bundle:context::1.0' is a member of the following bundle: \
    https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/Product_Bundle.xml
  INFO The member 'urn:nasa:pds:context:instrument_host:instrument_host.vg2::1.0' is referenced in the following product: \
    https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/context/PDS4_host_VG2_1.0.xml
  INFO The member 'urn:nasa:pds:context:instrument:instrument.pls__vg2::1.0' is referenced in the following product: \
    https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/context/PDS4_inst_PLS__VG2_1.0.xml
  INFO The member 'urn:nasa:pds:context:investigation:mission.voyager::1.0' is referenced in the following product: \
    https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/context/PDS4_mission_VOYAGER_1.0.xml
  INFO The member 'urn:nasa:pds:context:target:planet.jupiter::1.0' is referenced in the following product: \
    https://starbase.jpl.nasa.gov/pds4/1201/dph_example_archive_VG2PLS/context/PDS4_target_JUPITER_1.0.xml
...
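The way the verbose level decides which messages reach a report can be sketched in Python. The level numbers come from the option table (1=Info, 2=Warning, 3=Error); the function name and the sample messages are hypothetical.

```python
# Severity levels as listed for the -v option.
SEVERITY = {"INFO": 1, "WARNING": 2, "ERROR": 3}

def filter_messages(messages, verbose=2):
    """Keep messages at or above the requested level (default: Warning and above)."""
    return [(severity, text) for severity, text in messages
            if SEVERITY[severity] >= verbose]

# Hypothetical messages collected during one run.
messages = [
    ("INFO", "The lidvid is a member of the following bundle"),
    ("WARNING", "sample warning"),
    ("ERROR", "sample error"),
]
print(filter_messages(messages))             # INFO entries are dropped by default
print(filter_messages(messages, verbose=1))  # everything, including INFO
```

This is why verified references only appear in the report when the level is lowered to 1.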
When two targets are specified on the command line, the tool treats each target as a separate entity when performing referential integrity checking. As an example, the following command demonstrates performing referential integrity checking on two different bundles:
% validate ${HOME}/bundle1, ${HOME}/bundle2 -e "*.xml" -i
The tool will perform referential integrity checking within the products located in the ${HOME}/bundle1 target, then will perform referential integrity checking within the products located in the ${HOME}/bundle2 target. In other words, the tool will not cross over to the ${HOME}/bundle2 directory to find LID/LIDVID references of a product in the ${HOME}/bundle1 directory.
An XML Catalog allows the user to describe a mapping between external entity references in their products and locally-available XML Schema documents. This feature of the tool is not fully implemented and needs to be exercised with multiple PDS scenarios, but it is available to experiment with in this release. The following is an example XML Catalog file for validating the Data Provider's Handbook (DPH) example bundle. The file maps the PDS namespace to a local copy of the PDS4 Ops XML Schema document and the DPH namespace to a local copy of the DPH example dictionary XML Schema document.
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <group xml:base="file:///${HOME}/">
    <uri name="http://pds.nasa.gov/pds4/pds/v03" uri="VG2PLS/schemas/PDS4_OPS_0600h.xsd"/>
    <uri name="http://pds.nasa.gov/pds4/dph/v01" uri="VG2PLS/local_dictionaries/dph_example_dict_0300a.xsd"/>
  </group>
</catalog>
A third schema document, Product_TableChar_tailored_0600h.xsd, is also required for validation. It references the DPH dictionary schema document with an <xs:include> statement, and this is where the current implementation falls short: this schema document must be passed on the command line (instead of being specified in the XML Catalog) when executing the Validate Tool. The following command will validate the DPH example bundle correctly:
% validate -t ${HOME}/VG2PLS_archive -C catalog.xml \
  -x ${HOME}/VG2PLS_archive/schemas/Product_TableChar_tailored_0600h.xsd
A configuration file is an alternative way to set the different behaviors of the tool instead of the command-line option flags. It consists of a text file made up of keyword/value pairs. The configuration file follows the syntax of the stream parsed by the Java Properties.load(java.io.InputStream) method.
Some of the important syntax rules are as follows:
Since backslashes (\) have special meanings in a configuration file, keyword values that contain this character will not be interpreted properly by the Validate Tool, even if they are surrounded by quotes. A common example is a Windows path name (e.g. c:\pds\collection). Use the forward slash character instead (c:/pds/collection) or escape each backslash character (c:\\pds\\collection).
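A rough Python sketch of what happens to such a value is given below. It imitates only the simple escape handling of the Java properties format (a backslash before an unrecognized character is silently dropped); Unicode escapes and line continuation are deliberately left out, and the function name is hypothetical.

```python
def unescape_value(raw):
    """Approximate how a properties-style loader reads backslashes in a value."""
    escapes = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Known escapes are translated; otherwise the backslash is dropped.
            out.append(escapes.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(unescape_value(r"c:\pds\collection"))    # c:pdscollection
print(unescape_value("c:/pds/collection"))     # c:/pds/collection
print(unescape_value(r"c:\\pds\\collection"))  # c:\pds\collection
```

The first value loses its separators entirely, while the forward-slash and escaped forms survive as intended.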
Note: Any flag specified on the command-line takes precedence over any equivalent settings placed in the configuration file.
The following table contains valid keywords that can be specified in the configuration file:
Property Keyword | Associated Command-Line Option |
---|---|
validate.target | -t, --target |
validate.integrity | -i, --integrity-check |
validate.checksum | -M, --checksum-manifest |
validate.catalog | -C, --catalog |
validate.schema | -x, --schema |
validate.schematron | -S, --schematron |
validate.report | -r, --report-file |
validate.verbose | -v, --verbose |
validate.reportStyle | -s, --report-style |
validate.regexp | -e, --regexp |
validate.local | -L, --local |
validate.model | -m, --model-version |
validate.force | -f, --force |
The following example demonstrates how to set a configuration file:
# This is a Validate Tool configuration file
validate.target = ./collection
validate.report = report.txt
validate.regexp = "*.xml"
This is equivalent to running the tool with the following flags:
-t ./collection -e "*.xml" -r report.txt
The following example demonstrates how to set a configuration file with multiple values for a keyword:
# This is a Validate Tool configuration file with multiple values
validate.target = product.xml, ./collection
validate.regexp = "*.xml", "Mars*"
This is equivalent to running the tool with the following flags:
-t product.xml, ./collection -e "*.xml", "Mars*"
The following example demonstrates how to set a configuration file with multiple values that span across multiple lines:
# This is a Validate configuration file with multiple values
# that span across multiple lines
validate.target = product.xml, \
                  ./collection
validate.regexp = "*.xml", \
                  "Mars*"
As previously mentioned, any flag options set on the command line override the equivalent settings in the configuration file. The following example demonstrates how to override a setting in the configuration file.
Suppose the configuration file named config.txt is defined as follows:
validate.target = ./collection
validate.regexp = "*.xml"
This configuration tells the tool to validate files with a .xml extension in the collection directory. To validate all files instead of just the files ending in .xml, specify the regexp flag option on the command line to override the validate.regexp property:
% validate -c config.txt -e "*"
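The precedence rule can be pictured as a dictionary merge in which command-line values replace configuration-file values. The Python sketch below is only an illustration: the function name is hypothetical, and mapping the -e flag onto the validate.regexp key follows the keyword table above.

```python
def effective_settings(config, command_line):
    """Merge settings so that command-line values replace configuration-file values."""
    merged = dict(config)
    merged.update(command_line)
    return merged

# Settings read from config.txt in the example above.
config = {"validate.target": "./collection", "validate.regexp": "*.xml"}
# Setting supplied with -e "*" on the command line.
command_line = {"validate.regexp": "*"}

print(effective_settings(config, command_line))
# {'validate.target': './collection', 'validate.regexp': '*'}
```

Settings that appear only in the configuration file, such as the target here, are left untouched.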
The Validate Tool allows multiple XML Schemas to be passed in through the command-line via the -x flag option. When passing in multiple schemas, the definitions found in each file are merged internally. In the case where two schemas are passed in and both define the same element under the same namespace, then the definition in the first schema passed in will take precedence over the second schema. As an example, suppose a schema file, schema1.xsd, contains the following definition:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://pds.nasa.gov/pds4/pds/v1"
  xmlns:pds="http://pds.nasa.gov/pds4/pds/v1"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
  version="1.1.0.0">
  ...
  <xs:complexType name="File_Area_Browse">
    <xs:annotation>
      <xs:documentation> The File Area Browse class describes a file and one or more tagged_data_objects contained within the file. </xs:documentation>
    </xs:annotation>
    <xs:complexContent>
      <xs:extension base="pds:File_Area">
        <xs:sequence>
          <xs:element name="File" type="pds:File" minOccurs="1" maxOccurs="1"> </xs:element>
          <xs:choice minOccurs="1" maxOccurs="unbounded">
            <xs:element name="Array_1D" type="pds:Array_1D"> </xs:element>
            <xs:element name="Array_2D" type="pds:Array_2D"> </xs:element>
            <xs:element name="Array_2D_Image" type="pds:Array_2D_Image"> </xs:element>
            <xs:element name="Array_2D_Map" type="pds:Array_2D_Map"> </xs:element>
            <xs:element name="Array_2D_Spectrum" type="pds:Array_2D_Spectrum"> </xs:element>
            <xs:element name="Array_3D" type="pds:Array_3D"> </xs:element>
            <xs:element name="Array_3D_Image" type="pds:Array_3D_Image"> </xs:element>
            <xs:element name="Array_3D_Movie" type="pds:Array_3D_Movie"> </xs:element>
            <xs:element name="Array_3D_Spectrum" type="pds:Array_3D_Spectrum"> </xs:element>
            <xs:element name="Encoded_Header" type="pds:Encoded_Header"> </xs:element>
            <xs:element name="Encoded_Image" type="pds:Encoded_Image"> </xs:element>
            <xs:element name="Header" type="pds:Header"> </xs:element>
            <xs:element name="Stream_Text" type="pds:Stream_Text"> </xs:element>
            <xs:element name="Table_Binary" type="pds:Table_Binary"> </xs:element>
            <xs:element name="Table_Character" type="pds:Table_Character"> </xs:element>
            <xs:element name="Table_Delimited" type="pds:Table_Delimited"> </xs:element>
          </xs:choice>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  ...
Suppose the other schema, schema2.xsd, contains a definition with the same name in the same namespace (in this version, without the Array_1D element):
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://pds.nasa.gov/pds4/pds/v1"
  xmlns:pds="http://pds.nasa.gov/pds4/pds/v1"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
  version="1.0.0.0">
  ...
  <xs:complexType name="File_Area_Browse">
    <xs:annotation>
      <xs:documentation> The File Area Browse class describes a file and one or more tagged_data_objects contained within the file. </xs:documentation>
    </xs:annotation>
    <xs:complexContent>
      <xs:extension base="pds:File_Area">
        <xs:sequence>
          <xs:element name="File" type="pds:File" minOccurs="1" maxOccurs="1"> </xs:element>
          <xs:choice minOccurs="1" maxOccurs="unbounded">
            <xs:element name="Array_2D" type="pds:Array_2D"> </xs:element>
            <xs:element name="Array_2D_Image" type="pds:Array_2D_Image"> </xs:element>
            <xs:element name="Array_2D_Map" type="pds:Array_2D_Map"> </xs:element>
            <xs:element name="Array_2D_Spectrum" type="pds:Array_2D_Spectrum"> </xs:element>
            <xs:element name="Array_3D" type="pds:Array_3D"> </xs:element>
            <xs:element name="Array_3D_Image" type="pds:Array_3D_Image"> </xs:element>
            <xs:element name="Array_3D_Movie" type="pds:Array_3D_Movie"> </xs:element>
            <xs:element name="Array_3D_Spectrum" type="pds:Array_3D_Spectrum"> </xs:element>
            <xs:element name="Encoded_Header" type="pds:Encoded_Header"> </xs:element>
            <xs:element name="Encoded_Image" type="pds:Encoded_Image"> </xs:element>
            <xs:element name="Header" type="pds:Header"> </xs:element>
            <xs:element name="Stream_Text" type="pds:Stream_Text"> </xs:element>
            <xs:element name="Table_Binary" type="pds:Table_Binary"> </xs:element>
            <xs:element name="Table_Character" type="pds:Table_Character"> </xs:element>
            <xs:element name="Table_Delimited" type="pds:Table_Delimited"> </xs:element>
          </xs:choice>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  ...
If the schemas are passed into the Validate Tool as follows:
% ./validate product.xml -x schema1.xsd, schema2.xsd
then the File_Area_Browse definition from the schema1.xsd file takes precedence over the one in the schema2.xsd file. If the schemas were passed in the reverse order, the File_Area_Browse definition in the schema2.xsd file would take precedence over the one in the schema1.xsd file.
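The "first definition wins" rule can be sketched in Python with plain dictionaries standing in for parsed schemas. This is a simplified model, not the tool's merge logic: keying definitions by (namespace, name) and the function name are assumptions made for the illustration.

```python
def merge_definitions(schemas):
    """Merge definition maps in order; the first schema to define a key wins."""
    merged = {}
    for definitions in schemas:
        for key, definition in definitions.items():
            merged.setdefault(key, definition)
    return merged

NS = "http://pds.nasa.gov/pds4/pds/v1"
# Stand-ins for the two schema files shown above.
schema1 = {(NS, "File_Area_Browse"): "version 1.1.0.0, with Array_1D"}
schema2 = {(NS, "File_Area_Browse"): "version 1.0.0.0, without Array_1D"}

print(merge_definitions([schema1, schema2])[(NS, "File_Area_Browse")])
print(merge_definitions([schema2, schema1])[(NS, "File_Area_Browse")])
```

Swapping the order of the list changes which definition survives, just as swapping the order of the files after -x does.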
This section describes the contents of the Validate Tool report.
The tool can represent a validation report in three different formats: full, XML, or JSON. The report style option is used to change the format. When this option is not specified on the command line, the default is to generate a full report.
In a full report, the location, severity, and textual description of each detected anomaly are reported. A 'PASS', 'FAIL', or 'SKIP' keyword is displayed next to each file to indicate whether the file passed, failed, or was skipped during PDS validation, respectively.
In an XML report, the contents are the same as the full report.
In a JSON report, the contents are the same as the full report. Currently, the tool supports this type of report only for validation runs of a single data product label.
Execution of the Validate Tool may result in the following message appearing in the log:
FAIL: file:/Users/.../hi0173794441_9080000_001_r.xml
  FATAL_ERROR line 1, 55: White spaces are required between publicId and systemId.
The message above is generated by the underlying Xerces library that is utilized by the Validate Tool for XML Schema validation. Although not very intuitive, the message normally indicates that the XML Schema for the default namespace of the target label is missing. In the example above the default namespace was "http://pds.nasa.gov/pds4/pds/v03" but the XML Schema file describing that namespace (PDS4_PDS_0300a.xsd) was not provided to the tool at runtime.
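When this message appears, it can help to check which namespace the label actually declares before choosing the schema to pass with -x. The Python sketch below reads the namespace of a label's root element using the standard library; the function name and the one-line sample label are illustrative only.

```python
import xml.etree.ElementTree as ET

def root_namespace(label_xml):
    """Return the namespace URI of the label's root element, or None if it has none."""
    root = ET.fromstring(label_xml)
    # ElementTree writes qualified tags as '{namespace}local_name'.
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None

# Minimal stand-in for a product label.
label = '<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v03"/>'
print(root_namespace(label))  # http://pds.nasa.gov/pds4/pds/v03
```

The printed URI is the namespace for which a matching XML Schema file has to be available to the tool.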