Operation

This document describes how to operate the Validate Tool. The following topics can be found in this document:

Note: The command-line examples in this section have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution.

Tool Execution

The Validate Tool can be executed in various ways. This section describes how to run the tool, as well as its behaviors and caveats.

Command-Line Options

The following table describes the command-line options available:

Running the Validate Tool

This section demonstrates some of the ways that the tool can be executed using the command-line option flags:
Validating a Target File

The following command demonstrates the validation of a single data product label against the core PDS schemas:

    % validate product.xml

Validating a Target Directory

The following command demonstrates the validation of a target directory against the core PDS schemas:

    % validate /home/pds/collection

Validating Against User-Specified Schemas

Specifying XML Schemas on the command-line allows the Validate Tool to validate against the user-specified schemas instead of the schemas packaged with the tool. The following command demonstrates the validation of a single product label against a user-specified schema:

    % validate product.xml -x product.xsd

The following command demonstrates the validation of a set of target files against a set of user-specified schemas:

    % validate producta.xml, productb.xml -x producta.xsd, productb.xsd

Validating Against User-Specified XML Catalogs

The following command demonstrates the validation of a single data product against a user-specified XML Catalog:

    % validate product.xml -C catalog.xml

Validating Against User-Specified Schematron Files

Specifying Schematron files on the command-line allows the Validate Tool to validate against the user-specified Schematron files instead of the Schematron files packaged with the tool. The following command demonstrates the validation of a single data product against a user-specified Schematron file:

    % validate product.xml -S product.sch

Validating Against an Older Version of the PDS4 Data Model

The following command demonstrates the validation of a single data product label against version 1000 of the PDS4 data model:

    % validate product.xml -m 1000

Validating Specific Files in a Target Directory

The following command demonstrates the validation of any file that has a .xml extension in a target directory:

    % validate /home/pds/collection -e "*.xml"

Note: File patterns should be surrounded by quotes to prevent the system shell from mistakenly interpreting them. In addition, pattern matching is case-insensitive on Windows, but case-sensitive on other systems.

Ignoring Sub-Directories During Validation

By default, the Validate Tool will recursively traverse a target directory during validation. The local flag option tells the Validate Tool not to perform recursion. The following command demonstrates the validation of a target directory without directory recursion:

    % validate /home/pds/collection -L

Changing Tool Behaviors With The Configuration File

A configuration file can be passed in on the command-line to change the default behaviors of the tool and to give users a way to perform validation with a single flag. For more details on how to set up the configuration file, see the Using a Configuration File section. The following command demonstrates performing validation using a configuration file:

    % validate -c config.txt

Specifying Targets

Targets are validated in the order in which they are specified on the command-line. They can be specified implicitly and explicitly. To specify targets implicitly, it is best to specify them first on the command-line, before any other command-line option flags. The following command demonstrates the validation of an implicitly defined, single target product label:

    % validate product.xml

The following command demonstrates the validation of implicitly defined, multiple targets:

    % validate product.xml, /home/pds/collection

Note: Implicit targets should not be specified after option flags that allow multiple arguments (see example below). Unexpected results can occur.

    % validate -x product.xsd product.xml

In this example, the Validate Tool will inadvertently treat the implicit target, product.xml, as a schema file.

Targets can be specified both implicitly and explicitly at the same time. Targets specified implicitly are validated first, followed by those that are specified explicitly with the target flag.
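
The two target rules above (implicit targets first, and implicit targets being absorbed by a preceding multi-argument flag) can be illustrated with a small analogy. The sketch below uses Python's argparse, not the Validate Tool's own parser; the parser name, the dest names, and the helper functions are assumptions made purely for illustration, and the space-separated arguments differ from the comma-separated form shown in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with multi-argument flags (an analogy only).

    This is NOT the Validate Tool's parser; the flag letters are borrowed
    from the text so the comparison is easy to follow.
    """
    parser = argparse.ArgumentParser(prog="validate-sketch")
    # Implicit targets: bare arguments with no flag in front of them.
    parser.add_argument("implicit", nargs="*", default=[])
    # Explicit targets and schemas: flags that accept several values.
    parser.add_argument("-t", dest="explicit", nargs="+", default=[])
    parser.add_argument("-x", dest="schemas", nargs="+", default=[])
    return parser


def validation_order(argv: list[str]) -> list[str]:
    """Return targets in the order the text describes: implicit, then explicit."""
    args = build_parser().parse_args(argv)
    return list(args.implicit) + list(args.explicit)


if __name__ == "__main__":
    # Implicit targets come first, explicit (-t) targets follow.
    print(validation_order(["producta.xml", "productb.xml", "-t", "productc.xml"]))

    # A bare argument placed after a multi-argument flag is absorbed by it.
    swallowed = build_parser().parse_args(["-x", "product.xsd", "product.xml"])
    print(swallowed.schemas, swallowed.implicit)
```

In the second call the toy parser hands product.xml to the schema flag and leaves the implicit target list empty, which mirrors the caveat in the note above. Placing implicit targets before any flags avoids the ambiguity.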
The following command demonstrates the validation of multiple product labels, specified both implicitly and explicitly:

    % validate producta.xml, productb.xml -t productc.xml, /home/pds/collection

In this example, producta.xml and productb.xml are validated first, followed by productc.xml and the product labels in /home/pds/collection.

In each scenario above, the target products were the equivalent of observational or document products. The data model also consists of bundle and collection products, which in turn reference other products. When the Validate Tool encounters one of these products, it traverses the inventory associated with that product and validates each product referenced as well as the target product.

Using an XML Catalog

An XML Catalog allows the user to describe a mapping between external entity references in their products and locally-available XML Schema documents. This feature of the tool is not fully implemented and needs to be exercised with multiple PDS scenarios, but it is available to experiment with in this release.

The following is an example XML Catalog file for validating the Data Provider's Handbook (DPH) example bundle. The file maps the PDS namespace to a local copy of the PDS4 Ops XML Schema document and the DPH namespace to a local copy of the DPH example dictionary XML Schema document.

    <?xml version="1.0" encoding="UTF-8"?>
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <group xml:base="file:///${HOME}/">
        <uri name="http://pds.nasa.gov/pds4/pds/v03"
             uri="VG2PLS/schemas/PDS4_OPS_0600h.xsd"/>
        <uri name="http://pds.nasa.gov/pds4/dph/v01"
             uri="VG2PLS/local_dictionaries/dph_example_dict_0300a.xsd"/>
      </group>
    </catalog>

A third schema document, Product_TableChar_tailored_0600h.xsd, is also required for validation. It references the DPH dictionary schema document with an <xs:include> statement, and this is where the issue occurs with the current implementation. This schema document must be passed on the command-line (instead of being specified in the XML Catalog) when executing the Validate Tool. The following command will validate the DPH example bundle correctly:

    % validate -t ${HOME}/VG2PLS_archive -e "*.xml" -C catalog.xml \
        -x ${HOME}/VG2PLS_archive/schemas/Product_TableChar_tailored_0600h.xsd

Using a Configuration File

A configuration file is an alternative to the command-line option flags for setting the different behaviors of the tool. It consists of a text file made up of keyword/value pairs. The configuration file follows the syntax of the stream parsed by the Java Properties.load(java.io.InputStream) method. Some of the important syntax rules are as follows:
Since backslashes (\) have special meanings in a configuration file, keyword values that contain this character will not be interpreted properly by the Validate Tool, even if they are surrounded by quotes. A common example would be a Windows path name (e.g. c:\pds\collection). Use the forward slash character instead (c:/pds/collection) or escape the backslash character (c:\\pds\\collection).

Note: Any flag specified on the command-line takes precedence over any equivalent settings placed in the configuration file.

The following table contains valid keywords that can be specified in the configuration file:
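
Returning to the backslash rule described earlier in this section, the following Python sketch approximates how the Java Properties.load method decodes escapes in a value, to show why an unescaped Windows path is altered. It is a simplified emulation written for illustration (the function name load_value is hypothetical); it handles backslash escapes only and does not reproduce key parsing or line continuation.

```python
def load_value(raw: str) -> str:
    """Decode one property value roughly the way Properties.load does.

    Simplified sketch: only backslash escapes are handled. A backslash
    before a character with no special meaning is silently dropped,
    which is what damages an unescaped Windows path.
    """
    simple = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        # Ordinary character, or a lone trailing backslash (kept as-is here;
        # the real format treats it as a line continuation).
        if ch != "\\" or i + 1 == len(raw):
            out.append(ch)
            i += 1
            continue
        nxt = raw[i + 1]
        if nxt == "u" and i + 6 <= len(raw):
            # \uXXXX: four hex digits naming one character.
            out.append(chr(int(raw[i + 2:i + 6], 16)))
            i += 6
        else:
            # Known escapes map to control characters; anything else
            # keeps the character and loses the backslash.
            out.append(simple.get(nxt, nxt))
            i += 2
    return "".join(out)


if __name__ == "__main__":
    for value in (r"c:\pds\collection", r"c:\\pds\\collection", "c:/pds/collection"):
        print(repr(value), "->", repr(load_value(value)))
```

Under this sketch, the unescaped path loses both backslashes, while the forward-slash and doubled-backslash forms survive intact. A path segment that begins with t, n, r, or f would fare worse, because the backslash and letter together decode to a control character.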
The following example demonstrates how to set a configuration file:

    # This is a Validate Tool configuration file
    validate.target = ./collection
    validate.report = report.txt
    validate.regexp = "*.xml"

This is equivalent to running the tool with the following flags:

    -t ./collection -e "*.xml" -r report.txt

The following example demonstrates how to set a configuration file with multiple values for a keyword:

    # This is a Validate Tool configuration file with multiple values
    validate.target = product.xml, ./collection
    validate.regexp = "*.xml", "Mars*"

This is equivalent to running the tool with the following flags:

    -t product.xml, ./collection -e "*.xml", "Mars*"

The following example demonstrates how to set a configuration file with multiple values that span across multiple lines. In the Java properties syntax, a trailing backslash continues a value onto the next line:

    # This is a Validate configuration file with multiple values
    # that span across multiple lines
    validate.target = product.xml, \
                      ./collection
    validate.regexp = "*.xml", \
                      "Mars*"

As previously mentioned, any flag options set on the command-line will override settings in the configuration file. The following example demonstrates how to override a setting in the configuration file. Suppose the configuration file named config.txt is defined as follows:

    validate.target = ./collection
    validate.regexp = "*.xml"

This configuration allows the tool to validate files with a .xml extension in the collection directory. To validate all files instead of just files ending in .xml, specify the regexp flag option on the command-line to override the validate.regexp property:

    % validate -c config.txt -e "*"

Passing in Multiple Schemas

The Validate Tool allows multiple XML Schemas to be passed in through the command-line via the -x flag option. When passing in multiple schemas, the definitions found in each file are merged internally.
When two schemas are passed in and both define the same element under the same namespace, the definition in the first schema passed in takes precedence over the one in the second schema. As an example, suppose a schema file, schema1.xsd, contains the following definition:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://pds.nasa.gov/pds4/pds/v1"
      xmlns:pds="http://pds.nasa.gov/pds4/pds/v1"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified"
      version="1.1.0.0">
    ...
      <xs:complexType name="File_Area_Browse">
        <xs:annotation>
          <xs:documentation> The File Area Browse class describes a file and one or more
            tagged_data_objects contained within the file. </xs:documentation>
        </xs:annotation>
        <xs:complexContent>
          <xs:extension base="pds:File_Area">
            <xs:sequence>
              <xs:element name="File" type="pds:File" minOccurs="1" maxOccurs="1"> </xs:element>
              <xs:choice minOccurs="1" maxOccurs="unbounded">
                <xs:element name="Array_1D" type="pds:Array_1D"> </xs:element>
                <xs:element name="Array_2D" type="pds:Array_2D"> </xs:element>
                <xs:element name="Array_2D_Image" type="pds:Array_2D_Image"> </xs:element>
                <xs:element name="Array_2D_Map" type="pds:Array_2D_Map"> </xs:element>
                <xs:element name="Array_2D_Spectrum" type="pds:Array_2D_Spectrum"> </xs:element>
                <xs:element name="Array_3D" type="pds:Array_3D"> </xs:element>
                <xs:element name="Array_3D_Image" type="pds:Array_3D_Image"> </xs:element>
                <xs:element name="Array_3D_Movie" type="pds:Array_3D_Movie"> </xs:element>
                <xs:element name="Array_3D_Spectrum" type="pds:Array_3D_Spectrum"> </xs:element>
                <xs:element name="Encoded_Header" type="pds:Encoded_Header"> </xs:element>
                <xs:element name="Encoded_Image" type="pds:Encoded_Image"> </xs:element>
                <xs:element name="Header" type="pds:Header"> </xs:element>
                <xs:element name="Stream_Text" type="pds:Stream_Text"> </xs:element>
                <xs:element name="Table_Binary" type="pds:Table_Binary"> </xs:element>
                <xs:element name="Table_Character" type="pds:Table_Character"> </xs:element>
                <xs:element name="Table_Delimited" type="pds:Table_Delimited"> </xs:element>
              </xs:choice>
            </xs:sequence>
          </xs:extension>
        </xs:complexContent>
      </xs:complexType>
    ...

Suppose the other schema, schema2.xsd, contains a definition of the same element:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://pds.nasa.gov/pds4/pds/v1"
      xmlns:pds="http://pds.nasa.gov/pds4/pds/v1"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified"
      version="1.0.0.0">
    ...
      <xs:complexType name="File_Area_Browse">
        <xs:annotation>
          <xs:documentation> The File Area Browse class describes a file and one or more
            tagged_data_objects contained within the file. </xs:documentation>
        </xs:annotation>
        <xs:complexContent>
          <xs:extension base="pds:File_Area">
            <xs:sequence>
              <xs:element name="File" type="pds:File" minOccurs="1" maxOccurs="1"> </xs:element>
              <xs:choice minOccurs="1" maxOccurs="unbounded">
                <xs:element name="Array_2D" type="pds:Array_2D"> </xs:element>
                <xs:element name="Array_2D_Image" type="pds:Array_2D_Image"> </xs:element>
                <xs:element name="Array_2D_Map" type="pds:Array_2D_Map"> </xs:element>
                <xs:element name="Array_2D_Spectrum" type="pds:Array_2D_Spectrum"> </xs:element>
                <xs:element name="Array_3D" type="pds:Array_3D"> </xs:element>
                <xs:element name="Array_3D_Image" type="pds:Array_3D_Image"> </xs:element>
                <xs:element name="Array_3D_Movie" type="pds:Array_3D_Movie"> </xs:element>
                <xs:element name="Array_3D_Spectrum" type="pds:Array_3D_Spectrum"> </xs:element>
                <xs:element name="Encoded_Header" type="pds:Encoded_Header"> </xs:element>
                <xs:element name="Encoded_Image" type="pds:Encoded_Image"> </xs:element>
                <xs:element name="Header" type="pds:Header"> </xs:element>
                <xs:element name="Stream_Text" type="pds:Stream_Text"> </xs:element>
                <xs:element name="Table_Binary" type="pds:Table_Binary"> </xs:element>
                <xs:element name="Table_Character" type="pds:Table_Character"> </xs:element>
                <xs:element name="Table_Delimited" type="pds:Table_Delimited"> </xs:element>
              </xs:choice>
            </xs:sequence>
          </xs:extension>
        </xs:complexContent>
      </xs:complexType>
    ...

If the schemas are passed into the Validate Tool as follows:

    % ./validate product.xml -x schema1.xsd, schema2.xsd

then the File_Area_Browse definition from the schema1.xsd file takes precedence over the one in the schema2.xsd file. If the schemas are passed into the tool in the reverse order, then the File_Area_Browse definition in the schema2.xsd file takes precedence over the one in the schema1.xsd file.

Report Format

This section describes the contents of the Validate Tool report. The links below detail the validation results of the same run for each format. The tool can represent a validation report in three different formats: full, XML, or JSON. The report style option is used to change the formatting. When this option is not specified on the command-line, the default is to generate a full report.

Full Report

In a full report, the location, severity, and textual description of each detected anomaly are reported. A 'PASS', 'FAIL', or 'SKIP' keyword is displayed next to each file to indicate whether the file has passed, failed, or skipped PDS validation, respectively.

XML Report

In an XML report, the contents are the same as the full report.

JSON Report

In a JSON report, the contents are the same as the full report. Currently, the tool supports only validation runs of a single data product label when generating this type of report.

Common Errors

Execution of the Validate Tool may result in the following message appearing in the log:

    FAIL: file:/Users/.../hi0173794441_9080000_001_r.xml
      FATAL_ERROR line 1, 55: White spaces are required between publicId and systemId.

The message above is generated by the underlying Xerces library that the Validate Tool uses for XML Schema validation. Although not very intuitive, the message normally indicates that the XML Schema for the default namespace of the target label is missing. In the example above, the default namespace was "http://pds.nasa.gov/pds4/pds/v03", but the XML Schema file describing that namespace (PDS4_PDS_0300a.xsd) was not provided to the tool at runtime.
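
When diagnosing the error above, it can help to check which namespace a label actually declares before deciding which schema to supply. The following Python sketch is one way to do that with the standard library. The function names and the sample label are assumptions made for illustration, and the sketch reads the namespace of the root element, which matches the default namespace only when the root element is unprefixed.

```python
import xml.etree.ElementTree as ET


def root_namespace(label_xml: str) -> str:
    """Return the namespace URI of the root element, or "" if it has none."""
    root = ET.fromstring(label_xml)
    # ElementTree represents qualified tags as "{namespace}local_name".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def schema_missing(label_xml: str, provided_namespaces: set[str]) -> bool:
    """Report whether the label's root namespace has no schema supplied.

    provided_namespaces is a hypothetical set of target namespaces taken
    from the schema files the user intends to pass to the tool.
    """
    return root_namespace(label_xml) not in provided_namespaces


if __name__ == "__main__":
    # A made-up, minimal label used only to exercise the functions.
    label = (
        '<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v03">'
        "<Identification_Area/>"
        "</Product_Observational>"
    )
    print(root_namespace(label))
    print(schema_missing(label, {"http://pds.nasa.gov/pds4/dph/v01"}))
```

If the check reports a missing schema, supplying the XML Schema file for that namespace at runtime is the remedy the text above describes.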