Release Description 4.0.0
This release of the PDS4 System represents Build 4a and is intended as an operational release of the system components to date. The following sections can be found in this document:
Reference Documents
This section details the controlling and applicable documents referenced for this release. The controlling documents are as follows:
- PDS Level 1, 2 and 3 Requirements, March 26, 2010.
- PDS4 Project Plan, July 17, 2013.
- PDS4 System Architecture Specification, Version 1.3, September 1, 2013.
- PDS4 Operations Concept, Version 1.0, September 1, 2013.
- PDS General System Software Requirements Document (SRD), Version 1.1, September 1, 2013.
- PDS Harvest Tool Software Requirements and Design Document (SRD/SDD), Version 1.2, September 1, 2013.
- PDS Preparation Tools Software Requirements and Design Document (SRD/SDD), Version 0.3, September 1, 2013.
- PDS Registry Service Software Requirements and Design Document (SRD/SDD), Version 1.1, September 1, 2013.
- PDS Report Service Software Requirements and Design Document (SRD/SDD), Version 1.1, September 1, 2013.
- PDS Search Service Software Requirements and Design Document (SRD/SDD), Version 1.0, September 1, 2013.
- PDS Search Scenarios, Version 1.0, September 1, 2013.
- PDS Search Protocol, Version 1.1, September 1, 2013.
- PDS Security Service Software Requirements and Design Document (SRD/SDD), Version 1.1, September 1, 2013.
The applicable documents are as follows:
- PDS4 System Build 4a Test Document
Capabilities
This section details the new, modified and corrected capabilities that comprise this release. They are summarized here for a system-level view. A more detailed list of capabilities can be found in the change logs for each component.
New Capabilities
The following new capabilities have been added for this release:
Modified Capabilities
The following capabilities have been modified for this release:
- [PDS-84] - Search: Refactor a number of capabilities in the Search Service
- [PDS-103] - Search: Replace Solr default Request Handler
- [PDS-122] - Search: Update UI Deployment Documentation
- [PDS-142] - Search: Create new resource domain name flag
- [PDS-144] - Search: Update product-classes.txt to be list of files
- [PDS-154] - Search: Update search-core to retrieve Agency from Registry
- [PDS-157] - Search: Update search-service documentation per Todd King's comments
- [PDS-158] - Search: Update UI documentation per Todd King's comments
- [PDS-163] - Catalog: The -mingest option re-registers files if listed in multiple voldescs
- [PDS-168] - Search: Minor errors in search documentation
- [PDS-169] - Search: The initial-index script relies on env-vars
- [PDS-172] - Search: Improve documentation to be clear regarding cannot index documents if search service is not running
- [PDS-173] - Search: Run PDS search with Solr4 back-end
- [PDS-177] - Search: Create New Configuration For pdsbeta installation
- [PDS-178] - Search: Better Handle Null Pointer Exception
- [PDS-182] - Search: Update Search UI and Search Service to better handle archive_status
- [PDS-183] - Refactor Updating Solr Index
- [PDS-184] - Create An Association Cache for Common Queries
- [PDS-185] - Search: Update PSA Registry URL
- [PDS-191] - Search: Update Search UI to support PDS4 product search
- [PDS-192] - Search: Create Search Core configuration for PDS4 Observational Products
- [PDS-193] - Search: Update Search Service schema to support PDS4 Observational products
- [PDS-194] - Search: Default values for missing registry slots
- [PDS-197] - Search: Create documentation for product-search-ui
- [PDS-198] - Search: Remove References to Primary Result Summary class
- [PDS-214] - Data Set View: Needs better visibility for resource links
- [PDS-225] - Validate: For schematron messages treat role="warning" differently than default (role="error")
Corrected Capabilities
The following capabilities were corrected in this release:
- [PDS-159] - Search: Alternate_title causing error during index generation
- [PDS-174] - Registry: Random Database Connection Drops
- [PDS-175] - Registry: Replication Halts
- [PDS-179] - Search: Search for a phrase is not working properly
- [PDS-181] - Search: Default Solr SearchHandler needs to be changed to protocol
- [PDS-186] - Search: Search UI Data Set Start Time Bug
- [PDS-196] - Search: OutOfMemory Error When Running Search Core
- [PDS-213] - Validate: The tool is not finding document files correctly
- [PDS-217] - Search: Testing Bug In Search Service schema
- [PDS-218] - Search: Documentation typos
- [PDS-226] - Data Set View: Viewing target Asteroid 7822 results in an error
Liens
This section details the liens against the capabilities that have yet to be implemented or are partially implemented. They are summarized here for a system-level view. A more detailed list of liens can be found in the release notes for each component.
- Need to update PDS4 Tools to keep pace with changes occurring in the PDS4 data model for this release.
- Need to update the Registry User Interface to work properly with a secured instance of the Registry Service.
- Upgrade miscellaneous portal interfaces to remove the dependence on the PDS3 catalog database.
System Requirements
This section details the system requirements for installing and operating the software. Specific system requirements for each component in this release can be found in their respective Installation documents.
Java Runtime Environment
The custom software contained in this release was developed using Java and will run on any platform with a supported Java Runtime Environment (JRE). The software was specifically developed under Java version 1.6 and has only been tested with this version. The following commands test the local Java installation in a UNIX-based environment:
% which java
/usr/bin/java
% java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)
The first command above checks whether the java executable is in the environment's path and the second command reports the version. If Java is not installed or the version is not at least 1.6, Java will need to be downloaded and installed in the current environment. Consult the local system administrator for installation of this software. For the do-it-yourself crowd, the Java software can be downloaded from the Oracle Java Download page. The software package of choice is the Java Standard Edition (SE) 6, either the JDK or the JRE package. The JDK package is not necessary to run the software but could be useful if development and compilation of Java software will also occur in the current environment.
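The version check described above can also be scripted. The following is a minimal sketch and not part of the release: the function names parse_java_version and java_version_ok are hypothetical, and the sketch assumes the banner printed by java -version contains a quoted version string in the form shown above. The java -version command writes its banner to standard error, which is why the sketch redirects it.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the PDS4 release.

# Read `java -version` output on stdin and print the quoted version string.
parse_java_version() {
    sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed if the version string (e.g. 1.6.0_26) is at least 1.6.
java_version_ok() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    case $major in ''|*[!0-9]*) return 1 ;; esac   # not numeric
    case $minor in ''|*[!0-9]*) return 1 ;; esac   # not numeric
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }
}

# Report on the local installation, if any.
if command -v java >/dev/null 2>&1; then
    found=$(java -version 2>&1 | parse_java_version)
    if java_version_ok "$found"; then
        echo "Java $found meets the 1.6 minimum"
    else
        echo "Java ${found:-(unknown)} is older than 1.6 or could not be parsed"
    fi
else
    echo "java is not on the PATH"
fi
```

Note that passing this check only confirms the minimum version. As stated above, the software has only been tested with Java 1.6.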
Java Application Server
The other main system requirement pertains to providing a Java application server for hosting certain components (Registry Service and Registry User Interface). The suggested application server for this release is Apache Tomcat, from a minimum version of 6.0.20 through version 7.0.X. Avoid version 7.0.29, which has a bug that causes an error when loading the Registry Service. Version 8.0.X is now available from Apache but is not supported in this release. Consult the local system administrator for installation of this software. For the do-it-yourself crowd, the Apache Tomcat software can be downloaded from the Apache Tomcat page. Choose the version to download (6.0.X or 7.0.X) from the menu on the left. Details on downloading, installing and configuring an Apache Tomcat server can be found in the Tomcat Deployment document.
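The Tomcat version rules above (6.0.20 or later, any 7.0.X except 7.0.29, no 8.0.X) can be expressed as a small check. This is a sketch under stated assumptions: the function name tomcat_version_supported is hypothetical, and the version string is supplied by hand rather than read from a running server.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.
# Succeed if the given Tomcat version is in the range suggested for this release.
tomcat_version_supported() {
    case $1 in
        7.0.29) return 1 ;;            # known bug when loading the Registry Service
        7.0.*)  return 0 ;;            # any other 7.0.X
        6.0.*)
            patch=${1#6.0.}            # text after "6.0."
            patch=${patch%%[!0-9]*}    # keep leading digits only
            [ -n "$patch" ] && [ "$patch" -ge 20 ]
            ;;
        *)      return 1 ;;            # 8.0.X and anything else
    esac
}

for v in 6.0.20 7.0.29 7.0.42 8.0.1; do
    if tomcat_version_supported "$v"; then
        echo "$v supported"
    else
        echo "$v not supported"
    fi
done
```

With the sample versions in the loop, only 6.0.20 and 7.0.42 are reported as supported.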
Web Server
Although not previously listed in the RDD as a system requirement, the installation of a web server to act as a front-end interface to the Java application server is desired. The suggested web server for this release is the Apache HTTP Server. Consult the local system administrator for installation of this software.
Database Server
Although not previously listed in the RDD as a system requirement, the installation of a database server is suggested to support the deployment of the Registry Service. The Registry Service comes prepackaged with the Apache Derby database, but Derby isn't necessarily the preferred choice for an operational deployment. Other options supported by the Registry Service include MySQL and PostgreSQL. If another database server is preferred, contact the EN so we can help prepare a database-specific configuration for your environment.
Installation/Operation
This section details the installation and operation of the software in this release. This release is intended for deployment in the Engineering Node operations environment with certain components deployed in the environments of participating Nodes. Details regarding the release for each of these components, including installation of the packages and operation of the associated software, can be found in the documentation for each component. This release consists of the following components:
- Ingest
- Catalog Tool - 1.6.1 (EN Only)
- Harvest Tool - 1.5.0
- Harvest-PDAP Tool - 0.2.0 (EN Only)
- Portal
- Data Set View - 2.2.0 (EN Only)
- Filter - 0.1.0 (DN Optional)
- Preparation Tools
- Core Library - 1.4.1
- Design Tool - 0.4.0
- Generate Tool - 0.6.0
- PDS4 Tools - 0.2.0
- Transform Tool - 0.2.1
- Validate Tool - 1.4.1
- Registry
- Registry Core - 1.5.0
- Registry Service - 1.5.0
- Registry User Interface - 1.5.0
- Report
- Update Tool - 1.4.0 (EN Only)
- Profile Manager - 1.4.0 (EN Only)
- Sawmill - 1.4.0 (EN Only)
- Search
- Search Core - 1.3.0
- Search Service - 1.3.0
- Search User Interface - 1.3.0 (EN Only)
- Product Search User Interface - 1.3.0 (DN Optional)
- Security Service - 1.2.0 (EN Only)
- Storage
- Storage Service - 0.5.0 (EN Only)
- Product Service - 0.5.0 (EN Only)
- Transport
- Transport Proxy - 1.1.0 (EN Only)
- Transport Service (OFSN) - 0.2.0
- Transport Service (Registry) - 0.1.0
Not all of the above components are required for a given installation of the software for this release. The following sections detail typical deployment scenarios for the Engineering Node and the Discipline Nodes, respectively.
Engineering Node Environment
This section provides an overview of the deployment to the Engineering Node operations environment. The following diagram depicts the EN system flow, starting with central catalog migration, data product registration and finishing up with search index generation:
The components depicted above serve as the operational deployment of this release for the Engineering Node. The main purpose of this deployment is to support catalog-level search for the PDS.
- Catalog Tool
The Catalog Tool provides the means for registering PDS3 catalog files with the Registry Service. This tool provides functionality for comparing, validating and ingesting a catalog submission.
- Harvest Tool
The Harvest Tool provides the means for registering products with the Registry Service. This tool reads a PDS4 product label, extracts specified metadata and registers the product with the registry. This tool allows the user to register a batch of products and generates a summary report.
- Registry Service
The Registry Service provides functionality for tracking, auditing, locating and maintaining artifacts within the system. Two instances of the service have been deployed, one containing context products migrated from the PDS3 central catalog database and one containing the initial set of PDS4 context products. The end points for these services are:
- Search Service
The Search Service provides functionality for accepting queries from data consumers for registered products and includes functionality for retrieving search results. This component acts as the interface to the Registry Service for the data consumer. The end point for this service is http://pds.nasa.gov/services/search/.
- Search User Interface
The Search User Interface software serves as the user interface for the Search Service on the PDS website. The end point for this application is http://pds.nasa.gov/tools/data-search/.
- Report Service
The Report Service provides the capability for capturing and reporting metrics. With this release, this service is undergoing testing of configurations and features with a set of metrics from the Engineering Node and the Imaging Node at JPL. This service is not yet available to the user community. The end point for this service is http://pds.nasa.gov/services/report/.
Discipline Node Environment
This section details the deployment to a Discipline Node environment. The Nodes are asked to download, install and exercise the software that makes up this release. The following diagram details an example deployment of selected components:
The components depicted above represent a typical deployment of the PDS4 software on a single machine. The following diagram depicts a normalized system flow, starting with schema design, continuing onto data product label generation and validation and finishing up with data product registration and search index generation:
This diagram identifies where the system components come into play within the system flow. These components, and the ones depicted in the deployment diagram, are described in more detail below with respect to how they can be utilized in operations:
- Design Tool
The Design Tool is intended to aid users in the development of their data product label schemas. This release identifies two off-the-shelf products (Oxygen and Eclipse) that are available for this purpose. The provided documentation guides the user with respect to the download, installation and operation of each of these products.
- Generate Tool
The Generate Tool is intended to aid users in the generation of their data product labels. This tool was developed by the Imaging Node at JPL and provides a command-line interface for generating PDS4 Labels from either a PDS3 Label or a PDS-specific DOM object.
- Validate Tool
The Validate Tool comes pre-packaged with the latest version of the core XML Schema and Schematron files generated from the data model. The tool allows the user to validate collections of products or single products against the associated core schema. Discipline and mission schemas can be passed into the tool to enable a more detailed validation check. Although the Design Tools listed above provide a file-by-file validation capability, this tool allows the user to validate a batch of products and generates a summary report.
- Transform Tool / PDS4 Tools
The Transform Tool and the PDS4 Tools library comprise the initial release of the product transformation capability. The Transform Tool provides a command-line interface to a subset of the functions offered by the PDS4 Tools library. In addition, the PDS4 Tools library provides functionality for accessing PDS4 data objects. The functionality of both of these components will be expanded in future releases.
- Harvest Tool
The Harvest Tool provides the means for registering products with the Registry Service. This tool reads a PDS4 product label, extracts specified metadata and registers the product with the registry. This tool allows the user to register a batch of products and generates a summary report. This tool also supports registration of PDS3 data sets for the purpose of tracking and reporting.
- Registry
- Registry Service
The Registry Service provides functionality for tracking, auditing, locating and maintaining artifacts within the system. Ultimately, all products (including PDS3 products) will be registered and tracked with the Registry Service.
- Registry User Interface
The Registry User Interface provides a simple viewing capability of the contents of the associated Registry Service. It is intended to offer visual verification of successful registration of products as well as the ability to update the status for registered products.
- Search
- Search Core
The Search Core component provides functionality for generating the search index utilized by the Search Service for retrieving search results.
- Search Service
The Search Service provides functionality for accepting queries from data consumers for registered products and includes functionality for retrieving search results. This component acts as the interface to the Registry Service for the data consumer.
- Product Search User Interface
The Product Search User Interface software serves as the user interface for the Search Service for querying and displaying search results.
- Transport
- Transport Service (OFSN)
The Online File Specification Name (OFSN) Transport Service provides functionality for transporting and transforming PDS products. This service offers a similar interface to the PDS-D Product Server and allows a user to interface with an archive directory structure for discovering and retrieving product files.
- Transport Service (Registry)
The Registry Transport Service provides functionality for transporting and transforming PDS products. This service interfaces with the Registry Service to discover and retrieve product files given a logical identifier.
Although the Design, Generate and Transform Tools are included with this release, we have not included them in the installation procedure because they may not pertain to every Discipline Node. This procedure will focus on installing the Registry components, Harvest Tool and Validate Tool. The example commands below assume the software is installed in the user's home directory, indicated by the $HOME environment variable. If this is not the case or if this variable is not defined, the absolute path should be used instead of the variable. Some of the commands below have been broken into multiple lines for readability. The commands should be reassembled into a single line prior to execution. Perform the following steps to download, install, configure and test the software in the Node's local environment:
- Verify System Requirements
As specified in the System Requirements section above, the software requires Java, a Java Application Server (Apache Tomcat), a Web Server (Apache HTTP) and an optional database server to be installed and accessible. These installations should be verified before proceeding to the next step.
With respect to the Web Server installation, this server may reside on another machine and in most cases that would be preferable. The preferred configuration is to install the PDS4 software components on a machine that resides behind a firewall at your institution. This offers an initial level of security for the components. Configuring the Web Server for reverse proxies to the query interfaces of the components provides an additional level of security by restricting access to the interfaces that allow updates to the data.
- Install Registry Components
The Registry components consist of the Registry Service and Registry User Interface. The Registry Client and Registry Core, which are also Registry components, are not required to be installed at the Discipline Node.
- Registry Service
The Installation document for the Registry Service provides the details for downloading, installing and configuring the software. By default, the Registry Service comes packaged with and configured to utilize Apache Derby as the backend database for storing registered content. By default, this database is expected to be located in the directory from which the Tomcat server is launched. The Installation document describes how to change this location as well as how to use MySQL or PostgreSQL as the backend database instead of Derby. The installation at a DN calls for two instances of the Registry Service, one named registry-pds3 and one named registry-pds4. If using MySQL as the backend database server, the corresponding databases should be named registrypds3 and registrypds4, respectively.
If there was a previous installation of the Registry Service, the application should be un-deployed from the Tomcat server followed by removal of the Derby database directory named registry, and the derby.log file prior to installing the new version. Before deploying each instance of the Registry Service, be sure to rename the WAR file to registry-pds3.war and then to registry-pds4.war, so that the end point of the service in the Tomcat server is as expected for each instance. For each instance of the service, follow the instructions in the Installation document for configuring access to the local database, making sure to update the database name accordingly. The same goes for the home configuration, and updating the end point for the specific instance of the service. For each instance of the service, the object type configuration needs to be performed. Perform the following commands instead of the commands listed in the Installation document (the service must be up and running and pdsX should be pds3 and pds4, respectively):
% cd $HOME/registry-service-1.5.0/bin
% export REGISTRY_SERVICE=http://localhost:8080/registry-pdsX
% ./registry-config
The Secure Configuration is skipped for the DN deployment. The reverse proxy configuration that follows provides security for the Registry Service instances. The PDS3 and PDS4 instances of the service are accessed at /services/registry-pds3 and /services/registry-pds4, respectively, on the web server which is facilitated by the following reverse proxy configuration in the Apache HTTP Server httpd.conf file:
ProxyPass /services/registry-pds3/ http://localhost:8080/registry-pds3/
ProxyPassReverse /services/registry-pds3/ http://localhost:8080/registry-pds3/
ProxyPass /services/registry-pds3 http://localhost:8080/registry-pds3
ProxyPassReverse /services/registry-pds3 http://localhost:8080/registry-pds3
<Location /services/registry-pds3>
  <Limit PUT POST DELETE>
    Order Deny,Allow
    Deny from All
    Allow from 127.0.0.1
  </Limit>
</Location>

ProxyPass /services/registry-pds4/ http://localhost:8080/registry-pds4/
ProxyPassReverse /services/registry-pds4/ http://localhost:8080/registry-pds4/
ProxyPass /services/registry-pds4 http://localhost:8080/registry-pds4
ProxyPassReverse /services/registry-pds4 http://localhost:8080/registry-pds4
<Location /services/registry-pds4>
  <Limit PUT POST DELETE>
    Order Deny,Allow
    Deny from All
    Allow from 127.0.0.1
  </Limit>
</Location>
The above configuration will likely need to be performed by the System Administrator. This configuration assumes the Apache HTTP Server is on the same machine as the Apache Tomcat Server. If that is not the case, the localhost designation above should be replaced with the DNS name of the machine where the Apache Tomcat Server is installed. This configuration allows GET requests through the public interface but restricts PUT, POST and DELETE requests to processes running on the local machine.
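Because the reverse proxy configuration above is identical for each Registry Service instance apart from the instance name and the Tomcat host, it can be generated rather than typed twice. The sketch below is illustrative only: the function name proxy_block is hypothetical, and the optional second argument is an assumed convenience for the case where Tomcat runs on a different machine.

```shell
#!/bin/sh
# Sketch only: proxy_block is a hypothetical helper, not part of the release.
# Usage: proxy_block INSTANCE [BACKEND_BASE_URL]
proxy_block() {
    name=$1
    backend=${2:-http://localhost:8080}   # replace when Tomcat is on another host
    cat <<EOF
ProxyPass /services/$name/ $backend/$name/
ProxyPassReverse /services/$name/ $backend/$name/
ProxyPass /services/$name $backend/$name
ProxyPassReverse /services/$name $backend/$name
<Location /services/$name>
  <Limit PUT POST DELETE>
    Order Deny,Allow
    Deny from All
    Allow from 127.0.0.1
  </Limit>
</Location>
EOF
}

# Print the configuration for both instances called for at a DN.
for instance in registry-pds3 registry-pds4; do
    proxy_block "$instance"
    echo
done
```

The output is meant to be reviewed and pasted into httpd.conf by the System Administrator. The sketch does not modify any server configuration itself.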
- Registry User Interface
The Installation document for the Registry User Interface provides the details for downloading, installing and configuring the software. If there was a previous installation of the Registry User Interface, the application should first be un-deployed from the application server before deploying the updated application. In a future release of the application, support will be added for accessing multiple Registry Services. For now, a separate instance of the application must be installed for each Registry Service instance. Before deploying each instance of the Registry User Interface, be sure to rename the WAR file to registry-ui-pds3.war and then to registry-ui-pds4.war, so that the end point of the service in the Tomcat server is as expected for each instance. In addition, be sure to follow the instructions in the General Configuration section of the Installation document for specifying the correct end point for the target Registry Service for each Registry UI instance. The Secure Configuration may be skipped for the DN deployment.
- Install Harvest Tool
The Installation document for the Harvest Tool provides the details for downloading and installing the software. In order to support multiple instances of the Registry Service, it is suggested that the user make a copy of the harvest script provided with the software for each Registry Service instance.
% cd $HOME/harvest-1.5.0/bin
% cp harvest harvest-pds3
% cp harvest harvest-pds4
Edit the newly created harvest-pds3 and harvest-pds4 scripts and change the http://localhost:8080/registry specification to http://localhost:8080/registry-pds3 and http://localhost:8080/registry-pds4, respectively.
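The copy-and-edit step above can be combined into one command per instance. The following is a sketch under stated assumptions: the function name make_harvest_script is hypothetical, and it assumes the Registry Service URL appears in the harvest script literally as http://localhost:8080/registry. Inspect the generated scripts before relying on them.

```shell
#!/bin/sh
# Sketch only: make_harvest_script is a hypothetical helper.
# Usage: make_harvest_script SOURCE_SCRIPT INSTANCE DEST_SCRIPT
# Writes a copy of SOURCE_SCRIPT that points at the named Registry Service instance.
make_harvest_script() {
    src=$1
    instance=$2
    dest=$3
    # Always start from the unmodified script so that repeated runs are safe.
    sed "s|http://localhost:8080/registry|http://localhost:8080/$instance|g" \
        "$src" > "$dest" &&
        chmod +x "$dest"
}

bin_dir=${HOME:-.}/harvest-1.5.0/bin
if [ -f "$bin_dir/harvest" ]; then
    make_harvest_script "$bin_dir/harvest" registry-pds3 "$bin_dir/harvest-pds3"
    make_harvest_script "$bin_dir/harvest" registry-pds4 "$bin_dir/harvest-pds4"
else
    echo "harvest script not found in $bin_dir; nothing changed"
fi
```

Generating each copy from the original harvest script, rather than editing a copy in place, avoids appending the instance suffix twice if the step is repeated.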
- Install Validate Tool
The Installation document for the Validate Tool provides the details for downloading and installing the software.
Now that the core software is installed, it can be exercised with a local bundle prepared by the Node or with the Example Bundle available from the EN. The following steps detail exercising of the software with the example bundle:
- Download Example Bundle
The bundle can be downloaded from the PDS4 Repository. Unzipping the package results in a directory named dph_example_archive_VG2PLS.
- Validate Example Bundle
The next step exercises the Validate Tool by validating the example bundle. The Operation document for the Validate Tool provides the details for executing the software. Execute the Validate Tool against the example bundle with the following commands:
% cd $HOME/validate-1.4.1/bin
% ./validate -t $HOME/dph_example_archive_VG2PLS -e "*.xml"
The above run results in one validation failure due to not specifying the local data dictionary schema for the example bundle. The schemas can be specified on the command-line as follows:
% ./validate -t $HOME/dph_example_archive_VG2PLS -e "*.xml" \
-x $HOME/dph_example_archive_VG2PLS/xml_schema/PDS4_PDS_1101.xsd \
$HOME/dph_example_archive_VG2PLS/xml_schema/dph_example_dict_1101.xsd
The above run results in a successful validation. Note that the core schema (PDS4_PDS_1101.xsd) was also passed on the command-line because passing any schema on the command-line overrides the schemas provided with the tool.
- Register Example Bundle
The next step exercises the Harvest Tool and Registry Service by registering the contents of the example bundle. The Operation document for the Harvest Tool provides the details for executing the software. A configuration file (harvest-policy-example.xml) for registering the example bundle can be found in the examples directory of the Harvest Tool distribution package. Depending on the location of the example bundle in the local environment, the file and path specifications in the configuration file will likely need to be updated for the example bundle location in the local environment. Execute the Harvest Tool against the example bundle with the following commands:
% cd $HOME/harvest-1.5.0/bin
% ./harvest-pds4 -c ../examples/harvest-policy-example.xml
The above run will result in one error due to a duplicate logical identifier and a few warnings pertaining to references for products that were not found in the local registry. The rest of the products should get registered successfully. The summary should look something like the following:
...
Summary:
  14 of 14 file(s) processed, 4 other file(s) skipped
  1 error(s), 5 warning(s)

  13 of 14 products registered.
  82 of 82 ancillary products registered.

  Product Types Registered:
    2 Product_Document
    1 Product_Browse
    1 Product_Observational
    1 Product_XML_Schema
    2 Product_File_Text
    1 Product_Bundle
    82 Product_File_Repository
    5 Product_Collection

  93 of 93 associations registered.

End of Log
- View Registry Contents
The next step verifies that the registered content from the previous step can be viewed via the Registry Service REST-based API and the Registry User Interface. The Operation document for the Registry Service provides the details for interfacing with the service via the REST-based API. Assuming the registration above was performed against the PDS4 instance of the service, the Registry Service should be accessible via the following URL: http://localhost:8080/registry-pds4/. Accessing the report end point (append report to the URL) from the desired web browser (some browsers display XML better than others) should produce the following output:
<ns2:report xmlns:ns2="http://registry.pds.nasa.gov" registryVersion="1.5.0" packages="5" classificationNodes="66" classificationSchemes="2" services="0" extrinsics="95" associations="349" serverStarted="2014-02-25T10:52:46.513-08:00" status="OK"/>
The above report shows 95 extrinsics (PDS products) registered which corresponds with the Harvest Tool summary. The number of associations registered is a little harder to verify because there were already some associations in the registry from the configuration step. Feel free to try out some of the other end points (e.g. extrinsics) detailed in the Query Artifacts section of the Operations document.
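The comparison between the report and the Harvest Tool summary can be checked mechanically: 13 products plus 82 ancillary products should equal the 95 extrinsics in the report. The sketch below works on a copy of the report output shown above. The function name report_attr is hypothetical, and the sed-based extraction assumes the attribute appears exactly once and is preceded by whitespace. It is not a general XML parser.

```shell
#!/bin/sh
# Sketch only: report_attr is a hypothetical helper.
# Read the report XML on stdin and print the value of the named attribute.
report_attr() {
    sed -n "s/.* $1=\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Sample taken from the report output above. Against a live service the XML
# could instead be fetched from the report end point, for example with
# curl -s http://localhost:8080/registry-pds4/report
report_xml='<ns2:report xmlns:ns2="http://registry.pds.nasa.gov" registryVersion="1.5.0" packages="5" classificationNodes="66" classificationSchemes="2" services="0" extrinsics="95" associations="349" serverStarted="2014-02-25T10:52:46.513-08:00" status="OK"/>'

registered=$(printf '%s\n' "$report_xml" | report_attr extrinsics)
expected=$((13 + 82))   # products + ancillary products from the Harvest summary

if [ "$registered" -eq "$expected" ]; then
    echo "extrinsics match: $registered"
else
    echo "mismatch: registry has $registered, harvest reported $expected"
fi
```

The same helper can read other attributes of the report, such as status or associations.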
Assuming the registration above was performed against the PDS4 instance of the service, the Registry User Interface should be accessible via the following URL: http://localhost:8080/registry-ui-pds4/. Accessing this URL from the desired web browser should produce the following display:
If an "RPC Failure" message appears near the bottom of the screen, this is most likely the result of the Registry Service URL not being configured correctly. The Installation document for the Registry User Interface specifies how to modify the appropriate configuration file to correct this situation.
The above steps exercise the core components of the system. They are by no means exhaustive with respect to testing the software, but they will verify that the software can be installed and executed in the various Node environments. Some Nodes are actively working on generating PDS4 bundles and collections. If a Node would like to attempt to register those products with its local registry, the EN would be glad to assist in this effort. In addition, the Harvest Tool supports registering a PDS3 Data Set with the Registry Service. An example configuration file (harvest-policy-pds3.xml) for registering PDS3 Data Sets can be found in the examples directory of the Harvest Tool distribution package. When registering a PDS3 Data Set, the Harvest Tool creates an in-memory proxy label for each product and registers it with the Registry Service. Although this is not sufficient for PDS4 migration, it does provide a tracking mechanism for PDS3 Data Sets until they can be migrated to PDS4.
In addition to the exercises above, the Nodes are encouraged to exercise the Generate Tool, Transform Tool and PDS4 Tools components. See the corresponding Installation and Operation documents for details. For advanced installations, the Search and Transport components may also be installed (details to come later).
The final step of this exercise is to delete the example bundle from the Registry Service. This can be accomplished via the Registry User Interface at the following URL: http://localhost:8080/registry-ui-pds4/. In the Packages tab just select the entry for the Example package and then select the Delete button. This action may take a few seconds to complete. Now the Registry Service is ready for your Bundles.