The following notes were taken by Elizabeth Rye. Corrections are welcome.


Planetary Data System Standards Teleconference
8 March 2006 9-11 AM

Participants:
Rafael Alanis (IMG)
Keith Bennett (GEO)
Mike Cayanan (EN)
Patty Garcia (IMG)
Mitch Gordon (RINGS)
Lyle Huber (ATM)
Steve Hughes (EN)
Steve Joy (PPI)
Ron Joyner (EN)
Todd King (PPI)
Joe Mafi (PPI)
Anne Raugh (SBN)
Elizabeth Rye (EN)
Dan Scholes (GEO)
Boris Semenov (NAIF)
Dick Simpson (RS)
Susie Slavney (GEO)
Tom Stein (GEO)
Dave Tarico (SBN)

SCR 3-1034 MD5 Checksums (File) version 3

Email comments on SCR exchanged prior to telecon (all times PST):

Todd King, March 6, 2006, 4:03pm
Elizabeth Rye, March 6, 2006, 4:36pm
Dick Simpson, March 6, 2006, 6:12pm (with attachment)
Bill Harris, March 6, 2006, 7:08pm
Anne Raugh, March 7, 2006, 3:51am
Mitch Gordon, March 7, 2006, 6:46am
Anne Raugh, March 7, 2006, 7:05am
Todd King, March 7, 2006, 10:19am
Susie Slavney, March 7, 2006, 10:41am

Prior to eliciting comments on the SCR, E. Rye provided updates on two items:

  1. It has been confirmed that neither our tools nor our computer platforms can handle 128-bit integers, so two MD5 checksum values must be compared as strings rather than as numbers. (This is according to M. Cayanan.)
  2. It is theoretically possible for two distinct files to produce the same MD5 checksum value. (This is true of any checksum: the number of possible inputs is infinite, while the number of possible outputs is finite.) The SHA-1 algorithm would reduce this risk, but at a significant cost in processing time. In practice, MD5 checksums should be entirely adequate for our purposes. (This is according to M. McAuley.)
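As a rough illustration of both items, the Python sketch below uses the standard hashlib module, which renders the 128-bit MD5 digest as a 32-character hexadecimal string, so no 128-bit integer arithmetic is ever needed. The helper names file_digest and checksums_match are hypothetical, and normalizing case and whitespace before comparing is an assumption, not something the SCR specifies. Passing "sha1" instead of "md5" is one way the SHA-1 alternative from item 2 could be swapped in.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hexadecimal digest of a file, read in chunks.

    Hypothetical helper: `algorithm` is any name accepted by
    hashlib.new(), e.g. "md5" or "sha1".
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksums_match(a, b):
    """Compare two checksum values as strings, never as 128-bit integers.

    Assumption: values are hex strings that may differ only in letter
    case or surrounding whitespace, so both are normalized first.
    """
    return a.strip().lower() == b.strip().lower()
```

A typical call might be checksums_match(file_digest("DATA/IMAGE.IMG"), recorded_value), where the path and recorded_value are placeholders. Because the comparison is purely textual, it behaves the same on any platform regardless of native integer width.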

Each node then provided input on the SCR. Most nodes approved of S. Slavney's proposal to break the issue down into a set of questions, although some disagreed with GEO's answers to those questions.

The questions discussed, and each node's vote on each question, are recorded below.

  1. Should checksums be required in some form?
       ATM    yes
       GEO    yes
       IMG    yes
       NAIF   yes
       PPI    yes
       RINGS  yes
       RS     yes, at some time (under PDS4?)
       SBN    yes
       EN     yes

  2. Should checksums be required as part of the archive?
       ATM    no
       GEO    no
       IMG    no
       NAIF   either way okay
       PPI    yes
       RINGS  yes
       RS     yes
       SBN    no
       EN     yes

  3. If checksums are included in an archive, should the checksum file have a required filename?
       ATM    don't care
       GEO    no
       IMG    don't care
       NAIF   yes
       PPI    yes
       RINGS  yes
       RS     yes (type of name, allowing for multiple files)
       SBN    yes
       EN     yes

  4. Should the file have a required format?
       ATM    suggested format
       GEO    no
       IMG    no
       NAIF   no
       PPI    no
       RINGS  no
       RS     require two columns in specific order with optional third column for checksum type
       SBN    don't care
       EN     same as RS

  5. Should there be a specific location? If so, where?
       ATM    yes; INDEX
       GEO    yes; INDEX
       IMG    yes; INDEX
       NAIF   yes; INDEX
       PPI    yes; INDEX
       RINGS  yes; INDEX
       RS     no preference
       SBN    yes; INDEX or DOCUMENT
       EN     yes; INDEX

  6. Should the order of the columns in the file be fixed?
       ATM    don't care
       GEO    no
       IMG    split no/yes
       NAIF   don't care
       PPI    no
       RINGS  no
       RS     yes
       SBN    no
       EN     yes

  7. Should the columns have standardized names?
       ATM    yes
       GEO    absent
       IMG    yes
       NAIF   yes
       PPI    yes
       RINGS  yes
       RS     yes
       SBN    don't care
       EN     yes

  8. Should the type of checksum value used be described as a new standard value of DATA_TYPE (DT), as a new keyword CHECKSUM_TYPE (C), or in the DESCRIPTION (DE) field of the COLUMN object description?
       All nodes (ATM, GEO, IMG, NAIF, PPI, RINGS, RS, SBN, EN): C

  9. Should the CHECKSUM_TYPE keyword have a standard value type of DYNAMIC?
       No major dissent.

 10. Should the initial list of standard values for CHECKSUM_TYPE be limited to a few or be expansive?
       ATM    limited
       GEO    check all 30 types available and list those that are useful
       IMG    all reasonable
       NAIF   limited
       PPI    limited
       RINGS  MD5, SHA-1
       RS     absent
       SBN    limited
       EN     limited
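To make RS's suggested layout concrete (two columns in a fixed order plus an optional third column giving the checksum type), here is a minimal Python parsing sketch. Several details are assumptions, not decisions recorded at this telecon: the column order (checksum first, then file name), whitespace-separated columns with no spaces inside file names, and MD5 as the default when the third column is absent. The accepted CHECKSUM_TYPE spellings "MD5" and "SHA-1" follow RINGS' suggested short list, and ChecksumEntry, CHECKSUM_TYPES and parse_checksum_line are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical mapping from CHECKSUM_TYPE values to hashlib algorithm names.
# "MD5" and "SHA-1" follow the short list RINGS suggested; the standard
# value list itself was not settled at this telecon.
CHECKSUM_TYPES = {"MD5": "md5", "SHA-1": "sha1"}


@dataclass
class ChecksumEntry:
    checksum: str
    filename: str
    checksum_type: str = "MD5"  # assumed default when no third column


def parse_checksum_line(line):
    """Parse one record: checksum, file name, optional checksum type.

    Assumes whitespace-separated columns in that order and file names
    without embedded spaces.
    """
    fields = line.split()
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 columns, got {len(fields)}: {line!r}")
    # Store the checksum in lower case so later string comparisons are uniform.
    entry = ChecksumEntry(fields[0].lower(), fields[1])
    if len(fields) == 3:
        ctype = fields[2].upper()
        if ctype not in CHECKSUM_TYPES:
            raise ValueError(f"unknown checksum type: {fields[2]!r}")
        entry.checksum_type = ctype
    return entry
```

Rejecting unknown type values, rather than passing them through, reflects the "limited" initial list that most nodes preferred; an expansive list would only require adding entries to the mapping.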

After the resolution of these questions, it was clarified that checksums were to be calculated for every file in an archive, with the exception of the checksum file itself and its label.
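The rule above (checksum every file in the archive except the checksum file and its label) can be sketched in Python as a directory walk that skips exactly those two paths. The default excluded paths INDEX/CHECKSUM.TAB and INDEX/CHECKSUM.LBL are hypothetical placeholders: the INDEX location reflects the majority vote, but no file name was fixed. The function names md5_of and build_checksum_records are likewise illustrative.

```python
import hashlib
import os


def md5_of(path, chunk_size=65536):
    """Return the MD5 hex digest of one file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_checksum_records(root,
                           excluded=("INDEX/CHECKSUM.TAB", "INDEX/CHECKSUM.LBL")):
    """Return (md5, relative_path) pairs for every file under `root`.

    The checksum file and its label are skipped; the default `excluded`
    paths are hypothetical placeholders, not agreed PDS file names.
    Paths are relative to `root` and use "/" as the separator.
    """
    skip = set(excluded)
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel in skip:
                continue  # never checksum the checksum file or its label
            records.append((md5_of(full), rel))
    return records
```

The checksum file and its label have to be excluded because writing them would change their own contents after their checksums had been computed. How the resulting pairs are written out (column order, separators, type column) was not settled at this telecon.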

The question of what criteria should be used to judge whether a particular checksum algorithm is acceptable was deemed a policy issue outside the scope of the current discussion.