Draft comments on Dublin Core 1.1

A set of metadata elements called "Dublin Core" has been promoted for applications involving interoperable search and retrieval on the Internet, and the DC "elements" have also been used in a variety of applications of GILS and Z39.50. During the month of July, the DC community leadership is seeking comments on a proposed revision for the semantics of the 15 DC "elements". This revision, now called DC 1.1, would bring the DC elements into alignment with international standard ISO/IEC 11179, Specification and Standardization of Data Elements.

As convenor of the ISO group responsible for ISO/IEC 11179 (ISO/IEC JTC 1/SC 32/WG 2, Metadata), I am compiling a response to DC 1.1 from the perspective of ISO/IEC 11179 experts. I strongly encourage others involved in GILS and Z39.50 to provide comments as well, either directly to the DC community or through me. Please do share this message with other persons and discussion lists that may have an interest in DC, GILS, Z39.50, etc.

Following is the current draft compilation of comments on DC 1.1 from the ISO 11179 perspective. DC 1.1 references are to the document retrieved July 5, 1999 from .

-------------------------------------------------------------

(1) References to the standard "ISO 11179" should designate "ISO/IEC 11179".

(2) The stated intention in DC 1.1 is to distinguish between concepts and representations. To be used effectively for its purpose of cross-domain searching, and to allow inheritance, it is crucial for DC to have data element concepts. Yet, it is also important to have one or more data element representations for each DC element. The recommended approach for DC 1.1 is to express a data element concept for each DC element, and to annotate the representations. Each of the DC 1.1 entries should have a single Concept Definition and one or more "Representation Classes" (e.g., name, text, code, date, ...).

(3) The definition given for each DC 1.1 entry should be labeled as "Concept Definition", and each Concept Definition should express a single data element concept. Most of the DC elements are currently compounds of multiple data elements, either because they incorporate attributes (e.g., scheme, format, etc.) or because they permit qualification to form sub-elements not fully contained within the parent element. It is necessary that there be a concept definition within DC 1.1 for each of the "unadorned" data element concepts, exclusive of any attributes or qualified sub-elements. (In XML parlance, DC 1.1 elements should refer to the contents of the element container but not to any attributes of the element.)
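As a rough illustration of the XML analogy in comment (3), the following Python sketch separates the content of an element container from its attributes. The "date" tag, the "scheme" attribute, and its value are hypothetical markup chosen for illustration only; DC 1.1 does not prescribe this syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag and attribute names are assumptions for illustration.
record = '<date scheme="W3CDTF">1999-07-05</date>'
element = ET.fromstring(record)

# The element content is what an "unadorned" concept definition would cover.
element_content = element.text

# Attributes (here, an encoding scheme) would be described separately.
element_attributes = dict(element.attrib)

print(element_content)
print(element_attributes)
```

Under this reading, the Concept Definition applies to the value held in element_content, and anything in element_attributes falls outside it.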

(4) Once each DC 1.1 element includes a Concept Definition, it will not be necessary to have a separate data element definition for each representation, because a data element definition can be readily derived by prepending "The [representation class] of" to the concept definition. For example, the concept "Creator" is defined as: "An entity primarily responsible for making the content of the resource". Given that the Representation Class for "Creator" is "name", the data element definition would be: "The name of an entity primarily responsible for making the content of the resource". This rule for deriving a data element definition from the concept definition should be prominently accessible to users of the DC.
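The derivation rule in comment (4) is mechanical enough to sketch in a few lines of Python. This is only an illustration: the function name is hypothetical, and lower-casing a leading "A" or "An" is an assumption made so that the result matches the "Creator" example given in the comment.

```python
def derive_data_element_definition(concept_definition: str,
                                   representation_class: str) -> str:
    """Prepend 'The [representation class] of' to a concept definition."""
    text = concept_definition.strip()
    # Assumption: a leading indefinite article is lower-cased so the
    # derived sentence reads naturally.
    for article in ("An ", "A "):
        if text.startswith(article):
            text = article.lower() + text[len(article):]
            break
    return f"The {representation_class} of {text}"


creator_concept = ("An entity primarily responsible for making "
                   "the content of the resource")
creator_data_element = derive_data_element_definition(creator_concept, "name")
print(creator_data_element)
```

For the "Creator" concept with Representation Class "name", the derived definition is the one quoted in comment (4).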

(5) Each of the 15 DC 1.1 entries should have a separate attribute for its "Representation Class" (equivalent to "Form of Representation" discussed in ISO/IEC 11179-3, clause 6.4.2).

(6) Each of the 15 DC 1.1 entries should include a separate attribute for "Permissible Data Element Values" (specified by ISO/IEC 11179-3, clause 6.4.7). The "Comment" included with some of the DC elements contains guidance on permissible data element values (e.g., "controlled vocabulary", "classification scheme", "Dublin Core Types", "MIME", "ISBN", etc.). This guidance should be moved to "Permissible Data Element Values" (specific cases are described below).
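Taken together, comments (2), (5), and (6) imply an entry layout with one Concept Definition, one or more Representation Classes, and a separate place for Permissible Data Element Values. A minimal Python sketch of such a layout follows. The class and field names are hypothetical and are not drawn from DC 1.1 or ISO/IEC 11179; the sample values for "Date" are taken from the Date comments later in this compilation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DCEntry:
    """Hypothetical layout for one DC 1.1 entry (field names are assumptions)."""
    name: str
    concept_definition: str            # exactly one per entry
    representation_classes: List[str]  # one or more, e.g. "name", "text", "code", "date"
    permissible_values: str = ""       # guidance moved out of the Comment
    comment: str = ""

    def __post_init__(self) -> None:
        # Each entry should carry at least one Representation Class.
        if not self.representation_classes:
            raise ValueError("an entry needs at least one Representation Class")


date_entry = DCEntry(
    name="Date",
    concept_definition="A point in time of an event relevant to the information resource",
    representation_classes=["date", "name"],
    permissible_values=(
        "Recommended best practice for encoding the date value is defined in "
        "a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format."
    ),
)
print(date_entry.representation_classes)
```

Keeping the permissible values in their own field, rather than in the free-text comment, is the separation that comment (6) asks for.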

(7) It is stated in DC 1.1 that the elements describe "resources". The term "resource" is confusing since it is very broad and is commonly applied to real-world entities such as "human resources" and "water resources". While there are ontological systems targeted primarily at real-world entities, the "universe of discourse" for Dublin Core is focused on the information aspects of resources. Accordingly, the root concept should be "information resource".

(8) The DC 1.1 definitions should be consistent in using the indefinite article ("a" or "an") versus the definite article ("the"). For concept definitions, the indefinite article should be used, or the article can be dropped when possible.

(9) Eight of the DC 1.1 elements (Title, Publisher, Date, Format, Resource Identifier, Source, Relation, and Rights) refer to the "resource" irrespective of the "content", while the other DC elements specifically refer to the "content of the resource". There ought to be some description of the significance of the distinction being made. The approach used in the Basic Semantic Registry is to regard the content of the information resource as the root concept. There is a secondary concept, "information resource product", that defines an expression of the content in specific products (including works, services, artifacts, type specimens, etc.).

(10) DC 1.1 elements Creator, Contributor, and Publisher use the term "entity" in their definitions. The dictionary defines entity as a thing that has existence (not a very distinctive definition and quite similar to the definition of "resource"). To clearly evoke the sense of entities such as persons and corporations that can act, the term "entity" should be replaced with "agent" or "party".

(11) DC 1.1 element "Title", Definition: Because "name" refers to a Representation Class, the current DC 1.1 definition sounds like a data element definition rather than a concept definition. A recommended Concept Definition independent of Representation Class would be "Designation of the information resource".

(12) DC 1.1 element "Title", Representation Class: The recommended Representation Class is "name". (Per ISO 1087-1, a "name" is "a designation of an individual concept by a linguistic expression".)

(13) DC 1.1 element "Creator", Definition: The term "entity" is not distinctive and should be replaced.

(14) DC 1.1 element "Creator", Representation Class: The recommended Representation Class is "name".

(15) DC 1.1 element "Subject and Keywords", Definition: In common use, "keywords" are often various terms used for retrieval, but they are not necessarily descriptive of the "topic". For example, "Cleveland" can be a retrieval keyword for a Web page on a restaurant in Cleveland even though the restaurant is not "about" Cleveland. If this concept is not to be constrained to subject terms only, the word "characteristic" is recommended rather than "topic".

(16) DC 1.1 element "Subject and Keywords", Comment: The Comment should be clear that the element is exclusive of any designation for controlled vocabularies or formal classification schemes.

(17) DC 1.1 element "Subject and Keywords", Representation Class: Because the concept defines a container having multiple keywords or phrases, the Representation Class would be "group". To be usefully applied, there is an implication that there exists a separate concept defined to only include one keyword or phrase, which concept would be inherited into this group definition. (Otherwise, the group components are merely undifferentiated character strings with no way to distinguish words from phrases, for example.)

(18) DC 1.1 element "Subject and Keywords", Permissible Data Values: Move from the Comment to Permissible Data Values the guidance given as "Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme."

(19) DC 1.1 element "Description", Definition: The word "account" in the definition has different connotations than "description". A recommended definition is: "A description of the content of the information resource."

(20) DC 1.1 element "Description", Representation Class: The recommended Representation Class is "text".

(21) DC 1.1 element "Publisher", Definition: The term "entity" is not distinctive and should be replaced.

(22) DC 1.1 element "Publisher", Definition: The phrase "making the resource available" in the definition is not commonly understood as equivalent to "publishing". If these are meant to be equivalent, the definition should say "responsible for publishing the information resource".

(23) DC 1.1 element "Publisher", Representation Class: The recommended Representation Class is "name".

(24) DC 1.1 element "Contributor", Definition: The term "entity" is not distinctive and should be replaced.

(25) DC 1.1 element "Contributor", Definition: The phrase "responsible for" in the definition is unnecessary and could cause confusion.

(26) DC 1.1 element "Contributor", Representation Class: The recommended Representation Class is "name".

(27) DC 1.1 element "Date", Definition: The term "date" is used in ISO/IEC 11179 as a Representation Class, so should not be in the Concept Definition. However, it is necessary to convey the sense of a time point of non-specific dimension. The recommended Concept Definition is: "A point in time of an event relevant to the information resource".

(28) DC 1.1 element "Date", Comment: The Comment should make clear if the concept allows both structured and unstructured representations and perhaps give examples of how those might be distinguished using attributes or other mechanisms.

(29) DC 1.1 element "Date", Comment: ISO 8601 does not support "profiles". While it is possible to constrain representations to a subset of ISO 8601, processors dealing with ISO 8601 would be expected to handle any validly formatted ISO 8601 date/time. There is an ANSI standard, X3.30, in which only YYYYMMDD is valid.

(30) DC 1.1 element "Date", Representation Class: The recommended Representation Class is "date", following the guidance given in ISO/IEC TR 15452, Specification of data value domains, associated with ISO/IEC 11179. If it is desired that the concept also be applied to unstructured text for dates (e.g. "Renaissance"), a second Representation Class would be "name".

(31) DC 1.1 element "Date", Permissible Data Values: Move from the Comment to Permissible Data Values the guidance given as "Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format."
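Comments (28) and (30) raise the question of how structured and unstructured Date values might be told apart. One possible approach, sketched below in Python, treats a value as Representation Class "date" only when it is a valid calendar date in the YYYY-MM-DD format quoted above, and otherwise falls back to "name". The function name and the fallback behavior are assumptions for illustration, not something DC 1.1 specifies.

```python
import re
from datetime import date

# YYYY-MM-DD, the format quoted in the DC 1.1 Comment.
STRUCTURED_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def representation_class_for(value: str) -> str:
    """Return "date" for a valid YYYY-MM-DD value, otherwise "name"."""
    match = STRUCTURED_DATE.fullmatch(value.strip())
    if match:
        year, month, day = (int(part) for part in match.groups())
        try:
            date(year, month, day)  # rejects impossible dates such as 1999-02-30
            return "date"
        except ValueError:
            pass
    # Assumption: anything else is treated as unstructured text.
    return "name"


for sample in ("1999-07-05", "Renaissance", "19990705"):
    print(sample, representation_class_for(sample))
```

Note that the X3.30 form YYYYMMDD is classified as "name" here, because this sketch accepts only the single format named in the quoted guidance.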

(32) DC 1.1 element "Resource Type", Comment: The Comment should be clear that the element is exclusive of the designation for a controlled vocabulary.

(33) DC 1.1 element "Resource Type", Representation Class: The recommended Representation Class is "name".

(34) DC 1.1 element "Resource Type", Permissible Data Values: Move from the Comment to Permissible Data Values the guidance given as "Recommended best practice is to select a value from a controlled vocabulary (for example, the working draft list of Dublin Core Types [DCT1])."

(35) DC 1.1 element "Format", Comment: The Comment should be clear that the element is exclusive of the designation for a controlled vocabulary.

(36) DC 1.1 element "Format", Representation Class: The recommended Representation Class is "name".

(37) DC 1.1 element "Format", Permissible Data Values: Move from the Comment to Permissible Data Values the guidance given as "Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats)."

(38) DC 1.1 element "Resource Identifier", Comment: The Comment should be clear that the element is exclusive of the designation for a formal identification system.

(39) DC 1.1 element "Resource Identifier", Comment: ISO/IEC 11179-6 may also be cited in the Comment as a source for examples of formal identification systems.

(40) DC 1.1 element "Resource Identifier", Representation Class: The recommended Representation Class is "name".

(41) DC 1.1 element "Resource Identifier", Permissible Data Values: Move from the Comment to Permissible Data Values the guidance given as "Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN)."

(42) DC 1.1 element "Source", Comment: The Comment should be clear that the element is exclusive of the designation for a formal identification system.

(43) DC 1.1 element "Source", Comment: In the Comment, add the phrase "including but not limited to a Resource Identifier".

(44) DC 1.1 element "Source", Representation Class: The recommended Representation Class is "name".

(45) DC 1.1 element "Source", Permissible Data Values: Move from the Comment to Permissible Data Values the guidance given as "Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system."

(46) DC 1.1 element "Language", Definition: This is the only use of the term "intellectual" as a modifier for content. If the adjective is necessary, the distinction should be defined since it is not commonly understood.

(47) DC 1.1 element "Language", Representation Class: The recommended Representation Class is "code".

(48) DC 1.1 element "Language", Permissible Data Values: Move from the Comment to Permissible Data Values the guidance given as "Recommended best practice for the values of the Language element is defined by RFC 1766 [RFC1766] which includes a two-letter Language Code (taken from the ISO 639 standard [ISO639]), followed optionally, by a two-letter Country Code (taken from the ISO 3166 standard [ISO3166]). For example, "en" for English, "fr" for French, or "en-uk" for English used in the United Kingdom."
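Once the Language guidance is recorded as Permissible Data Values, the shape it describes can be checked mechanically. The Python sketch below tests only the form described in the quoted text, a two-letter language code optionally followed by a hyphen and a two-letter country code. It does not look codes up in ISO 639 or ISO 3166, it does not cover every tag form that RFC 1766 permits, and the function name is hypothetical.

```python
import re

# Two letters, optionally followed by "-" and two more letters.
LANGUAGE_VALUE = re.compile(r"[A-Za-z]{2}(?:-[A-Za-z]{2})?")


def has_language_value_shape(value: str) -> bool:
    """Check the shape only; the codes themselves are not validated."""
    return LANGUAGE_VALUE.fullmatch(value) is not None


for sample in ("en", "fr", "en-uk", "english"):
    print(sample, has_language_value_shape(sample))
```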

(49) DC 1.1 element "Relation", Comment: The Comment should be clear that the element is exclusive of the designation for a formal identification system.

(50) DC 1.1 element "Relation", Comment: In the Comment, add the phrase "including but not limited to a Resource Identifier".

(51) DC 1.1 element "Relation", Representation Class: The recommended Representation Class is "name".

(52) DC 1.1 element "Relation", Permissible Data Values: Move from the Comment to Permissible Data Values the guidance given as "Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system."

(53) DC 1.1 element "Coverage", Comment: The Comment should be clear that the element is exclusive of the designation for a controlled vocabulary.

(54) DC 1.1 element "Coverage", Definition: "Extent" and "scope" are very similar in meaning and neither evokes a sense of temporal extent. A recommended definition is: "An extent or duration of the content of the information resource".

(55) DC 1.1 element "Coverage", Representation Class: The recommended Representation Class is "text".

(56) DC 1.1 element "Coverage", Permissible Data Values: Move from the Comment to Permissible Data Values the guidance given as "Coverage will typically include spatial location (a place name or geographic co-ordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN])."

(57) DC 1.1 element "Rights Management", Name and Identifier: The term "management" does not make sense in "Rights Management", especially as the definition refers to "information" and the comment refers to a "statement". The Name and Identifier should be "Constraints".

(58) DC 1.1 element "Rights Management", Definition: The meaning of the phrase "held in and over" is not clear to a reader, though it does have the ring of legal jargon.

(59) DC 1.1 element "Rights Management", Definition: It seems as though the definition is backwards--it is certainly not meant to be a full statement of the rights of the legal entities. Rather, one would expect to find statements by owners and their agents granting or constraining the rights of users to either access or use the information resource. (Privacy considerations give rise to access constraints, intellectual property considerations give rise to use constraints, for example.)

(60) DC 1.1 element "Rights Management", Comment: The last sentence of the Comment seems completely out of place. Clearly the issue referenced is related to national and international law. One could say, perhaps, that no rights of access or use are granted in the absence of a granting statement, though this may well not hold up in court. The Internet convention seems to be that in the absence of a constraints statement there are no constraints on access or use.

(61) DC 1.1 element "Rights Management", Comment: It is not clear whether the referenced rights are from the perspective of the owner ("intellectual property rights") or the rights of the user ("copyright").

(62) DC 1.1 element "Rights Management", Representation Class: The recommended Representation Class is "text".

-------------------------------------------------------------

Eliot Christian, US Geological Survey, 802 National Center, Reston VA 20192 echristi@usgs.gov Office 703-648-7245 FAX 703-648-7112 Home 703-476-6134

-- Anonymous, July 21, 1999

