Data Transformations
This page presents the results of a series
of data transformation tests carried out at ICS-FORTH in the framework
of the Harmony + CIMI Collaboration: Interoperability and metadata vocabularies. Data formats from different
organisations were mapped to a simple XML DTD compatible with the
CIDOC CRM. Sample data were then transformed automatically,
using commercial tools, into instances of this DTD. These instances
are best viewed with a style sheet or XSL file like those provided
here.
Instead of a DTD we could
have used an RDFS formulation of the CIDOC CRM. The respective
RDFS instances could be formally validated against the CIDOC CRM, since
RDFS can express the IsA hierarchies of the CIDOC CRM.
The logic of the transformation is, however, identical for this DTD
and for a proper RDF Schema; only the output syntax would differ. We
have chosen a DTD here because it is easier to display and to
explain as an example.
The CIDOC
CRM Entity DTD can be regarded as a new approach to the definition
of data transport formats. Let us assume that all properties
of the CIDOC CRM were declared at the highest entity in the
IsA hierarchy. Via inheritance, all properties would then appear at
the entities they were intended for, and at all others as well. The CIDOC
CRM properties are optional: they need not be used, in particular
not those that now appear at entities which should not have such
a property. Therefore a correct instance of the full CIDOC
CRM is also a correct instance of this simplified model. Moreover,
the entities and properties it instantiates are consistent with
respect to both models. The simplified model merely provides no
guidance on the correct combination of properties and entities.
It is therefore a "transport model" rather than a "validation
model".
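The difference between the two models can be illustrated with a minimal Python sketch. The class names, property names and the tiny IsA hierarchy below are hypothetical stand-ins chosen for illustration; they are not the actual CIDOC CRM definitions.

```python
# Hypothetical miniature IsA hierarchy (child -> parent); illustrative only.
IS_A = {
    "Entity": None,
    "Event": "Entity",
    "Acquisition": "Event",
    "Physical Object": "Entity",
}

# Validation model: each property is declared at the class it is meant for.
DOMAINS = {
    "has note": "Entity",
    "took place at": "Event",
    "transferred title of": "Acquisition",
}


def superclasses(cls):
    """Return cls followed by all its ancestors in the IsA hierarchy."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = IS_A[cls]
    return chain


def valid_in_validation_model(cls, properties):
    """A property is allowed only if its domain is cls or an ancestor of cls."""
    allowed = superclasses(cls)
    return all(DOMAINS[p] in allowed for p in properties)


def valid_in_transport_model(cls, properties):
    """All properties are declared at the root, so every known class inherits them."""
    return cls in IS_A and all(p in DOMAINS for p in properties)


good = ("Acquisition", ["took place at", "transferred title of"])
bad = ("Physical Object", ["took place at"])
print(valid_in_validation_model(*good), valid_in_transport_model(*good))  # True True
print(valid_in_validation_model(*bad), valid_in_transport_model(*bad))    # False True
```

Every instance accepted by the stricter check is also accepted by the transport check, but not the other way round, which is why the simplified model can carry valid data without being able to enforce validity.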
Based on these considerations, we have
defined a recursive DTD, where the elements correspond to properties
rather than entities. All properties are allowed for the "root" entity
and within each element representing a property, except for properties
with a primitive value (string, time, number). The "forward" and "backward" uses
of a property are defined as different elements. The classification
of a node is described by another element called "in class". The DTD
can thus be used to transport valid instances of the CIDOC CRM,
e.g. of an RDFS formulation of the CIDOC CRM.
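As a rough illustration of such a recursive, property-centred structure, the following Python sketch builds a small instance with the standard xml.etree.ElementTree module. The element names (entity, in_class, has_note, changed_ownership_by, took_place_at) and the sample values are invented for this example and are not copied from the actual DTD.

```python
import xml.etree.ElementTree as ET


def link(parent, property_tag, target_class):
    """Add a property element and classify the node it leads to."""
    node = ET.SubElement(parent, property_tag)
    ET.SubElement(node, "in_class").text = target_class
    return node


def literal(parent, property_tag, text):
    """Add a property with a primitive value; it takes no nested properties."""
    ET.SubElement(parent, property_tag).text = text


# The root entity: classified by "in_class", described by property elements.
root = ET.Element("entity")
ET.SubElement(root, "in_class").text = "Physical Object"
literal(root, "has_note", "Wooden mask")

# The backward direction of a property gets its own element name
# (hypothetical here), and the target node may carry properties again.
acquisition = link(root, "changed_ownership_by", "Acquisition")
place = link(acquisition, "took_place_at", "Place")
literal(place, "has_note", "Copenhagen")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Each property element nests the description of the node it points to, so the same small set of element declarations can describe arbitrarily deep chains of related entities.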
The DTD is based on the test CIDOC CRM
version 3.0. As these results are quite new, we apologize for possible
minor inconsistencies between the latest versions of the CIDOC
CRM, the DTD, and the transformed data. We shall try to eliminate
those in the days to come.
For presentation purposes we have created
a first XSL and CSS presentation
format. It simulates the format of the "CIDOC
CRM example" and leads to a readable form, which can certainly
be improved. The CSS format necessarily needs ":before" elements to
show the name of an element. This feature is not supported by Microsoft
Internet Explorer, but it is supported by some XML editors. Therefore we
also provide the XSL form. Interestingly, with these simple means any data
transformed into a CIDOC CRM-compatible form can be displayed
automatically in a readable form!
Science Museum of London
Transformations of sample data of
the Science
Museum of London to the CIDOC CRM are described in the following
document:
Martin Doerr & Iraklis Karvasonas, Converting object documentation into
a CRM-compatible XML form using Data Junction 7.5, ICS-FORTH, Heraklion, Greece,
May 2001. Available: word file (128
Kb).
Harmony + CIMI Tests
In the framework of the Harmony + CIMI Collaboration: Interoperability and metadata vocabularies, the extended CIDOC CRM
has been the basis for data transfer experiments from the museum data
of 4 organisations to the CIDOC CRM (National
Museum of Denmark, AMOL, RLG, The
John Clayton Herbarium of The
Natural History Museum, London). ICS-FORTH is
assisting Harmony in
this collaboration with respect to the use of the CIDOC CRM.
- Data sample from the National
Museum of Denmark (NMD)
The current schema of the NMD database GENREG is shown graphically on the
following images created with ACCESS:
1. 2.
This file shows the ACCESS
representation of the data dictionary of the NMD database
with comments about the mapping to the CIDOC CRM (word
file 682 Kb). The GENREG model is event-centric.
As is reasonable in a relational implementation, it keeps
the number of tables small. Fine-grained distinctions
between events, as made in the CIDOC CRM, are therefore
expressed by types of events and types of roles. Naturally,
there is no built-in mechanism to constrain event
types to the allowed roles. Such a service could be
implemented using the CIDOC CRM. In this mapping we
have not analyzed in depth all types of events used
in the NMD database to achieve an optimal mapping to the
corresponding CIDOC CRM subclasses of Event. The idea
of this test was to demonstrate feasibility. It also
shows that, in general, a mapping is based on
the input schema and on type definitions used in the
input data.
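A role-constraint service of the kind mentioned above could look roughly like the following Python sketch. The event-type codes, role names and the mapping tables are hypothetical; they are not taken from GENREG or from the CIDOC CRM, and the sketch only illustrates how a mapping can depend on type values in the data as well as on the schema.

```python
# Hypothetical event-type codes found in the input data, mapped to
# illustrative names of Event subclasses.
EVENT_TYPE_TO_CLASS = {
    "acquisition": "Acquisition",
    "production": "Production",
}

# Hypothetical roles permitted for each subclass.
ALLOWED_ROLES = {
    "Acquisition": {"previous owner", "new owner"},
    "Production": {"maker"},
}

DEFAULT_CLASS = "Event"


def map_event(event_type, roles):
    """Return the target class for an event type and any roles it does not allow."""
    crm_class = EVENT_TYPE_TO_CLASS.get(event_type, DEFAULT_CLASS)
    allowed = ALLOWED_ROLES.get(crm_class)
    # Unmapped event types fall back to the general class, where no role check applies.
    rejected = [] if allowed is None else sorted(set(roles) - allowed)
    return crm_class, rejected


print(map_event("production", ["maker", "new owner"]))  # ('Production', ['new owner'])
print(map_event("use", ["user"]))                       # ('Event', [])
```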
A peculiarity of the NMD
data is the default event of classification and measurement:
unless otherwise specified, classification is implied
in the "use event" (Brug), and measurement in the acquisition
event. We have traced these cases and interpreted those
events as multiple instantiations of both implied CIDOC
CRM classes. Note that we regard a collection as a
physical object, similar to a set of chessmen, a bikini,
or a set of plates. The argument is that a collection
has a total weight, can be destroyed, and shares a common
life-cycle. The coming and going of parts is not unusual
for other objects either; just look at your computer.
Here is the result of
the transformation: data from the ethnographic collection
of the NMD, with embedded images. This file must be
viewed with an XSL-enabled viewer.
NMD sample in CIDOC CRM form: (xml
file 273 Kb)*
- Data sample from the Australian
Museums On-Line (AMOL)
The schema of the Australian Museums On-Line (AMOL) database is a flat list
of attributes shown in: AMOL schema (word
file 59 KB).
The fields of the AMOL
schema have only a loose semantic connection to the data
in them. They are more on the level of a document structure
than of a conceptual model. Therefore a direct mapping
of AMOL field semantics to CIDOC CRM notions is not
possible except for a few fields. The Clayton example
below is just the opposite: all data fields can be
interpreted with high precision, but they provide little
structuring. They do, however, allow for a completely automatic
data transformation to the CIDOC CRM. In a first step,
we have mapped all such structuring fields of the
AMOL data to the CIDOC CRM "has note" property, and
the interpretable fields to the respective CIDOC CRM
properties.
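The first mapping step can be sketched in Python as follows. The field names, the property names other than "has note", and the sample record are hypothetical and serve only to show the fallback rule; the real AMOL schema and the mapping actually used are described in the linked documents.

```python
# Hypothetical table of fields whose meaning is precise enough to map
# directly; the property names on the right are illustrative only.
FIELD_TO_PROPERTY = {
    "title": "is identified by",
    "maker": "was made by",
}


def map_record(record):
    """Map a flat record to (property, value) pairs.

    Fields without a direct mapping are kept as text under "has note",
    prefixed with the field name so that no information is lost.
    """
    statements = []
    for field, value in record.items():
        if not value:
            continue  # skip empty fields
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            statements.append(("has note", f"{field}: {value}"))
        else:
            statements.append((prop, value))
    return statements


sample = {
    "title": "Example title",
    "maker": "Example maker",
    "description": "Free text",
    "rights": "",
}
for statement in map_record(sample):
    print(statement)
```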
Here is the result of
the automatic transformation: data from the Australian
Museums On-Line (AMOL), with embedded images. This
file must be viewed with an XSL-enabled viewer.
AMOL sample in CIDOC CRM form, part 1: (xml
file 83 Kb)*
AMOL sample in CIDOC CRM form, part 2: (xml
file 86 Kb)*
AMOL sample in CIDOC CRM form, part 3: (xml
file 95 Kb)*
AMOL sample in CIDOC CRM form, part 4: (xml
file 95 Kb)*
In a second step we have
analyzed one AMOL record by hand in order to demonstrate
that the meanings referred to in these records are completely
covered by the CIDOC CRM. A satisfactory automatic
transformation of the AMOL data to the CIDOC CRM could
be achieved by the use of text parsers based on heuristics
and on comparison with place name and person name authorities,
as is usual in data mining and citation index generation.
This was, however, beyond the resources we could assign
to this test. The complexity of such an analysis could
be greatly reduced if a certain discipline of separating
person names, organisation names and place names were
applied. The field "subject" exhibits a certain
object-type-dependent polysemy, which we could have
analyzed better. The modelling of "subject" in
the librarians' sense is an issue still under discussion
in the CIDOC CRM.
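A very reduced Python sketch of authority-based parsing is given below. It was not part of the test; the authority lists, the separator convention and the sample text are invented for illustration, and a realistic parser would need far richer heuristics.

```python
import re

# Hypothetical authority lists, stored in lower case for comparison.
PERSON_AUTHORITY = {"jane example"}
PLACE_AUTHORITY = {"sydney", "london"}


def classify_names(text):
    """Split free text on ';' or ',' and look each part up in the authorities."""
    found = {"person": [], "place": [], "unresolved": []}
    for part in re.split(r"[;,]", text):
        name = part.strip()
        if not name:
            continue
        key = name.lower()
        if key in PERSON_AUTHORITY:
            found["person"].append(name)
        elif key in PLACE_AUTHORITY:
            found["place"].append(name)
        else:
            found["unresolved"].append(name)
    return found


print(classify_names("Jane Example; Sydney, Unknown Co"))
```

The sketch only works because the names are cleanly separated in the input, which is the kind of data entry discipline the paragraph above argues for.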
Here is the result of
the transformation by hand of one record from the Australian
Museums On-Line (AMOL), with embedded image. This file
must be viewed with an XSL-enabled viewer.
AMOL sample in CIDOC CRM form, complete mapping: (xml
file 4 Kb)*
- Data sample from The
John Clayton Herbarium of The
Natural History Museum, London
The schema of the Clayton database is a flat list of attributes shown in: "Clayton
schema".
This transformation constitutes
the first test of natural history data with the CIDOC
CRM. The Clayton example consists of reasoning about
object types, their names and types, classification
and prototypicality of specimens. The only aspect we
could identify that is not already covered by the CIDOC CRM
is the "Type Specimen", which could be generalized
as a property: E55 Type: has prototype (is prototype
of). Otherwise, the events of classification, the distinction
between names and types, and the recent (Agios Pavlos
Extensions) subordination of E55 Type to CIDOC CRM
Entity seemed to us satisfactory to capture this
reasoning. We kindly ask natural history experts, and
particularly the curators of the Clayton collection,
to provide us with feedback on our interpretation of
these data.
Here is the first result
of the transformation: data from the John Clayton Herbarium,
with embedded images. This file must be viewed with
an XSL-enabled viewer.
Clayton sample in CIDOC CRM form, part 1: (xml
file 77 Kb)*
Clayton sample in CIDOC CRM form, part 2: (xml
file 76 Kb)*
Clayton sample in CIDOC CRM form, part 3: (xml
file 77 Kb)*
Clayton sample in CIDOC CRM form, part 4: (xml
file 81 Kb)*
* To view XML files MSIE 5+ or Netscape 6 are recommended
- A full report of the transformations
will be shown on this page soon.
Martin Doerr
Chair, CIDOC CRM SIG,
June 2001
CIDOC
CRM Special Interest Group
Working Group of CIDOC