CIDOC CRM Special Interest Group Conference
March 26-28, 2003, Smithsonian, Washington, D.C.
Conference Topic: "SHARING THE KNOWLEDGE"

Human Markup Language (HumanML):
Humanness Content and Sharing across Perspective Shifts


S. Candelaria de Ram, James Landrum III, Ranjeeth Kumar Thunga, Claude (Len) Bullard, Rob Nixon, Rex Brooks, Joseph Norris, Emmanuil Batsis

0. Abstract. Human Markup Language (HumanML), a set of standard tags, is designed for explicitly indicating "humanness": meaningfulness that originates in your point of view and is interpreted from the perspective of your fellow communicator. Gaps in perspective are to be expected in globalized communication, but resolving them quickly is of supreme importance.

HumanML, or "Human Markup Language", is a developing standard tagset for XML envelopes and RDF logic. The endeavor is one of the OASIS projects. To quote the Technical Committee's Charter, "HumanML is designed to represent human characteristics through XML. The aim is to enhance the fidelity of human communication."

As meta-labels for communicators' content and perspective, HumanML tags address both the personal and the contextual aspects of communication. They may range from Community, Culture, Intent, and Emotion to descriptors of person or group and of circumstance.

HumanML tags are carried along with the signal itself as it crosses the bridge between you and your fellow. The goals are better interpretation and more accurate messaging, even if the fellowship involves a computer or two.

Computer applications, say for virtual face-to-face communication, may pick up HumanML terms and expand them to suit different domains. For instance, primary terms in the base schema such as Intent, Kinesic, and GeoLocator can be expanded in different ways by applications that handle communication in diplomacy, dance, design, or education. Similarly, Community and Semiosis may be detailed for local or specialized genres and sign systems.

Standard XML mechanisms for derivation and expansion are available as HumanML primary terms are built upon to yield secondary terms for different practical tasks. The Use Cases illustrate this.

1.0. Comparison of the Human Markup Language and CIDOC CRM endeavors.

Human Markup Language (HumanML, with "huml" as its namespace prefix in OASIS' XML subdomain) and CIDOC CRM (the Conceptual Reference Model of the International Committee for Documentation of the International Council of Museums) share commonalities of purpose as well as of social and implementation structure. Both are designed "with a view to enabling wide area information exchange and integration of heterogeneous sources" (to quote http://cidoc.ics.forth.gr/scope.html).

Both HumanML and CIDOC CRM can be, and have been, cast as XML and RDF tagging systems, which insert identifying or descriptive labels at appropriate places in electronic documents. These labels enable computers to find commonalities among accumulated documents automatically and to do other processing. HumanML and CIDOC CRM are both aimed at "providing a common and extensible semantic framework."

Over ten years old, the CRM endeavor "is intended to serve as a common language for domain experts and implementers to formulate requirements for information

1.1. Standardizations. Both HumanML and CRM are "intended to promote a shared understanding of cultural heritage information" and use document tagging ("markup") and object-oriented techniques.

In December 2002, the CRM specification moved toward becoming a recognized standard through the International Organization for Standardization (ISO), which arose from the hard sciences (Draft ISO/TC46/SC4).

On January 12, 2003, the first HumanML specification was adopted under the umbrella of OASIS after a month-long public comment period (Human Markup Language Primary Base XML Specification 1.0, http://www.oasis-open.org/committees/humanmarkup/#documents).

OASIS, being business-oriented, builds upon and supplements the work on markup done by standards bodies such as W3C (for XML) and ISO (for SGML), focusing on practical applications and their interoperability (pragmatics "such as schemas/DTDs, name spaces, style sheets, ..., electronic exchange of business information"; see http://www.oasis-open.org/committees/). The OASIS Technical Committee for the Human Markup Language (http://www.oasis-open.org/committees/humanmarkup/) is supported by a non-profit corporation (http://humanmarkup.org).

1.2. Scope. Being a much younger effort, HumanML is undergoing rapid evolution of implementations, following discussion among people with expertise in both "hard" and "soft" sciences as well as computing. CIDOC is currently carrying out conformance verifications against extant archives, as seen in the list of collections drawn from its web page (see Appendix). Note that HumanML may eventually be useful for characterizing collected dialogue (in collections). While both efforts reify or reflect what has happened, I (S. Candelaria de Ram) would venture that CIDOC's focus is more on the past and HumanML's more on ongoing process. The one aims at uniform description; the other at bridging shifts in perspective.

HumanML plans "sets of modules which frame and embed contextual human characteristics including physical, cultural, social, kinesic, psychological, and intentional features within conveyed information. Other efforts within the scope of the HumanMarkup TC include messaging, style, alternate schemas, constraint mechanisms, object models, and repository systems, which will address the overall concerns of both representing and amalgamating human information within data."

2.0. The Development, Purpose and Scope of the Human Markup Language: A Brief History and Introduction.

The effort to create the Human Markup Language, HumanML, started with Ranjeeth Kumar Thunga's frustration during his study of Psychology. The frustration stemmed largely from the contrast between the rapid improvement of technology developed from the "hard" sciences (Biology, Physics, Chemistry and the like) and the lack of similar improvement in the "soft", human-centered sciences such as Psychology, Sociology, and Cultural Anthropology. This discrepancy continued to occupy Ranjeeth Kumar Thunga as he entered the workplace in the burgeoning arena of web development in the 1990s.

In this environment, Ranjeeth concluded that XML, the eXtensible Markup Language, offered one way to bring such improvements to human concerns in Information Technology. XML allows standard vocabularies, or "Markup Languages", to be developed in the various "domains" of information technology, and it allows for interoperability across digital information systems, platforms and software programs.

Ranjeeth started the Humanmarkup discussion list on YahooGroups in February 2001 to explore how to bring technological improvements to the practice of the soft "human" sciences. The initial discussions were captured in the YahooGroups archives (http:// ). The topic tends to attract people who not only have practical computing experience in information technology, but also bring extraordinarily broad backgrounds in the hard and human sciences, especially language, culture, and human-descriptive fields.

Chief among the goals identified by the group was clarifying communication, especially within digital information systems. The group also observed that Human Information was already being standardized by various efforts to establish uniform authentication of basic Human Identity. Extending this to provide both greater depth of individual information and control over that information was identified as another important purpose.

Further purposes were identified by application-area. Among them were:

                        * Anthropology
                        * Archeology
                        * Artificial Intelligence
                        * Biometrics
                        * Business Decision Support (Heuristics)
                        * Communications
                        * Conflict Resolution
                        * Cultural Studies
                        * Diplomacy
                        * Economics
                        * Emergency Services
                        * Government Record Keeping
                        * Human Behavior Representation
                        * Individual Personalization and Identity Authentication Enhancement
                        * Linguistics
                        * Marketing
                        * Medicine
                        * Psychology
                        * Sociology
                        * Virtual Reality

It was decided that HumanML would consist of XML Schemata for defining use of terms and RDF Schemata to associate those terms with standard resources in Academic, Governmental and Business use to ensure interoperability. It was also decided to associate HumanML with standard Ontological and Taxonomic frameworks.

It was decided to make improving communication the mission and to move the effort to OASIS, the Organization for the Advancement of Structured Information Standards, to provide public access, credibility, and protection of Intellectual Property Rights for the anticipated standard specifications.

The OASIS HumanMarkup Technical Committee was established in September, 2001.

The early months of the TC were spent in gathering requirements, during which the principles of Semiotics, or Semiology, were adopted as the guiding principles of this effort.

We delivered our first version of a Requirements Document on April 31, 2002: http://www.oasis-open.org/committees/humanmarkup/schema/huml-primary-base-1.0.xsd. Our first Committee Specification, The Human Markup Language Primary Base XML Specification 1.0, followed on December 12, 2002; it was modified slightly after an initial Public Comment Period of Thirty (30) days and adopted January 12, 2003: http://www.oasis-open.org/committees/humanmarkup/#documents


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
HUMAN MARKUP LANGUAGE DEVELOPMENT AND SCOPE
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


I. Introduction
        A. R. K. Thunga frustrated Psychology student at lack of advances in "Soft" Human Sciences.
        B. RKT involved in Web Development, sees XML standards as way to improvement
        C. HumanML born as Yahoo Group in Feb. 2001
II. Initial Discussions captured in Yahoo archives
        A. Purposes
                1. Improve Communication
                2. Humanize IT (digital information systems)
                3. Standardize Human Information beyond basic Identity Authentication
                4. Many individual project goals became application-area domains
                        a. Anthropology
                        b. Archeology
                        c. Artificial Intelligence
                        d. Biometrics
                        e. Business Decision Support (Heuristics)
                        f. Communications
                        g. Conflict Resolution
                        h. Cultural Studies
                        i. Diplomacy
                        j. Economics
                        k. Emergency Services
                        l. Government Record Keeping
                        m. Human Behavior Representation
                        n. Individual Personalization and Identity Authentication Enhancement
                        o. Marketing
                        p. Medicine
                        q. Virtual Reality
        B. Means
                1. XML Language/Vocabulary

3.0. Terms of the Human Markup Language Primary Base Schema v. 1.0.

This section gives an idea of what HumanML definitions look like and how you get them.

3.1. How to Pull HumanML into your Documents

To pull HumanML definitions into your documents, you put a reference to the schema in your document [headers]. Then anyone can find out what the terms mean and use them as defined. NB: You go here.

Your headers include precisely this:

3.2. Sample Terms: HumanML XML Schema definitions spanning a range of Cultural and Personal Perspective characterizations.

Drawn from http://www.oasis-open.org/committees/humanmarkup/documents/HM.Requirements.html. Entitled Human Markup Language Primary Base Specification 1.0, the specification is Copyright 2002 by The Organization for the Advancement of Structured Information Standards [OASIS].

Two terms of interest serve as illustrations: Kinesic, for movement (especially body movement), and Semiosis, for the communicational process.

3.2.1. Kinesic: Human Movements
Communicational Kinesics constitute a vocabulary of body language used to portray moods and emotions and to add emphasis to verbal communication. As a study concerned with how bodily and facial gestures function as a factor in communication, kinesics is fairly well understood. For our purposes we expect enumeration of body language gestures to be included in culture-specific subsets.

A kinesic vocabulary is deferred to either the Secondary Base Schema or other subsequent huml schemata. The abstract type provided for this expansion is huml Kinesic.
<xs:complexType name="Kinesic" abstract="true">
    <xs:annotation>
        <xs:documentation>
        Kinesic: Human Movements
        Communicational Kinesics constitute a vocabulary of body language
        used to portray moods and emotions and to add emphasis to verbal
        communication. As a study concerned with how bodily and facial
        gestures function as a factor in communication, kinesics is fairly
        well understood. For our purposes we expect enumeration of body
        language gestures to be included in culture-specific subsets.
        </xs:documentation>
        <xs:appinfo>NONE</xs:appinfo>
    </xs:annotation>
    <xs:attributeGroup ref="humlCommAtts"/>
    <xs:attributeGroup ref="humlTemporalAtts"/>
    <xs:attribute ref="intensity"/>
</xs:complexType>
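Since the kinesic vocabulary is deferred to later schemata, one standard XML Schema way to build secondary terms on an abstract primary type is xs:extension inside xs:complexContent. The minimal Python sketch below (standard library only) parses a hypothetical secondary schema fragment and lists which types extend huml:Kinesic, along with the attributes each one adds. The type names Bow and Wave, the bowDepth attribute, and the huml prefix binding are assumptions for illustration, not terms from the Primary Base Specification. A real secondary schema would also need an xs:import of the primary base schema before a validator could resolve huml:Kinesic.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical secondary schema fragment. "Bow", "Wave", "bowDepth" and the
# use of the "huml" prefix are illustrative assumptions only.
SECONDARY_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:huml="http://www.oasis-open.org/committees/humanmarkup/schema/huml-primary-base-1.0.xsd">
  <xs:complexType name="Bow">
    <xs:complexContent>
      <xs:extension base="huml:Kinesic">
        <xs:attribute name="bowDepth" type="xs:decimal"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="Wave">
    <xs:complexContent>
      <xs:extension base="huml:Kinesic"/>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""


def derived_types(xsd_text, base):
    """Map each top-level complexType that extends `base` to its added attributes.

    ElementTree does not resolve QName values inside attributes, so `base`
    is compared as a literal prefixed string such as "huml:Kinesic".
    """
    root = ET.fromstring(xsd_text)
    result = {}
    for ctype in root.findall(f"{{{XS}}}complexType"):
        ext = ctype.find(f"{{{XS}}}complexContent/{{{XS}}}extension")
        if ext is not None and ext.get("base") == base:
            added = [a.get("name") for a in ext.findall(f"{{{XS}}}attribute")]
            result[ctype.get("name")] = added
    return result


print(derived_types(SECONDARY_XSD, "huml:Kinesic"))
# {'Bow': ['bowDepth'], 'Wave': []}
```

Comparing the prefixed string keeps the sketch short; a tool that must cope with arbitrary prefixes would first map each prefix to its namespace URI.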

3.2.2. Semiotic Communication Mode
Semiosis is a meaningful exchange of signs, signals and symbols among cognitive agents.

NOTE: This process is the model of the human communication process upon which HumanML is based. It can be, and we expect that it will be, further enumerated by semiotic types and extended in the Secondary Base Schema and subsequent huml schemata.
<xs:complexType name="Semiosis" abstract="true">
    <xs:annotation>
        <xs:documentation>
        Semiotic Communication Mode
        Semiosis is a meaningful exchange of signs, signals and symbols
        among cognitive agents.

        NOTE: This process is the model of the human communication
        process upon which HumanML is based. It can be, and we expect
        that it will be further enumerated by semiotic types and extended
        in the Secondary Base Schema and subsequent huml schema.
        </xs:documentation>
        <xs:appinfo>NONE</xs:appinfo>
    </xs:annotation>
    <xs:attributeGroup ref="humlTemporalAtts"/>
    <xs:attributeGroup ref="humlCommAtts"/>
</xs:complexType>

3.3. Parallel RDF <=> XML Schema definitions for HumanML Computer Reasoning

HumanML, while being a standardized XML Schema vocabulary, can also be used for [computer] reasoning over tagged document components with the Resource Description Framework (RDF). Parallel sample RDF definitions for Kinesic and Semiosis, provided by huml group member Emmanuil Batsis (Manos), look like this.

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

    <rdf:Property rdf:ID="id">
        <rdfs:domain rdf:resource="#Semiosis"/>
        <rdfs:domain rdf:resource="#Kinesic"/>
        <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#ID"/>
    </rdf:Property>
    <rdf:Property rdf:ID="humlName">
        <rdfs:domain rdf:resource="#Semiosis"/>
        <rdfs:domain rdf:resource="#Kinesic"/>
        <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
    </rdf:Property>
    <rdf:Property rdf:ID="fromDate">
        <rdfs:domain rdf:resource="#Semiosis"/>
        <rdfs:domain rdf:resource="#Kinesic"/>
        <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
    </rdf:Property>
    <rdf:Property rdf:ID="toDate">
        <rdfs:domain rdf:resource="#Semiosis"/>
        <rdfs:domain rdf:resource="#Kinesic"/>
        <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
    </rdf:Property>
    <rdf:Property rdf:ID="intensity">
        <rdfs:domain rdf:resource="#Kinesic"/>
    </rdf:Property>

    <rdfs:Class rdf:ID="Semiosis">
        <rdfs:label xml:lang="en">Semiosis</rdfs:label>
        <rdfs:comment>
            Semiotic Communication Mode.
            Semiosis is a meaningful exchange of signs, signals and symbols among cognitive agents.
            NOTE: This process is the model of the human communication process upon which HumanML
            is based. It can be, and we expect that it will be further enumerated by semiotic types
            and extended in the Secondary Base Schema and subsequent huml schemata.
        </rdfs:comment>
    </rdfs:Class>

    <rdfs:Class rdf:ID="Kinesic">
        <rdfs:label xml:lang="en">Kinesic</rdfs:label>
        <rdfs:comment>
            Kinesic: Human Movements
            Communicational Kinesics constitute a vocabulary of body language used to portray
            moods and emotions and to add emphasis to verbal communication. As a study concerned
            with how bodily and facial gestures function as a factor in communication, kinesics is
            fairly well understood. For our purposes we expect enumeration of body language gestures
            to be included in culture-specific subsets.
        </rdfs:comment>
    </rdfs:Class>
</rdf:RDF>
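RDF/XML recognizes the attribute rdf:ID (capital ID) and declares properties as rdf:Property, so a hand-edited listing like the one above is easy to check mechanically. As a sketch using only Python's standard library, the hypothetical summarize() function below reads a trimmed sample written in that style, collects each property's rdfs:domain and rdfs:range values, and reports any rdf:ID declared more than once. The sample is illustrative ("toDate" is deliberately declared twice); a production pipeline would more likely load the file with a full RDF parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Trimmed, illustrative sample; "toDate" is declared twice on purpose.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Property rdf:ID="fromDate">
    <rdfs:domain rdf:resource="#Semiosis"/>
    <rdfs:domain rdf:resource="#Kinesic"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
  </rdf:Property>
  <rdf:Property rdf:ID="toDate">
    <rdfs:domain rdf:resource="#Semiosis"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
  </rdf:Property>
  <rdf:Property rdf:ID="toDate">
    <rdfs:domain rdf:resource="#Kinesic"/>
  </rdf:Property>
  <rdf:Property rdf:ID="intensity">
    <rdfs:domain rdf:resource="#Kinesic"/>
  </rdf:Property>
</rdf:RDF>"""


def summarize(rdf_text):
    """Return ({property: {"domain": [...], "range": [...]}}, duplicate_ids)."""
    root = ET.fromstring(rdf_text)
    props = root.findall(f"{{{RDF}}}Property")
    # A lowercase rdf:id would not match here: RDF/XML only defines rdf:ID.
    ids = [p.get(f"{{{RDF}}}ID") for p in props]
    duplicates = sorted(name for name, n in Counter(ids).items() if n > 1)
    summary = {}
    for p in props:
        entry = summary.setdefault(p.get(f"{{{RDF}}}ID"), {"domain": [], "range": []})
        for kind in ("domain", "range"):
            for el in p.findall(f"{{{RDFS}}}{kind}"):
                entry[kind].append(el.get(f"{{{RDF}}}resource"))
    return summary, duplicates


summary, dups = summarize(SAMPLE)
print(dups)  # ['toDate']
```

Merging repeated declarations into one entry mirrors how RDF itself treats them: two statements about the same rdf:ID simply add up, which is why a duplicate is redundant rather than an error, and why it is worth flagging for cleanup.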

[RDFs to HTML via Lyx use of http://www.latex2html.org : LaTeX2HTML translator Version 2002-1 (1.68), q.v. The translation was initiated and edited by SC on 2003-03-15.]

RDF components are used as parts of predications. This sort of interconversion makes HumanML descriptions useful for computer problem solving in domains where language variety, formality ("register"), speaker role and mood, and other factors mean that there is more meaning than meets the eye in someone's missive (a letter, say), or less! These factors are more easily imputed when speakers, listeners, and their contexts are the same than in rapid or delayed, electronically facilitated global communication.
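To make "parts of predications" concrete, the sketch below stores statements as (subject, predicate, object) triples and answers a question that combines a class with a property: which Kinesic resources carry an intensity above a threshold? It is plain Python, not an RDF library. The resource names, the intensity values, and reading intensity as a 0 to 1 degree of emphasis are illustrative assumptions.

```python
# Minimal sketch: RDF-style statements held as (subject, predicate, object)
# triples. The resources "#msg1", "#gesture1", "#gesture2" and their
# intensity values are hypothetical illustration data.
RDF_TYPE = "rdf:type"

STORE = {
    ("#msg1", RDF_TYPE, "#Semiosis"),
    ("#gesture1", RDF_TYPE, "#Kinesic"),
    ("#gesture1", "#intensity", 0.8),
    ("#gesture2", RDF_TYPE, "#Kinesic"),
    ("#gesture2", "#intensity", 0.2),
}


def objects(store, subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in store if s == subject and p == predicate]


def instances(store, cls):
    """Subjects typed as `cls`, sorted for stable output."""
    return sorted(s for s, p, o in store if p == RDF_TYPE and o == cls)


def emphatic_gestures(store, threshold):
    """Kinesic resources whose asserted intensity exceeds `threshold`."""
    return [g for g in instances(store, "#Kinesic")
            if any(v > threshold for v in objects(store, g, "#intensity"))]


print(emphatic_gestures(STORE, 0.5))  # ['#gesture1']
```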

3.4. Entire range of HumanML terms in the first basic schema, to be extended by apps needing specialized terms.

In building from an original example worked up by Len Bullard toward the core terms, it was found that the contextual, dynamic nature of active communication is not readily cast as either an ontology or a hierarchy. The current XML Schema, however, has the standard OASIS setup, as seen in the quick guide below, huml At Hand. Being an XML Schema, HumanML (whose tag name is "huml") comprises SIMPLE types, COMPLEX types, and ATTRIBUTES:

huml AT HAND -- HumanML QUICK-READ

ELEMENTS (of namespace) HumanML
<xs:schema targetNamespace="http://www.oasis-open.org/committees/humanmarkup/schema/huml-primary-base-1.0.xsd" xmlns="http://www.oasis-open.org/committees/humanmarkup/schema/huml-primary-base-1.0.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:a="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0" xmlns:n="urn:oasis:names:tc:ciq:xsdschema:xNL:2.0" elementFormDefault="qualified" attributeFormDefault="unqualified">

SIMPLE TYPES (non-capitalized camel)
	<xs:simpleType name="">

range	(global)
	decimal within [0,1]

Locator
	[upper, lower, back, front, inner, outer, left, right, top, bottom]

____________________________________
ATTRIBUTE GROUPS: (non-cap camel)
<xs:attributeGroup name="">

	ATTRIBUTE (non-cap camel)
	<xs:attribute name="">

humlTemporalAtts
humlIdentifierAtts
humlCommAtts
physicalDescriptors
	height
	weight
	hairColor
	eyeColor
	build
	scarsMarksTattoos

age
	dateOfBirth
	dateOfDeath
gender
	genderAtBirth
	currentGender
	impersonator

bodyPart
	[arm, leg, head]

intensity

____________________________________
COMPLEX TYPES (Capitalized camel)
	<xs:complexType name="">

Address
	[postal, residential, email, previous, current]
Artifact
	humlIdentifierAtts
	humlCommAtts
	humlTemporalAtts
Belief
	humlIdentifierAtts
BodyLocation
	bodyPart
	location Locator
	humlIdentifierAtts
Channel: {sight, hearing, touch, taste, smell, kinesthetic}
	humlCommAtts
	strength of signal
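The simple-type constraints in the quick guide can be mirrored in application code before HumanML documents are generated. The sketch below is a hypothetical Python illustration, not part of the specification. The parse_range() function enforces the "decimal within [0,1]" rule of the range type, and a BodyLocation dataclass accepts only the bodyPart and Locator values listed above. Treating those lists as closed enumerations is an assumption; the schema itself remains the authority.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

# Value lists copied from the quick guide; treating them as closed
# enumerations is an assumption of this sketch.
LOCATORS = frozenset({"upper", "lower", "back", "front", "inner",
                      "outer", "left", "right", "top", "bottom"})
BODY_PARTS = frozenset({"arm", "leg", "head"})


def parse_range(value: str) -> Decimal:
    """Parse a 'range' value, which the guide defines as a decimal within [0, 1]."""
    try:
        d = Decimal(value)
    except InvalidOperation:
        raise ValueError(f"not a decimal: {value!r}") from None
    # is_finite() rules out NaN and Infinity before the ordered comparison.
    if not d.is_finite() or not (0 <= d <= 1):
        raise ValueError(f"range value {value!r} is outside [0, 1]")
    return d


@dataclass(frozen=True)
class BodyLocation:
    """Hypothetical Python mirror of the BodyLocation complex type."""
    body_part: str
    location: str  # a Locator value

    def __post_init__(self):
        if self.body_part not in BODY_PARTS:
            raise ValueError(f"unknown bodyPart: {self.body_part!r}")
        if self.location not in LOCATORS:
            raise ValueError(f"unknown Locator: {self.location!r}")


print(parse_range("0.75"))  # 0.75
print(BodyLocation("arm", "left"))
```

Using Decimal rather than float keeps values such as 0.1 exact, which matches the schema's decimal typing.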

Note that whilst Address and GeoLocation are within the scope of CRM vocabulary, Belief, Intent, Semiosis and other terms in the HumanML range are properties of communicating rather than of artifacts. With the advent of multimedia, though, Channel joins shared descriptors like Address.

4.0. Human Markup Language Use Cases

Implementations using huml tagging have only just begun, but several Use Cases indicate how it can be put into practice across a great range of domains. The examples below give an idea of what some may look like. Note that applications can add the further distinctions they need, extending the Primary Base Schema to the Secondary Schema level. (Illustrated; see presentation.)

4.1. Discourse intent: Human Implicata.

4.2. Verbal and non-verbal communication correspondences.

4.3. Mood.

4.4. Kinesics: Movement indication.

4.5. Semiosis by Semiotes: Characterizing Shifting Perspectives.

Referencing the language variety, provenience, and media of both original signals is important here. Characterizing the transformations of the channeled signals as they traverse the gap between interpretive agents leaves room for inserting markup. The meaningfulness and parallelism among the interpretations of the parties to the communication (the semiotes) are, hopefully, maximized by using HumanML to add back the qualities lost in transmission. More notation is called for where the semiotes' contexts contrast most. (The term "semiote" was coined on Aug. 17, 2002 by huml group member S. Candelaria de Ram for the agents-in-context in a Semiosis diagram.)
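The idea of markup that travels with the signal, so that the receiver can recover the sender's context, can be sketched with Python's standard library. In this sketch, wrap() places the message text in an envelope alongside a sender-context block, and unwrap() reads both back. Only the namespace URI comes from the schema header quoted in Section 3.4. The element names envelope, semiote and content, the role attribute, and the context keys languageVariety and channel are hypothetical stand-ins, not terms defined by the Primary Base Specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the HumanML schema header; everything else is hypothetical.
HUML = "http://www.oasis-open.org/committees/humanmarkup/schema/huml-primary-base-1.0.xsd"
ET.register_namespace("huml", HUML)


def q(tag):
    """Qualify a local name with the HumanML namespace."""
    return f"{{{HUML}}}{tag}"


def wrap(text, sender_context):
    """Wrap message text with annotations describing the sender's context.

    The element names (envelope, semiote, content) and the context keys are
    illustrative assumptions, not terms from the specification.
    """
    env = ET.Element(q("envelope"))
    semiote = ET.SubElement(env, q("semiote"), role="sender")
    for key, value in sender_context.items():
        ET.SubElement(semiote, q(key)).text = value
    ET.SubElement(env, q("content")).text = text
    return ET.tostring(env, encoding="unicode")


def unwrap(xml_text):
    """Return (message text, sender context dict) from a wrapped envelope."""
    env = ET.fromstring(xml_text)
    semiote = env.find(q("semiote"))
    context = {child.tag.split("}", 1)[1]: child.text for child in semiote}
    return env.findtext(q("content")), context


envelope = wrap("See you at noon.", {"languageVariety": "en-GB", "channel": "sight"})
print(unwrap(envelope))
```

Because the annotations ride inside the same document as the content, any intermediary that forwards the envelope intact also forwards the sender's context.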

5.0. CONCLUSION

Elegantly summarizing what the Human Markup Language effort is about, group member [Claude] Len Bullard describes its applicability (in an email to the group March 4, 2003):

And Human Markup Language aims to help put the Humanness Content back into remote communication so that Sharing is maximized across Perspective Shifts.


APPENDICES

CRM Collection Descriptions Vocabulary Standardization (from CIDOC website). A goal of CRM which contrasts with HumanML's purpose is getting uniform descriptors for artifact collections all over the world so that computer searches can come up with comparative data.

CRM CONFORMANCE OF COLLECTIONS


_________________
1. Completed:
_________________

Dublin Core
Art Museum Image Consortium (AMICO) (with the exception of data encoding information)
Encoded Archival Description (EAD)
MDA SPECTRUM
Natural History Museum (London) John Clayton Herbarium Data Dictionary
National Museum of Denmark GENREG
International Federation of Library Associations and Institutions (IFLA) Functional Requirements for Bibliographic Records (FRBR)
OPENGIS
American Association of Museums Nazi-era Provenance Standard
MPEG7
Research Libraries Group (RLG) Cultural Materials Initiative DTD

_________________
2. Currently in progress:
_________________

Consortium for the Computer Interchange of Museum Information (CIMI) Z39.50 Profile
Council for the Prevention of Art Theft Object ID (core and recommended categories)
The International Committee for Documentation of the International Council of Museums (CIDOC) The International Core Data Standard for Archaeological and Architectural Heritage
Core Data Index to Historic Buildings and Monuments of the Architectural Heritage
CIDOC Normes Documentaires (Archeologie)/ Data Standards (Archaeology)
English Heritage MIDAS - A Manual and Data Standard for Monument Inventories
English Heritage SMR 97
Hellenic Ministry of Culture POLEMON Data Dictionary


_________________
3. Desirable:
_________________

FENSCORE
Sydney University TimeMapper
Data Service Standards in Archaeology
Digital Library Metadata
Spanish DAC
Ministere de la Culture et Communications (MCC) Systemes Descriptifs pour les Collections?
CHIN Data Dictionaries
International Council on Archives (ICA) ISAD(G) - International Standard Archival Description (General)
CIDOC Recommendations for Ethnography Collections
Visual Resources Association Core Categories
MARC - Machine Readable Cataloguing
CIMI SGML DTD
Getty CDWA - Categories for the Description of Works of Art
RSLP Collection Description
MODES OBJECT FORMAT