Issue 364: Create Profile Markup Language/Schema

Starting Date: 
2018-01-17
Status: 
Open
Background: 

At the 40th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 33rd FRBR - CIDOC CRM Harmonization meeting, Francesco Beretta (FB) informed the CRM SIG that the Data for History group considers it necessary to create a tool that allows users to create data profiles using CIDOC CRM classes and relations, plus local extensions. In order to share these profiles more generally, a profile markup language would be needed. HW is assigned to CEO, who will look into making such a markup/schema (possibly TEI-inspired) in cooperation with FB. GB will contact Wisski and ResearchSpace to tell them about this development.

Cologne, January 2018

 

 

Current Proposal: 

Posted by Christian Emil on 14/5/2017

Dear Francesco and George,
As I mentioned, my focus has been diverted this spring. Therefore, I haven’t had time to go into this. However, I have started. There are a few questions of principle we need to focus on. If we go back to Robert Sanderson’s evaluation document of the “useful” and not so “useful” classes of CRM (http://linked.art/model/profile/), he also promised to give an evaluation of the properties. Apparently, the energy was not sufficient for that. As we discussed in Cologne, what Robert really wants is a profile system in which he can pick out a subpart of CRM. The question is what the purpose is, and this will affect an XML formalism for defining a profile. For me, the most useful purpose is to simplify data entry and data export. How can one easily create a mapping from a given input form to CRM (both data entry and search), and how can one produce data reports (e.g. results of a query in tabular form)? To use a profile formalism to create a simplified graph database or ontology seems a little odd to me. It is also much more complicated, since one has to decide what the result should be if one takes away a class frequently used in paths, e.g. the birth event. One must also decide what the consequence is of excluding the domain or range of a superproperty, or of a property whose domain or range has subclasses.

If we restrict ourselves to data entry, query forms and export of data, then the exercise will be a mapping system from some (XML) description of a data entry/search form and a mapping to a result description.

Comments, suggestions, opinions?

Posted by George on 15/5/2018

Dear all,

Having dealt with this now in a number of contexts, I see a number of things going on which generally fall under the category ‘how do I make CRM a reality for use with some data in some format such that when I try to use it with someone else who has also adopted this format it does what the bill of sale said it would and gives me interoperability’.

There are three levels I have discerned on which people want to share info.

Semantic: There are general modelling patterns, which answer the question: how should you represent situation x or y so that it is semantically correct? This is more or less the level of the abstract diagrams we have on the CIDOC website (if you are actually able to find them, which takes a particularly skilled user). That being said, these diagrams talk about very general things and don’t necessarily speak to a particular audience. Hence people recreate them for particular groups.

Then there is an implementation-level description, which is to say: now that I know what I want to represent semantically, how do I actually build a particular information structure for this in a particular format? The hot-topic format is RDF amongst the groups I have seen. Because there is no authoritative word, everyone is reinventing the wheel and people are doing it differently. This should be stopped as quickly as possible, in my opinion, via a well-considered and generally accepted position on the base-level encoding strategy in RDF. Again, this doesn’t need a particular format, but needs a thorough and well-argued description.

Finally, sometimes either the low level patterns or a more complex collection of patterns which essentially indicates a best practice model of how to encode some thing, are encoded in some way as an application profile that picks out a particular set of classes and properties deployed in a certain way in order to facilitate making statements of kind x in system y. This is done for example by Arches, by Research Space and by Wisski, probably by Qoqnus in Iran and by numerous other consortia projects without any interchange or consolidation format.

The arguments of Robert are at a different level and have to do with a fundamentally different position on CRM. Essentially they would like to cut it down significantly so that it was easy for developers to use as it is apparently hard to code around (I’m not a developer so I take no position on whether this is true). I think we can ignore that within the context of the interchange format for application profiles (different encodings of the classes and relations that give a model paradigm that should be reusable by other systems adopting CRM that do not adopt the same encoding paradigm).

To me the task of creating such a markup language sounds incredibly difficult. However, I recently heard of two potential existing languages, which I have not investigated, called SHACL and ShEx. If we could avoid having to invent this, it would be great.

Anyhow, I agree with CEO that defining the use of this is the first step before going further. In my opinion, its use is to allow different teams that are encoding a best practice model in a particular system to make a standard representation that would in turn be reusable/re-encodable in other CRM-conforming formats.

Posted by Christian Emil on 15/5/2018

Hi

I attach a short presentation and a short paper by Ari Häyrinen. The work is from 2007-2010. The paper is in a conference report

(Proceedings of the 2011 conference on Information Modelling and Knowledge Bases XXII, pages 312-320, IOS Press, Amsterdam, The Netherlands, ©2011, ISBN: 978-1-60750-689-8)


It is not rocket science. However, it is interesting because it illustrates what many projects invent and want to have (10 years ago and now), for example WISSKI. In WISSKI this was a hot issue 9-10 years ago. In ARCHES this was a new feature 2 years ago. The task in all these projects is to make a (mostly XML) description of the structure behind a more intuitive input/output form and the underlying information/data structure, be it an RDF triplestore or a relational database. The point of such forms is to hide long chains of foreign keys in a relational database or a path of RDF triples in a triplestore. The XML description of a user-friendly input form and its relation to, say, CIDOC CRM as RDF can be expressed in X3ML, maybe slightly extended. The form itself can be realised by an XSLT transformation producing HTML.
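
As an illustration of the idea of hiding a path of RDF triples behind a flat form, here is a minimal Python sketch. It is not X3ML syntax and not taken from WISSKI or ARCHES; `FORM_PROFILE`, `expand`, the `ex:` identifiers and the `^` notation for traversing a property against its direction are all assumptions made for the example. Only the class and property names (E21, E67, E53, P98, P96, P7) come from CRM.

```python
# Hypothetical form profile: each form field maps to a path through CRM.
# Each step is (property, class of the node reached); a leading "^" means
# the property is traversed from its range to its domain.
FORM_PROFILE = {
    "root_class": "E21_Person",
    "fields": {
        "mother": [
            ("^P98_brought_into_life", "E67_Birth"),
            ("P96_by_mother", "E21_Person"),
        ],
        "birth_place": [
            ("^P98_brought_into_life", "E67_Birth"),
            ("P7_took_place_at", "E53_Place"),
        ],
    },
}


def expand(root, values, profile=FORM_PROFILE):
    """Turn flat form values into a set of (subject, property, object) triples."""
    triples = {(root, "rdf:type", profile["root_class"])}
    for field, value in values.items():
        subject = root
        path = profile["fields"][field]
        for step, (prop, cls) in enumerate(path):
            is_last = step == len(path) - 1
            name = prop.lstrip("^")
            # Intermediate nodes get a deterministic identifier, so fields
            # that share a path prefix (here: the birth event) share the node.
            obj = value if is_last else f"{subject}/{name}/{cls}"
            if prop.startswith("^"):
                triples.add((obj, name, subject))
            else:
                triples.add((subject, name, obj))
            triples.add((obj, "rdf:type", cls))
            subject = obj
    return triples


form_input = {"mother": "ex:maria", "birth_place": "ex:oslo"}
for triple in sorted(expand("ex:anna", form_input)):
    print(triple)
```

The user fills in two fields, and the sketch produces seven triples, including the intermediate E67 Birth node that the form never shows. Naming intermediate nodes after the path is one possible design choice; a real system would need its own identifier policy.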


It is also possible to reverse the process, that is, to go from a triplestore or an RDBMS to some other structure. Then one needs to define what a data object is, say a person, and which attributes it may have, and then traverse the structure from the primary identifier of this object. It is somewhat similar to the idea behind http://www.cidoc-crm.org/sites/default/files/RDF%20visualiser.pdf
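
The reverse direction can be sketched in the same spirit. The Python below is an illustration under assumptions, not a description of the RDF visualiser: `PERSON_OBJECT`, `follow`, `to_record`, the `ex:` identifiers and the `^` notation for an inverse step are hypothetical names introduced here.

```python
# A tiny triple set standing in for a triplestore (illustrative data only).
TRIPLES = {
    ("ex:birth1", "P98_brought_into_life", "ex:anna"),
    ("ex:birth1", "P96_by_mother", "ex:maria"),
    ("ex:birth1", "P7_took_place_at", "ex:oslo"),
}

# Hypothetical data object definition: attribute name -> property path,
# starting at the primary identifier. "^" means: follow the property backwards.
PERSON_OBJECT = {
    "mother": ["^P98_brought_into_life", "P96_by_mother"],
    "birth_place": ["^P98_brought_into_life", "P7_took_place_at"],
}


def follow(triples, start, path):
    """Walk a property path from one node and return the nodes reached."""
    nodes = {start}
    for prop in path:
        if prop.startswith("^"):
            nodes = {s for (s, p, o) in triples if p == prop[1:] and o in nodes}
        else:
            nodes = {o for (s, p, o) in triples if p == prop and s in nodes}
    return sorted(nodes)


def to_record(triples, identifier, definition):
    """Flatten the graph around one identifier into a simple record."""
    return {attr: follow(triples, identifier, path)
            for attr, path in definition.items()}


print(to_record(TRIPLES, "ex:anna", PERSON_OBJECT))
```

Each attribute holds a list because a path may reach zero, one or several nodes.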


I will read more about SHACL and ShEx now.

Posted by Christian Emil on 15/5/2018

I have read the primer to ShEx and speed-read the specification of SHACL (66 pages). They seem to be two alternatives for solving the same problem. In the slides (link below) they are compared.

The main purpose of the two is to define a formalism for data/type checking of RDF-based data. RDF as such does not contain any such mechanism; it is like post-modern archaeology (anything goes) or, for geeks, like typed versus untyped lambda calculus. Remember that OWL etc. only use RDF(S) as a data storage format. The semantics and the checks are in the interpreter and reasoner.

So far I find ShEx clearer and better defined. Both ShEx and SHACL can be used to create checks on input data and checks that a graph is valid with respect to an ontology. For example, it is possible to create shapes in ShEx and SHACL which one can use to check a graph and find out whether it is CRM compatible. In principle one could write a system that produces shapes in ShEx and SHACL for a profile describing a subset of CRM. It can be useful to check that data from an input form conforms to CRM or a subset. I don’t think it is the right way to solve our task. X3ML may do that more efficiently.
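
To make the notion of a shape check concrete, here is a toy validator in plain Python. It is neither ShEx nor SHACL, and it does not use their syntax or semantics; `SHAPE`, `validate` and the `min`/`max` values are assumptions chosen for the example and are not the quantifiers given in the CRM definition.

```python
# Hypothetical shape: every node typed as the target class must satisfy
# these per-property constraints (expected class of the value, cardinality).
SHAPE = {
    "target_class": "E67_Birth",
    "constraints": {
        "P98_brought_into_life": {"class": "E21_Person", "min": 1},
        "P96_by_mother": {"class": "E21_Person", "min": 0, "max": 1},
    },
}


def types_of(triples, node):
    """Return the classes a node is explicitly typed with."""
    return {o for (s, p, o) in triples if s == node and p == "rdf:type"}


def validate(triples, shape):
    """Return a list of (node, property, message) violations."""
    violations = []
    targets = sorted(s for (s, p, o) in triples
                     if p == "rdf:type" and o == shape["target_class"])
    for node in targets:
        for prop, rule in shape["constraints"].items():
            values = sorted(o for (s, p, o) in triples
                            if s == node and p == prop)
            if len(values) < rule.get("min", 0):
                violations.append((node, prop, "too few values"))
            if "max" in rule and len(values) > rule["max"]:
                violations.append((node, prop, "too many values"))
            for value in values:
                # Only explicit rdf:type statements count; there is no
                # subclass inference in this sketch.
                if rule["class"] not in types_of(triples, value):
                    violations.append(
                        (node, prop, f"{value} is not typed as {rule['class']}"))
    return violations


graph = {
    ("ex:birth1", "rdf:type", "E67_Birth"),
    ("ex:birth1", "P98_brought_into_life", "ex:anna"),
    ("ex:anna", "rdf:type", "E21_Person"),
}
print(validate(graph, SHAPE))
```

Because the check looks only at explicit type statements, a value typed with a subclass of the expected class would be reported as a violation.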


 

Posted by George on 19/5/2018

Hi Christian-Emil,

I also had a chance to look through ShEx, and somewhat through SHACL, as well as the paper you suggested. Thanks for the comparison document as well; that was a useful find.

Open to debate, but I think what we need is a format which can be used as a basis for generating forms in different formats. So I think we agree in the sense that

> The task in all these projects is to make a (mostly XML) description of the structure behind a more intuitive input/output form and the underlying information/data structure, be it an RDF triplestore or a relational database. The point of such forms is to hide long chains of foreign keys in a relational database or a path of RDF triples in a triplestore.

One thing we need to find is the common format for the structure description/prescription. I don’t have a bias towards ShEx or SHACL, but a feature I do like about them is that they are also useful for validating. So if you have some form that generates RDF which you think follows a certain CRM pattern, you can find out whether the end data product actually conforms. Of course, I’m a supporter of X3ML, so if it could also serve as this common format for describing/prescribing how to create a certain pattern, that would be great. I mostly think of it as a translation tool, though, so I haven’t quite visualized yet how it would play this role.

Once we had a common description format, whatever it is, we would have various projects (Arches, Wisski, ResearchSpace and the list goes on) that use their own internal language to create patterns. I guess we would need to create buy-in for creating a transformation (where 3M/X3ML could potentially play a role?) into whatever structure description format the SIG decided to adopt. In the case of tools such as the one Francesco is proposing, which would allow users to choose a subset of useful properties from a batch of CRM and other ontology extensions and then arrange them to create standard modelling constructs, such an output format could be native to the system.

Whatever structure description format we decided to adopt, I think it would be of significant use to SIG and the CRM community by allowing SIG to offer already basic ways of modelling things that need not be rethought because they are obvious and necessary, as well as create a communication standard whereby community members could share larger more subjective/discipline oriented models and at least have a common comparison basis.

I have an interesting live use case that I know is going on right now. I know that in AAC, in Arches, in another art consortium in Europe and in the MASA archaeology consortium, everyone is discussing the proper modelling of E52 Time-Span. This is a very closed problem, in my opinion, with very few right solutions. In fact there should simply be some set of suggestions which one can choose from based on one’s scenario (thinking of creating data from scratch, not mapping existing data). Unfortunately, no such approved recommendation exists at a central level, so everyone is reinventing the wheel and reinventing it slightly differently. Perhaps this example provides the elements of the scenario we need to be able to cover.

Anyhow, these are thoughts hopefully to push the discussion forward. I am very interested to understand better how you would envision putting X3ML to service to create structure description/prescriptions.

 

Posted by George on 19/5/2018

Just to give some actual references for the time span modelling example

 
  • Arches (me in this case)
  • Art History Data
  • Linked Art
  • MASA
 

 
I’m sure Wisski and ResearchSpace could also spawn more variants. Now, questions also come up of wrong semantics, but after that we still have the problem of interoperability at the data level because of encoding choices, even though we all adopt the same strategy and ontology.

 

Posted by Christian Emil on 20/5/2018

A comment on mappings:


Assume one wants to make a profile for an ontology defined in the way CRM (or its extensions) is. That is, the ontology is defined as an isa-tree with properties connecting pairs of classes. The properties are inherited down the isa-tree (from superclasses to subclasses). We may leave out the subproperty hierarchy; just notice that a subproperty is equal to its superproperty but has a domain and range which are subsets of the domain and range of the superproperty.

In a profile definition the user selects which classes should be used (or, conversely, deselects the classes that are not of interest). Either way, some classes of the original ontology will not be part of the profile.

1)    The deselected class is a leaf (without subclasses) and without properties. This case is trivial.

2)    The deselected class is a leaf (without subclasses) with only one property. The deselection of the class results in the deselection of this property.

3)    The deselected class is a leaf (without subclasses) with two or more properties. In this case the deselection of the class should imply a deselection of all the properties. However, this may destroy a connection the user wants to keep. An example is the path:  E21 Person  - P98 brought into life (was born)  -  E67 Birth - P96 by mother (gave birth) - E21 Person.  Without E67 Birth there is no connection between a person and her/his mother. To reestablish such a connection a new property has to be introduced, in fact a new short cut.

4)    The deselected class has one or more subclasses. Does this imply that all the subclasses also are deselected?

a.     Yes: This is somewhat brutal, but tidy. This may cause the same complications as described in 3 above.

b.    No: Then the subclasses become subclasses of the superclass(es) of the deselected class. Properties defined with the deselected class as domain or range will be defined for the subclasses. This should be unproblematic although one needs new names.

5)    Deselection of properties. This may cause the same situation as in 3. As a consequence the user has to define new shortcut properties.
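
Cases 3 and 4 above can be tried out on a small fragment. The following Python is only a sketch under assumptions: the class fragment is hand-written and simplified (E21 Person is also a subclass of E20 Biological Object, which is left out), and `deselect`, `descendants` and the `__` naming scheme for re-declared properties are hypothetical.

```python
# Hand-written fragment of the CRM isa-hierarchy: class -> direct superclasses.
CLASSES = {
    "E1_CRM_Entity": [],
    "E2_Temporal_Entity": ["E1_CRM_Entity"],
    "E4_Period": ["E2_Temporal_Entity"],
    "E5_Event": ["E4_Period"],
    "E63_Beginning_of_Existence": ["E5_Event"],
    "E67_Birth": ["E63_Beginning_of_Existence"],
    "E77_Persistent_Item": ["E1_CRM_Entity"],
    "E39_Actor": ["E77_Persistent_Item"],
    "E21_Person": ["E39_Actor"],
    "E53_Place": ["E1_CRM_Entity"],
}

# Property -> (domain, range).
PROPERTIES = {
    "P7_took_place_at": ("E4_Period", "E53_Place"),
    "P96_by_mother": ("E67_Birth", "E21_Person"),
    "P98_brought_into_life": ("E67_Birth", "E21_Person"),
}


def descendants(classes, cls):
    """All direct and indirect subclasses of cls."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for name, parents in classes.items():
            if current in parents and name not in found:
                found.add(name)
                stack.append(name)
    return found


def deselect(classes, properties, cls, keep_subclasses):
    """Return (classes, properties) of the profile after deselecting cls."""
    if not keep_subclasses:
        # Option 4a: drop the class, its subclasses and every property
        # that has a dropped class as domain or range.
        removed = {cls} | descendants(classes, cls)
        kept = {c: list(p) for c, p in classes.items() if c not in removed}
        props = {n: dr for n, dr in properties.items()
                 if dr[0] not in removed and dr[1] not in removed}
        return kept, props
    # Option 4b: subclasses move up to the superclasses of cls, and the
    # properties of cls are re-declared on its direct subclasses.
    children = [c for c, p in classes.items() if cls in p]
    kept = {}
    for c, parents in classes.items():
        if c == cls:
            continue
        new_parents = []
        for parent in parents:
            for q in (classes[cls] if parent == cls else [parent]):
                if q not in new_parents:
                    new_parents.append(q)
        kept[c] = new_parents
    props = {}
    for name, (dom, rng) in properties.items():
        for d in (children if dom == cls else [dom]):
            for r in (children if rng == cls else [rng]):
                new_name = name if (d, r) == (dom, rng) else f"{name}__{d}__{r}"
                props[new_name] = (d, r)
    return kept, props


# Case 3: without E67 Birth, no property connects a person to the mother.
_, remaining = deselect(CLASSES, PROPERTIES, "E67_Birth", keep_subclasses=False)
print(sorted(remaining))
```

In this fragment, deselecting E67 Birth leaves only P7 took place at, so the link from a person to the mother is gone unless a shortcut property is added by hand.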

 

Deselecting (deleting) classes and properties in a profile should be done with care, as it may result in a (slightly) different ontology than the original. If so, mappings to and from the new and the original ontology have to be defined, hence X3ML. The new ontology will require a different shape in ShEx or SHACL. Remember also that the current version of at least ShEx cannot follow reasoning.


 

At the 41st joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 34th FRBR - CIDOC CRM Harmonization meeting, the SIG reviewed the comments on the announced homework concerning tools that allow users to create data profiles using classes and relations of CIDOC CRM and local extensions, and discussed refining the meaning of application profile. By this is meant picking out the classes and properties that one deems useful for a project/domain from the ontology and its extensions. This is useful for creating an easy way to handle the ontology for specific tasks. It is separate from template management functionality. The SIG came to the following conclusions:

It is decided to change the name of application profile to ontology profile. By ontology profile we mean a mechanism to denote CRM constructs collected from CRM and extensions that are useful for data entry and mapping in a certain domain.  This mechanism should have the following capabilities:

  • Allow the extraction of latest definitions of the respective classes and properties.
  • Automatically produce a list of super classes and properties needed for querying (in this profile).
  • Check validity with respect to updates on referred RDFS sources.
  • There may be a suggester to exclude properties from inheritance. If you have types in biological discourse, you may not be interested in many of their inherited properties.
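
The second capability (listing the superclasses and properties needed for querying) and the optional exclusion of inherited properties could look roughly like the Python sketch below. The hierarchy fragment is hand-written and incomplete, and `query_vocabulary`, `superclass_closure` and the `excluded` parameter are assumptions made for illustration, not an agreed design.

```python
# Hand-written fragment of the CRM isa-hierarchy: class -> direct superclasses.
SUPERCLASSES = {
    "E1_CRM_Entity": [],
    "E2_Temporal_Entity": ["E1_CRM_Entity"],
    "E4_Period": ["E2_Temporal_Entity"],
    "E5_Event": ["E4_Period"],
    "E63_Beginning_of_Existence": ["E5_Event"],
    "E67_Birth": ["E63_Beginning_of_Existence"],
}

# Property -> domain class (ranges are left out to keep the sketch short).
PROPERTY_DOMAINS = {
    "P1_is_identified_by": "E1_CRM_Entity",
    "P4_has_time-span": "E2_Temporal_Entity",
    "P7_took_place_at": "E4_Period",
    "P96_by_mother": "E67_Birth",
    "P98_brought_into_life": "E67_Birth",
}


def superclass_closure(selected):
    """All direct and indirect superclasses of the selected classes."""
    closure, stack = set(), list(selected)
    while stack:
        cls = stack.pop()
        for parent in SUPERCLASSES[cls]:
            if parent not in closure:
                closure.add(parent)
                stack.append(parent)
    return closure


def query_vocabulary(selected, excluded=()):
    """Return (extra superclasses, usable properties) for a profile.

    `excluded` lists inherited properties the profile author chose to hide.
    """
    supers = superclass_closure(selected)
    usable = set(selected) | supers
    props = sorted(p for p, domain in PROPERTY_DOMAINS.items()
                   if domain in usable and p not in excluded)
    return sorted(supers - set(selected)), props


supers, props = query_vocabulary(["E67_Birth"])
print(supers)
print(props)
```

A profile that selects only E67 Birth still needs its five superclasses in this fragment to query inherited properties such as P4 has time-span; passing `excluded` hides individual inherited properties without touching the hierarchy.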

GB is assigned to try to connect the above work with the idea of a template manager that will continue this work down to the data level.

Lyon, May 2018