Issue 460: URI Management

ID: 460
Starting Date: 2020-01-16
Working Group: 2
Status: Proposed
Background:

Posted by Francesco Beretta on 16/1/2020

Dear all,

I have a question about CIDOC CRM URI management.

The last published version of CRMbase is 6.2.1. If I take the RDF serialization, I find this base URI:

http://www.cidoc-crm.org/cidoc-crm/

If I dereference this URI:

http://www.cidoc-crm.org/cidoc-crm/E92_Spacetime_Volume

I get an error message.

If I dereference this URI:

http://www.cidoc-crm.org/cidoc-crm/E5_Event

I am redirected to version 5.0.4.

A machine cannot know which version of the CRM is meant.

Then there is the Erlangen URI:

http://erlangen-crm.org/current/E92_Spacetime_Volume

which dereferences to a document containing the whole version.

There are additional, earlier specific versions.

I have an issue in OntoME: which URI should be used?

We have a provisional, not dereferenced URI:

https://dataforhistory.org/external-ontology/cidoc-crm-base-6-2/E92_Spac...

It is there to avoid confusion but it's bad practice.

I'm asking myself what to do, and people adopting the CRM are asking me these kinds of questions, not being happy with this situation.

I think there was already a discussion about this point in the SIG.

Shouldn't we find, and implement, a solution that meets current requirements?

The same issue of course arises for the extensions family.

Posted by George on 16/1/2020

Dear all,

I agree that this is an ongoing issue that creates barriers to uptake because of confusion. It is an oft repeated question and deserves a clear answer. We need a solution based on community wide best practice. Suggestions?

Posted by martin on 16/1/2020

Dear Francesco,

At FORTH we will implement anything that is regarded as good practice and does not create a manual overhead we cannot manage. Any volunteers to design whatever is needed?

Posted by Robert Sanderson on 16/1/2020

Dear all,

I have a python script that already does this for CRM and the Linked Art extension.

The results of that script for Linked Art can be seen here:

https://linked.art/ns/terms/   -- the entire ontology is returned when dereferencing the namespace
https://linked.art/ns/terms/paid_amount.xml  -- an individual term is returned when dereferencing its URI

The script simply goes through the ontology files and cuts out each property and class in turn. Then a very simple redirect handler adds the mapping to the .xml files.
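The splitting step described above can be sketched minimally as follows, assuming the ontology is an RDF/XML file whose classes and properties are top-level `rdfs:Class` and `rdf:Property` elements identified by `rdf:about`. This is not Robert Sanderson's script: the sample data, the function names `split_terms`/`redirect_table` and the `/ns/crm/` prefix are illustrative assumptions, and real ontology files (e.g. ones using `owl:Class`) would need more cases.

```python
import xml.etree.ElementTree as ET
from typing import Dict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)

# Illustrative two-term excerpt standing in for a full ontology file.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:about="http://www.cidoc-crm.org/cidoc-crm/E5_Event">
    <rdfs:label xml:lang="en">Event</rdfs:label>
  </rdfs:Class>
  <rdf:Property rdf:about="http://www.cidoc-crm.org/cidoc-crm/P9_consists_of">
    <rdfs:label xml:lang="en">consists of</rdfs:label>
  </rdf:Property>
</rdf:RDF>"""

TERM_TAGS = {f"{{{RDFS}}}Class", f"{{{RDF}}}Property"}

def split_terms(rdf_xml: str) -> Dict[str, str]:
    """Cut each top-level class/property out into its own RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    docs = {}
    for elem in root:
        uri = elem.get(f"{{{RDF}}}about")
        if elem.tag not in TERM_TAGS or uri is None:
            continue
        local_name = uri.rsplit("/", 1)[-1]       # e.g. "E5_Event"
        wrapper = ET.Element(f"{{{RDF}}}RDF")     # fresh rdf:RDF root per term
        wrapper.append(elem)
        docs[local_name] = ET.tostring(wrapper, encoding="unicode")
    return docs

def redirect_table(docs: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Map each term URI path to the .xml file holding its description."""
    return {prefix + name: prefix + name + ".xml" for name in docs}
```

Writing each `docs[name]` to `name + ".xml"` and handing `redirect_table(docs, "/ns/crm/")` to whatever redirect mechanism the web server offers would give roughly the per-term layout described above.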

You can see the results for CRM in a temporary branch:

https://prov-updates--linked-art.netlify.com/ns/crm/P9_consists_of  -- P9 (but the rest of the data is there too)

Posted by Richard Light on 16/1/2020

On 16/01/2020 12:09, George Bruseker wrote:
> Dear all,
>
> I agree that this is an ongoing issue that creates barriers to uptake because of confusion. It is an oft repeated question and deserves a clear answer. We need a solution based on community wide best practice. Suggestions?

George,

It sounds as though there are a number of issues here:

    what is returned (HTML or RDF)
    which version of the CRM is returned (how to know which you have; how to specify which one you want)
    how much of the CRM is returned (one concept or the whole thing). If it's the whole thing, where are you placed within it?
    whether to return a set of RDFS statements or an OWL ontology

As things stand, the Erlangen implementation returns an OWL ontology. This includes version information. By default (in my Firefox browser) Erlangen returns an RDF/XML response.  You are placed at the start of this document.  So you get the same response, no matter which CRM class or property you specified in the URL.

Our implementation returns a set of RDFS class and property statements. By default it redirects to an HTML response, and uses the '#' notation (supported in native HTML) to place you at the correct place within that web page. The version number is given, but only as a human-readable heading.  If you specify RDF/XML in your HTTP request (e.g. using curl), it redirects to RDF/XML and gives you the whole thing, again starting from the beginning.
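The redirect behaviour described above can be sketched as a single decision function. The two document paths are hypothetical placeholders, not the real locations on cidoc-crm.org, and the Accept check is a plain substring test that ignores q-values.

```python
HTML_DOC = "/docs/cidoc_crm.html"  # hypothetical location of the HTML rendering
RDF_DOC = "/docs/cidoc_crm.rdf"    # hypothetical location of the RDF/XML file

def redirect_target(term: str, accept: str) -> str:
    """Choose where to redirect a request for a term URI such as .../E5_Event."""
    if "application/rdf+xml" in accept:
        # RDF clients get the whole file with no fragment, so they start at the top.
        return RDF_DOC
    # Everyone else gets the HTML page; '#term' jumps to that term's anchor.
    return f"{HTML_DOC}#{term}"
```

For example, `redirect_target("E5_Event", "text/html")` gives `/docs/cidoc_crm.html#E5_Event`, while a curl request asking for RDF/XML gets the unfragmented RDF file.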

I think our approach (human-readable response by default) is the better one, especially as you end up reading about the concept you expressed an interest in. It would be nice if the RDF response could also take you to the correct declaration; this would require adding IDs to each declaration, plus adding a '#' to the redirected URL. (Does this work for XML documents?) The RDF response could also be improved by adding a machine-processable header, including version info and possibly links to the various versions that are available.

Which brings us to the question of supporting different versions. Erlangen have the concept of 'current', which at 6.2.1 is rather more current than we manage. However, I don't see any way of getting at earlier versions.  We could support a URL pattern:

http://www.cidoc-crm.org/cidoc-crm/[version]/E5_Event

which would allow users to explicitly state which CRM version they are conforming to.

If we can agree on a spec for improved RDF delivery, I would be happy to help implement it.

Posted by Detlev Balzer on 16/1/2020

> On 16 January 2020 at 13:27, Martin Doerr <martin@ics.forth.gr> wrote:
>
> (...)
> At FORTH we will implement anything that is regarded good practice, and 
> does not create a manual overhead we cannot manage. 

For formal specifications such as ontologies, there is a widely adopted pattern for change management which goes like this:

http://www.cidoc-crm.org/cidoc-crm/ always resolves to the latest version, while

http://www.cidoc-crm.org/cidoc-crm/{version}/ always resolves to the particular {version} given in the URI.

There can be any number of versions, and the latest one is both referenced through the un-versioned namespace and through the one with the most recent version number (or publication date, if that is used for versioning).

Alternatively, the most recent version could be labelled explicitly as the current one, e.g. http://www.cidoc-crm.org/cidoc-crm/current/

Application developers must then decide what kind of stability they prefer: stability of the namespace URI, or stability of the content retrieved from a URI. Evidently, one cannot have both.

Maintenance effort for this pattern is minimal: Just publish each new version under its versioned namespace and then, any time another version comes out, adjust the non-versioned namespace so that it will resolve to the most recent version. Most modern Web frameworks have a URL routing facility which makes this fairly easy.
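As an illustration of how small the routing rule can be, here is a framework-free sketch that covers the un-versioned namespace, the `{version}` form (which also matches Richard's `/[version]/E5_Event` pattern) and the optional `current` label. The list of published versions is a hypothetical registry, and a real site would plug this logic into its own router.

```python
import re
from typing import Optional, Tuple

# Hypothetical registry of published versions, oldest first.
PUBLISHED = ["5.0.4", "6.2.1"]
LATEST = PUBLISHED[-1]

ROUTE = re.compile(
    r"^/cidoc-crm/(?:(?P<version>\d+(?:\.\d+)*|current)/)?(?P<term>[^/]*)$"
)

def resolve(path: str) -> Optional[Tuple[str, str]]:
    """Map a request path to (version, term); term == '' means the whole ontology."""
    m = ROUTE.match(path)
    if m is None:
        return None
    version = m.group("version")
    if version is None or version == "current":
        version = LATEST      # un-versioned and 'current' both track the latest release
    elif version not in PUBLISHED:
        return None           # unknown version: the server would answer 404
    return version, m.group("term")
```

Publishing a new release then only means appending it to `PUBLISHED`; every versioned URI keeps resolving to the same content, while the un-versioned one moves forward.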

I should not forget to say that LOD best practice also demands that URIs support content negotiation, as assumed throughout the recommendations at http://linkeddatabook.com/
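Content negotiation amounts to picking, from the formats the server has, the one the client's `Accept` header rates highest. The sketch below is a simplification under stated assumptions: the `AVAILABLE` list is hypothetical, it ignores the HTTP rule that more specific media ranges take precedence, and it falls back to HTML instead of answering 406 when nothing matches.

```python
# Representations the server can produce (hypothetical set).
AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]

def negotiate(accept: str, default: str = "text/html") -> str:
    """Pick the available media type with the highest q-value in the Accept header."""
    if not accept.strip():
        return default
    best, best_q = None, 0.0
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media == "*/*":
            candidates = [default]
        elif media.endswith("/*"):
            candidates = [m for m in AVAILABLE if m.startswith(media[:-1])]
        else:
            candidates = [media] if media in AVAILABLE else []
        for c in candidates:
            if q > best_q:             # first listed wins on ties
                best, best_q = c, q
    return best if best is not None else default
```

A browser's typical header yields HTML, while `Accept: text/turtle;q=0.9, application/rdf+xml;q=0.5` yields Turtle, so one URI can serve both audiences.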

Posted by Velios on 16/1/2020

I agree with Detlev's proposal. Also, I believe that versions should not be included in the class URIs. These are not normally used to retrieve reasoning rules but only to identify classes, right? Resolving the class URI should return all versions of the class.
 

Posted by George on 17/1/2020

Dear all,

It seems a very fruitful discussion. May I add some further 'complications'?

Starting from what Detlev proposes:

    > For formal specifications such as ontologies, there is a widely adopted pattern for change management which goes like this:
    >
    > http://www.cidoc-crm.org/cidoc-crm/ always resolves to the latest version, while
    >
    > http://www.cidoc-crm.org/cidoc-crm/{version}/ always resolves to the particular {version} given in the URI.

This seems sensible. Here is a twist. 

If we click the first link, it brings us to CIDOC CRM 5.0.4, which is the last official ISO version. In the meantime, we have a last official community version, which is 6.2.1. Which one should this point to? Second, it points to the text version of the ontology in an HTML representation.

For the appearance/presentation of the whole ontology, it is an html representation of the main document that we create. This seems fine. Would it be useful to be able to provide links explicitly at the top of this document to click over to encodings? This way somehow we can better consolidate and direct people to the RDF and the Erlangen OWL?

To me doing it this way, the Erlangen way, makes sense. So current always points to what current is (once we define what current is). It would also be good to be able to use the versioned edition (not currently supported but presumably easy).

Up to here we talk about pointing to the whole ontology representation.

Then there comes the question of resolving to an individual concept: http://www.cidoc-crm.org/cidoc-crm/E5_Event

As Richard points out, if you click it, it uses # and puts you to the right anchor point in the overall html document. Is this the best practice?

I will point out that on the CRM site, there is also an entire architecture wherein each version has its own overall presentation: e.g.: http://www.cidoc-crm.org/Version/version-6.2.1

and then you can click on an individual concept, eg: http://www.cidoc-crm.org/Entity/E5-Event/Version-6.2.1

The above follows a different URI pattern from the one suggested above, but does the same work. It runs on a database that also calculates incoming and outgoing properties, making the representation fuller than what one gets from the flat HTML version of our Word document. Functionally, it can be argued to be more useful. Would it be possible to use this as the dereferencing point and stay within best practices? If the URI pattern were changed, could we then provide an easy way to click over to the particular representation of the element in OWL, RDFS or other representations that exist for that version?

Finally to Thanasis' point.

"Resolving the class URI should return all versions of the class."

Currently we certainly don't do that. It definitely would not / could not happen based on our doc/html presentation of the ontology. With the database version I pointed to above, I suppose it would be relatively straightforward to have the older versions of a class you are looking at listed below as links. I guess it would be a specialist user who would care about this (not to put the idea down, just to say).

I hope these questions are a useful contribution to the conversation. 

Posted by Thanasis on 17/1/2020

> For the appearance/presentation of the whole ontology, it is an html representation of the main document that we create. This seems fine. Would it be useful to be able to provide links explicitly at the top of this document to click over to encodings? This way somehow we can better consolidate and direct people to the RDF and the Erlangen OWL?

Links would certainly be useful but the web server's content negotiation mechanism should be enough to deliver the right format to the client, is this what you mean?

> I will point out that on the CRM site, there is also an entire architecture wherein each version has its own overall presentation: e.g.: http://www.cidoc-crm.org/Version/version-6.2.1

I think this should be maintained but not used as URIs for classes.

> Finally to Thanasis' point.
>
> "Resolving the class URI should return all versions of the class."
>
> Currently we certainly don't do that. It definitely would not / could not happen based on our doc/html presentation of the ontology. With the database version I pointed to above, I suppose it would be relatively straightforward to have the older versions of a class you are looking at listed below as links. I guess it would be a specialist user who would care about this (not to put the idea down, just to say).

Yes I thought it should be relatively easy to do through a Drupal View. The point is that if there are no versions on the class URI, the user should be able to read about any version of the class given that they may be coming from a database using an earlier version than the current one. 
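Independently of Drupal, the listing Thanasis describes can be sketched as a lookup over an index of which classes each release defines. The index below is illustrative only (E92 is left out of 5.0.4 to mirror Francesco's observation, not checked against the releases), and the `Entity/.../Version-...` link pattern with `_` written as `-` is inferred from the example URL George gave.

```python
from typing import List, Tuple

# Illustrative index of class identifiers per published version.
VERSIONS = {
    "5.0.4": {"E4_Period", "E5_Event"},
    "6.2.1": {"E4_Period", "E5_Event", "E92_Spacetime_Volume"},
}

def version_key(v: str) -> Tuple[int, ...]:
    """Sort '5.0.4' before '6.2.1' numerically, not as strings."""
    return tuple(int(x) for x in v.split("."))

def entity_page(term: str, version: str) -> str:
    # Assumed per-version page pattern, modelled on the Entity/E5-Event/Version-6.2.1 example.
    return f"http://www.cidoc-crm.org/Entity/{term.replace('_', '-')}/Version-{version}"

def versions_of(term: str) -> List[str]:
    """Links to every version defining `term`, oldest first, for a versionless class page."""
    return [entity_page(term, v)
            for v in sorted(VERSIONS, key=version_key)
            if term in VERSIONS[v]]
```

The class URI itself stays versionless; only the links it lists carry versions.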

Posted by Robert Casties on 17/1/2020

Hi George,

On 17.01.20 10:47, George Bruseker wrote:
> I will point out that on the CRM site, there is also an entire architecture
> wherein each version has its own overall presentation: e.g.:
> http://www.cidoc-crm.org/Version/version-6.2.1

Wow, that is a really useful format, I didn't know it existed.

Especially having a concise list of all Classes and Properties and then
having all inherited Properties also listed with each class! That is
really useful when working on an implementation.

Sadly, this format seems to exist only up to 6.2.2.

This is not exactly to the point of what is "right" to resolve the
default URIs to but as a documentation this is much more useful to me
than the reference PDF which is the only thing linked on
http://www.cidoc-crm.org/versions-of-the-cidoc-crm. Would it be possible
to have this format also for at least the latest version?

 

Posted by George on 17/1/2020

    Links would certainly be useful but the web server's content negotiation
    mechanism should be enough to deliver the right format to the client, is
    this what you mean?

My underlying assumption is that the default representation served would be HTML, and that you could consistently reach the other representations by adding an appropriate suffix or whatever mechanism is most suitable. People looking at the HTML should also have an obvious, 'shiny red button' type of cue that there is another way to retrieve the information, for example as OWL.
 

    > I will point out that on the CRM site, there is also an entire
    > architecture wherein each version has its own overall presentation:
    > e.g.: http://www.cidoc-crm.org/Version/version-6.2.1

    I think this should be maintained but not used as URIs for classes.

Why would you argue against using it as the resolving point for individual classes?  
 

    > Finally to Thanasis' point.
    >
    > "Resolving the class URI should return all versions of the class."
    >
    > Currently we certainly don't do that. It definitely would not / could
    > not happen based on our doc/html presentation of the ontology. With the
    > database version I pointed to above, I suppose it would be relatively
    > straightforward to have the older versions of a class you are looking at
    > listed below as links. I guess it would be a specialist user who would
    > care about this (not to put the idea down, just to say).

    Yes I thought it should be relatively easy to do through a Drupal View.
    The point is that if there are no versions on the class URI, the user
    should be able to read about any version of the class given that they
    may be coming from a database using an earlier version than the current one.

Currently this is not supported at all, correct? I mean you always point at a version. So you would suggest that 'current' should be 'versionless'? 

How I understood Erlangen to work is that it just makes the versionless URI redirect to the current. So I thought the idea would be that 'current' resolves to the present official (whatever the present official means). If a class has been deprecated then I guess it would have to revert to the last official in which it had existed?
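George's fallback rule for 'current' can be written down in a few lines: resolve to the latest official release that defines the class, and for a deprecated class fall back to the last official release in which it existed. The release history is hypothetical (including the flags and the placeholder `EX_Retired_Class`), and "official" is exactly the notion the thread says still needs defining.

```python
from typing import Optional

# Hypothetical release history: (version, is_official, classes defined), oldest first.
RELEASES = [
    ("5.0.4", True, {"E5_Event", "EX_Retired_Class"}),
    ("6.2.1", True, {"E5_Event", "E92_Spacetime_Volume"}),
    ("6.2.2", False, {"E5_Event", "E92_Spacetime_Volume"}),
]

def resolve_current(term: str) -> Optional[str]:
    """Version that 'current' should show for `term`, skipping non-official releases."""
    for version, official, classes in reversed(RELEASES):
        if official and term in classes:
            return version   # newest official release still defining the class
    return None              # never defined in an official release
```

With this history, `E5_Event` resolves to 6.2.1, while the retired placeholder class falls back to 5.0.4.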
 

Posted by George on 17/1/2020

Hi Robert,

Yes it is really quite nice actually. A hidden gem as it were.

About why it doesn't exist past 6.2.2, it's a bit odd. I would have said it is because it is only made for official release versions (like 6.2.1) but I see that it has been made for other non official versions. Perhaps it points to the need for a simple flowchart for understanding the steps that are taken in order to produce each version since there are many products (the word doc, the pdf, the html version in simple format, the database version drupal, the rdf, the owl etc.). 

We are aiming anyhow to make a new official release with the next SIG (fingers crossed) which I guess would probably entail updating the drupal resource to the latest state.

Posted by Thanasis on 17/1/2020

> My underlying assumption would be that the default thing served up would be html, but you could reach the other representation consistently through adding an appropriate ending or whatever would be most suitable... but that people looking at the html should have a shiny red button type clue that there is another way to retrieve the info which is for example as owl.

Yes, I agree.

>      > I will point out that on the CRM site, there is also an entire
>      > architecture wherein each version has its own overall presentation:
>      > e.g.: http://www.cidoc-crm.org/Version/version-6.2.1
>
>     I think this should be maintained but not used as URIs for classes.
>
>
> Why would you argue against using it as the resolving point for individual classes?

Because it includes versions. These are necessary when working across different versions but I do not think versions are needed for classes.

> Currently this is not supported at all, correct? I mean you always point at a version. So you would suggest that 'current' should be 'versionless'?

I am suggesting that classes do not need versions at all. Doing reasoning on a per class and per version basis would be bad practice, no? One would expect that the whole RDF/OWL representation would be used for reasoning. I think class URIs are only used as identifiers. This also avoids the problem of ensuring correct older versions for deprecated classes.

>
> How I understood Erlangen to work is that it just makes the versionless URI redirect to the current. So I thought the idea would be that 'current' resolves to the present official (whatever the present official means). If a class has been deprecated then I guess it would have to revert to the last official in which it had existed? 

Posted by Francesco Beretta on 17/1/2020

Dear all,

This very interesting conversation has so far focused on CRMbase. But what about the extensions family, which often point from one extension to another?

One major point for having machine actionable, consistent ontologies is to have a mechanism to point to the versions of each module (and base) to which a certain module version refers. This, as you know, to provide consistency.
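One way to make such version references machine-actionable is for each module version to pin the exact versions it builds on (in OWL, for instance, through `owl:imports` of versioned ontology IRIs), so that a tool can check whether a chosen combination is consistent. The sketch below uses hypothetical module names `ext_a` and `ext_b` and invented version pins purely for illustration.

```python
from typing import Dict, List

# Hypothetical declarations: each module version pins the versions it depends on.
DEPENDS = {
    ("CRMbase", "6.2.1"): {},
    ("ext_a", "1.0"): {"CRMbase": "6.2.1"},
    ("ext_b", "1.0"): {"CRMbase": "5.0.4"},
    ("ext_b", "2.0"): {"CRMbase": "6.2.1", "ext_a": "1.0"},
}

def conflicts(selection: Dict[str, str]) -> List[str]:
    """List every pinned dependency that the chosen module versions do not satisfy."""
    problems = []
    for module, version in selection.items():
        for dep, wanted in DEPENDS.get((module, version), {}).items():
            chosen = selection.get(dep)
            if chosen != wanted:
                problems.append(f"{module} {version} needs {dep} {wanted}, got {chosen}")
    return problems
```

An empty result means the selected base and extension versions agree with each other; anything else names the mismatch.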

One of the reasons for developing OntoME was to provide a way of easily integrating different modules and extensions. We added recently the possibility of having a rdf-owl export of a namespace and more will follow soon, I hope, to export profiles in OWL and probably soon SHACL.

The general vision for OntoME is to go from beta to MVP in summer, and at the same time go opensource so that the community can help improve the platform. And also integrate it, if desirable and desired, with the tooling at FORTH or any other platform.

I think we should discuss a vision and rules for providing a robust, machine-actionable integration of CRMbase and modules in general (i.e. platform independent), and develop a common platform providing version integration and easy-to-use tooling for the community.

I raise this issue because I've heard this need expressed in the user community multiple times, and I wonder in which direction we should move. I know that developing such platforms is time-consuming, expensive and causes headaches...

Posted by George Bruseker on 17/1/2020

    >      > I will point out that on the CRM site, there is also an entire
    >      > architecture wherein each version has its own overall presentation:
    >      > e.g.: http://www.cidoc-crm.org/Version/version-6.2.1
    >
    >     I think this should be maintained but not used as URIs for classes.
    >
    >
    > Why would you argue against using it as the resolving point for
    > individual classes?

    Because it includes versions. These are necessary when working across
    different versions but I do not think versions are needed for classes.

But is your objection to showing the data in the form that you see when you click this link (ie not a large html text and a pointer to the anchor) or to showing a version? 

I like the way that the link above displays an individual class and the functionality it gives to actually use the ontology. I don't know if it breaks good practice though. 

Re displaying a version, don't you always have to display a version? Even if you are displaying current, it is actually just the last official version.
 

    > Currently this is not supported at all, correct? I mean you always point
    > at a version. So you would suggest that 'current' should be 'versionless'?

    I am suggesting that classes do not need versions at all. Doing
    reasoning on a per class and per version basis would be bad practice,
    no? One would expect that the whole RDF/OWL representation would be used
    for reasoning. I think class URIs are only used as identifiers. This
    also avoids the problem of ensuring correct older versions for
    deprecated classes.

I think that from a provenance point of view, given that the ontology is changing, knowing the version could help one interpret the information in the future. If you made your data under version 4, when the intension of class x was of a certain size, and we have since widened it, then that may affect how you used the ontology. I imagine this is a pretty sci-fi scenario right now and nobody has this use case, but thinking of how things could shape up in the future, I think it would be relevant. Even in Linked Art conversations, people get confused between versions. Why didn't you use property x? Well, I was looking at version x, and in that version class y doesn't have property x.

Anyhow, if we had a workflow in which the structured data for classes and properties were edited first and the different products (doc, RDF, OWL etc.) were generated from it, then generating the versioned edition would not add more overhead. I think it's a question of the order in which the documents are produced.
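The single-source workflow George describes can be sketched as one list of class records from which each product is rendered. The `ClassRecord` fields, the scope-note text and the versioned ontology IRI (following the `{version}` pattern discussed earlier) are assumptions for illustration; only HTML and Turtle outputs are shown.

```python
from dataclasses import dataclass
from html import escape

NS = "http://www.cidoc-crm.org/cidoc-crm/"

@dataclass
class ClassRecord:
    ident: str       # e.g. "E5_Event"
    label: str       # e.g. "Event"
    superclass: str  # identifier of the direct superclass, e.g. "E4_Period"
    scope_note: str

def turtle_literal(text: str) -> str:
    # Escape backslashes, double quotes and newlines for a Turtle string literal.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'

def to_turtle(records, version: str) -> str:
    out = [f"@prefix crm: <{NS}> .",
           "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
           "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
           f"<{NS}{version}/> a owl:Ontology ; owl:versionInfo {turtle_literal(version)} ."]
    for r in records:
        out.append(f"crm:{r.ident} a rdfs:Class ; rdfs:label {turtle_literal(r.label)}@en ; "
                   f"rdfs:subClassOf crm:{r.superclass} ; rdfs:comment {turtle_literal(r.scope_note)} .")
    return "\n".join(out) + "\n"

def to_html(records, version: str) -> str:
    parts = [f"<h1>CIDOC CRM {escape(version)}</h1>"]
    for r in records:
        # The id attribute is what lets a '#E5_Event' fragment land on this entry.
        parts.append(f'<h2 id="{escape(r.ident)}">{escape(r.ident.replace("_", " "))}</h2>')
        parts.append(f"<p>{escape(r.scope_note)}</p>")
    return "\n".join(parts)
```

Because both outputs read the same records, a new version is produced by re-running the generators with a new version string rather than by editing each product by hand.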

 

Posted by Thanasis on 17/1/2020

Yes, if we have different URIs for each version of E5 Event, then this will complicate matters during implementation in local systems. If one wants to work out the difference in reasoning rules across the versions then they would need to refer to the whole document not each individual class. So yes to versions for the document URIs but no to versions for the individual class URIs.