Issue 347: Dimension and Data sets

Starting Date: 
2017-10-06
Working Group: 
3
Status: 
Open
Background: 

Posted by Martin on 21/9/2017

Dear All,

In connection with Issue 293, I propose to consider defining the relation between Dimension and Data Set in CRMDig. We may consider basically any CSV file an instance of Dimension, i.e., a point in a multidimensional mathematical space in which each mathematical dimension has a real-world meaning in terms of an observable property. Classical examples are digital images with RGB values on a 10M 2D pixel matrix, i.e., 30 million dimensions in one measurement result, taken in one process.

Then Dimension could be IsA Data Set.

This can help harmonize the assignment of Dimensions following a data evaluation process with the general result of data evaluation.

Best,

 

In the 39th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 32nd FRBR - CIDOC CRM Harmonization meeting, the SIG discussed Dimension and proposed revising it, considering that data evaluation creates an approximation of a dimension. It was also decided to propose a better model of how dimensions are related to values from measurements and from evaluation. MD and Steve were assigned to find a conservation person. Thanasis should think about this.

Heraklion, October 2017

 
Current Proposal: 

Posted by Martin on 19/5/2018

Dear All,

Maybe we should relax the definition of E54 so that it represents either the approximation of a true quantity of a thing or phenomenon provided by a measurement, with all reservations as to the degree to which the measurement measures what it is supposed to, or a derived quantity computed indirectly from observation data comparable to reality, or a quantity produced by simulation of reality-like situations.

If we take, for instance, measuring the weight or length of an object, we know that it changes continuously, regardless of whether within negligible margins or not. The indeterminacy/precision intervals given are those of the measurement and not those of the property itself. In that sense, we may abandon the idea that the Dimension is the true quantity of the thing; it is rather the quantity as measured.

In the case of physical constants, such as the proton diameter (see recent literature), which is not a property of a particular but may quite well be one, we may talk about mean values from multiple measurements.

We can continue to include counting letters in a text, as long as this is based on comparing physical copies. A monetary amount is still a trickier thing theoretically. It could be turned into paper money, but in the case of bitcoins etc., it may never be. As a measure to compare social obligations, it pertains to a reality.

In case the value is a complex structure, the "unit" can describe the structure and elementary units of subfields.

A dataset may be composed of Dimensions, or be a Dimension as a whole.

I am in favor of taking a digital image or the results of a gene activation measurement array as a Dimension.

Comments?

In the 41st joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 34th FRBR - CIDOC CRM Harmonization meeting, the SIG discussed Martin's proposal to relax the definition of E54 so that it represents either the approximation of a true quantity of a thing or phenomenon provided by a measurement, or a derived quantity computed indirectly from observation data comparable to reality, or a quantity produced by simulation of reality-like situations. The SIG concluded that closeness or overlap of given bounds of approximation is the means to reason on the compatibility with a common reality or item. Arrays and time series are regarded as “complex” dimensions.

The SIG assigned MD to rewrite the scope note of E54.

Lyon, May 2018

Posted by Martin on 15/11/2018

Here is my homework!
ISSUE 347

The SIG discussed Martin's proposal to relax the definition of E54 so that it represents either the approximation of a true quantity of a thing or phenomenon provided by a measurement, or a derived quantity computed indirectly from observation data comparable to reality, or a quantity produced by simulation of reality-like situations.

Some comments are:

· The phrase "An instance of E54 Dimension is specific to an instance of E70 Thing" in the scope note of P43 is too specific and is not compatible with the above approach. This approach has some consequences.

· We should consider the properties of a physical thing.

A conclusion was that closeness or overlap of given bounds of approximation is the means to reason on the compatibility with a common reality or item. Arrays and time series are regarded as “complex” dimensions.
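
The reasoning by closeness or overlap of bounds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class Interval, the function compatible, the tolerance parameter and the sample weighings are all hypothetical and are not part of the CRM or of any proposal made in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [low, high] approximating one observed quantity."""
    low: float
    high: float

    def __post_init__(self):
        if self.low > self.high:
            raise ValueError("low must not exceed high")

    def overlaps(self, other: "Interval") -> bool:
        # Two closed intervals share at least one point when neither
        # lies entirely below the other.
        return self.low <= other.high and other.low <= self.high

    def gap(self, other: "Interval") -> float:
        # Distance between the two intervals; 0.0 when they overlap.
        return max(0.0, self.low - other.high, other.low - self.high)


def compatible(a: Interval, b: Interval, tolerance: float = 0.0) -> bool:
    """True if the bounds overlap or lie within `tolerance` of each other.

    `tolerance` stands in for domain knowledge about how much the
    property may change between two observations (an assumption here).
    """
    return a.gap(b) <= tolerance


# Hypothetical weighings of the same person, in kg.
morning = Interval(71.8, 72.2)
evening = Interval(72.9, 73.3)

strictly_overlapping = morning.overlaps(evening)
within_daily_variation = compatible(morning, evening, tolerance=1.0)
```

In this sketch the two weighings do not overlap, but they count as compatible once a daily variation of up to 1 kg is allowed for. Whether such a tolerance is justified is domain knowledge that the model itself does not supply.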

The SIG assigned MD to rewrite the scope note of E54.

Here it is, a bit lengthy I admit:

E54 Dimension

Subclass of:      E1 CRM Entity

Scope note:         This class comprises quantifiable properties of things or phenomena that can be measured by some calibrated means and can be approximated by values, i.e. points or regions in a mathematical or conceptual space, such as natural or real numbers, RGB values etc.

An instance of E54 Dimension may represent either the result of a measurement of a quantity of some things or phenomena, or the result of an evaluation of observation data indirectly determining an approximation of such a quantity, or the result of a prediction or hypothetical simulation of such a quantity in a scenario of a possible or alternative evolution of reality. Quantifiable properties of conceptual objects are observed on representative carriers. Not only dimensions of persistent items can be quantifiable, but also durations and spatial and temporal distances of instances of E2 Temporal Entity.

Most quantifiable properties change over time and have a continuous value space. Therefore, multiple observations will differ from each other, but knowledge about the precision of the respective determination method and domain knowledge about the changes to be expected between observations allow for reasoning about which results of multiple observations are compatible with a common reality and can be used for further approximating the real quantities. For instance, the weight of a person varies in the course of the day within a range on the order of a kilogram. Wood expands by a few percent under humidity; metal expands much less under increasing temperature. Instances of E54 Dimension are specific to the way a property is determined. For instance, the length of a museum object determined by the smallest rectangular bounding box is different from the maximum length of the object.

The properties of the class E54 Dimension allow for expressing the numerical approximation of the values of an instance of E54 Dimension. If the true values belong to a non-discrete space, such as spatial distances, it is recommended to record them as approximations by intervals or regions of indeterminacy enclosing the assumed true values. For instance, a length of 5 cm may be recorded as 4.5-5.5 cm, according to the precision of the respective observation. Note that interoperability of values described in different units depends critically on the representation as value regions. For instance, 5 cm is about 1.96850394 inches. 2 inches, on the other hand, may be interpreted as something between 1.5 and 2.5 inches, which would be something within 3.5-6.5 cm.
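
The unit example above can be checked with a short Python sketch. The helper names inches_to_cm and encloses are hypothetical and chosen only for illustration; the only external fact used is that the international inch is defined as exactly 2.54 cm.

```python
from typing import Tuple

Region = Tuple[float, float]  # (low, high) bounds of a value region

CM_PER_INCH = 2.54  # exact by definition of the international inch


def inches_to_cm(region_in: Region) -> Region:
    """Convert a value region given in inches to the same region in cm."""
    low, high = region_in
    return (low * CM_PER_INCH, high * CM_PER_INCH)


def encloses(outer: Region, inner: Region) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# The point value from the text: 5 cm expressed in inches (about 1.9685).
five_cm_in_inches = 5 / CM_PER_INCH

# "2 inches" read as the region 1.5-2.5 inches, converted to cm
# (roughly 3.81-6.35 cm).
two_inches_cm = inches_to_cm((1.5, 2.5))

# The recorded region for "5 cm" from the text.
five_cm = (4.5, 5.5)

rough_bounds_hold = encloses((3.5, 6.5), two_inches_cm)
five_cm_fits = encloses(two_inches_cm, five_cm)
```

Under these assumptions the converted region lies within the rough 3.5-6.5 cm bounds and encloses the 4.5-5.5 cm region, so the two records are compatible. The bare point values 5 cm and 2 inches (5.08 cm) would not be equal.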

Numerical approximations in archaic instances of E58 Measurement Unit used in historical records should be preserved. Equivalents corresponding to current knowledge should be recorded as additional instances of E54 Dimension as appropriate.

Examples:

· The 250 metric ton weight of the Luxor Obelisk

· The 5.17 m height of the statue of David by Michelangelo

· The 530.2 carats of the Great Star of Africa diamond

· The AD1262-1312, 1303-1384 calibrated C14 date for the Shroud of Turin

· The 33 m diameter of the Stonehenge Sarsen Circle

· The 755.9 foot length of the sides of the Great Pyramid at Giza

· Christie's hammer price for “Vase with Fifteen Sunflowers” (E97) has currency British Pounds (E98)

Posted by Robert Sanderson on 15/11/2018

Some scoping questions, that might be obvious or might help clarify the discussion…

 

The National Fire Protection Association is responsible for the safety diamond, which has Health (blue), Flammable (red) and Reactivity (yellow) ratings for the sample or area, each of which is measured from 0-4. This seems like a quantified numeric value with a particular unit. Is that then a Dimension, like:

 

_:x a Dimension ;

  P2_has_type <health safety> ;

  P90_has_value 3 ;

  P91_has_unit <nfpa_unit> .

 

Color can be measured, but to be useful has three component parts. Pure red could be expressed as 255 red, 0 green, 0 blue: a type of redness, a value of 255, and a unit of hexadecimal color proportion (e.g. as per https://www.w3.org/TR/2018/REC-css-color-3-20180619/#rgb-color or the standard for sRGB).

 

This would be most usefully expressed as a partitioning of an overall color dimension.  This would be a second example of dimension partitioning to go along with non decimal unit systems (feet and inches, etc.) or currencies (florins, etc).

 

Thoughts?

Posted by Martin

Dear Robert,

On 11/15/2018 8:11 PM, Robert Sanderson wrote:
>
> Some scoping questions, that might be obvious or might help clarify the discussion…
>

>
> The National Fire Protection Association is responsible for the safety diamond, which has Health (blue), Flammable (red) and Reactivity (yellow) ratings for the sample or area, each of which is measured from 0-4. This seems like a quantified numeric value with a particular unit. Is that then a Dimension, like:
>

>
> _:x a Dimension ;
>
>   P2_has_type <health safety> ;
>
>   P90_has_value 3 ;
>
>   P91_has_unit <nfpa_unit> .

I'd argue this cannot be measured by calibrated means; hence it is not a Dimension, but a set of types, no matter that they are decorated with numbers.
>

>

>
> Color can be measured, but to be useful has three component parts. Pure red could be expressed as 255 red, 0 green, 0 blue: a type of redness, a value of 255, and a unit of hexadecimal color proportion (e.g. as per https://www.w3.org/TR/2018/REC-css-color-3-20180619/#rgb-color or the standard for sRGB).
>

>
> This would be most usefully expressed as a partitioning of an overall color dimension.  This would be a second example of dimension partitioning to go along with non decimal unit systems (feet and inches, etc.) or currencies (florins, etc).

Yes, sure, valid. To be discussed in Berlin. Each custom datatype needs a syntax and a string function to take it apart. This should come from other communities that have solved the issue. Then we import this.
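
As a rough illustration of such a string function, the Python sketch below takes a color value apart into its three components. It assumes the six-digit "#rrggbb" notation from the CSS color specification linked in the thread. The names RGB and parse_hex_color are hypothetical and do not come from any community standard or existing CRM encoding.

```python
import re
from typing import NamedTuple


class RGB(NamedTuple):
    """Three color components, each an integer from 0 to 255."""
    red: int
    green: int
    blue: int


# Assumed syntax: '#' followed by three two-digit hexadecimal fields.
_HEX_COLOR = re.compile(r"#([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})")


def parse_hex_color(text: str) -> RGB:
    """Split a '#rrggbb' string into its red, green and blue parts."""
    match = _HEX_COLOR.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a #rrggbb color: {text!r}")
    return RGB(*(int(part, 16) for part in match.groups()))


pure_red = parse_hex_color("#ff0000")
```

Each component could then be recorded as its own partial dimension of the overall color, in the same way that feet and inches partition a length. How that partitioning should be modelled is exactly what remains to be discussed.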

Reference to Issues: