Prefer XML Schema data types in @context to schema.org ones

aschrijver · June 7, 2022, 3:11pm

In a discussion started by @trwnh on improving the JSON-LD @context for Lemmy, we came to a finding about the use of data types that I want to propose as a best practice. I am dedicating a separate topic to it for findability.

Best practice: In the JSON-LD @context use XML Schema data types to define JSON-LD value properties.
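
To make this concrete, here is a minimal sketch of what such a @context could look like, written as a Python dict so it can be serialized and checked. The `ex` namespace and the terms `createdAt`, `locked` and `score` are hypothetical placeholders, not Lemmy's actual vocabulary; only the `xsd` namespace IRI is the real one.

```python
import json

# Hypothetical vocabulary: the "ex" namespace and the three terms below are
# placeholders for illustration only, not taken from any real app's context.
context = {
    "@context": {
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "ex": "https://example.org/ns#",
        # Each value property is coerced to an XML Schema datatype.
        "createdAt": {"@id": "ex:createdAt", "@type": "xsd:dateTime"},
        "locked": {"@id": "ex:locked", "@type": "xsd:boolean"},
        "score": {"@id": "ex:score", "@type": "xsd:integer"},
    }
}

# A document using that context; the values are made up.
document = {
    **context,
    "createdAt": "2022-06-07T15:11:00Z",
    "locked": False,
    "score": 42,
}

print(json.dumps(document, indent=2))
```

The point of the sketch is only that the `@type` of each term points into the `xsd:` namespace rather than at a schema.org data type.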

The JSON-LD 1.1 Data Model specifies:

A JSON-LD value is a typed value, a string (which is interpreted as a typed value with type xsd:string), a number (numbers with a non-zero fractional part, i.e., the result of a modulo‑1 operation, or which are too large to represent as integers (see Data Round Tripping) in [JSON-LD11-API]), are interpreted as typed values with type xsd:double, all other numbers are interpreted as typed values with type xsd:integer), true or false (which are interpreted as typed values with type xsd:boolean), or a language-tagged string.
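
The rule quoted above can be sketched as a small Python function that maps a native JSON value to the XSD datatype it is implicitly given. This is an illustration, not a JSON-LD processor: the function name `native_datatype` is made up, explicitly typed values and language-tagged strings are not handled, and the 10^21 cutoff for "too large to represent as integers" is my reading of the Data Round Tripping section, so treat it as an assumption.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumption: numbers with an absolute value of 10^21 or more count as
# "too large to represent as integers" and are treated as xsd:double.
INTEGER_LIMIT = 10**21


def native_datatype(value):
    """Return the XSD datatype IRI implied for a native JSON value."""
    # bool must be checked before numbers: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return XSD + "boolean"
    if isinstance(value, str):
        return XSD + "string"
    # A non-zero fractional part always means xsd:double.
    if isinstance(value, float) and not value.is_integer():
        return XSD + "double"
    if isinstance(value, (int, float)):
        return XSD + ("double" if abs(value) >= INTEGER_LIMIT else "integer")
    raise TypeError(f"not a native JSON-LD value: {value!r}")


for raw in ['"hello"', "true", "3", "2.5", "1.0"]:
    print(raw, "->", native_datatype(json.loads(raw)))
```

Note that `1.0` ends up as xsd:integer here, since its fractional part is zero.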

Here’s the W3C XML Schema Part 2: Datatypes Second Edition specification. The most commonly used are the primitive datatypes:

3.2.1 string
3.2.2 boolean
3.2.3 decimal
3.2.4 float
3.2.5 double
3.2.6 duration
3.2.7 dateTime
3.2.8 time
3.2.9 date
3.2.10 gYearMonth
3.2.11 gYear
3.2.12 gMonthDay
3.2.13 gDay
3.2.14 gMonth
3.2.15 hexBinary
3.2.16 base64Binary
3.2.17 anyURI
3.2.18 QName
3.2.19 NOTATION

A number of federated apps, such as PeerTube, use (or used) schema.org data types instead. One advantage is that schema.org offers a JSON-LD definition (though it is large, at 1.4 MB). Disadvantages are:

W3C standards including all Linked Data standards (and ActivityPub too) use XML Schema Data types.
There are compatibility issues when translating between one and the other, even for schema:Date, which is used more often.

A mapping from schema.org to XML Schema data types hasn’t been created (the issue below was closed by a bot without being resolved):

github.com/schemaorg/schemaorg

Schema datatypes duplicating XSD datatypes considered.

opened 08:07AM - 26 Oct 17 UTC

closed 02:09AM - 25 Jul 21 UTC

VladimirAlexiev

no-issue-activity

This issue is related to #1404, #1748. #1715 lists https://goo.gl/5TPDw4 (by… @ericprud and @labra) which has @danbri's implicit endorsement. And it shows:

```ttl
schema:description "A cool dataset"^^schema:Text ;
schema:url "http://an.url.com"^^schema:URL ;
```

God gave us XSD datatypes to use and cherish, so it gets me really worried if Schema promotes its own unnecessary data types.

- It's fine to say that most props can carry either a structured resource, or `Text` (that's part of Schema's flexibility), but it's not ok to suggest literals should be `^^schema:Text` rather than plain literals (which RDF 1.1 makes the same as `^^xsd:string`).
- As for `URL` they should be URLs and not literals (I could never grok the utility of `xsd:anyURI`)

What's wrong with inventing your own types rather than using XSD types?

- first, it suggests disregard for prior art, which is just wrong
- second, most RDF repos have special handling of numeric and datetime types. Eg GraphDB puts them in a literal index for fast searching, repos know how to do numeric operations, etc. If you declare your own datatypes `owl:equivalentClass` to well-known XSD types, perhaps **D-Entailment** would permit such special handling, but I know of no repo to implement this.

In addition, binding props like modelDate, productionDate, availabilityStarts to datatypes of specific granularity like `schema:Date` and `schema:DateTime` is harmful because it doesn't always reflect the granularity of available data. In #1748 @RichardWallis argues (imho tenaciously) that these also permit other granularities (eg mere year, which corresponds to `xsd:gYear`) and are basically the same thing.

Suggested actions:

1. explain that most Schema datatypes have only organizational purpose but should not be used in actual data
2. document a mapping to XSD as requested by #1404
3. recommend that actual data should be tagged with the most specific XSD datatype, as per [this comment](https://github.com/schemaorg/schemaorg/issues/1748#issuecomment-333153156)
4. make `schema:DateTime` a union of the 5 related XSD datatypes (see the comment) and kill `schema:Date`. Not sure how to make a "union", probably just explain it verbally
5. remove any datatypes from schema's JSONLD context to ensure that data is not mis-interpreted (see #1748). Map `URL` to `@id`
6. review all examples and make sure they use XSD datatypes
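
To illustrate the granularity point from the issue (a value typed as a schema.org date may in practice hold only a year), here is a rough sketch of picking an XSD datatype from the lexical form of a value. The function name `most_specific_xsd_type` is made up, the patterns are deliberately simplified (no negative years, no range checks, timezone accepted only on dateTime), and only four datatypes are covered, which is not necessarily the set meant in the linked comment.

```python
import re

# Simplified lexical patterns, for illustration only. They do not validate
# month/day/hour ranges and ignore less common forms allowed by XML Schema.
_PATTERNS = [
    ("xsd:gYear", r"\d{4}"),
    ("xsd:gYearMonth", r"\d{4}-\d{2}"),
    ("xsd:date", r"\d{4}-\d{2}-\d{2}"),
    (
        "xsd:dateTime",
        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?",
    ),
]


def most_specific_xsd_type(lexical):
    """Return the XSD datatype whose pattern matches the whole string, or None."""
    for datatype, pattern in _PATTERNS:
        if re.fullmatch(pattern, lexical):
            return datatype
    return None


for value in ["2017", "2017-10", "2017-10-26", "2017-10-26T08:07:00Z"]:
    print(value, "->", most_specific_xsd_type(value))
```

A real converter from schema.org data types would need the documented mapping that the issue asks for; this only shows why a single date type cannot cover every value without losing granularity.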