Creation of new vocabularies

Sometimes no existing vocabulary covers a specific domain, or the available ones do not meet the review criteria described above; in that case, one may decide to create a new vocabulary. This requires following best engineering practices for modelling linked data, so that quality is guaranteed by design, and using suitable advertising strategies to encourage adoption of the vocabulary in the LOD community. The main guidelines for creating a new vocabulary can be summarized as follows:

  • Define a clean and stable URI using a careful URI naming strategy. More details on these strategies can be found in Step 4 and in the Linked Data Cookbook.
  • Choose the modelling language that suits your purpose. For example, SKOS is suitable for modelling lists of terms, such as controlled vocabularies, taxonomies or thesauri. RDF represents data as resources (web resources) and the relations between them, expressed as (subject, predicate, object) triples, while RDF Schema extends RDF with a vocabulary for describing the classes and properties of RDF-based resources. OWL provides richer primitives for describing classes and properties, as well as axioms that constrain how they are used, enabling more expressive semantic reasoning.
  • Make your vocabulary self-descriptive by giving each class and property you define at least a label, a definition and a comment.
  • Provide both machine-readable and human-readable documentation, together with basic metadata that allow others to understand and reuse your vocabulary correctly. A recommended practice is to publish a VoID description of the key metadata of the schema or dataset being created, as described by the W3C.
  • Define a versioning policy. It shows potential users that you, as publisher, are committed to managing changes to the vocabulary and to updating both its human-readable and machine-readable versions accordingly.
  • Publish the vocabulary at a stable URI under an open license, following best practices for publishing and advertising such as those described in the Linked Data Cookbook.
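The criteria on self-description, versioning and licensing can be illustrated with a minimal Python sketch that writes a small vocabulary as Turtle using only the standard library. It assumes a hypothetical namespace http://example.org/vocab#, hypothetical terms, a version string of 1.0.0 and the CC BY 4.0 license. Using skos:definition for definitions and dcterms:license for the license is one common convention, not something the guidelines above prescribe. The helper refuses to emit any term that lacks a label, a definition or a comment.

```python
from dataclasses import dataclass

# Prefixes used in the output. skos:definition and dcterms:license are
# assumed conventions for definitions and licensing, not requirements.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
}


@dataclass
class Term:
    local_name: str  # hypothetical term name, e.g. "Building"
    kind: str        # e.g. "owl:Class" or "owl:ObjectProperty"
    label: str
    definition: str
    comment: str


def quote(text: str) -> str:
    """Escape a string as a Turtle double-quoted literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def literal(text: str, lang: str = "en") -> str:
    """Turtle literal with a language tag."""
    return f"{quote(text)}@{lang}"


def to_turtle(base: str, ontology_iri: str, version: str,
              license_uri: str, terms: list[Term]) -> str:
    """Serialize a self-descriptive vocabulary; reject undocumented terms."""
    missing = [t.local_name for t in terms
               if not (t.label and t.definition and t.comment)]
    if missing:
        raise ValueError(f"terms lacking label/definition/comment: {missing}")
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    lines += [f"@prefix : <{base}> .", ""]
    # Ontology header carrying version information and an open license.
    lines += [
        f"<{ontology_iri}> a owl:Ontology ;",
        f"    owl:versionInfo {quote(version)} ;",
        f"    dcterms:license <{license_uri}> .",
        "",
    ]
    for t in terms:
        lines += [
            f":{t.local_name} a {t.kind} ;",
            f"    rdfs:label {literal(t.label)} ;",
            f"    skos:definition {literal(t.definition)} ;",
            f"    rdfs:comment {literal(t.comment)} .",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    terms = [
        Term("Building", "owl:Class", "building",
             "A permanent structure with a roof and walls.",
             "Hypothetical example class."),
        Term("hasAddress", "owl:ObjectProperty", "has address",
             "Relates a building to its postal address.",
             "Hypothetical example property."),
    ]
    print(to_turtle("http://example.org/vocab#", "http://example.org/vocab",
                    "1.0.0", "https://creativecommons.org/licenses/by/4.0/",
                    terms))
```

Failing early on undocumented terms keeps the self-description rule enforceable at publication time; the owl:versionInfo value would be bumped according to whatever versioning policy the publisher adopts.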

More guidelines on the process of creating a new vocabulary can be found in this blog. Setting up a new domain vocabulary has much in common with what was traditionally called defining a new semantic data standard for an industry domain. Both are group processes, and both results, the vocabulary and the semantic standard, need to be maintained and updated. See BOMOS for an overview and a detailed description of all activities needed to manage and maintain open standards. One might even argue that some semantic standards will be published as vocabularies in the future.

Links between ontologies can be specified using rdfs:subClassOf or owl:equivalentClass statements, either in the ontology itself or in a separate mapping ontology that imports (owl:imports) both the ontology of the original dataset and the ontologies one wants to map to. A reasoner attached to the triple store can exploit such mappings to derive additional links between the data and the more general ontologies. In this way, a user who does not know the original ontology can still query the dataset using the more general ontologies.
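The inference described above can be sketched in Python as a tiny forward-chaining reasoner over triples. It applies only three rules: rdfs9 and rdfs11 from the RDF Semantics specification (instances inherit their classes' superclasses; rdfs:subClassOf is transitive) and the OWL 2 RL rule that treats owl:equivalentClass as mutual rdfs:subClassOf. The prefixes ds:, gen: and other: and all class and instance names are hypothetical; a real triple store would rely on a full RDFS or OWL reasoner instead.

```python
SUBCLASS = "rdfs:subClassOf"
EQUIV = "owl:equivalentClass"
TYPE = "rdf:type"

Triple = tuple[str, str, str]


def infer(triples: set[Triple]) -> set[Triple]:
    """Return the closure of `triples` under a few RDFS/OWL entailment rules."""
    closed = set(triples)
    while True:
        new: set[Triple] = set()
        for s, p, o in closed:
            if p == EQUIV:
                # Equivalent classes are subclasses of each other.
                new.add((s, SUBCLASS, o))
                new.add((o, SUBCLASS, s))
        subclass = {(s, o) for s, p, o in closed if p == SUBCLASS}
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS, d))  # rdfs11: transitivity
        for s, p, o in closed:
            if p == TYPE:
                for a, b in subclass:
                    if a == o:
                        new.add((s, TYPE, b))  # rdfs9: inherit superclasses
        if new <= closed:  # fixpoint reached: nothing new derived
            return closed
        closed |= new


def instances_of(triples: set[Triple], cls: str) -> list[str]:
    """Subjects typed with `cls`, sorted for stable output."""
    return sorted(s for s, p, o in triples if p == TYPE and o == cls)


if __name__ == "__main__":
    # Instance data from the original dataset (hypothetical).
    dataset = {("ds:shop42", TYPE, "ds:Bakery")}
    # A separate mapping ontology linking to more general ontologies (hypothetical).
    mapping = {
        ("ds:Bakery", SUBCLASS, "gen:FoodShop"),
        ("gen:FoodShop", EQUIV, "other:FoodStore"),
        ("gen:FoodShop", SUBCLASS, "gen:Shop"),
    }
    closure = infer(dataset | mapping)
    print(instances_of(closure, "gen:Shop"))  # ['ds:shop42']
```

The query for gen:Shop returns ds:shop42 even though the dataset itself only states that it is a ds:Bakery, which is the effect the mapping ontology is meant to achieve. Keeping the mappings in a separate set mirrors the option of a separate mapping ontology: the original dataset stays unchanged.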

Once you have modelled your data, either by reusing existing vocabularies or by creating new ones, the next step is to define a naming structure that makes your dataset uniquely identifiable.
