ELD Project Summary

Linked Data and Enterprise Environments

Linked Data (LD) is about searching, sharing and linking information from various sources and different contexts. With LD technology, data originating from organisations all over the world can easily be shared over the internet. On a smaller scale, LD can also be used within a single organisation, typically in analytics and knowledge processes. However, the majority of enterprise business processes are administrative in nature. To support data quality, adequate staff capabilities and regulatory compliance, administrative environments rely heavily on fixed, predefined data structures. The typical application of LD technology, with its flexible relations between data elements, does not seem to fit these environments well. Nevertheless, we see added value in using LD technology to support information sharing and integration within administrative enterprise business processes. We will call this Enterprise Linked Data (ELD).

Benefits

While established best practices such as web services, ESBs and ETL already cover the current business needs in this field, we think LD techniques can provide the following benefits:

1. data quality of end-to-end processes (especially in consolidation and M&A situations)
2. flexibility of the sharing/integration infrastructure (meaning lower TCO)

Requirements

As mentioned above, the context of enterprise business processes differs from the usual setting of LD applications. Figure 1 shows our model of the ELD context.

Figure 1.

The aim of our ELD project is to build a proof of concept of this model using LD techniques, taking into account the following typical enterprise requirements:

a) Frequent data updates (Δ data).
b) Data availability within Host 2 must be independent of the availability of Host 1 and the Network.
c) Data presentation (model′) has a predefined structure and is fixed (or: has its own life cycle).

On the other hand, we require the implementation to provide some typical LD advantages:

d) Data semantics can be defined by worldwide standards or by self-descriptive data, and can thus be supported independently of possibly unavailable, incomplete, outdated or inconsistent documentation.
e) The channel is transparent to model changes (Δ model), i.e. no channel reconfiguration or deployment efforts are needed.
f) Additional data sources can be integrated (figure 2b) without channel reconfiguration or deployment efforts.

ELD Model

For the sake of simplicity, our model is unidirectional: some business logic supporting a business process provides a stream of (transactional) data updates in triple format. This stream is communicated to another hosting environment, where the data is read by a user. (Of course, this user might at the same time generate data updates that would then be passed in the opposite direction. We expect such an extension of our model to be straightforward.)

Notice that, by requirements a) and b) above, implementing our ELD model essentially has the character of (yes) a replication exercise. Replication of LD, while not very common in typical LD environments, could be expected to greatly enhance the flexibility of enterprise information sharing and integration infrastructures.

However, the subsequent transformation of this information into predefined models in the business domain (often using XML schema definitions to validate incoming data) might just move change efforts (Δ model) from the interface domain to the business logic domain. Therefore we include in our model the presentation of the received information to a business user who expects information elements to be predefined and clearly identified, with known labels, identifiers and cardinality (are there zero, one or more of these items in this context?).
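To make the idea of a replicated stream of triple updates more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the sender numbers each update and sends it as an "add" or "remove" record; the names `ChangeRecord` and `apply_changes` and the `ex:` prefix are hypothetical and not taken from any LD product. Because the receiver remembers the last sequence number it applied, a record that is delivered twice does no harm.

```python
from dataclasses import dataclass
from typing import List, Literal, Set, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class ChangeRecord:
    """One transactional update in the (hypothetical) Δ data stream."""
    seq: int                      # position in the stream, assigned by the sender
    op: Literal["add", "remove"]  # assert or retract the triple
    triple: Triple


def apply_changes(replica: Set[Triple], changes: List[ChangeRecord], last_seq: int) -> int:
    """Apply records in sequence order, skipping those already applied.

    Returns the highest sequence number applied, so the receiver knows
    where to resume after the next delivery.
    """
    for change in sorted(changes, key=lambda c: c.seq):
        if change.seq <= last_seq:
            continue  # duplicate delivery: already applied, so ignore it
        if change.op == "add":
            replica.add(change.triple)
        else:
            replica.discard(change.triple)
        last_seq = change.seq
    return last_seq


# Usage: an order changes status from "open" to "shipped".
replica: Set[Triple] = set()
stream = [
    ChangeRecord(1, "add", ("ex:order42", "ex:status", "open")),
    ChangeRecord(2, "remove", ("ex:order42", "ex:status", "open")),
    ChangeRecord(3, "add", ("ex:order42", "ex:status", "shipped")),
]
last = apply_changes(replica, stream, last_seq=0)
last = apply_changes(replica, stream, last_seq=last)  # redelivery is a no-op
print(replica)  # {('ex:order42', 'ex:status', 'shipped')}
print(last)     # 3
```

This sketch does not detect gaps in the sequence (for example a lost record 2); a real channel would presumably need to notice such gaps and fall back to resynchronisation.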

Focus Areas

We see four distinct focus areas in our ELD model (see figures 2a and 2b):

1. Registering business transactions in an LD environment
   - time dimension
   - versioning
   - transaction management
2. LD “replication” channel
   - push or pull replication: an event detector at the source or a polling query at the receiver?
   - guaranteed delivery
   - resynchronisation on request

Figure 2a.

Figure 2b.

3. Predefined UI structures
   - LD transformation to predefined reports/forms
   - as an alternative to predefined reports/forms: flexible content windows (with specific labels etc.)
   - update forms (read/write)
4. Integration of multiple channels
   - semantic integration
   - deduplication
   - master data management

In the proof of concept of the ELD project we focus on areas 2 and 3, leaving areas 1 and 4 for the moment to mainstream LD discussions.

Deliverables

In the ELD project we plan to deliver:

A. a working prototype of an LD “replication” channel
B. a vision on the requirements and possible solutions for transforming flexible LD ontologies into predefined data structures such as forms and reports

A. Working prototype of an LD “replication” channel

In the PoC we will work on the following components:

• Triple store 1 in hosting environment 1
• Some front end to enter data into triple store 1 (A-box and T-box assertions)
• Some way to simulate frequent data updates (say, new triples every X seconds)
• Triple store 2 in hosting environment 2
• Some front end to query the data in triple store 2
• A network channel between triple store 1 and triple store 2
• Either:

  - some “event detector” on triple store 1 (figure 3a), being able to detect new triples, perform some filtering on them, and push them through the channel into triple store 2, or
  - some “polling query” (figure 3b) that periodically queries triple store 1 to find out what new triples were added there since the last poll. The results are then added into triple store 2.

• A “synchronisation service” (figure 4) that is provided by triple store 1 and can be invoked by triple store 2. This service will send the whole content of triple store 1 (after some filtering) to triple store 2.
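The difference between the “event detector” (push) and the “polling query” (pull) can be sketched in a few lines of Python. This is a minimal in-memory simulation, not an implementation on a real triple store: it assumes that triple store 1 keeps an append-only log with sequence numbers, and the classes `SourceStore`, `PushReceiver` and `PollingReceiver` and the filter `keep` are hypothetical names introduced for illustration only.

```python
from typing import Callable, List, Set, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


class SourceStore:
    """Stand-in for triple store 1: an append-only log of added triples."""

    def __init__(self) -> None:
        self.log: List[Tuple[int, Triple]] = []
        self.listeners: List[Callable[[Triple], None]] = []

    def add(self, triple: Triple) -> None:
        self.log.append((len(self.log) + 1, triple))
        # Push (figure 3a): the "event detector" notifies subscribers immediately.
        for listener in self.listeners:
            listener(triple)

    def triples_since(self, seq: int) -> List[Tuple[int, Triple]]:
        # Pull (figure 3b): answers "which triples were added after seq?"
        return [(s, t) for s, t in self.log if s > seq]


def keep(triple: Triple) -> bool:
    """Example filter: replicate only subjects in the hypothetical ex: namespace."""
    return triple[0].startswith("ex:")


class PushReceiver:
    """Stand-in for triple store 2, fed by the event detector."""

    def __init__(self, source: SourceStore) -> None:
        self.replica: Set[Triple] = set()
        source.listeners.append(self.on_new_triple)

    def on_new_triple(self, triple: Triple) -> None:
        if keep(triple):
            self.replica.add(triple)


class PollingReceiver:
    """Stand-in for triple store 2, fed by a periodic polling query."""

    def __init__(self, source: SourceStore) -> None:
        self.source = source
        self.replica: Set[Triple] = set()
        self.last_seen = 0  # cursor: highest sequence number already fetched

    def poll(self) -> None:
        for seq, triple in self.source.triples_since(self.last_seen):
            if keep(triple):
                self.replica.add(triple)
            self.last_seen = seq


# Usage: both variants end up with the same filtered replica.
source = SourceStore()
pushed = PushReceiver(source)
polled = PollingReceiver(source)
source.add(("ex:order42", "ex:status", "open"))
source.add(("tmp:debug", "ex:note", "not replicated"))
polled.poll()
source.add(("ex:order43", "ex:status", "open"))
polled.poll()
print(pushed.replica == polled.replica, polled.last_seen)  # True 3
```

Note that the pull variant depends on the source being able to answer “what was added since X”. A plain RDF graph has no insertion order, so in practice this would presumably rely on timestamps, versioning or a store-specific change log, which ties back to focus area 1. The push variant needs no cursor, but relies on the event detector being active whenever updates occur.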

Figure 3a.

Figure 3b.

Figure 4.
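The synchronisation service described above can be sketched in the same simplified, in-memory way. The sketch assumes that triple store 1 can return a filtered snapshot together with the sequence number of the last change it contains, so that incremental replication can resume right after it. `SourceStore`, `Receiver`, `synchronise` and `resynchronise` are hypothetical names.

```python
from typing import Callable, Set, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]
Filter = Callable[[Triple], bool]


class SourceStore:
    """Stand-in for triple store 1: current content plus a change counter."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.last_seq = 0  # sequence number of the most recent change

    def add(self, triple: Triple) -> None:
        self.triples.add(triple)
        self.last_seq += 1

    def synchronise(self, keep: Filter) -> Tuple[Set[Triple], int]:
        """Synchronisation service: a filtered full snapshot, together with
        the sequence number the snapshot corresponds to."""
        return {t for t in self.triples if keep(t)}, self.last_seq


class Receiver:
    """Stand-in for triple store 2."""

    def __init__(self) -> None:
        self.replica: Set[Triple] = set()
        self.last_seen = 0  # where incremental replication should resume

    def resynchronise(self, source: SourceStore, keep: Filter) -> None:
        snapshot, seq = source.synchronise(keep)
        # Replace rather than merge, so stale triples at the receiver disappear.
        self.replica = snapshot
        # Incremental replication can continue right after the snapshot.
        self.last_seen = seq


# Usage: the receiver has drifted (it holds a stale triple) and asks for a resync.
source = SourceStore()
source.add(("ex:order42", "ex:status", "open"))
source.add(("ex:order43", "ex:status", "open"))
receiver = Receiver()
receiver.replica = {("ex:order41", "ex:status", "stale")}
receiver.resynchronise(source, keep=lambda t: t[0].startswith("ex:"))
print(sorted(receiver.replica))
print(receiver.last_seen)  # 2
```

In this single-threaded sketch the snapshot and its sequence number are trivially consistent. With a real triple store receiving concurrent updates, both would have to be taken atomically, otherwise updates could be missed or applied twice after the resync.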

B. Vision on requirements and possible solutions for transforming flexible LD ontologies into predefined data structures such as forms and reports

We want to provide a vision that supports the hypothesis that changes to the business logic (and data model) on the sender side can be absorbed simply by adding adequate T-box assertions and/or changing the UI queries on the receiving side. More specifically, can we show that the following scenario is plausible?

Suppose that the business logic in our model (figure 1) is changed. This could be a small change, such as updating the label of a single attribute, or a major business process change, including the replacement of supporting IT systems. In our model we would expect this change to result in the business logic providing different A-box assertions (Δ data) than before, and in some additional T-box assertions (Δ model) being added (by some external intervention).

Through the “replication” channel of deliverable A, both the new data and the new model/ontology are pushed to the receiving end, without any changes to the channel or the triple stores. On the receiving end, only the transformation to the predefined data structure of the business layer/UI has to be adapted. This can be done by adding adequate T-box assertions and/or changing the UI queries. Although we assume that this will take human intervention, both updating the UI queries and generating the necessary T-box assertions might be end-user activities (no IT involvement). One day, this might even be the result of an AI algorithm parsing the incoming Δ model information.
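The hypothesis can be illustrated with a small Python sketch. It assumes that a predefined form is a fixed list of fields, each with an expected property and cardinality. It further assumes that a change at the sender makes the surname arrive under a new property `ex2:familyName`, and that the receiving side bridges the gap with a single T-box assertion using the OWL term `owl:equivalentProperty`. All prefixes, function names and values are hypothetical. The code follows only one step of equivalence by hand rather than using an RDF library or a reasoner.

```python
from typing import Dict, List, Optional, Set, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]
EQUIV = "owl:equivalentProperty"

# Hypothetical predefined form: label -> (expected property, min, max); None = unbounded.
FORM: Dict[str, Tuple[str, int, Optional[int]]] = {
    "Surname": ("ex:surname", 1, 1),
    "Phone numbers": ("ex:phone", 0, None),
}


def properties_for(target: str, tbox: Set[Triple]) -> Set[str]:
    """target plus every property asserted equivalent to it (one step, either direction)."""
    props = {target}
    for s, p, o in tbox:
        if p == EQUIV and o == target:
            props.add(s)
        elif p == EQUIV and s == target:
            props.add(o)
    return props


def fill_form(subject: str, abox: Set[Triple], tbox: Set[Triple]) -> Dict[str, List[str]]:
    """The 'UI query': collect values per form field and check cardinality."""
    form: Dict[str, List[str]] = {}
    for label, (prop, min_n, max_n) in FORM.items():
        props = properties_for(prop, tbox)
        values = sorted(o for s, p, o in abox if s == subject and p in props)
        if len(values) < min_n or (max_n is not None and len(values) > max_n):
            upper = "*" if max_n is None else max_n
            raise ValueError(f"{label}: expected {min_n}..{upper} values, got {len(values)}")
        form[label] = values
    return form


# Δ data: after a change at the sender, the surname arrives under a new property.
abox = {
    ("ex:p1", "ex2:familyName", "Jansen"),
    ("ex:p1", "ex:phone", "0123-456789"),
}
tbox: Set[Triple] = set()
try:
    fill_form("ex:p1", abox, tbox)
except ValueError as err:
    print(err)  # Surname: expected 1..1 values, got 0

# Δ model: the receiving side adds one T-box assertion; the form itself is unchanged.
tbox.add(("ex2:familyName", EQUIV, "ex:surname"))
print(fill_form("ex:p1", abox, tbox))
# {'Surname': ['Jansen'], 'Phone numbers': ['0123-456789']}
```

The point of the sketch is that neither the channel nor the form definition changes; only the T-box grows. A fuller implementation would more likely express `fill_form` as a SPARQL query against a store with reasoning enabled.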

Your Input

Please don’t hesitate to share your suggestions for improving the above model and/or project setup. If you have any ideas or suggestions regarding the implementation of areas 2 and 3 (references or links to architectures, techniques, methods or products), they would be very welcome.

You can send your input to:

Joep Creusen - info@opendataarchitectuur.nl
Pieter van Everdingen - info@openinc.nl