Friday, 9 October 2015

Using RDF Data Cube Vocabulary to model sales data with Fluent Editor - Example.

RDF Data Cube Vocabulary is a way to represent data in a popular format that follows linked data principles. Linked data is an approach to publishing data on the web, and this vocabulary makes it possible. There are numerous benefits to linked data. The individual observations, and groups of observations, become (web) addressable. This allows publishers and third parties to annotate and link to this data. For example, a report can reference the specific figures it is based on, allowing for fine-grained provenance trace-back. Representing a data set with these benefits is now possible with Controlled Natural Language in Fluent Editor.

In this article you will learn:

  • How to represent a data set as an OLAP cube when it includes one or several measures.
  • How to slice the cube with the help of the RDF Data Cube Vocabulary in Fluent Editor.
  • How to solve the problems that arise when there are multiple measures.

Using OLAP Cube with Fluent Editor and RDF Data Cube Vocabulary.

    Follow along: Single Measure Example.
    Let's introduce a set of sales data.



    It is a good table to analyse because it lets us relate sales to spending on advertisements on TV and on the Internet. Therefore, the columns 'Online_Ad_Spend' and 'TV_Ad_Spend' can each be treated as either a measure or a dimension (but not both at the same time).
 


    First we declare the data structure and then we define it (first line). According to the official documentation, defining a dimension or a measure requires us to declare some blank nodes (Dimension-X and Measure-1, where X is an element of the set {1,2,3,4}).

    Note that we reference the RDF Data Cube Vocabulary by using the '[qb]' prefix.

    We define the dimensions and the measure as:

  • Ref-Week is the 'Week' column (dimension)
  • Ref-Store-Id is the 'Store id' column (dimension)
  • Ref-Tv-Ad-Spend is the 'TV_Ad_Spend' column (dimension)
  • Ref-Online-Ad-Spend is the 'Online_Ad_Spend' column (dimension)
  • Ref-Sales is the 'Sales' column (measure)
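
    The component list above can also be pictured outside Fluent Editor as plain subject-predicate-object triples. The Python sketch below is only an illustration, not the article's CNL: the `qb:` terms come from the RDF Data Cube Vocabulary, while the `ex:` names, the blank node labels and the helper `build_dsd` are assumptions made up for this example.

```python
# Hypothetical sketch: a data structure definition (DSD) as plain triples.
# "qb:" terms come from the RDF Data Cube Vocabulary; "ex:" names are invented.

DIMENSIONS = ["ex:refWeek", "ex:refStoreId", "ex:refTvAdSpend", "ex:refOnlineAdSpend"]
MEASURES = ["ex:refSales"]


def build_dsd(dsd, dimensions, measures):
    """Return triples declaring one component (blank node) per property."""
    triples = [(dsd, "rdf:type", "qb:DataStructureDefinition")]
    for i, prop in enumerate(dimensions, start=1):
        node = f"_:dimension{i}"  # blank node, in the spirit of Dimension-X
        triples += [
            (dsd, "qb:component", node),
            (node, "qb:dimension", prop),
            (prop, "rdf:type", "qb:DimensionProperty"),
        ]
    for i, prop in enumerate(measures, start=1):
        node = f"_:measure{i}"  # blank node, in the spirit of Measure-1
        triples += [
            (dsd, "qb:component", node),
            (node, "qb:measure", prop),
            (prop, "rdf:type", "qb:MeasureProperty"),
        ]
    return triples


dsd_triples = build_dsd("ex:salesStructure", DIMENSIONS, MEASURES)
components = [o for s, p, o in dsd_triples if p == "qb:component"]
print(len(components), "components declared")
```

    Each component is a blank node that points at exactly one dimension or measure property, which is why five blank nodes are needed for five columns.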

    Why did we declare a data structure first? Because we now have to define our data set and state that it has the structure we defined earlier (fig-5).

    Now we define a slice key and associate it with the data structure.



    So we can say that the Week dimension acts as our slice key.


    The first five lines state that our data set really has components, that each component has a dimension, and that these are indeed dimensions. This may look redundant, but we need to state it explicitly.

    Then we assign observations to each slice (O-XY, where XY is a sequence of digits from {0,1,2,3,4,5,6,7,8,9}).



    It is very important to include the sentences in fig-8 because of the Open World Assumption: anything that is not stated explicitly is treated as unknown rather than false.



    As we can see, we assign to each slice a value of the dimension (week), and to each observation (mapped to its slice) we assign the values of the dimensions we look at (there is only one value for each dimension because of this specific data table).

    After that we assign to the observation the value of what we measure (has-ref-sales).
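
    The assignment of weeks to slices and of observations to slices can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the rows and their figures are invented (the article's table is not reproduced here), the `ex:` names and the helper `build_slices` are hypothetical, and only the `qb:` terms are taken from the RDF Data Cube Vocabulary.

```python
# Hypothetical sketch: one slice per week, each slice holding its observations.
# All figures below are invented sample values, not the article's data.

ROWS = [  # (week, store_id, tv_ad_spend, online_ad_spend, sales)
    (18, 1, 120.0, 80.0, 1500.0),
    (19, 1, 100.0, 95.0, 1620.0),
]


def build_slices(rows, dataset="ex:salesData"):
    """Return triples with one qb:Slice per week and one qb:Observation per row."""
    triples = [
        ("ex:sliceByWeek", "rdf:type", "qb:SliceKey"),
        ("ex:sliceByWeek", "qb:componentProperty", "ex:refWeek"),
    ]
    counters = {}  # number of observations already placed in each week's slice
    for week, store, tv, online, sales in rows:
        slice_id = f"ex:slice{week}"
        if week not in counters:
            triples += [
                (dataset, "qb:slice", slice_id),
                (slice_id, "rdf:type", "qb:Slice"),
                (slice_id, "qb:sliceStructure", "ex:sliceByWeek"),
                (slice_id, "ex:refWeek", week),  # the dimension value fixed by the slice
            ]
        counters[week] = counters.get(week, 0) + 1
        obs = f"ex:o{week}{counters[week]}"  # e.g. ex:o181, mirroring O-181
        triples += [
            (slice_id, "qb:observation", obs),
            (obs, "rdf:type", "qb:Observation"),
            (obs, "qb:dataSet", dataset),
            (obs, "ex:refStoreId", store),
            (obs, "ex:refTvAdSpend", tv),
            (obs, "ex:refOnlineAdSpend", online),
            (obs, "ex:refSales", sales),  # the single measured value
        ]
    return triples


slice_triples = build_slices(ROWS)
print(len(slice_triples), "triples")
```

    The week is stated once on the slice, and each observation then carries the remaining dimension values plus the sales measure.
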

    How to interpret the results? Each slice shows our sales value with respect to the values of the spending dimensions. Let's say we want to see how well the company did in a certain week.



    We see that Slice-18 contains the corresponding week and that it has one observation called O-181. With this information in mind, let's ask again.



    As we can see, Fluent Editor provides the tools needed to analyse the data in this way.
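
    The two-step lookup described above (first find the slice for a week, then read the sales of its observations) can be sketched in Python. This is not how Fluent Editor evaluates questions; it is a hedged illustration over a handful of invented triples, and the helpers `objects` and `sales_for_week` as well as all `ex:` names are assumptions.

```python
# Hypothetical sketch: answer "how did the company do in week 18?"
# The triples and their values are invented for illustration.

TRIPLES = [
    ("ex:slice18", "ex:refWeek", 18),
    ("ex:slice18", "qb:observation", "ex:o181"),
    ("ex:o181", "ex:refSales", 1500.0),
    ("ex:slice19", "ex:refWeek", 19),
    ("ex:slice19", "qb:observation", "ex:o191"),
    ("ex:o191", "ex:refSales", 1620.0),
]


def objects(triples, subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def sales_for_week(triples, week):
    """Map each observation in the week's slice(s) to its sales value."""
    result = {}
    slices = [s for s, p, o in triples if p == "ex:refWeek" and o == week]
    for slice_id in slices:  # step 1: the slice that fixes this week
        for obs in objects(triples, slice_id, "qb:observation"):  # step 2
            for value in objects(triples, obs, "ex:refSales"):
                result[obs] = value
    return result


print(sales_for_week(TRIPLES, 18))
```

    A week with no slice simply yields an empty result, which matches the open-world reading: nothing is known about it, rather than sales being zero.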

But what if we treated spending and sales as measures?

    This approach would allow us to look at the data differently. We would treat the data as a function of time (week) that maps to spending on TV advertisements, spending on Internet advertisements, and sales.
  
    We would need a different approach to do that. Follow along: Multiple Measure First Example

Handling multiple measurements.


    As the picture above shows, all we did was define three measurements in our data structure.




    Again, the first four lines are there to satisfy the Open World Assumption. Then we proceed as in the previous examples (O-X is an observation, where X is a digit from {1,2,3,4,5,6,7,8,9,0}), but we simply record three measurements. However, is this a good way to do it?

It is not possible to attach an attribute to a single observed value

We need to understand that anything attached to O-X applies to the whole observation, that is, to all three measured values at once. This form works best when the extra information concerns the observation as a whole, e.g. an indication of who made the observation. Therefore, it is better to use a different approach.

Multiple dimension.

    There is a very clever way to solve the problem described above. We need to define an abstract dimension whose values are the different measurements.



    As we can see, we defined three separate measurements; in addition to these components, an abstract dimension was defined ('Dimension-Special'). We used measure-type as the abstract dimension, and it automatically takes as its values the measures declared in the data structure definition.


    As shown in fig-13, we have defined the observations, and for each observation there is only one measure. Our problem is solved.
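
    The measure-type approach can be sketched in Python as a transformation from one multi-measure row into one observation per measure. This is a minimal illustration under stated assumptions: `qb:measureType` and `qb:Observation` are terms of the RDF Data Cube Vocabulary, while the `ex:` names, the sample figures and the helper `split_by_measure` are hypothetical.

```python
# Hypothetical sketch: expand one multi-measure row into single-measure
# observations, each tagged with qb:measureType. Values are invented.

MEASURES = ["ex:refTvAdSpend", "ex:refOnlineAdSpend", "ex:refSales"]


def split_by_measure(obs_id, week, values):
    """Return triples for one observation per measure in MEASURES."""
    triples = []
    for i, measure in enumerate(MEASURES, start=1):
        obs = f"{obs_id}-{i}"
        triples += [
            (obs, "rdf:type", "qb:Observation"),
            (obs, "ex:refWeek", week),            # the ordinary dimension
            (obs, "qb:measureType", measure),     # the abstract dimension
            (obs, measure, values[measure]),      # the one value it carries
        ]
    return triples


row = {"ex:refTvAdSpend": 120.0, "ex:refOnlineAdSpend": 80.0, "ex:refSales": 1500.0}
split_triples = split_by_measure("ex:o1", 18, row)
measure_types = [o for s, p, o in split_triples if p == "qb:measureType"]
print(measure_types)
```

    Because every resulting observation holds a single value, an attribute attached to it now describes exactly that value.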

    How to interpret our cube? It is one-dimensional, so we can think of it as a sliced snake with several values assigned to each slice.

Summary

    With Fluent Editor it is possible to use the RDF Data Cube Vocabulary to describe a data set in a natural way. Not only is this easy, it also gives the data set a good foundation so that it can be linked, annotated and referenced. For instance, if you work in the scientific community, this is a good way to publish data, since the data can then be cited in a large number of articles.
    RDF Data Cube Vocabulary combined with Fluent Editor provides easy-to-use tools for doing so.
