Tuesday, 19 May 2015

Ask Data Anything

Ask Data Anything is Cognitum's approach to exploring data with a subset of natural language. Queries refer to concepts and instances modeled in ontologies, which provides a meaningful querying experience. Ask Data Anything exploits regularities of language to give each query a natural interpretation; its semantics are provided via R and rOntorion (alternatively, F# and Ontorion).

Technically, Ask Data Anything can perform projection, sub-setting, grouping and aggregation operations, answering queries that involve the following information:
  • What? Any column of your data table is treated as a quantitative field over which to perform queries.
  • How? How the output is shown. Query results can be presented as a table, a histogram or a map.
  • Where? (Optional) The "in" preposition restricts the search to a specific named group of items; continents, for instance, can be seen as groups of countries.
  • Of? (Optional) The "of" preposition dives into the data, restricting the results to a certain set of types (concepts in the Fluent Editor sense) by searching a given column for instances (in the Fluent Editor sense) of those types; we call this material sub-setting.
  • By? (Optional) The type (in the Fluent Editor sense) by which to group the results for aggregation.
  • When? (Optional) Queries can contain time constraints.
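To make the parts listed above concrete, here is a minimal Python sketch that splits the aggregate example queries from this post into those parts. The `Query` class, the `parse` function and the regular expression are all hypothetical: they are not Ask Data Anything's actual grammar or implementation, and they cover only the phrasings quoted in this post. The "When?" part is left out because the post does not show its syntax.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    what: str                        # column to query, e.g. "Quantity"
    operation: Optional[str] = None  # "summed" or "averaged"
    where: Optional[str] = None      # named group after "in"
    of: Optional[str] = None         # type after "of"
    by: Optional[str] = None         # grouping type after "by"
    how: Optional[str] = None        # output after "on"


# Hypothetical pattern; it only recognises the example phrasings in this post.
PATTERN = re.compile(
    r"^(?P<what>\w+)"
    r"(?:\s+(?P<operation>summed|averaged))?"
    r"(?:\s+in\s+(?P<where>[\w-]+))?"
    r"(?:\s+for\s+\w+\s+of\s+(?:\(type\)\s+)?(?P<of>[\w-]+))?"
    r"(?:\s+by\s+(?P<by>[\w-]+))?"
    r"(?:\s+on\s+(?P<how>table|histogram|map))?"
    r"\s*:?$",
    re.IGNORECASE,
)


def parse(text: str) -> Query:
    """Split a query sentence into its What/Where/Of/By/How parts."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported query: {text!r}")
    return Query(**match.groupdict())


print(parse("Quantity summed in Europe by country on map"))
```

Parts that a query omits come back as `None`, which mirrors the "(Optional)" markers in the list.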

Inside Ask Data Anything

Ask Data Anything consists of the following blocks:


Data and models are tightly coupled: models provide the interface for querying the data, and anything you want to query must be modeled so that it can be interpreted appropriately.

Ontology modeling adds semantic layers on top of your data, which expands the dimensions you can search along and in turn makes querying more insightful.

Operational Semantics

Next we briefly describe the four exploration modes available: projection, aggregation, sub-setting (circumstantial) and material sub-setting (conceptual).

For demonstration purposes, let us take the data represented in the following table as our sample dataset:

Transaction | Item | Price | Quantity | City | Date (DD/MM/YYYY) | Trademark
T-001 | Sleeve Shirt | 543 | 11 | Warsaw | 03/07/2013 | Lacoste
T-002 | Men's Dark One Button Suit | 1395 | 15 | Krakow | 07/12/2013 | Armani
T-003 | Solid Polo Shirt | 580 | 18 | Krakow | 17/08/2012 | Gucci
T-004 | Men's Mallow Graphic Tee | 163 | 9 | Warsaw | 19/03/2013 | Nike
T-005 | I'm Bob Graphic Tee | 73 | 5 | Berlin | 01/02/2014 | Zara
T-006 | 21 Years Old Women's Dark T-Shirt | 386 | 7 | Alicante | 22/12/2012 | Armani
T-007 | Hanes Men's Comfortblend EcoSmart Jersey Polo, 2 Pack | 425 | 11 | Berlin | 14/03/2013 | Chanel
T-008 | Men's Short Sleeve Stripe Polo | 820 | 7 | Boston | 05/09/2014 | Tommy Hilfiger
T-009 | Women's Button Down Roll Tab Shirt | 244 | 12 | Munchen | 29/08/2014 | Lacoste
T-010 | Men's jeans | 184 | 17 | Munchen | 06/02/2012 | Nike
T-011 | Men's Geometric Print Short Sleeve Shirt | 975 | 12 | Alicante | 23/08/2014 | Armani
T-012 | Men's Sasquatch Hunter Graphic Tee | 180 | 10 | Boston | 19/11/2012 | Gucci
T-013 | Women's Essential V-Neck Tee | 147 | 3 | Madrid | 21/07/2013 | Nike
T-014 | Men's Bass Guitar Guy Graphic Tee | 86 | 2 | Krakow | 22/03/2012 | Zara
T-015 | Men's Essential shirt | 754 | 2 | Boston | 01/04/2013 | Zara
T-016 | Women's Scoopneck Tee 2-Pack | 448 | 6 | Munchen | 26/07/2012 | Chanel

What's inside?

To start exploring the possibilities, it is always useful to know what is inside:


The dimensions are the columns of the data (quantitative fields); the possible operations are sum and average; and the outputs are histogram, table and map.

Projection

This identity operation projects the data, retrieving the subsets of it that meet requirements expressed as mathematical expressions.

Example query:

Item with price > 700
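As a rough illustration of what this query asks for, the following Python sketch filters the sample table by price and projects the Item column. It is not how Ask Data Anything evaluates queries (the post states that the semantics are provided via R and rOntorion); the `Sale` type and the function name are made up for the example.

```python
from typing import NamedTuple


class Sale(NamedTuple):
    transaction: str
    item: str
    price: int


# Transaction, Item and Price columns copied from the sample table.
SALES = [
    Sale("T-001", "Sleeve Shirt", 543),
    Sale("T-002", "Men's Dark One Button Suit", 1395),
    Sale("T-003", "Solid Polo Shirt", 580),
    Sale("T-004", "Men's Mallow Graphic Tee", 163),
    Sale("T-005", "I'm Bob Graphic Tee", 73),
    Sale("T-006", "21 Years Old Women's Dark T-Shirt", 386),
    Sale("T-007", "Hanes Men's Comfortblend EcoSmart Jersey Polo, 2 Pack", 425),
    Sale("T-008", "Men's Short Sleeve Stripe Polo", 820),
    Sale("T-009", "Women's Button Down Roll Tab Shirt", 244),
    Sale("T-010", "Men's jeans", 184),
    Sale("T-011", "Men's Geometric Print Short Sleeve Shirt", 975),
    Sale("T-012", "Men's Sasquatch Hunter Graphic Tee", 180),
    Sale("T-013", "Women's Essential V-Neck Tee", 147),
    Sale("T-014", "Men's Bass Guitar Guy Graphic Tee", 86),
    Sale("T-015", "Men's Essential shirt", 754),
    Sale("T-016", "Women's Scoopneck Tee 2-Pack", 448),
]


def items_with_price_above(sales, threshold):
    """Project the Item column for rows whose Price exceeds the threshold."""
    return [sale.item for sale in sales if sale.price > threshold]


matching_items = items_with_price_above(SALES, 700)
for item in matching_items:
    print(item)
```

On the sample table this selects four items: the ones from transactions T-002, T-008, T-011 and T-015.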


Aggregation

Aggregation is performed over hierarchical data, modeled in ontologies through typed instances (instances of concepts) that are related either by a "part of" property or by the ordinary time hierarchy, i.e., days are part of months and months are part of years. In this way, instances of the concept country are related to the concept continent, as in "Every country is part of a continent".

Example query:

Quantity summed by month on histogram:
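The grouping behind this query can be sketched in Python as follows. This is only an approximation under two assumptions that the post does not confirm: that dates are written day/month/year, and that "by month" means a month within a given year (following the "months as part of years" hierarchy). The function name is made up, and the text bars merely stand in for the histogram output.

```python
from collections import defaultdict
from datetime import datetime

# Date and Quantity columns copied from the sample table, in row order.
SALES = [
    ("03/07/2013", 11), ("07/12/2013", 15), ("17/08/2012", 18), ("19/03/2013", 9),
    ("01/02/2014", 5), ("22/12/2012", 7), ("14/03/2013", 11), ("05/09/2014", 7),
    ("29/08/2014", 12), ("06/02/2012", 17), ("23/08/2014", 12), ("19/11/2012", 10),
    ("21/07/2013", 3), ("22/03/2012", 2), ("01/04/2013", 2), ("26/07/2012", 6),
]


def quantity_by_month(sales):
    """Sum Quantity per (year, month), sorted chronologically."""
    totals = defaultdict(int)
    for date_text, quantity in sales:
        day = datetime.strptime(date_text, "%d/%m/%Y")  # assumed day/month/year
        totals[(day.year, day.month)] += quantity
    return dict(sorted(totals.items()))


monthly = quantity_by_month(SALES)
for (year, month), total in monthly.items():
    # One '#' per unit sold gives a crude text histogram.
    print(f"{year}-{month:02d} {'#' * total} {total}")
```

Days roll up into months here because every date is reduced to its (year, month) pair before summing, which is the time hierarchy described above in miniature.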


Sub-setting (Circumstantial sub-setting)

Sub-setting retrieves a subset of the data through a (circumstantial) belonging relation. This means we can ask for the results in a given country or in a modeled group of instances; in the latter case we can, for example, constrain the result to groups of brands categorized by origin, e.g., Spanish, American, etc.

Quantity summed in Europe by country on map:


Quantity summed in Spanish-Brands:
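The first query above, "Quantity summed in Europe by country", can be approximated in Python with a small "part of" lookup. The `PART_OF` table below is a hypothetical stand-in for the ontology, which the post does not show: the country and continent names are assumptions, as is the reading of Boston as the city in the USA.

```python
from collections import defaultdict

# City and Quantity columns copied from the sample table, in row order.
SALES = [
    ("Warsaw", 11), ("Krakow", 15), ("Krakow", 18), ("Warsaw", 9),
    ("Berlin", 5), ("Alicante", 7), ("Berlin", 11), ("Boston", 7),
    ("Munchen", 12), ("Munchen", 17), ("Alicante", 12), ("Boston", 10),
    ("Madrid", 3), ("Krakow", 2), ("Boston", 2), ("Munchen", 6),
]

# Hypothetical "part of" relation: city -> country -> continent.
PART_OF = {
    "Warsaw": "Poland", "Krakow": "Poland",
    "Berlin": "Germany", "Munchen": "Germany",
    "Alicante": "Spain", "Madrid": "Spain",
    "Boston": "USA",  # assumption: Boston, Massachusetts
    "Poland": "Europe", "Germany": "Europe", "Spain": "Europe",
    "USA": "North-America",
}


def ancestors(name):
    """Yield everything that `name` is transitively part of."""
    while name in PART_OF:
        name = PART_OF[name]
        yield name


def quantity_in_by_country(sales, group):
    """Sum Quantity per country for the cities that belong to `group`."""
    totals = defaultdict(int)
    for city, quantity in sales:
        chain = list(ancestors(city))      # e.g. ["Poland", "Europe"]
        if group in chain:
            totals[chain[0]] += quantity   # chain[0] is the city's country
    return dict(totals)


europe = quantity_in_by_country(SALES, "Europe")
print(europe)
```

The same lookup would serve a query like "Quantity summed in Spanish-Brands" if the table related brands to origin groups instead of cities to countries.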

Material sub-setting (Conceptual sub-setting)

Material sub-setting subsets the data by diving into its properties as modeled in the provided ontology. This feature allows for quite expressive queries, such as:

Quantity summed in Europe for item of (type) t-shirt:


By default, aggregation is performed on the target type, marked here by the query sub-part "in Europe". That sub-part subsets the data using a continent as the discriminant, so the query returns the quantity per city (city is the type present in the data, and it is part of a continent). You can change this behavior by adding a "by" part, as in "by brand", which retrieves the sum of the quantitative field Quantity per brand (Lacoste, Armani, etc.).
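A Python sketch of this behavior follows, with the default grouping by city and the optional "by brand" variant. Both `T_SHIRTS` and `EUROPEAN_CITIES` are hypothetical ontology fragments written for this example; which items the real model classifies as t-shirts is not stated in the post, so the set below (items named "Tee" or "T-Shirt") is an assumption.

```python
from collections import defaultdict

# Item, Quantity, City and Trademark columns copied from the sample table.
SALES = [
    ("Sleeve Shirt", 11, "Warsaw", "Lacoste"),
    ("Men's Dark One Button Suit", 15, "Krakow", "Armani"),
    ("Solid Polo Shirt", 18, "Krakow", "Gucci"),
    ("Men's Mallow Graphic Tee", 9, "Warsaw", "Nike"),
    ("I'm Bob Graphic Tee", 5, "Berlin", "Zara"),
    ("21 Years Old Women's Dark T-Shirt", 7, "Alicante", "Armani"),
    ("Hanes Men's Comfortblend EcoSmart Jersey Polo, 2 Pack", 11, "Berlin", "Chanel"),
    ("Men's Short Sleeve Stripe Polo", 7, "Boston", "Tommy Hilfiger"),
    ("Women's Button Down Roll Tab Shirt", 12, "Munchen", "Lacoste"),
    ("Men's jeans", 17, "Munchen", "Nike"),
    ("Men's Geometric Print Short Sleeve Shirt", 12, "Alicante", "Armani"),
    ("Men's Sasquatch Hunter Graphic Tee", 10, "Boston", "Gucci"),
    ("Women's Essential V-Neck Tee", 3, "Madrid", "Nike"),
    ("Men's Bass Guitar Guy Graphic Tee", 2, "Krakow", "Zara"),
    ("Men's Essential shirt", 2, "Boston", "Zara"),
    ("Women's Scoopneck Tee 2-Pack", 6, "Munchen", "Chanel"),
]

# Hypothetical ontology fragments (assumptions, not the real model).
T_SHIRTS = {
    "Men's Mallow Graphic Tee", "I'm Bob Graphic Tee",
    "21 Years Old Women's Dark T-Shirt", "Men's Sasquatch Hunter Graphic Tee",
    "Women's Essential V-Neck Tee", "Men's Bass Guitar Guy Graphic Tee",
    "Women's Scoopneck Tee 2-Pack",
}
EUROPEAN_CITIES = {"Warsaw", "Krakow", "Berlin", "Munchen", "Alicante", "Madrid"}


def t_shirt_quantity_in_europe(sales, by="city"):
    """Sum Quantity of t-shirt items sold in Europe, grouped by city or brand."""
    group_column = {"city": 2, "brand": 3}[by]
    totals = defaultdict(int)
    for row in sales:
        item, quantity, city = row[0], row[1], row[2]
        if city in EUROPEAN_CITIES and item in T_SHIRTS:
            totals[row[group_column]] += quantity
    return dict(totals)


per_city = t_shirt_quantity_in_europe(SALES)               # default grouping
per_brand = t_shirt_quantity_in_europe(SALES, by="brand")  # "by brand"
print(per_city)
print(per_brand)
```

Both groupings cover the same rows, so their totals agree; only the key used for summing changes.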

You can go further and make a consistency check by retrieving all t-shirts from the Item column:

Semantic Modeling

The key feature of Ask Data Anything is that it adds semantic layers on top of the data. These layers are not explicit in the data itself, so they increase its dimensionality and widen the possibilities for data exploration.

This way we can ask queries such as Price averaged in French-Brands, where French-Brands is a modeled instance of some "brands-by-origin" concept. It adds a grouping abstraction over the (modeled) brand instances that is not otherwise present in the data. Here we have models for the brand instances "Chanel" and "Lacoste", so the average is computed over all occurrences of these two brands in the data.
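The French-Brands example can be sketched in Python as a lookup in a modeled group followed by an average. The `BRAND_GROUPS` dictionary is a hypothetical stand-in for the "brands-by-origin" model; only the membership of Chanel and Lacoste comes from the text above.

```python
from statistics import mean

# Trademark and Price columns copied from the sample table, in row order.
SALES = [
    ("Lacoste", 543), ("Armani", 1395), ("Gucci", 580), ("Nike", 163),
    ("Zara", 73), ("Armani", 386), ("Chanel", 425), ("Tommy Hilfiger", 820),
    ("Lacoste", 244), ("Nike", 184), ("Armani", 975), ("Gucci", 180),
    ("Nike", 147), ("Zara", 86), ("Zara", 754), ("Chanel", 448),
]

# Hypothetical semantic layer: a named group of brand instances.
BRAND_GROUPS = {"French-Brands": {"Chanel", "Lacoste"}}


def price_averaged_in(sales, group):
    """Average Price over every row whose brand belongs to the named group."""
    members = BRAND_GROUPS[group]
    return mean(price for brand, price in sales if brand in members)


french_average = price_averaged_in(SALES, "French-Brands")
print(french_average)
```

The grouping lives entirely in `BRAND_GROUPS`: nothing in the table itself says that Chanel and Lacoste belong together, which is the extra dimension the semantic layer contributes.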

In summary, Ask Data Anything can handle queries involving:
    1. Location
    2. Time
    3. User-defined concepts
    4. Instances of predefined concepts
It extracts this information from queries to perform the appropriate chain of actions: any mix of projection, sub-setting, grouping and/or aggregation.

Watch a quick overview of Ask Data Anything: https://goo.gl/XnaIq3
To learn more about Semantic Technologies visit: http://goo.gl/7pkWIQ

