Wednesday, 28 October 2015

Ask Data Anything - Election results example

In modern organizations, data management is both a major challenge and a major resource. In our experience, the first problem facing a business that wants to use its data is getting a unified view of that data. Data inside an organization is generally stored in different databases, often with proprietary APIs, which makes it difficult to move from one database to another. Furthermore, even when the same technology is used to store the data, semantic problems remain, such as different terminologies and languages.


The larger the company, the harder it is to standardize procedures so that these situations do not arise. This is because people naturally interpret data through their own experience and knowledge. We cannot expect the technical team to name every part of a car using exactly the same terminology as the logistics department. This is why our solution aims to standardize the way end users interact with the data without changing the source of the data.

Ask Data Anything (ADA) allows companies to add a semantic layer on top of their data without copying it. The product handles term disambiguation, aggregation of data using hierarchies defined in ontologies, and data integration between different data sources.
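To make term disambiguation and ontology hierarchies more concrete, here is a minimal Python sketch. It is not ADA's implementation: the synonym table, the parent hierarchy, and the function names `canonical` and `ancestors` are all hypothetical, loosely based on the car-parts example above.

```python
# Hypothetical synonym table: department-specific terms mapped to one concept.
SYNONYMS = {
    "bonnet": "hood",
    "boot": "trunk",
}

# Hypothetical hierarchy: each concept points to its parent concept.
PARENT = {
    "hood": "body",
    "trunk": "body",
    "body": "car",
}


def canonical(term):
    """Return the canonical concept for a term (unknown terms map to themselves)."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def ancestors(concept):
    """Walk up the hierarchy and return all parent concepts, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


if __name__ == "__main__":
    print(canonical("Bonnet"))             # canonical concept for a synonym
    print(ancestors(canonical("Bonnet")))  # concepts it can be aggregated under
```

With a mapping like this, two departments can keep their own vocabulary while queries are resolved against a single concept, and values can be rolled up along the hierarchy.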

This article presents the ADA tool using sample data about the 2014 election results in the Leeds borough, taken from DATA.GOV.UK (http://data.gov.uk/dataset/election-results). It shows example operations that can be performed over the data: first summarizing results, then aggregation, and finally projection. Each section includes example queries and a view of the results. The combined knowledge, consisting of the information from the data file (for example a CSV file) and a taxonomy in the form of a CNL ontology, allows ADA to answer statistical queries such as parties' results in different regions, as well as results for specific candidates. ADA can handle queries involving time, location, and both predefined and user-defined concepts.


Summarizing data

One option is to ask ADA to summarize the parties' overall results and present them on a pie chart. The query result is computed from the data in the data file and the semantically modelled taxonomy. With that knowledge, ADA's engine can recognize the concepts included in the query and perform aggregation on demand.

Example query:
Summarize party by result on piechart



Performing aggregation

Each result can be presented in several output types: table, histogram, pie chart, or map. It is also possible to perform different types of aggregation on numerical values, such as sum or average. The query below returns the overall sum of votes for each party in the form of a histogram:

Example query:
Sum votes by party on histogram
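Conceptually, this query is a group-by followed by a sum. The Python sketch below shows that computation; it is not how ADA evaluates queries internally. The rows, party names, and vote counts are invented for illustration and are not the real Leeds figures, and `sum_votes_by_party` is a hypothetical helper name.

```python
from collections import defaultdict

# Invented rows for illustration only; not taken from the real data set.
ROWS = [
    {"area": "Kirkstall", "party": "Party A", "votes": 1200},
    {"area": "Kirkstall", "party": "Party B", "votes": 800},
    {"area": "Beeston", "party": "Party A", "votes": 500},
    {"area": "Beeston", "party": "Party B", "votes": 950},
    {"area": "Beeston", "party": "Party C", "votes": 300},
]


def sum_votes_by_party(rows):
    """Group rows by party and add up the votes in each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["party"]] += row["votes"]
    return dict(totals)


if __name__ == "__main__":
    for party, votes in sum_votes_by_party(ROWS).items():
        print(party, votes)
```

Replacing the sum with an average would only change the accumulation step; the grouping by party stays the same.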


Limiting the aggregation area

The domain of an operation can be limited to a specific area declared in the taxonomy. Here we present the result for a question similar to the previous one, but using only the votes from the Kirkstall borough. This time the output type is a table, which is the default option.

Example query:
Sum votes by party in Kirkstall



It is possible to combine restrictions to answer more sophisticated queries. Once again one can ask for the votes given to each party, but this time the result covers two boroughs: Kirkstall and Beeston.

Example query:
Sum votes by party in Kirkstall and Beeston on histogram
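The area restrictions in this section can be thought of as a filter applied before the aggregation. The sketch below assumes a hypothetical taxonomy, `AREAS`, that maps each area name to the location values it covers. The rows and vote counts are invented, and `sum_votes_by_party_in` is an illustrative helper, not part of ADA.

```python
from collections import Counter

# Hypothetical taxonomy: each area name maps to the location values it covers.
AREAS = {
    "Kirkstall": {"Kirkstall"},
    "Beeston": {"Beeston"},
}

# Invented rows for illustration only; not taken from the real data set.
ROWS = [
    {"area": "Kirkstall", "party": "Party A", "votes": 1200},
    {"area": "Kirkstall", "party": "Party B", "votes": 800},
    {"area": "Beeston", "party": "Party A", "votes": 500},
    {"area": "Beeston", "party": "Party B", "votes": 950},
    {"area": "Otley", "party": "Party A", "votes": 400},
]


def sum_votes_by_party_in(rows, *area_names):
    """Sum votes per party, keeping only rows located in the named areas."""
    allowed = set()
    for name in area_names:
        allowed |= AREAS[name]
    totals = Counter()
    for row in rows:
        if row["area"] in allowed:
            totals[row["party"]] += row["votes"]
    return dict(totals)


if __name__ == "__main__":
    print(sum_votes_by_party_in(ROWS, "Kirkstall"))
    print(sum_votes_by_party_in(ROWS, "Kirkstall", "Beeston"))
```

Because the restriction is resolved through the taxonomy rather than hard-coded, adding a new named area only requires a new entry in the mapping.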


Performing projection over data

Another way to change the meaning of a query is to use the "with" keyword. The clause added after "with" filters the output so that it contains only specific results, as in the query below:

Example query:
Surname in Alwoodley with votes > 1000


Besides strictly statistical data, one can also ask for more specific information. The example below shows all parties whose candidates received mandates in the Otley borough.

Example query:
Party in Otley with mandate > 0
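Both "with" queries in this section amount to selecting one column from the rows that satisfy an area restriction and a comparison. The following Python sketch illustrates that reading under stated assumptions: the column names, surnames, parties, and numbers are invented, and `project` is a hypothetical helper rather than ADA's API.

```python
import operator

# Comparison operators assumed to be usable in a "with" clause.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

# Invented rows for illustration only; not taken from the real data set.
ROWS = [
    {"area": "Alwoodley", "surname": "Smith", "party": "Party A", "votes": 1500, "mandate": 1},
    {"area": "Alwoodley", "surname": "Jones", "party": "Party B", "votes": 700, "mandate": 0},
    {"area": "Alwoodley", "surname": "Brown", "party": "Party C", "votes": 1100, "mandate": 0},
    {"area": "Otley", "surname": "Taylor", "party": "Party B", "votes": 2000, "mandate": 1},
    {"area": "Otley", "surname": "Evans", "party": "Party B", "votes": 1800, "mandate": 1},
    {"area": "Otley", "surname": "Clark", "party": "Party A", "votes": 400, "mandate": 0},
]


def project(rows, column, area, field, op, value):
    """Return distinct values of `column` for rows in `area` where `field op value` holds."""
    compare = OPS[op]
    matches = (
        row[column]
        for row in rows
        if row["area"] == area and compare(row[field], value)
    )
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    print(project(ROWS, "surname", "Alwoodley", "votes", ">", 1000))
    print(project(ROWS, "party", "Otley", "mandate", ">", 0))
```

Removing duplicates matters for the second query: several candidates from the same party may hold mandates, but the party should be listed once.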



Summary


Ask Data Anything supports data exploration by applying a semantic layer on top of the raw data, which makes it possible to run an analysis without explicitly stating all the information. It allows a variety of operations, such as aggregation or projection, to be performed over a data set when needed. ADA lets users formulate queries intuitively in natural language and presents the results in a convenient form (table, histogram, map, or pie chart). Expanding the taxonomy makes it possible to ask more complex questions and extends the knowledge base with minimal effort.
