Wednesday 31 July 2013

Now Fluent Editor matches not only your needs! Regular expressions.

Since version 2.3.17, Fluent Editor has a new feature: a regular expression matcher.
This functionality lets you specify not just one particular string as an attribute value, but a whole set (or class) of strings defined by a regular expression.



Regular expressions

Regular expressions have a strong theoretical background in computer science and mathematical linguistics. They are tied to the Chomsky hierarchy of languages: the languages they describe are exactly the regular languages (see Chomsky hierarchy).
You can learn about regular expressions and their syntax e.g. here.
Broadly speaking, a regular expression lets you specify a pattern that matches one, many or even infinitely many strings. This is an easy way to validate, for example, e-mail addresses and phone numbers.

Regular expressions in Fluent Editor

Since Fluent Editor version 2.3.17, string attributes can be defined as regular expression patterns, both in ontologies and in questions. This is done with a new keyword:



that-matches-pattern


This keyword may appear instead of the "equal-to" keyword. You can attach such a patterned attribute to an instance or a concept, or use it in a question.
Now it is time for a really quick "Hello World" for regular expressions in Fluent Editor.
Let's create an ontology:



Tom has-name equal-to 'Tommy'.
Jerry has-name equal-to 'Jerry'.
Every-single-thing that has-name that-matches-pattern '.*rry' is a mouse.
Every cat has-name that-matches-pattern '(T|J)o(n|m){1,2}y'.
Max is a cat.
Max has-name equal-to 'Max'.


In the above ontology there are 2 regular expressions. The first one ('.*rry') describes all strings that finish with 'rry'. So it matches 'Jerry' as well as 'blabla23424rry'.
The second one ('(T|J)o(n|m){1,2}y') matches any string that starts with either 'T' or 'J', continues with 'o', then has one or two characters, each of which is either 'n' or 'm', and finishes with 'y'. So the strings matched by this regular expression are exactly the following twelve: Tony, Tomy, Tonny, Tonmy, Tomny, Tommy, Jony, Jomy, Jonny, Jonmy, Jomny and Jommy.
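To double-check which strings the cat pattern covers, here is a minimal sketch in Python. It assumes that Python's re.fullmatch behaves like Fluent Editor's matcher for these simple patterns (the whole string has to match), which is an assumption and not something taken from the Fluent Editor documentation. The helper name matched_strings is made up for this illustration. It simply tries every string over the letters the pattern can produce, up to length 5.

```python
import re
from itertools import product

# Pattern from the ontology above; ALPHABET holds only the letters it can produce.
CAT_PATTERN = r"(T|J)o(n|m){1,2}y"
ALPHABET = "TJonmy"


def matched_strings(pattern, alphabet, max_len):
    """Brute force: return every string up to max_len that the pattern fully matches."""
    compiled = re.compile(pattern)
    found = []
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            # fullmatch: the whole candidate must match, not just a prefix
            if compiled.fullmatch(candidate):
                found.append(candidate)
    return found


# The longest possible match has 5 characters, e.g. 'Tommy'.
cat_names = matched_strings(CAT_PATTERN, ALPHABET, 5)
print(len(cat_names), sorted(cat_names))
```

The search finds twelve strings, including the mixed ones such as 'Tonmy' and 'Tomny', because each of the one or two middle characters is chosen independently.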
Let's ask some questions. The first one is a warm-up:


Who-Or-What has-name that-matches-pattern '.*ry'?

It returns only 'Jerry'. Why not 'mouse'? Because '.*ry' matches e.g. 'Kery', which is not matched by '.*rry', so something named Kery is not necessarily a mouse (although it can be one).
The second question is the first one, slightly modified:

Who-Or-What has-name that-matches-pattern '.*erry'?

Now it returns 'Jerry' and 'mouse'. 'Jerry' as a result is obvious. 'mouse' is obvious too, because every string that is matched by '.*erry' (finishes with 'erry') is also matched by '.*rry' (finishes with 'rry'), so everything that fits the question is a mouse.
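The difference between the first two questions comes down to whether the question pattern is contained in the mouse pattern '.*rry'. The sketch below looks for a counterexample by brute force, in Python. It is only a bounded search over a tiny alphabet, so finding nothing is evidence and not a proof, and it says nothing about how Fluent Editor's reasoner decides this internally. The function name find_counterexample and the chosen alphabet are assumptions made for this illustration.

```python
import re
from itertools import product


def find_counterexample(narrow, broad, alphabet, max_len):
    """Return a string fully matched by `narrow` but not by `broad`, or None.

    Only strings up to max_len over `alphabet` are tried, so None means
    "no counterexample found within the bound", not a proof of inclusion.
    """
    narrow_re = re.compile(narrow)
    broad_re = re.compile(broad)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if narrow_re.fullmatch(candidate) and not broad_re.fullmatch(candidate):
                return candidate
    return None


ALPHABET = "Kery"

# First question: '.*ry' is NOT contained in '.*rry', so 'mouse' is not returned.
first = find_counterexample(r".*ry", r".*rry", ALPHABET, 4)

# Second question: no string ending in 'erry' escapes '.*rry' within the bound.
second = find_counterexample(r".*erry", r".*rry", ALPHABET, 6)

print(first, second)
```

For the first question the search already stops at a very short string, and 'Kery' is another such counterexample. For the second question nothing is found, which agrees with the argument that a string ending in 'erry' always ends in 'rry'.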
The third question:

Who-Or-What has-name that-matches-pattern '[A-Z]+[a-z]*'?

The regular expression used above matches all strings that start with one or more uppercase letters followed by zero or more lowercase letters.
This query returns 'Tom', 'Jerry' and 'Max' (which are obvious) and 'cat' (because if a name satisfies '(T|J)o(n|m){1,2}y' it also matches '[A-Z]+[a-z]*').
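The "obvious" part of this answer can be reproduced with a few lines of Python. This is only a sketch: the dictionary NAMES restates the names asserted in the ontology, instances_matching is a hypothetical helper, and Python's re.fullmatch stands in for Fluent Editor's matcher.

```python
import re

# Names asserted in the ontology (instance -> has-name value).
NAMES = {"Tom": "Tommy", "Jerry": "Jerry", "Max": "Max"}
QUERY = r"[A-Z]+[a-z]*"


def instances_matching(pattern, names):
    """Return the instances whose asserted name fully matches the pattern."""
    return [who for who, name in names.items() if re.fullmatch(pattern, name)]


answers = instances_matching(QUERY, NAMES)
print(answers)

# 'mouse' is not part of the answer: a name allowed by '.*rry'
# does not have to fit the question pattern.
mouse_witness = "blabla23424rry"
print(re.fullmatch(r".*rry", mouse_witness) is not None)  # fits the mouse pattern
print(re.fullmatch(QUERY, mouse_witness) is not None)     # does not fit the question
```

The witness string starts with a lowercase letter and contains digits, so a mouse could carry a name that the question does not cover.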

Now it is time for the tricky question:

Who-Or-What has-name that-matches-pattern '[TJonm]*y?'?
The regular expression '[TJonm]*y?' matches any string that consists of zero or more 'T', 'J', 'o', 'n' or 'm' letters and may finish with 'y' ('?' indicates that there are zero or one occurrences of 'y'). So this regular expression matches such strings as 'Tommy' or 'omnomnom'. It also matches every string generated by '(T|J)o(n|m){1,2}y'. So it is no surprise that 'Tom' and 'cat' are returned by this query.
But why does this query return 'Max'? Admittedly, the only name asserted for Max does not match the regular expression in the question, but the truth is that all cats fit the query and Max is a cat. Because of the Open World Assumption (see OWA), Max must have another, unknown name that matches the regular expression in the query.
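The two routes by which an instance ends up in this answer can be imitated in Python. This is a rough sketch of the reasoning described above and not Fluent Editor's actual algorithm: ASSERTED_NAMES, CATS and CAT_NAMES are hand-written restatements of the ontology, and re.fullmatch is assumed to behave like the editor's matcher.

```python
import re

QUERY = r"[TJonm]*y?"

# Restated from the ontology: asserted names and the known cats.
ASSERTED_NAMES = {"Tom": "Tommy", "Jerry": "Jerry", "Max": "Max"}
CATS = ["Max"]

# The twelve strings generated by '(T|J)o(n|m){1,2}y'.
CAT_NAMES = [
    first + "o" + middle + "y"
    for first in "TJ"
    for middle in ("n", "m", "nn", "nm", "mn", "mm")
]


def matches(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Route 1: the asserted name itself fits the question.
direct = [who for who, name in ASSERTED_NAMES.items() if matches(QUERY, name)]

# Route 2: every possible cat name fits the question, so every cat is returned,
# whatever its (possibly unknown) matching name is.
every_cat_name_fits = all(matches(QUERY, name) for name in CAT_NAMES)
inferred = [cat for cat in CATS if every_cat_name_fits and cat not in direct]

print(direct, inferred)
```

Tom is found through his asserted name, while Max is found only through the second route: 'Max' itself does not fit the question, but every cat is required to have some name that does.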

If you want to learn more about Fluent Editor CNL-EN grammar, visit this link.

*) FluentEditor 2, an ontology editor, is a comprehensive tool for editing and manipulating complex ontologies that uses Controlled Natural Language. Fluent Editor offers an alternative to XML-based OWL editors that is more suitable for human users. Its main feature is the use of Controlled English as a knowledge modeling language. Supported by the Predictive Editor, it prevents the user from entering any sentence that is grammatically or morphologically incorrect and actively helps the user while writing sentences. Controlled English is a subset of Standard English with restricted grammar and vocabulary, designed to reduce the ambiguity and complexity inherent in full English.