Getting started with semantic queries

ESCG editor

Snow Owl includes an editor and execution environment for Extended SNOMED CT Compositional Grammar (ESCG) expressions. ESCG is a formal grammar for composing expressions from operators and defined concept identifiers, and it can be used for semantic querying. All of the operators and grammar constructs are supported as defined in the NHS LRA specification, which is itself an extension of the HL7 TermInfo specification. ESCG expressions are useful because they let you query concepts by their relationships rather than by their human-readable descriptions.

This chapter will help you become familiar with ESCG by guiding you through a series of sample queries. These and other examples are also available in the B2i examples folder in the Project Explorer view.

Opening the editor and creating a query file

To create and edit ESCG expressions, you can use either the embedded ESCG editor or the standalone ESCG editor.

Embedded ESCG editor

The embedded ESCG editor is called embedded because it is part of the Advanced Search and can be accessed through the search dialogue. To start the editor:

  1. Click the search button in the main toolbar to bring up the search dialogue. For doing advanced searches, please refer to the Search dialogue section.
  2. Choose the ESCG Query tab in the SNOMED CT Concept Search field to open the ESCG editor.

Embedded ESCG editor

Even though the embedded editor has the same query functions as the standalone editor, it has a few limitations. For example, it cannot save query scripts and it does not offer quick fixes.

Standalone ESCG editor

Before you can start using the standalone ESCG editor, you need to create a project and a file with an .escg extension.

To create a new project, right-click anywhere in the Project Explorer to bring up the context menu.

Opening the New Project wizard

Specify the project name, then click Finish. Your new project will now appear in the Project Explorer view and you are ready to create a file with an .escg extension.

Specifying the file name with .escg extension

Double-clicking the new .escg file will launch the standalone ESCG editor.

Standalone ESCG Editor

Working with the ESCG Editor

Entering the script

Since the query editor is a text field, you can simply enter your query script. However, it's much easier to use the content assist:

To bring up the content assist menu, press Ctrl + Space. The menu only displays selections that are valid at the current position of your cursor, e.g. operators that can be used at the active part of your query.

Content assist in the ESCG Editor

There are two different ways of adding a concept to your query script.

Tip: Any concept within the ESCG expression can act as a hyperlink if you hold down the Ctrl key (Cmd on Mac OS X) and hover over the concept. Clicking the hyperlink reveals the concept in the SNOMED CT Concepts view and opens it in an editor.

Colored Syntax, Validation and Quick fix

Apart from the content assist, we included a few more features that facilitate semantic querying.

Queries are validated instantaneously as you enter the expression:

By clicking on the Quick-fix icon on the margin of the editor, the incorrect ID can be easily fixed by picking the correct ID or description from the SNOMED CT store (see screenshot below). Please note that quick fix is only available in the standalone editor.

Validation and quick fix for incorrect concept

The query text is colored based on the syntax to make it more readable. The most important default colors are:

You can customize your syntax colors on the respective preference page, which is accessible via the main menu File > Preferences > Snow Owl > ESCG Editor > Syntax Coloring.

Syntax coloring

Executing a query and saving the script

To run the query, click the Execute button on the main toolbar. If you are using the embedded expression editor of the search dialogue, you need to click the Search button to perform your query.

It might be useful to save your query script so that you can easily update your search results when your release data changes. Unsaved changes are indicated by a small asterisk next to the title. To save your query script, click the save button in the main toolbar. The embedded editor can only execute queries; it cannot save the script. If you plan to save your script, create an .escg file in the standalone editor.

If you want to create another query, you need to create a new .escg file as described in the previous section. There is also a copy and paste function available in the context menu, which is handy if you want to re-use parts of a script for a new query. To bring up the context menu, right-click the file you want to copy and select Copy. Now select the folder you want to copy the new file into, right-click and select Paste. This creates an identical copy of your query script, which you can then edit.

Results of the query

The results of your query will be displayed in the Search view where you can see the number of results and the execution time. You can filter the results in the text field on the top of the view or sort them by clicking on the top of the column (e.g. sort by ID). Use the context menu to bookmark your query results or to add them to a reference set.

Search View displaying the results of a query

Do you speak ESCG?

No? Don't worry, this section will give you a step by step introduction to the Extended SNOMED Compositional Grammar. If you already know ESCG and just want to have a brief overview, we recommend the summary at the end of this section.

The three basics: Operators, IDs and optional text

Let's start with a simple query that will retrieve all SNOMED CT concepts:

<<138875005|SNOMED CT Concept|

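This expression has three different components: the operator (<<), the concept ID (138875005), and the optional text between vertical bars (|SNOMED CT Concept|).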

When you run the query, you will get over 305,000 results; the exact number depends on the release of SNOMED CT you are working with.

For computing purposes, the concept ID (138875005) and the operator (<<) are sufficient. Text between vertical bars is optional, which means it is not processed when the query runs. In practice, the name of a concept is usually added as optional text. This makes an expression easier to read, especially when you are working with several concepts.
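
To illustrate the three components, here is a minimal Python sketch that splits a simple expression into operator, concept ID and optional text. The regular expression and the function name split_expression are assumptions made for this example. They cover only a single focus concept and are not the grammar Snow Owl uses.

```python
import re

# Hypothetical pattern for ONE focus concept only:
# optional operator (<< or <), numeric concept ID, optional |text|.
SIMPLE_EXPRESSION = re.compile(
    r"^\s*(?P<operator><<|<)?\s*(?P<concept_id>\d+)\s*(?:\|(?P<term>[^|]*)\|)?\s*$"
)

def split_expression(expression):
    """Split a simple expression into operator, concept ID and optional text."""
    match = SIMPLE_EXPRESSION.match(expression)
    if match is None:
        raise ValueError(f"not a simple focus-concept expression: {expression!r}")
    return match.groupdict()

with_text = split_expression("<<138875005|SNOMED CT Concept|")
without_text = split_expression("<<138875005")

print(with_text)
# The operator and ID are the same with or without the optional text.
print(with_text["concept_id"] == without_text["concept_id"])
```

In this model, dropping the text between the vertical bars changes only the term field, which matches the statement that the text is not needed for computing.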

Retrieving concepts and reference set members

As you already know, the << operator retrieves the concept and all of its subtypes. If you want to retrieve only the subtypes of a concept but not the concept itself, use the < operator. Try running this query and look at the number of results:

<138875005|SNOMED CT Concept|

You should have retrieved one concept less than in the previous query because the SNOMED CT root concept was excluded.
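
The difference between the two operators can be sketched in Python over a toy is-a hierarchy. The hierarchy below contains only four concepts and the function names are assumptions for this example; a real release contains hundreds of thousands of concepts, and this is not how Snow Owl evaluates queries.

```python
# Toy is-a hierarchy (child ID -> parent IDs), a tiny illustrative subset.
IS_A = {
    "404684003": ["138875005"],  # Clinical finding -> SNOMED CT Concept
    "71388002": ["138875005"],   # Procedure -> SNOMED CT Concept
    "64572001": ["404684003"],   # Disease -> Clinical finding
}

def descendants(concept_id):
    """Model of '<': all subtypes, excluding the concept itself."""
    found = set()
    frontier = [concept_id]
    while frontier:
        current = frontier.pop()
        for child, parents in IS_A.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def descendants_or_self(concept_id):
    """Model of '<<': the concept plus all of its subtypes."""
    return descendants(concept_id) | {concept_id}

ROOT = "138875005"
print(len(descendants_or_self(ROOT)))  # 4
print(len(descendants(ROOT)))          # 3
```

As in the real query, the result of < is exactly one concept smaller than the result of <<.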

These operators work for any SNOMED concept. If you want to retrieve all clinical findings, use this query:

<404684003|Clinical finding|

These operators are not restricted to the focus concepts of the expression. You can also use them in the refinements, either on the relationship type (e.g. <Associated with searches for concepts with the relationship type After, Causative agent or Due to, since those are children of Associated with) or on the value (e.g. <<Structure of cardiovascular system).

The caret operator ^ will list the members of a reference set. Here is an example for retrieving the members of the Cardiology reference set:

^152725851000154106|Cardiology reference set|

Intersection and Union

The intersection operator (+) is used on the left hand side of the expression (before the colon) to combine focus concepts and provide a domain intersection, to which further refinements can apply.

This query will retrieve Clinical findings that are also members of the Cardiology refset.

<404684003|Clinical finding| + ^152725851000154106|Cardiology reference set|

If you want to do the opposite and get all Clinical findings except the members of the reference set, you would put the exclusion operator (!) in front of the part that you want to exclude.

<404684003|Clinical finding| + !^152725851000154106|Cardiology reference set|

The UNION operator combines the results of two queries. To find all Diseases and all Procedures, you would use the following query.

<<64572001|Disease| UNION <<71388002|Procedure|

It also works for combining queries that are more complicated like this one:

<404684003|Clinical finding| + !^447564002|Non-human simple reference set| : 
    363698007|Finding site| = <<113257007|Structure of cardiovascular system|
UNION
<<71388002|Procedure| + !^447564002|Non-human simple reference set| : 
    <<363704007|Procedure site| = <<113257007|Structure of cardiovascular system|

It retrieves all clinical findings and procedures that are related to the Cardiovascular structure and are relevant to humans. You can see that the Non-human reference set was excluded in both queries. Structure of Cardiovascular system was defined as the Finding and the Procedure site, respectively.
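
The three operators in this section behave like ordinary set operations on query results. The following Python sketch models them with made-up result sets; the member names are placeholders, not real SNOMED CT content.

```python
# Hypothetical result sets standing in for three sub-queries.
clinical_findings = {"finding-a", "finding-b", "finding-c"}   # <Clinical finding
cardiology_refset = {"finding-b", "finding-c", "procedure-x"} # ^Cardiology reference set
procedures = {"procedure-x", "procedure-y"}                   # <<Procedure

# A + ^B: only concepts that are in both result sets.
in_both = clinical_findings & cardiology_refset

# A + !^B: concepts of A that are not members of B.
not_in_refset = clinical_findings - cardiology_refset

# A UNION B: everything from either result set, without duplicates.
either = clinical_findings | procedures

print(sorted(in_both))        # ['finding-b', 'finding-c']
print(sorted(not_in_refset))  # ['finding-a']
print(len(either))            # 5
```

Note that the intersection and the exclusion split the clinical findings into two parts that do not overlap.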

Exclusion

When you want to omit concepts or members of a reference set from your query, you can use the ! operator. It will exclude the concept behind it.

If you are only interested in concepts that are relevant to humans, it's useful to exclude the members of the non-human reference set. To do this, use the !^ expression, which combines the ! exclusion operator with the ^ reference set operator and omits all members of the reference set.

You can also omit a sub-hierarchy by using the !<< expression, which omits the concept and all of its subtypes. For example, the expression !<<64572001|Disease| excludes the sub-hierarchy of diseases from your query results. To exclude only the subtypes and keep the concept itself, use !< instead.

Now let's look at some examples to see how these expressions are used in context. Suppose you want to find all Clinical findings that are not a Disease.

The full query is:

<<404684003|Clinical finding| + !<<64572001|Disease|

It works the same for members of a reference set: Let's exclude veterinary concepts from clinical findings.

The full query is:

<<404684003|Clinical finding| + !^447564002|Non-human simple reference set|

You can also use the exclusion to express negation, e.g. look for a set of concepts that do not have a particular relationship and value in their definition.

This query

<<404684003|Clinical finding|:
    246075003|Causative agent| =  !<<409822003|Bacteria|

will return all subtypes of Clinical finding that do not have a Bacteria causative agent. These concepts either have no causative agent at all, or have causative agents that are concepts other than Bacteria.
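
The two cases described above can be modelled in a few lines of Python. The data is invented for illustration, and the treatment of a concept that has both a bacterial and a non-bacterial agent (it is excluded here) is an assumption of this sketch rather than documented behavior.

```python
# Hypothetical data: concept -> causative agents (empty list = none defined).
CAUSATIVE_AGENTS = {
    "bacterial pneumonia": ["Streptococcus pneumoniae"],
    "viral pneumonia": ["virus"],
    "fracture of femur": [],
}

# Stand-in for the result of <<409822003|Bacteria|.
BACTERIA_OR_SUBTYPES = {"Bacteria", "Streptococcus pneumoniae"}

def has_no_bacterial_agent(concept):
    """Model of: Causative agent = !<<Bacteria.

    Assumption: one bacterial agent is enough to exclude the concept.
    """
    agents = CAUSATIVE_AGENTS[concept]
    return not any(agent in BACTERIA_OR_SUBTYPES for agent in agents)

matches = sorted(c for c in CAUSATIVE_AGENTS if has_no_bacterial_agent(c))
print(matches)  # ['fracture of femur', 'viral pneumonia']
```

The concept without any causative agent and the concept with a non-bacterial agent both match; only the bacterial one is dropped.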

Refinement

The refinement operator (:) is usually used in combination with the attribute value operator (=). These operators are useful when you want to restrict a query to concepts with certain attributes. For example, you can look for all Clinical findings that have a Finding site relationship with the target concept being the Cardiovascular system.

<404684003|Clinical finding|:
    363698007|Finding site| = <<113257007|Structure of cardiovascular system|

Let's look at another example: You want to find all bacterial infectious diseases of the lung. You would query for:

Make sure to use the << operator to include the children of the lung structure.

The entire query looks like this:

<<87628006|Bacterial infectious disease|:
    363698007|Finding site| = <<39607008|Lung structure|

You can narrow this query to a certain kind of bacterial infection by adding the causative organism. In our example, we will use Streptococcus pneumoniae as the Causative agent. To add another refinement, use a comma (,) as a separator. This query retrieves bacterial infectious diseases of the lung caused by Streptococcus pneumoniae.

<<87628006|Bacterial infectious disease|:
    363698007|Finding site| = <<39607008|Lung structure|, 
    246075003|Causative agent| = <<9861002|Streptococcus pneumoniae|

Let's do a more advanced query: How would you search for congenital autoimmune disorders?

<<404684003|Clinical finding|:
    246454002|Occurrence|  = 255399007|Congenital|,
    370135005|Pathological process| = <<263680009|Autoimmune|
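
Comma-separated refinements narrow the result because every one of them has to hold. The Python sketch below models this with invented concept definitions; the function name refine and the sample data are assumptions for illustration only.

```python
# Hypothetical definitions: concept -> {attribute: value}.
DEFINITIONS = {
    "pneumococcal pneumonia": {
        "Finding site": "Lung structure",
        "Causative agent": "Streptococcus pneumoniae",
    },
    "staphylococcal pneumonia": {
        "Finding site": "Lung structure",
        "Causative agent": "Staphylococcus aureus",
    },
    "pneumococcal meningitis": {
        "Finding site": "Meninges structure",
        "Causative agent": "Streptococcus pneumoniae",
    },
}

def refine(definitions, constraints):
    """Keep concepts whose definition satisfies every attribute constraint.

    constraints maps an attribute name to the set of allowed values,
    standing in for the result of the expression after '='.
    """
    return sorted(
        concept
        for concept, attributes in definitions.items()
        if all(attributes.get(attr) in allowed
               for attr, allowed in constraints.items())
    )

lung_only = refine(DEFINITIONS, {"Finding site": {"Lung structure"}})
lung_and_pneumococcus = refine(DEFINITIONS, {
    "Finding site": {"Lung structure"},
    "Causative agent": {"Streptococcus pneumoniae"},
})

print(lung_only)              # ['pneumococcal pneumonia', 'staphylococcal pneumonia']
print(lung_and_pneumococcus)  # ['pneumococcal pneumonia']
```

Adding the second constraint removes the concept that matches the finding site but not the causative agent.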

AND and OR

AND and OR are always used on the right hand side of the expression, in the refinements. You can use them to specify the relationship targets for your queries.

Let's take a look at a query that uses OR:

<<404684003|Clinical finding|:
    116676008|Associated morphology| = <<56208002|Ulcer| OR <<118622000|Fistula| 

It searches for all Clinical findings that have an Associated morphology relationship whose target is either Ulcer (or one of its subtypes) or Fistula (or one of its subtypes). This broadens your search results by allowing more targets for the same relationship type: the query uses a union of the two relationship targets.

AND is used similarly, but while OR broadens your search results, AND narrows them by specifying an intersection of the two relationship targets. For example:

<<404684003|Clinical finding|:
    116676008|Associated morphology| = <<56208002|Ulcer| AND <<23583003|Inflammation|

This query is searching for findings that have an associated morphology relationship, which has a target that is both an Ulcer and an Inflammation (e.g. Ulcerative inflammations).
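
The effect of OR and AND on the allowed relationship targets can be sketched in Python as a union and an intersection of value sets. The morphology sets and finding names below are invented for illustration and do not reflect actual SNOMED CT content.

```python
# Hypothetical stand-ins for the results of <<Ulcer, <<Fistula and <<Inflammation.
ULCER = {"Ulcer", "Chronic ulcer", "Ulcerative inflammation"}
FISTULA = {"Fistula", "Arteriovenous fistula"}
INFLAMMATION = {"Inflammation", "Ulcerative inflammation"}

# Hypothetical findings with their Associated morphology target.
MORPHOLOGY = {
    "finding-1": "Chronic ulcer",
    "finding-2": "Arteriovenous fistula",
    "finding-3": "Ulcerative inflammation",
    "finding-4": "Inflammation",
}

def with_morphology_in(allowed):
    """Return findings whose Associated morphology is one of the allowed targets."""
    return sorted(f for f, target in MORPHOLOGY.items() if target in allowed)

# OR: the target may come from either set (union).
ulcer_or_fistula = with_morphology_in(ULCER | FISTULA)

# AND: the target must be in both sets (intersection).
ulcer_and_inflammation = with_morphology_in(ULCER & INFLAMMATION)

print(ulcer_or_fistula)        # ['finding-1', 'finding-2', 'finding-3']
print(ulcer_and_inflammation)  # ['finding-3']
```

Only the finding whose morphology is a subtype of both Ulcer and Inflammation survives the AND.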

Summary: ESCG operators

Operator  Function
<<        Retrieves the concept and all of its subtypes
<         Retrieves all subtypes of this concept, but not the concept itself
|text|    Displays Preferred term of the concept to aid readability
^         Retrieves all the members of this reference set
+         Retrieves only concepts that are results of both expressions (intersection)
UNION     Combines the result set of two queries
!^        Excludes members of this reference set
!<<       Excludes this concept and all of its subtypes
!<        Excludes this concept's subtypes
=         Defines an attribute refinement, e.g. a finding site or a causative agent
:         Refines an attribute range, operator is used in combination with an attribute
AND       Used to express intersections of attribute ranges
OR        Used to express unions of attribute ranges

<< Retrieves the concept and all of its subtypes

<<138875005|SNOMED CT Concept|

< Retrieves all subtypes of this concept, but not the concept itself

<138875005|SNOMED CT Concept|

|text| Include preferred term or description to aid readability

|SNOMED CT Concept|

^ Retrieves all the members of this reference set

^152725851000154106|Cardiology reference set|

+ Retrieves only concepts that are results of both expressions (intersection)

<404684003|Clinical finding| + ^152725851000154106|Cardiology reference set|

UNION Combines the result set of two queries

<<64572001|Disease| UNION <<71388002|Procedure|

!^ Excludes members of this reference set

!^447564002|Non-human simple reference set|

!<< Excludes this concept and all of its subtypes

!<<64572001|Disease|

!< Excludes this concept's subtypes

!<64572001|Disease|

= Defines an attribute refinement, e.g. a finding site or a causative agent

363698007|Finding site| = <<113257007|Structure of cardiovascular system|

: Refines an attribute range, operator is used in combination with an attribute

<404684003|Clinical finding| :
    363698007|Finding site| = <<113257007|Structure of cardiovascular system|

AND Used to express intersections of attribute ranges

<<404684003|Clinical finding|:
    116676008|Associated morphology| = <<56208002|Ulcer| AND <<23583003|Inflammation|

OR Used to express unions of attribute ranges

<<404684003|Clinical finding|:
    116676008|Associated morphology| = <<56208002|Ulcer| OR <<118622000|Fistula|