# Making Unstructured Text Data Usable with Semantic Parsing

If you’ve seen any technology publication in the last few years, you’d
know this one incontrovertible truth: the robots are coming, and there’s
no stopping them. Even if they’re not literal robots, AI is everywhere,
and it’s probably sitting in your living room already, taking the form
of a Google Assistant or Amazon Echo.

If you are one of the 50 million people who have such a device, you might
consider running the following experiment. Ask it the following
questions, posed by Steven Pinker in his 1994 book *The Language
Instinct*:

The main lesson of thirty-five years of AI research is that the hard
problems are easy and the easy problems are hard… if you want to
stump an artificial intelligence system, ask it questions like: “Which
is bigger, Chicago or a breadbox? Do zebras wear underwear? Is the
floor likely to rise up and bite you? If Susan goes to the store, does
her head go with her?”

My Alexa told me she didn’t know for three of them, and read me the
summary portion of the Wikipedia page on underwear for the other. Pinker
lays out the heart of the problem:

Understanding a sentence is one of these hard easy problems. To
interact with computers we still have to learn their languages; they
are not smart enough to learn ours. In fact, it is all too easy to
give computers more credit at understanding than they deserve.

Although Alexa can’t yet tell us whether zebras wear underwear, natural
language understanding techniques have actually come quite a long way
since 1994. Computational linguists and data scientists can now turn
large amounts of unstructured text data into machine-readable
information, which can then be used to answer questions about the
apparel commonly worn by equines.

Turning text data into usable information requires a technique called
semantic parsing. In semantic parsing, a sentence is analyzed and
broken into its component parts, which are then labelled with the
relations between them. Those relations are then interpreted and fed
into a database. For example, take a sentence like “John gave a duck to
Mary”, which has been labelled with the part of speech for each word:

```
John gave a duck to Mary.
NOUN VERB ART NOUN PREP NOUN
```

At this point, there are two possibilities. One is to settle for a
**shallow semantic parse**, and the other is to strive for a **logical
semantic parse** (also called a **deep** semantic parse). A shallow
semantic parse uses the words and labels to infer the **semantic roles**
of the nouns in a sentence. You can think of a semantic role as the
“job” a noun is doing in the meaning described by the sentence. In the
example, *John* is the “giver”, *Mary* is the “recipient”, and the duck
is the “givee”. Which semantic roles belong with which kinds of verbs is
highly idiosyncratic, so shallow semantic parsing based on semantic role
labelling often requires looking up the roles for the verb in a resource
like FrameNet. Linguists have come up with annotation schemes that
capture the best generalizations across verbs, so the labelling would
look like the below:

```
John gave a duck to Mary.
Donor       Theme   Recipient
```
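The per-verb role lookup described above can be sketched as a toy labeller. This is purely illustrative, not a real FrameNet-backed system: the `FRAME_LEXICON` table and the `shallow_parse` function are hypothetical names invented for this sketch, and a real lexicon would cover far more verbs and grammatical patterns.

```python
# Hypothetical mini "frame lexicon": maps each verb to the semantic
# role carried by each of its grammatical slots, mimicking the kind of
# per-verb lookup a resource like FrameNet supports.
FRAME_LEXICON = {
    "give": {"subject": "Donor", "object": "Theme", "oblique": "Recipient"},
    "send": {"subject": "Sender", "object": "Theme", "oblique": "Recipient"},
}

def shallow_parse(verb, subject, obj, oblique):
    """Assign semantic roles to the nouns in each grammatical slot."""
    roles = FRAME_LEXICON[verb]
    record = {"event": verb}
    record[roles["subject"]] = subject
    record[roles["object"]] = obj
    record[roles["oblique"]] = oblique
    return record

print(shallow_parse("give", "John", "duck", "Mary"))
# {'event': 'give', 'Donor': 'John', 'Theme': 'duck', 'Recipient': 'Mary'}
```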


The semantic parse for this sentence could then be recorded in a
database as the following attribute-value pairs:

```json
{
  "Donor": "John",
  "Theme": "duck",
  "Recipient": "Mary",
  "event": "give"
}
```


In this format, the formerly unstructured text gains structure that can
be used in applications ranging from question answering systems, to
dialog systems, to automated construction of ontologies.
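As a toy illustration of why this structure is useful, a question like “who received the duck?” becomes a simple lookup over stored records. The `RECORDS` list and `who_received` function below are invented for this sketch; the field names follow the example record above.

```python
# A handful of stored shallow parses, in the attribute-value format
# shown above.
RECORDS = [
    {"event": "give", "Donor": "John", "Theme": "duck", "Recipient": "Mary"},
    {"event": "give", "Donor": "Mary", "Theme": "book", "Recipient": "Sue"},
]

def who_received(theme):
    """Answer 'who received the X?' by scanning the stored records."""
    return [r["Recipient"] for r in RECORDS
            if r["event"] == "give" and r["Theme"] == theme]

print(who_received("duck"))  # ['Mary']
```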

For certain applications, like turning natural language into
instructions for robots, a logical semantic parse is often necessary.
The goal of a logical semantic parse is to convert a natural language
sentence into its equivalent in a logical formalism such as first-order
predicate logic or an ontology format like RDF.

- First-order logic: `give(John, duck, Mary)`
- RDF:

```xml
<rdf:Description rdf:about="arg:John">
  <ns1:give rdf:resource="arg:duck"/>
  <ns1:give_to rdf:resource="arg:Mary"/>
</rdf:Description>
```
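The conversion from an attribute-value record to RDF-style triples can be sketched without any RDF library (in a real pipeline, a library such as rdflib would handle namespaces and serialization). The `record_to_triples` function is a hypothetical name for this sketch:

```python
def record_to_triples(record):
    """Centre the event on its Donor and emit one (subject, predicate,
    object) triple per remaining role, mirroring the RDF example."""
    subject = record["Donor"]
    verb = record["event"]
    return [
        (subject, verb, record["Theme"]),              # John give duck
        (subject, verb + "_to", record["Recipient"]),  # John give_to Mary
    ]

parse = {"event": "give", "Donor": "John", "Theme": "duck", "Recipient": "Mary"}
for s, p, o in record_to_triples(parse):
    print(f"arg:{s} ns1:{p} arg:{o}")
# arg:John ns1:give arg:duck
# arg:John ns1:give_to arg:Mary
```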


The advantage that logical semantic parsing has over shallow semantic
parsing is that it is a more complete and explicit representation of the
text, but it comes at the cost of much greater complexity and the need
for (possibly lots and lots of) labelled data. There are several ways of
creating such a parse; one way is to begin with a syntactic parse of the
sentence and then identify its principal parts – the subject,
predicate, and objects. Below is a dependency graph of the example
sentence created using the open-source Python library spaCy:

The dependency graph identifies the principal parts of a logical or RDF
representation by the dependency labels (`ROOT`, `nsubj`, `dobj`,
`dative`). The relations can then straightforwardly be translated into
an RDF or first-order logical representation.
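This translation step can be sketched in a few lines. The (token, label) pairs below are hard-coded to stand in for the dependency labels a parser like spaCy produces for the example sentence (a real system would read them off a parsed `Doc` object, and the attachment of *Mary* is simplified here: spaCy attaches it through the preposition *to*). The `to_logical_form` function is a name invented for this sketch:

```python
# Dependency-labelled tokens for "John gave a duck to Mary.",
# hand-coded to mimic parser output.
PARSE = [
    ("John", "nsubj"),   # subject of the verb
    ("gave", "ROOT"),    # the main predicate
    ("a",    "det"),
    ("duck", "dobj"),    # direct object
    ("Mary", "dative"),  # recipient (simplified attachment)
]

def to_logical_form(parse):
    """Build predicate(subject, object, recipient) from the labels."""
    by_label = {label: token for token, label in parse}
    lemma = "give"  # a real system would take the ROOT verb's lemma
    return f"{lemma}({by_label['nsubj']}, {by_label['dobj']}, {by_label['dative']})"

print(to_logical_form(PARSE))  # give(John, duck, Mary)
```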

If a shallow semantic parse is all that is needed, it is usually the
better option, as it is simpler and often less error-prone. Shallow
semantic parsing avoids facing one of the biggest problems in natural
language understanding – the inherent ambiguity of language – head-on,
while logical semantic parsing has to solve it or at least make informed
guesses about the correct meaning. Both techniques, however, are capable
of taking unstructured text data and converting it into a usable format.

Although AI is still a long way from possessing a conceptual structure
of the world that is capable of answering questions about the size of
cities versus breadboxes or whether hardwood floors possess fangs,
effective semantic parsing is a necessary step toward understanding,
storing, and retrieving unstructured text data.
