BIMA X Microsoft Masterclass – Language

By BIMA
18 Jun 2020

The language capabilities of AI are advanced, and every day they get smarter still. In this session, Phil Harvey, Senior Cloud Solution Architect for Data & AI in One Commercial Partner, Microsoft UK, and Lydia Gregory, CEO & Co-founder of FeedForward AI, explored the state of language understanding and text analytics. Then they considered the what, why and how of putting Microsoft Cognitive Services to work in your bots, translators, apps, readers and devices. These were the key points from an enthralling webinar:

The Meaning Barrier

How do you give a machine context?

John Searle said, “A digital computer executing a program cannot be shown to have a ‘mind’, ‘understanding’ or ‘consciousness’ regardless of how intelligently or human-like the program may make the computer behave.”

Meaning makes the whole thing hard. A computer has no mind and no meaning. But what we do have – in vast quantities – is data.

Yet when people started to explore data to find meaning, what they found instead was human bias. Language data unavoidably contains bias – which means we need to decide what to do about that. It’s one of the fundamental problems in using language.

Analysing language

You always need to treat language like any other data source: you need to know about the data before you try to use it. So how do we analyse language so we can use it as a dataset? Azure is Microsoft’s cloud platform, it includes the company’s machine learning and AI services, and language is a core part of them.

Text Analytics (TA), for example, provides the metadata that reveals key phrases, named entities, linked entities, sentiment and more in each piece of content. It’s built on deep learning technology, but the API lets users access key insights without having to do the deep learning themselves.
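To give a feel for what "access via the API" means, here is a minimal sketch that builds (but does not send) a sentiment request. It assumes the v3.0 REST route of Text Analytics and its subscription-key header; the endpoint host, the key and the function name are placeholders invented for illustration, so check the current service documentation before relying on any of them.

```python
import json
import urllib.request

def build_sentiment_request(endpoint, key, texts, language="en"):
    """Build a POST request for the (assumed) Text Analytics v3.0 sentiment route."""
    # Each document needs its own id so results can be matched back to inputs.
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    body = json.dumps({"documents": documents}).encode("utf-8")
    # Assumption: v3.0 route; other routes cover key phrases and entities.
    url = endpoint.rstrip("/") + "/text/analytics/v3.0/sentiment"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )

# Placeholder endpoint and key: replace with the values for your own resource.
request = build_sentiment_request(
    "https://example.cognitiveservices.azure.com/",
    "YOUR-KEY",
    ["The webinar was enthralling.", "The sound was poor at first."],
)
print(request.full_url)
```

Sending the request (for example with `urllib.request.urlopen(request)`) would return JSON with one result per document id. It is left out here because it needs a real resource and key.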

TA demonstrates that you don’t need to jump to deep learning too early. Looking at word clouds, counting sentence length, analysing parts of speech and relating the analysed data to other things are all valid and useful ways of understanding the data you’re working with.
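As an illustration of how far simple counting can go, the sketch below computes sentence counts, average sentence length and the most frequent words using only the Python standard library. The function name and the sample text are made up for this example, and the sentence splitting is deliberately crude.

```python
import re
from collections import Counter

def describe_text(text, top_n=3):
    """Return simple descriptive statistics for a piece of text."""
    # Split on sentence-ending punctuation; crude, but enough for a first look.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
    words = re.findall(r"[a-z']+", text.lower())
    lengths = [len(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "mean_sentence_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "top_words": Counter(words).most_common(top_n),
    }

sample = "Language is data. Treat language like data! Know the data before you use it."
stats = describe_text(sample)
print(stats)
```

The word counts are the same numbers a word cloud would visualise, so this is often a sensible first look at a new text dataset.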

Other things you can do to analyse language:

Deep learning: Microsoft’s NLP Recipes repository has lots of models and methodologies to help you harness deep learning for anything from text classification to sentiment analysis. One of the areas of greatest interest right now is text summarisation, which shows a fascinating split between generative summarisation (where the algorithm generates a new piece of text to summarise the old one, usually called abstractive summarisation) and reductive summarisation (which takes the text of, for example, an academic paper and boils it down to its most important passages, usually called extractive summarisation).

This is the current state of the art – the ability to take something written by a human that contains meaning, and boil it down via the algorithm to find the essence of that meaning.
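The reductive approach can be sketched without any deep learning at all. The example below is a simple word-frequency heuristic, not the models in the NLP Recipes repository: it scores each sentence by how common its content words are in the whole text and keeps the top scorers. The stopword list, function names and sample text are assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; real projects use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it",
             "that", "for", "on", "with", "as", "are", "this"}

def content_words(sentence):
    """Lowercase tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def extractive_summary(text, max_sentences=1):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [content_words(s) for s in sentences]
    freq = Counter(w for sent in tokens for w in sent)

    def score(i):
        # Average word frequency, so long sentences are not favoured by default.
        return sum(freq[w] for w in tokens[i]) / len(tokens[i]) if tokens[i] else 0.0

    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

text = ("Embeddings map words to vectors. "
        "Vectors let machines compare words. "
        "The weather was nice.")
summary = extractive_summary(text)
print(summary)  # Embeddings map words to vectors.
```

A generative summariser would instead write new sentences, which is where deep learning models come in.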

Word embeddings are a key tool in creating machine understanding, but they are also the frontline in the fight against bias. Word embeddings enable a machine to understand the relationship between words: e.g. sister is to brother as queen is to king.

But because embeddings can only be learned from language written by humans, bias creeps in. Gender specificity is built into the language of sister, brother, father, mother. But even for words that are not gender-specific by definition – nurse, doctor, vocalist, guitarist – the statistical usage in the language creates biases, resulting in default constructions such as ‘she is a nurse’. There’s lots of work in this space right now to filter out the bias and make embeddings more neutral.
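The analogy and the bias can both be shown with a toy example. The vectors below are hand-made for illustration (real embeddings are learned from large text corpora and have hundreds of dimensions), and the values for nurse and doctor are invented to mimic the kind of skew described above. The last function sketches one simple idea for reducing bias, removing the component of a vector that lies along a "gender direction"; it is an assumption-laden illustration, not the method discussed in the session.

```python
import math

# Hand-made toy vectors: [gender, royalty, sibling, profession].
# These values are invented for illustration only.
EMB = {
    "king":    [1.0, 1.0, 0.0, 0.0],
    "queen":   [-1.0, 1.0, 0.0, 0.0],
    "brother": [1.0, 0.0, 1.0, 0.0],
    "sister":  [-1.0, 0.0, 1.0, 0.0],
    "nurse":   [-0.5, 0.0, 0.0, 1.0],  # skewed, as biased usage would produce
    "doctor":  [0.5, 0.0, 0.0, 1.0],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def cosine(a, b): return dot(a, b) / (norm(a) * norm(b))

def analogy(a, b, c):
    """Return the word d such that a is to b as c is to d."""
    target = add(sub(EMB[b], EMB[a]), EMB[c])
    candidates = [w for w in EMB if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMB[w], target))

# A "gender direction" estimated from one explicitly gendered word pair.
_diff = sub(EMB["brother"], EMB["sister"])
GENDER_AXIS = [x / norm(_diff) for x in _diff]

def gender_lean(word):
    """Signed projection of a word onto the gender direction."""
    return dot(EMB[word], GENDER_AXIS)

def neutralise(vec):
    """Remove the component of vec that lies along the gender direction."""
    lean = dot(vec, GENDER_AXIS)
    return [x - lean * g for x, g in zip(vec, GENDER_AXIS)]

print(analogy("sister", "brother", "queen"))         # king
print(gender_lean("nurse"), gender_lean("doctor"))   # -0.5 0.5
print(neutralise(EMB["nurse"]))
```

In this toy setup "nurse" leans one way and "doctor" the other even though neither word is gendered by definition, which is exactly the statistical bias the paragraph above describes.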

Understanding language

LUIS, Microsoft’s Language Understanding Intelligent Service, is often used in bots, but it can also be used for analysis.

LUIS works out what a user is asking for (the intent) and picks out the key details in the request (the entities). This helps bots to drive the right actions, and a similar approach is now being applied to more complex machine comprehension tasks.
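To show how a bot might turn a language understanding result into an action, here is a small sketch. It assumes a response shaped like the JSON returned by the LUIS v3 prediction endpoint (a `prediction` object with `topIntent`, per-intent `score` values and `entities`); the intent name, entity name, handler and threshold are all hypothetical, and no call to the service is made.

```python
def book_table(entities):
    """Hypothetical handler for a hypothetical 'BookTable' intent."""
    size = entities.get("partySize", ["unknown"])[0]
    return f"Booking a table for {size}."

# Map intent names to the functions that act on them.
HANDLERS = {"BookTable": book_table}

def handle_prediction(response, handlers, threshold=0.6):
    """Dispatch to a handler if the top intent is known and confident enough."""
    prediction = response["prediction"]
    intent = prediction["topIntent"]
    score = prediction["intents"][intent]["score"]
    if score < threshold or intent not in handlers:
        # Fall back rather than act on a weak or unknown intent.
        return "Sorry, I didn't understand that."
    return handlers[intent](prediction.get("entities", {}))

# Hand-written sample in the assumed response shape.
sample_response = {
    "query": "book a table for two",
    "prediction": {
        "topIntent": "BookTable",
        "intents": {"BookTable": {"score": 0.92}, "None": {"score": 0.03}},
        "entities": {"partySize": ["two"]},
    },
}

reply = handle_prediction(sample_response, HANDLERS)
print(reply)  # Booking a table for two.
```

The confidence threshold is the design choice worth noting: acting only on high-scoring intents keeps the bot from taking the wrong action when the meaning is unclear.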

Generating language

Phil used a picture of a bird on a branch as the catalyst for generative text. He took blocks of text from Shakespeare, Dickens and Edgar Allan Poe and built a simple model using deep learning concepts to tell the algorithm which words typically follow on from others, and what their relationships mean. Then he added a description of the image with tags and objects to generate a poem:

“hideous and I could fathom, and keeping out the sagging, uneven bed and infernal. Yelling the lens into a raven having dwelt in it were laid, and do not good opinion, fat the devil, in fear nothing…”

It may not make a great deal of sense and the grammar is questionable, but it demonstrates what could be possible with a much more in-depth deep learning approach.
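The core idea of learning "which words typically follow on from others" can be sketched with a word-level Markov chain. This is a much simpler stand-in for Phil's model, not a reconstruction of it: it uses no deep learning and no image description, and the tiny corpus (the opening of Poe's The Raven) and function names are chosen purely for illustration.

```python
import random
from collections import defaultdict

CORPUS = ("Once upon a midnight dreary, while I pondered, weak and weary, "
          "over many a quaint and curious volume of forgotten lore")

def build_chain(text):
    """Map each word to the list of words seen directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=12, seed=0):
    """Walk the chain from `start`, choosing a random observed follower each step."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    out = [start]
    while len(out) < length:
        options = chain.get(out[-1])
        if not options:  # dead end: this word was never followed by anything
            break
        out.append(rng.choice(options))
    return " ".join(out)

chain = build_chain(CORPUS)
print(generate(chain, "a"))
```

Every pair of neighbouring words in the output appeared somewhere in the source text, yet the whole rarely makes sense, which is the same effect visible in the poem above.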

There are current examples – often in the scientific paper area – where the latest tech is performing well. But elsewhere, as Phil’s poem demonstrates, it’s very easy to spot the lapses in meaning. As humans, we’re excellent at spotting when something’s not right because the meaning isn’t quite there.

The quickest route to value?

Where does a business entering the AI language space for the first time begin? Phil had three tips:

  1. Get new, useful metadata with language analysis.
  2. Use things like Microsoft Azure Cognitive Services first.
  3. Then try something more advanced.

The Q&A

How do we find the right resource to make AI work?

Start by using Microsoft Cognitive Services, which make the tools accessible to developers who may not be specialists in the AI space. If you need more capability, Microsoft has an extensive Partner network, including many BIMA members, who can help.

If you’re working with your own machine learning models, skilling up your software engineers to be able to deploy those models becomes extremely important.

Where do we get the right binary from and where do we start?

A lot of the traditional pieces of software are available via API on a cost-per-call basis, and that can be a very cost-effective way to start. You can then look at downloading the models – Microsoft makes some of them available as containers so they fit into your containerised environment.

ML.NET is a great extension for .NET languages. NLTK is a widely used Python library. Natural for JavaScript is also a really nice library, and it’s really straightforward.

Which are the use cases where this really works?

Microsoft has been focusing on enterprise language, and its Partners have had success in the regulated industries space and in financial services because there are lots of companies with lots of data.

So these are relatively mature areas, but if your business is not in that world, you need to test to make sure it works. Where work is required, you’ll need to teach the model. To do that, you’ll need to decide whether to invest in helping your developers through that learning curve, or whether to call in a team who have already been through the process.

You can watch the complete event here.

Discover more about Microsoft’s Partner Programme here.

Please note, the initial sound issues are resolved after a couple of minutes!

 

 
