Oracle Digital Assistant – Better Data Quality with Data Manufacturing

One of the biggest challenges in creating a chatbot is training the model with good data. That means having a good set of utterances that represent the chatbot’s target personas, which is not easily achieved within the boundaries of a project. Fortunately, Oracle Digital Assistant (ODA) has a cool capability that lets you outsource the gathering of data. It’s called Data Manufacturing.

It lets us create a form/survey that can be shared with anyone to collect different inputs for utterances (and also entities).

Let’s have a look at the Intent Paraphrasing job, which is how you collect utterances from the crowd.

Create a Data Manufacturing Job

These are the available options.

What does the documentation say?

Annotation Jobs

You can assign an Annotation Job when you have logging data that needs to be classified to an intent, or when a single intent is too broad and needs to be broken down into separate intents. You can also assign crowd workers to annotate the key words and phrases from the training data that relate to an ML Entity.

Validation Jobs

For Validation jobs, crowd workers review utterances to ascertain if they fit the task or action described by the intent, or if the correct ML Entity has been identified. Only utterances that are judged valid by crowd workers get added to the training data.
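To make the idea of "judged valid by crowd workers" concrete, here is a minimal sketch of a majority-vote filter. It is only an illustration: the documentation quoted above does not say how ODA aggregates reviewer judgements, so the `filter_valid` function, the `min_ratio` threshold and the sample utterances are all assumptions.

```python
from collections import defaultdict


def filter_valid(judgements, min_ratio=0.5):
    """Keep utterances that more than min_ratio of reviewers marked valid.

    judgements is a list of (utterance, is_valid) pairs, one per review.
    The majority rule is an assumption, not ODA's documented behaviour.
    """
    votes = defaultdict(list)
    for utterance, is_valid in judgements:
        votes[utterance].append(is_valid)
    # sum() counts the True votes; dict order keeps first-seen order.
    return [u for u, v in votes.items() if sum(v) / len(v) > min_ratio]


# Hypothetical reviews for an intent about checking an account balance.
judgements = [
    ("I want to check my balance", True),
    ("I want to check my balance", True),
    ("I want to check my balance", False),
    ("What's the weather like", False),
    ("What's the weather like", True),
    ("What's the weather like", False),
]

accepted = filter_valid(judgements)
print(accepted)
```

With these sample reviews only the first utterance passes, since two of its three reviewers marked it valid.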

Paraphrasing Jobs

The Paraphrase Job is how you collect utterances from the crowd. This assignment describes how they should craft their utterances.

Choose the type, the language and the number of paraphrases per intent, give the job a name, and select which intents to add.
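The settings of a Paraphrasing job can be pictured as a small record. The sketch below is not an ODA API: the `ParaphraseJob` class, its field names and the sample intents are hypothetical, and it assumes the paraphrase count applies to each selected intent in one pass through the form.

```python
from dataclasses import dataclass, field


@dataclass
class ParaphraseJob:
    """Hypothetical model of the fields filled in when creating the job."""
    name: str
    language: str
    paraphrases_per_intent: int
    intents: list = field(default_factory=list)
    job_type: str = "paraphrasing"

    def target_utterances(self):
        """Utterances requested in one pass: count per intent x intents."""
        return self.paraphrases_per_intent * len(self.intents)


job = ParaphraseJob(
    name="Banking PT",
    language="pt",
    paraphrases_per_intent=3,
    intents=["CheckBalance", "TransferMoney"],
)

total = job.target_utterances()
print(total)
```

A quick calculation like this helps when deciding how many paraphrases to ask for, since the form gets longer with every intent added to the job.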

Specify the prompts and hints – these help the crowd users better understand the task at hand.

Now that the job is created and configured, you can take the generated link (see yellow highlight) and send it by email to your crowd.

Start Paraphrasing Job as a User

If you get the email and click the link, this is what you will see. In this case the job is for Portuguese, so everything is shown in the target language.

It asks for the name and email address.

It shows Instructions with examples of Valid/Invalid Utterances.

And this is the screen where the user can input the Utterances.

Once that is done, the user presses the Enviar (Send) button, which concludes the task.

Collect all the Data

As soon as users start submitting data, all of that becomes available in the ODA Data Manufacturing page.

The ODA Data Manufacturing admin can then accept or reject the utterances.

When the admin accepts, there is a confirmation prompt…

…after which the data becomes available in the Intents/Utterances.
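Conceptually, the accept/reject step behaves like the sketch below: only accepted submissions reach the training data, and near-duplicates are skipped. This is a simplified illustration, not how ODA implements it; the `merge_accepted` and `normalize` helpers, the `status` values and the sample utterances are assumptions.

```python
def normalize(text):
    """Lowercase and collapse whitespace so near-identical entries match."""
    return " ".join(text.lower().split())


def merge_accepted(training, submissions):
    """Return a copy of training with accepted, non-duplicate submissions added."""
    merged = {intent: list(utts) for intent, utts in training.items()}
    seen = {intent: {normalize(u) for u in utts} for intent, utts in merged.items()}
    for sub in submissions:
        if sub["status"] != "accepted":
            continue  # rejected or still pending review
        intent, key = sub["intent"], normalize(sub["text"])
        if key in seen.setdefault(intent, set()):
            continue  # already present in the training data
        seen[intent].add(key)
        merged.setdefault(intent, []).append(sub["text"])
    return merged


# Hypothetical data for a Portuguese job.
training = {"CheckBalance": ["Qual é o meu saldo?"]}
submissions = [
    {"intent": "CheckBalance", "text": "Quanto dinheiro tenho na conta?", "status": "accepted"},
    {"intent": "CheckBalance", "text": "olá", "status": "rejected"},
    {"intent": "CheckBalance", "text": "qual é  o meu saldo?", "status": "accepted"},
]

merged = merge_accepted(training, submissions)
print(merged)
```

Here the rejected greeting and the duplicate of the existing utterance are both dropped, so the intent ends up with two utterances.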

This is extremely valuable, as the crowd-sourced data helps train the model more accurately.