No supervision required
By combining natural language processing with machine learning and cloud computing, the field of augmented analytics is set to change the way businesses create insights, moving control away from data scientists and into the hands of regular employees. It’s easy to use, easy to interpret and even brings insights to the user without them having to ask. The possibilities are endless, but the technology still presents a number of hurdles.
Naomi Harte, Associate Professor in Digital Media Systems, Electronic & Electrical Engineering and a principal investigator at the Science Foundation Ireland research centre Adapt, has been studying aspects of natural language processing (NLP) since the 2000s. Her work has looked at aspects of verbal and non-verbal communication, from visual cues to the effects of ageing. In conversation with TechRadio, she argued that the first hurdle to consistent and accurate translation is knowing which sounds to filter out and which to retain for processing.
“Ultimately we want to get to human speech and when we capture those signals they rarely tend to be as clean as we would like them to be,” she said. “There are lots of problems in capturing speech like background noise and competing speakers and we have to home in on what part of the signal we actually want information extracted from first. That can be quite challenging.”
Harte said she was generally enthusiastic about advances made with devices like the Amazon Echo, the many implementations of Google Assistant and Siri on the iPhone.
“A lot of these devices are in a mode of constantly listening and they’ve got quite sophisticated microphone systems whereby they can cancel out a lot of noise so the signal that they end up with they can do a lot with,” she said.
“A lot of the hardware issues in terms of capture in a quiet room at home when you’re the only person in the room is straightforward. It’s a little more difficult if you’re listening to the radio at the same time as speaking. It’s very difficult to use Siri if you’re walking down the road if there are buses going by but if it’s quiet it can do a good job. The big challenge then becomes human speech. We, as humans, have the ability to deal with all these variations in speech, which machines don’t do so well.”
Harte said previous models of language understanding had proved unreliable, and that while neural networks have delivered promising results, “Until we really crack how to analyse what we see, what we hear to the level that humans can do it, that’s the stage at which you’ll really find NLP taking off.”
Another issue Harte raised is that of regional dialects. “If you buy speech recognition software it will work very well for American English or British English when you first pull it out of the box, where you have a drop-down menu, but ‘Irish’ is not there. Adapting to accents is something that software can do over days and weeks, but if you want to do something spontaneously it gets very confused.
“We tend to pronounce words very differently. We tend to use quite different vowel sounds in Hiberno-English, and our soft ‘T’ is a classic way of confusing technology. Then it’s not just how we say things: we have different words, different sentence constructs, words that somebody from England might not even understand, so how can we teach technology to understand it?”
One company dedicated to improving the accuracy of machine translation is Smartling. Co-founder Jack Welde believes combining machine learning with human editors will be the most accurate way to deliver translated content for the foreseeable future.
“Right now the most important content is still translated by human beings,” he says. “They may use machines to help them and they may use machine translation to get them started. They may use a computer-based translation tool to help them with that process but the most mission-critical content or content that’s designed for connecting with humans’ emotional responses around marketing content is still translated by humans today.”
Smartling’s mission is to make website content viewable across any language barrier, which means the role of the human editor extends beyond text translation to expertise in specific verticals. Welde points to the examples of legal, pharmaceutical or aeronautical content, where sectoral expertise is as necessary as literal translation.
“Many of the translators we work with are skilled in those kinds of verticals. They were bi-lingual or tri-lingual and spent a lot of time working in those fields before they became translators,” he says. “Any kind of specialty content is going to be very challenging. More generalised content marketing, training materials and so on, a big part of the way we deal with that complexity is understanding the customer’s style, how they communicate. You can imagine a business-to-consumer company with a software platform for teens is going to be very different than for a law firm with their own key terminology.”
Welde gives his AIs a head start on decoding jargon by using glossaries of industry-specific terms.
“Every business has a glossary of their most-used words and phrases that describe themselves, and usually it’s not five words, it’s 500 words and phrases to be able to do that,” he says. “We use a machine learning approach, extracting all of that [sectoral] content and being able to say ‘we think this is the 500 most important words and phrases, let’s get agreement on how they should be translated into French or German or Japanese, and help the translator to gently enforce those terms translated consistently from there’.”
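The workflow Welde describes, extracting a customer’s most-used terms and then checking that agreed translations are used consistently, can be sketched in a few lines. The glossary, example segments and checking logic below are hypothetical illustrations, not Smartling’s actual pipeline:

```python
import re
from collections import Counter

def extract_candidate_terms(corpus, top_n=500):
    """Rank words in a customer's content by frequency as glossary candidates."""
    words = re.findall(r"[a-z]+", corpus.lower())
    return [term for term, _ in Counter(words).most_common(top_n)]

def check_term_consistency(source, translation, glossary):
    """Flag glossary terms present in the source segment whose agreed
    translation is missing from the translated segment."""
    violations = []
    for src_term, agreed in glossary.items():
        if src_term in source.lower() and agreed.lower() not in translation.lower():
            violations.append((src_term, agreed))
    return violations

# Hypothetical English-to-German glossary agreed with the customer.
glossary = {"invoice": "Rechnung", "account": "Konto"}
print(check_term_consistency(
    "Your invoice is attached to your account.",
    "Ihre Rechnung ist beigefügt.",   # agreed term "Konto" is missing
    glossary))
# → [('account', 'Konto')]
```

A production system would match multi-word phrases and inflected forms, but the principle is the same: surface the violations to the human translator rather than overrule them.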
Moving on from the role of machine learning, augmented analytics also presents a challenge to how workflows will change as the ability to extract business insights from raw data moves into the hands of non-technical staff.
Edward McDonnell, director of Ceadar, an Enterprise Ireland and IDA-supported centre for research in applied AI, calls this “the disintermediation of the data scientist from the analytics or AI workflow”.
“The stage you want to get to is where the business owner can input data easily into the analysis system, allow the system to do its work and to hide the complexity from him/her but then present with a visually understandable and attractive visualisation what the data is saying and what insights can be derived from it,” he says. “At the moment a lot of analytics and AI is craftwork, we want to move it on to the point where it becomes mainstream.”
Part of the problem with this ‘disintermediation’ is that hiding the complexity of algorithms is great for the user but not so good for data scientists. McDonnell is unconcerned, saying it has always been thus. “The challenge computer engineers always face is we are always doing ourselves out of a job because we were automating ourselves out.”
McDonnell counters this argument by pointing to the ability to recognise and match different sources of data as something AI still cannot do. “The greatest insight comes when we fuse together multiple different data sources rather than just relying on one.”
Like Welde, McDonnell cites industry-specific knowledge, specifically the legal sector, as benefitting from AI-powered workflows. “Imagine being able to process thousands and thousands of documents, whether you’re looking for case law in thousands of court reports. If you had computer assistance to help with that in a contract, the system will be able to tell you who were the parties to the contract, what legal jurisdiction the contract has been signed under, what’s the duration of the contract, etc.”
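The kind of contract-field extraction McDonnell describes can be illustrated with a toy sketch. The patterns and sample contract below are invented for illustration; a real legal-AI system would use trained named-entity models rather than hand-written regular expressions:

```python
import re

def extract_contract_fields(text):
    """Pull parties, jurisdiction and duration out of contract text
    using naive regular expressions (illustrative only)."""
    fields = {}
    m = re.search(r"between (.+?) and (.+?)[,.]", text)
    if m:
        fields["parties"] = [m.group(1), m.group(2)]
    m = re.search(r"laws of ([A-Z][a-zA-Z ]+?)[,.]", text)
    if m:
        fields["jurisdiction"] = m.group(1)
    m = re.search(r"term of (\d+ (?:months|years))", text)
    if m:
        fields["duration"] = m.group(1)
    return fields

contract = ("This agreement is made between Acme Ltd and Beta GmbH, "
            "governed by the laws of Ireland, for a term of 24 months.")
print(extract_contract_fields(contract))
# → {'parties': ['Acme Ltd', 'Beta GmbH'], 'jurisdiction': 'Ireland',
#    'duration': '24 months'}
```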
Dr Gavin Shorten, programme delivery manager, Cloud & AI Foundry at IBM Ireland Lab, has seen applications for augmented analytics across many industries, developing end-to-end solutions starting with the identification of data sources and finishing with the deployment of working applications. He sees cloud as an integral element in augmented analytics as a “delivery vehicle”, taking the example of a commuter survey where an AI was used to discern positive and negative responses on Twitter.
“We used a Web application to run a campaign like ‘what do you think about the new proposals for College Green city plaza?’ You can ask yes/no questions like ‘do you think it’s a useful tool to promote the city?’ When users responded from their Twitter account we analysed whether they responded positively or negatively and why. Maybe they thought ‘yes, this is cool, it will bring more people into the area’ or ‘no, it would have some impact in terms of rubbish’. We have all the capabilities to build these tools rapidly, so you don’t need a data scientist, you don’t need a software developer, you just engage with the Web application – we’re trying to empower people that way.”
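The positive/negative classification Shorten describes can be illustrated with a toy lexicon-based classifier. The word lists here are invented, and a production system would use a trained model (such as one exposed through a cloud NLP API) rather than keyword counting:

```python
# Hypothetical sentiment lexicons for this illustration only.
POSITIVE = {"cool", "great", "useful", "more", "good"}
NEGATIVE = {"rubbish", "bad", "impact", "worse", "no"}

def classify_response(text):
    """Score a reply by counting positive vs negative lexicon hits."""
    words = set(text.lower().replace(",", " ").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

replies = [
    "Yes, this is cool, it will bring more people into the area",
    "No, it would have some impact in terms of rubbish",
]
for reply in replies:
    print(reply, "->", classify_response(reply))
```

Run on the two example replies from the survey, the first scores positive and the second negative, which matches the behaviour described.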
As a vendor you would expect IBM to rely on proprietary software, but Shorten maintains that while the Watson AI powers their projects there is plenty of room for open source tools.
“We obviously have a lot of our own proprietary know-how but we do adopt a lot of open source tools in our approaches so we tend to have compatible technology because it’s built off the same open source approaches as other people are using,” he says. “Because the world is becoming ‘API-driven’ or ‘service-driven’ things are being component-ised to discrete working parts, an application could have 15 or 16 APIs, it becomes very easy to integrate because it’s a standardised communications framework that can connect it with APIs on Twitter or anything else you want to connect it to.”
With demand for data scientists still outstripping supply, the analytics field is still learning as it goes. Ceadar’s McDonnell calls the current market for data “insatiable”. Shorten says that the inability of companies to fill jobs is creating opportunities for developers to move into the field – assuming it hasn’t been completely automated by the time they qualify. The results, from his perspective, have been positive in that he can work with people who have a broad skill set.
“If we look at developers and data scientists my ideal colleague is someone that has experience of both. You’ll find that quite a bit. You’ll have mathematicians who have taken postgraduate degrees or gone into industry and upskilled on their computer programming, or vice versa. I can see them as being symbiotic and augmented analytics is very much akin to that,” he says. “We have a person in our team who used to work as a software developer and then she began her PhD in analytics when she joined our team. She took her computer science and technology background and coupled it with the natural language processing skills and capabilities she has grown to develop a recommender system that enables software developers to ask questions. You could type in ‘I’m looking for a use case to test an API that does x’, and it goes off and mines through 200,000 test cases that were pre-written, tries to match up the description of the test case you are looking for, and it comes back and says ‘here’s an example of 10 test cases that might match your question’, ranked in order of how accurately they match your question. That’s one side of the house enabling software development and vice versa.”
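The test-case recommender Shorten describes, matching a free-text query against pre-written test-case descriptions and returning a ranked list, can be sketched with simple word-overlap scoring. The corpus and scoring below are hypothetical; the actual IBM system would use proper NLP representations rather than Jaccard similarity:

```python
def rank_test_cases(query, descriptions, top_k=10):
    """Rank test-case descriptions by word overlap with the query."""
    q = set(query.lower().split())
    scored = []
    for desc in descriptions:
        words = set(desc.lower().split())
        # Jaccard similarity: shared words over total distinct words.
        overlap = len(q & words) / max(len(q | words), 1)
        scored.append((overlap, desc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [desc for score, desc in scored[:top_k] if score > 0]

# Hypothetical corpus of pre-written test-case descriptions.
cases = [
    "test login API with invalid credentials",
    "test payment API timeout handling",
    "verify user profile page rendering",
]
print(rank_test_cases("test a login API", cases, top_k=2))
# → ['test login API with invalid credentials',
#    'test payment API timeout handling']
```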
As for putting the value of augmented analytics into the hands of non-technical staff, the lack of actual demand would indicate a lack of awareness.
Marc O’Regan, chief technology officer of Dell Technologies, takes the view that “like all emerging technologies the offload of these current processes and skills to the machine requires a ‘data first, digitally driven’ approach and a leap of faith. Managing the data flow and lifecycle will also surface issues and solutions.”
O’Regan echoes the point about the quality of data sets. For him, the absence of unified data standards means the role of data scientist isn’t threatened, though it may change.
“One of the core issues is data preparation and this is what data scientists struggle with most. Aggregation of data and resources to help in this regard has been an industry requirement for some time,” he says.
“Data scientists are interested in the correlation of data, the use of mathematics and mathematical functions and the mapping of such to data and data sets and points of interest in order to extract value. We still need these skills in order to deliver specific outcomes.”
Whether the arrival of augmented analytics will force any organisational changes remains to be seen. This raises the question of whether new roles should be created or expertise spread across the enterprise.
“I think the enterprise needs to adapt to the distributed nature of the ecosystem as it develops,” says O’Regan. “This is a sea change for most organisations and addressing data across this linear ecosystem takes vision and execution from C-suite.”