Adapt, LinkedIn collaboration brings multilingual content to millions of social network users

Domain-specific system focuses on the translation of categories such as product descriptions

13 March 2023

LinkedIn, the world’s largest professional network, and Adapt, the SFI research centre for AI-driven digital content technology at Dublin City University (DCU), have collaborated on a technology initiative with the potential to increase access to multilingual content for LinkedIn members.

LinkedIn engineers and Adapt researchers developed a machine translation (MT) system that translates specific categories of information, such as product descriptions, into selected target languages. This will improve the experience of LinkedIn members around the world who search for product information in different languages.

The need for accessible multilingual content has grown significantly in recent years as LinkedIn’s global membership has expanded to more than 900 million members in more than 200 countries and territories. However, building MT systems for a specific domain is challenging, as it requires a large, accurate parallel corpus (source-language texts paired with their translations), which can be hard to obtain. LinkedIn’s engineering team, working with Adapt, investigated whether such systems could be made accurate enough for LinkedIn to translate specific categories of information, such as product descriptions, from English into a target language.

Speaking about the development, which was announced in Washington during the Irish Government’s St Patrick’s Day festivities, Declan McKibben, Executive Director of the Adapt Centre, said: “As Ireland’s leading research centre for human-centered AI, Adapt was the perfect partner for the R&D. Through our collaborative approach we have succeeded in advancing the MT capabilities within LinkedIn, pioneered new technologies to meet the evolving needs of their users, and strengthened the international research network between Ireland and the US.”

The domain-specific MT system focuses on translating specific categories of content, such as product descriptions. It can help augment existing LinkedIn data and support the development of multilingual product classification and recommendation systems at LinkedIn. Typically, LinkedIn has more labelled training data available in English than in other languages.

The new system makes it possible to machine-translate the English data into French and German, creating additional ‘synthetic’ labelled data for the LinkedIn classifier. Because each labelled English example can yield a labelled example in every target language, this has a multiplier effect, allowing larger and more accurate classification models to be trained in the given target language.
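The idea of turning English labelled data into target-language training data can be illustrated with a minimal sketch. It assumes a generic `translate(text, lang)` callable standing in for the MT system; the function names and the `fake_translate` stub are hypothetical and are not LinkedIn's or Adapt's actual code. The key point is that each translated example keeps its original English label.

```python
from typing import Callable, Dict, List, Tuple

# A labelled example: (text, label).
Example = Tuple[str, str]


def build_synthetic_datasets(
    english_data: List[Example],
    target_langs: List[str],
    translate: Callable[[str, str], str],
) -> Dict[str, List[Example]]:
    """For each target language, machine-translate every English example
    and copy its label unchanged, producing a 'synthetic' labelled set."""
    synthetic: Dict[str, List[Example]] = {}
    for lang in target_langs:
        synthetic[lang] = [
            (translate(text, lang), label) for text, label in english_data
        ]
    return synthetic


def fake_translate(text: str, lang: str) -> str:
    """Hypothetical stand-in for a real MT system: it only tags the text."""
    return f"[{lang}] {text}"


if __name__ == "__main__":
    english = [
        ("Cloud-based accounting software for small businesses", "accounting"),
        ("Customer relationship management platform", "crm"),
    ]
    datasets = build_synthetic_datasets(english, ["fr", "de"], fake_translate)
    for lang, examples in datasets.items():
        print(lang, len(examples))
        for text, label in examples:
            print("  ", label, "|", text)
```

With French and German as targets, N English examples yield N labelled examples in each language, which is the multiplier effect described above. In practice the quality of the synthetic data depends directly on the quality of the domain-specific MT system.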

“LinkedIn is committed to providing a positive and inclusive experience for everyone, no matter what part of the world they call home,” said Dr Tatiana Habruseva, staff AI engineer at LinkedIn. “We want our members to be able to access the professional knowledge and content they are looking for in their native language wherever possible, and this collaboration is one step on our journey to achieving this. Between our engineering and AI teams in Ireland and across the globe, we were able to collaborate with the world-class researchers at Adapt to look at how machine translation can be used to provide greater access to information in multiple languages.”

Prof Andy Way, Deputy Director of Adapt and professor of Computing at DCU, said: “Our MT team at DCU are world leaders in the area of AI-assisted translation and have developed a variety of tools powered by artificial intelligence, machine learning, and neural networks for organisations. Building machine translation systems for a specific domain requires a sufficiently large and good-quality parallel corpus in that domain. However, this is a challenging task because for many domains and language pairs, there is no parallel data.”

“In this collaboration, Adapt developed English-to-French translation systems for software product descriptions from the LinkedIn website. In addition, a first-ever parallel test set of product descriptions was created. Several MT systems were built and compared: a baseline system trained on publicly available parallel data from the general domain, and domain-adapted systems trained on specialised data selected using sentence embedding-based corpus filtering and domain-specific sub-corpora extraction. Evaluation results show that the domain-adapted model based on our proposed approaches outperforms the baseline.”
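Sentence embedding-based corpus filtering, in general terms, ranks sentence pairs from a large general-domain parallel corpus by how close they are to a small sample of in-domain text, and keeps the closest ones as training data. The sketch below illustrates that general idea only; it does not reproduce Adapt's actual method. As assumptions, it uses a toy bag-of-words vector in place of a real neural sentence encoder, scores only the English (source) side against the centroid of the in-domain samples, and keeps a fixed `top_k` pairs; all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(sentence: str) -> Vector:
    """Toy stand-in for a sentence encoder: a bag-of-words count vector.
    A real system would use a neural sentence-embedding model instead."""
    return {w: float(c) for w, c in Counter(sentence.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Average of the vectors: a single point representing the domain."""
    total: Dict[str, float] = {}
    for vec in vectors:
        for w, x in vec.items():
            total[w] = total.get(w, 0.0) + x
    return {w: x / len(vectors) for w, x in total.items()}


def filter_parallel_corpus(
    pairs: List[Tuple[str, str]],
    in_domain_samples: List[str],
    top_k: int,
) -> List[Tuple[str, str]]:
    """Keep the top_k (source, target) pairs whose source sentence is most
    similar to the in-domain centroid."""
    domain_vec = centroid([embed(s) for s in in_domain_samples])
    scored = [(cosine(embed(src), domain_vec), src, tgt) for src, tgt in pairs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for _, src, tgt in scored[:top_k]]


if __name__ == "__main__":
    in_domain = [
        "cloud software for project management",
        "accounting software for small business",
    ]
    general = [
        ("the weather is nice today", "il fait beau aujourd'hui"),
        ("project management software in the cloud",
         "logiciel de gestion de projet dans le cloud"),
        ("the football match ended in a draw",
         "le match de football s'est terminé par un match nul"),
    ]
    for src, tgt in filter_parallel_corpus(general, in_domain, top_k=1):
        print(src, "->", tgt)
```

The selected pairs would then be used to train or fine-tune the domain-adapted system; the cutoff (a fixed `top_k` here, a similarity threshold in other setups) trades corpus size against domain relevance.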

TechCentral Reporters