AI: More Welsh data needed to improve accuracy, says business


Tom Burke, co-founder of Haia, says the lack of Welsh language data means translations and transcriptions are often inaccurate.

Tech developers say better collaboration in Wales is needed to deliver artificial intelligence (AI) features in Welsh.

The chatbot ChatGPT’s ability to understand and communicate in Welsh has impressed researchers, with some saying the language was “part of the AI revolution”.

But they said copyrighted Welsh-language material must be made available to train the software.

The Welsh Government has said its strategy will soon be updated.

One company that is already using AI to provide bilingual services is Anglesey-based Haia.

The online events company uses simultaneous translation software that allows people to speak in Welsh or English, with translated subtitles.

But its co-founder Tom Burke said the product could be improved if more Welsh-language data were legally available.

“One of our problems is its accuracy. If you compare it to German or Spanish, Welsh is a small data set,” Mr Burke said.

“Often we find inaccuracies in the translations or transcriptions, and the way to improve this is to get access to a lot of data that is actually in Welsh.”


Access to larger Welsh-language datasets could allow smart speakers to be used in Welsh

Language AI technology relies on large language models: computer programs trained on vast amounts of data, such as web pages, books and articles, to predict which words and phrases go together.

Welsh language data could also include radio and television programmes.

“If we can get that data, use it to train the models, then the Welsh language models will become more accurate,” added Mr Burke.

“It gives us a head start in this technology and allows us to look at other lesser-used languages around the world, where we can use the lessons we’ve learned here in Wales to drive the technology into those markets as well.”

“In the long run, this will open the door to new businesses and new innovation, and allow Wales to become a hub for language technology.”

Welsh chatbot

Researchers at Bangor University’s Canolfan Bedwyr launched Macsen, a prototype Welsh-language chatbot, eight years ago.

Now they are running it using ChatGPT, developed by OpenAI in the US.

As well as the economic potential, Gruffudd Prys, Head of Language Technology at Canolfan Bedwyr, said the material should be available in Welsh to make the technology “more relevant to the needs of the Welsh language and Wales in general”.

He said: “One of the things we can do to improve the quality of AI is to make the data out there available under the right licences, so that the models reflect the reality of Wales rather than being too American or too international.”


Tom Burke says some languages have larger datasets, which means their translations and transcriptions are more accurate

Tom Burke said access to the data needed to happen soon.

“We’ve already lost 12 months of innovation time, and what’s going to happen is that eventually we’re just going to be behind the curve, and by the time we can start using them, the rest of the world will already have them,” he said.

“We are in a great position; we are a bilingual country.

“We have a fantastic university like Bangor developing this technology. We need to do it now so companies can start using it and build from there.”

A ‘major priority’ in Wales

Welsh Government Minister for the Welsh Language, Jeremy Miles MS, said the use of artificial intelligence to develop the Welsh language was “hugely important”.

“This has been a key priority in our technology strategy for Wales, which we intend to update for the next period,” said Mr Miles.

“We have spent £2 million on this, and it remains a very high priority for our next strategy, so we will be able to address all of these issues.

“Because of technological changes, it’s really important that it’s available in Welsh and other languages.”

Godfrey Kemp
