@AnnemarieBridy Absolutely, and it’s positive that they strive for more. For me the dilemma sta...

@AnnemarieBridy

Absolutely, and it’s positive that they strive for more.

For me the dilemma starts before the LLMs, as the amount of content available online for training large language models in different languages is quite unbalanced, as the article suggests. Hence their performance will differ significantly depending on language. It’s a bias built into the fabric of the internet (and hence the training data) rather than an inherent problem of language models.

We often forget that there are more than 6,000 languages in the world. I take every opportunity to remind myself of this. 😊