Holistic Sustainability is my personal term¹ for the idea that sustainability is not only about the environment. For a truly sustainable society, we should minimise all forms of inequality, because they are all interconnected. If we cannot respect each other, how are we ever to respect the environment we are all part of and the earth we all share?

Immigration and integration are two very loaded terms these days, and were big topics in the recent Swedish election. Without going into detail on these complex questions, part of the discussion revolves around the difficulty of making new citizens feel included. Although Swedes in general can and do speak English, most of Swedish society requires one to know Swedish. Personally, I believe in providing people with the tools they need to succeed. So what if we could give people a way to learn the Swedish language and culture simultaneously?

“How is AI shifting power?” – Pratyusha Kalluri

SVT is Sweden’s public service TV. They provide a streaming service called SVT Play with news, movies, documentaries, game shows - you name it. Most of their videos have subtitles in Swedish, but no other language. Many of their videos (like the one below) even have forced Swedish subtitles. Such subtitles are of course great if you know Swedish and have trouble hearing. But what if you don’t know Swedish?

I’ve learned a lot of English by watching English-language movies and TV shows with subtitles in Swedish. By being exposed to a language like that, you get used to it and start to pick up phrases. Wouldn’t it be a great idea if SVT had, at the very least, English subtitles?

This is where OpenAI Whisper comes in. Whisper is a “general-purpose speech recognition model”, which means that it’s a computer program that can transcribe speech into text. It can transcribe speech in 98 different languages, and - get this - even translate the text into English.

I spent part of my weekend playing with it, and it is surprisingly easy to use. There are already many tutorials online on setup and basic use of Whisper, so I shall leave technical details for another post. The only relevant technical information I should make clear is that I used the “medium” size model. Results would likely be better with the “large” model. Even so, the medium model works remarkably well.
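For readers who want a rough idea of what such a run can look like, here is a minimal sketch, not necessarily the exact steps I used. It assumes the open-source `openai-whisper` Python package and ffmpeg are installed. The function names are made up for illustration. Whisper returns a list of segments with start and end times in seconds, and the two helper functions turn those into a standard SRT subtitle file.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = max(0, round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Turn Whisper-style segments (start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def translate_to_english_srt(audio_path: str, model_size: str = "medium") -> str:
    """Translate Swedish speech in a media file into English SRT subtitles."""
    # Imported here so the helpers above work without Whisper installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_size)
    # task="translate" asks Whisper for English output instead of a
    # Swedish transcript; language="sv" skips language auto-detection.
    result = model.transcribe(audio_path, language="sv", task="translate")
    return segments_to_srt(result["segments"])
```

Calling `translate_to_english_srt("episode.mp4")`, where `episode.mp4` is a placeholder file name, would return the subtitle text, which can then be saved as an `.srt` file and loaded in most video players.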

The result

If you cannot see the video below, check it out at Vimeo. I put the English subtitles at the top, as the video already had hard-coded Swedish subtitles at the bottom.

If you know Swedish, you’ll see the translation is quite good. Remember, this is with the medium size model, so presumably the large model would be even better. At around the 1 min 50 s mark, the subtitles get ahead of the audio and stay one sentence ahead of what is being said for the rest of the video. We could adjust this manually, or there may be a Whisper setting that improves the timing from the start. Overall though, the subtitles do what subtitles should!
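The manual adjustment could be as simple as delaying every subtitle from the point where the timing drifts. The sketch below is a hypothetical helper, not something I have run on this video. It assumes the subtitles are available as Whisper-style segments (dictionaries with `start`, `end` and `text`, times in seconds), and the delay value would have to be found by trial and error.

```python
from typing import Dict, List


def delay_segments_after(
    segments: List[Dict], cutoff: float, delay: float
) -> List[Dict]:
    """Shift segments starting at or after `cutoff` later by `delay` seconds.

    Returns new dictionaries and leaves the input segments untouched.
    """
    shifted = []
    for seg in segments:
        seg = dict(seg)  # copy, so the caller's data is not modified
        if seg["start"] >= cutoff:
            seg["start"] += delay
            seg["end"] += delay
        shifted.append(seg)
    return shifted
```

For the video above, the cutoff would be about 110 seconds (1 min 50 s), for example `delay_segments_after(segments, cutoff=110.0, delay=3.0)`, where the 3 second delay is only a guess.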

Also, since Whisper only works on the audio, we don’t get a translation of the title text at the start of the video. But if our primary audience is people learning Swedish, this is not a top priority anyway.

Thoughts and next steps

I think the result is really impressive, especially considering this was my first test. I haven’t looked into all the Whisper settings, or had a chance to test the large model.

However, Whisper is not capable of real-time translation, and to get decent output speed you need a fairly beefy computer. Still, the system works well enough that it would be possible to create a web site where the user enters a URL to, for example, an SVT Play video, and after a few minutes is provided with a subtitled version. Such a service is the obvious next step for this project, so keep an eye out for my next blog post: follow me on LinkedIn or Twitter, or subscribe to the RSS feed of this blog.

Oh, by the way, I’m looking for a job in the ML/AI space. Hire me!

PS. If you like this idea and can fund cloud compute for this project, please get in touch.

  1. Most likely, what I mean by the term is already covered by existing theories such as intersectionality. I just finished a PhD and have not had time to read up on these topics properly. I’m obviously not the first to come up with this specific term, seeing as holisticsustainability.com was registered in 2009.