The web is full of pictures, and that can be a problem for people who are blind or visually impaired. They can have text read aloud by a synthetic voice, but they often miss the information carried by images. It is possible to add a text description of an image (alt text) that is then read out, but that description has to be written manually, and many people forget to do so.
Now Microsoft may have a solution. The company has developed an AI system that is as good as, or even better than, people at describing images. Microsoft already had such a system, but the new one is reportedly twice as good.
Vivo, as the system is called, is already available in Microsoft's Seeing AI, an app that helps blind and visually impaired people get information from images. To give the technology a wider reach, it is also offered as part of Azure Cognitive Services, which lets developers who use Microsoft Azure integrate it into their own applications.
Put simply, Vivo is trained by showing the AI images paired with keywords. Each keyword is linked to a specific object in the image. The process resembles how a small child learns to read: a picture of an apple is shown above the word "apple", for example.
Once Vivo has learned to link the right object to the right word, the next step follows: the AI learns to write complete sentences that contain those keywords.
This is where Vivo differs from other methods, which are normally trained on complete captions. According to the Microsoft researchers, the drawback of that approach is that it makes it difficult for the AI to learn how different objects relate to one another, something Vivo manages without difficulty.
Better than humans at captions
The method works so well that, according to a study conducted by the research team, Vivo's descriptions were better than those written by humans.
The Microsoft researchers hope that many other companies will also adopt Vivo to make computer systems easier to use for people who are blind or visually impaired. But the rest of us can benefit from it too, the researchers say.
- Improving techniques for describing images can help all users. It makes images easier to find through search engines, and for the visually impaired it will be a dramatic improvement when they use the web and computer programs, says Eric Boyd at Azure AI.