Today, Google open-sourced the latest version of its image captioning system as a model in TensorFlow. This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions than the original system. These improvements are outlined and analyzed in the paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, published in IEEE Transactions on Pattern Analysis and Machine Intelligence.


In 2014, the Google Brain team started working on a system that could analyze an image and write a caption for it. The original system used the Inception V1 image classification model, which achieves 89.6% top-5 accuracy on the ImageNet 2012 image classification task. It was later upgraded to Inception V2, which achieves 91.8% on the same task.

 

Automatically captioned by our system.

The current release uses Inception V3, which reaches 93.9% top-5 accuracy on the ImageNet 2012 benchmark. With this better image model, the system can detect multiple objects in an image along with their characteristics and write more relevant captions.

Left: the better image model allows the captioning model to generate more detailed and accurate descriptions. Right: after fine-tuning the image model, the image captioning system is more likely to describe the colors of objects correctly.

The image captioning system is now available as an open-source model in TensorFlow. Compared to the original system, this release trains much faster and produces more detailed and accurate descriptions.