VisLang - Vision, Language and Learning Lab at Rice University

SBU Captions Explorer

The SBU Captions Dataset contains 1 million images with captions obtained from Flickr circa 2011, as documented in Ordonez, Kulkarni, and Berg, NeurIPS 2011. The captions were written by real users and pre-filtered to keep only those that contain at least two nouns, a noun-verb pair, or a verb-adjective pair. The filtering also excludes many noisy and trivial captions. The final set still contains noise, which might be significant for some use cases; nevertheless, the dataset has been used in research on several tasks, e.g. Google's Show-and-Tell and Microsoft's UNITER. Here we provide a search tool for finding images in this dataset. Researchers often want to test their systems on specific images, and this tool allows searching for images that match human-written text descriptions. If you're interested in downloading the whole dataset, go here instead.
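As a rough illustration of the pre-filtering rule described above, here is a minimal Python sketch. It is not the original filtering code. It assumes captions have already been part-of-speech tagged with coarse tags ("NOUN", "VERB", "ADJ"), and it reads a "pair" as at least one word of each kind appearing anywhere in the caption. The function name `passes_prefilter` and the tag names are hypothetical.

```python
from collections import Counter


def passes_prefilter(tagged_caption):
    """Return True if a POS-tagged caption satisfies the pre-filter rule.

    tagged_caption is a list of (word, tag) tuples. The coarse tags
    "NOUN", "VERB" and "ADJ" are an assumption of this sketch.
    """
    counts = Counter(tag for _, tag in tagged_caption)
    has_two_nouns = counts["NOUN"] >= 2
    # Assumption: a "pair" means one word of each kind anywhere in the caption.
    has_noun_verb = counts["NOUN"] >= 1 and counts["VERB"] >= 1
    has_verb_adjective = counts["VERB"] >= 1 and counts["ADJ"] >= 1
    return has_two_nouns or has_noun_verb or has_verb_adjective


# Hand-tagged example captions (illustrative only, not taken from the dataset).
kept = [("dog", "NOUN"), ("on", "ADP"), ("the", "DET"), ("beach", "NOUN")]
dropped = [("wow", "INTJ")]
print(passes_prefilter(kept), passes_prefilter(dropped))
```

In practice the tags would come from a part-of-speech tagger, and the additional removal of noisy and trivial captions mentioned above is not covered by this sketch.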
