Vision, Language and Learning Lab

demos

Ambient Sound Generator

Ask the model to generate the sound of a car accelerating and it will generate a sound that matches the description.

General Image-Text Matcher

This demo attempts to highlight areas of an image conditioned on an arbitrary input text.

Visual Relation Predictor

This demo predicts subject-object relations. The input is an object name and its bounding box and the predictions are all related objects and their locations.

Genderless

This demo attempts to make it difficult for a model to predict gender from an image by modifying the image so that gender prediction becomes harder while most other image information is retained.

Text2Scene

This demo turns textual descriptions into a scene by stitching objects step-by-step onto a plain background using sequence generation neural networks.

Visual Translator

This demo attempts to translate an English sentence into visual feature space and then into both German (Deutsch) and Japanese (日本語).

SBU Captions Explorer

Search images by text in the SBU Captions Dataset, which contains 1 million captioned images from Flickr and has been used in numerous projects.

COCO Captions Explorer

Search images by text in the popular Common Objects in Context (COCO) dataset maintained by the Common Visual Data Foundation.

Department of Computer Science @ Rice University ‒ 6100 Main St, Duncan Hall, Houston, TX 77005-1827