recent preprints
- Taming Data and Transformers for Audio Generation
  arXiv:2406.19388 June 2024. [project page] [arxiv]
- Generative Visual Instruction Tuning
  arXiv:2406.11262 June 2024. [github] [arxiv]
- Learning from Models and Data for Visual Grounding
  arXiv:2403.13804 March 2024. [project page] [arxiv]
publications
- PropTest: Automatic Property Testing for Improved Visual Programming
  Conf. on Empirical Methods in Natural Language Processing. EMNLP 2024 (Findings). [project page] [arxiv]
- ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
  European Conference on Computer Vision ECCV 2024. Milan, Italy. [project page] [arxiv] [github]
- Grounding Language Models for Visual Entity Recognition
  European Conference on Computer Vision ECCV 2024. Milan, Italy. [github] [arxiv]
- Improved Visual Grounding through Self-Consistent Explanations
  Conf. on Computer Vision and Pattern Recognition CVPR 2024. Seattle, WA. [project page] [arxiv]
- ElasticDiffusion: Training-free Arbitrary Size Image Generation
  Conf. on Computer Vision and Pattern Recognition CVPR 2024. Seattle, WA. [project page] [arxiv] [code]
- SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
  Winter Conference on Applications of Computer Vision WACV 2024. Waikoloa, HI. [arxiv] [code]
- Variation of Gender Biases in Visual Recognition Models Before and After Finetuning
  Workshop on Algorithmic Fairness through the Lens of Time at NeurIPS 2023. New Orleans, LA. [arxiv] [code]
- Going Beyond Nouns With Vision & Language Models Using Synthetic Data
  International Conference on Computer Vision. ICCV 2023. Paris, France. [project page] [arxiv] [github]
- Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
  Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [arxiv] [code] [demo]
- Estimating and Maximizing Mutual Information for Knowledge Distillation
  Workshop on Fair, Data Efficient and Trusted Computer Vision at CVPR 2023. Vancouver, Canada. [arxiv]
- ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
  Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [arxiv]
- CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning
  Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [arxiv]
- CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotations
  Int. Conf. on Artificial Intelligence and Statistics AISTATS 2023. Valencia, Spain / Hybrid. [arxiv]
- On the Transferability of Visual Features in Generalized Zero-Shot Learning
  arXiv:2211.12494 November 2022. [arxiv] [github]
- SimVQA: Exploring Simulated Environments for Visual Question Answering
  Conf. on Computer Vision and Pattern Recognition CVPR 2022. New Orleans, LA. [project page] [arxiv] [bibtex]
- Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation
  Language Resources and Evaluation Conference LREC 2022. [arxiv]
- Backpropagation-Based Decoding for Multimodal Machine Translation
  Frontiers in Artificial Intelligence. January 2022. [link] [bibtex]
- Evolving Image Compositions for Feature Representation Learning
  British Machine Vision Conference. BMVC 2021. November 2021. [project page] [arxiv] [bibtex]
- VisualNews: Benchmark and Challenges in Entity-aware Image Captioning
  Empirical Methods in Natural Language Processing. EMNLP 2021. Virtual / Punta Cana, Dominican Republic. November 2021. [arxiv] [code] [bibtex] (~Oral presentation)
- Instance-level Image Retrieval using Reranking Transformers
  International Conference on Computer Vision. ICCV 2021. [arxiv] [code] [bibtex]
- MEDIRL: Predicting the Visual Attention of Drivers via Maximum Entropy Deep Inverse Reinforcement Learning
  International Conference on Computer Vision. ICCV 2021. [project page] [code] [arxiv] [bibtex]
- General Multi-label Image Classification with Transformers
  Conference on Computer Vision and Pattern Recognition CVPR 2021. [arxiv] [bibtex]
- Black-box Explanation of Object Detectors via Saliency Maps
  Conference on Computer Vision and Pattern Recognition CVPR 2021. [arxiv] (~Oral presentation)
- Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning
  The Thirty-Fifth AAAI Conference on Artificial Intelligence. AAAI 2021. February 2021. [arxiv] [code] [bibtex]
- Enabling AI at the Edge with XNOR-Networks
  Communications of the ACM. December 2020 (Vol. 63, No. 12). (~Research Highlight) [link] [bibtex]
- Chair Segments: A Compact Benchmark for the Study of Object Segmentation
  arXiv:2011.14027 Nov 2020. [code] [arxiv] [bibtex]
- Using Visual Feature Space as a Pivot Across Languages
  Findings of Empirical Methods in Natural Language Processing. Findings of EMNLP 2020. short. Accepted September 2020. [pdf] [project page] [code] [bibtex]
- CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
  Empirical Methods in Natural Language Processing. EMNLP 2020. short. Nov. 2020. [arxiv] [bibtex]
- Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation
  Association for Computational Linguistics. ACL 2020. July 2020. [arxiv]
- Generative-discriminative Feature Representations for Open-set Recognition
  Conference on Computer Vision and Pattern Recognition CVPR 2020. [pdf] [bibtex]
- Testing DNN Image Classifiers for Confusion & Bias Errors
  International Conference on Software Engineering. ICSE 2020. October 2020. [arxiv] [bibtex]
- Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
  Conf. on Neural Information Processing Systems. NeurIPS 2019. Vancouver, Canada. December 2019. [arxiv] [code] [bibtex]
- Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations
  International Conference on Computer Vision. ICCV 2019. Seoul, South Korea. October 2019. [arxiv] [code] [demo] [bibtex]
- Text2Scene: Generating Compositional Scenes from Textual Descriptions
  Intl. Conference on Computer Vision and Pattern Recognition. CVPR 2019. Long Beach, California. June 2019. [arxiv] [code] [demo] [bibtex] (~Oral presentation + Best Paper Finalist -- top 1% of submissions)
  - IBM Research Blog Coverage
  - NVIDIA News Coverage
- Moviescope: Large-scale Analysis of Movies using Multiple Modalities
  arXiv:1908.03180. August 2019. [arxiv] [project page] [bibtex]
  - TechXplore News Coverage
- Gender Bias in Contextualized Word Embeddings
  North American Chapter of the Association for Computational Linguistics. NAACL 2019. short. Minneapolis, Minnesota. June 2019. [arxiv] [bibtex] (~Oral presentation)
- Chat-crowd: A Dialog-based Platform for Visual Layout Composition
  North American Chapter of the Association for Computational Linguistics. NAACL 2019. System Demonstrations. Minneapolis, MN. June 2019. [arxiv] [project page] [code]
- Deep Feature Aggregation and Image Re-ranking with Heat Diffusion for Image Retrieval
  IEEE Transactions on Multimedia 2019 (Journal). [arxiv] [bibtex]
- Feedback-prop: Convolutional Neural Network Inference under Partial Evidence
  Conference on Computer Vision and Pattern Recognition. CVPR 2018. Salt Lake City, Utah. June 2018. [pdf] [arXiv] [code] [bibtex]
- Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
  North American Chapter of the Association for Computational Linguistics. NAACL 2018. short. New Orleans, Louisiana. June 2018. [pdf] [arXiv] [code] [bibtex]
- Building Discriminative CNN Image Representations for Object Retrieval using the Replicator Equation
  Pattern Recognition 2018 (Journal). Volume 83. Pages 150-160. [link] [code] [bibtex]
- Where and Who? Automatic Semantic-Aware Person Composition
  Winter Conference on Applications of Computer Vision. WACV 2018. Lake Tahoe, Nevada. March 2018. [pdf] [arXiv] [supp. material] [code] [bibtex]
- Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
  Empirical Methods in Natural Language Processing. EMNLP 2017. Copenhagen, Denmark. September 2017. [pdf] [code] [bibtex] (~Oral presentation + Best Long Paper Award!)
  - WIRED News Coverage
  - Daily Mail News Coverage
  - Times of London News Coverage
- Obj2Text: Generating Visually Descriptive Language from Object Layouts
  Empirical Methods in Natural Language Processing. EMNLP 2017. Copenhagen, Denmark. September 2017. [pdf] [arxiv] [code] [bibtex] (~Oral presentation)
- Commonly Uncommon: Semantic Sparsity in Situation Recognition
  Intl. Conference on Computer Vision and Pattern Recognition. CVPR 2017. Honolulu, Hawaii. July 2017. [pdf] [arXiv] [bibtex] [demo]
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
  European Conference on Computer Vision. ECCV 2016. Amsterdam, The Netherlands. October 2016. [arXiv] [project page] [code] [bibtex] (~Oral presentation)
  - New York Times News Coverage
  - Article on University of Washington News
- Stating the Obvious: Extracting Visual Common Sense Knowledge
  North American Chapter of the Association for Computational Linguistics. NAACL 2016. short. San Diego, CA. June 2016. [pdf] [bibtex] (~Oral presentation)
- Learning to Name Objects
  Communications of the ACM. March 2016 (Vol. 59, No. 3). (~Research Highlight) [pdf] [link] [technical perspective] [bibtex]