The Vision, Language, and Learning Lab at the University of Virginia

recent preprints

AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Alper Canberk, Kwot Sin Lee, Vicente Ordonez, Sergey Tulyakov. arXiv:2412.15191 December 2024. [project page] [arxiv]
Taming Data and Transformers for Audio Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Sergey Tulyakov, Vicente Ordonez. arXiv:2406.19388 June 2024. [project page] [arxiv]
Generative Visual Instruction Tuning
Jefferson Hernandez, Ruben Villegas, Vicente Ordonez. arXiv:2406.11262 June 2024. [github] [arxiv]
Learning from Models and Data for Visual Grounding
Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez. arXiv:2403.13804 March 2024. [project page] [arxiv]

publications

LoCoRe: Image Re-ranking with Long-Context Sequence Modeling
Zilin Xiao, Pavel Suma, Ayush Sachdeva, Hao-Jen Wang, Giorgos Kordopatis-Zilos, Giorgos Tolias, Vicente Ordonez. Conf. on Computer Vision and Pattern Recognition. CVPR 2025. Nashville, TN. [arxiv]
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A Sigurdsson, Nanyun Peng, Xin Eric Wang. Transactions on Machine Learning Research, TMLR 2025. [arxiv]
PropTest: Automatic Property Testing for Improved Visual Programming
Jaywon Koo, Ziyan Yang, Paola Cascante-Bonilla, Baishakhi Ray, Vicente Ordonez. Conf. on Empirical Methods in Natural Language Processing. EMNLP 2024 (Findings). [project page] [arxiv]
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
Jefferson Hernandez, Ruben Villegas, Vicente Ordonez. European Conference on Computer Vision ECCV 2024. Milan, Italy. [project page] [arxiv] [github]
Grounding Language Models for Visual Entity Recognition
Zilin Xiao, Ming Gong, Paola Cascante-Bonilla, Xingyao Zhang, Jie Wu, Vicente Ordonez. European Conference on Computer Vision ECCV 2024. Milan, Italy. [github] [arxiv]
Improved Visual Grounding through Self-Consistent Explanations
Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez. Conf. on Computer Vision and Pattern Recognition CVPR 2024. Seattle, WA. [project page] [arxiv]
ElasticDiffusion: Training-free Arbitrary Size Image Generation
Moayed Haji-Ali, Guha Balakrishnan, Vicente Ordonez. Conf. on Computer Vision and Pattern Recognition CVPR 2024. Seattle, WA. [project page] [arxiv] [code]
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordonez. Winter Conference on Applications of Computer Vision WACV 2024. Waikoloa, HI. [arxiv] [code]
Variation of Gender Biases in Visual Recognition Models Before and After Finetuning
Jaspreet Ranjit, Tianlu Wang, Baishakhi Ray, Vicente Ordonez. Workshop on Algorithmic Fairness through the Lens of Time at NeurIPS 2023. New Orleans, LA. [arxiv] [code]
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky. International Conference on Computer Vision. ICCV 2023. Paris, France. [project page] [arxiv] [github]
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
Ziyan Yang, Kushal Kafle, Franck Dernoncourt, Vicente Ordonez. Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [arxiv] [code] [demo]
Estimating and Maximizing Mutual Information for Knowledge Distillation
Aman Shrivastava, Yanjun Qi, Vicente Ordonez. Workshop on Fair, Data Efficient and Trusted Computer Vision at CVPR 2023. Vancouver, Canada. [arxiv]
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky. Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [arxiv]
CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning
James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, Zsolt Kira. Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [arxiv]
CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotations
Aman Shrivastava, Ramprasaath R. Selvaraju, Nikhil Naik, Vicente Ordonez. Int. Conf. on Artificial Intelligence and Statistics AISTATS 2023. Valencia, Spain / Hybrid. [arxiv]
On the Transferability of Visual Features in Generalized Zero-Shot Learning
Paola Cascante-Bonilla, Leonid Karlinsky, James Seale Smith, Yanjun Qi, Vicente Ordonez. arXiv:2211.12494 November 2022. [arxiv] [github]
SimVQA: Exploring Simulated Environments for Visual Question Answering
Paola Cascante-Bonilla, Hui Wu, Letao Wang, Rogerio Feris, Vicente Ordonez. Conf. on Computer Vision and Pattern Recognition CVPR 2022. New Orleans, LA. [project page] [arxiv] [bibtex]
Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation
Samhita Honnavalli, Aesha Parekh, Lily Ou, Sophie Groenwold, Sharon Levy, Vicente Ordonez, William Yang Wang. Language Resources and Evaluation Conference LREC 2022. [arxiv]
Backpropagation-Based Decoding for Multimodal Machine Translation
Ziyan Yang, Leticia Pinto-Alva, Franck Dernoncourt, Vicente Ordonez. Frontiers in Artificial Intelligence. January 2022. [link] [bibtex]
Evolving Image Compositions for Feature Representation Learning
Paola Cascante-Bonilla, Arshdeep Sekhon, Yanjun Qi, Vicente Ordonez. British Machine Vision Conference. BMVC 2021. November 2021. [project page] [arxiv] [bibtex]
VisualNews: Benchmark and Challenges in Entity-aware Image Captioning
Fuxiao Liu, Yinghan Wang, Tianlu Wang, Vicente Ordonez. Empirical Methods in Natural Language Processing. EMNLP 2021. Virtual / Punta Cana, Dominican Republic. November 2021. [arxiv] [code] [bibtex] (~Oral presentation)
Instance-level Image Retrieval using Reranking Transformers
Fuwen Tan, Jiangbo Yuan, Vicente Ordonez. International Conference on Computer Vision. ICCV 2021. [arxiv] [code] [bibtex]
MEDIRL: Predicting the Visual Attention of Drivers via Maximum Entropy Deep Inverse Reinforcement Learning
Sonia Baee, Erfan Pakdamanian, Inki Kim, Lu Feng, Vicente Ordonez, Laura Barnes. International Conference on Computer Vision. ICCV 2021. [project page] [code] [arxiv] [bibtex]
General Multi-label Image Classification with Transformers
Jack Lanchantin, Tianlu Wang, Vicente Ordonez, Yanjun Qi. Conference on Computer Vision and Pattern Recognition CVPR 2021. [arxiv] [bibtex]
Black-box Explanation of Object Detectors via Saliency Maps
Vitali Petsiuk, Rajiv Jain, Varun Manjunatha, Vlad I. Morariu, Ashutosh Mehra, Vicente Ordonez, Kate Saenko. Conference on Computer Vision and Pattern Recognition CVPR 2021. [arxiv] (~Oral presentation)
Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning
Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez. The Thirty-Fifth AAAI Conference on Artificial Intelligence. AAAI 2021. February 2021 [arxiv] [code] [bibtex]
Enabling AI at the Edge with XNOR-Networks
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi. Communications of the ACM. December 2020 (Vol. 63, No. 12). (~Research Highlight) [link] [bibtex]
Chair Segments: A Compact Benchmark for the Study of Object Segmentation
Leticia Pinto-Alva, Ian K. Torres, Rosangel Garcia, Ziyan Yang, Vicente Ordonez. arXiv:2011.14027 November 2020. [code] [arxiv] [bibtex]
Using Visual Feature Space as a Pivot Across Languages
Ziyan Yang, Leticia Pinto-Alva, Franck Dernoncourt, Vicente Ordonez. Findings of Empirical Methods in Natural Language Processing. Findings of EMNLP 2020. short. Accepted September 2020. [pdf] [project page] [code] [bibtex]
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Lee, Jilin Chen, Alex Beutel, Ed Chi. Empirical Methods in Natural Language Processing. EMNLP 2020. short. November 2020. [arxiv] [bibtex]
Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation
Tianlu Wang, Xi Victoria Lin, Nazneen Fatema Rajani, Bryan McCann, Vicente Ordonez, Caiming Xiong. Association for Computational Linguistics. ACL 2020. July 2020. [arxiv]
Generative-discriminative Feature Representations for Open-set Recognition
Pramuditha Perera, Vlad I. Morariu, Rajiv Jain, Varun Manjunatha, Curtis Wigington, Vicente Ordonez, and Vishal M. Patel. Conference on Computer Vision and Pattern Recognition CVPR 2020. [pdf] [bibtex]
Testing DNN Image Classifiers for Confusion & Bias Errors
Yuchi Tian, Ziyuan Zhong, Vicente Ordonez, Gail Kaiser, Baishakhi Ray. International Conference on Software Engineering. ICSE 2020. October 2020. [arxiv] [bibtex]
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, Vicente Ordonez. Conf. on Neural Information Processing Systems. NeurIPS 2019. Vancouver, Canada. December 2019. [arxiv] [code] [bibtex]
Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations
Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez. International Conference on Computer Vision. ICCV 2019. Seoul, South Korea. October 2019. [arxiv] [code] [demo] [bibtex]
Text2Scene: Generating Compositional Scenes from Textual Descriptions
Fuwen Tan, Song Feng, Vicente Ordonez. Intl. Conference on Computer Vision and Pattern Recognition. CVPR 2019. Long Beach, California. June 2019. [arxiv] [code] [demo] [bibtex] (~Oral presentation + Best Paper Finalist -- top 1% of submissions)
- IBM Research Blog Coverage
- NVIDIA News Coverage
Moviescope: Large-scale Analysis of Movies using Multiple Modalities
Paola Cascante-Bonilla, Kalpathy Sitaraman, Mengjia Luo, Vicente Ordonez. arXiv:1908.03180. August 2019. [arxiv] [project page] [bibtex]
- TechXplore News Coverage
Gender Bias in Contextualized Word Embeddings
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang. North American Chapter of the Association for Computational Linguistics. NAACL 2019. short. Minneapolis, Minnesota. June 2019. [arxiv] [bibtex] (~Oral presentation)
Chat-crowd: A Dialog-based Platform for Visual Layout Composition
Paola Cascante-Bonilla, Xuwang Yin, Vicente Ordonez, Song Feng. North American Chapter of the Association for Computational Linguistics. NAACL 2019. System Demonstrations. Minneapolis, MN. June 2019. [arxiv] [project page] [code]
Deep Feature Aggregation and Image Re-ranking with Heat Diffusion for Image Retrieval
Shanmin Pang, Jin Ma, Jianru Xue, Jihua Zhu, Vicente Ordonez. IEEE Transactions on Multimedia 2019 (Journal). [arxiv] [bibtex]
Feedback-prop: Convolutional Neural Network Inference under Partial Evidence
Tianlu Wang, Kota Yamaguchi, Vicente Ordonez. Conference on Computer Vision and Pattern Recognition. CVPR 2018. Salt Lake City, Utah. June 2018. [pdf] [arXiv] [code] [bibtex]
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang. North American Chapter of the Association for Computational Linguistics. NAACL 2018. short. New Orleans, Louisiana. June 2018. [pdf] [arXiv] [code] [bibtex]
Building Discriminative CNN Image Representations for Object Retrieval using the Replicator Equation
Shanmin Pang, Jihua Zhu, Jiaxing Wang, Vicente Ordonez, Jianru Xue. Pattern Recognition 2018 (Journal). Volume 83. Pages 150-160. [link] [code] [bibtex]
Where and Who? Automatic Semantic-Aware Person Composition
Fuwen Tan, Crispin Bernier, Benjamin Cohen, Vicente Ordonez, Connelly Barnes. Winter Conference on Applications of Computer Vision. WACV 2018. Lake Tahoe, Nevada. March 2018. [pdf] [arXiv] [supp. material] [code] [bibtex]
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang. Empirical Methods in Natural Language Processing. EMNLP 2017. Copenhagen, Denmark. September 2017. [pdf] [code] [bibtex] (~Oral presentation + Best Long Paper Award!)
- WIRED News Coverage
- Daily Mail News Coverage
- Times of London News Coverage
Obj2Text: Generating Visually Descriptive Language from Object Layouts
Xuwang Yin, Vicente Ordonez. Empirical Methods in Natural Language Processing. EMNLP 2017. Copenhagen, Denmark. September 2017. [pdf] [arxiv] [code] [bibtex] (~Oral presentation)
Commonly Uncommon: Semantic Sparsity in Situation Recognition
Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi. Intl. Conference on Computer Vision and Pattern Recognition. CVPR 2017. Honolulu, Hawaii. July 2017. [pdf] [arXiv] [bibtex] [demo]
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi. European Conference on Computer Vision. ECCV 2016. Amsterdam, The Netherlands. October 2016. [arXiv] [project page] [code] [bibtex] (~Oral presentation)
- New York Times News Coverage
- Article on University of Washington News
Stating the Obvious: Extracting Visual Common Sense Knowledge
Mark Yatskar, Vicente Ordonez, Ali Farhadi. North American Chapter of the Association for Computational Linguistics. NAACL 2016. short. San Diego, CA. June 2016. [pdf] [bibtex] (~Oral presentation)
Learning to Name Objects
Vicente Ordonez, Wei Liu, Jia Deng, Yejin Choi, Alexander C. Berg, Tamara L. Berg. Communications of the ACM. March 2016 (Vol. 59, No. 3). (~Research Highlight) [pdf] [link] [technical perspective] [bibtex]

Group picture October 2024.

ECCV 2024 vislang.ai team.

CVPR 2024 lunch with Tamara and Alex Berg in Seattle.

Catherine presenting her poster at CVPR 2024 in Seattle.

Group picture after Paola's PhD Defense, April 2024.

Group visit to the Menil Collection in Houston, Texas, February 2023.

vislang group picture Fall 2022.

PhD Student Ziyan Yang presenting her work at the XAI workshop at CVPR 2022 in New Orleans.

UVA vislang group first social event in Houston, Spring 2022.

Hiking at Bearfence trail in the Shenandoah National Park 2021.

UVA vislang group in a Zoom meeting in early 2021.

vislang.ai group at NeurIPS 2019 in Vancouver, Canada.

PhD student Paola Cascante presenting at NeurIPS 2019 in Vancouver, Canada.

PhD student Tianlu Wang presenting at ICCV 2019 in Seoul, South Korea.

vislang.ai / AI2 team at ICCV 2019 in Seoul, South Korea.

Hanging out in Charlottesville, VA -- beginning of Fall 2019.

UVA vislang group, alumni and friends at CVPR 2019 in Long Beach, CA.

Summer trip to the Luray Caverns in Virginia, Summer 2017.

CURRENT AND PAST SPONSORS