recent preprints
-
Agentic Discovery with Active Hypothesis Exploration for Visual Recognition
arXiv:2604.12999 [paper] [pdf] [bibtex] -
Beyond Referring Expressions: Scenario Comprehension Visual Grounding
arXiv:2604.02323 [paper] [pdf] [bibtex] -
MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies
arXiv:2603.06846 [paper] [pdf] [bibtex] -
GViT: Representing Images as Gaussians for Visual Recognition
arXiv:2506.23532 [paper] [pdf] [bibtex] -
The Amazon Nova Family of Models: Technical Report and Model Card
arXiv:2506.12103 March 2025. [paper] [pdf] [bibtex] -
ParallelSpec: Parallel Drafter for Efficient Speculative Decoding
arXiv:2410.05589 October 2024. [paper] [pdf] [bibtex] -
Fairness and Bias Mitigation in Computer Vision: A Survey
arXiv:2408.02464 August 2024. [paper] [pdf] [bibtex] -
Generative Visual Instruction Tuning
arXiv:2406.11262 June 2024. [paper] [github] [pdf] [bibtex]
publications
-
NEW! One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers
IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2026. [paper] [github] [pdf] [bibtex] -
NEW! ProxyThinker: Test-Time Guidance through Small Visual Reasoners
International Conference on Learning Representations. ICLR 2026. [paper] [github] [pdf] [bibtex] -
NEW! MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
International Conference on Learning Representations. ICLR 2026. [paper] [github] [pdf] [bibtex] -
NEW! SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
International Conference on Learning Representations. ICLR 2026. [paper] [pdf] [bibtex] -
NEW! Taming Data and Transformers for Audio Generation
International Journal of Computer Vision. IJCV 2026. [paper] [project page] [github] [pdf] [bibtex] -
NEW! Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
IEEE Winter Conference on Applications of Computer Vision. WACV 2026. Tucson, AZ. [paper] [pdf] [bibtex] -
Improving Progressive Generation with Decomposable Flow Matching
Conf. on Neural Information Processing Systems. NeurIPS 2025. San Diego, CA. [paper] [project website] [github] [pdf] [bibtex] -
Learning from Synthetic Data for Visual Grounding
British Machine Vision Conference. BMVC 2025. Sheffield, UK. [paper] [project page] [pdf] [bibtex] -
Improving Large Vision and Language Models by Learning from a Panel of Peers
International Conference on Computer Vision. ICCV 2025. Honolulu, HI. [paper] [pdf] [bibtex] -
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
International Conference on Computer Vision. ICCV 2025. Honolulu, HI. [paper] [project page] [github] [pdf] [bibtex] -
LoCoRe: Image Re-ranking with Long-Context Sequence Modeling
Conf. on Computer Vision and Pattern Recognition. CVPR 2025. Nashville, TN. [paper] [github] [pdf] [bibtex] -
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
Transactions on Machine Learning Research. TMLR 2025. [paper] [pdf] [bibtex] -
PropTest: Automatic Property Testing for Improved Visual Programming
Conf. on Empirical Methods in Natural Language Processing. EMNLP 2024 (Findings). [paper] [project page] [github] [pdf] [bibtex] -
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition
ACM Multimedia MM 2024. Melbourne, Australia. [paper] [openreview] [bibtex] -
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
European Conference on Computer Vision ECCV 2024. Milan, Italy. [paper] [project page] [github] [pdf] [bibtex] -
Grounding Language Models for Visual Entity Recognition
European Conference on Computer Vision ECCV 2024. Milan, Italy. [paper] [github] [pdf] [bibtex] -
Improved Visual Grounding through Self-Consistent Explanations
Conf. on Computer Vision and Pattern Recognition CVPR 2024. Seattle, WA. [paper] [project page] [github] [pdf] [bibtex] -
ElasticDiffusion: Training-free Arbitrary Size Image Generation
Conf. on Computer Vision and Pattern Recognition CVPR 2024. Seattle, WA. [paper] [project page] [code] [pdf] [bibtex] -
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
Winter Conference on Applications of Computer Vision WACV 2024. Waikoloa, HI. [paper] [code] [pdf] [bibtex] -
Variation of Gender Biases in Visual Recognition Models Before and After Finetuning
Workshop on Algorithmic Fairness through the Lens of Time at NeurIPS 2023. New Orleans, LA. [paper] [code] [pdf] [bibtex] -
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
International Conference on Computer Vision. ICCV 2023. Paris, France. [paper] [project page] [github] [pdf] [bibtex] -
Estimating and Maximizing Mutual Information for Knowledge Distillation
Workshop on Fair, Data Efficient and Trusted Computer Vision at CVPR 2023. Vancouver, Canada. [paper] [pdf] [bibtex] -
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [paper] [pdf] [bibtex] -
CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning
Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [paper] [pdf] [bibtex] -
CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotations
Int. Conf. on Artificial Intelligence and Statistics AISTATS 2023. Valencia, Spain / Hybrid. [paper] [pdf] [bibtex] -
On the Transferability of Visual Features in Generalized Zero-Shot Learning
arXiv:2211.12494 November 2022. [paper] [github] [pdf] [bibtex] -
SimVQA: Exploring Simulated Environments for Visual Question Answering
Conf. on Computer Vision and Pattern Recognition CVPR 2022. New Orleans, LA. [paper] [project page] [pdf] [bibtex] -
Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation
Language Resources and Evaluation Conference LREC 2022. [paper] [pdf] [bibtex] -
Backpropagation-Based Decoding for Multimodal Machine Translation
Frontiers in Artificial Intelligence. January 2022. [paper] [bibtex] -
Evolving Image Compositions for Feature Representation Learning
British Machine Vision Conference. BMVC 2021. November 2021. [paper] [project page] [pdf] [bibtex] -
VisualNews: Benchmark and Challenges in Entity-aware Image Captioning
Empirical Methods in Natural Language Processing. EMNLP 2021. Virtual / Punta Cana, Dominican Republic. November 2021. [paper] [code] [pdf] [bibtex] -
Instance-level Image Retrieval using Reranking Transformers
International Conference on Computer Vision. ICCV 2021. [paper] [code] [pdf] [bibtex] -
MEDIRL: Predicting the Visual Attention of Drivers via Maximum Entropy Deep Inverse Reinforcement Learning
International Conference on Computer Vision. ICCV 2021. [paper] [project page] [code] [pdf] [bibtex] -
General Multi-label Image Classification with Transformers
Conference on Computer Vision and Pattern Recognition CVPR 2021. [paper] [pdf] [bibtex] -
Black-box Explanation of Object Detectors via Saliency Maps
Conference on Computer Vision and Pattern Recognition CVPR 2021. [paper] [pdf] [bibtex] -
Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning
The Thirty-Fifth AAAI Conference on Artificial Intelligence. AAAI 2021. February 2021 [paper] [code] [pdf] [bibtex] -
Enabling AI at the Edge with XNOR-Networks
Communications of the ACM. December 2020 (Vol. 63, No. 12). [paper] [bibtex] -
Chair Segments: A Compact Benchmark for the Study of Object Segmentation
arXiv:2011.14027 Nov 2020. [paper] [code] [pdf] [bibtex] -
Using Visual Feature Space as a Pivot Across Languages
Findings of Empirical Methods in Natural Language Processing. Findings of EMNLP 2020. short. Accepted September 2020. [paper] [project page] [code] [bibtex] -
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
Empirical Methods in Natural Language Processing. EMNLP 2020. short. Nov. 2020 [paper] [pdf] [bibtex] -
Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation
Association for Computational Linguistics. ACL 2020. July 2020. [paper] [pdf] [bibtex] -
Generative-discriminative Feature Representations for Open-set Recognition
Conference on Computer Vision and Pattern Recognition CVPR 2020. [paper] [pdf] [bibtex] -
Testing DNN Image Classifiers for Confusion & Bias Errors
International Conference on Software Engineering. ICSE 2020. October 2020. [paper] [pdf] [bibtex] -
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Conf. on Neural Information Processing Systems. NeurIPS 2019. Vancouver, Canada. December 2019. [paper] [code] [pdf] [bibtex] -
Moviescope: Large-scale Analysis of Movies using Multiple Modalities
arXiv:1908.03180. August 2019. [paper] [project page] [pdf] [bibtex] -
Gender Bias in Contextualized Word Embeddings
North American Chapter of the Association for Computational Linguistics. NAACL 2019. short. Minneapolis, Minnesota. June 2019. [paper] [pdf] [bibtex] -
Deep Feature Aggregation and Image Re-ranking with Heat Diffusion for Image Retrieval
IEEE Transactions on Multimedia 2019 (Journal). [paper] [pdf] [bibtex] -
Feedback-prop: Convolutional Neural Network Inference under Partial Evidence
Conference on Computer Vision and Pattern Recognition. CVPR 2018. Salt Lake City, Utah. June 2018. [paper] [pdf] [code] [bibtex] -
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
North American Chapter of the Association for Computational Linguistics. NAACL 2018. short. New Orleans, Louisiana. June 2018. [paper] [pdf] [code] [bibtex] -
Building Discriminative CNN Image Representations for Object Retrieval using the Replicator Equation
Pattern Recognition 2018 (Journal). Volume 83. Pages 150-160. [paper] [code] [bibtex] -
Where and Who? Automatic Semantic-Aware Person Composition
Winter Conference on Applications of Computer Vision. WACV 2018. Lake Tahoe, Nevada. March 2018. [paper] [pdf] [supp. material] [code] [bibtex] -
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Empirical Methods in Natural Language Processing. EMNLP 2017. Copenhagen, Denmark. September 2017. [paper] [code] [bibtex] -
Obj2Text: Generating Visually Descriptive Language from Object Layouts
Empirical Methods in Natural Language Processing. EMNLP 2017. Copenhagen, Denmark. September 2017. [paper] [pdf] [code] [bibtex] -
Commonly Uncommon: Semantic Sparsity in Situation Recognition
Conference on Computer Vision and Pattern Recognition. CVPR 2017. Honolulu, Hawaii. July 2017. [paper] [pdf] [demo] [bibtex] -
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
European Conference on Computer Vision. ECCV 2016. Amsterdam, The Netherlands. October 2016. [paper] [project page] [code] [pdf] [bibtex] -
Stating the Obvious: Extracting Visual Common Sense Knowledge
North American Chapter of the Association for Computational Linguistics. NAACL 2016. short. San Diego, CA. June 2016. [paper] [bibtex] -
Learning to Name Objects
Communications of the ACM. March 2016 (Vol. 59, No. 3). [paper] [link] [technical perspective] [bibtex]