recent preprints
GViT: Representing Images as Gaussians for Visual Recognition
arXiv:2506.23532 [paper] [arxiv] [pdf] [bibtex]
The Amazon Nova Family of Models: Technical Report and Model Card
arXiv:2506.12103 March 2025. [paper] [arxiv] [pdf] [bibtex]
ParallelSpec: Parallel Drafter for Efficient Speculative Decoding
arXiv:2410.05589 October 2024. [paper] [arxiv] [pdf] [bibtex]
Fairness and Bias Mitigation in Computer Vision: A Survey
arXiv:2408.02464 August 2024. [paper] [arxiv] [pdf] [bibtex]
publications
NEW! SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
International Conference on Learning Representations. ICLR 2026. [paper] [arxiv] [pdf] [bibtex]
NEW! Taming Data and Transformers for Audio Generation
International Journal of Computer Vision. IJCV 2026. [paper] [project page] [github] [arxiv] [pdf] [bibtex]
NEW! Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
IEEE Winter Conference on Applications of Computer Vision. WACV 2026. Tucson, AZ. [paper] [arxiv] [pdf] [bibtex]
Improving Progressive Generation with Decomposable Flow Matching
Conf. on Neural Information Processing Systems. NeurIPS 2025. San Diego, CA. [paper] [project website] [github] [arxiv] [pdf] [bibtex]
Learning from Synthetic Data for Visual Grounding
British Machine Vision Conference. BMVC 2025. Sheffield, UK. [paper] [project page] [arxiv] [pdf] [bibtex]
Improving Large Vision and Language Models by Learning from a Panel of Peers
International Conference on Computer Vision. ICCV 2025. Honolulu, HI. [paper] [arxiv] [pdf] [bibtex]
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
International Conference on Computer Vision. ICCV 2025. Honolulu, HI. [paper] [project page] [github] [arxiv] [pdf] [bibtex]
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
Transactions on Machine Learning Research. TMLR 2025. [paper] [arxiv] [pdf] [bibtex]
PropTest: Automatic Property Testing for Improved Visual Programming
Conf. on Empirical Methods in Natural Language Processing. EMNLP 2024 (Findings). [paper] [project page] [github] [arxiv] [pdf] [bibtex]
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition
ACM Multimedia. MM 2024. Melbourne, Australia. [paper] [project page] [openreview] [bibtex]
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
European Conference on Computer Vision ECCV 2024. Milan, Italy. [paper] [project page] [arxiv] [github] [pdf] [bibtex]
Improved Visual Grounding through Self-Consistent Explanations
Conf. on Computer Vision and Pattern Recognition CVPR 2024. Seattle, WA. [paper] [project page] [github] [arxiv] [pdf] [bibtex]
ElasticDiffusion: Training-free Arbitrary Size Image Generation
Conf. on Computer Vision and Pattern Recognition CVPR 2024. Seattle, WA. [paper] [project page] [arxiv] [code] [pdf] [bibtex]
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
International Conference on Computer Vision. ICCV 2023. Paris, France. [paper] [project page] [arxiv] [github] [pdf] [bibtex]
Estimating and Maximizing Mutual Information for Knowledge Distillation
Workshop on Fair, Data Efficient and Trusted Computer Vision at CVPR 2023. Vancouver, Canada. [paper] [arxiv] [pdf] [bibtex]
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning.
Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [paper] [arxiv] [pdf] [bibtex]
CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning
Conf. on Computer Vision and Pattern Recognition CVPR 2023. Vancouver, Canada. [paper] [arxiv] [pdf] [bibtex]
CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotations
Int. Conf. on Artificial Intelligence and Statistics AISTATS 2023. Valencia, Spain / Hybrid. [paper] [arxiv] [pdf] [bibtex]
SimVQA: Exploring Simulated Environments for Visual Question Answering.
Conf. on Computer Vision and Pattern Recognition CVPR 2022. New Orleans, LA. [paper] [project page] [arxiv] [pdf] [bibtex]
Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation.
Language Resources and Evaluation Conference LREC 2022. [paper] [arxiv] [pdf] [bibtex]
Backpropagation-Based Decoding for Multimodal Machine Translation
Frontiers in Artificial Intelligence. January 2022. [paper] [link] [bibtex]
Evolving Image Compositions for Feature Representation Learning
British Machine Vision Conference. BMVC 2021. November 2021. [paper] [project page] [arxiv] [pdf] [bibtex]
MEDIRL: Predicting the Visual Attention of Drivers via Maximum Entropy Deep Inverse Reinforcement Learning.
International Conference on Computer Vision. ICCV 2021. [paper] [project page] [code] [arxiv] [pdf] [bibtex]
General Multi-label Image Classification with Transformers
Conference on Computer Vision and Pattern Recognition CVPR 2021. [paper] [arxiv] [pdf] [bibtex]
Black-box Explanation of Object Detectors via Saliency Maps
Conference on Computer Vision and Pattern Recognition CVPR 2021. [paper] [arxiv] [pdf] [bibtex]
Enabling AI at the Edge with XNOR-Networks
Communications of the ACM. December 2020 (Vol. 63, No. 12). [paper] [link] [bibtex]
Using Visual Feature Space as a Pivot Across Languages
Findings of Empirical Methods in Natural Language Processing. Findings of EMNLP 2020. short. Accepted September 2020. [paper] [pdf] [project page] [code] [bibtex]
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
Empirical Methods in Natural Language Processing. EMNLP 2020. short. Nov. 2020 [paper] [arxiv] [pdf] [bibtex]
Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation
Association for Computational Linguistics. ACL 2020. July 2020. [paper] [arxiv] [pdf] [bibtex]
Generative-discriminative Feature Representations for Open-set Recognition
Conference on Computer Vision and Pattern Recognition CVPR 2020. [paper] [pdf] [bibtex]
Testing DNN Image Classifiers for Confusion & Bias Errors
International Conference on Software Engineering. ICSE 2020. October 2020. [paper] [arxiv] [pdf] [bibtex]
Moviescope: Large-scale Analysis of Movies using Multiple Modalities
arXiv:1908.03180. August 2019. [paper] [arxiv] [project page] [pdf] [bibtex]
Gender Bias in Contextualized Word Embeddings
North American Chapter of the Association for Computational Linguistics. NAACL 2019. short. Minneapolis, Minnesota. June 2019. [paper] [arxiv] [pdf] [bibtex]
Chat-crowd: A Dialog-based Platform for Visual Layout Composition
North American Chapter of the Association for Computational Linguistics. NAACL 2019. System Demonstrations. Minneapolis, MN. June 2019. [paper] [arxiv] [project page] [code] [pdf] [bibtex]
Deep Feature Aggregation and Image Re-ranking with Heat Diffusion for Image Retrieval
IEEE Transactions on Multimedia 2019 (Journal). [paper] [arxiv] [pdf] [bibtex]
Building Discriminative CNN Image Representations for Object Retrieval using the Replicator Equation
Pattern Recognition 2018 (Journal). Volume 83. Pages 150-160. [paper] [link] [code] [bibtex]
Where and Who? Automatic Semantic-Aware Person Composition
Winter Conference on Applications of Computer Vision. WACV 2018. Lake Tahoe, Nevada. March 2018. [paper] [pdf] [arXiv] [supp. material] [code] [bibtex]
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Empirical Methods in Natural Language Processing. EMNLP 2017. Copenhagen, Denmark. September 2017. [paper] [pdf] [code] [bibtex]
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
European Conference on Computer Vision. ECCV 2016. Amsterdam, The Netherlands. October 2016. [paper] [arXiv] [project page] [code] [pdf] [bibtex]
Stating the Obvious: Extracting Visual Common Sense Knowledge
North American Chapter of the Association for Computational Linguistics. NAACL 2016. short. San Diego, CA. June 2016. [paper] [pdf] [bibtex]
Learning to Name Objects
Communications of the ACM. March 2016 (Vol. 59, No. 3). [paper] [pdf] [link] [technical perspective] [bibtex]