A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
[PRL 2024] Code for our label-free pruning and retraining technique for autoregressive Text-VQA Transformers (TAP, TAP†).
Enable intelligent retrieval, filtering, and summarization of scientific papers from multiple sources for efficient research and report generation.
🖥️ Switch focus between monitors effortlessly with MMF, a macOS utility that streamlines your multitasking using just the keyboard.