Paper Title: On the Performance of Multimodal Language Models
Published on: 4 October 2023
Primary Category: Computation and Language
Paper Authors: Utsav Garg, Erhan Bas
- Compares 5 major multimodal LLMs across diverse benchmarks
- Shows the importance of large vision encoders and decoder fine-tuning
- Demonstrates that training data diversity is key, while gains from model size plateau
- Analyzes the tradeoffs of different model components
- Provides guidance for training robust multimodal LLMs
Evaluating Multimodal Language Models
This paper benchmarks publicly available multimodal language models on tasks such as visual question answering, captioning, and classification. Through controlled experiments, it reveals insights into effective model components and training strategies.
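The evaluation setup described above can be sketched as a loop over benchmarks that scores each prediction and aggregates per-task accuracy. This is an illustrative minimal sketch: the benchmark names, the `exact_match` metric, and the `evaluate` helper are assumptions for demonstration, not the authors' actual harness.

```python
def exact_match(prediction: str, answer: str) -> float:
    """Score 1.0 when the normalized prediction matches the reference answer."""
    return float(prediction.strip().lower() == answer.strip().lower())

def evaluate(model_fn, benchmarks: dict) -> dict:
    """Run model_fn(image, question) over each benchmark and aggregate scores.

    benchmarks maps a task name (e.g. "vqa", "captioning") to a list of
    examples, each a dict with "image", "question", and "answer" keys.
    Returns per-task accuracy plus a macro average across tasks.
    """
    scores = {}
    for name, examples in benchmarks.items():
        per_example = [
            exact_match(model_fn(ex["image"], ex["question"]), ex["answer"])
            for ex in examples
        ]
        scores[name] = sum(per_example) / len(per_example)
    # Macro average weights each task equally, regardless of example count.
    scores["macro_avg"] = sum(scores.values()) / len(scores)
    return scores
```

A macro average across tasks (rather than pooling all examples) keeps small benchmarks from being drowned out by large ones when comparing models.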