  1. LLaVA

    We introduce LLaVA (Large Language-and-Vision Assistant), an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and …

  2. LLaVA: Large Language and Vision Assistant - Microsoft Research

    LLaVA is an open-source project, collaborating with the research community to advance the state-of-the-art in AI. LLaVA represents the first end-to-end trained large multimodal model (LMM) that …

  3. GitHub - haotian-liu/LLaVA: [NeurIPS'23 Oral] Visual ...

    With additional scaling to LLaVA-1.5, LLaVA-NeXT-34B outperforms Gemini Pro on some benchmarks. It can now process 4x more pixels and perform more tasks/applications than before.

  4. [2304.08485] Visual Instruction Tuning - arXiv.org

    Apr 17, 2023 · When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, …

  5. LLaVA Architecture: From Frozen ViT to Fine-Tuned LLM

    Jun 10, 2025 · A complete technical breakdown of the LLaVA-1.5 multimodal visual assistant. Explore its architecture, open-source training data, and how to use the model. (See the projector sketch after these results.)

  6. liuhaotian/llava-v1.5-7b · Hugging Face

    Model type: LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, … (See the loading sketch after these results.)

  7. What are LLaVA and LLaVA-Interactive? - DEV Community

    3 days ago · LLaVA-Interactive is a research prototype system for multimodal AI interaction. The system can engage in multi-round conversations with users by receiving multimodal user input …
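
Taken together, results 1 and 5 describe the core design: a frozen vision encoder (a CLIP ViT) whose patch features are projected into the language model's embedding space and then consumed by a Vicuna/LLaMA LLM; in LLaVA-1.5 this connector is a small MLP. Below is a toy PyTorch sketch of such a projector. The module name, layer sizes, and patch count are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn


class VisionProjector(nn.Module):
    """Toy two-layer MLP connector in the spirit of LLaVA-1.5.

    Maps patch features from a frozen vision encoder (e.g. CLIP ViT-L/14,
    1024-dim) into the LLM's token-embedding space (e.g. Vicuna-7B, 4096-dim).
    Dimensions and names are assumptions for illustration only.
    """

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        # returns visual "tokens" shaped like LLM input embeddings
        return self.proj(patch_features)


if __name__ == "__main__":
    # The projected visual tokens are concatenated with the text embeddings
    # before being fed to the language model.
    projector = VisionProjector()
    fake_patches = torch.randn(1, 576, 1024)   # e.g. 24x24 patches from a ViT @ 336px
    visual_tokens = projector(fake_patches)
    print(visual_tokens.shape)                 # torch.Size([1, 576, 4096])
```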
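
Result 6 lists the original liuhaotian/llava-v1.5-7b weights, which are meant to be loaded through the authors' own codebase. The sketch below instead assumes the transformers-converted checkpoint llava-hf/llava-1.5-7b-hf and its AutoProcessor / LlavaForConditionalGeneration API; the repo id, prompt template, and test image URL are assumptions to verify against the current model cards.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed converted checkpoint (not liuhaotian/llava-v1.5-7b).
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Any test image works; this COCO validation image is a common placeholder.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5 chat format: the <image> token marks where visual tokens are spliced in.
prompt = "USER: <image>\nWhat do you see in this picture? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```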