VisOnlyQA: A New Dataset for Evaluating the Visual Perception of LVLMs (Large Vision Language Models)
Large Vision Language Models (LVLMs) have demonstrated significant advances across a range of challenging multimodal tasks over the past few years. Their ability to interpret visual information in figures, known as visual perception, relies on visual encoders and multimodal training. Despite these advances, visual perception errors still cause many mistakes in LVLMs and impact their ability […]
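To make the evaluation setting concrete, below is a minimal sketch of how one might score an LVLM on a VisOnlyQA-style multiple-choice split. The Hugging Face dataset ID, the column names ("image", "question", "answer"), and the `ask_model` stub are illustrative assumptions, not the paper's official evaluation harness.

```python
# Minimal sketch: accuracy of an LVLM on a figure-based QA split.
# Assumptions (not confirmed by the source): dataset ID, split name,
# column names, and the model-query function are all hypothetical.
from datasets import load_dataset


def ask_model(image, question: str) -> str:
    """Placeholder: send the figure and question to an LVLM (e.g., via
    an API client) and return the model's answer string."""
    raise NotImplementedError


# Hypothetical dataset ID and split for illustration only.
ds = load_dataset("ryokamoi/VisOnlyQA_Eval_Real", split="test")

correct = 0
for example in ds:
    prediction = ask_model(example["image"], example["question"])
    # Normalize both sides before comparing answer strings.
    correct += int(prediction.strip().lower() == example["answer"].strip().lower())

print(f"Accuracy: {correct / len(ds):.1%}")
```

A harness like this isolates visual perception by asking questions answerable from the figure alone, so accuracy gaps reflect perception errors rather than reasoning or world-knowledge failures.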