   Vision Language Models Are Biased (vlmsarebiased.github.io)
This paper explores a different aspect of the limitations of VLMs than the earlier paper VLMs are Blind (https://vlmsareblind.github.io). On the VLMs are Blind tasks, o3 achieved 90% accuracy (https://openai.com/index/thinking-with-images), but on similarly easy tasks built from the counterfactual images in VLMs are Biased, o3 reached only 18.5%.

This may indicate that while VLMs might possess the necessary capability, their strong biases can cause them to overlook important cues, and their overconfidence in their own knowledge can lead to incorrect answers.

Very human-like errors.
State-of-the-art Vision Language Models achieve 100% accuracy when counting features of popular subjects (e.g. knowing that the Adidas logo has 3 stripes and a dog has 4 legs), but are only ~17% accurate when counting in counterfactual images (e.g. counting stripes in a 4-striped Adidas-like logo or counting legs in a 5-legged dog).
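
Each of these tests boils down to one image plus one counting question. Here is a minimal sketch of such a probe against a vision-capable chat API; the model name, file name, and prompt are placeholders chosen for illustration, not the paper's actual evaluation harness:

    import base64
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def count_query(image_path: str, question: str) -> str:
        # Send one local image and one counting question to a vision-capable model.
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder; any vision-capable model
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        return resp.choices[0].message.content

    # Hypothetical counterfactual image: a 4-striped Adidas-like logo.
    print(count_query("four_stripe_logo.png",
                      "How many stripes does this logo have? Answer with a number."))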
Fun findings related to memorization in AI models. It simply means LLMs/VLMs do not predict by generalizing; they memorize instead. A new perspective on adversarial attack methods.
For overrepresented concepts, like popular brands, it seems that the model "ignores" the details once it detects that the overall shapes or patterns are similar. Opening up the vision encoders to find out how these images cluster in the embedding space should provide better insights.
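
One cheap way to start on that is to compare image embeddings of an original subject and its counterfactual variant. A minimal sketch, assuming a CLIP-style encoder from Hugging Face transformers; the checkpoint and file names are placeholders, not anything from the paper:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def embed(paths):
        # Encode images and L2-normalize so dot products are cosine similarities.
        images = [Image.open(p).convert("RGB") for p in paths]
        inputs = processor(images=images, return_tensors="pt")
        with torch.no_grad():
            feats = model.get_image_features(**inputs)
        return feats / feats.norm(dim=-1, keepdim=True)

    # Hypothetical files: the real 3-stripe logo vs. a 4-stripe counterfactual.
    orig, cf = embed(["adidas_3_stripes.png", "adidas_like_4_stripes.png"])
    print("cosine similarity:", float(orig @ cf))

A similarity very close to 1.0 would be consistent with the "ignores the details" intuition: the encoder maps both variants to nearly the same point, so the language side has little to distinguish them by. Repeating this over many original/counterfactual pairs and clustering the normalized embeddings would show whether counterfactuals form their own cluster or collapse onto the originals.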