A recent study critiques Shapley values, finding a misalignment between evaluation metrics and human utility.
Test-Time Matching Enhances Compositional Reasoning in Multimodal Models
A new test-time matching method improves compositional reasoning in multimodal models, achieving state-of-the-art results.