An arXiv paper introduces a quantitative-plus-error-analysis benchmark for Vietnamese legal text, comparing GPT-4o, Claude 3 Opus, Gemini 1.5 Pro and Grok-1.
3 results for: arxiv
Full fine-tuning concentrates LLM attribution in code-compliance models
An arXiv study uses perturbation-based attribution to compare FFT, LoRA, and quantized LoRA across model sizes and finds FFT yields more focused interpretive patterns.
Merge GNN Predictions with LLM Reasoning in GLOW for Open-World QA
GLOW pairs a pre-trained GNN with an LLM to answer questions over incomplete knowledge graphs and ships GLOW-BENCH, a 1,000-question evaluation.