The paper reports that full fine-tuning and model scale change how LLMs prioritize numerical constraints and rule identifiers when generating computer-processable compliance rules.
AI Quick Take
- Full fine-tuning (FFT) produces attribution patterns that are statistically different and more focused than LoRA and quantized LoRA.
- As model size grows, LLMs shift toward prioritizing numerical constraints and rule IDs; semantic-match improvements level off above ~7B parameters.
An arXiv study applies a perturbation-based attribution analysis to compare full fine-tuning (FFT), low-rank adaptation (LoRA), and quantized LoRA across multiple model sizes for automated code compliance tasks. The paper reports that FFT produces attribution patterns that are statistically different and more focused than parameter-efficient fine-tuning methods, and finds scale-linked interpretive changes as model parameter counts increase.
The researchers tracked how models attribute importance to parts of the source text when generating machine-processable compliance rules. They found that larger models tend to prioritize numerical constraints and explicit rule identifiers in the building-code text, while semantic similarity between generated rules and reference rules improves with model size only up to about 7 billion parameters; beyond that point, gains level off. These outcomes are derived from a perturbation-based attribution method applied across fine-tuning strategies and model scales.
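The paper's exact attribution procedure isn't reproduced here, but the general idea behind perturbation-based attribution can be sketched: perturb (e.g., delete) one piece of the input at a time and measure how much an output score drops. The sketch below is a toy illustration only; the `overlap_score` function is a hypothetical stand-in for whatever output-similarity metric the study actually used, and the clause and rule strings are invented examples.

```python
# Toy sketch of perturbation-based attribution (illustrative only).
# The study's real scoring function and perturbation scheme are not
# specified here; this version uses token deletion plus a simple
# overlap score between the (perturbed) source text and a target rule.

def overlap_score(source_tokens, rule_tokens):
    """Hypothetical stand-in for model-output similarity: the fraction
    of rule tokens that also appear in the source text."""
    source_set = set(source_tokens)
    if not rule_tokens:
        return 0.0
    return sum(t in source_set for t in rule_tokens) / len(rule_tokens)

def perturbation_attribution(source_text, generated_rule):
    """Score each source token by the drop in score when it is removed."""
    tokens = source_text.split()
    rule_tokens = generated_rule.split()
    base = overlap_score(tokens, rule_tokens)
    scores = {}
    for i, tok in enumerate(tokens):
        perturbed = tokens[:i] + tokens[i + 1:]  # delete one token
        scores[tok] = base - overlap_score(perturbed, rule_tokens)
    return scores

# Invented example clause and rule: the rule ID and the numeric
# constraint receive nonzero attribution because removing either
# one lowers the overlap score.
clause = "Section R302.1 requires a fire separation distance of 5 feet"
rule = "R302.1 fire_separation_distance >= 5"
attr = perturbation_attribution(clause, rule)
```

In this toy setup, deleting the rule identifier `R302.1` or the number `5` reduces the score, while deleting filler words does not, which mirrors (in a very simplified way) the paper's finding that attribution concentrates on rule IDs and numerical constraints.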
The operational implication is that fine-tuning choices can alter not just a model's performance but its internal focus in rule-driven tasks, a material consideration for teams that must demonstrate why a model produced a particular interpretation in regulated settings. The plateau in semantic-match improvement also suggests scaling limits for this task that bear on cost-benefit decisions. Follow-up work should test these attribution patterns on production compliance datasets and evaluate whether the more focused attribution under FFT improves auditability in practice.