From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation
2Shanghai Artificial Intelligence Laboratory
3Fudan University
The left panel compares UniMod with traditional baselines: UniMod introduces a structured reasoning trajectory comprising Evidence, Modality, Risk, Policy, and Answer. The center panel (UniTrace) shows the consensus mechanism used to select specialized teacher models (e.g., GLM) for labeling each trajectory node. The right panel details the training stage, where UniMod is optimized via UniRM. UniRM uses a shared VLM backbone with task-specific heads, incorporating head-wise weight subspace decoupling and stochastic head scheduling.
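For concreteness, the snippet below is a minimal PyTorch-style sketch of the multi-head reward design named in the caption: a shared backbone feeding one scalar head per attribute, plus a simple form of stochastic head scheduling. Class, function, and argument names are illustrative assumptions, not the released implementation.

```python
import random
from typing import Dict, List, Optional

import torch
import torch.nn as nn


class UniRMSketch(nn.Module):
    """Hypothetical sketch: shared VLM backbone with one scalar reward head per attribute."""

    def __init__(self, backbone: nn.Module, hidden_dim: int, attributes: List[str]):
        super().__init__()
        # Assumption: the backbone returns pooled features of shape [batch, hidden_dim].
        self.backbone = backbone
        self.heads = nn.ModuleDict({name: nn.Linear(hidden_dim, 1) for name in attributes})

    def forward(self, inputs, active_heads: Optional[List[str]] = None) -> Dict[str, torch.Tensor]:
        feats = self.backbone(inputs)
        names = active_heads if active_heads is not None else list(self.heads.keys())
        # One attribute-level scalar score per active head.
        return {name: self.heads[name](feats).squeeze(-1) for name in names}


def schedule_heads(attributes: List[str], k: int) -> List[str]:
    """Stochastic head scheduling: optimize a random subset of heads at each step."""
    return random.sample(attributes, k)
```

In this sketch, only the heads returned by `schedule_heads` would receive gradients at a given step, which is one simple way to rebalance multi-task training dynamics across attribute objectives.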
VLM-based models are ranked by an overall score (the mean of the Text and Image averages), indicated by blue shading, with deeper shading denoting a higher rank. UniMod achieves the best overall performance while using substantially fewer training samples than prior high-performing VLM-based guards. Best results are shown in bold, and second-best are underlined.
The first three panels illustrate the training dynamics for the Formality, Modality, and Risk attributes. The final panel shows the downstream F1 scores for both text and image moderation.
(a-b) Average performance and variance of UniRM under various ablation settings. (c) Data scaling: F1 score improvement of UniMod when training data is scaled from $L_1$ to $L_2$. (d) Model scaling: comparison of F1 score gains ($\Delta$) across different moderation models when model capacity is increased from 3B to 7B parameters.
Abstract
Safety moderation is pivotal for identifying harmful content. Despite the success of textual safety moderation, its multimodal counterpart remains hindered by a dual sparsity of data and supervision. Conventional reliance on binary labels leads to shortcut learning, which obscures the intrinsic classification boundaries necessary for effective multimodal discrimination. Hence, we propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces. By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process. This approach forces the model to ground its decisions in explicit safety semantics, preventing it from converging on superficial shortcuts. To facilitate this paradigm, we develop a multi-head scalar reward model (UniRM), which provides multi-dimensional supervision by assigning attribute-level scores to the response generation stage. Furthermore, we introduce specialized optimization strategies that decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning. Empirical results show that UniMod achieves competitive textual moderation performance and sets a new multimodal state-of-the-art (SOTA) using less than 40% of the training data required by leading baselines. Ablations further validate our multi-attribute trajectory reasoning, offering an effective and efficient framework for multimodal moderation.
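To make the trajectory structure concrete, the following is an illustrative container for one dense reasoning trace covering the five stages named above (evidence, modality, risk, policy, answer). Field names and example values are assumptions for exposition, not the paper's exact schema.

```python
from dataclasses import dataclass


@dataclass
class ReasoningTrajectory:
    """Illustrative container for one dense reasoning trace (field names are assumptions)."""
    evidence: str  # grounded description of the potentially harmful content
    modality: str  # which modality carries the risk, e.g. "text", "image", "text+image"
    risk: str      # mapped risk category, e.g. "graphic violence"
    policy: str    # policy clause the verdict appeals to
    answer: str    # final moderation decision, e.g. "unsafe"


example = ReasoningTrajectory(
    evidence="The image shows ...",
    modality="image",
    risk="graphic violence",
    policy="Prohibited content: depictions of graphic violence",
    answer="unsafe",
)
```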