Transfer Learning from Text-Motion Retrieval to Violence Classification
Violent actions are rare in motion datasets, so violence recognition suffers from a shortage of samples and, more generally, of labeled data. This gap is a growing safety concern as robots increasingly interact directly with people.
For our Advanced Machine Learning course project, we investigated whether semantic priors from text-motion retrieval models (TMR++) could transfer to violence classification.
We framed this as a few-shot learning problem and applied Model-Agnostic Meta-Learning (MAML) with prototypical networks to enable rapid adaptation from minimal examples.
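To make the few-shot setup concrete, here is a minimal sketch of the prototypical-network step only: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The toy 2-D vectors, the label names, and the function names are illustrative assumptions, not the project's code, and the MAML inner and outer loops that would adapt the encoder are omitted.

```python
import math


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype vector."""
    sums, counts = {}, {}
    for vec, label in zip(support_embeddings, support_labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(query, prototypes):
    """Assign the query to the class whose prototype is closest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy 2-D embeddings (hypothetical); real inputs would come from the motion encoder.
support = [[0.9, 0.1], [1.1, -0.1], [-1.0, 0.2], [-0.8, 0.0]]
labels = ["violent", "violent", "non_violent", "non_violent"]

prototypes = class_prototypes(support, labels)
prediction = classify([0.7, 0.0], prototypes)
```

With only two support examples per class, the prototypes are cheap to recompute for every new episode, which is what makes this formulation attractive when labeled violent motions are scarce.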
To prevent feature collapse, a common issue in contrastive learning with imbalanced data, we implemented a decoupled Three-Loss Architecture (BCE + Center + Hinge) to explicitly enforce separation between safety-critical categories without losing fine-grained semantics.
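One plausible reading of the three-loss combination is sketched below for a single sample. The exact formulation is an assumption: BCE is applied to a violence logit, the center loss pulls an embedding toward its class center, and the hinge term penalizes class centers that sit closer together than a margin. The function names, the margin, and the weights `w_center` and `w_hinge` are hypothetical and not taken from the project.

```python
import math


def bce_loss(logit, target):
    """Binary cross-entropy on a single violence logit (target is 0 or 1)."""
    # Numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def center_loss(embedding, center):
    """Squared Euclidean distance from an embedding to its class center."""
    return sum((e - c) ** 2 for e, c in zip(embedding, center))


def hinge_separation_loss(center_a, center_b, margin):
    """Penalize two class centers that are closer together than `margin`."""
    return max(0.0, margin - math.dist(center_a, center_b))


def three_loss(logit, target, embedding, centers, margin=1.0,
               w_center=0.1, w_hinge=0.1):
    """Weighted sum of the three terms; the weights are assumed values."""
    bce = bce_loss(logit, target)
    pull = center_loss(embedding, centers[target])
    push = hinge_separation_loss(centers[0], centers[1], margin)
    return bce + w_center * pull + w_hinge * push


# Toy example: class 0 = non-violent, class 1 = violent (hypothetical centers).
centers = {0: [0.0, 0.0], 1: [0.0, 0.5]}
loss = three_loss(logit=0.0, target=1, embedding=[0.0, 0.5], centers=centers)
```

Keeping the three terms as separate functions mirrors the decoupled design: the classification signal, the within-class compactness, and the between-class separation can each be weighted or inspected on its own.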
Key Findings
- MAML significantly outperforms the standard Prototypical Network baseline, indicating that meta-learning is effective in data-scarce scenarios
- We achieved a 73.1% Violence Detection Rate while maintaining a negligible 1.0% False Positive Rate
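For reference, the two metrics above can be computed as sketched below, assuming that Violence Detection Rate means recall on the violent class, TP / (TP + FN), and that False Positive Rate means FP / (FP + TN). These definitions are assumptions, since the summary does not spell them out, and the labels in the example are toy data rather than project results.

```python
def detection_and_false_positive_rate(y_true, y_pred):
    """Return (violence detection rate, false positive rate) for binary labels.

    Labels are 1 for violent and 0 for non-violent.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against empty classes so the function never divides by zero.
    vdr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return vdr, fpr


# Toy labels (hypothetical): 4 violent clips, 6 non-violent clips.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
vdr, fpr = detection_and_false_positive_rate(y_true, y_pred)
```

Reporting both numbers together matters for a safety model: a high detection rate is only useful if the false positive rate stays low enough that alarms remain trustworthy.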
While fully supervised fine-tuning still sets the performance ceiling, meta-learning proved to be a viable, data-efficient alternative for deploying safety models where labeled data is scarce.