LIMMT: Less Is More for Motion Tracking

* equal contribution   † corresponding author
1 Tsinghua University   2 GalBot   3 Shanghai Jiao Tong University   4 Peking University   5 Shanghai Qi Zhi Institute

A data-centric framework that curates motion data via physics feasibility, action diversity, and action complexity — training on just 3% of AMASS outperforms training on the full corpus.

Abstract

We argue that high-quality motion data can steer tracking policies toward better optimization trajectories early in training. In this work, we introduce LIMMT (Less Is More for Motion Tracking), which is, to our knowledge, the first data-centric study of physics-based humanoid motion tracking. Rather than only removing low-quality and erroneous clips, we define motion data quality along three dimensions: physics feasibility, diversity, and complexity. We show that training on under 3% of AMASS yields better tracking performance than training on the full dataset. We further apply data cleaning to estimated, web-sourced mocap data. Extensive experiments and analyses validate the effectiveness of our framework.

GQS Pipeline Overview

Figure 1. The proposed GQS pipeline. The framework operationalizes motion quality through three stages: filtering physically infeasible data, mapping motions to a semantic latent space, and selecting a subset via complexity-weighted sampling.

Highlights

1. A data-centric perspective redefining "quality" over "quantity": physics feasibility, action diversity, and action complexity, rather than dataset scale, are the decisive factors for robust humanoid tracking.
2. We propose General Quality Selection (GQS), a hierarchical pipeline that first eliminates physically infeasible artifacts via simulator grounding, then maximizes behavioral coverage and dynamic richness using harmonic embeddings.
3. A Less-Is-More paradigm for efficient learning: training on just 3% of curated AMASS data consistently outperforms training on the full corpus across all evaluated metrics.
4. GQS gains transfer in a plug-and-play manner across diverse trackers (Any2Track, TWIST2) and datasets (AMASS, PHUMA), indicating that GQS improves the training signal rather than exploiting algorithmic idiosyncrasies.
5. Extensive ablations isolate the contribution of each dimension, including a calibrated weighting of physical violations that distinguishes toxic artifacts from valuable dynamic motions.

Method Overview

GQS is a three-stage pipeline that transforms a large, noisy motion corpus into a compact, high-value training subset. The staged design encodes a key insight: feasibility, diversity, and complexity must be addressed in the right order. Filtering must come first; otherwise physically broken motions can dominate the representation space. Embedding learning must operate on feasible data to define a meaningful semantic manifold. Complexity weighting comes last; otherwise high-energy artifacts may be over-selected.

Stage I: Physics-based Feasibility Filtering
Replay each candidate motion in a rigid-body simulator and compute a composite feasibility score S_phy. Severe failure modes (extended floating, foot sliding) are heavily penalized; rare or mild signals (self-collision, jerk) receive minimal weights. Motions with S_phy < 90 are discarded.
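The scoring and thresholding step can be sketched as follows. This is only an illustration, not the authors' scoring function: the 0-100 scale, the linear penalty form, the per-violation rates, and the specific weights in `WEIGHTS` are all assumptions. They were chosen so that floating and foot sliding dominate the penalty while self-collision and jerk contribute little, as described above. Only the threshold of 90 comes from the text.

```python
# Hypothetical penalty weights (points lost per unit violation rate).
# Severe failure modes get large weights, mild signals get small ones.
WEIGHTS = {
    "floating": 40.0,
    "foot_sliding": 40.0,
    "self_collision": 5.0,
    "jerk": 5.0,
}
THRESHOLD = 90.0  # motions with S_phy < 90 are discarded


def feasibility_score(violations):
    """Return an assumed S_phy in [0, 100].

    `violations` maps a violation name to the fraction of frames (0..1)
    in which the simulator replay flagged it.
    """
    penalty = 0.0
    for name, rate in violations.items():
        rate = min(max(rate, 0.0), 1.0)  # clamp to a valid fraction
        penalty += WEIGHTS[name] * rate
    return max(0.0, 100.0 - penalty)


def filter_feasible(clips):
    """Keep the names of clips whose score reaches the threshold."""
    return [name for name, v in clips.items()
            if feasibility_score(v) >= THRESHOLD]


clips = {
    "walk": {"floating": 0.0, "foot_sliding": 0.05,
             "self_collision": 0.2, "jerk": 0.1},
    "glitch": {"floating": 0.3, "foot_sliding": 0.1},
}
print(filter_feasible(clips))
```

Under these assumed weights, a clip with frequent but mild self-collision still passes, while a clip that floats for 30% of its frames does not.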
Stage II: Semantic Motion Embedding (HME)
Embed motions using Harmonic Motion Embedding (HME), implemented as a Periodic Autoencoder that decomposes motion into Amplitude A, Frequency F, Phase φ, and Offset b. The global descriptor z_global = mean([A, F]) omits the phase term, yielding a phase-invariant manifold for diversity-aware sampling.
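To make the phase-invariance property concrete, here is a minimal sketch of the descriptor. It makes two assumptions: the learned encoder is replaced by raw latent curves, and amplitude, frequency, and offset are read off an FFT of each curve, which is one common way periodic autoencoders parameterize them. The function names and the choice to average over windows are hypothetical; the actual HME implementation may differ.

```python
import numpy as np


def harmonic_params(latent, dt):
    """Estimate per-channel amplitude, frequency, and offset.

    latent: (C, N) array, one latent curve per channel, sampled every dt s.
    """
    latent = np.asarray(latent, dtype=float)
    n = latent.shape[1]
    coeffs = np.fft.rfft(latent, axis=1)
    freqs = np.fft.rfftfreq(n, d=dt)[1:]          # skip the zero-frequency bin
    power = (2.0 / n) * np.abs(coeffs[:, 1:]) ** 2
    total = power.sum(axis=1)
    amplitude = np.sqrt((2.0 / n) * total)
    frequency = (power * freqs).sum(axis=1) / (total + 1e-12)
    offset = coeffs[:, 0].real / n
    return amplitude, frequency, offset


def global_descriptor(windows, dt):
    """Assumed z_global: mean over windows of the concatenated [A, F]."""
    feats = []
    for w in windows:                              # each w has shape (C, N)
        a, f, _ = harmonic_params(w, dt)
        feats.append(np.concatenate([a, f]))
    return np.mean(feats, axis=0)


# Two copies of the same 3 Hz motion that differ only in phase.
t = np.arange(64) / 64.0
z0 = global_descriptor([[2.0 * np.sin(2 * np.pi * 3 * t)]], dt=1 / 64)
z1 = global_descriptor([[2.0 * np.sin(2 * np.pi * 3 * t + 1.3)]], dt=1 / 64)
print(np.allclose(z0, z1))
```

Because the descriptor uses only amplitude and frequency, two clips of the same cyclic motion that start at different points in the cycle map to the same point in the embedding space.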
Stage III: Global Weighted FPS Selection
Perform Global Weighted FPS (farthest point sampling) over the embedding space. The selection score α · D̂(u,S) + (1−α) · Ĉ(u), where D̂(u,S) is the distance term between candidate u and the already selected set S and Ĉ(u) is the complexity term, primarily maximizes diversity while preferring dynamically richer motions (higher kinetic energy and acceleration) when candidates are comparable in distance.
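The greedy selection loop can be sketched as below. Several details are assumptions rather than the authors' choices: the Euclidean metric, normalizing D̂ by the current maximum distance, min-max normalizing Ĉ, seeding with the most complex motion, and the default `alpha=0.8`. The complexity values are taken as given inputs.

```python
import numpy as np


def weighted_fps(z, complexity, k, alpha=0.8):
    """Greedy farthest point sampling with a complexity bonus.

    z: (M, D) embeddings; complexity: (M,) scores; returns k selected indices.
    """
    z = np.asarray(z, dtype=float)
    c = np.asarray(complexity, dtype=float)
    k = min(k, len(z))
    # Assumed normalization: min-max scale complexity into [0, 1].
    c_hat = (c - c.min()) / (c.max() - c.min() + 1e-12)
    # Assumed seed: start from the most complex motion.
    selected = [int(np.argmax(c_hat))]
    # d[i] = distance from candidate i to its nearest selected point.
    d = np.linalg.norm(z - z[selected[0]], axis=1)
    while len(selected) < k:
        d_hat = d / (d.max() + 1e-12)
        score = alpha * d_hat + (1.0 - alpha) * c_hat
        score[selected] = -np.inf              # never re-pick a selected point
        nxt = int(np.argmax(score))
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(z - z[nxt], axis=1))
    return selected


z = [[0.0, 0.0], [10.2, 0.0], [10.0, 0.0], [0.1, 0.0]]
complexity = [0.1, 0.2, 0.9, 1.0]
print(weighted_fps(z, complexity, k=2, alpha=0.8))
print(weighted_fps(z, complexity, k=2, alpha=1.0))
```

In this toy example, candidates 1 and 2 are almost equally far from the seed. With `alpha=0.8` the richer motion (index 2) is chosen, while with `alpha=1.0` the loop reduces to plain farthest point sampling and picks the slightly farther index 1.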

Main Results on AMASS

Comparison of Success Rate, MPJPE, and MPKPE across methods and data ratios on the AMASS benchmark. GQS at 3% data outperforms the Full Data baseline on all three metrics for both trackers, while Random 3% degrades performance sharply (for Any2Track, MPKPE rises from 39.24 mm to 158.76 mm). The "less is more" effect therefore comes from using the right data, not from using less data per se.

Method | Physics Filter | FPS | Ratio | Success Rate ↑ | MPJPE (rad) ↓ | MPKPE (mm) ↓
Any2Track | × | | | 0.942 ± 0.011 | 0.114 ± 0.003 | 39.24 ± 1.12
Any2Track + Random | | | Random 3% | 0.838 ± 0.018 | 0.159 ± 0.005 | 158.76 ± 14.34
Any2Track + PHC | | | | 0.948 ± 0.009 | 0.111 ± 0.002 | 36.18 ± 0.98
Any2Track + GQS | | | 100% | 0.954 ± 0.011 | 0.112 ± 0.003 | 34.12 ± 0.95
Any2Track + GQS | | | 10% | 0.959 ± 0.010 | 0.107 ± 0.002 | 30.15 ± 0.81
Any2Track + GQS | | | 3% | 0.956 ± 0.012 | 0.108 ± 0.002 | 29.87 ± 0.76
TWIST2 | × | | | 0.825 ± 0.014 | 0.099 ± 0.003 | 35.80 ± 1.08
TWIST2 + Random | | | Random 3% | 0.649 ± 0.021 | 0.177 ± 0.006 | 263.19 ± 27.87
TWIST2 + PHC | | | | 0.845 ± 0.012 | 0.096 ± 0.002 | 33.54 ± 0.94
TWIST2 + GQS | | | 100% | 0.843 ± 0.012 | 0.094 ± 0.003 | 31.25 ± 0.89
TWIST2 + GQS | | | 10% | 0.868 ± 0.011 | 0.084 ± 0.002 | 27.21 ± 0.72
TWIST2 + GQS | | | 3% | 0.861 ± 0.013 | 0.092 ± 0.002 | 27.09 ± 0.68
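As a quick reading aid for the table, the short script below computes how much the GQS 3% rows improve on the full-data baselines. It uses only the reported mean values and ignores the ± terms. The dictionary layout and the `compare` helper are illustrative names, not part of the released code.

```python
# (Success Rate, MPJPE in rad, MPKPE in mm): mean values from the table above.
results = {
    "Any2Track": {"full": (0.942, 0.114, 39.24), "gqs_3pct": (0.956, 0.108, 29.87)},
    "TWIST2": {"full": (0.825, 0.099, 35.80), "gqs_3pct": (0.861, 0.092, 27.09)},
}


def compare(full, subset):
    """Return (SR gain in percentage points, MPJPE cut in %, MPKPE cut in %)."""
    sr_gain_points = 100.0 * (subset[0] - full[0])
    mpjpe_reduction = 100.0 * (full[1] - subset[1]) / full[1]
    mpkpe_reduction = 100.0 * (full[2] - subset[2]) / full[2]
    return sr_gain_points, mpjpe_reduction, mpkpe_reduction


for tracker, rows in results.items():
    sr, mpjpe, mpkpe = compare(rows["full"], rows["gqs_3pct"])
    print(f"{tracker}: SR +{sr:.1f} pts, MPJPE -{mpjpe:.1f}%, MPKPE -{mpkpe:.1f}%")
```

For both trackers the largest relative change is in MPKPE, which falls by roughly 24% when training on the 3% GQS subset instead of the full corpus.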

Performance vs. Data Ratio

We evaluate GQS across data ratios from 1% to 100%. The GQS Success Rate crosses the full-data baseline at just 3% of the data, while tracking error (MPJPE) drops below the baseline around 5%-10%. GQS thus outperforms the Full Data baseline in both data efficiency and tracking quality.


Figure 2. Performance vs. Data Ratio. The red line shows GQS Success Rate crossing the baseline at just 3% data. The blue line shows that tracking error is lowest around 5%-10%. GQS outperforms the Full Data baseline in both efficiency and quality.

Component Ablation

Component ablation study at a 3% data ratio. The full GQS framework achieves the best performance, showing that the three components are complementary. Physics filtering is critical: removing it drops Success Rate from 95.6% to 91.1%. Diversity is the primary prerequisite, and complexity weighting further refines the selection to prioritize informative motions.

Physics | Sparsity | Complexity | Success Rate ↑ | MPJPE (rad) ↓
× | ✓ | ✓ | 0.911 ± 0.014 | 0.1213 ± 0.001
✓ | × | ✓ | 0.934 ± 0.009 | 0.1197 ± 0.003
✓ | ✓ | × | 0.946 ± 0.008 | 0.1079 ± 0.002
✓ | ✓ | ✓ | 0.956 ± 0.012 | 0.1079 ± 0.002

Training Dynamics & Cross-Domain Generalization

GQS-curated data achieves higher reward and lower tracking error from the early stages of training (before 0.5B steps), confirming that curated data provides cleaner gradients that steer the policy toward better solutions early. The advantage reflects a fundamentally better optimization trajectory rather than merely faster convergence. Cross-domain experiments further show that a policy trained on the 10% subset of PHUMA outperforms one trained on the full PHUMA dataset when transferred zero-shot to AMASS (92.8% vs. 91.0% SR).

Training Dynamics

Figure 3. Training Dynamics. Learning curves comparing GQS 10% against Full Data. GQS 10% achieves higher reward throughout training, not just at convergence, confirming that data curation improves the optimization trajectory from the early stages.


(a) In-Domain Efficiency on PHUMA. The 10% subset achieves lower MPJPE than the full dataset, surpassing the performance ceiling using only 30% of the data.


(b) Cross-Domain Transfer to AMASS. The 10% subset significantly outperforms the full dataset when transferred zero-shot to AMASS (92.8% vs 91.0% SR).

Figure 4. Generalization Analysis on PHUMA.

Real-World Deployment

We deploy our tracker on the physical Unitree G1 humanoid robot. The policy trained on only 10% GQS-curated data achieves successful real-world deployment without any fine-tuning. The deployed policy demonstrates robust tracking across diverse motion categories — from expressive dance routines to athletic skills — validating that our data curation strategy not only improves simulation metrics but also produces policies with strong generalization to real-world conditions.

Dance: Can Do Can Go! (1)
Dance: Can Do Can Go! (2)
Dance: Gokuraku Joudo (1)
Dance: Gokuraku Joudo (2)
Dance: Gokuraku Joudo (3)
Dance: Old Town Road
Athletic: Huo Yuan Jia / Fearless
Dance: Buqi
Expressive: Yao Bai
Whole-Body Tracking (1)
Whole-Body Tracking (2)

Real-time deployment of the policy trained on 10% GQS-curated data on the Unitree G1 humanoid, including Dance, Athletic, and Expressive motions.

Conclusion

We present LIMMT, a data-centric framework for humanoid motion tracking. Our three-stage GQS pipeline filters infeasible motions, embeds them in a semantic space, and selects a compact subset via complexity-weighted sampling. Training on just 3% of curated data outperforms full-corpus baselines. The gains are plug-and-play across trackers and datasets. Motion data is valuable when it is physically feasible, behaviorally diverse, and dynamically rich — not when it merely grows in volume.

BibTeX

@article{limmt2026,
  title={LIMMT: Less Is More for Motion Tracking},
  author={Guan, Yu and Qi, Zekun and Lin, Chenghuai and Chen, Xuchuan and Liu, Dairu and
          Zhang, Wenyao and Wang, Jilong and Yu, Xinqiang and Wang, He and Yi, Li},
  journal={arXiv preprint arXiv:2026.xxxxx},
  year={2026}
}