Single-Demonstration Imitation with Residual Reinforcement Learning for Dual-Arm Robotic Bottle Opening
DOI:
https://doi.org/10.64117/simposioscea.v2i2.209
Abstract
Learning manipulation from extremely limited data remains challenging for robotic systems. We present a framework that combines Behavioural Cloning (BC) from a single kinesthetic demonstration with residual Reinforcement Learning (RL) to solve a long-horizon dual-arm bottle unscrewing task. A base policy is first trained via supervised imitation to capture the nominal behaviour. A residual policy is then learned in simulation using Proximal Policy Optimization (PPO) to produce bounded corrective actions on top of the base policy, improving robustness and generalization to variations in bottle geometry. Results show that single-demonstration BC performs reliably under nominal conditions but degrades under distribution shift. The residual formulation preserves nominal performance and significantly improves robustness. The final controller is deployed in a one-shot sim-to-real transfer, achieving successful execution on different bottle types.
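
The residual formulation described above can be illustrated with a minimal sketch. The abstract does not state how the corrective actions are bounded or combined with the base policy, so the additive composition, the tanh squashing, the bound of 0.05, and all function names below are assumptions made for illustration, not the authors' implementation. The toy policies are stand-ins; in the paper the base policy comes from BC and the residual policy is trained with PPO in simulation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def bounded_residual(raw: Sequence[float], bound: float) -> Vector:
    """Squash unbounded residual outputs into [-bound, bound] via tanh (assumed scheme)."""
    return [bound * math.tanh(r) for r in raw]


def compose_action(
    base_policy: Callable[[Sequence[float]], Vector],
    residual_policy: Callable[[Sequence[float], Sequence[float]], Vector],
    observation: Sequence[float],
    bound: float = 0.05,  # hypothetical bound; not given in the abstract
) -> Vector:
    """Return base action plus a bounded correction (assumed additive composition)."""
    base_action = base_policy(observation)
    raw = residual_policy(observation, base_action)
    correction = bounded_residual(raw, bound)
    return [b + c for b, c in zip(base_action, correction)]


# Toy stand-ins for the learned policies (hypothetical, for demonstration only).
def toy_base(obs: Sequence[float]) -> Vector:
    return [0.5 * x for x in obs]


def toy_residual(obs: Sequence[float], base_action: Sequence[float]) -> Vector:
    # Large raw outputs show that the correction still saturates at the bound.
    return [10.0, -10.0, 0.0][: len(obs)]


observation = [0.2, 0.4, 0.6]
action = compose_action(toy_base, toy_residual, observation, bound=0.05)
print(action)
```

Because the correction can never exceed the bound in magnitude, the composed controller stays close to the demonstrated behaviour when the residual output is near zero, which is one way the nominal performance of the base policy could be preserved.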