Learning-based approaches to quadruped locomotion commonly adopt generic policy architectures such as fully connected MLPs. Because such architectures contain few inductive biases, it is common in practice to incorporate priors in the form of rewards, training curricula, imitation data, or trajectory generators. In nature, animals are born with priors in the form of their nervous system's architecture, which has been shaped by evolution to confer innate ability and efficient learning. For instance, a horse can walk within hours of birth and quickly improves with practice. Analogous architectural priors can also be useful in artificial neural network (ANN) architectures for AI. In this work, we explore the advantages of a biologically inspired ANN architecture for quadruped locomotion based on neural circuits in the limbs and spinal cord of mammals. Our architecture achieves good initial performance and final performance comparable to MLPs, while using less data and orders of magnitude fewer parameters. Our architecture also generalizes better to task variations, even enabling deployment on a physical robot without standard sim-to-real methods. This work shows that neural circuits can provide valuable architectural priors for locomotion and encourages future work on other sensorimotor skills.
NCAP successfully learns to locomote across various speeds and terrains. These representative videos show the robot on Flat terrain at Walk (0.5 m/s) and Run (1.0 m/s) speeds. We simultaneously visualize the Rhythm Generation (RG) neural activity and the footfall plots. As the brainstem command to RG increases, the expressed gait transitions from walk to bound. Having an architectural prior thus enables control over the resulting behavior without additional reward or imitation priors.
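To make the idea of a tonic command driving a rhythm generator concrete, the sketch below simulates a classic two-neuron half-center oscillator (the Matsuoka model of mutually inhibiting neurons with adaptation). This is an assumption chosen for illustration, not NCAP's actual RG module; the parameter values, the `HalfCenterParams`, `simulate_half_center`, and `count_bursts` names, and the use of `command` as a stand-in for the brainstem input are all hypothetical. Because this model is positively homogeneous in its drive, doubling `command` exactly doubles the burst amplitude while leaving the rhythm's timing unchanged, so on its own it does not reproduce the walk-to-bound transition; that would require further structure (for example, inter-limb coupling), which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HalfCenterParams:
    """Hypothetical parameters for a two-neuron Matsuoka oscillator."""
    tau: float = 0.1   # membrane (rise) time constant, seconds
    T: float = 0.2     # adaptation time constant, seconds
    beta: float = 2.5  # strength of self-adaptation
    w: float = 2.0     # mutual inhibition weight; oscillates when 1 + tau/T < w < 1 + beta


def simulate_half_center(command, duration=10.0, dt=0.005, p=HalfCenterParams()):
    """Euler-integrate the oscillator under a constant tonic drive `command`.

    Returns the rhythmic output y1 - y2 at every time step.
    """
    x = [0.1 * command, 0.0]  # membrane states; small asymmetry breaks the symmetry
    v = [0.0, 0.0]            # adaptation (fatigue) states
    out = []
    for _ in range(round(duration / dt)):
        y = [max(0.0, x[0]), max(0.0, x[1])]  # rectified firing rates
        # Each neuron: leak, self-adaptation, inhibition from the other neuron, tonic drive.
        dx = [(-x[i] - p.beta * v[i] - p.w * y[1 - i] + command) / p.tau for i in range(2)]
        # Adaptation slowly tracks the neuron's own firing rate.
        dv = [(-v[i] + y[i]) / p.T for i in range(2)]
        x = [x[i] + dt * dx[i] for i in range(2)]
        v = [v[i] + dt * dv[i] for i in range(2)]
        out.append(y[0] - y[1])
    return out


def count_bursts(signal):
    """Count sign changes of the output, ignoring exact zeros (two per cycle)."""
    nonzero = [s for s in signal if s != 0.0]
    return sum(1 for a, b in zip(nonzero, nonzero[1:]) if a * b < 0)


if __name__ == "__main__":
    weak = simulate_half_center(command=1.0)
    strong = simulate_half_center(command=2.0)
    half = len(weak) // 2  # discard the first half as a transient
    print(count_bursts(weak[half:]), count_bursts(strong[half:]))  # same rhythm
    print(max(strong[half:]) / max(weak[half:]))  # amplitude ratio, 2.0
```

The two neurons alternate bursting because each one's adaptation eventually releases the other from inhibition. This is the textbook half-center mechanism for spinal rhythm generation, which is why it is a natural toy model here.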
We deploy the architectures to a physical Unitree A1 quadruped robot. We expect the domain gap to be large since we do not perform controller tuning or system identification with the physical robot, the simulated training does not apply aggressive domain randomization, and the architectures do not have mechanisms for online adaptation.
The MLP falls immediately due to its erratic, unstable actions, often launching the robot aggressively into the wall.
Even before training, NCAP is stable and produces slight walking movements with small foot displacements.
After training, NCAP walks successfully on most trials, though with less smoothness than in simulation.