Neural Circuit Architectural Priors for Quadruped Locomotion

New York University

Abstract

Learning-based approaches to quadruped locomotion commonly adopt generic policy architectures like fully connected MLPs. As such architectures contain few inductive biases, it is common in practice to incorporate priors in the form of rewards, training curricula, imitation data, or trajectory generators. In nature, animals are born with priors in the form of their nervous system's architecture, which has been shaped by evolution to confer innate ability and efficient learning. For instance, a horse can walk within hours of birth and can quickly improve with practice. Such architectural priors can also be useful in artificial neural network (ANN) architectures for AI. In this work, we explore the advantages of a biologically inspired ANN architecture for quadruped locomotion based on neural circuits in the limbs and spinal cord of mammals. Our architecture achieves good initial performance and final performance comparable to MLPs, while using less data and orders of magnitude fewer parameters. Our architecture also exhibits better generalization to task variations, even admitting deployment on a physical robot without standard sim-to-real methods. This work shows that neural circuits can provide valuable architectural priors for locomotion and encourages future work in other sensorimotor skills.

1. Simulated Body: Fixed Speed

NCAP successfully learns to locomote across various speeds and terrains. These representative videos show the robot on Flat terrain at Walk (0.5 m/s) and Run (1.0 m/s) speeds. We simultaneously visualize the Rhythm Generation (RG) neural activity and footfall plots. As the brainstem command to RG increases, the expressed gait transitions from walk to bound. Having an architectural prior thus enables control of the resulting behavior without additional reward or imitation priors.


2. Simulated Body: Naturalistic Gaits

Examining the qualitative behavior of MLP reveals high variability across seeds that the quantitative performance curves obscure. MLP develops a good walking gait on seed 3, a mediocre limping gait on seed 2, and a failed on-the-floor shuffle on seed 1.
In contrast, NCAP exhibits more naturalistic and consistent gaits due to its RG prior.

3. Simulated Body: Variable Speed

We investigate a more complex task in which the robot must transition between speeds within an episode. NCAP uses the target speed to modulate the brainstem command that controls the RG circuit. This enables it to alter its gait in a speed-dependent manner.
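The idea of a speed-dependent command driving a rhythm generator can be illustrated with a toy model. The sketch below is not the NCAP architecture: it assumes a single phase oscillator with half-wave rectified flexor and extensor outputs, a linear speed-to-command mapping, and made-up constants (`max_speed`, `base_freq`, `gain`). All function names are hypothetical. It only shows how a larger command can raise the rhythm frequency and hence the number of bursts per unit time.

```python
import math


def brainstem_command(target_speed, max_speed=1.0):
    """Map a target speed (m/s) to a command in [0, 1].

    The linear mapping and max_speed are illustrative assumptions.
    """
    return min(max(target_speed / max_speed, 0.0), 1.0)


def simulate_rg(target_speeds, dt=0.01, base_freq=1.0, gain=2.0):
    """Integrate a toy rhythm generator, one step per entry in target_speeds.

    The oscillator frequency (Hz) is base_freq + gain * command, so a higher
    command yields a faster rhythm. Returns a list of (flexor, extensor)
    activations, each half-wave rectified and in antiphase with the other.
    """
    phase = 0.0
    outputs = []
    for speed in target_speeds:
        command = brainstem_command(speed)
        freq_hz = base_freq + gain * command
        phase = (phase + 2.0 * math.pi * freq_hz * dt) % (2.0 * math.pi)
        flexor = max(math.sin(phase), 0.0)
        extensor = max(-math.sin(phase), 0.0)
        outputs.append((flexor, extensor))
    return outputs


def count_flexor_bursts(outputs, threshold=0.5):
    """Count upward crossings of the threshold by the flexor activation."""
    bursts = 0
    was_active = False
    for flexor, _ in outputs:
        is_active = flexor > threshold
        if is_active and not was_active:
            bursts += 1
        was_active = is_active
    return bursts


if __name__ == "__main__":
    steps = 200  # 2 s of simulated time at dt = 0.01
    walk = simulate_rg([0.5] * steps)  # Walk speed from the text (0.5 m/s)
    run = simulate_rg([1.0] * steps)   # Run speed from the text (1.0 m/s)
    print("walk bursts:", count_flexor_bursts(walk))
    print("run bursts:", count_flexor_bursts(run))
```

Because the command is recomputed at every step, passing a list such as `[0.5] * 200 + [1.0] * 200` to `simulate_rg` mimics a within-episode speed change: the rhythm speeds up as soon as the target speed rises, without any change to the oscillator itself.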

MLP

MLP learns to alter its speed, but its posture and gait are not naturalistic without additional priors.

4. Physical Body

We deploy the architectures to a physical Unitree A1 quadruped robot. We expect the domain gap to be large since we do not perform controller tuning or system identification with the physical robot, the simulated training does not apply aggressive domain randomization, and the architectures do not have mechanisms for online adaptation.

MLP (Trained)

MLP falls immediately due to its erratic and unstable actions, often launching the robot aggressively into the wall.

NCAP (Untrained)

Even before training, NCAP is stable and produces slight walking movements with small foot displacements.

NCAP (Trained)

After training, NCAP walks successfully on most trials, though with less smoothness than in simulation.