Perception and Action: Carnegie Mellon University Researchers Design Novel Method for Teaching Drones

Researchers at Carnegie Mellon University have designed a novel method that allows drones to learn perception and action separately. This two-stage approach helps close the “simulation-to-reality gap” and provides a way to transfer drones from a simulated environment to real-world course navigation.

“Typically, drones trained on even the best photorealistic simulated data will fail in the real world because the lighting, colors and textures are still too different to translate,” said Rogerio Bonatti, a doctoral student in the School of Computer Science’s Robotics Institute. “Our perception module is trained with two modalities to increase robustness against environmental variabilities.”

The researchers first used a photorealistic simulator to create an environment containing the drone, a soccer field, and red square gates positioned randomly to form a track. From this simulated environment they built a large dataset of images, which they used to train the drone to perceive its position and orientation in space. Training with multiple modalities reinforced the drone’s experience so that it could “understand” the essence of the field and gates in a way that translates from simulation to reality. The images were compressed into a low-dimensional representation, which helped the model see past visual noise in the real world.
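To make the idea concrete, here is a minimal sketch of what such a perception module might look like: a PyTorch-style convolutional encoder that compresses a camera frame into a compact latent vector. The architecture, layer sizes, and names below are illustrative assumptions for exposition, not the authors’ actual implementation.

```python
# Illustrative sketch only: a convolutional encoder that compresses
# camera frames into a low-dimensional latent vector, in the spirit of
# the cross-modal representation described above. All sizes and names
# are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class PerceptionEncoder(nn.Module):
    def __init__(self, latent_dim: int = 10):
        super().__init__()
        # Downsample a 64x64 RGB image into a small feature map.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), # 16 -> 8
            nn.ReLU(),
        )
        # Project the flattened features into a compact latent vector
        # that could encode, e.g., the relative pose of the next gate.
        self.fc = nn.Linear(128 * 8 * 8, latent_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        features = self.conv(image)
        return self.fc(features.flatten(start_dim=1))

# A batch of simulated camera frames -> 10-dimensional representations.
frames = torch.randn(16, 3, 64, 64)
latents = PerceptionEncoder()(frames)
print(latents.shape)  # torch.Size([16, 10])
```

Because the downstream controller only ever sees this low-dimensional vector, differences in lighting, color, and texture between simulation and reality are largely filtered out before they can affect the drone’s behavior.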

The drone then learned to navigate the course through training steps controlled by the researchers, learning which velocity commands to apply as it moved along the track and encountered each gate.
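In the same illustrative spirit, the second stage might map that latent representation to velocity commands by regressing against the velocities demonstrated during the researcher-controlled training steps. The network, loss, and optimizer below are hedged assumptions, not the paper’s exact training setup.

```python
# Illustrative sketch only: a small policy head mapping the latent
# perception vector to velocity commands (vx, vy, vz, yaw rate), fit
# by supervised regression against demonstrated velocities.
import torch
import torch.nn as nn

policy = nn.Sequential(
    nn.Linear(10, 64),  # input: 10-dim latent from the perception stage
    nn.ReLU(),
    nn.Linear(64, 4),   # output: vx, vy, vz, yaw rate
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in data: latent vectors from the perception stage paired with
# the velocity commands applied during the guided training runs.
latents = torch.randn(256, 10)
demonstrated_velocities = torch.randn(256, 4)

for step in range(100):
    predicted = policy(latents)
    loss = loss_fn(predicted, demonstrated_velocities)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Separating the two stages this way means the control policy never has to be retrained when moving from simulation to the real world; only the perception module must bridge the visual gap.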

Bonatti said: “I make the drone turn to the left and to the right in different track shapes, which get harder as I add more noise. The robot is not learning to recreate going through any specific track. Rather, by strategically directing the simulated drone, it’s learning all of the elements and types of movements to race autonomously.”

The paper, “Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations,” has been accepted to the 2020 International Conference on Intelligent Robots and Systems (IROS), and the work is open source and available to other researchers.