Reinforcement Learning Approach to Robot Navigation and Obstacle Avoidance


GitHub

Video demo


Motivation

Deploying mobile robots in the real world is a challenging task: robots are expected to navigate safely in new or dynamic environments, where maintaining an explicit map can be impractical. Online pathfinding and SLAM algorithms have long been developed and optimized to address this challenge. Meanwhile, reinforcement learning has seen more applications in recent years than ever before, and its potential for mobile robots cannot be neglected. Therefore, this project presents an RL approach to robot navigation and obstacle avoidance.

Overview

This project aims to train a deep reinforcement learning model for TurtleBot3 to perform navigation and obstacle avoidance in a random environment. To achieve this goal, a training framework based on a curriculum learning strategy, as well as a training pipeline in a simulated environment, have been developed. It is assumed that no prior information about the environment is given to the robot; only the distance to the goal, the relative bearing, and lidar readings are accessible.
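The observation described above can be sketched as a single flat vector. The beam count, normalization, and function name below are illustrative assumptions, not the project's actual code; the lidar maximum range follows the TurtleBot3 LDS-01 spec.

```python
import numpy as np

# Illustrative observation builder: a fixed number of lidar beams plus the
# goal distance and relative bearing, as described in the overview.
# N_LIDAR_BEAMS and the normalization scheme are assumptions.
N_LIDAR_BEAMS = 24
LIDAR_MAX_RANGE = 3.5  # meters (TurtleBot3 LDS-01 maximum range)

def build_observation(lidar_ranges, goal_distance, goal_bearing):
    """Concatenate normalized lidar readings with goal distance and bearing."""
    scan = np.clip(np.asarray(lidar_ranges, dtype=np.float32), 0.0, LIDAR_MAX_RANGE)
    scan /= LIDAR_MAX_RANGE  # normalize ranges to [0, 1]
    return np.concatenate([scan, [goal_distance, goal_bearing]]).astype(np.float32)
```

Keeping the goal information in polar form (distance and bearing) means the observation is independent of any global map frame, which matches the map-less assumption.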

Software

Training Strategy: Curriculum Learning

The mission of the desired model is to perform robot navigation and obstacle avoidance in a static environment with a high level of randomness. Training a model to satisfy all three of these requirements demands a complicated environment and a complex reward function, which is a nightmare to debug. Learning all of them at once is even less realistic, since the model can easily get stuck in a local optimum. Therefore, a curriculum learning strategy is adopted for this project.
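To give a sense of the reward shaping involved, here is a minimal sketch of a navigation reward. The terms and coefficients are hypothetical, not the project's actual reward function; they show the typical trade-off between goal progress, collision avoidance, and terminal rewards.

```python
# Hypothetical reward shaping for goal-driven navigation. All constants are
# illustrative assumptions, not the values used in this project.
GOAL_REWARD = 100.0        # terminal reward for reaching the goal
COLLISION_PENALTY = -100.0 # terminal penalty for hitting an obstacle
PROGRESS_SCALE = 10.0      # scale on per-step progress toward the goal

def step_reward(prev_dist, curr_dist, collided, reached_goal):
    """Reward one environment step based on goal progress and collisions."""
    if reached_goal:
        return GOAL_REWARD
    if collided:
        return COLLISION_PENALTY
    # Dense shaping term: positive when the robot moved closer to the goal.
    return PROGRESS_SCALE * (prev_dist - curr_dist)
```

Even in this tiny form, tuning the balance between the dense progress term and the sparse terminal rewards is delicate, which is exactly the debugging burden that motivates a curriculum.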

Curriculum learning trains the model on tasks of increasing difficulty. The model first learns general principles from easier cases in the early phases, and is then gradually exposed to more complex and nuanced cases in later phases to incorporate higher-level information.

This project is divided into the following three phases (six sub-phases).

Phase 1: Navigation in an Obstacle Free Environment

Phase 2: Navigation and Obstacle Avoidance in a Fixed Environment

Phase 3: Navigation and Obstacle Avoidance in a Random Environment
Now a high level of randomness is added to the training process: the robot spawn pose, the goal position, and the obstacle placement are all randomized for each episode!
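The three phases above can be sketched as a simple curriculum schedule, where one model is carried forward from phase to phase. The phase names, flags, and function signatures below are illustrative assumptions about how such a loop could be organized, not the project's actual pipeline.

```python
# Hypothetical curriculum schedule mirroring the three phases described above.
# Each entry toggles obstacles and episode-level randomization.
CURRICULUM = [
    {"name": "phase1_obstacle_free", "obstacles": False, "randomize": False},
    {"name": "phase2_fixed_env",     "obstacles": True,  "randomize": False},
    {"name": "phase3_random_env",    "obstacles": True,  "randomize": True},
]

def train_curriculum(make_env, train_fn, timesteps_per_phase=200_000):
    """Train one model across all phases, warm-starting each phase from the last."""
    model = None
    for phase in CURRICULUM:
        env = make_env(obstacles=phase["obstacles"], randomize=phase["randomize"])
        # train_fn continues training the existing model (or creates one if None).
        model = train_fn(model, env, timesteps_per_phase)
    return model
```

The key design choice is that the model's weights are never reset between phases, so skills learned in the obstacle-free phase are refined rather than relearned.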

Future Scope

  1. Implement More RL Algorithms: Due to the time constraints of the winter quarter, development on this project has so far spanned only 10 weeks. Most of the models across Phase 1 to Phase 3 were trained with PPO, and some were also trained with A2C. Stable Baselines 3 provides optimized implementations of a great number of RL algorithms, which should definitely be tried on this task.
  2. Deploy Trained Models on a Real TurtleBot3: The performance of trained models has only been verified in simulation. Model deployment on real robots is expected in the future.
  3. Train the Model in a Dynamic Environment: So far, training has only been conducted in static environments, while the real world is highly dynamic. Moving objects can be added to the training environment in the future.
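Because Stable Baselines 3 exposes its algorithms through a uniform interface, trying additional ones is mostly a matter of swapping the class. The registry below is an illustrative sketch of how future experiments could be organized; PPO and A2C are what this project actually used, while SAC and TD3 are only candidates. The lazy import is a deliberate choice so the list can be inspected without SB3 installed.

```python
import importlib

# Registry of Stable Baselines 3 algorithm classes by short name.
# PPO and A2C were used in this project; SAC and TD3 are hypothetical
# candidates for future experiments (both are off-policy and support
# continuous action spaces).
ALGO_REGISTRY = {
    "ppo": "stable_baselines3.PPO",
    "a2c": "stable_baselines3.A2C",
    "sac": "stable_baselines3.SAC",
    "td3": "stable_baselines3.TD3",
}

def resolve_algorithm(name):
    """Import the requested SB3 algorithm class only when it is needed."""
    module_path, cls_name = ALGO_REGISTRY[name.lower()].rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, cls_name)
```

With this in place, comparing algorithms on the same environment reduces to iterating over the registry and calling each class's `learn()` with identical budgets.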

Citations

  1. Dobrevski M, Skočaj D. Deep reinforcement learning for map-less goal-driven robot navigation. International Journal of Advanced Robotic Systems. 2021;18(1). doi:10.1177/1729881421992621