navigation_policy_demo

Example training code that uses stable-baselines3 PPO on a single BEHAVIOR activity. Because the reward is sparse, this training code is not expected to converge or achieve task success. It serves only as a starting point that users can build upon.
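
To make the reward-sparsity problem concrete, here is a minimal sketch that uses only the Python standard library. It is not part of BEHAVIOR or stable-baselines3: `SparseCorridor` and `random_policy_success_rate` are hypothetical names invented for this illustration, and the 1-D corridor is a stand-in assumption for a navigation task whose only reward is given on reaching the goal. It estimates how often an untrained (uniformly random) policy sees any reward at all, which is roughly the learning signal an on-policy method such as PPO starts from.

```python
import random


class SparseCorridor:
    """Toy 1-D corridor (hypothetical, not a BEHAVIOR environment).

    The agent starts at cell 0 and is rewarded only on reaching the last cell.
    """

    def __init__(self, length=20, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # action: 0 = move left, 1 = move right; position is clamped to the corridor.
        delta = 1 if action == 1 else -1
        self.pos = min(max(self.pos + delta, 0), self.length - 1)
        self.steps += 1
        success = self.pos == self.length - 1
        reward = 1.0 if success else 0.0  # sparse: zero everywhere except the goal
        done = success or self.steps >= self.max_steps
        return self.pos, reward, done


def random_policy_success_rate(env, episodes=1000, seed=0):
    """Fraction of episodes in which a uniformly random policy reaches the goal."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        env.reset()
        done = False
        reward = 0.0
        while not done:
            _, reward, done = env.step(rng.randint(0, 1))
        # The episode ends on success, so the final reward is positive iff the goal was reached.
        successes += int(reward > 0.0)
    return successes / episodes


if __name__ == "__main__":
    for length in (5, 10, 20):
        rate = random_policy_success_rate(SparseCorridor(length=length))
        print(f"corridor length {length}: random-policy success rate {rate:.3f}")
```

In this toy setting the success rate drops quickly as the corridor gets longer, so most rollouts contain no reward at all. A common way to build on a sparse-reward starting point is to add a denser signal, for example a shaping term based on progress toward the goal, though whether that suits a given BEHAVIOR activity is left to the user.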