📑 Tasks
Description
Tasks define the high-level objectives that an agent must complete in a given Environment, subject to certain constraints (e.g. not flipping over).
Tasks have two important internal variables:
_termination_conditions: a dict of {str: TerminationCondition} that defines when an episode should be terminated. For each termination condition, termination_condition.step(...) returns a tuple of (done [bool], success [bool]). If any termination condition returns done = True, the episode is terminated. If any returns success = True, the episode is considered successful.
_reward_functions: a dict of {str: RewardFunction} that defines how the agent is rewarded. Each reward function has a reward_function.step(...) method that returns a tuple of (reward [float], info [dict]). The reward is a scalar value that is added to the agent's total reward for the current step. The info is a dictionary that can contain additional information about the reward.
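The aggregation rules above can be sketched in a few lines. This is a minimal, self-contained illustration and not OmniGibson's implementation: ToyTimeout, ToyGoalReached, ToyConstantReward, step_termination and step_reward are hypothetical names, step() takes no arguments here for brevity, and collecting each reward's info under its dict key is an assumption.

```python
from typing import Dict, Tuple


class ToyTimeout:
    """Hypothetical failure condition: done after max_steps calls, never a success."""

    def __init__(self, max_steps: int):
        self.max_steps = max_steps
        self.steps = 0

    def step(self) -> Tuple[bool, bool]:
        self.steps += 1
        return self.steps >= self.max_steps, False


class ToyGoalReached:
    """Hypothetical success condition: done and successful once distance <= tol."""

    def __init__(self, tol: float):
        self.tol = tol
        self.distance = float("inf")  # updated externally in this sketch

    def step(self) -> Tuple[bool, bool]:
        reached = self.distance <= self.tol
        return reached, reached


class ToyConstantReward:
    """Hypothetical reward function that always returns the same value."""

    def __init__(self, value: float):
        self.value = value

    def step(self) -> Tuple[float, dict]:
        return self.value, {"value": self.value}


def step_termination(conditions: Dict[str, object]) -> Tuple[bool, bool]:
    """The episode is done if ANY condition is done, successful if ANY succeeds."""
    done, success = False, False
    for condition in conditions.values():
        cond_done, cond_success = condition.step()
        done = done or cond_done
        success = success or cond_success
    return done, success


def step_reward(reward_functions: Dict[str, object]) -> Tuple[float, dict]:
    """Sum the scalar rewards and keep each function's info under its name."""
    total, info = 0.0, {}
    for name, reward_function in reward_functions.items():
        reward, reward_info = reward_function.step()
        total += reward
        info[name] = reward_info
    return total, info
```

Every condition is stepped on every call (no early exit), so stateful conditions such as a step counter stay in sync even when another condition has already ended the episode.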
Tasks usually specify task-relevant observations (e.g. goal location for a navigation task) via the _get_obs method, which returns a tuple of (low_dim_obs [dict], obs [dict]). The first element is a dict of low-dimensional observations that will be automatically flattened into a 1D array, and the second element is everything else that shouldn't be flattened. Different types of tasks should override the _get_obs method to return the appropriate observations.
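To make the two-part return value concrete, here is a small sketch under stated assumptions: ToyNavigationTask and flatten_low_dim are hypothetical, _get_obs takes no arguments here, the flattening is done with plain Python lists rather than arrays, and entries are concatenated in dict insertion order (the real flattening order may differ).

```python
from typing import Dict, List, Tuple


def flatten_low_dim(low_dim_obs: Dict[str, object]) -> List[float]:
    """Concatenate scalar and 1D sequence entries into one flat list of floats."""
    flat: List[float] = []
    for value in low_dim_obs.values():
        if isinstance(value, (int, float)):
            flat.append(float(value))
        else:
            flat.extend(float(v) for v in value)
    return flat


class ToyNavigationTask:
    """Hypothetical task exposing a goal location as a low-dimensional observation."""

    def __init__(self, goal_xy: Tuple[float, float], steps_remaining: int):
        self.goal_xy = goal_xy
        self.steps_remaining = steps_remaining

    def _get_obs(self) -> Tuple[dict, dict]:
        # Low-dimensional entries: safe to flatten into a single vector.
        low_dim_obs = {
            "goal_xy": self.goal_xy,
            "steps_remaining": self.steps_remaining,
        }
        # Everything else (text, images, nested structures) is left untouched.
        obs = {"goal_description": "reach the marked point"}
        return low_dim_obs, obs


task = ToyNavigationTask(goal_xy=(1.0, 2.0), steps_remaining=10)
low_dim_obs, obs = task._get_obs()
flat = flatten_low_dim(low_dim_obs)  # [1.0, 2.0, 10.0]
```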
Tasks also define the reset behavior (in-between episodes) of the environment via the _reset_scene, _reset_agent, and _reset_variables methods.
_reset_scene: reset the scene for the next episode. The default is scene.reset().
_reset_agent: reset the agent for the next episode. The default is to do nothing.
_reset_variables: reset any internal variables as needed. The default is to do nothing.
Different types of tasks should override these methods to get the appropriate reset behavior, e.g. a navigation task might want to randomize the initial pose of the agent and the goal location.
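The three reset hooks follow a template-method pattern, which the sketch below illustrates. It is not the library's code: ToyTask and ToyRandomNavTask are hypothetical, the environment is modeled as a plain dict, and both the hook signatures and the call order (scene, then agent, then variables) are assumptions.

```python
import random


class ToyTask:
    """Hypothetical base task: reset() calls the three hooks in a fixed order."""

    def reset(self, env: dict) -> None:
        self._reset_scene(env)
        self._reset_agent(env)
        self._reset_variables(env)

    def _reset_scene(self, env: dict) -> None:
        # Stands in for the default scene.reset() call.
        env["scene_resets"] = env.get("scene_resets", 0) + 1

    def _reset_agent(self, env: dict) -> None:
        pass  # default: do nothing

    def _reset_variables(self, env: dict) -> None:
        pass  # default: do nothing


class ToyRandomNavTask(ToyTask):
    """Hypothetical navigation task that randomizes agent pose and goal on reset."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self.goal_xy = None
        self.steps = 0

    def _sample_xy(self):
        return (self._rng.uniform(-1.0, 1.0), self._rng.uniform(-1.0, 1.0))

    def _reset_agent(self, env: dict) -> None:
        env["agent_xy"] = self._sample_xy()

    def _reset_variables(self, env: dict) -> None:
        self.goal_xy = self._sample_xy()
        self.steps = 0


env = {}
task = ToyRandomNavTask(seed=0)
task.steps = 5  # pretend an episode has been running
task.reset(env)
```

Only the hooks that need task-specific behavior are overridden; the scene reset falls back to the base class default.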
Usage
Specifying
Every Environment instance includes a task, which is defined by the config passed to the environment constructor under the task key.
This config is expected to be a dictionary of keyword arguments specifying the desired task configuration (e.g. reward type and weights, hyperparameters for reset behavior, etc.).
The type key is required and specifies the desired task class. Additional keys can be specified and will be passed directly to the specific task class constructor.
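The role of the type key can be sketched as a simple registry lookup. This is an illustrative assumption about how such a dispatch might work, not OmniGibson's actual mechanism: TASK_REGISTRY, register_task, create_task, ToyPointNavTask and its goal_tolerance / reward_type parameters are all hypothetical.

```python
TASK_REGISTRY = {}


def register_task(cls):
    """Make a task class discoverable by its class name (hypothetical registry)."""
    TASK_REGISTRY[cls.__name__] = cls
    return cls


@register_task
class ToyPointNavTask:
    """Hypothetical task class with two illustrative keyword arguments."""

    def __init__(self, goal_tolerance: float = 0.5, reward_type: str = "l2"):
        self.goal_tolerance = goal_tolerance
        self.reward_type = reward_type


def create_task(task_config: dict):
    """Pop the required 'type' key and pass every other key to the constructor."""
    kwargs = dict(task_config)  # copy so the caller's config is not mutated
    if "type" not in kwargs:
        raise ValueError("task config requires a 'type' key")
    task_type = kwargs.pop("type")
    if task_type not in TASK_REGISTRY:
        raise ValueError(f"unknown task type: {task_type!r}")
    return TASK_REGISTRY[task_type](**kwargs)


task_config = {"type": "ToyPointNavTask", "goal_tolerance": 0.3}
task = create_task(task_config)
```

Because the remaining keys are forwarded as keyword arguments, a misspelled key surfaces immediately as a TypeError from the constructor rather than being silently ignored.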
An example of a task configuration is shown below in .yaml form:
point_nav_example.yaml
Runtime
Every Environment instance has a task attribute that is an instance of the specified task class.
Internally, Environment's reset method will call the task's reset method, its step method will call the task's step method, and its get_obs method will call the task's get_obs method.
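The delegation described above can be pictured with the following sketch. ToyEnvironment and ToyCountdownTask are hypothetical, and the argument lists and return values of reset, step and get_obs are assumptions chosen for illustration rather than the library's actual signatures.

```python
class ToyCountdownTask:
    """Hypothetical task that ends the episode after max_steps steps."""

    def __init__(self, max_steps: int):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, env) -> None:
        self.steps = 0

    def step(self, env, action):
        self.steps += 1
        done = self.steps >= self.max_steps
        reward = -1.0  # constant per-step penalty, purely illustrative
        return reward, done, {"steps": self.steps}

    def get_obs(self, env) -> dict:
        return {"steps_remaining": self.max_steps - self.steps}


class ToyEnvironment:
    """Hypothetical environment that forwards reset / step / get_obs to its task."""

    def __init__(self, task):
        self.task = task

    def reset(self) -> dict:
        self.task.reset(self)
        return self.get_obs()

    def step(self, action):
        reward, done, info = self.task.step(self, action)
        return self.get_obs(), reward, done, info

    def get_obs(self) -> dict:
        return self.task.get_obs(self)


env = ToyEnvironment(ToyCountdownTask(max_steps=2))
first_obs = env.reset()
obs, reward, done, info = env.step(action=None)
```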
Types
OmniGibson currently supports 5 types of tasks, 7 types of termination conditions, and 5 types of reward functions.
Task
DummyTask: Dummy task with trivial implementations.
PointNavigationTask: PointGoal navigation task with fixed / randomized initial pose and goal location.
PointReachingTask: Similar to PointNavigationTask, except the goal is specified with respect to the robot's end effector.
GraspTask: Grasp task for a single object.
BehaviorTask: BEHAVIOR task of long-horizon household activity.
Follow our tutorial on BEHAVIOR tasks!
To better understand how to use / sample / load / customize BEHAVIOR tasks, please read our BEHAVIOR tasks documentation!
TerminationCondition
Timeout (FailureCondition): episode terminates if max_step steps have passed.
Falling (FailureCondition): episode terminates if the robot can no longer function (i.e. it falls below the floor height by at least fall_height or tilts too much by at least tilt_tolerance).
MaxCollision (FailureCondition): episode terminates if the robot has collided more than max_collisions times.
PointGoal (SuccessCondition): episode terminates if the point goal is reached within distance_tol by the robot's base.
ReachingGoal (SuccessCondition): episode terminates if the reaching goal is reached within distance_tol by the robot's end effector.
GraspGoal (SuccessCondition): episode terminates if the target object has been grasped (by assistive grasping).
PredicateGoal (SuccessCondition): episode terminates if all the goal predicates of BehaviorTask are satisfied.
RewardFunction
CollisionReward: Penalization of robot collision with non-floor objects, with a negative weight r_collision.
PointGoalReward: Reward for reaching the goal with the robot's base, with a positive weight r_pointgoal.
ReachingGoalReward: Reward for reaching the goal with the robot's end effector, with a positive weight r_reach.
PotentialReward: Reward for decreasing some arbitrary potential function value, with a positive weight r_potential. It assumes the task already has get_potential implemented. Generally, low potential is preferred (e.g. a common potential for goal-directed tasks is the distance to goal).
GraspReward: Reward for grasping an object. It not only evaluates the success of object grasping but also considers various penalties and efficiencies. The reward is calculated based on several factors.
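As an illustration of the PotentialReward entry above, the sketch below rewards the decrease in potential between consecutive steps, scaled by r_potential. It is written under assumptions: ToyGoalTask and ToyPotentialReward are hypothetical, the potential is taken to be the Euclidean distance to the goal, and the reset / step signatures are simplified.

```python
import math
from typing import Tuple


class ToyGoalTask:
    """Hypothetical task whose potential is the straight-line distance to the goal."""

    def __init__(self, goal_xy: Tuple[float, float], agent_xy: Tuple[float, float]):
        self.goal_xy = goal_xy
        self.agent_xy = agent_xy

    def get_potential(self) -> float:
        return math.dist(self.agent_xy, self.goal_xy)


class ToyPotentialReward:
    """Reward = r_potential * (previous potential - current potential)."""

    def __init__(self, r_potential: float = 1.0):
        self.r_potential = r_potential
        self._potential = None

    def reset(self, task: ToyGoalTask) -> None:
        # Record the starting potential so the first step has a baseline.
        self._potential = task.get_potential()

    def step(self, task: ToyGoalTask) -> Tuple[float, dict]:
        new_potential = task.get_potential()
        reward = self.r_potential * (self._potential - new_potential)
        self._potential = new_potential
        return reward, {"potential": new_potential}


task = ToyGoalTask(goal_xy=(3.0, 4.0), agent_xy=(0.0, 0.0))
reward_fn = ToyPotentialReward(r_potential=2.0)
reward_fn.reset(task)            # starting distance is 5.0
task.agent_xy = (3.0, 0.0)       # move closer: distance is now 4.0
reward, info = reward_fn.step(task)
```

With this formulation, moving toward the goal yields a positive reward and moving away yields a negative one, and the per-step rewards over an episode sum to r_potential times the total reduction in potential.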