# Continual Learning of Control Primitives: Skill Discovery via Reset-Games

```bibtex
@article{Xu2020ContinualLO,
  title   = {Continual Learning of Control Primitives: Skill Discovery via Reset-Games},
  author  = {Kelvin Xu and Siddharth Verma and Chelsea Finn and Sergey Levine},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2011.05286}
}
```

Reinforcement learning has the potential to automate the acquisition of behavior in complex settings, but for it to be deployed successfully, a number of practical challenges must be addressed. First, in real-world settings, when an agent attempts a task and fails, the environment must somehow "reset" so that the agent can attempt the task again. While this is easy in simulation, it could require considerable human effort in the real world, especially if the number of trials is very large…

#### 6 Citations

Autonomous Reinforcement Learning via Subgoal Curricula

- Computer Science
- 2021

Value-accelerated Persistent Reinforcement Learning is proposed, which generates a curriculum of initial states such that the agent can bootstrap on the success of easier tasks to efficiently learn harder tasks, reducing reliance on human intervention during learning.

Persistent Reinforcement Learning via Subgoal Curricula

- Computer Science
- ArXiv
- 2021

Value-accelerated Persistent Reinforcement Learning is proposed, which generates a curriculum of initial states such that the agent can bootstrap on the success of easier tasks to efficiently learn harder tasks, reducing reliance on human intervention during learning.

Explore and Control with Adversarial Surprise

- Computer Science
- ArXiv
- 2021

It is shown that Adversarial Surprise learns more complex behaviors and explores more effectively than competitive baselines, outperforming intrinsic-motivation methods based on active inference, novelty-seeking, and multi-agent unsupervised RL in MiniGrid, Atari, and VizDoom environments.

Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching

- Computer Science
- ArXiv
- 2021

A novel algorithm is proposed that maximizes coverage while enforcing a constraint on the directedness of each skill, using a decoupled policy structure: a first part trained to be directed, and a second, diffusing part that ensures local coverage.

Automatic Curricula via Expert Demonstrations

- Computer Science
- ArXiv
- 2021

We propose Automatic Curricula via Expert Demonstrations (ACED), a reinforcement learning (RL) approach that combines the ideas of imitation learning and curriculum learning in order to solve…

Long-Term Exploration in Persistent MDPs

- Computer Science
- MICAI
- 2021

This paper proposes an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process, in which agents during training can roll back to visited states.

#### References

Showing 1–10 of 55 references

Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning

- Computer Science, Mathematics
- ICLR
- 2018

This work proposes an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt.
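The alternating forward/reset structure described in that abstract can be sketched as a toy training loop. Everything below — the 1-D random-walk "environment", the hand-written policies, and the step counts — is an illustrative assumption, not the authors' implementation:

```python
# Hedged sketch of alternating forward/reset episodes in the style of
# "Leave no Trace". The environment is a toy 1-D position; in the paper
# both policies are *learned*, whereas here they are hard-coded.
import random

def run_episode(policy, state, steps=10):
    """Roll the toy 1-D environment forward under the given policy."""
    for _ in range(steps):
        state += policy(state)
    return state

# Forward policy: attempts the task, drifting the state away from the start.
forward = lambda s: +1 if random.random() < 0.8 else -1

# Reset policy: deterministically drives the state back toward the start (0),
# so the next forward attempt begins from the initial state distribution.
reset_pi = lambda s: -1 if s > 0 else (+1 if s < 0 else 0)

state = 0
for trial in range(5):
    state = run_episode(forward, state)            # attempt the task
    state = run_episode(reset_pi, state, steps=20)  # reset for the next attempt
print(state)  # prints 0: the reset policy returns the agent to the start
```

The design point illustrated here is that no external (human or scripted) reset is ever invoked: the reset policy alone restores the initial state between attempts.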

Learning compound multi-step controllers under unknown dynamics

- Computer Science
- 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2015

It is demonstrated that a recently developed method that optimizes linear-Gaussian controllers under learned local linear models can tackle this sort of non-stationary problem, and that training controllers concurrently with a corresponding reset controller only minimally increases training time.

Diversity is All You Need: Learning Skills without a Reward Function

- Computer Science
- ICLR
- 2019

DIAYN ("Diversity is All You Need") is proposed, a method for learning useful skills without a reward function; it learns skills by maximizing an information-theoretic objective using a maximum-entropy policy.
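As a hedged sketch of that information-theoretic objective (this is the standard formulation from the DIAYN paper, not stated on this page): skills $z$ are kept predictable from the states they visit while actions remain maximally entropic,

```latex
\mathcal{F}(\theta) \;\triangleq\; I(S;Z) + \mathcal{H}[A \mid S] - I(A;Z \mid S),
\qquad
r_z(s, a) \;=\; \log q_\phi(z \mid s) - \log p(z),
```

where $q_\phi(z \mid s)$ is a learned discriminator and $p(z)$ a fixed skill prior; the discriminator's log-probability serves as the intrinsic reward for the maximum-entropy policy.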

On the sample complexity of reinforcement learning.

- Computer Science
- 2003

Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.

The Ingredients of Real-World Robotic Reinforcement Learning

- Computer Science, Mathematics
- ICLR
- 2020

This work discusses the required elements of a robotic system that can continually and autonomously improve with data collected in the real world, proposes a particular instantiation of such a system, and demonstrates its efficacy on dexterous robotic manipulation tasks in simulation and the real world.

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

- Computer Science
- Artif. Intell.
- 1999

It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
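For reference, the options formalism this abstract refers to models a temporally extended action as a triple (this is the standard definition from the options literature, with the usual symbols, not notation taken from this page):

```latex
o = \langle \mathcal{I}, \pi, \beta \rangle, \qquad
\mathcal{I} \subseteq \mathcal{S}, \quad
\pi : \mathcal{S} \times \mathcal{A} \to [0,1], \quad
\beta : \mathcal{S} \to [0,1],
```

where $\mathcal{I}$ is the set of states in which the option may be initiated, $\pi$ is the intra-option policy, and $\beta(s)$ gives the probability of the option terminating in state $s$. Primitive actions are recovered as options that terminate after one step.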

Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition

- Computer Science, Mathematics
- NeurIPS
- 2018

Variational inverse control with events (VICE) is proposed, which generalizes inverse reinforcement learning methods to cases where full demonstrations are not needed, such as when only samples of desired goal states are available.

Skew-Fit: State-Covering Self-Supervised Reinforcement Learning

- Computer Science, Mathematics
- ICML
- 2020

This paper proposes a formal exploration objective for goal-reaching policies that maximizes state coverage and presents an algorithm called Skew-Fit, which enables a real-world robot to learn to open a door entirely from scratch, from pixels, and without any manually designed reward function.

Temporal abstraction in reinforcement learning

- Computer Science
- ICML 2000
- 2000

A general framework for prediction, control, and learning at multiple temporal scales is developed, along with the way in which multi-time models can be used to produce plans of behavior very quickly using classical dynamic programming or reinforcement learning techniques.

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

- Computer Science, Mathematics
- ICLR
- 2018

This work describes a simple scheme that allows an agent to learn about its environment in an unsupervised manner, and focuses on two kinds of environments: (nearly) reversible environments and environments that can be reset.