OMNI-EPIC

Open-endedness via Models of human Notions of Interestingness
with Environments Programmed in Code

ICLR 2025
1Imperial College London   2University of British Columbia
3Vector Institute   4Canada CIFAR AI Chair
*Co-first authors   †Co-senior authors

Abstract

Open-ended and AI-generating algorithms aim to continuously generate and solve increasingly complex tasks indefinitely, offering a promising path toward more general intelligence. To accomplish this grand vision, learning must occur within a vast array of potential tasks. Existing approaches to automatically generating environments are constrained within manually predefined, often narrow distributions of environments, limiting their ability to create any learning environment. To address this limitation, we introduce a novel framework, OMNI-EPIC, that augments previous work in Open-endedness via Models of human Notions of Interestingness (OMNI) with Environments Programmed in Code (EPIC). OMNI-EPIC leverages foundation models to autonomously generate code specifying the next learnable (i.e., not too easy or difficult for the agent's current skill set) and interesting (e.g., worthwhile and novel) tasks. OMNI-EPIC generates both environments (e.g., an obstacle course) and reward functions (e.g., progress through the obstacle course quickly without touching red objects), enabling it, in principle, to create any simulatable learning task. We showcase the explosive creativity of OMNI-EPIC, which continuously innovates to suggest new, interesting learning challenges. We also highlight how OMNI-EPIC can adapt to reinforcement learning agents' learning progress, generating tasks that are of suitable difficulty. Overall, OMNI-EPIC can endlessly create learnable and interesting environments, further propelling the development of self-improving AI systems and AI-Generating Algorithms.

Short Run with Learning

To demonstrate OMNI-EPIC's ability to generate tasks of suitable difficulty for training RL agents, we conducted a run with RL agent training. OMNI-EPIC leverages previously learned tasks as stepping stones to generate and master more challenging tasks. This iterative process allows RL agents to build upon existing skills to tackle increasingly complex environments.

OMNI-EPIC adapts to the current capabilities of trained RL agents, generating tasks that are both interesting and learnable. Tasks deemed interesting that are successfully learned are marked by a check and failures by a cross. Uninteresting tasks are not trained on and hence not included here. Arrows between tasks indicate instances where OMNI-EPIC modified a task that the RL agent failed to learn, adjusting the task difficulty to facilitate learning.

Method

OMNI-EPIC leverages FMs, including large language models (LLMs) and vision-language models (VLMs), to autonomously create an endless stream of learnable and interesting tasks for open-ended learning. OMNI-EPIC maintains a growing task archive that catalogs both successfully learned tasks and unsuccessfully attempted ones. The task generator uses information from the archive about what has and has not been learned to propose the next new task, described in natural language, for the agent to attempt. These tasks are then translated into environment code by an environment generator, which specifies the simulated world and the functions required for RL. The newly generated task and its environment code are assessed by a model of interestingness, which emulates the human capacity for nuanced judgments of interestingness in open-ended learning. Tasks deemed interesting are used to train an RL agent; tasks deemed uninteresting are discarded, and a new task is generated. After training, a success detector assesses whether the agent has completed the task. Successfully completed tasks are added to the archive. Failed tasks are iterated on up to a maximum number of times and, if the RL agent still cannot solve them, are added to the archive as failed tasks. The cycle of generating the next task then restarts. This iterative process ensures the continuous generation and learning of new interesting tasks, forming an ever-growing collection of environments and learned agents.
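The generate-assess-train-archive cycle described above can be sketched in a few lines of Python. This is a structural sketch only, not the authors' implementation: every component function below (generate_task, generate_environment, is_interesting, train_rl_agent, detect_success, iterate_task) is a hypothetical stub standing in for what is, in the real system, a foundation-model call or an RL training run.

```python
# Hypothetical stubs for the OMNI-EPIC components; in the real system each
# is a foundation-model call or an RL training run.

def generate_task(archive):
    # FM proposes the next task in natural language, conditioned on the archive.
    return f"task_{len(archive)}"

def generate_environment(task):
    # FM translates the task description into environment + reward code.
    return f"# environment code for: {task}"

def is_interesting(task, env_code, archive):
    # Model of interestingness; in this stub, every task passes.
    return True

def train_rl_agent(env_code):
    # Stand-in for training an RL agent in the generated environment.
    return {"trained_on": env_code}

def detect_success(agent, env_code):
    # Success detector judging the trained agent; in this stub, always succeeds.
    return True

def iterate_task(task, env_code):
    # FM modifies a task the agent failed to learn, e.g., making it easier.
    simpler = task + "_simplified"
    return simpler, generate_environment(simpler)

def omni_epic_loop(seed_tasks, n_iterations=3, max_attempts=3):
    """Sketch of the OMNI-EPIC loop: generate, assess, train, archive."""
    archive = [{"task": t, "status": "seed"} for t in seed_tasks]
    for _ in range(n_iterations):
        task = generate_task(archive)
        env_code = generate_environment(task)
        if not is_interesting(task, env_code, archive):
            continue  # uninteresting tasks are discarded, not trained on
        for _ in range(max_attempts):
            agent = train_rl_agent(env_code)
            if detect_success(agent, env_code):
                archive.append({"task": task, "status": "success"})
                break
            task, env_code = iterate_task(task, env_code)
        else:  # max attempts exhausted without success
            archive.append({"task": task, "status": "failed"})
    return archive
```

With these stubs, running `omni_epic_loop(["cross a simple bridge"], n_iterations=2)` returns a three-entry archive: the seed plus two successfully learned tasks. Swapping in real FM calls and RL training changes only the stubs, not the loop structure.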

OMNI-EPIC Architecture

Long Run with Simulated Learning

To illustrate the creative explosion of generated tasks, we run OMNI-EPIC without training RL agents, assuming all generated tasks can be successfully completed. OMNI-EPIC generates tasks that significantly diverge from the seed tasks used to initialize the archive. OMNI-EPIC not only explores different task niches (e.g., navigating across different terrains vs. retrieving objects) but also generates interesting variations within each niche (e.g., retrieving objects in different simulated world settings).


OMNI-EPIC generates a diverse array of tasks, ranging from wildly different objectives to interesting variations of similar overarching tasks. The node color reflects the generation number of the task. A check mark in the node means that the task was successfully learned. A ZZZ symbol means that the task was deemed uninteresting and discarded. The node connections illustrate which tasks were conditioned on when asking an FM to generate a similar yet new and interesting task. Grey nodes show task description seeds that initialized the run.

Experiments on Different Robots

Ant robot successfully completing different generated tasks. These examples highlight OMNI-EPIC's ability to train various robot types and operate across different action spaces.

Conclusion

In conclusion, OMNI-EPIC represents a leap towards open-ended learning by generating an endless stream of learnable and interesting tasks. Intriguingly, it also provides a new way of creating human entertainment and educational resources by offering a limitless supply of engaging challenges. OMNI-EPIC could potentially be applied in myriad ways, covering anything from math problems and poetry challenges to games and virtual worlds. By leveraging FMs to create tasks and environment code, OMNI-EPIC opens up a vast space of possibilities for AI and human agents to explore and master. By combining that expressive power with human notions of interestingness, OMNI-EPIC presents a promising path towards the development of truly open-ended and creative AI.

Acknowledgements

This research was supported by the Vector Institute, the Canada CIFAR AI Chairs program, a grant from Schmidt Futures, an NSERC Discovery Grant, the Center for AI Safety Compute Cluster, DARPA, and a generous donation from Rafael Cosman. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors. We also thank Aaron Dharna, Arthur Braida, Ben Norman, Cong Lu, Gabriel Béna, Luca Grillotti, Rach Pradhan, and Shengran Hu for insightful discussions and feedback.

If you find this work useful, please cite it as:

@article{faldor2024omni,
  title={OMNI-EPIC: Open-endedness via models of human notions of interestingness with environments programmed in code},
  author={Faldor, Maxence and Zhang, Jenny and Cully, Antoine and Clune, Jeff},
  journal={arXiv preprint arXiv:2405.15568},
  year={2024}
}