Part IV Project Management System

Description:

Pokémon Blue and its counterpart, Pokémon Red, are excellent games for exploring reinforcement learning (RL). Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives feedback through rewards or penalties based on the actions it takes. The goal of reinforcement learning is for the agent to learn a policy, a set of rules or strategies, that maximizes the cumulative reward over time. The learning process involves the agent exploring different actions and exploiting its knowledge to make better decisions. Reinforcement learning is commonly used in scenarios where explicit programming is challenging and the optimal strategy needs to be discovered through trial and error. It has applications in various fields, including robotics, game playing, finance, and natural language processing.
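As a rough illustration of "cumulative reward over time", the sketch below sums a list of per-step rewards, weighting later rewards less. The discount factor `gamma` and the function name `discounted_return` are assumptions made for this example; the project description does not specify how rewards are accumulated.

```python
def discounted_return(rewards, gamma=0.99):
    """Return the sum of rewards, discounting each step by gamma (assumed)."""
    total = 0.0
    # Work backwards so each step folds in the discounted future return.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three steps with a reward of 1 each and a strong discount.
    print(discounted_return([1, 1, 1], gamma=0.5))
```

A policy that maximizes this quantity prefers rewards that arrive sooner when `gamma` is below 1, and treats all rewards equally when `gamma` is 1.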

Modern RL agents can easily play Atari games and even learn to carry out simple physical tasks on robotic platforms. However, these examples are highly constrained compared to learning to conduct complex tasks in real-world environments. Pokémon presents an interesting, complex task for exploring advanced RL techniques:

State Space Complexity: While the game is not overly complex compared to some modern video games, it still offers a reasonably rich state space. The Pokémon world includes various locations, Pokémon species, moves, and strategic interactions. This complexity allows for interesting challenges in learning optimal strategies in large and diverse environments.

Exploration-Exploitation Tradeoff: In Pokémon, there is a need for the player to balance exploration (trying new moves or strategies) and exploitation (using known effective moves). Reinforcement learning algorithms often deal with the exploration-exploitation tradeoff, making Pokémon a good environment for experimenting with such concepts. How do you enable an agent to quickly and efficiently explore a complex state space safely?
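One common way to trade off exploration and exploitation is an epsilon-greedy rule. The sketch below is a minimal example of that rule, not the method this project will use; `q_values` (the agent's current value estimate for each action) and `epsilon` are names assumed for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        # Explore: choose any action index uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]  # hypothetical value estimates for three moves
    print(epsilon_greedy(estimates, epsilon=0.1))
```

With `epsilon` set to 0 the agent always exploits; with `epsilon` set to 1 it always explores. A simple rule like this does not address the question of exploring safely or efficiently in a large state space.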

Reward Design: Reinforcement learning requires a well-defined reward structure to guide the learning process. In Pokémon, winning a battle is an obvious positive reward, while losing is a negative one. Crafting a reward system that encourages the learning of effective strategies can be an interesting aspect of reinforcement learning experimentation. But what reward system enables an AI to catch them all or become the Pokémon master? How do you balance short-term goals (winning battles) against the long-term goal of progressing through the game?
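To make the short-term versus long-term tension concrete, the sketch below combines a battle outcome with a progress signal. Everything here is hypothetical: the use of new gym badges as a progress measure, the weights `w_battle` and `w_progress`, and the function name are assumptions for illustration, not a reward proposed by the project.

```python
def shaped_reward(battle_won, battle_lost, new_badges,
                  w_battle=1.0, w_progress=10.0):
    """Combine a short-term battle signal with a long-term progress signal."""
    # Short-term: +1 for a win, -1 for a loss, 0 if no battle ended this step.
    short_term = w_battle * (int(battle_won) - int(battle_lost))
    # Long-term: reward story progress, here approximated by badges gained.
    long_term = w_progress * new_badges
    return short_term + long_term


if __name__ == "__main__":
    print(shaped_reward(battle_won=True, battle_lost=False, new_badges=0))
    print(shaped_reward(battle_won=False, battle_lost=True, new_badges=1))
```

The relative size of the two weights decides whether the agent is pushed towards grinding battles or towards advancing the game, which is exactly the balance the question above asks about.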

Long Horizons: The player in Pokémon has to complete a long series of puzzles and quests to progress the story and complete the game. There isn't a clear indication of progress, and in some cases the player has to backtrack through the environment. How do you train the agent to solve the puzzles? How does the agent learn to teach its Pokémon HM moves to progress the story? Is there a way of doing this that doesn't rely on expert knowledge and allows the agent to learn the context of these requirements without human input?

Action Space: How does an agent learn complex actions (e.g., training Pokémon to level up, catching Pokémon) from lower-level action sequences (button presses)? How can an RL agent learn a hierarchy of higher-order actions from simple actions?
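The sketch below shows what a two-level action hierarchy looks like in its simplest form: named higher-level actions that expand into button presses. The macro names and their contents are hand-written and hypothetical; the open research question is how an agent could learn such a hierarchy rather than being given one.

```python
# Hypothetical hand-written macros mapping a high-level action to button presses.
MACROS = {
    "walk_up_3": ["UP", "UP", "UP"],
    "confirm": ["A"],
    "open_menu": ["START"],
}


def expand(plan, macros=MACROS):
    """Flatten a plan of macros and raw buttons into a list of button presses."""
    buttons = []
    for step in plan:
        # Steps that are not known macros are treated as raw button presses.
        buttons.extend(macros.get(step, [step]))
    return buttons


if __name__ == "__main__":
    print(expand(["open_menu", "DOWN", "confirm"]))
```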

This research will explore using intrinsic motivation theory to teach an AI agent to play Pokémon. Intrinsic motivation theory suggests that individuals are driven to engage in activities for the inherent satisfaction and enjoyment they derive from the activities themselves, rather than relying solely on external rewards or punishments. The theory posits that people have an innate desire to explore, learn, and master new skills, driven by their internal psychological needs for autonomy, competence, and relatedness. Intrinsic motivation is characterized by genuine interest and personal enjoyment in the task at hand, fostering a sense of curiosity and self-determination. Researchers often highlight the importance of fostering intrinsic motivation, as it is associated with sustained effort, creativity, and improved overall well-being.
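In RL, one simple stand-in for curiosity is a count-based novelty bonus: the agent receives extra reward for visiting states it has rarely seen. The sketch below illustrates that idea only; it is an assumption about what an intrinsically motivated reward could look like, not the approach this project will implement. The class name `NoveltyBonus`, the `scale` parameter, and the `(map, x, y)` state key are hypothetical.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Count-based intrinsic reward: rarely visited states earn a larger bonus."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.visits = Counter()

    def __call__(self, state_key):
        # Record the visit, then shrink the bonus as the visit count grows.
        self.visits[state_key] += 1
        return self.scale / math.sqrt(self.visits[state_key])


if __name__ == "__main__":
    bonus = NoveltyBonus()
    tile = ("route_1", 4, 7)  # hypothetical (map, x, y) state key
    print(bonus(tile))  # first visit: full bonus
    print(bonus(tile))  # second visit: smaller bonus
```

Because the bonus depends only on how often a state has been seen, it rewards exploration without any game-specific knowledge of what counts as progress.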

Type:

Undergraduate

Outcome:

The outcome of this project will be:

A benchmark set of experiments comparing state-of-the-art RL against novel intrinsically motivated RL approaches on Pokémon Red or Blue.
A detailed report and evaluation of the performance of the implemented learning systems.
A publication at an international conference.

Prerequisites

None

Specialisations

Software Engineering

Supervisor

Henry Williams

Co-supervisor

Trevor Gee

Team

Samuel Boasman
PK Wadsworth

Lab

Robotics (405.652, Lab)

Project #11: AI Plays Pokémon - can AI become the very best, like no one ever was?