This short post shows how to create your own OpenAI Gym environment.

The problem

The environment I’ll create is “Blocking Maze” from Sutton’s RL book [1] (Chapter 8):

The environment is a simple 6x9 grid world with a wall (the shaded area). At first, the agent needs to find the path around the right end of the wall to reach the goal (left image). After some number of time steps the wall position changes, and the agent then has to find a new optimal path (right image).

[Figure: blocking_maze]

Image credit: [1]

Code

The official guide, How to create new environments for Gym [2], describes what you’ll need.

According to the guide, you need files and directories laid out like this:

gym-foo/
  README.md
  setup.py
  gym_foo/
    __init__.py
    envs/
      __init__.py
      foo_env.py
      foo_extrahard_env.py

So, let’s make it!

1. Top level directory

Create a project directory and README.md first (README is just a blank file at this point).

mkdir blocking-maze
cd blocking-maze
touch README.md

2. setup.py

from setuptools import setup

setup(name='blocking_maze',
      version='0.0.1',
      install_requires=['gym']
)

3. Create the package directories inside

mkdir blocking_maze
cd blocking_maze
mkdir envs

4. blocking-maze/blocking_maze/__init__.py

from gym.envs.registration import register

register(
    id='blocking-maze-v01',
    entry_point='blocking_maze.envs:MazeEnv1',
)

5. blocking-maze/blocking_maze/envs/__init__.py

from blocking_maze.envs.blocking_maze_env01 import MazeEnv1

6. blocking-maze/blocking_maze/envs/blocking_maze_env01.py

For now, let’s just create a template. You can just copy and paste the snippet from the document.

import gym
from gym import error, spaces, utils
from gym.utils import seeding

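# Named MazeEnv1 (the guide's template calls it FooEnv) so that it matches
# the entry_point registered in blocking_maze/__init__.py.
class MazeEnv1(gym.Env):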
    metadata = {'render.modes': ['human']}

    def __init__(self):
        print("init")
    def step(self, action):
        pass
    def reset(self):
        pass
    def render(self, mode='human'):
        pass
    def close(self):
        pass

We need to implement the step, reset, and render methods; otherwise calling them raises NotImplementedError (see gym/gym/core.py).

Although the code doesn’t do anything (I’ll provide the full code later), let’s test the installation now.

Install

Move to the parent directory (the directory which contains blocking-maze/), then

pip install -e blocking-maze

Then you can check that the environment loads:

$ python
>>> import gym
>>> gym.make('blocking_maze:blocking-maze-v01')
init
<blocking_maze.envs.blocking_maze_env01.MazeEnv1 object at 0x1052b9cf8>
>>>

Alright, it seems to be working :)

Defining the details

We just confirmed that the custom environment can be loaded. Now let’s fill in the details.

Well, the implementation itself is not complicated, so please check this gist: blocking_maze_env01.py

What you need to do is:

  1. Define your map (I used ‘w’ for walls and whitespace for walkable tiles)
  2. Implement all the necessary functions (step, reset, and render), along with some helper functions
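The two steps above can be sketched in plain Python, without Gym, to show the core movement logic. This is only a rough sketch and not the code from the gist: the names MAZE_1, START, GOAL, MOVES, move, and step are hypothetical, and the start cell, goal cell, action encoding, and reward of 1 at the goal are assumptions based on the figure in the book.

```python
# Hypothetical layout: 6 rows x 9 columns, 'w' = wall, ' ' = walkable tile.
MAZE_1 = [
    " " * 9,
    " " * 9,
    " " * 9,
    "w" * 8 + " ",  # wall with a gap at the right end
    " " * 9,
    " " * 9,
]
START = (5, 3)  # assumed start cell (row, col)
GOAL = (0, 8)   # assumed goal cell (row, col)

# Assumed action encoding: 0 = up, 1 = down, 2 = left, 3 = right.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def move(maze, loc, action):
    """Return the next location; bumping into a wall or the edge keeps the agent in place."""
    d_row, d_col = MOVES[action]
    row, col = loc[0] + d_row, loc[1] + d_col
    inside = 0 <= row < len(maze) and 0 <= col < len(maze[0])
    if not inside or maze[row][col] == "w":
        return loc
    return (row, col)


def step(maze, loc, action):
    """Gym-style transition on plain tuples: returns (next_loc, reward, done)."""
    next_loc = move(maze, loc, action)
    done = next_loc == GOAL
    reward = 1.0 if done else 0.0  # assumed reward scheme
    return next_loc, reward, done
```

In a real environment class this logic would live inside the step method, with the agent location stored on self and an info dict added to the return value.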

I also set action_space and observation_space as follows (this is not strictly necessary; the code runs without it):

self.action_space = spaces.Discrete(len(self.actions))
self.observation_space = spaces.Discrete(46)
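One way to read the 46 in the observation space: a 6x9 grid has 54 cells, and a wall of 8 cells leaves 46 walkable ones. The sketch below assigns each walkable cell an integer index, which is what a Discrete(46) observation would need. The layout and the names MAZE_1 and build_state_index are assumptions for illustration; the gist may encode observations differently.

```python
# Hypothetical layout: 'w' = wall, ' ' = walkable tile (6 rows x 9 columns).
MAZE_1 = [" " * 9] * 3 + ["w" * 8 + " "] + [" " * 9] * 2


def build_state_index(maze):
    """Map each walkable (row, col) cell to a consecutive integer, row by row."""
    index = {}
    for row, line in enumerate(maze):
        for col, tile in enumerate(line):
            if tile != "w":
                index[(row, col)] = len(index)
    return index


STATE_INDEX = build_state_index(MAZE_1)
print(len(STATE_INDEX))  # number of walkable cells: 54 - 8 = 46
```

Note that if the wall moves, the set of walkable cells changes too, so an index built this way is tied to one layout. Indexing all 54 cells instead would keep the mapping stable across layouts.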

Once you implement the details, test it like this:

import random
import gym

env = gym.make('blocking_maze:blocking-maze-v01')
env.reset()  # standard Gym usage: reset before the first render/step
env.render()

done = False
while not done:
    action = random.randrange(4)
    observation, reward, done, info = env.step(action)
    env.render()
    print(env.a_loc)

[Figure: blocking_maze_exec]

Yay, it works!

I also implemented a switch_maze function, which changes the map layout from 1 to 2. You can use it to set up experiments like the example in the book (e.g., the environment changes after 1,000 timesteps). Enjoy!
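As a rough sketch of that kind of experiment, the schedule below counts timesteps and swaps the layout once a threshold is reached. It runs without Gym: MazeSchedule, MAZE_1, MAZE_2, and switch_at are hypothetical names, the second layout (gap at the left end of the wall) is an assumption based on the figure, and the assignment to self.maze only stands in for a call to the environment’s switch_maze.

```python
# Hypothetical layouts: 'w' = wall, ' ' = walkable tile (6 rows x 9 columns).
MAZE_1 = [" " * 9] * 3 + ["w" * 8 + " "] + [" " * 9] * 2  # gap at the right end
MAZE_2 = [" " * 9] * 3 + [" " + "w" * 8] + [" " * 9] * 2  # gap at the left end


class MazeSchedule:
    """Counts timesteps and switches from layout 1 to layout 2 at `switch_at`."""

    def __init__(self, switch_at=1000):
        self.switch_at = switch_at
        self.timestep = 0
        self.maze = MAZE_1

    def tick(self):
        """Advance one timestep and return the layout that is now active."""
        self.timestep += 1
        if self.timestep >= self.switch_at:
            # In the real environment this is where switch_maze would be called.
            self.maze = MAZE_2
        return self.maze


schedule = MazeSchedule(switch_at=1000)
for _ in range(1500):
    active = schedule.tick()
print(active is MAZE_2)  # True: the layout switched at timestep 1000
```

In a training loop, tick would be called once per env.step so that the switch happens at a fixed point in the agent’s experience rather than at an episode boundary.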

References

[1]: R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Chapter 8)
[2]: How to create new environments for Gym