Step 4: Add a second actor implementation based on a heuristic
This part of the tutorial follows step 3; make sure you've gone through it before starting this one. Alternatively, the completed step 3 can be retrieved from the tutorial's repository.
In this step of the tutorial, we will go over another actor implementation and learn how to use the received observations before taking an action.
Creating a second actor implementation
Let's start by creating another implementation of the player actor class. Because we expect it to be rather small and not use additional dependencies, this second implementation will live in the same service as the previous one. We will start by copying the random_agent implementation.
In random_agent/main.py, copy/paste the random_agent function and name it heuristic_agent.
async def heuristic_agent(actor_session):
Then, in the same file, register this "new" implementation in the main function.
context.register_actor(
    impl=heuristic_agent,
    impl_name="heuristic_agent",
    actor_classes=["player"])
When the service starts it will now host the two implementations.
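As a rough mental model of what hosting two implementations in one service means, the sketch below uses a toy registry keyed by implementation name. `ToyContext` and its `implementations_for` and `get` methods are hypothetical and written only for this illustration; they are not Cogment's API or internals, and only the `register_actor` call shape mirrors the snippet above.

```python
import asyncio


class ToyContext:
    """Toy stand-in for a service context (an assumption, not Cogment code)."""

    def __init__(self):
        # Maps implementation name -> (coroutine function, actor classes).
        self._actors = {}

    def register_actor(self, impl, impl_name, actor_classes):
        if impl_name in self._actors:
            raise ValueError(f"'{impl_name}' is already registered")
        self._actors[impl_name] = (impl, list(actor_classes))

    def implementations_for(self, actor_class):
        """List the implementation names available for an actor class."""
        return sorted(
            name
            for name, (_, classes) in self._actors.items()
            if actor_class in classes
        )

    def get(self, impl_name):
        """Look up the coroutine function registered under a name."""
        return self._actors[impl_name][0]


async def random_agent(actor_session):
    return "random"


async def heuristic_agent(actor_session):
    return "heuristic"


context = ToyContext()
context.register_actor(
    impl=random_agent, impl_name="random_agent", actor_classes=["player"])
context.register_actor(
    impl=heuristic_agent, impl_name="heuristic_agent", actor_classes=["player"])
```

In this toy model, the implementation name chosen in the trial configuration is what selects which of the two functions runs for a given player.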
We can now configure one of the players in the default trial, defined in client/main.py, to use the heuristic_agent implementation.
actor_2_params = cogment.ActorParameters(
    cog_settings,
    name="player_2",
    class_name="player",
    endpoint=RANDOM_AGENT_ENDPOINT,
    implementation="heuristic_agent"
)
Modify the client/main.py file to include the above addition.
You can now build and run the application to check that it still works. Nothing should have changed, except that one of the players now uses the code from the new implementation.
Implementing a simple heuristic agent
While the random_agent implementation ignored the state of the game and picked its move at random, our new implementation will use the received observations to pick its move.
We will implement a subset of the strategies described in this article:
- If I won the last round, do the same thing,
- If my opponent won the last round, play the move that would have won against theirs,
- If the last round was a draw, play a random move.
We will start by redefining, in random_agent/main.py, the same DEFEATS mapping used by the environment. It maps each move to the move that defeats it.
DEFEATS = {
    ROCK: PAPER,
    SCISSORS: ROCK,
    PAPER: SCISSORS
}
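To make the direction of this mapping concrete, here is a small standalone sketch. It assumes plain integer stand-ins for ROCK, PAPER and SCISSORS (the tutorial project defines the real constants elsewhere), and the helpers `winning_move_against` and `beats` are hypothetical names introduced for illustration.

```python
# Stand-in constants, assumed for this sketch only.
ROCK, PAPER, SCISSORS = 0, 1, 2

# DEFEATS[m] is the move that wins against m.
DEFEATS = {
    ROCK: PAPER,
    SCISSORS: ROCK,
    PAPER: SCISSORS,
}


def winning_move_against(move):
    """Return the move that wins against `move` (hypothetical helper)."""
    return DEFEATS[move]


def beats(move_a, move_b):
    """Return True if move_a wins against move_b (hypothetical helper)."""
    return DEFEATS[move_b] == move_a
```

Reading the dictionary as "key is defeated by value" is what lets the agent look up its reply directly from the opponent's last move.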
Then, in the event loop, we look at the received observation before taking an action based on this simple strategy.
observation = event.observation
print(f"'{actor_session.name}' received an observation: '{observation}'")
if event.type == cogment.EventType.ACTIVE:
    if observation.snapshot.me.won_last:
        # I won the last round, let's play the same thing
        actor_session.do_action(PlayerAction(move=observation.snapshot.me.last_move))
    elif observation.snapshot.them.won_last:
        # I lost the last round, let's play what would have won
        actor_session.do_action(PlayerAction(move=DEFEATS[observation.snapshot.them.last_move]))
    else:
        # last round was a draw, let's play randomly
        actor_session.do_action(PlayerAction(move=random.choice(MOVES)))
Modify the random_agent/main.py file accordingly.
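The decision logic can also be tried outside of a running trial. The sketch below isolates it in a plain function; `PlayerState` and `pick_move` are hypothetical names, and the integer constants are stand-ins for the ones used in the project. Only the `won_last` and `last_move` fields mirror what the event loop reads from the observation.

```python
import random
from dataclasses import dataclass

# Stand-in constants, assumed for this sketch only.
ROCK, PAPER, SCISSORS = 0, 1, 2
MOVES = [ROCK, PAPER, SCISSORS]
DEFEATS = {ROCK: PAPER, SCISSORS: ROCK, PAPER: SCISSORS}


@dataclass
class PlayerState:
    """Hypothetical stand-in for the `me` / `them` part of an observation."""
    won_last: bool = False
    last_move: int = ROCK


def pick_move(me, them, rng=random):
    """Apply the three rules of the heuristic and return a move."""
    if me.won_last:
        # I won the last round: play the same move again.
        return me.last_move
    if them.won_last:
        # I lost the last round: play what would have beaten their move.
        return DEFEATS[them.last_move]
    # Draw (or first round): play a random move.
    return rng.choice(MOVES)
```

Keeping the rules in a function like this makes it easy to check each branch before wiring the result into `actor_session.do_action`.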
You can now build and run the application to check that it works. Don't expect the heuristic player to beat the random player: the nature of the game actually rewards purely random play. You can, however, implement various strategies and see how they fare against each other.
This concludes step 4 of the tutorial: you've learned how to add and use different implementations of an actor class, and how to access and use the received observations.
Let’s move on to adding a human player to the mix with step 5.
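To get a feel for this without running trials, the following standalone sketch plays the heuristic against a uniformly random opponent. It is a simplified simulation and not part of the tutorial's code: the `simulate` function, the integer constants and the fixed seed are all assumptions made for illustration.

```python
import random

# Stand-in constants, assumed for this sketch only.
ROCK, PAPER, SCISSORS = 0, 1, 2
MOVES = [ROCK, PAPER, SCISSORS]
DEFEATS = {ROCK: PAPER, SCISSORS: ROCK, PAPER: SCISSORS}


def simulate(rounds, seed=0):
    """Play heuristic vs. random; return (heuristic_wins, random_wins, draws)."""
    rng = random.Random(seed)
    heuristic_wins = random_wins = draws = 0
    last = None  # (heuristic_move, random_move, outcome) of the previous round
    for _ in range(rounds):
        if last is None or last[2] == "draw":
            # First round or draw: play randomly.
            h_move = rng.choice(MOVES)
        elif last[2] == "heuristic":
            # Heuristic won: repeat its move.
            h_move = last[0]
        else:
            # Heuristic lost: play what would have beaten the opponent.
            h_move = DEFEATS[last[1]]
        r_move = rng.choice(MOVES)

        if h_move == r_move:
            outcome = "draw"
            draws += 1
        elif DEFEATS[r_move] == h_move:
            outcome = "heuristic"
            heuristic_wins += 1
        else:
            outcome = "random"
            random_wins += 1
        last = (h_move, r_move, outcome)
    return heuristic_wins, random_wins, draws
```

Because the random player's move is uniform and independent of the past, each outcome should land close to one third of the rounds for a large enough call such as `simulate(30_000)`, whatever strategy the other side uses.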