Core Concepts
Cogment is built around concepts adapted from multi-agent systems (agents, environment), Markov decision processes (action and observation space) and reinforcement learning (trials, rewards).
Observations & actions
Observations and actions are key concepts of Cogment. They are the main inputs and outputs of the different components:
- Environments take actions as an input and output observations,
- Actors take observations as an input and output actions.
This discrete framework makes it easy to model the sequential decision making of the Actors in Environments. It is borrowed from the Markov Decision Process (MDP) formalism, and in particular from the Partially Observable MDP (POMDP): each Actor can get a different Observation from the Environment, representing what it perceives about the state of the world. The Action represents the decision the Actor takes upon receiving this Observation; the Environment then applies this Action. Cogment leverages this discrete update to orchestrate the execution of the components and the dispatch of data between them.
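The cycle described above can be sketched in plain Python. This is an illustrative toy and not the Cogment SDK: `LineWorld`, `observe`, `apply` and `move_toward_goal` are names assumed for this example. It shows each Actor receiving its own partial Observation and the Environment applying the resulting Actions in one discrete update.

```python
# Illustrative toy only: none of these names come from the Cogment SDK.


class LineWorld:
    """Hypothetical environment: actors move along a line toward a goal."""

    def __init__(self, positions, goal):
        self.positions = dict(positions)  # full state, hidden from the actors
        self.goal = goal

    def observe(self):
        # Partial observability: each actor only learns its own signed
        # distance to the goal, not the positions of the other actors.
        return {name: self.goal - pos for name, pos in self.positions.items()}

    def apply(self, actions):
        # One discrete update: every action is applied in the same step.
        for name, step in actions.items():
            self.positions[name] += step


def move_toward_goal(distance_to_goal):
    """Hypothetical actor policy: step one unit toward the goal."""
    if distance_to_goal > 0:
        return 1
    if distance_to_goal < 0:
        return -1
    return 0


world = LineWorld({"alice": 0, "bob": 5}, goal=3)
for _ in range(3):
    observations = world.observe()
    actions = {name: move_toward_goal(obs) for name, obs in observations.items()}
    world.apply(actions)

print(world.positions)
```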
Trials
Trials are what a Cogment deployment runs. They enable Actors to interact with their Environment. Trials are started by clients connecting to Cogment. A trial ends either when a client terminates it or on its own, for example once a specific state of the Environment is reached.
During the trial:
- The Environment generates observations of its internal state and sends them to the actors.
- Given these observations, each actor chooses and sends an action.
- The Environment receives the actions and updates its state.
- Rewards can be sent to the actors from either the environment or other actors. A sent reward is a measure of an actor’s performance within the environment at a given point in time during the trial.
- Actors receive rewards if at least one was sent to them.
- The actors and the environment can send messages to one another.
- A log of the activity during the trial (observations, actions, rewards & messages) is produced and can be stored.
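The steps above can be sketched as a small loop. This is a hedged illustration in plain Python and not the way Cogment runs trials: `CounterEnv`, `run_trial` and the two policies are hypothetical, and messages are left out for brevity. The trial ends by itself once the environment reaches its target, or is cut short after `max_ticks`, which stands in for a termination requested by a client.

```python
# Illustrative toy only: none of these names come from the Cogment SDK.


class CounterEnv:
    """Hypothetical environment: actors jointly count up to a target."""

    def __init__(self, actor_names, target):
        self.actor_names = list(actor_names)
        self.target = target
        self.total = 0

    def observe(self):
        remaining = self.target - self.total
        return {name: {"remaining": remaining} for name in self.actor_names}

    def step(self, actions):
        # Update the internal state from all the actions of this tick.
        self.total += sum(actions.values())
        done = self.total >= self.target
        # Rewards are only sent when the target is hit exactly.
        rewards = {}
        if self.total == self.target:
            rewards = {name: 1.0 for name in self.actor_names}
        return rewards, done


def run_trial(env, actors, max_ticks=100):
    """Run the observe / act / update cycle and return the activity log."""
    log = []
    for tick in range(max_ticks):
        observations = env.observe()
        actions = {name: act(observations[name]) for name, act in actors.items()}
        rewards, done = env.step(actions)
        log.append(
            {"tick": tick, "observations": observations,
             "actions": actions, "rewards": rewards}
        )
        if done:  # the trial ends by itself
            break
    return log


actors = {
    "steady": lambda obs: 1,
    "careful": lambda obs: max(0, min(2, obs["remaining"] - 1)),
}
log = run_trial(CounterEnv(actors.keys(), target=7), actors)
print(len(log), log[-1]["rewards"])
```

The returned `log` plays the role of the stored trial activity: one sample per tick holding the observations, actions and rewards.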
A trial is defined by the participating Actors and the host Environment. As a concept, Trials are quite close to Reinforcement Learning's Episodes, i.e. all the states that come between an initial state and a terminal state. However, because Cogment can be used outside of an RL context, we prefer using the more generic term of Trial.
Actors
Actors within a trial instantiate actor classes. An actor class is defined by the nature of the information its Actors receive from the Environment (their observation space) and by the actions they can perform (their action space).
In Cogment, the observation and action spaces are defined as typed data structures. In particular, Cogment uses Protocol Buffers (protobuf) as the format to specify these data structures. This typing both defines an interface contract between the Actors and the Environment and helps convey semantic information, thus facilitating the independent design and development of both.
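To illustrate the idea of typed spaces acting as a contract, here is a minimal sketch that uses Python dataclasses as a stand-in for protobuf messages. The `ActorClass`, `DriverObservation` and `DriverAction` names and their fields are assumptions made for this example; a real project would define these structures in protobuf files.

```python
# Illustrative stand-in: dataclasses play the role of protobuf messages here.
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverObservation:
    """Hypothetical observation space for a 'driver' actor class."""
    speed: float
    distance_to_finish: float


@dataclass(frozen=True)
class DriverAction:
    """Hypothetical action space for a 'driver' actor class."""
    acceleration: float


@dataclass(frozen=True)
class ActorClass:
    """An actor class pairs a name with its observation and action types."""
    name: str
    observation_space: type
    action_space: type

    def check_action(self, action):
        # The typed contract: reject anything outside the action space.
        if not isinstance(action, self.action_space):
            raise TypeError(
                f"{self.name} expects {self.action_space.__name__}, "
                f"got {type(action).__name__}"
            )
        return action


driver = ActorClass("driver", DriverObservation, DriverAction)
accepted = driver.check_action(DriverAction(acceleration=0.5))
print(accepted)
```

Because both sides only depend on the shared types, the Actor and the Environment can be developed independently against the same contract.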
An Actor might be controlled either by a software agent or by a Human. In either case, the process of generating actions based on observations remains the same, and the Environment treats them the same.
Some Actors connect to the trial (we call them "client" Actors) and others will wait for the trial to connect to them (we call these "service" Actors).
Environment
The Environment is the context within which the Trial takes place. The Environment receives the actions taken by the Actors, usually updates an internal state, and generates an observation for each Actor.
The Environment is the main integration point between Cogment and an external system, either a simulation or a real world system.
The spec file
At the heart of every Cogment project is a YAML spec file, typically called cogment.yaml. It specifies the trials for this project, including its actor classes and their action & observation spaces. You can learn more about the specification file in the dedicated reference page.
Architecture
Running trials with Cogment usually involves the deployment of a cluster of services and clients. These components are either provided by the Cogment framework, depicted below in blue, or implemented for a particular project, depicted below in orange.
User-implemented components use one of the Cogment SDKs or directly implement the underlying protocol. Components communicate using gRPC; clients can also communicate in a web-friendly way using gRPC-Web and grpcwebproxy.
Orchestrator
The Orchestrator is the glue that binds everything together. It is responsible for running the Trials and contacting other services as needed to ensure their execution.
The key aspect of Cogment's Orchestrator is its capacity to handle many network connections in parallel while remaining responsive.
Controller
The Controller is a key part of using Cogment: it initiates communication with the Orchestrator to control the execution of Trials. It is responsible for starting Trials, retrieving and watching their state (including the end of the trial), and requesting trial termination.
Environment
The Environment implementation is accessed by the Orchestrator to run the Environment during Trials.
Using one of Cogment's SDKs, the Environment can be implemented as a function integrating a "state of the world" with the Trial. This function performs the following tasks during the Trial:
- Generate Observations from the current state of the world, for example retrieving the visible objects from a 3D simulation.
- Apply the Actions, thus updating the state of the world, for example changing the velocity of a moving vehicle in a race simulation.
- Evaluate the performance of Actors and send them Rewards, for example by checking if a vehicle crossed the finish line in a race simulation.
- Send and receive direct messages.
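The first three tasks above can be sketched with the race simulation used in the examples. This is plain Python written for illustration, not the Cogment SDK: the function names, the `FINISH_LINE` constant and the dictionary layout of the world state are all assumptions, and direct messages are omitted.

```python
# Illustrative toy only: none of these names come from the Cogment SDK.

FINISH_LINE = 10.0  # hypothetical track length


def make_world(driver_names):
    """Create the 'state of the world': one vehicle per driver."""
    return {
        name: {"position": 0.0, "velocity": 0.0, "finished": False}
        for name in driver_names
    }


def generate_observations(world):
    # Each driver sees its own vehicle plus the position of the leader.
    leader = max(vehicle["position"] for vehicle in world.values())
    return {
        name: {
            "position": vehicle["position"],
            "velocity": vehicle["velocity"],
            "leader_position": leader,
        }
        for name, vehicle in world.items()
    }


def apply_actions(world, actions, dt=1.0):
    # An action is an acceleration: it changes the velocity of the vehicle,
    # which in turn moves the vehicle along the track.
    for name, acceleration in actions.items():
        vehicle = world[name]
        vehicle["velocity"] += acceleration * dt
        vehicle["position"] += vehicle["velocity"] * dt


def evaluate_rewards(world):
    # Reward a driver once, on the tick its vehicle crosses the finish line.
    rewards = {}
    for name, vehicle in world.items():
        if not vehicle["finished"] and vehicle["position"] >= FINISH_LINE:
            vehicle["finished"] = True
            rewards[name] = 1.0
    return rewards


world = make_world(["fast", "slow"])
all_rewards = []
for _ in range(3):
    observations = generate_observations(world)  # would be sent to the actors
    actions = {"fast": 2.0, "slow": 1.0}  # fixed values stand in for decisions
    apply_actions(world, actions)
    all_rewards.append(evaluate_rewards(world))

final_observations = generate_observations(world)
print(all_rewards, final_observations["slow"])
```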
Actors
Actors can be implemented in two different ways, either as a service or as a client. Service Actor implementations are accessed by the Orchestrator during Trials, while Client Actor implementations join a Trial by initiating the communication with the Orchestrator. Client Actor implementations can reach a Cogment deployment through NAT traversal. This makes them particularly well-suited to implement human-driven Actors, in web browsers for example.
Using one of Cogment's SDKs, Actors can be implemented as functions handling the integration between a decision-making Actor (software agent or Human) and the Trial. Each function performs the following tasks during the Trial:
- Receive Observations and take Actions in response, for example vectorizing the retrieved observation, feeding it to a neural network and converting its output to an Action.
- Receive Rewards, for example using them to update a neural network.
- Send and receive direct messages.
Please note that rewards can also be retrieved after the fact using a datalog.
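The observation and reward handling described above can be sketched as follows. This is a plain Python illustration under stated assumptions, not the Cogment SDK: `LinearPolicyActor`, its method names and the observation fields are hypothetical, a single linear layer stands in for the neural network, and the weight update is a toy rule rather than an actual reinforcement learning algorithm.

```python
# Illustrative toy only: none of these names come from the Cogment SDK.


class LinearPolicyActor:
    """Hypothetical actor backed by a single linear layer."""

    def __init__(self, weights, learning_rate=0.1):
        self.weights = list(weights)
        self.learning_rate = learning_rate
        self.last_features = None
        self.last_action = None

    @staticmethod
    def vectorize(observation):
        # Turn the structured observation into a flat feature vector;
        # the trailing 1.0 acts as a bias term.
        return [observation["position"], observation["velocity"], 1.0]

    def on_observation(self, observation):
        features = self.vectorize(observation)
        score = sum(w * x for w, x in zip(self.weights, features))
        # Convert the raw output of the model into an action.
        action = "accelerate" if score > 0 else "brake"
        self.last_features = features
        self.last_action = action
        return action

    def on_reward(self, reward):
        # Toy update: push the score of the last decision up after a positive
        # reward and down after a negative one.
        if self.last_features is None:
            return
        direction = 1.0 if self.last_action == "accelerate" else -1.0
        scale = self.learning_rate * reward * direction
        self.weights = [
            w + scale * x for w, x in zip(self.weights, self.last_features)
        ]


actor = LinearPolicyActor(weights=[0.0, 0.0, 1.0])
first_action = actor.on_observation({"position": 2.0, "velocity": 1.0})
actor.on_reward(-1.0)  # the decision was penalized
second_action = actor.on_observation({"position": 2.0, "velocity": 1.0})
print(first_action, second_action, actor.weights)
```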
Additional components
On top of the core components described above, a Cogment deployment can include these additional ones:
- Datalog services can be used to listen to the activity during a trial (actions, observations, rewards, messages) in order to, for example, store these data for the offline training of AI agents. Trial Datastore is an out-of-the-box implementation of this.
- Model Registry handles the storage and dispatch of AI models trained with Cogment and used by the actors.
- Directory handles the publishing and discovery of Cogment services.
- Pre-Trial Hooks can be used to dynamically set up Trials from a given configuration, for example changing the number of Actors or pointing to other Environment or Actor implementations.
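As an illustration of the pre-trial hook idea, the sketch below chains functions that each take trial parameters and return an updated copy. It is plain Python and not the Cogment SDK: the dictionary layout, the hook names and the endpoint values are all hypothetical.

```python
# Illustrative toy only: the parameter layout and endpoints are hypothetical.
import copy

HYPOTHETICAL_ENVIRONMENTS = {
    "simulation": "grpc://simulated-env:9000",
    "real": "grpc://real-env:9000",
}


def set_actor_count(params):
    """Hook changing the number of actors based on the configuration."""
    updated = copy.deepcopy(params)
    config = updated["config"]
    updated["actors"] = [
        {
            "name": f"player_{index}",
            "actor_class": "player",
            "endpoint": config["player_endpoint"],
        }
        for index in range(config["player_count"])
    ]
    return updated


def select_environment(params):
    """Hook pointing the trial to another environment implementation."""
    updated = copy.deepcopy(params)
    mode = updated["config"].get("mode", "simulation")
    updated["environment"] = {"endpoint": HYPOTHETICAL_ENVIRONMENTS[mode]}
    return updated


def run_pre_trial_hooks(params, hooks):
    # Hooks run in order, each one receiving the output of the previous one.
    for hook in hooks:
        params = hook(params)
    return params


initial = {
    "config": {
        "player_count": 3,
        "player_endpoint": "grpc://players:9001",
        "mode": "real",
    }
}
trial_params = run_pre_trial_hooks(initial, [set_actor_count, select_environment])
print(trial_params["environment"], len(trial_params["actors"]))
```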
Components availability summary
The following table summarizes how each component can either be implemented or used out of the box.