

Terminology is central to understanding new concepts and learning to use a new technology. The words used throughout our documentation can cause confusion for readers unfamiliar with how we use them, so this glossary defines our terms for newcomers and seasoned developers alike. Some familiar terms carry additional caveats in the context of the Cogment Framework, generally for clarity.




Actor

An Actor is anyone or anything that interacts with the environment by executing actions, taking in observations, and receiving rewards (positive or negative) for this interaction. An Actor can be an Agent (of any level of complexity and any type of flexibility, from bots to ML agents) or a human user.

Actor Class

Each Actor always belongs to a single Actor Class. An Actor Class is primarily defined by its associated Action Space, as a property of an environment. For example, pilot and passenger could be two different Actor Classes.


Action

  1. An Action is an interaction an Actor performs on the environment. Actions are picked from the Action Space.
  2. A single element of an Action Space.
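
As a rough illustration of how an Actor Class ties its Actors to an Action Space, the Python sketch below gives each class its own set of allowed Actions. All names here (`PilotAction`, `PassengerAction`, `ACTION_SPACES`, `is_valid_action`) are hypothetical and are not part of the Cogment SDK.

```python
from enum import Enum


class PilotAction(Enum):
    """Hypothetical Action Space for the "pilot" Actor Class."""
    CLIMB = "climb"
    DESCEND = "descend"
    HOLD = "hold"


class PassengerAction(Enum):
    """Hypothetical Action Space for the "passenger" Actor Class."""
    CALL_CREW = "call_crew"
    WAIT = "wait"


# Assumption for this sketch: each Actor Class maps to exactly one Action Space.
ACTION_SPACES = {"pilot": PilotAction, "passenger": PassengerAction}


def is_valid_action(actor_class: str, action: Enum) -> bool:
    """Return True if `action` is an element of the class's Action Space."""
    return isinstance(action, ACTION_SPACES[actor_class])
```

With these definitions, `is_valid_action("pilot", PilotAction.CLIMB)` holds, while a passenger attempting `PilotAction.CLIMB` is rejected because that Action lies outside the passenger Action Space.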


Agent

We usually call any non-human Actor an agent. Agents can rely on any kind of underlying decision-making system, whether or not it is able to learn.



Configuration

The configurations (or "configs") are defined, set and referenced by users for user components and do not affect the rest of Cogment. They refer to the protobuf messages defined in the trial specifications in the config sections. A "trial config" may be given at the start of a new trial. Ultimately, the configurations and parameters for a trial are managed by the pre-trial hooks.



Environment

  1. The environment is the set of rules defining how a trial evolves over time for any given use case. For example, to train a pilot agent, a flight simulation would be the environment. Actors can interact with the environment itself, or with each other through the environment, within the boundaries of the environment ruleset (i.e. how an environment can change, from environmental rulesets or the actions of Actors in the environment).
  2. A stateful instance of an environment.

Environment State

An environment state is the specific set of conditions in which the environment is at a specific time (for example, when it is first instantiated). These conditions can be observable or not, and our Framework does not concern itself with the ones that are not.



These two elements combined are what we call the framework:


The interface, usually an app, that humans use to interact with the rest of the system; the software that turns humans into Actors.


Human - Artificial Intelligence Interaction Loop Training

We call Human - AI interaction loop training the fundamental paradigm our Framework was built for: a continuous loop between humans and agents where they learn from each other. It’s a way to train agents in an environment where direct human interactions, whether between humans, between humans and the environment, or between humans and agents, provide live data to the agents (first part of the loop), as well as a way for agents to interact with humans, either directly or through the environment (second part of the loop).



Message

Messages can be sent from any actor or the environment to any actor or the environment. A message can be any protobuf class. This creates channels between any set of actors and the environment. These channels can be used in applications where communication between actors and the environment needs to happen outside of the standard observation and action spaces.


Model

A model is a representation, usually a mathematical one in our context, of a concept, structure, system, or an aspect of the real world. It is usually a simplified and abstracted representation.



Observation

An observation is the subset of the environment state on which an Actor bases its choice of Action (or lack thereof).

Observation transition

An observation transition is an observation delta between two consecutive observations.
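
To make the idea of an observation delta concrete, here is a minimal Python sketch. It assumes observations are flat dictionaries that always carry the same fields; the helper names `observation_delta` and `apply_delta` and the flight-related fields are hypothetical and do not reflect how Cogment encodes observations.

```python
def observation_delta(previous: dict, current: dict) -> dict:
    """Return only the fields of `current` that differ from `previous`."""
    return {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }


def apply_delta(previous: dict, delta: dict) -> dict:
    """Rebuild the next observation from the previous one and a delta."""
    return {**previous, **delta}


# Two consecutive observations of a hypothetical flight environment.
obs_t0 = {"altitude": 1000, "speed": 250, "heading": 90}
obs_t1 = {"altitude": 1050, "speed": 250, "heading": 90}

# Only the altitude changed, so the delta carries that single field.
delta = observation_delta(obs_t0, obs_t1)
```

Applying `delta` to `obs_t0` with `apply_delta` reproduces `obs_t1`, which is why sending only the transition can be enough when the receiver already holds the previous observation.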

Observation space

An Observation space is the set of all possible observations an Actor can make of an environment.


Orchestrator

The Orchestrator is the central piece of our framework; it’s part of the main Cogment executable and handles several things:

  • It circulates data flows between Actors and Environments.
  • It dumps datasets in the chosen storage location.
  • It compresses & encrypts data.
  • It collates various reward sources (usually environment or actors) into a single reward for an Actor.
  • It instantiates the trials.



Parameters

The parameters (or "params") define everything that is specific to a trial, and control aspects of Cogment for that trial. Default parameters can be given at the start of the Orchestrator. Ultimately, the configurations and parameters for a trial are managed by the pre-trial hooks.


Plugin

A plugin or extension adds functionality to our core framework. We provide plugins that handle special features such as Deployment, Dataset storage destinations, and Analytics; users may or may not choose to use them alongside the core framework, depending on their specific needs.

Protocol Buffer

A binary data format for serialized communication. The available data structures are specified in .proto files. You can learn more at



Reward

  1. A sent reward is a measure of an Actor’s performance within the environment at a given tick. It can be sent by the environment and/or by a different Actor. Sent rewards go to the Orchestrator, which collates them before they are received by the target actor.

  2. A received reward is a single measure of an Actor’s performance. It is produced when at least one reward is sent to the actor at a given tick.
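
The relationship between sent and received rewards can be sketched in Python as follows. The glossary does not state which rule the Orchestrator uses to collate rewards, so this example simply assumes an unweighted mean; the `SentReward` type and the `collate` function are hypothetical and are not the Cogment API.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SentReward:
    """Hypothetical record of one reward sent to an Actor at a given tick."""
    tick_id: int
    receiver: str
    sender: str
    value: float


def collate(sent_rewards):
    """Group sent rewards by (tick, receiver) and average each group.

    Assumption: an unweighted mean stands in for the real collation rule.
    """
    grouped = defaultdict(list)
    for reward in sent_rewards:
        grouped[(reward.tick_id, reward.receiver)].append(reward.value)
    return {key: mean(values) for key, values in grouped.items()}


sent = [
    SentReward(tick_id=3, receiver="pilot", sender="environment", value=1.0),
    SentReward(tick_id=3, receiver="pilot", sender="passenger", value=0.0),
    SentReward(tick_id=4, receiver="pilot", sender="environment", value=-1.0),
]

# One received reward per (tick, receiver) pair that had at least one sender.
received = collate(sent)
```

In this sketch the two rewards sent to the pilot at tick 3 collapse into a single received reward, and no entry exists for ticks at which nothing was sent, matching the second definition above.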

Reward function

A reward function describes how an agent "ought" to behave; what behaviours lead to Rewards. Note that in our case, Reward functions can be used to reward any Actor, regardless of it being human or not.

Reinforcement Learning (RL)

RL is a specific method to train agents, using reward functions.



Specifications

The specifications (or "specs") are the operational aspects of a type of trial (typically found in a file named "cogment.yaml"). The file is fed to the Cogment CLI to generate helpful SDK modules (e.g. "" and various "" for the Python SDK). These modules are required by the Cogment SDKs. The Orchestrator is independent of the specifications, but some of the trial parameters may be dependent on these specifications (e.g. class name of actors).



Tick

A tick is a discrete timestep between two states of the environment. In our Framework, ticks within a trial are numbered.


Trial

A trial is a single run of a use case, with a beginning and end, populated with a single instance of the use case’s environment and its actors.
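
How ticks, observations and actions fit together within one trial can be pictured with a toy loop. This is only a sketch under strong assumptions: a single Actor, an integer environment state that is fully observable, and a made-up rule that adds the action to the state. `run_trial` and its parameters are hypothetical and unrelated to the Cogment SDK.

```python
def run_trial(initial_state: int, policy, max_ticks: int):
    """Run a toy trial and return a list of (tick_id, observation, action)."""
    state = initial_state
    history = []
    for tick_id in range(max_ticks):  # ticks within a trial are numbered
        observation = state  # toy case: the whole state is observable
        action = policy(observation)
        history.append((tick_id, observation, action))
        state = state + action  # toy environment rule: add the action
    return history


# A hypothetical policy that pushes the state up until it reaches 2.
history = run_trial(
    initial_state=0,
    policy=lambda obs: 1 if obs < 2 else -1,
    max_ticks=4,
)
```

The trial has a clear beginning (the initial state) and end (after `max_ticks` ticks), and each recorded entry pairs a tick number with what the Actor observed and the Action it chose.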


Use case

The problem one wants to solve.