gomoku_rl.env module

class gomoku_rl.env.GomokuEnv(num_envs: int, board_size: int, device=None)

Bases: object

__init__(num_envs: int, board_size: int, device=None)

Initializes a parallel Gomoku environment.

Parameters:

num_envs (int) – The number of independent game environments to run in parallel. Each environment represents a separate instance of a Gomoku game.
board_size (int) – The size of the square Gomoku game board.
device – The computational device (e.g., CPU, GPU) on which the game simulations will run. If None, the default device is used.
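As a minimal sketch of how the constructor might be called, the helper below wraps the documented signature. The helper name build_env and its default values (4 environments, a 15x15 board) are illustrative assumptions, not part of gomoku_rl; the import is deferred so the snippet loads even where the package is not installed.

```python
def build_env(num_envs: int = 4, board_size: int = 15, device=None):
    """Construct a GomokuEnv using the documented keyword arguments.

    Hypothetical convenience wrapper: the defaults here are arbitrary
    illustration values, not defaults defined by gomoku_rl.
    """
    # Deferred import: only needed when the helper is actually called.
    from gomoku_rl.env import GomokuEnv

    return GomokuEnv(num_envs=num_envs, board_size=board_size, device=device)
```

With gomoku_rl installed, build_env(num_envs=8, board_size=9) would be expected to return an environment whose num_envs and board_size properties report 8 and 9.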

property batch_size

property board_size

property device

property num_envs

reset(env_indices: Tensor | None = None) → TensorDict

Resets the specified game environments to their initial states, or all environments if none are specified.

Parameters: env_indices (torch.Tensor | None, optional) – Indices of environments to reset. Resets all if None. Defaults to None.
Returns: A tensor dictionary containing the initial observations and action masks for all environments.
Return type: TensorDict
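The selective-reset behaviour described above can be illustrated with a stand-in that uses plain Python lists instead of tensors. This is a sketch of the semantics only, not the library's implementation: the names empty_board and reset_boards and the cell encoding (0 for empty, 1 and -1 for stones) are assumptions made for the example.

```python
from typing import List, Optional, Sequence

Board = List[List[int]]  # assumed encoding: 0 = empty, 1 / -1 = stones


def empty_board(board_size: int) -> Board:
    """Return a fresh board_size x board_size board with no stones."""
    return [[0] * board_size for _ in range(board_size)]


def reset_boards(
    boards: List[Board], env_indices: Optional[Sequence[int]] = None
) -> List[Board]:
    """Clear the boards at env_indices, or every board when env_indices is None."""
    board_size = len(boards[0])
    targets = range(len(boards)) if env_indices is None else env_indices
    for i in targets:
        boards[i] = empty_board(board_size)
    return boards


# Three parallel 3x3 games, two of which already contain a stone.
boards = [empty_board(3) for _ in range(3)]
boards[0][1][1] = 1
boards[2][0][0] = -1

# Reset only game 0; game 2 keeps its stone.
reset_boards(boards, env_indices=[0])
```

Calling reset_boards(boards) with no indices afterwards clears every board, mirroring the "resets all if None" rule.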

set_post_step(post_step: Callable[[TensorDict], None] | None = None)

Sets a function to be called after each step in the environment.

Parameters: post_step (Callable[[TensorDict], None] | None, optional) – A function that takes a tensor dictionary as input and performs some action. Defaults to None.
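A callback matching the documented Callable[[TensorDict], None] shape could look like the sketch below. To keep the example runnable without torch, a plain dict stands in for TensorDict and HookedEnv is a hypothetical stand-in for the environment. That passing None removes the hook, and that the step output carries a "done" entry, are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Optional

StepOutput = Dict[str, List[bool]]  # plain-dict stand-in for TensorDict


class DoneCounter:
    """Callback that accumulates how many games finished across steps."""

    def __init__(self) -> None:
        self.finished = 0

    def __call__(self, output: StepOutput) -> None:
        self.finished += sum(output["done"])


class HookedEnv:
    """Hypothetical stand-in showing where a post-step hook would fire."""

    def __init__(self) -> None:
        self._post_step: Optional[Callable[[StepOutput], None]] = None

    def set_post_step(
        self, post_step: Optional[Callable[[StepOutput], None]] = None
    ) -> None:
        self._post_step = post_step

    def step(self, output: StepOutput) -> StepOutput:
        # A real environment would compute `output`; here it is passed in.
        if self._post_step is not None:
            self._post_step(output)
        return output


env = HookedEnv()
counter = DoneCounter()
env.set_post_step(counter)
env.step({"done": [True, False, True]})  # counter.finished is now 2
```

Using a callable object rather than a bare function lets the hook keep state (here, a running count) between steps.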

step(tensordict: TensorDict) → TensorDict

Advances the state of the environments by one timestep based on the actions provided in the tensordict.

Parameters: tensordict (TensorDict) – A dictionary containing tensors with the actions to be taken in each environment. May also include optional environment masks to specify which environments should be updated.
Returns: A tensor dictionary containing the updated observations, action masks, and other information for all environments.
Return type: TensorDict
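Because each step consumes one action per environment, a caller typically derives those actions from the action masks returned by the previous call. The sketch below picks a uniformly random legal move per environment using only the standard library. It assumes that a mask entry of True marks a legal move and that actions are flat indices into the board; both are assumptions of this example, and sample_legal_actions is a hypothetical helper.

```python
import random
from typing import List


def sample_legal_actions(
    action_masks: List[List[bool]], rng: random.Random
) -> List[int]:
    """Pick one random legal flat index per environment from its mask."""
    actions = []
    for env_id, mask in enumerate(action_masks):
        legal = [i for i, ok in enumerate(mask) if ok]
        if not legal:
            raise ValueError(f"environment {env_id} has no legal moves")
        actions.append(rng.choice(legal))
    return actions


# Two environments on a 2x2 board (4 flat cells each).
masks = [
    [False, True, False, False],  # only cell 1 is free
    [True, True, False, True],    # cells 0, 1 and 3 are free
]
actions = sample_legal_actions(masks, random.Random(0))
```

With real tensors, the resulting list would be converted to a tensor and placed in the input tensor dictionary under whatever action key the environment expects.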

step_and_maybe_reset(tensordict: TensorDict, env_mask: Tensor | None = None) → TensorDict

Simulates a single step in each environment and resets any environment whose game has ended.

Parameters:

tensordict (TensorDict) – A dictionary containing tensors with the current observations, action masks, and actions for each environment.
env_mask (torch.Tensor | None, optional) – A 1D tensor specifying which environments should be updated. If None, all environments are updated.

Returns:

A dictionary containing tensors with the updated observations, action masks, and other relevant information for each environment. For environments that have concluded their game and are reset, the ‘observation’ key will reflect the new initial state, but the ‘done’ flag remains set to True to indicate the end of the previous game within this timestep.

Return type:

TensorDict
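The return-value convention described above (a fresh observation paired with a done flag that stays True) can be sketched with plain Python values. The function reset_finished and the string observations are illustrative stand-ins, not the library's implementation.

```python
from typing import Callable, Dict, List


def reset_finished(
    observations: List[str],
    done: List[bool],
    make_initial: Callable[[], str],
) -> Dict[str, list]:
    """Swap in an initial observation for finished games; keep 'done' as is."""
    next_obs = [
        make_initial() if finished else obs
        for obs, finished in zip(observations, done)
    ]
    # 'done' is copied unchanged so callers can still see which games ended.
    return {"observation": next_obs, "done": list(done)}


# Game 1 ended on this step; games 0 and 2 are still in progress.
out = reset_finished(
    observations=["mid-game", "won", "mid-game"],
    done=[False, True, False],
    make_initial=lambda: "empty",
)
finished_games = sum(out["done"])
```

A rollout loop can therefore count completed games from the done flags while feeding the returned observations straight into the next step, without a separate reset call.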