gomoku_rl.core module

class gomoku_rl.core.Gomoku(num_envs: int, board_size: int = 15, device=None)

Bases: object

__init__(num_envs: int, board_size: int = 15, device=None)

Initializes a batch of parallel Gomoku game environments.

Parameters:

num_envs (int) – Number of parallel game environments.
board_size (int, optional) – Side length of the square game board. Defaults to 15.
device – Torch device on which the tensors are allocated. Defaults to None (CPU).

get_action_mask() → Tensor

Generates a mask indicating valid actions for each environment.

Returns: Action mask tensor, shaped (E, B*B), where E is the number of environments and B is the board size, with 1s for valid actions and 0s otherwise.
Return type: torch.Tensor
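The mask semantics can be illustrated with a plain-Python sketch. It uses nested lists instead of tensors and assumes that a cell is a valid action exactly when it is empty (value 0) and that boards are flattened in row-major order; the function name `action_mask` is hypothetical and is not part of gomoku_rl.

```python
def action_mask(boards):
    """Return one flat 0/1 mask per board: 1 where the cell is empty."""
    masks = []
    for board in boards:  # one board per environment
        # Row-major flattening is an assumption of this sketch.
        flat = [cell for row in board for cell in row]
        masks.append([1 if cell == 0 else 0 for cell in flat])
    return masks


# Two 3x3 boards (E = 2, B = 3): 0 = empty, 1 = black, -1 = white.
boards = [
    [[0, 1, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
]
masks = action_mask(boards)  # shape (E, B*B) as nested lists
```

A policy would typically use such a mask to exclude occupied cells before sampling an action.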

get_encoded_board() → Tensor

Encodes the current board state into a tensor format suitable for neural network input.

Returns: Encoded board state, shaped (E, 3, B, B), with separate channels for the current player’s stones, the opponent’s stones, and the last move.
Return type: torch.Tensor
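As a rough illustration of the three-channel layout, the following plain-Python sketch encodes a single board. It assumes the channel order listed above, a one-hot plane for the last move, and row-major linear indexing; `encode_board`, `current_piece`, and `last_move` are hypothetical names, and the library's actual encoding may differ in detail.

```python
def encode_board(board, current_piece, last_move):
    """Encode one board as [own stones, opponent stones, last move] planes."""
    size = len(board)
    own = [[1.0 if cell == current_piece else 0.0 for cell in row] for row in board]
    opp = [[1.0 if cell == -current_piece else 0.0 for cell in row] for row in board]
    last = [[0.0] * size for _ in range(size)]
    if last_move is not None:  # no last move on an empty board
        # Assumed row-major indexing: index = row * size + col.
        last[last_move // size][last_move % size] = 1.0
    return [own, opp, last]


# Black (1) is to move; white (-1) just played the centre cell (index 4).
board = [[1, 0, 0], [0, -1, 0], [0, 0, 0]]
encoded = encode_board(board, current_piece=1, last_move=4)
```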

is_valid(action: Tensor) → Tensor

Checks the validity of the specified actions in each environment.

Parameters: action (torch.Tensor) – Actions to check, one linear board index per environment.
Returns: Boolean tensor, shaped (E,), indicating the validity of each action.
Return type: torch.Tensor
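A minimal sketch of what "linearly indexed" could mean in practice, assuming the row-major convention index = row * B + col and assuming an action is valid when it is in range and targets an empty cell. This is plain Python on nested lists, not the library's tensor implementation.

```python
def is_valid(boards, actions):
    """Return one bool per environment: True if the target cell is empty."""
    size = len(boards[0])
    result = []
    for board, action in zip(boards, actions):
        in_range = 0 <= action < size * size
        row, col = divmod(action, size)  # assumed row-major layout
        # Short-circuit keeps out-of-range actions from indexing the board.
        result.append(in_range and board[row][col] == 0)
    return result


boards = [
    [[0, 1, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
]
valid = is_valid(boards, [1, 4])       # cell 1 of the first board is occupied
out_of_range = is_valid(boards, [9, 8])
```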

reset(env_indices: Tensor | None = None)

Resets specified game environments to their initial state.

Parameters: env_indices (torch.Tensor | None, optional) – Indices of the environments to reset. If None, all environments are reset. Defaults to None.

step(action: Tensor, env_mask: Tensor | None = None) → tuple[Tensor, Tensor]

Places one stone per environment at the position given by action and updates the game state. If env_mask is provided, only the environments whose mask entry is True are updated; otherwise, all environments are updated.

Parameters:

action (torch.Tensor) – 1D positions to place a stone, one per environment. Shape: (E,)
env_mask (torch.Tensor | None, optional) – Boolean mask selecting the environments to update. If None, all environments are updated. Shape: (E,)

Returns:

A tuple containing two tensors:

done_statuses: Boolean tensor with True where games ended.
invalid_actions: Boolean tensor with True for invalid actions in environments.

Return type:

tuple[torch.Tensor, torch.Tensor]
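The interaction between the action tensor and the environment mask can be sketched in plain Python. The sketch below covers only stone placement and the invalid-action flags, not win detection. It assumes that an action on an occupied cell leaves the board unchanged and that masked-out environments are never flagged as invalid; `step_masked` and `pieces` are hypothetical names, and the library may handle these cases differently.

```python
def step_masked(boards, pieces, actions, env_mask=None):
    """Place pieces[i] at actions[i] on boards[i] where env_mask[i] is True.

    Mutates `boards` in place and returns the per-environment invalid flags.
    """
    if env_mask is None:
        env_mask = [True] * len(boards)  # update every environment
    invalid = []
    for board, piece, action, active in zip(boards, pieces, actions, env_mask):
        row, col = divmod(action, len(board))  # assumed row-major layout
        occupied = board[row][col] != 0
        if active and not occupied:
            board[row][col] = piece
        # Assumption: only active environments can report an invalid action.
        invalid.append(active and occupied)
    return invalid


boards = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 0]],
    [[0, 0], [0, 0]],
]
invalid = step_masked(
    boards,
    pieces=[1, -1, 1],
    actions=[3, 0, 2],
    env_mask=[True, True, False],  # third environment is skipped
)
```

Here the first environment receives a stone, the second reports an invalid action because the cell is taken, and the third is left untouched.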

to(device)

Transfers all internal tensors to the specified device.

Parameters: device – The target device.
Returns: The instance itself, with its tensors moved to the new device.
Return type: Gomoku

gomoku_rl.core.compute_done(board: Tensor, kernel_horizontal: Tensor, kernel_vertical: Tensor, kernel_diagonal: Tensor) → Tensor

Determines if any game has been won in a batch of Gomoku boards.

Checks for a winning sequence of stones horizontally, vertically, and diagonally.

Parameters:

board (torch.Tensor) – The game boards, shaped (E, B, B), with E being the number of environments, and B being the board size. Values are 0 (empty), 1 (black stone), or -1 (white stone).
kernel_horizontal (torch.Tensor) – Horizontal detection kernel, shaped (1, 1, 5, 1).
kernel_vertical (torch.Tensor) – Vertical detection kernel, shaped (1, 1, 1, 5).
kernel_diagonal (torch.Tensor) – Diagonal detection kernels, shaped (2, 1, 5, 5), for both diagonals.

Returns:

Boolean tensor shaped (E,), indicating if the game is won (True) in each environment.

Return type:

torch.Tensor
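The kernel shapes above suggest a sliding-window (convolution-style) check: each kernel sums five cells along one direction, and a sum of 5 means five stones in a row. The following plain-Python sketch reproduces that idea on nested lists. It assumes the board is first reduced to one player's stones (1 where that player has a stone, 0 elsewhere) and that boards are at least 5x5; `window_sums`, `has_five`, and `compute_done_sketch` are hypothetical names, and this is not the library's tensor implementation.

```python
def window_sums(board, kernel):
    """Slide `kernel` over `board` without padding and return every window sum."""
    kh, kw = len(kernel), len(kernel[0])
    size = len(board)
    return [
        sum(board[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
        for r in range(size - kh + 1)
        for c in range(size - kw + 1)
    ]


# Kernel layouts mirror the documented shapes (last two dimensions only).
KERNELS = [
    [[1] for _ in range(5)],                                  # 5 rows x 1 column
    [[1] * 5],                                                # 1 row x 5 columns
    [[int(i == j) for j in range(5)] for i in range(5)],      # main diagonal
    [[int(i + j == 4) for j in range(5)] for i in range(5)],  # anti-diagonal
]


def has_five(board, piece):
    """True if `piece` has five stones in a row anywhere on `board`."""
    one_side = [[int(cell == piece) for cell in row] for row in board]
    return any(s >= 5 for k in KERNELS for s in window_sums(one_side, k))


def compute_done_sketch(boards, piece):
    """Return one bool per board, analogous to a tensor shaped (E,)."""
    return [has_five(board, piece) for board in boards]


diag = [[int(r == c and r < 5) for c in range(6)] for r in range(6)]   # five on a diagonal
four = [[int(r == 0 and c < 4) for c in range(6)] for r in range(6)]   # only four in a row
done = compute_done_sketch([diag, four], piece=1)
```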