gomoku_rl.policy package

Submodules

Module contents

gomoku_rl.policy.get_policy(name: str, cfg: DictConfig, action_spec: DiscreteTensorSpec, observation_spec: TensorSpec, device='cuda') → Policy

Looks up the policy class registered under name and returns an instance built from the given configuration, action and observation specifications, and device.

Parameters:
  • name (str) – The name of the policy to retrieve, which should match a key in the Policy registry.

  • cfg (DictConfig) – Configuration settings for the policy, typically containing hyperparameters and other policy-specific settings.

  • action_spec (DiscreteTensorSpec) – The specification of the action space, defining the shape, type, and bounds of actions the policy can take.

  • observation_spec (TensorSpec) – The specification of the observation space, defining the shape and type of observations the policy will receive from the environment.

  • device – The computing device ('cuda' or 'cpu') where the policy computations will be performed. Defaults to 'cuda'.

Returns:

An instance of the requested policy class, initialized with the provided configurations, action and observation specifications, and device.

Return type:

Policy
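
The registry lookup described above can be illustrated with a small standalone sketch. It does not import gomoku_rl and is not the library's implementation: ToyPolicy, its REGISTRY attribute, the lowercase-class-name key, and toy_get_policy are all hypothetical names chosen for illustration, and plain Python values stand in for the DictConfig and tensor specs.

```python
from typing import Any, Dict, Type


class ToyPolicy:
    """Hypothetical stand-in for the Policy base class (not the real one)."""

    # Maps a policy name to the class that implements it.
    REGISTRY: Dict[str, Type["ToyPolicy"]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Assumption: each subclass registers under its lowercased class name.
        ToyPolicy.REGISTRY[cls.__name__.lower()] = cls

    def __init__(self, cfg, action_spec, observation_spec, device="cuda"):
        self.cfg = cfg
        self.action_spec = action_spec
        self.observation_spec = observation_spec
        self.device = device


class PPO(ToyPolicy):
    """Toy subclass; registered automatically as 'ppo'."""


def toy_get_policy(name, cfg, action_spec, observation_spec, device="cuda"):
    """Resolve `name` in the registry and build the matching policy."""
    try:
        cls = ToyPolicy.REGISTRY[name]
    except KeyError:
        known = sorted(ToyPolicy.REGISTRY)
        raise KeyError(f"unknown policy {name!r}; known: {known}") from None
    return cls(cfg, action_spec, observation_spec, device=device)


# Placeholder arguments: a dict for cfg, an int and a tuple for the specs.
policy = toy_get_policy(
    "ppo",
    cfg={"lr": 3e-4},
    action_spec=225,
    observation_spec=(3, 15, 15),
    device="cpu",
)
```

With the real package, the call would presumably take the same shape, passing a DictConfig and the environment's specs in place of the placeholders. Passing device='cpu' explicitly is useful on machines without a GPU, since the default is 'cuda'.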

gomoku_rl.policy.get_pretrained_policy(name: str, cfg: DictConfig, action_spec: DiscreteTensorSpec, observation_spec: TensorSpec, checkpoint_path: str, device='cuda') → Policy

Builds the policy registered under name from the given configuration, action and observation specifications, and device, then loads its pretrained state from checkpoint_path.

Parameters:
  • name (str) – The name of the policy to be loaded, corresponding to a key in the Policy registry.

  • cfg (DictConfig) – Configuration settings for the policy, typically containing hyperparameters and other policy-specific settings.

  • action_spec (DiscreteTensorSpec) – The specification of the action space, detailing the shape, type, and bounds of actions the policy can execute.

  • observation_spec (TensorSpec) – The specification of the observation space, detailing the shape and type of observations the policy will receive from the environment.

  • checkpoint_path (str) – The file path to the saved model checkpoint from which the policy’s state should be loaded.

  • device – The computing device ('cuda' or 'cpu') on which the policy computations will be executed. Defaults to 'cuda'.

Returns:

An instance of the specified policy class, initialized with the provided configurations, action and observation specifications, and pretrained weights loaded from the given checkpoint path.

Return type:

Policy
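
The build-then-load pattern described above can be sketched in a few lines. This is a toy under stated assumptions, not the library's code: ToyPolicy, its weights attribute, and toy_get_pretrained_policy are hypothetical, and a JSON file stands in for the real checkpoint format so the example runs with the standard library alone.

```python
import json
import os
import tempfile


class ToyPolicy:
    """Hypothetical policy whose entire state is two scalar weights."""

    def __init__(self, cfg, device="cuda"):
        self.cfg = cfg
        self.device = device
        self.weights = {"w": 0.0, "b": 0.0}

    def state_dict(self):
        # Return a copy so callers cannot mutate the policy by accident.
        return dict(self.weights)

    def load_state_dict(self, state):
        missing = set(self.weights) - set(state)
        if missing:
            raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")
        self.weights = {key: state[key] for key in self.weights}


def toy_get_pretrained_policy(cfg, checkpoint_path, device="cuda"):
    """Build a fresh policy, then overwrite its state from the checkpoint."""
    policy = ToyPolicy(cfg, device=device)
    with open(checkpoint_path) as f:
        policy.load_state_dict(json.load(f))
    return policy


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_checkpoint.json")

    # Pretend these weights came out of training, then save them.
    trained = ToyPolicy(cfg={}, device="cpu")
    trained.weights = {"w": 1.5, "b": -0.25}
    with open(path, "w") as f:
        json.dump(trained.state_dict(), f)

    # Restore into a new policy object from the saved file.
    restored = toy_get_pretrained_policy({}, path, device="cpu")
```

The point of the sketch is the ordering: the policy is constructed from cfg first, so the configuration must describe the same architecture that produced the checkpoint, otherwise loading the saved state would fail or silently mismatch.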