gomoku_rl.utils.psro module
- class gomoku_rl.utils.psro.ConvergedIndicator(max_size: int = 15, mean_threshold: float = 0.99, std_threshold: float = 0.005, min_iter_steps: int = 40, max_iter_steps: int = 300)[source]
Bases: object
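ConvergedIndicator decides when the learner has beaten its current opponent consistently enough to end the PSRO iteration. A minimal sketch of the idea, assuming it tracks a rolling window of win rates and applies the mean/std thresholds above (the class and method names below are illustrative, not the documented API):

```python
from collections import deque
import statistics


class SimpleConvergedIndicator:
    """Illustrative re-implementation: stop the current PSRO iteration once
    recent win rates are consistently high, or after too many steps."""

    def __init__(self, max_size=15, mean_threshold=0.99, std_threshold=0.005,
                 min_iter_steps=40, max_iter_steps=300):
        self.win_rates = deque(maxlen=max_size)
        self.mean_threshold = mean_threshold
        self.std_threshold = std_threshold
        self.min_iter_steps = min_iter_steps
        self.max_iter_steps = max_iter_steps
        self.steps = 0

    def update(self, win_rate: float) -> None:
        self.win_rates.append(win_rate)
        self.steps += 1

    def converged(self) -> bool:
        if self.steps < self.min_iter_steps:
            return False
        if self.steps >= self.max_iter_steps:
            return True  # stop waiting and move on to the next iteration
        if len(self.win_rates) < self.win_rates.maxlen:
            return False
        mean = statistics.fmean(self.win_rates)
        std = statistics.pstdev(self.win_rates)
        return mean >= self.mean_threshold and std <= self.std_threshold
```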
- class gomoku_rl.utils.psro.PSROPolicyWrapper(policy: Policy, population: Population)[source]
Bases: object
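PSROPolicyWrapper pairs a trainable policy with a Population of frozen opponents. A conceptual sketch, with hypothetical names, of how such a wrapper might switch between acting with the learner and acting with a past policy sampled from the current meta-strategy:

```python
import numpy as np


class PolicyWithPopulation:
    """Conceptual sketch (names are hypothetical, not the documented API):
    act with the trainable policy during its own updates, or with a frozen
    opponent drawn from the population according to the meta-strategy."""

    def __init__(self, policy, population, meta_strategy=None):
        self.policy = policy                # the policy being trained
        self.population = population        # list-like collection of frozen policies
        self.meta_strategy = meta_strategy  # mixture weights over the population
        self._opponent = None

    def set_training(self, training: bool) -> None:
        if training:
            self._opponent = None
        else:
            # sample one past policy to use for the next batch of games
            idx = np.random.choice(len(self.population), p=self.meta_strategy)
            self._opponent = self.population[idx]

    def __call__(self, tensordict):
        actor = self.policy if self._opponent is None else self._opponent
        return actor(tensordict)
```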
- class gomoku_rl.utils.psro.PayoffType(value)[source]
Bases: Enum
An enumeration.
- black_vs_white = 1
- both = 2
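The two members presumably control how pairwise payoffs are estimated: with fixed colors (the first policy always plays Black) or averaged over both color assignments. A hedged sketch of that distinction; `play_games` and the local `Mode` enum are illustrative assumptions, not part of the module:

```python
from enum import Enum


class Mode(Enum):
    black_vs_white = 1
    both = 2


def estimate_payoff(play_games, policy_i, policy_j, mode: Mode, n_games: int = 128) -> float:
    """Hypothetical helper: ``play_games(black, white, n)`` is assumed to return
    Black's average score over ``n`` games (+1 win, 0 draw, -1 loss)."""
    if mode is Mode.black_vs_white:
        # policy_i always takes Black, so the estimate keeps the first-move advantage
        return play_games(policy_i, policy_j, n_games)
    # Mode.both: average over both color assignments to cancel the first-move advantage
    as_black = play_games(policy_i, policy_j, n_games)
    as_white = -play_games(policy_j, policy_i, n_games)
    return 0.5 * (as_black + as_white)
```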
- class gomoku_rl.utils.psro.Population(dir: str, initial_policy: Callable[[TensorDict], TensorDict] | list[Callable[[TensorDict], TensorDict]] = uniform_policy, device: torch.device | str | int | None = 'cuda')[source]
Bases: object
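Population appears to manage a directory of frozen policy checkpoints accumulated during PSRO. An illustrative sketch, assuming checkpoints are stored as state dicts in `dir` and rebuilt on demand (the method names are hypothetical):

```python
import os
import torch


class SimplePopulation:
    """Illustrative sketch (not the documented API): keep frozen policy
    checkpoints on disk and reload them lazily for evaluation games."""

    def __init__(self, dir: str, make_actor, device="cuda"):
        self.dir = dir
        self.make_actor = make_actor  # factory that builds an untrained actor module
        self.device = device
        os.makedirs(dir, exist_ok=True)
        self.checkpoints: list[str] = []

    def add(self, actor) -> None:
        path = os.path.join(self.dir, f"policy_{len(self.checkpoints)}.pt")
        torch.save(actor.state_dict(), path)
        self.checkpoints.append(path)

    def __len__(self) -> int:
        return len(self.checkpoints)

    def __getitem__(self, index: int):
        actor = self.make_actor().to(self.device)
        actor.load_state_dict(torch.load(self.checkpoints[index], map_location=self.device))
        actor.eval()
        return actor
```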
- gomoku_rl.utils.psro.get_meta_solver(name: str) → Callable[[ndarray], tuple[ndarray, ndarray]][source]
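A meta-solver maps the current payoff matrix to mixture weights over the two populations. Below is a sketch of two common choices, a uniform solver and an approximate Nash solver via fictitious play for zero-sum payoffs; the solvers actually registered under `get_meta_solver` may differ:

```python
import numpy as np


def uniform_solver(payoffs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Uniform mixture over both populations (a simple self-play-like baseline)."""
    m, n = payoffs.shape
    return np.full(m, 1.0 / m), np.full(n, 1.0 / n)


def fictitious_play_solver(payoffs: np.ndarray, iters: int = 2000) -> tuple[np.ndarray, np.ndarray]:
    """Approximate Nash meta-strategies for a zero-sum payoff matrix
    (entries are the row player's expected score) via fictitious play."""
    m, n = payoffs.shape
    row_counts = np.ones(m)
    col_counts = np.ones(n)
    for _ in range(iters):
        # row player best-responds to the column player's empirical mixture
        col_strategy = col_counts / col_counts.sum()
        row_counts[np.argmax(payoffs @ col_strategy)] += 1
        # column player best-responds by minimizing the row player's payoff
        row_strategy = row_counts / row_counts.sum()
        col_counts[np.argmin(row_strategy @ payoffs)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()
```

A typical use would be `meta_0, meta_1 = fictitious_play_solver(payoffs)`, with the resulting mixtures used to sample opponents for the next best-response step.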
- gomoku_rl.utils.psro.get_new_payoffs(env, population_0: Population, population_1: Population, old_payoffs: ndarray | None)[source]
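In PSRO, only the games involving newly added policies need to be played; the rest of the matrix is carried over from `old_payoffs`. A hypothetical sketch of that update, where `evaluate(i, j)` is an assumed callable returning the expected score of `population_0[i]` against `population_1[j]`:

```python
import numpy as np


def expand_payoffs(evaluate, population_0, population_1, old_payoffs):
    """Hypothetical sketch of the PSRO payoff update: copy the old matrix
    and fill in only the new row(s) and column(s)."""
    m, n = len(population_0), len(population_1)
    payoffs = np.zeros((m, n))
    if old_payoffs is not None:
        payoffs[: old_payoffs.shape[0], : old_payoffs.shape[1]] = old_payoffs
        start_row, start_col = old_payoffs.shape
    else:
        start_row, start_col = 0, 0
    for i in range(m):
        for j in range(n):
            if i >= start_row or j >= start_col:
                payoffs[i, j] = evaluate(i, j)
    return payoffs
```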
- gomoku_rl.utils.psro.get_new_payoffs_sp(env, population: Population, old_payoffs: ndarray | None, type: PayoffType = PayoffType.both)[source]
- gomoku_rl.utils.psro.init_payoffs_sp(env, population: Population, type: PayoffType)[source]
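The self-play variants maintain a single square payoff matrix for one population playing against itself. A sketch under the assumption that entries are color-averaged scores, so the matrix is antisymmetric with a zero diagonal (`evaluate(i, j)` is again an assumed callable):

```python
import numpy as np


def init_self_play_payoffs(evaluate, population_size: int) -> np.ndarray:
    """Hypothetical sketch: build the full square payoff matrix for one
    population; evaluate(i, j) returns policy i's expected score against j."""
    payoffs = np.zeros((population_size, population_size))
    for i in range(population_size):
        for j in range(i + 1, population_size):
            payoffs[i, j] = evaluate(i, j)
            payoffs[j, i] = -payoffs[i, j]  # zero-sum, color-averaged
    return payoffs


def expand_self_play_payoffs(evaluate, old_payoffs, population_size: int) -> np.ndarray:
    """Add rows/columns for newly appended policies; only their games are played."""
    if old_payoffs is None:
        return init_self_play_payoffs(evaluate, population_size)
    payoffs = np.zeros((population_size, population_size))
    k = old_payoffs.shape[0]
    payoffs[:k, :k] = old_payoffs
    for i in range(k, population_size):
        for j in range(population_size):
            if i == j:
                continue
            payoffs[i, j] = evaluate(i, j)
            payoffs[j, i] = -payoffs[i, j]
    return payoffs
```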