trw.train.optimizers_v2

Module Contents

Classes

CosineAnnealingWarmRestartsDecayed

Scheduler based on torch.optim.lr_scheduler.CosineAnnealingWarmRestarts. In addition, every time the learning rate is restarted, the base learning rate is decayed by decay_factor.

Optimizer

OptimizerSGD

OptimizerAdam

OptimizerAdamW

Attributes

SchedulerType

StepSchedulerType

trw.train.optimizers_v2.SchedulerType
trw.train.optimizers_v2.StepSchedulerType
class trw.train.optimizers_v2.CosineAnnealingWarmRestartsDecayed(optimizer: torch.optim.Optimizer, T_0: int, T_mult: int = 1, eta_min: float = 0, last_epoch: int = -1, decay_factor: float = 0.7)

Bases: torch.optim.lr_scheduler.CosineAnnealingWarmRestarts

Scheduler based on torch.optim.lr_scheduler.CosineAnnealingWarmRestarts. In addition, every time the learning rate is restarted, the base learning rate is decayed by decay_factor

step(self, epoch=None)

step() can be called after every batch update.

Example

>>> scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
>>> iters = len(dataloader)
>>> for epoch in range(20):
...     for i, sample in enumerate(dataloader):
...         inputs, labels = sample['inputs'], sample['labels']
...         optimizer.zero_grad()
...         outputs = net(inputs)
...         loss = criterion(outputs, labels)
...         loss.backward()
...         optimizer.step()
...         scheduler.step(epoch + i / iters)

This function can be called in an interleaved way.

Example

>>> scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
>>> for epoch in range(20):
...     scheduler.step()
>>> scheduler.step(26)
>>> scheduler.step()  # equivalent to scheduler.step(27), instead of scheduler.step(20)
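
The decayed variant is constructed the same way; a minimal sketch, assuming an existing torch.nn.Module net and a plain torch.optim.SGD optimizer (both placeholders):

>>> import torch
>>> optimizer = torch.optim.SGD(net.parameters(), lr=0.1)  # `net` is a placeholder model
>>> scheduler = CosineAnnealingWarmRestartsDecayed(optimizer, T_0=10, T_mult=2, decay_factor=0.7)
>>> for epoch in range(100):
...     # train for one epoch here, then step the scheduler; the base
...     # learning rate is multiplied by decay_factor at each restart
...     scheduler.step()
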
class trw.train.optimizers_v2.Optimizer(optimizer_fn: Callable[[Iterator[torch.nn.parameter.Parameter]], torch.optim.Optimizer], scheduler_fn: Optional[Callable[[torch.optim.Optimizer], SchedulerType]] = None, step_scheduler_fn: Optional[Callable[[torch.optim.Optimizer], StepSchedulerType]] = None)
set_scheduler_fn(self, scheduler_fn: Optional[Callable[[torch.optim.Optimizer], SchedulerType]])
set_step_scheduler_fn(self, step_scheduler_fn: Optional[Callable[[torch.optim.Optimizer], StepSchedulerType]])
__call__(self, datasets: trw.basic_typing.Datasets, model: torch.nn.Module) → Tuple[Dict[str, torch.optim.Optimizer], Optional[Dict[str, SchedulerType]], Optional[Dict[str, StepSchedulerType]]]
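
A minimal sketch of the callable-based construction, assuming plain torch optimizers and schedulers; datasets and model are placeholders for the arguments of __call__:

>>> import torch
>>> opt = Optimizer(optimizer_fn=lambda params: torch.optim.SGD(params, lr=0.1, momentum=0.9))
>>> opt.set_scheduler_fn(lambda o: torch.optim.lr_scheduler.StepLR(o, step_size=30, gamma=0.1))
>>> optimizers, schedulers, step_schedulers = opt(datasets, model)  # per-dataset dictionaries
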
scheduler_step_lr(self, step_size: int, gamma: float = 0.1) → Optimizer

Apply a scheduler on the learning rate.

Decays the learning rate of each parameter group by gamma every step_size epochs.
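
Since the method returns the Optimizer itself, it can be chained; a minimal sketch using one of the concrete optimizers documented below:

>>> optimizer = OptimizerSGD(learning_rate=0.1).scheduler_step_lr(step_size=30, gamma=0.1)
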

scheduler_cosine_annealing_warm_restart(self, T_0: int, T_mult: int = 1, eta_min: float = 0, last_epoch=-1) → Optimizer

Apply a scheduler on the learning rate.

Restart the learning rate every T_0 * (T_mult)^(#restart) epochs.

References

https://arxiv.org/pdf/1608.03983v5.pdf
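
A minimal sketch, chained in the same fluent style:

>>> # restart after 10 epochs, then after a further 20, 40, ... epochs
>>> optimizer = OptimizerAdam(learning_rate=1e-3).scheduler_cosine_annealing_warm_restart(T_0=10, T_mult=2)
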

scheduler_cosine_annealing_warm_restart_decayed(self, T_0: int, T_mult: int = 1, eta_min: float = 0, last_epoch=-1, decay_factor=0.7) → Optimizer

Apply a scheduler on the learning rate. Each time the learning rate is restarted, the base learning rate is decayed by decay_factor.

Restart the learning rate every T_0 * (T_mult)^(#restart) epochs.

References

https://arxiv.org/pdf/1608.03983v5.pdf
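
As above, but the base learning rate is multiplied by decay_factor at every restart; a minimal sketch:

>>> optimizer = OptimizerAdam(learning_rate=1e-3)
>>> optimizer = optimizer.scheduler_cosine_annealing_warm_restart_decayed(T_0=10, T_mult=2, decay_factor=0.7)
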

scheduler_one_cycle(self, max_learning_rate: float, epochs: int, steps_per_epoch: int, learning_rate_start_div_factor: float = 25.0, learning_rate_end_div_factor: float = 10000.0, percentage_cycle_increase: float = 0.3, anneal_strategy: str = 'cos', cycle_momentum: bool = True, base_momentum: float = 0.85, max_momentum: float = 0.95)

This scheduler should not be used with another scheduler!

The learning rate or momentum provided by the Optimizer will be overridden by this scheduler.
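
A minimal sketch, assuming 50 training epochs and a dataloader whose length gives the number of optimizer steps per epoch (dataloader is a placeholder):

>>> optimizer = OptimizerAdamW(learning_rate=1e-3)
>>> optimizer.scheduler_one_cycle(
...     max_learning_rate=1e-2,
...     epochs=50,
...     steps_per_epoch=len(dataloader))
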

clip_gradient_norm(self, max_norm: float = 1.0, norm_type: float = 2.0)

Clips the gradient norm during optimization

Parameters
  • max_norm – the maximum norm of the concatenated gradients of the optimizer. Note: the gradient is modulated by the learning rate

  • norm_type – type of the p-norm to use. Can be 'inf' for the infinity norm

See:

torch.nn.utils.clip_grad_norm_()
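
A minimal sketch, clipping the concatenated gradient to a unit L2 norm (the defaults shown explicitly):

>>> optimizer = OptimizerSGD(learning_rate=0.1, momentum=0.9)
>>> optimizer.clip_gradient_norm(max_norm=1.0, norm_type=2.0)
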

class trw.train.optimizers_v2.OptimizerSGD(learning_rate: float, momentum: float = 0.9, weight_decay: float = 0, nesterov: bool = False)

Bases: Optimizer

class trw.train.optimizers_v2.OptimizerAdam(learning_rate: float, weight_decay: float = 0, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08)

Bases: Optimizer

class trw.train.optimizers_v2.OptimizerAdamW(learning_rate: float, weight_decay: float = 0.01, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08)

Bases: Optimizer
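
Minimal construction sketches for the three concrete optimizers, using only the hyper-parameters shown in the signatures above; the chained scheduler on the last line is optional:

>>> sgd = OptimizerSGD(learning_rate=0.1, momentum=0.9, nesterov=True)
>>> adam = OptimizerAdam(learning_rate=1e-3, betas=(0.9, 0.999), eps=1e-08)
>>> adamw = OptimizerAdamW(learning_rate=1e-3, weight_decay=0.01).scheduler_step_lr(step_size=30)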