trw.train.optimizers¶
Module Contents¶
Functions¶
create_scheduler_step_lr – Create a learning rate scheduler. Every step_size epochs, the learning rate will be multiplied by gamma
create_optimizers_fn – Create an optimizer and scheduler
create_adam_optimizers_fn – Create an ADAM optimizer for each of the datasets with an optional scheduler
create_adam_optimizers_scheduler_step_lr_fn – Create an ADAM optimizer for each of the datasets with a step learning rate scheduler
create_sgd_optimizers_fn – Create a stochastic gradient descent optimizer for each of the datasets with an optional scheduler
create_sgd_optimizers_scheduler_step_lr_fn – Create a stochastic gradient descent optimizer for each of the datasets with a step learning rate scheduler
create_sgd_optimizers_scheduler_one_cycle_lr_fn – Create a stochastic gradient descent optimizer for each of the datasets with a one-cycle learning rate scheduler
create_adam_optimizers_scheduler_one_cycle_lr_fn – Create an ADAM optimizer for each of the datasets with a one-cycle learning rate scheduler
- trw.train.optimizers.create_scheduler_step_lr(optimizer, step_size=30, gamma=0.1)¶
Create a learning rate scheduler. Every step_size epochs, the learning rate will be multiplied by gamma
- Parameters
optimizer – the optimizer
step_size – the number of epochs composing one step. At each step, the learning rate will be decreased
gamma – apply this factor to the learning rate every time it is adjusted
- Returns
a learning rate scheduler
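For illustration, a minimal sketch of how such a scheduler is typically driven. The toy model and the once-per-epoch step() call are assumptions about usage, not part of this API:

    import torch
    from trw.train.optimizers import create_scheduler_step_lr

    model = torch.nn.Linear(10, 2)  # toy model, illustration only
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Multiply the learning rate by 0.5 every 10 epochs.
    scheduler = create_scheduler_step_lr(optimizer, step_size=10, gamma=0.5)

    for epoch in range(30):
        # ... one epoch of training with `optimizer` ...
        scheduler.step()  # advance the schedule once per epoch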
- trw.train.optimizers.create_optimizers_fn(datasets, model, optimizer_fn, scheduler_fn=None, per_step_scheduler_fn=None)¶
Create an optimizer and scheduler
Note
if model is an instance of `ModuleDict`, the optimizer will only consider the parameters model[dataset_name].parameters(); otherwise, model.parameters() is used
- Parameters
datasets – a dictionary of datasets
model – the model. Should be a Module or a ModuleDict
optimizer_fn – the functor to instantiate the optimizer
scheduler_fn – the functor to instantiate the scheduler to be run per epoch. May be None, in which case no scheduler is used
per_step_scheduler_fn – the functor to instantiate scheduler to be run per-step (batch)
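As a hedged sketch of wiring up the functors: the partials below, and the assumption that optimizer_fn receives the parameters to optimize, are illustrative rather than a documented recipe:

    import functools
    import torch
    from trw.train.optimizers import create_optimizers_fn, create_scheduler_step_lr

    model = torch.nn.Linear(10, 2)  # placeholder model
    datasets = {'mnist': ...}       # placeholder; trw normally supplies this dictionary

    # Assumed: optimizer_fn is called with the parameters to optimize,
    # so the hyper-parameters are bound up front.
    optimizer_fn = functools.partial(torch.optim.SGD, lr=0.1, momentum=0.9)
    # scheduler_fn receives the optimizer; see create_scheduler_step_lr above.
    scheduler_fn = functools.partial(create_scheduler_step_lr, step_size=30, gamma=0.1)

    optimizers = create_optimizers_fn(
        datasets, model, optimizer_fn=optimizer_fn, scheduler_fn=scheduler_fn)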
- trw.train.optimizers.create_adam_optimizers_fn(datasets, model, learning_rate, weight_decay=0, betas=(0.9, 0.999), eps=1e-08, scheduler_fn=None, per_step_scheduler_fn=None)¶
Create an ADAM optimizer for each of the datasets with an optional scheduler
- Parameters
datasets – a dictionary of datasets
model – a model to optimize
learning_rate – the initial learning rate
weight_decay – the weight decay
scheduler_fn – a functor to instantiate the scheduler, or None
betas – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))
eps – term to add to denominator to avoid division by zero
per_step_scheduler_fn – the functor to instantiate scheduler to be run per-step (batch)
- Returns
An optimizer
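For example, the hyper-parameters can be bound up front so the result becomes a (datasets, model) callable; passing such a callable to the trainer is the usual pattern in trw, though the trainer API itself is outside this page:

    import functools
    from trw.train.optimizers import create_adam_optimizers_fn

    optimizers_fn = functools.partial(
        create_adam_optimizers_fn,
        learning_rate=0.001,
        weight_decay=1e-5)
    # Later: optimizers_fn(datasets, model) builds one ADAM optimizer per dataset.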
- trw.train.optimizers.create_adam_optimizers_scheduler_step_lr_fn(datasets, model, learning_rate, step_size, gamma, weight_decay=0, betas=(0.9, 0.999))¶
Create an ADAM optimizer for each of the datasets with a step learning rate scheduler
- Parameters
datasets – a dictionary of datasets
model – a model to optimize
learning_rate – the initial learning rate
step_size – the number of epochs composing a step. At each step, the learning rate will be multiplied by gamma
gamma – the factor to apply to the learning rate every step
weight_decay – the weight decay
betas – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))
- Returns
An optimizer with a step scheduler
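A direct call, assuming datasets and model placeholders as in the sketches above; step_size and gamma behave as in create_scheduler_step_lr:

    from trw.train.optimizers import create_adam_optimizers_scheduler_step_lr_fn

    # Learning rate starts at 1e-3 and is multiplied by 0.3 every 50 epochs.
    optimizers = create_adam_optimizers_scheduler_step_lr_fn(
        datasets, model, learning_rate=0.001, step_size=50, gamma=0.3)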
- trw.train.optimizers.create_sgd_optimizers_fn(datasets, model, learning_rate, momentum=0.9, weight_decay=0, nesterov=False, scheduler_fn=None, per_step_scheduler_fn=None)¶
Create a stochastic gradient descent optimizer for each of the datasets with an optional scheduler
- Parameters
datasets – a dictionary of datasets
model – a model to optimize
learning_rate – the initial learning rate
scheduler_fn – a functor to instantiate the scheduler, or None
momentum – the momentum of the SGD
weight_decay – the weight decay
nesterov – enables Nesterov momentum
per_step_scheduler_fn – the functor to instantiate scheduler to be run per-step (batch)
- Returns
An optimizer
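A sketch with Nesterov momentum enabled; the hyper-parameter values are illustrative only:

    from trw.train.optimizers import create_sgd_optimizers_fn

    optimizers = create_sgd_optimizers_fn(
        datasets, model,
        learning_rate=0.01,
        momentum=0.9,
        nesterov=True,
        weight_decay=1e-4)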
- trw.train.optimizers.create_sgd_optimizers_scheduler_step_lr_fn(datasets, model, learning_rate, step_size, gamma, weight_decay=0, momentum=0.9, nesterov=False)¶
Create a stochastic gradient descent optimizer for each of the datasets with a step learning rate scheduler
- Parameters
datasets – a dictionary of datasets
model – a model to optimize
learning_rate – the initial learning rate
step_size – the number of epochs composing a step. At each step, the learning rate will be multiplied by gamma
gamma – the factor to apply to the learning rate every step
weight_decay – the weight decay
nesterov – enables Nesterov momentum
momentum – the momentum of the SGD
- Returns
An optimizer with a step scheduler
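To make the step arithmetic concrete (illustrative values, with datasets and model as placeholders):

    from trw.train.optimizers import create_sgd_optimizers_scheduler_step_lr_fn

    # The learning rate is 0.1 for epochs 0-29, 0.01 for epochs 30-59,
    # 0.001 for epochs 60-89, and so on.
    optimizers = create_sgd_optimizers_scheduler_step_lr_fn(
        datasets, model, learning_rate=0.1, step_size=30, gamma=0.1)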
- trw.train.optimizers.create_sgd_optimizers_scheduler_one_cycle_lr_fn(datasets, model, max_learning_rate, epochs, steps_per_epoch, additional_scheduler_kwargs=None, weight_decay=0, learning_rate_start_div_factor=25, learning_rate_end_div_factor=10000, percentage_cycle_increase=0.3, nesterov=False)¶
Create a stochastic gradient descent optimizer for each of the datasets with a one-cycle learning rate scheduler
- Parameters
datasets – a dictionary of datasets
model – a model to optimize
max_learning_rate – the maximum learning rate
epochs – The number of epochs to train for
steps_per_epoch – The number of steps per epoch. If 0 or None, the schedule will be based on the number of epochs only
learning_rate_start_div_factor – defines the initial learning rate for the first step as initial_learning_rate = max_learning_rate / learning_rate_start_div_factor
learning_rate_end_div_factor – defines the end learning rate for the last step as final_learning_rate = max_learning_rate / learning_rate_start_div_factor / learning_rate_end_div_factor
percentage_cycle_increase – The percentage of the cycle (in number of steps) spent increasing the learning rate
additional_scheduler_kwargs – additional arguments provided to the scheduler
weight_decay – the weight decay
nesterov – enables Nesterov momentum
- Returns
An optimizer with a one-cycle scheduler
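To make the div-factor arithmetic concrete (illustrative values, with datasets and model as placeholders):

    from trw.train.optimizers import create_sgd_optimizers_scheduler_one_cycle_lr_fn

    # max_learning_rate=0.1 with the default div factors gives:
    #   initial_learning_rate = 0.1 / 25         = 4e-3
    #   final_learning_rate   = 0.1 / 25 / 10000 = 4e-7
    optimizers = create_sgd_optimizers_scheduler_one_cycle_lr_fn(
        datasets, model,
        max_learning_rate=0.1,
        epochs=100,
        steps_per_epoch=0)  # 0 or None: schedule driven by epochs only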
- trw.train.optimizers.create_adam_optimizers_scheduler_one_cycle_lr_fn(datasets, model, max_learning_rate, epochs, steps_per_epoch, additional_scheduler_kwargs=None, weight_decay=0, betas=(0.9, 0.999), eps=1e-08, learning_rate_start_div_factor=25, learning_rate_end_div_factor=10000, percentage_cycle_increase=0.3)¶
Create an ADAM optimizer for each of the datasets with a one-cycle learning rate scheduler
- Parameters
datasets – a dictionary of datasets
model – a model to optimize
max_learning_rate – the maximum learning rate
epochs – The number of epochs to train for
steps_per_epoch – The number of steps per epoch. If 0 or None, the schedule will be based on the number of epochs only
learning_rate_start_div_factor – defines the initial learning rate for the first step as initial_learning_rate = max_learning_rate / learning_rate_start_div_factor
learning_rate_end_div_factor – defines the end learning rate for the last step as final_learning_rate = max_learning_rate / learning_rate_start_div_factor / learning_rate_end_div_factor
percentage_cycle_increase – The percentage of the cycle (in number of steps) spent increasing the learning rate
additional_scheduler_kwargs – additional arguments provided to the scheduler
weight_decay – the weight decay
betas – betas of the ADAM optimizer
eps – eps of the ADAM optimizer
- Returns
An optimizer with a one-cycle scheduler
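A sketch passing extra arguments through additional_scheduler_kwargs. The 'anneal_strategy' keyword is an assumption borrowed from PyTorch's OneCycleLR, whose parameters this functor's div factors and cycle percentage resemble; the underlying scheduler is not named on this page:

    from trw.train.optimizers import create_adam_optimizers_scheduler_one_cycle_lr_fn

    optimizers = create_adam_optimizers_scheduler_one_cycle_lr_fn(
        datasets, model,
        max_learning_rate=0.001,
        epochs=50,
        steps_per_epoch=0,
        # forwarded to the underlying scheduler; keyword assumed from
        # PyTorch's OneCycleLR, not documented here
        additional_scheduler_kwargs={'anneal_strategy': 'linear'})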