trw.train.optimizers

Module Contents

Functions

create_scheduler_step_lr(optimizer, step_size=30, gamma=0.1)

Create a learning rate scheduler. Every step_size epochs, the learning rate is multiplied by gamma

create_optimizers_fn(datasets, model, optimizer_fn, scheduler_fn=None, per_step_scheduler_fn=None)

Create an optimizer and scheduler for each dataset

create_adam_optimizers_fn(datasets, model, learning_rate, weight_decay=0, betas=(0.9, 0.999), eps=1e-08, scheduler_fn=None, per_step_scheduler_fn=None)

Create an ADAM optimizer for each of the datasets, with an optional scheduler

create_adam_optimizers_scheduler_step_lr_fn(datasets, model, learning_rate, step_size, gamma, weight_decay=0, betas=(0.9, 0.999))

Create an ADAM optimizer for each of the datasets, with a step learning rate scheduler

create_sgd_optimizers_fn(datasets, model, learning_rate, momentum=0.9, weight_decay=0, nesterov=False, scheduler_fn=None, per_step_scheduler_fn=None)

Create a stochastic gradient descent (SGD) optimizer for each of the datasets, with an optional scheduler

create_sgd_optimizers_scheduler_step_lr_fn(datasets, model, learning_rate, step_size, gamma, weight_decay=0, momentum=0.9, nesterov=False)

Create a stochastic gradient descent (SGD) optimizer for each of the datasets, with a step learning rate scheduler

create_sgd_optimizers_scheduler_one_cycle_lr_fn(datasets, model, max_learning_rate, epochs, steps_per_epoch, additional_scheduler_kwargs=None, weight_decay=0, learning_rate_start_div_factor=25, learning_rate_end_div_factor=10000, percentage_cycle_increase=0.3, nesterov=False)

Create a stochastic gradient descent (SGD) optimizer for each of the datasets, with a one-cycle learning rate scheduler

create_adam_optimizers_scheduler_one_cycle_lr_fn(datasets, model, max_learning_rate, epochs, steps_per_epoch, additional_scheduler_kwargs=None, weight_decay=0, betas=(0.9, 0.999), eps=1e-08, learning_rate_start_div_factor=25, learning_rate_end_div_factor=10000, percentage_cycle_increase=0.3)

Create an ADAM optimizer for each of the datasets, with a one-cycle learning rate scheduler

trw.train.optimizers.create_scheduler_step_lr(optimizer, step_size=30, gamma=0.1)

Create a learning rate scheduler. Every step_size epochs, the learning rate is multiplied by gamma

Parameters
  • optimizer – the optimizer

  • step_size – the number of epochs composing one step. At the end of each step, the learning rate is decreased by the factor gamma

  • gamma – apply this factor to the learning rate every time it is adjusted

Returns

a learning rate scheduler
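
Example (a minimal sketch; the toy model and the hyper-parameter values are illustrative assumptions, not recommendations):

    import torch
    from trw.train.optimizers import create_scheduler_step_lr

    model = torch.nn.Linear(4, 2)                              # toy model for illustration only
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # multiply the learning rate by 0.5 every 10 epochs
    scheduler = create_scheduler_step_lr(optimizer, step_size=10, gamma=0.5)

    for epoch in range(30):
        # ... run one epoch of training here ...
        scheduler.step()                                       # advance the epoch-based schedule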

trw.train.optimizers.create_optimizers_fn(datasets, model, optimizer_fn, scheduler_fn=None, per_step_scheduler_fn=None)

Create an optimizer and scheduler for each dataset

Note

If model is an instance of `ModuleDict`, the optimizer will only consider the parameters model[dataset_name].parameters(); otherwise it will use model.parameters()

Parameters
  • datasets – a dictionary of datasets

  • model – the model. Should be a Module or a ModuleDict

  • optimizer_fn – the functor to instantiate the optimizer

  • scheduler_fn – the functor to instantiate the scheduler to be run per epoch. May be None, in which case no scheduler is used

  • per_step_scheduler_fn – the functor to instantiate the scheduler to be run per step (batch)
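
Example (a sketch of the intended composition, assuming the optimizer functor is called with the model parameters as the note above suggests; the Adam settings are illustrative):

    import functools
    import torch
    from trw.train.optimizers import create_optimizers_fn, create_scheduler_step_lr

    # functor building the optimizer from a parameter iterable
    optimizer_fn = functools.partial(torch.optim.Adam, lr=1e-3, weight_decay=1e-5)

    # functor wrapping an optimizer in an epoch-based step scheduler
    scheduler_fn = functools.partial(create_scheduler_step_lr, step_size=20, gamma=0.3)

    # bind everything except (datasets, model); the training code supplies those later
    optimizers_fn = functools.partial(
        create_optimizers_fn,
        optimizer_fn=optimizer_fn,
        scheduler_fn=scheduler_fn,
    )

The resulting optimizers_fn callable is presumably invoked by the surrounding training code with (datasets, model), matching the signature above.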

trw.train.optimizers.create_adam_optimizers_fn(datasets, model, learning_rate, weight_decay=0, betas=(0.9, 0.999), eps=1e-08, scheduler_fn=None, per_step_scheduler_fn=None)

Create an ADAM optimizer for each of the datasets, with an optional scheduler

Parameters
  • datasets – a dictionary of datasets

  • model – a model to optimize

  • learning_rate – the initial learning rate

  • weight_decay – the weight decay

  • scheduler_fn – a functor to instantiate a scheduler run per epoch, or None

  • betas – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps – term to add to denominator to avoid division by zero

  • per_step_scheduler_fn – the functor to instantiate the scheduler to be run per step (batch)

Returns

An optimizer
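
Example (a sketch; the hyper-parameter values are illustrative assumptions):

    import functools
    from trw.train.optimizers import create_adam_optimizers_fn

    # bind the ADAM hyper-parameters now; the caller later supplies (datasets, model)
    optimizers_fn = functools.partial(
        create_adam_optimizers_fn,
        learning_rate=1e-3,
        weight_decay=1e-5,
        betas=(0.9, 0.999),
    )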

trw.train.optimizers.create_adam_optimizers_scheduler_step_lr_fn(datasets, model, learning_rate, step_size, gamma, weight_decay=0, betas=(0.9, 0.999))

Create an ADAM optimizer for each of the datasets, with a step learning rate scheduler

Parameters
  • datasets – a dictionary of datasets

  • model – a model to optimize

  • learning_rate – the initial learning rate

  • step_size – the number of epochs composing a step. At the end of each step, the learning rate is multiplied by gamma

  • gamma – the factor to apply to the learning rate every step

  • weight_decay – the weight decay

  • betas – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

Returns

An optimizer with a step scheduler
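
Example (a sketch; the values are illustrative assumptions):

    import functools
    from trw.train.optimizers import create_adam_optimizers_scheduler_step_lr_fn

    # ADAM whose learning rate is divided by 10 every 30 epochs
    optimizers_fn = functools.partial(
        create_adam_optimizers_scheduler_step_lr_fn,
        learning_rate=1e-3,
        step_size=30,
        gamma=0.1,
    )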

trw.train.optimizers.create_sgd_optimizers_fn(datasets, model, learning_rate, momentum=0.9, weight_decay=0, nesterov=False, scheduler_fn=None, per_step_scheduler_fn=None)

Create a stochastic gradient descent (SGD) optimizer for each of the datasets, with an optional scheduler

Parameters
  • datasets – a dictionary of datasets

  • model – a model to optimize

  • learning_rate – the initial learning rate

  • scheduler_fn – a functor to instantiate a scheduler run per epoch, or None

  • momentum – the momentum of the SGD

  • weight_decay – the weight decay

  • nesterov – enables Nesterov momentum

  • per_step_scheduler_fn – the functor to instantiate the scheduler to be run per step (batch)

Returns

An optimizer
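
Example (a sketch; the values are illustrative assumptions):

    import functools
    from trw.train.optimizers import create_sgd_optimizers_fn

    # SGD with Nesterov momentum and no scheduler
    optimizers_fn = functools.partial(
        create_sgd_optimizers_fn,
        learning_rate=0.01,
        momentum=0.9,
        weight_decay=1e-4,
        nesterov=True,
    )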

trw.train.optimizers.create_sgd_optimizers_scheduler_step_lr_fn(datasets, model, learning_rate, step_size, gamma, weight_decay=0, momentum=0.9, nesterov=False)

Create a stochastic gradient descent (SGD) optimizer for each of the datasets, with a step learning rate scheduler

Parameters
  • datasets – a dictionary of datasets

  • model – a model to optimize

  • learning_rate – the initial learning rate

  • step_size – the number of epochs composing a step. At the end of each step, the learning rate is multiplied by gamma

  • gamma – the factor to apply to the learning rate every step

  • weight_decay – the weight decay

  • nesterov – enables Nesterov momentum

  • momentum – the momentum of the SGD

Returns

An optimizer with a step scheduler
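
Example (a sketch; the values are illustrative assumptions):

    import functools
    from trw.train.optimizers import create_sgd_optimizers_scheduler_step_lr_fn

    # SGD whose learning rate is multiplied by 0.2 every 25 epochs
    optimizers_fn = functools.partial(
        create_sgd_optimizers_scheduler_step_lr_fn,
        learning_rate=0.1,
        step_size=25,
        gamma=0.2,
    )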

trw.train.optimizers.create_sgd_optimizers_scheduler_one_cycle_lr_fn(datasets, model, max_learning_rate, epochs, steps_per_epoch, additional_scheduler_kwargs=None, weight_decay=0, learning_rate_start_div_factor=25, learning_rate_end_div_factor=10000, percentage_cycle_increase=0.3, nesterov=False)

Create a stochastic gradient descent (SGD) optimizer for each of the datasets, with a one-cycle learning rate scheduler

Parameters
  • datasets – a dictionary of datasets

  • model – a model to optimize

  • max_learning_rate – the maximum learning rate

  • epochs – The number of epochs to train for

  • steps_per_epoch – The number of steps per epoch. If 0 or None, the schedule will be based on the number of epochs only

  • learning_rate_start_div_factor – defines the initial learning rate for the first step as initial_learning_rate = max_learning_rate / learning_rate_start_div_factor

  • learning_rate_end_div_factor – defines the end learning rate for the last step as final_learning_rate = max_learning_rate / learning_rate_start_div_factor / learning_rate_end_div_factor

  • percentage_cycle_increase – The percentage of the cycle (in number of steps) spent increasing the learning rate

  • additional_scheduler_kwargs – additional arguments provided to the scheduler

  • weight_decay – the weight decay

  • nesterov – enables Nesterov momentum

  • momentum – the momentum of the SGD

Returns

An optimizer with a step scheduler
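
Example (a sketch; the epoch/step counts and learning rate are illustrative assumptions):

    import functools
    from trw.train.optimizers import create_sgd_optimizers_scheduler_one_cycle_lr_fn

    # one-cycle schedule over 50 epochs of 200 steps each: the learning rate ramps up
    # to 0.1 during the first 30% of the cycle, then anneals back down
    optimizers_fn = functools.partial(
        create_sgd_optimizers_scheduler_one_cycle_lr_fn,
        max_learning_rate=0.1,
        epochs=50,
        steps_per_epoch=200,
        percentage_cycle_increase=0.3,
    )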

trw.train.optimizers.create_adam_optimizers_scheduler_one_cycle_lr_fn(datasets, model, max_learning_rate, epochs, steps_per_epoch, additional_scheduler_kwargs=None, weight_decay=0, betas=(0.9, 0.999), eps=1e-08, learning_rate_start_div_factor=25, learning_rate_end_div_factor=10000, percentage_cycle_increase=0.3)

Create an ADAM optimizer for each of the datasets, with a one-cycle learning rate scheduler

Parameters
  • datasets – a dictionary of datasets

  • model – a model to optimize

  • max_learning_rate – the maximum learning rate

  • epochs – The number of epochs to train for

  • steps_per_epoch – The number of steps per epoch. If 0 or None, the schedule will be based on the number of epochs only

  • learning_rate_start_div_factor – defines the initial learning rate for the first step as initial_learning_rate = max_learning_rate / learning_rate_start_div_factor

  • learning_rate_end_div_factor – defines the end learning rate for the last step as final_learning_rate = max_learning_rate / learning_rate_start_div_factor / learning_rate_end_div_factor

  • percentage_cycle_increase – The percentage of the cycle (in number of steps) spent increasing the learning rate

  • additional_scheduler_kwargs – additional arguments provided to the scheduler

  • weight_decay – the weight decay

  • betas – the betas of the ADAM optimizer

  • eps – the eps of the ADAM optimizer

Returns

An optimizer with a step scheduler
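
Example (a sketch; the values are illustrative, and the anneal_strategy key assumes the underlying scheduler is PyTorch's OneCycleLR, which the parameter names suggest but the documentation does not state):

    import functools
    from trw.train.optimizers import create_adam_optimizers_scheduler_one_cycle_lr_fn

    # ADAM with a one-cycle schedule; extra keyword arguments are forwarded to the
    # underlying scheduler through additional_scheduler_kwargs
    optimizers_fn = functools.partial(
        create_adam_optimizers_scheduler_one_cycle_lr_fn,
        max_learning_rate=3e-4,
        epochs=20,
        steps_per_epoch=100,
        additional_scheduler_kwargs={'anneal_strategy': 'cos'},  # assumption: OneCycleLR kwarg
    )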