trw.train.sequence_async_reservoir

Module Contents

Classes

Performance

SequenceAsyncReservoir

This sequence will asynchronously process data and keep a reserve of loaded samples

SequenceAsyncReservoirIterator

Iterate through the SequenceAsyncReservoir sequence

class trw.train.sequence_async_reservoir.Performance
add(self, time_elapsed)
get_average_time(self)
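
The page gives no description for Performance beyond its two method names. As a rough sketch only, assuming that add accumulates elapsed times and get_average_time returns their mean, such a helper could look like the following. The class name AveragePerformance and its attributes are hypothetical and are not taken from trw.

```python
class AveragePerformance:
    """Hypothetical stand-in for Performance: tracks the mean of elapsed times."""

    def __init__(self):
        self.total_time = 0.0
        self.nb_samples = 0

    def add(self, time_elapsed):
        # Accumulate one timing measurement (e.g., in seconds).
        self.total_time += time_elapsed
        self.nb_samples += 1

    def get_average_time(self):
        # Return 0.0 when nothing has been recorded, to avoid dividing by zero.
        if self.nb_samples == 0:
            return 0.0
        return self.total_time / self.nb_samples


perf = AveragePerformance()
for elapsed in (0.5, 1.5, 1.0):
    perf.add(elapsed)
average = perf.get_average_time()
```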
class trw.train.sequence_async_reservoir.SequenceAsyncReservoir(source_split, max_reservoir_samples, function_to_run, *, min_reservoir_samples=1, nb_workers=1, max_jobs_at_once=None, reservoir_sampler=None, collate_fn=sequence.remove_nested_list, maximum_number_of_samples_per_epoch=None, max_reservoir_replacement_size=None)

Bases: trw.train.sequence.Sequence

This sequence will asynchronously process data and keep a reserve of loaded samples

The idea is to let long loading processes run in the background while using the data that is already loaded as efficiently as possible. Over time, the loaded data is gradually replaced by freshly loaded data.

Jobs are started and results are retrieved at the beginning of each epoch.

This sequence can be interrupted (e.g., after a certain number of batches have been returned). When the sequence is restarted, the reservoir will not be emptied.
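
The behaviour described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the trw implementation: ToyAsyncReservoir, slow_load and all attribute names are hypothetical, threads are used instead of worker processes to keep the example short, and the oldest sample is evicted when the reservoir is full (the actual replacement policy is not documented here).

```python
import collections
import time
from concurrent.futures import ThreadPoolExecutor


class ToyAsyncReservoir:
    """Toy illustration only; not the trw implementation."""

    def __init__(self, source, load_fn, max_reservoir_samples,
                 min_reservoir_samples=1, nb_workers=1):
        self.source = list(source)
        self.load_fn = load_fn
        self.min_reservoir_samples = min_reservoir_samples
        # A bounded deque drops its oldest sample when a new one is appended.
        self.reservoir = collections.deque(maxlen=max_reservoir_samples)
        self.pool = ThreadPoolExecutor(max_workers=nb_workers)
        self.pending = []
        self.next_index = 0

    def _submit_jobs(self, nb_jobs):
        # Schedule background loading, cycling through the source items.
        for _ in range(nb_jobs):
            item = self.source[self.next_index % len(self.source)]
            self.next_index += 1
            self.pending.append(self.pool.submit(self.load_fn, item))

    def _retrieve_results(self):
        still_pending = []
        for future in self.pending:
            need_more = len(self.reservoir) < self.min_reservoir_samples
            if future.done() or need_more:
                # Blocks only while the reservoir is below its minimum size.
                self.reservoir.append(future.result())
            else:
                still_pending.append(future)
        self.pending = still_pending

    def __iter__(self):
        # Jobs are started and results retrieved at the start of each epoch.
        self._submit_jobs(self.reservoir.maxlen)
        self._retrieve_results()
        return iter(list(self.reservoir))

    def close(self):
        # Wait for outstanding jobs and release the worker threads.
        self.pool.shutdown(wait=True)


def slow_load(x):
    time.sleep(0.01)  # simulate an expensive loading step
    return x * x


seq = ToyAsyncReservoir(range(10), slow_load, max_reservoir_samples=4,
                        min_reservoir_samples=2, nb_workers=2)
first_epoch = list(seq)
second_epoch = list(seq)  # the reservoir is kept between epochs
seq.close()
```

In this toy model the second epoch starts from the samples already held, so it never yields fewer samples than the first one, and the reservoir never grows beyond max_reservoir_samples.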

subsample(self, nb_samples)

Sub-sample a sequence to a fixed number of samples.

The purpose is to obtain a smaller sequence. This is particularly useful for exporting augmentations and samples.

Parameters

nb_samples – the number of samples to keep from the original sequence

Returns

a subsampled Sequence
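
As an illustration of fixed-size sub-sampling, the sketch below works on a plain dictionary of lists rather than on a trw Sequence. The helper subsample_batch, its seed parameter and the choice of random sampling without replacement are assumptions made for this example; the page does not say how the samples are selected.

```python
import random


def subsample_batch(batch, nb_samples, seed=0):
    """Hypothetical helper: keep nb_samples samples of a dict-of-lists batch."""
    nb_total = len(next(iter(batch.values())))
    nb_kept = min(nb_samples, nb_total)
    rng = random.Random(seed)
    # Sorted indices keep the original relative order of the kept samples.
    indices = sorted(rng.sample(range(nb_total), nb_kept))
    return {name: [values[i] for i in indices] for name, values in batch.items()}


batch = {'uid': [10, 11, 12, 13, 14], 'value': ['a', 'b', 'c', 'd', 'e']}
small = subsample_batch(batch, nb_samples=3)
```

The same indices are applied to every feature, so each kept uid stays paired with its value.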

reservoir_size(self)
Returns

The current number of samples in the reservoir

subsample_uids(self, uids, uids_name, new_sampler=None)

Sub-sample a sequence to samples with specified UIDs.

Parameters
  • uids (list) – the UIDs of the samples to keep. If new_sampler preserves the ordering, the samples of the resampled sequence should follow the ordering of uids

  • uids_name (str) – the name of the UIDs

  • new_sampler (Sampler) – the sampler to be used for the subsampled sequence. If None, the existing sampler is re-used

Returns

a subsampled Sequence
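
The UID-based selection can be sketched on a plain dictionary of lists. The helper subsample_by_uids is hypothetical and only mirrors the uids and uids_name parameters documented above; it assumes an order-preserving sampler, so the result follows the ordering of uids.

```python
def subsample_by_uids(batch, uids, uids_name):
    """Hypothetical helper: select the samples listed in `uids`, in that order."""
    index_by_uid = {uid: i for i, uid in enumerate(batch[uids_name])}
    # An unknown UID raises KeyError instead of being silently skipped.
    indices = [index_by_uid[uid] for uid in uids]
    return {name: [values[i] for i in indices] for name, values in batch.items()}


batch = {'sample_uid': [10, 11, 12, 13], 'value': ['a', 'b', 'c', 'd']}
selected = subsample_by_uids(batch, uids=[13, 10], uids_name='sample_uid')
```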

initializer(self)
fill_queue(self)

Fill the input queue of jobs to be completed

_retrieve_results_and_fill_queue(self)

Retrieve results from the output queue

_wait_for_job_completion(self)

Block the processing until there are enough results in the reservoir
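
A blocking wait of this kind can be sketched with the standard library. The function wait_for_results, the worker function and the poll_interval parameter are hypothetical, and a thread stands in for the worker processes; the sketch assumes that "enough" means at least min_reservoir_samples results.

```python
import queue
import threading
import time


def wait_for_results(output_queue, reservoir, min_reservoir_samples,
                     poll_interval=0.01):
    """Hypothetical helper: block until the reservoir holds enough results."""
    while len(reservoir) < min_reservoir_samples:
        try:
            reservoir.append(output_queue.get(timeout=poll_interval))
        except queue.Empty:
            # Nothing ready yet: keep waiting for the background worker.
            continue


def worker(output_queue, nb_results):
    for i in range(nb_results):
        time.sleep(0.005)  # simulate slow loading
        output_queue.put(i)


output_queue = queue.Queue()
thread = threading.Thread(target=worker, args=(output_queue, 5))
thread.start()

reservoir = []
wait_for_results(output_queue, reservoir, min_reservoir_samples=3)
thread.join()
```

The wait returns as soon as the minimum is reached; results produced afterwards stay in the output queue until the next retrieval.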

__iter__(self)
Returns

An iterator of batches

close(self)

Finish and join the existing pool processes

class trw.train.sequence_async_reservoir.SequenceAsyncReservoirIterator(base_sequence, reservoir_sampler)

Bases: trw.train.sequence.SequenceIterator

Iterate through the SequenceAsyncReservoir sequence

_reset_iter_reservoir(self)
__next__(self)
Returns

The next batch of data

close(self)

Special method to close and clean the resources of the sequence