trw.datasets

Package Contents

Classes

TinyImageNet

Tiny ImageNet data set available from http://cs231n.stanford.edu/tiny-imagenet-200.zip.

Functions

create_mnist_dataset(batch_size: int = 1000, root: str = None, transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False, select_classes_train: Optional[Sequence[int]] = None, select_classes_test: Optional[Sequence[int]] = None) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]

Create the MNIST dataset.

create_mnist_cluttered_datasset(batch_size: int = 1000, cluttered_size: trw.basic_typing.ShapeX = (64, 64), clutter_window: trw.basic_typing.ShapeX = (6, 6), nb_clutter_windows: int = 16, root: Optional[str] = None, train_transforms: List[trw.transforms.Transform] = None, test_transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]

Create the cluttered MNIST dataset.

create_cifar10_dataset(batch_size: int = 300, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, data_processing_batch_size: int = None, normalize_0_1: bool = True) → trw.basic_typing.Datasets

create_voc_detection_dataset(root: str = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, batch_size: int = 1, data_subsampling_fraction_train: float = 1.0, data_subsampling_fraction_valid: float = 1.0, train_split: str = 'train', valid_split: str = 'val', year: typing_extensions.Literal[2007, 2012] = '2007') → trw.basic_typing.Datasets

PASCAL VOC detection challenge

create_voc_segmentation_dataset(batch_size: int = 40, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = default_voc_transforms(), transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, year: typing_extensions.Literal[2007, 2012] = '2012') → trw.basic_typing.Datasets

Create the VOC segmentation dataset

create_cityscapes_dataset(batch_size: int = 32, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 4, target_type: typing_extensions.Literal[semantic] = 'semantic') → trw.basic_typing.Datasets

Load the cityscapes dataset. This requires registering on their website https://www.cityscapes-dataset.com/

create_facades_dataset(root: str = None, batch_size: int = 32, normalize_0_1: bool = True, transforms_train: Optional[List[trw.transforms.Transform]] = None, nb_workers=0, url: str = 'https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/facades.tar.gz') → trw.basic_typing.Datasets

create_tiny_imagenet_dataset(batch_size: int, num_images_per_class: int = 500, transforms_train: List[trw.transforms.Transform] = None, transforms_valid: List[trw.transforms.Transform] = None, nb_workers: int = 4, root: Optional[str] = None) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]

create_name_nationality_dataset(url: str = 'https://download.pytorch.org/tutorial/data.zip', root: Optional[str] = None, valid_ratio: float = 0.1, seed: int = 0, batch_size: int = 1) → trw.basic_typing.Datasets

create_decathlon_dataset(task_name: str, root: str = None, transform_train: trw.transforms.Transform = None, transform_valid: trw.transforms.Transform = None, nb_workers: int = 4, valid_ratio: float = 0.2, batch_size: int = 1, remove_patient_transform: bool = False) → trw.basic_typing.Datasets

Create a task of the medical decathlon dataset.

create_cycle_gan_dataset(dataset_name: cycle_gan_dataset, batch_size: int = 32, root: Optional[str] = None, url: str = 'https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/', transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, aligned: bool = False, loading_batch_size: int = 4, nb_workers: int = 4) → trw.basic_typing.Datasets

Datasets used for image to image translation (domain A to domain B).

create_fake_symbols_datasset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, dataset_name: str, shapes_fn: ShapeCreator, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255) → trw.basic_typing.Datasets

Create an artificial 2D dataset for classification and segmentation problems

_random_location(image_shape: numpy.ndarray, figure_shape) → numpy.ndarray

_random_color() → numpy.ndarray

_add_shape(imag, mask, shape, shapes_added, scale_factor, color, min_overlap_distance=30)

_create_image(shape, objects, nb_classes_at_once=None, max_classes=None, background=255)

Create an image containing randomly placed shapes together with its segmentation mask.

_noisy(image: numpy.ndarray, noise_type: typing_extensions.Literal['gauss', 'poisson', 's&p', 'speckle']) → numpy.ndarray

Add noise of the given type to a numpy image.

create_fake_symbols_2d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_2d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_2d') → trw.basic_typing.Datasets

Create an artificial 2D dataset for classification and segmentation problems

default_shapes_2d(global_scale_factor=1.0)

create_fake_symbols_3d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_3d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_3d') → trw.basic_typing.Datasets

Create an artificial 3D dataset for classification and segmentation problems

default_shapes_3d(global_scale_factor=1.0)

trw.datasets.create_mnist_dataset(batch_size: int = 1000, root: str = None, transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False, select_classes_train: Optional[Sequence[int]] = None, select_classes_test: Optional[Sequence[int]] = None) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]
Parameters
  • batch_size

  • root

  • transforms

  • nb_workers

  • data_processing_batch_size

  • normalize_0_1

  • select_classes_train – a subset of classes to be selected for the training split

  • select_classes_test – a subset of classes to be selected for the test split

Returns

datasets and datasets infos
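
Example

A minimal usage sketch, assuming the returned datasets dictionary is keyed by dataset name and then split name, and that each batch is a dictionary of features; the 'mnist' key and the printed feature names are assumptions, not guaranteed by the API.

    import trw.datasets

    # create the MNIST datasets, keeping only two digits in the training split (illustrative)
    datasets, datasets_infos = trw.datasets.create_mnist_dataset(
        batch_size=500,
        normalize_0_1=True,
        select_classes_train=[0, 1],
    )

    # inspect the first training batch; the 'mnist' key is an assumption
    for batch in datasets['mnist']['train']:
        print(list(batch.keys()))
        break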

trw.datasets.create_mnist_cluttered_datasset(batch_size: int = 1000, cluttered_size: trw.basic_typing.ShapeX = (64, 64), clutter_window: trw.basic_typing.ShapeX = (6, 6), nb_clutter_windows: int = 16, root: Optional[str] = None, train_transforms: List[trw.transforms.Transform] = None, test_transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]
Parameters
  • batch_size

  • cluttered_size – the size of the final image

  • root

  • clutter_window – the size of the random windows to create the clutter

  • nb_clutter_windows – the number of clutter windows added to the image

  • train_transforms – the transform function applied on the training batches

  • test_transforms – the transform function applied on the test batches

  • nb_workers – the number of workers to preprocess the dataset

  • data_processing_batch_size – the number of samples each worker processes at once

  • normalize_0_1 – if True, the pixels will be in range [0..1]

Returns

datasets
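
Example

A short sketch of the clutter-related parameters, assuming the same (datasets, datasets_infos) return convention as create_mnist_dataset; the values below are illustrative.

    import trw.datasets

    datasets, datasets_infos = trw.datasets.create_mnist_cluttered_datasset(
        batch_size=200,
        cluttered_size=(64, 64),   # size of the final image
        clutter_window=(6, 6),     # size of each random clutter window
        nb_clutter_windows=16,     # number of clutter windows pasted per image
        normalize_0_1=True,
    )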

trw.datasets.create_cifar10_dataset(batch_size: int = 300, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, data_processing_batch_size: int = None, normalize_0_1: bool = True) → trw.basic_typing.Datasets
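
Example

A minimal sketch; note that, unlike create_mnist_dataset, this function returns a single Datasets dictionary rather than a (datasets, infos) tuple. The transform list is left empty here.

    import trw.datasets

    datasets = trw.datasets.create_cifar10_dataset(
        batch_size=300,
        normalize_0_1=True,
        transform_train=None,   # optionally a list of trw.transforms.Transform
    )
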
trw.datasets.create_voc_detection_dataset(root: str = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, batch_size: int = 1, data_subsampling_fraction_train: float = 1.0, data_subsampling_fraction_valid: float = 1.0, train_split: str = 'train', valid_split: str = 'val', year: typing_extensions.Literal[2007, 2012] = '2007') → trw.basic_typing.Datasets

PASCAL VOC detection challenge

Notes

  • Batch size is always 1 since we need to sample various anchors and locations from the image depending on the task (so each sample should be post-processed by a custom transform)
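
Example

A hedged sketch reflecting the note above: batch_size stays at 1 and any anchor/target construction would be done by a custom transform (left as None here).

    import trw.datasets

    datasets = trw.datasets.create_voc_detection_dataset(
        year='2007',
        batch_size=1,          # see the note above: one image per batch
        transform_train=None,  # plug in a custom transform to build anchors/targets
    )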

trw.datasets.create_voc_segmentation_dataset(batch_size: int = 40, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = default_voc_transforms(), transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, year: typing_extensions.Literal[2007, 2012] = '2012') → trw.basic_typing.Datasets

Create the VOC segmentation dataset

Parameters
  • batch_size – the number of samples per batch

  • root – the root of the dataset

  • transform_train – the transform to apply to each batch of training data

  • transform_valid – the transform to apply to each batch of validation data

  • nb_workers – the number of worker processes used to pre-process the batches

  • year – the version of the dataset

Returns

datasets with the dataset voc2012 and the splits train and valid.
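
Example

A minimal sketch; the voc2012 dataset key and the train split name are taken from the Returns description above, but their exact spelling is not guaranteed here.

    import trw.datasets

    datasets = trw.datasets.create_voc_segmentation_dataset(batch_size=40, year='2012')

    # iterate the training split (key names taken from the Returns note)
    for batch in datasets['voc2012']['train']:
        break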

trw.datasets.create_cityscapes_dataset(batch_size: int = 32, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 4, target_type: typing_extensions.Literal[semantic] = 'semantic') → trw.basic_typing.Datasets

Load the cityscapes dataset. This requires registering on their website https://www.cityscapes-dataset.com/ and manually downloading the dataset.

The dataset is composed of 3 parts: gtCoarse, gtFine, leftImg8bit. Download each package and unzip them into a single folder (e.g., cityscapes).

Parameters
  • batch_size

  • root – the folder containing the 3 unzipped cityscapes data gtCoarse, gtFine, leftImg8bit

  • transform_train – the transform to apply on the training batches

  • transform_valid – the transform to apply on the validation batches

  • nb_workers – the number of workers allocated to data loading and processing for each split

  • target_type – the segmentation task

Returns

a dict of splits. Each split is a trw.train.Sequence
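
Example

A hedged sketch assuming the three archives (gtCoarse, gtFine, leftImg8bit) were unzipped under a single folder; the path below is illustrative.

    import trw.datasets

    datasets = trw.datasets.create_cityscapes_dataset(
        batch_size=32,
        root='/data/cityscapes',   # folder containing gtCoarse, gtFine and leftImg8bit
        target_type='semantic',
    )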

trw.datasets.create_facades_dataset(root: str = None, batch_size: int = 32, normalize_0_1: bool = True, transforms_train: Optional[List[trw.transforms.Transform]] = None, nb_workers=0, url: str = 'https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/facades.tar.gz') → trw.basic_typing.Datasets
class trw.datasets.TinyImageNet(root, split='train', num_images_per_class=500)

Bases: torch.utils.data.Dataset

Tiny ImageNet data set available from http://cs231n.stanford.edu/tiny-imagenet-200.zip.

Notes

The test split is discarded since we do not have the test labels

__len__(self)
__getitem__(self, index)
static read_image(path)
trw.datasets.create_tiny_imagenet_dataset(batch_size: int, num_images_per_class: int = 500, transforms_train: List[trw.transforms.Transform] = None, transforms_valid: List[trw.transforms.Transform] = None, nb_workers: int = 4, root: Optional[str] = None) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]
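
Example

A minimal sketch, assuming the same (datasets, datasets_infos) return convention as the other creators; num_images_per_class presumably limits how many of the 500 training images per class are used.

    import trw.datasets

    datasets, datasets_infos = trw.datasets.create_tiny_imagenet_dataset(
        batch_size=64,
        num_images_per_class=200,   # use a subset of the images of each class
    )
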
trw.datasets.create_name_nationality_dataset(url: str = 'https://download.pytorch.org/tutorial/data.zip', root: Optional[str] = None, valid_ratio: float = 0.1, seed: int = 0, batch_size: int = 1) → trw.basic_typing.Datasets
trw.datasets.create_decathlon_dataset(task_name: str, root: str = None, transform_train: trw.transforms.Transform = None, transform_valid: trw.transforms.Transform = None, nb_workers: int = 4, valid_ratio: float = 0.2, batch_size: int = 1, remove_patient_transform: bool = False) → trw.basic_typing.Datasets

Create a task of the medical decathlon dataset.

The dataset is available here http://medicaldecathlon.com/ with accompanying publication: https://arxiv.org/abs/1902.09063

Parameters
  • task_name – the name of the task

  • root – the root folder where the data will be created and possibly downloaded

  • transform_train – a function that takes a batch of training data and returns a transformed batch

  • transform_valid – a function that takes a batch of validation data and returns a transformed batch

  • nb_workers – the number of workers used for the preprocessing

  • valid_ratio – the ratio of validation data

  • batch_size – the batch size

  • remove_patient_transform – if True, remove the affine transformation attached to the voxels

Returns

a dictionary of datasets
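
Example

A hedged sketch; 'Task01_BrainTumour' matches the task naming used by the Medical Segmentation Decathlon, but the exact string expected by task_name here is an assumption.

    import trw.datasets

    datasets = trw.datasets.create_decathlon_dataset(
        task_name='Task01_BrainTumour',   # assumed task identifier
        valid_ratio=0.2,
        batch_size=1,
        remove_patient_transform=False,   # keep the affine transform attached to the voxels
    )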

trw.datasets.create_cycle_gan_dataset(dataset_name: cycle_gan_dataset, batch_size: int = 32, root: Optional[str] = None, url: str = 'https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/', transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, aligned: bool = False, loading_batch_size: int = 4, nb_workers: int = 4) → trw.basic_typing.Datasets

Datasets used for image to image translation (domain A to domain B).

Parameters
  • dataset_name – the name of the dataset

  • batch_size – the size of each batch

  • root – the root path where to store the dataset

  • url – specify the URL from which the dataset is downloaded

  • transform_train – transform applied to train dataset

  • transform_valid – transform applied to valid dataset

  • aligned – if True, the images A and B will be considered aligned. If False, B will be randomly sampled from the list of available images in B

  • nb_workers – the number of workers to process the images

  • loading_batch_size – the number of images loaded by a worker

Returns

a dataset
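
Example

A hedged sketch; 'horse2zebra' is one of the archives hosted at the default URL above, but whether that exact string is accepted as dataset_name is an assumption.

    import trw.datasets

    datasets = trw.datasets.create_cycle_gan_dataset(
        dataset_name='horse2zebra',   # assumed name, matching the upstream CycleGAN archives
        batch_size=32,
        aligned=False,                # domain B images are sampled randomly (unpaired)
    )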

trw.datasets.create_fake_symbols_datasset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, dataset_name: str, shapes_fn: ShapeCreator, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255) → trw.basic_typing.Datasets

Create an artificial 2D dataset for classification and segmentation problems.

This dataset randomly creates shapes at random locations and with random colors, together with a segmentation map.

Parameters
  • nb_samples – the number of samples to be generated

  • image_shape – the shape of an image [height, width]

  • ratio_valid – the ratio of samples to be used for the validation split

  • nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included

  • global_scale_factor – the scale of the shapes to generate

  • noise_fn – a function to create noise in the image

  • shapes_fn – the function to create the different shapes

  • normalize_0_1 – if True, the data will be normalized (i.e., image & position will be in range [0..1])

  • max_classes – the total number of classes available

  • batch_size – the size of the batch for the dataset

  • background – the background value of the sample (before normalization if normalize_0_1 is True)

  • dataset_name – the name of the returned dataset

Returns

a dict containing the dataset named dataset_name with train and valid splits, with features image, mask, classification, <shape_name>_center

trw.datasets._random_location(image_shape: numpy.ndarray, figure_shape) → numpy.ndarray
trw.datasets._random_color() → numpy.ndarray
trw.datasets._add_shape(imag, mask, shape, shapes_added, scale_factor, color, min_overlap_distance=30)
trw.datasets._create_image(shape, objects, nb_classes_at_once=None, max_classes=None, background=255)
Parameters
  • shape – the shape of an image [height, width]

  • nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included

  • max_classes – the maximum number of classes to be used. If None, all classes can be used, else a random subset

Returns

image, mask and shape information

trw.datasets._noisy(image: numpy.ndarray, noise_type: typing_extensions.Literal['gauss', 'poisson', 's&p', 'speckle']) → numpy.ndarray
Parameters
  • image – a numpy image (float) in range [0..255]

  • noise_type – the type of noise. Must be one of:

    – 'gauss': Gaussian-distributed additive noise

    – 'poisson': Poisson-distributed noise generated from the data

    – 's&p': replaces random pixels with 0 or 1

    – 'speckle': multiplicative noise using out = image + n * image, where n is uniform noise with specified mean & variance

Returns

noisy image
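
Example

A small sketch showing how this private helper is typically wired in, mirroring the functools.partial(_noisy, noise_type='poisson') default visible in the fake-symbols signatures; direct use outside those creators is not documented here.

    import functools
    import trw.datasets

    # build a noise function to pass as noise_fn to the fake-symbols dataset creators
    noise_fn = functools.partial(trw.datasets._noisy, noise_type='s&p')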

trw.datasets.create_fake_symbols_2d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_2d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_2d') → trw.basic_typing.Datasets

Create an artificial 2D dataset for classification and segmentation problems.

This dataset randomly creates shapes at random locations and with random colors, together with a segmentation map.

Parameters
  • nb_samples – the number of samples to be generated

  • image_shape – the shape of an image [height, width]

  • ratio_valid – the ratio of samples to be used for the validation split

  • nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included

  • global_scale_factor – the scale of the shapes to generate

  • noise_fn – a function to create noise in the image

  • shapes_fn – the function to create the different shapes

  • normalize_0_1 – if True, the data will be normalized (i.e., image & position will be in range [0..1])

  • max_classes – the total number of classes available

  • batch_size – the size of the batch for the dataset

  • background – the background value of the sample (before normalization if normalize_0_1 is True)

  • dataset_name – the name of the returned dataset

Returns

a dict containing the dataset fake_symbols_2d with train and valid splits, with features image, mask, classification, <shape_name>_center
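
Example

A minimal sketch; the fake_symbols_2d dataset key, the split names and the feature names below are all taken from the Returns description above.

    import trw.datasets

    datasets = trw.datasets.create_fake_symbols_2d_dataset(
        nb_samples=1000,
        image_shape=(64, 64),
        nb_classes_at_once=1,   # one shape class per generated sample
        normalize_0_1=True,
    )

    for batch in datasets['fake_symbols_2d']['train']:
        image, mask = batch['image'], batch['mask']   # feature names from the Returns note
        break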

trw.datasets.default_shapes_2d(global_scale_factor=1.0)
trw.datasets.create_fake_symbols_3d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_3d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_3d') → trw.basic_typing.Datasets

Create an artificial 3D dataset for classification and segmentation problems.

This dataset randomly creates shapes at random locations and with random colors, together with a segmentation map.

Parameters
  • nb_samples – the number of samples to be generated

  • image_shape – the shape of an image [height, width]

  • ratio_valid – the ratio of samples to be used for the validation split

  • nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included

  • global_scale_factor – the scale of the shapes to generate

  • noise_fn – a function to create noise in the image

  • shapes_fn – the function to create the different shapes

  • normalize_0_1 – if True, the data will be normalized (i.e., image & position will be in range [0..1])

  • max_classes – the total number of classes available

  • batch_size – the size of the batch for the dataset

  • background – the background value of the sample (before normalization if normalize_0_1 is True)

  • dataset_name – the name of the returned dataset

Returns

a dict containing the dataset fake_symbols_3d with train and valid splits, with features image, mask, classification, <shape_name>_center

trw.datasets.default_shapes_3d(global_scale_factor=1.0)