trw.datasets

Package Contents

Classes

TinyImageNet

Tiny ImageNet data set available from http://cs231n.stanford.edu/tiny-imagenet-200.zip.

Functions

create_mnist_dataset(batch_size: int = 1000, root: str = None, transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False, select_classes_train: Optional[Sequence[int]] = None, select_classes_test: Optional[Sequence[int]] = None) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]

Create the MNIST dataset.

create_mnist_cluttered_datasset(batch_size: int = 1000, cluttered_size: trw.basic_typing.ShapeX = (64, 64), clutter_window: trw.basic_typing.ShapeX = (6, 6), nb_clutter_windows: int = 16, root: Optional[str] = None, train_transforms: List[trw.transforms.Transform] = None, test_transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]

Create the cluttered MNIST dataset.

create_cifar10_dataset(batch_size: int = 300, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, data_processing_batch_size: int = None, normalize_0_1: bool = True) → trw.basic_typing.Datasets

create_voc_detection_dataset(root: str = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, batch_size: int = 1, data_subsampling_fraction_train: float = 1.0, data_subsampling_fraction_valid: float = 1.0, train_split: str = 'train', valid_split: str = 'val', year: typing_extensions.Literal[2007, 2012] = '2007') → trw.basic_typing.Datasets

PASCAL VOC detection challenge

create_voc_segmentation_dataset(batch_size: int = 40, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = default_voc_transforms(), transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, year: typing_extensions.Literal[2007, 2012] = '2012') → trw.basic_typing.Datasets

Create the VOC segmentation dataset

create_cityscapes_dataset(batch_size: int = 32, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 4, target_type: typing_extensions.Literal[semantic] = 'semantic') → trw.basic_typing.Datasets

Load the cityscapes dataset. This requires registering on their website https://www.cityscapes-dataset.com/

create_facades_dataset(root: str = None, batch_size: int = 32, normalize_0_1: bool = True, transforms_train: Optional[List[trw.transforms.Transform]] = None, nb_workers=0, url: str = 'https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/facades.tar.gz') → trw.basic_typing.Datasets

create_tiny_imagenet_dataset(batch_size: int, num_images_per_class: int = 500, transforms_train: List[trw.transforms.Transform] = None, transforms_valid: List[trw.transforms.Transform] = None, nb_workers: int = 4, root: Optional[str] = None) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]

create_name_nationality_dataset(url: str = 'https://download.pytorch.org/tutorial/data.zip', root: Optional[str] = None, valid_ratio: float = 0.1, seed: int = 0, batch_size: int = 1) → trw.basic_typing.Datasets

create_decathlon_dataset(task_name: str, root: str = None, transform_train: trw.transforms.Transform = None, transform_valid: trw.transforms.Transform = None, nb_workers: int = 4, valid_ratio: float = 0.2, batch_size: int = 1, remove_patient_transform: bool = False) → trw.basic_typing.Datasets

Create a task of the medical decathlon dataset.

create_cycle_gan_dataset(dataset_name: cycle_gan_dataset, batch_size: int = 32, root: Optional[str] = None, url: str = 'https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/', transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, aligned: bool = False, loading_batch_size: int = 4, nb_workers: int = 4) → trw.basic_typing.Datasets

Datasets used for image to image translation (domain A to domain B).

create_fake_symbols_datasset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, dataset_name: str, shapes_fn: ShapeCreator, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255) → trw.basic_typing.Datasets

Create an artificial 2D dataset for classification and segmentation problems

_random_location(image_shape: numpy.ndarray, figure_shape) → numpy.ndarray

_random_color() → numpy.ndarray

_add_shape(imag, mask, shape, shapes_added, scale_factor, color, min_overlap_distance=30)

_create_image(shape, objects, nb_classes_at_once=None, max_classes=None, background=255)

Create an image containing randomly placed shapes together with its segmentation mask.

_noisy(image: numpy.ndarray, noise_type: typing_extensions.Literal['gauss', 'poisson', 's&p', 'speckle']) → numpy.ndarray

Add noise of the given type to a numpy image.

create_fake_symbols_2d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_2d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_2d') → trw.basic_typing.Datasets

Create an artificial 2D dataset for classification and segmentation problems

default_shapes_2d(global_scale_factor=1.0)

create_fake_symbols_3d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_3d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_3d') → trw.basic_typing.Datasets

Create an artificial 3D dataset for classification and segmentation problems

default_shapes_3d(global_scale_factor=1.0)

trw.datasets.create_mnist_dataset(batch_size: int = 1000, root: str = None, transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False, select_classes_train: Optional[Sequence[int]] = None, select_classes_test: Optional[Sequence[int]] = None) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]
Parameters
  • batch_size

  • root

  • transforms

  • nb_workers

  • data_processing_batch_size

  • normalize_0_1

  • select_classes_train – a subset of classes to be selected for the training split

  • select_classes_test – a subset of classes to be selected for the test split

Returns

datasets and datasets infos
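
Example

A minimal usage sketch, assuming the returned datasets dictionary is keyed by dataset name and then split name, and that each batch is a dictionary of features; the 'mnist' key and the printed feature names are assumptions, not guaranteed by the API.

    import trw.datasets

    # create the MNIST datasets, keeping only two digits in the training split (illustrative)
    datasets, datasets_infos = trw.datasets.create_mnist_dataset(
        batch_size=500,
        normalize_0_1=True,
        select_classes_train=[0, 1],
    )

    # inspect the first training batch; the 'mnist' key is an assumption
    for batch in datasets['mnist']['train']:
        print(list(batch.keys()))
        break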

trw.datasets.create_mnist_cluttered_datasset(batch_size: int = 1000, cluttered_size: trw.basic_typing.ShapeX = (64, 64), clutter_window: trw.basic_typing.ShapeX = (6, 6), nb_clutter_windows: int = 16, root: Optional[str] = None, train_transforms: List[trw.transforms.Transform] = None, test_transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]
Parameters
  • batch_size

  • cluttered_size – the size of the final image

  • root

  • clutter_window – the size of the random windows to create the clutter

  • nb_clutter_windows – the number of clutter windows added to the image

  • train_transforms – the transform function applied on the training batches

  • test_transforms – the transform function applied on the test batches

  • nb_workers – the number of workers to preprocess the dataset

  • data_processing_batch_size – the number of samples each worker processes at once

  • normalize_0_1 – if True, the pixels will be in range [0..1]

Returns

datasets
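
Example

A short sketch of the clutter-related parameters, assuming the same (datasets, datasets_infos) return convention as create_mnist_dataset; the values below are illustrative.

    import trw.datasets

    datasets, datasets_infos = trw.datasets.create_mnist_cluttered_datasset(
        batch_size=200,
        cluttered_size=(64, 64),   # size of the final image
        clutter_window=(6, 6),     # size of each random clutter window
        nb_clutter_windows=16,     # number of clutter windows pasted per image
        normalize_0_1=True,
    )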

trw.datasets.create_cifar10_dataset(batch_size: int = 300, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, data_processing_batch_size: int = None, normalize_0_1: bool = True) → trw.basic_typing.Datasets
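
Example

A minimal sketch; note that, unlike create_mnist_dataset, this function returns a single Datasets dictionary rather than a (datasets, infos) tuple. The transform list is left empty here.

    import trw.datasets

    datasets = trw.datasets.create_cifar10_dataset(
        batch_size=300,
        normalize_0_1=True,
        transform_train=None,   # optionally a list of trw.transforms.Transform
    )
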
trw.datasets.create_voc_detection_dataset(root: str = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, batch_size: int = 1, data_subsampling_fraction_train: float = 1.0, data_subsampling_fraction_valid: float = 1.0, train_split: str = 'train', valid_split: str = 'val', year: typing_extensions.Literal[2007, 2012] = '2007') → trw.basic_typing.Datasets

PASCAL VOC detection challenge

Notes

  • Batch size is always 1 since we need to sample various anchors and locations from the image depending on the task (so each sample should be post-processed by a custom transform)
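
Example

A hedged sketch reflecting the note above: batch_size stays at 1 and any anchor/target construction would be done by a custom transform (left as None here).

    import trw.datasets

    datasets = trw.datasets.create_voc_detection_dataset(
        year='2007',
        batch_size=1,          # see the note above: one image per batch
        transform_train=None,  # plug in a custom transform to build anchors/targets
    )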

trw.datasets.create_voc_segmentation_dataset(batch_size: int = 40, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = default_voc_transforms(), transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, year: typing_extensions.Literal[2007, 2012] = '2012') → trw.basic_typing.Datasets

Create the VOC segmentation dataset

Parameters
  • batch_size – the number of samples per batch

  • root – the root of the dataset

  • transform_train – the transform to apply to each batch of training data

  • transform_valid – the transform to apply to each batch of validation data

  • nb_workers – the number of worker processes used to pre-process the batches

  • year – the version of the dataset

Returns

datasets with the dataset voc2012 and the splits train and valid.
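
Example

A minimal sketch; the voc2012 dataset key and the train split name are taken from the Returns description above, but their exact spelling is not guaranteed here.

    import trw.datasets

    datasets = trw.datasets.create_voc_segmentation_dataset(batch_size=40, year='2012')

    # iterate the training split (key names taken from the Returns note)
    for batch in datasets['voc2012']['train']:
        break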

trw.datasets.create_cityscapes_dataset(batch_size: int = 32, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 4, target_type: typing_extensions.Literal[semantic] = 'semantic') → trw.basic_typing.Datasets

Load the cityscapes dataset. This requires registering on their website https://www.cityscapes-dataset.com/ and manually downloading the dataset.

The dataset is composed of 3 parts: gtCoarse, gtFine, leftImg8bit. Download each package and unzip them into a single folder (e.g., cityscapes).

Parameters
  • batch_size

  • root – the folder containing the 3 unzipped cityscapes data gtCoarse, gtFine, leftImg8bit

  • transform_train – the transform to apply on the training batches

  • transform_valid – the transform to apply on the validation batches

  • nb_workers – the number of workers allocated to data loading and processing for each split

  • target_type – the segmentation task

Returns

a dict of splits. Each split is a trw.train.Sequence
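
Example

A hedged sketch assuming the three archives (gtCoarse, gtFine, leftImg8bit) were unzipped under a single folder; the path below is illustrative.

    import trw.datasets

    datasets = trw.datasets.create_cityscapes_dataset(
        batch_size=32,
        root='/data/cityscapes',   # folder containing gtCoarse, gtFine and leftImg8bit
        target_type='semantic',
    )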

trw.datasets.create_facades_dataset(root: str = None, batch_size: int = 32, normalize_0_1: bool = True, transforms_train: Optional[List[trw.transforms.Transform]] = None, nb_workers=0, url: str = 'https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/facades.tar.gz') → trw.basic_typing.Datasets
class trw.datasets.TinyImageNet(root, split='train', num_images_per_class=500)

Bases: torch.utils.data.Dataset

Tiny ImageNet data set available from http://cs231n.stanford.edu/tiny-imagenet-200.zip.

Notes

The test split is discarded since we do not have the test labels

__len__(self)
__getitem__(self, index)
static read_image(path)
trw.datasets.create_tiny_imagenet_dataset(batch_size: int, num_images_per_class: int = 500, transforms_train: List[trw.transforms.Transform] = None, transforms_valid: List[trw.transforms.Transform] = None, nb_workers: int = 4, root: Optional[str] = None) → Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo]
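
Example

A minimal sketch, assuming the same (datasets, datasets_infos) return convention as the other creators; num_images_per_class presumably limits how many of the 500 training images per class are used.

    import trw.datasets

    datasets, datasets_infos = trw.datasets.create_tiny_imagenet_dataset(
        batch_size=64,
        num_images_per_class=200,   # use a subset of the images of each class
    )
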
trw.datasets.create_name_nationality_dataset(url: str = 'https://download.pytorch.org/tutorial/data.zip', root: Optional[str] = None, valid_ratio: float = 0.1, seed: int = 0, batch_size: int = 1) → trw.basic_typing.Datasets
trw.datasets.create_decathlon_dataset(task_name: str, root: str = None, transform_train: trw.transforms.Transform = None, transform_valid: trw.transforms.Transform = None, nb_workers: int = 4, valid_ratio: float = 0.2, batch_size: int = 1, remove_patient_transform: bool = False) → trw.basic_typing.Datasets

Create a task of the medical decathlon dataset.

The dataset is available here http://medicaldecathlon.com/ with accompanying publication: https://arxiv.org/abs/1902.09063

Parameters
  • task_name – the name of the task

  • root – the root folder where the data will be created and possibly downloaded

  • transform_train – a function that takes a batch of training data and returns a transformed batch

  • transform_valid – a function that takes a batch of validation data and returns a transformed batch

  • nb_workers – the number of workers used for the preprocessing

  • valid_ratio – the ratio of validation data

  • batch_size – the batch size

  • remove_patient_transform – if True, remove the affine transformation attached to the voxels

Returns

a dictionary of datasets
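
Example

A hedged sketch; 'Task01_BrainTumour' matches the task naming used by the Medical Segmentation Decathlon, but the exact string expected by task_name here is an assumption.

    import trw.datasets

    datasets = trw.datasets.create_decathlon_dataset(
        task_name='Task01_BrainTumour',   # assumed task identifier
        valid_ratio=0.2,
        batch_size=1,
        remove_patient_transform=False,   # keep the affine transform attached to the voxels
    )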

trw.datasets.create_cycle_gan_dataset(dataset_name: cycle_gan_dataset, batch_size: int = 32, root: Optional[str] = None, url: str = 'https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/', transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, aligned: bool = False, loading_batch_size: int = 4, nb_workers: int = 4) → trw.basic_typing.Datasets

Datasets used for image to image translation (domain A to domain B).

Parameters
  • dataset_name – the name of the dataset

  • batch_size – the size of each batch

  • root – the root path where to store the dataset

  • url – specify the URL from which the dataset is downloaded

  • transform_train – transform applied to train dataset

  • transform_valid – transform applied to valid dataset

  • aligned – if True, the images A and B will be considered aligned. If False, B will be randomly sampled from the list of available images in B

  • nb_workers – the number of workers to process the images

  • loading_batch_size – the number of images loaded by a worker

Returns

a dataset
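
Example

A hedged sketch; 'horse2zebra' is one of the archives hosted at the default URL above, but whether that exact string is accepted as dataset_name is an assumption.

    import trw.datasets

    datasets = trw.datasets.create_cycle_gan_dataset(
        dataset_name='horse2zebra',   # assumed name, matching the upstream CycleGAN archives
        batch_size=32,
        aligned=False,                # domain B images are sampled randomly (unpaired)
    )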

trw.datasets.create_fake_symbols_datasset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, dataset_name: str, shapes_fn: ShapeCreator, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255) → trw.basic_typing.Datasets

Create an artificial 2D dataset for classification and segmentation problems.

This dataset randomly creates shapes at random locations and with random colors, together with a segmentation map.

Parameters
  • nb_samples – the number of samples to be generated

  • image_shape – the shape of an image [height, width]

  • ratio_valid – the ratio of samples to be used for the validation split

  • nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included

  • global_scale_factor – the scale of the shapes to generate

  • noise_fn – a function to create noise in the image

  • shapes_fn – the function to create the different shapes

  • normalize_0_1 – if True, the data will be normalized (i.e., image & position will be in range [0..1])

  • max_classes – the total number of classes available

  • batch_size – the size of the batch for the dataset

  • background – the background value of the sample (before normalization if normalize_0_1 is True)

  • dataset_name – the name of the returned dataset

Returns

a dict containing the dataset named dataset_name with train and valid splits, with features image, mask, classification, <shape_name>_center

trw.datasets._random_location(image_shape: numpy.ndarray, figure_shape) → numpy.ndarray
trw.datasets._random_color() → numpy.ndarray
trw.datasets._add_shape(imag, mask, shape, shapes_added, scale_factor, color, min_overlap_distance=30)
trw.datasets._create_image(shape, objects, nb_classes_at_once=None, max_classes=None, background=255)
Parameters
  • shape – the shape of an image [height, width]

  • nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included

  • max_classes – the maximum number of classes to be used. If None, all classes can be used, else a random subset

Returns

image, mask and shape information

trw.datasets._noisy(image: numpy.ndarray, noise_type: typing_extensions.Literal['gauss', 'poisson', 's&p', 'speckle']) → numpy.ndarray
Parameters
  • image – a numpy image (float) in range [0..255]

  • noise_type – the type of noise. Must be one of:

    – 'gauss': Gaussian-distributed additive noise

    – 'poisson': Poisson-distributed noise generated from the data

    – 's&p': replaces random pixels with 0 or 1

    – 'speckle': multiplicative noise using out = image + n * image, where n is uniform noise with specified mean & variance

Returns

noisy image
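
Example

A small sketch showing how this private helper is typically wired in, mirroring the functools.partial(_noisy, noise_type='poisson') default visible in the fake-symbols signatures; direct use outside those creators is not documented here.

    import functools
    import trw.datasets

    # build a noise function to pass as noise_fn to the fake-symbols dataset creators
    noise_fn = functools.partial(trw.datasets._noisy, noise_type='s&p')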

trw.datasets.create_fake_symbols_2d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_2d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_2d') → trw.basic_typing.Datasets

Create an artificial 2D dataset for classification and segmentation problems.

This dataset randomly creates shapes at random locations and with random colors, together with a segmentation map.

Parameters
  • nb_samples – the number of samples to be generated

  • image_shape – the shape of an image [height, width]

  • ratio_valid – the ratio of samples to be used for the validation split

  • nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included

  • global_scale_factor – the scale of the shapes to generate

  • noise_fn – a function to create noise in the image

  • shapes_fn – the function to create the different shapes

  • normalize_0_1 – if True, the data will be normalized (i.e., image & position will be in range [0..1])

  • max_classes – the total number of classes available

  • batch_size – the size of the batch for the dataset

  • background – the background value of the sample (before normalization if normalize_0_1 is True)

  • dataset_name – the name of the returned dataset

Returns

a dict containing the dataset fake_symbols_2d with train and valid splits, with features image, mask, classification, <shape_name>_center
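
Example

A minimal sketch; the fake_symbols_2d dataset key, the split names and the feature names below are all taken from the Returns description above.

    import trw.datasets

    datasets = trw.datasets.create_fake_symbols_2d_dataset(
        nb_samples=1000,
        image_shape=(64, 64),
        nb_classes_at_once=1,   # one shape class per generated sample
        normalize_0_1=True,
    )

    for batch in datasets['fake_symbols_2d']['train']:
        image, mask = batch['image'], batch['mask']   # feature names from the Returns note
        break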

trw.datasets.default_shapes_2d(global_scale_factor=1.0)
trw.datasets.create_fake_symbols_3d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_3d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_3d') → trw.basic_typing.Datasets

Create an artificial 3D dataset for classification and segmentation problems.

This dataset randomly creates shapes at random locations and with random colors, together with a segmentation map.

Parameters
  • nb_samples – the number of samples to be generated

  • image_shape – the shape of an image [height, width]

  • ratio_valid – the ratio of samples to be used for the validation split

  • nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included

  • global_scale_factor – the scale of the shapes to generate

  • noise_fn – a function to create noise in the image

  • shapes_fn – the function to create the different shapes

  • normalize_0_1 – if True, the data will be normalized (i.e., image & position will be in range [0..1])

  • max_classes – the total number of classes available

  • batch_size – the size of the batch for the dataset

  • background – the background value of the sample (before normalization if normalize_0_1 is True)

  • dataset_name – the name of the returned dataset

Returns

a dict containing the dataset fake_symbols_3d with train and valid splits, with features image, mask, classification, <shape_name>_center

trw.datasets.default_shapes_3d(global_scale_factor=1.0)