trw.datasets¶
Submodules¶
trw.datasets.cifar10
trw.datasets.cityscapes
trw.datasets.cyclegan
trw.datasets.dataset_fake_symbols
trw.datasets.dataset_fake_symbols_2d
trw.datasets.dataset_fake_symbols_3d
trw.datasets.facades
trw.datasets.medical_decathlon
trw.datasets.mnist
trw.datasets.mnist_cluttered
trw.datasets.name_nationality
trw.datasets.tiny_imagenet
trw.datasets.utils
trw.datasets.voc
Package Contents¶
Classes¶
TinyImageNet | Tiny ImageNet data set available from http://cs231n.stanford.edu/tiny-imagenet-200.zip.
Functions¶
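create_mnist_dataset |
create_mnist_cluttered_datasset |
create_cifar10_dataset |
create_voc_detection_dataset | PASCAL VOC detection challenge
create_voc_segmentation_dataset | Create the VOC segmentation dataset
create_cityscapes_dataset | Load the cityscapes dataset. This requires registering on their website https://www.cityscapes-dataset.com/
create_facades_dataset |
create_tiny_imagenet_dataset |
create_name_nationality_dataset |
create_decathlon_dataset | Create a task of the medical decathlon dataset.
create_cycle_gan_dataset | Datasets used for image to image translation (domain A to domain B).
create_fake_symbols_datasset | Create artificial 2D datasets for classification and segmentation problems
_random_location |
_random_color |
_add_shape |
_create_image |
_noisy |
create_fake_symbols_2d_dataset | Create artificial 2D datasets for classification and segmentation problems
default_shapes_2d |
create_fake_symbols_3d_dataset | Create artificial 3D datasets for classification and segmentation problems
default_shapes_3d |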
- trw.datasets.create_mnist_dataset(batch_size: int = 1000, root: str = None, transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False, select_classes_train: Optional[Sequence[int]] = None, select_classes_test: Optional[Sequence[int]] = None) Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo] ¶
- Parameters
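batch_size – the number of samples per batch
root – the root of the dataset
transforms – the transform functions applied on the batches
nb_workers – the number of workers to preprocess the dataset
data_processing_batch_size – the number of samples each worker processes at once
normalize_0_1 – if True, the pixels will be in range [0..1]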
select_classes_train – a subset of classes to be selected for the training split
select_classes_test – a subset of classes to be selected for the test split
- Returns
a tuple of datasets and datasets information
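The select_classes_train and select_classes_test parameters can be pictured as a label filter applied to a split. The following is a minimal sketch in plain Python, not the library's implementation: the helper `filter_by_class` and the sample layout are hypothetical, and it assumes that None keeps every class.

```python
from typing import Dict, List, Optional, Sequence


def filter_by_class(
    samples: List[Dict[str, int]],
    select_classes: Optional[Sequence[int]] = None,
) -> List[Dict[str, int]]:
    """Keep only the samples whose label is in `select_classes`.

    Hypothetical helper. Assumption: `None` keeps every sample, which
    matches the default value of `select_classes_train` and
    `select_classes_test` in the signature above.
    """
    if select_classes is None:
        return list(samples)
    allowed = set(select_classes)
    return [s for s in samples if s["label"] in allowed]


# Toy split: ten samples, one per digit class.
toy_split = [{"index": i, "label": i} for i in range(10)]
subset = filter_by_class(toy_split, select_classes=[0, 1, 2])
print([s["label"] for s in subset])  # [0, 1, 2]
```

Restricting the training and test splits to different class subsets is one way to evaluate a model on classes it has not seen during training.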
- trw.datasets.create_mnist_cluttered_datasset(batch_size: int = 1000, cluttered_size: trw.basic_typing.ShapeX = (64, 64), clutter_window: trw.basic_typing.ShapeX = (6, 6), nb_clutter_windows: int = 16, root: Optional[str] = None, train_transforms: List[trw.transforms.Transform] = None, test_transforms: List[trw.transforms.Transform] = None, nb_workers: int = 5, data_processing_batch_size: int = 200, normalize_0_1: bool = False) Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo] ¶
- Parameters
batch_size – the number of samples per batch
cluttered_size – the size of the final image
root – the root of the dataset
clutter_window – the size of the random windows to create the clutter
nb_clutter_windows – the number of clutter windows added to the image
train_transforms – the transform function applied on the training batches
test_transforms – the transform function applied on the test batches
nb_workers – the number of workers to preprocess the dataset
data_processing_batch_size – the number of samples each worker processes at once
normalize_0_1 – if True, the pixels will be in range [0..1]
- Returns
datasets
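To make cluttered_size, clutter_window and nb_clutter_windows concrete, here is a rough sketch of how a cluttered image could be assembled. It is not taken from the library: the helper names are hypothetical, images are plain nested lists, and it assumes the clutter windows are cropped from other images and pasted at random positions.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]


def paste(canvas: Image, patch: Image, top: int, left: int) -> None:
    """Copy `patch` into `canvas`, keeping the brighter pixel on overlap."""
    for dy, row in enumerate(patch):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            canvas[y][x] = max(canvas[y][x], value)


def crop(image: Image, top: int, left: int, size: Tuple[int, int]) -> Image:
    """Return the window of `size` (height, width) starting at (top, left)."""
    return [row[left:left + size[1]] for row in image[top:top + size[0]]]


def make_cluttered(
    digit: Image,
    distractors: List[Image],
    cluttered_size: Tuple[int, int] = (64, 64),
    clutter_window: Tuple[int, int] = (6, 6),
    nb_clutter_windows: int = 16,
    rng: Optional[random.Random] = None,
) -> Image:
    """Hypothetical sketch: random clutter windows plus one digit."""
    rng = rng or random.Random(0)
    height, width = cluttered_size
    win_h, win_w = clutter_window
    canvas = [[0] * width for _ in range(height)]

    # Assumption: each clutter window is a random crop of a distractor image.
    for _ in range(nb_clutter_windows):
        source = rng.choice(distractors)
        src_top = rng.randint(0, len(source) - win_h)
        src_left = rng.randint(0, len(source[0]) - win_w)
        window = crop(source, src_top, src_left, clutter_window)
        paste(canvas, window,
              rng.randint(0, height - win_h),
              rng.randint(0, width - win_w))

    # The digit itself is placed at a random location of the final image.
    paste(canvas, digit,
          rng.randint(0, height - len(digit)),
          rng.randint(0, width - len(digit[0])))
    return canvas


digit = [[255] * 28 for _ in range(28)]            # stand-in for a 28x28 digit
distractors = [[[128] * 28 for _ in range(28)]]    # stand-in for other digits
canvas = make_cluttered(digit, distractors)
print(len(canvas), len(canvas[0]))
```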
- trw.datasets.create_cifar10_dataset(batch_size: int = 300, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, data_processing_batch_size: int = None, normalize_0_1: bool = True) trw.basic_typing.Datasets ¶
- trw.datasets.create_voc_detection_dataset(root: str = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, batch_size: int = 1, data_subsampling_fraction_train: float = 1.0, data_subsampling_fraction_valid: float = 1.0, train_split: str = 'train', valid_split: str = 'val', year: typing_extensions.Literal[2007, 2012] = '2007') trw.basic_typing.Datasets ¶
PASCAL VOC detection challenge
Notes
Batch size is always 1 because, depending on the task, various anchors and locations must be sampled from each image, so each sample should be post-processed by a custom transform.
- trw.datasets.create_voc_segmentation_dataset(batch_size: int = 40, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = default_voc_transforms(), transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 2, year: typing_extensions.Literal[2007, 2012] = '2012') trw.basic_typing.Datasets ¶
Create the VOC segmentation dataset
- Parameters
batch_size – the number of samples per batch
root – the root of the dataset
transform_train – the transform to apply to each batch of the training data
transform_valid – the transform to apply to each batch of the validation data
nb_workers – the number of worker processes used to pre-process the batches
year – the version of the dataset
- Returns
a datasets object containing the dataset voc2012 with the splits train and valid.
- trw.datasets.create_cityscapes_dataset(batch_size: int = 32, root: Optional[str] = None, transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, nb_workers: int = 4, target_type: typing_extensions.Literal['semantic'] = 'semantic') trw.basic_typing.Datasets ¶
Load the cityscapes dataset. This requires registering on their website https://www.cityscapes-dataset.com/ and manually downloading the dataset.
The dataset is composed of 3 parts: gtCoarse, gtFine, leftImg8bit. Download each package and unzip it into a single folder (e.g., cityscapes).
- Parameters
batch_size – the number of samples per batch
root – the folder containing the 3 unzipped cityscapes data gtCoarse, gtFine, leftImg8bit
transform_train – the transform to apply on the training batches
transform_valid – the transform to apply on the validation batches
nb_workers – the number of workers for each split allocated to the data loading and processing
target_type – the segmentation task
- Returns
a dict of splits. Each split is a trw.train.Sequence
- trw.datasets.create_facades_dataset(root: str = None, batch_size: int = 32, normalize_0_1: bool = True, transforms_train: Optional[List[trw.transforms.Transform]] = None, nb_workers=0, url: str = 'https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/facades.tar.gz') trw.basic_typing.Datasets ¶
- class trw.datasets.TinyImageNet(root, split='train', num_images_per_class=500)¶
Bases: torch.utils.data.Dataset
Tiny ImageNet data set available from http://cs231n.stanford.edu/tiny-imagenet-200.zip.
Notes
The test split is discarded since the test labels are not available.
- __len__(self)¶
- __getitem__(self, index)¶
- static read_image(path)¶
- trw.datasets.create_tiny_imagenet_dataset(batch_size: int, num_images_per_class: int = 500, transforms_train: List[trw.transforms.Transform] = None, transforms_valid: List[trw.transforms.Transform] = None, nb_workers: int = 4, root: Optional[str] = None) Tuple[trw.basic_typing.Datasets, trw.basic_typing.DatasetsInfo] ¶
- trw.datasets.create_name_nationality_dataset(url: str = 'https://download.pytorch.org/tutorial/data.zip', root: Optional[str] = None, valid_ratio: float = 0.1, seed: int = 0, batch_size: int = 1) trw.basic_typing.Datasets ¶
- trw.datasets.create_decathlon_dataset(task_name: str, root: str = None, transform_train: trw.transforms.Transform = None, transform_valid: trw.transforms.Transform = None, nb_workers: int = 4, valid_ratio: float = 0.2, batch_size: int = 1, remove_patient_transform: bool = False) trw.basic_typing.Datasets ¶
Create a task of the medical decathlon dataset.
The dataset is available here http://medicaldecathlon.com/ with accompanying publication: https://arxiv.org/abs/1902.09063
- Parameters
task_name – the name of the task
root – the root folder where the data will be created and possibly downloaded
transform_train – a function that takes a batch of training data and returns a transformed batch
transform_valid – a function that takes a batch of validation data and returns a transformed batch
nb_workers – the number of workers used for the preprocessing
valid_ratio – the ratio of validation data
batch_size – the batch size
remove_patient_transform – if True, remove the affine transformation attached to the voxels
- Returns
a dictionary of datasets
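The valid_ratio parameter controls how much of the data ends up in the validation split. A minimal sketch of such a split is shown below; it is an illustration only, not the library's code. The helper `split_train_valid`, the case names and the `seed` argument are assumptions (this function's signature has no seed), and the sketch assumes the cases are shuffled before splitting.

```python
import random
from typing import List, Sequence, Tuple


def split_train_valid(
    items: Sequence[str],
    valid_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Shuffle `items` deterministically and hold out `valid_ratio` of them.

    Hypothetical helper: returns (train_items, valid_items).
    """
    if not 0.0 <= valid_ratio <= 1.0:
        raise ValueError("valid_ratio must be in [0, 1]")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    nb_valid = int(round(len(shuffled) * valid_ratio))
    return shuffled[nb_valid:], shuffled[:nb_valid]


cases = [f"case_{i:03d}" for i in range(10)]
train_cases, valid_cases = split_train_valid(cases, valid_ratio=0.2)
print(len(train_cases), len(valid_cases))  # 8 2
```

Fixing the seed keeps the split reproducible across runs, which matters when validation metrics are compared between experiments.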
- trw.datasets.create_cycle_gan_dataset(dataset_name: cycle_gan_dataset, batch_size: int = 32, root: Optional[str] = None, url: str = 'https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/', transform_train: Optional[List[trw.transforms.Transform]] = None, transform_valid: Optional[List[trw.transforms.Transform]] = None, aligned: bool = False, loading_batch_size: int = 4, nb_workers: int = 4) trw.basic_typing.Datasets ¶
Datasets used for image to image translation (domain A to domain B).
- Parameters
dataset_name – the name of the dataset
batch_size – the size of each batch
root – the root path where to store the dataset
url – specify the URL from which the dataset is downloaded
transform_train – transform applied to train dataset
transform_valid – transform applied to valid dataset
aligned – if True, the images A and B will be considered aligned. If False, B will be randomly sampled from the list of available images in B
nb_workers – the number of workers to process the images
loading_batch_size – the number of images loaded by a worker
- Returns
a dataset
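The difference between aligned=True and aligned=False can be sketched as two pairing strategies. This is a hedged illustration of the behaviour described for the aligned parameter, not the library's implementation; `pair_images` and the file names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def pair_images(
    images_a: Sequence[str],
    images_b: Sequence[str],
    aligned: bool = False,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Pair each image of domain A with an image of domain B.

    aligned=True:  the i-th image of A is paired with the i-th image of B.
    aligned=False: B is sampled at random from the available B images.
    """
    rng = rng or random.Random(0)
    if aligned:
        if len(images_a) != len(images_b):
            raise ValueError("aligned pairing needs as many B images as A images")
        return list(zip(images_a, images_b))
    return [(a, rng.choice(images_b)) for a in images_a]


domain_a = ["a_0.jpg", "a_1.jpg", "a_2.jpg"]
domain_b = ["b_0.jpg", "b_1.jpg", "b_2.jpg"]
aligned_pairs = pair_images(domain_a, domain_b, aligned=True)
random_pairs = pair_images(domain_a, domain_b, aligned=False)
print(aligned_pairs[0])  # ('a_0.jpg', 'b_0.jpg')
```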
- trw.datasets.create_fake_symbols_datasset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, dataset_name: str, shapes_fn: ShapeCreator, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255) trw.basic_typing.Datasets ¶
Create artificial 2D datasets for classification and segmentation problems
This dataset randomly creates shapes with random locations and colors, together with a segmentation map.
- Parameters
nb_samples – the number of samples to be generated
image_shape – the shape of an image [height, width]
ratio_valid – the ratio of samples to be used for the validation split
nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included
global_scale_factor – the scale of the shapes to generate
noise_fn – a function to create noise in the image
shapes_fn – the function to create the different shapes
normalize_0_1 – if True, the data will be normalized (i.e., image & position will be in range [0..1])
max_classes – the total number of classes available
batch_size – the size of the batch for the dataset
background – the background value of the sample (before normalization if normalize_0_1 is True)
dataset_name – the name of the returned dataset
- Returns
a dict containing the dataset fake_symbols_2d with train and valid splits with features image, mask, classification, <shape_name>_center
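The features listed above (image, mask, classification, <shape_name>_center) can be illustrated with a single rectangle. The sketch below is an assumption-laden toy, not the library's generator: `draw_rectangle_sample` is hypothetical, the image is a nested list of RGB tuples, and the center is expressed in [0..1] as it would be with normalize_0_1=True.

```python
import random
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]


def draw_rectangle_sample(
    image_shape: Tuple[int, int] = (32, 32),
    figure_shape: Tuple[int, int] = (8, 12),
    class_id: int = 1,
    background: int = 255,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[Color]], List[List[int]], Tuple[float, float]]:
    """Draw one filled rectangle at a random location with a random color.

    Returns (image, mask, center): the mask holds `class_id` where the
    rectangle is and 0 elsewhere; the center is (y, x) in range [0..1].
    """
    rng = rng or random.Random(0)
    height, width = image_shape
    fig_h, fig_w = figure_shape

    # Random top-left corner such that the figure fits inside the image.
    top = rng.randint(0, height - fig_h)
    left = rng.randint(0, width - fig_w)
    color: Color = (rng.randint(0, 254), rng.randint(0, 254), rng.randint(0, 254))

    image = [[(background,) * 3 for _ in range(width)] for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for y in range(top, top + fig_h):
        for x in range(left, left + fig_w):
            image[y][x] = color
            mask[y][x] = class_id

    center = ((top + fig_h / 2) / height, (left + fig_w / 2) / width)
    return image, mask, center


image, mask, center = draw_rectangle_sample()
print(center)
```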
- trw.datasets._random_location(image_shape: numpy.ndarray, figure_shape) numpy.ndarray ¶
- trw.datasets._random_color() numpy.ndarray ¶
- trw.datasets._add_shape(imag, mask, shape, shapes_added, scale_factor, color, min_overlap_distance=30)¶
- trw.datasets._create_image(shape, objects, nb_classes_at_once=None, max_classes=None, background=255)¶
- Parameters
shape – the shape of an image [height, width]
nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included
max_classes – the maximum number of classes to be used. If None, all classes can be used, else a random subset
- Returns
image, mask and shape information
- trw.datasets._noisy(image: numpy.ndarray, noise_type: typing_extensions.Literal['gauss', 'poisson', 's&p', 'speckle']) numpy.ndarray ¶
- Parameters
image – a numpy image (float) in range [0..255]
noise_type – the type of noise. Must be one of:
* 'gauss' – Gaussian-distributed additive noise.
* 'poisson' – Poisson-distributed noise generated from the data.
* 's&p' – replaces random pixels with 0 or 1.
* 'speckle' – multiplicative noise using out = image + n*image, where n is uniform noise with specified mean & variance.
- Returns
noisy image
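The four noise types listed above can be sketched on a flat list of pixel values using only the standard library. This is an illustrative approximation, not the code behind _noisy: `add_noise`, `sample_poisson` and the parameters sigma, amount, mean and var are hypothetical names with arbitrary defaults.

```python
import math
import random
from typing import List, Optional


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson sample with Knuth's method (fine for lam <= 255)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def add_noise(
    pixels: List[float],
    noise_type: str,
    rng: Optional[random.Random] = None,
    sigma: float = 10.0,    # std of the Gaussian noise (assumed default)
    amount: float = 0.05,   # fraction of pixels replaced by 's&p' (assumed)
    mean: float = 0.0,      # mean of the speckle noise n (assumed)
    var: float = 0.01,      # variance of the speckle noise n (assumed)
) -> List[float]:
    rng = rng or random.Random(0)
    if noise_type == "gauss":
        # Additive noise: out = image + n, with n ~ N(0, sigma^2).
        return [p + rng.gauss(0.0, sigma) for p in pixels]
    if noise_type == "poisson":
        # Each output pixel is a Poisson sample whose rate is the input pixel.
        return [float(sample_poisson(p, rng)) for p in pixels]
    if noise_type == "s&p":
        # Replace a random subset of pixels with 0 or 1.
        return [rng.choice([0.0, 1.0]) if rng.random() < amount else p
                for p in pixels]
    if noise_type == "speckle":
        # out = image + n * image, n uniform with the requested mean/variance.
        half_width = math.sqrt(3.0 * var)
        return [p + rng.uniform(mean - half_width, mean + half_width) * p
                for p in pixels]
    raise ValueError(f"unknown noise_type: {noise_type!r}")


pixels = [0.0, 10.0, 100.0, 255.0]
for kind in ("gauss", "poisson", "s&p", "speckle"):
    print(kind, add_noise(pixels, kind))
```

A uniform variable on [mean - h, mean + h] has variance h²/3, which is why the half width is computed as sqrt(3 * var).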
- trw.datasets.create_fake_symbols_2d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_2d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_2d') trw.basic_typing.Datasets ¶
Create artificial 2D datasets for classification and segmentation problems
This dataset randomly creates shapes with random locations and colors, together with a segmentation map.
- Parameters
nb_samples – the number of samples to be generated
image_shape – the shape of an image [height, width]
ratio_valid – the ratio of samples to be used for the validation split
nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included
global_scale_factor – the scale of the shapes to generate
noise_fn – a function to create noise in the image
shapes_fn – the function to create the different shapes
normalize_0_1 – if True, the data will be normalized (i.e., image & position will be in range [0..1])
max_classes – the total number of classes available
batch_size – the size of the batch for the dataset
background – the background value of the sample (before normalization if normalize_0_1 is True)
dataset_name – the name of the returned dataset
- Returns
a dict containing the dataset fake_symbols_2d with train and valid splits with features image, mask, classification, <shape_name>_center
- trw.datasets.default_shapes_2d(global_scale_factor=1.0)¶
- trw.datasets.create_fake_symbols_3d_dataset(nb_samples: int, image_shape: trw.basic_typing.ShapeX, ratio_valid: float = 0.2, nb_classes_at_once: Optional[int] = None, global_scale_factor: float = 1.0, normalize_0_1: bool = True, noise_fn: Callable[[numpy.ndarray], numpy.ndarray] = functools.partial(_noisy, noise_type='poisson'), shapes_fn: trw.datasets.dataset_fake_symbols.ShapeCreator = default_shapes_3d, max_classes: Optional[int] = None, batch_size: int = 64, background: int = 255, dataset_name: str = 'fake_symbols_3d') trw.basic_typing.Datasets ¶
Create artificial 3D datasets for classification and segmentation problems
This dataset randomly creates shapes with random locations and colors, together with a segmentation map.
- Parameters
nb_samples – the number of samples to be generated
image_shape – the shape of an image [height, width]
ratio_valid – the ratio of samples to be used for the validation split
nb_classes_at_once – the number of classes to be included in each sample. If None, all the classes will be included
global_scale_factor – the scale of the shapes to generate
noise_fn – a function to create noise in the image
shapes_fn – the function to create the different shapes
normalize_0_1 – if True, the data will be normalized (i.e., image & position will be in range [0..1])
max_classes – the total number of classes available
batch_size – the size of the batch for the dataset
background – the background value of the sample (before normalization if normalize_0_1 is True)
dataset_name – the name of the returned dataset
- Returns
a dict containing the dataset fake_symbols_3d with train and valid splits with features image, mask, classification, <shape_name>_center
- trw.datasets.default_shapes_3d(global_scale_factor=1.0)¶