Module pysimt.datasets

A dataset in pysimt inherits from torch.utils.data.Dataset and is designed to read and expose a specific type of corpus.

  • A dataset class name should end with the Dataset suffix.
  • The __init__ method should include **kwargs for other possible arguments.
  • The __getitem__ and __len__ methods should be implemented.
  • A static method to_torch(batch, **kwargs) is called automatically to prepare the batch tensor for the forward pass.

See pysimt.datasets.TextDataset for an example of how to implement a new dataset.
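The conventions listed above can be illustrated with a minimal sketch. The class below is hypothetical and not part of pysimt: the name ToyLinesDataset, the pad_idx argument, and the time-major (max_len, batch_size) layout returned by to_torch are all assumptions chosen for illustration. The real TextDataset may differ in every one of these details.

```python
import torch
from torch.utils.data import Dataset


class ToyLinesDataset(Dataset):
    """Hypothetical dataset over already-tokenized sentences (lists of ints).

    Not part of pysimt; it only mirrors the conventions described above.
    """

    def __init__(self, sentences, pad_idx=0, **kwargs):
        # **kwargs absorbs arguments intended for other dataset types.
        self.sentences = [list(s) for s in sentences]
        self.pad_idx = pad_idx

    def __getitem__(self, idx):
        # Return one sample: a list of token indices.
        return self.sentences[idx]

    def __len__(self):
        return len(self.sentences)

    @staticmethod
    def to_torch(batch, pad_idx=0, **kwargs):
        # Pad every sample to the longest one in the batch, then return a
        # LongTensor of shape (max_len, batch_size). The time-major layout
        # is an assumption of this sketch.
        max_len = max(len(s) for s in batch)
        padded = [s + [pad_idx] * (max_len - len(s)) for s in batch]
        return torch.tensor(padded, dtype=torch.long).t()
```

Calling ToyLinesDataset.to_torch([[4, 5, 6], [7, 8]]) would pad the second sample with pad_idx and yield a tensor with three rows and two columns.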

"""
A dataset in `pysimt` inherits from `torch.utils.data.Dataset` and is designed
to read and expose a specific type of corpus.

* A dataset class name should end with the `Dataset` suffix.
* The `__init__` method should include `**kwargs` for other possible arguments.
* The `__getitem__` and `__len__` methods should be implemented.
* A static method `to_torch(batch, **kwargs)` is automatically used when
  preparing the batch tensor during forward-pass.

Please see `pysimt.datasets.TextDataset` to get an idea on how to implement
a new dataset.

"""

from .numpy import NumpyDataset
from .text import TextDataset
from .objdet import ObjectDetectionsDataset


# Selector function: maps a dataset type name to its class
def get_dataset(type_):
    return {
        'numpy': NumpyDataset,
        'text': TextDataset,
        'objectdetections': ObjectDetectionsDataset,
    }[type_.lower()]


# Should always be at the end
from .multimodal import MultimodalDataset   # noqa

Sub-modules

pysimt.datasets.base
pysimt.datasets.collate
pysimt.datasets.imagefolder
pysimt.datasets.kaldi
pysimt.datasets.multimodal
pysimt.datasets.numpy
pysimt.datasets.objdet
pysimt.datasets.text

Functions

def get_dataset(type_)
def get_dataset(type_):
    return {
        'numpy': NumpyDataset,
        'text': TextDataset,
        'objectdetections': ObjectDetectionsDataset,
    }[type_.lower()]
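
The lookup above lowercases its argument before indexing the dictionary, so it is case-insensitive and raises KeyError for any name that is not a key. The sketch below reproduces that behaviour with empty stand-in classes so that it runs without pysimt installed; the stand-ins are placeholders, not the real dataset classes.

```python
# Stand-in classes so the sketch is self-contained; in pysimt these are
# the real dataset classes imported at the top of the module.
class NumpyDataset:
    pass


class TextDataset:
    pass


class ObjectDetectionsDataset:
    pass


def get_dataset(type_):
    # Same body as the function documented above.
    return {
        'numpy': NumpyDataset,
        'text': TextDataset,
        'objectdetections': ObjectDetectionsDataset,
    }[type_.lower()]


# The lookup ignores case.
text_cls = get_dataset('Text')

# Names that are not keys of the mapping raise KeyError.
try:
    get_dataset('multimodal')
    unknown_raised = False
except KeyError:
    unknown_raised = True
```

Note that 'multimodal' is not a key of the mapping, so MultimodalDataset cannot be obtained through get_dataset.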