TensorFlow Datasets


In this reading notebook, we will take a look at the tensorflow-datasets library.

We have previously made use of the tf.keras.datasets package, which gave us access to a variety of useful datasets such as the IMDB movie dataset, the CIFAR-100 small image classification dataset, and the MNIST handwritten digits dataset.

The tensorflow-datasets library gives us another means of accessing a variety of useful datasets.

Installation

The tensorflow-datasets library is installed independently of TensorFlow itself. It can be installed using pip, by running the following command in a terminal (assuming that tensorflow has already been installed):

pip install tensorflow-datasets
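
After installing, it can be useful to confirm that the package is visible to the Python interpreter in use. The snippet below is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of tensorflow-datasets. Note that the pip package name uses a hyphen, while the importable module uses an underscore.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# The pip package is tensorflow-datasets; the module is tensorflow_datasets.
print("tensorflow_datasets installed:", is_installed("tensorflow_datasets"))
```

If this prints False, the package was probably installed into a different environment than the one running the notebook.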

Listing the Available Datasets

The list_builders function can be used to list the available datasets.

In [ ]:
# List available datasets

import tensorflow_datasets as tfds

tfds.list_builders()
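
The list returned by list_builders is long, so filtering it by name is often more practical than reading it in full. The following is a sketch under the assumption that the names come back as a list of strings; find_builders is a hypothetical helper, and the library is imported lazily so the filtering logic can also be tried on a plain list.

```python
def find_builders(substring, names=None):
    """Return the sorted dataset names that contain `substring`."""
    if names is None:
        # Imported lazily: only needed when no list of names is supplied.
        import tensorflow_datasets as tfds
        names = tfds.list_builders()
    return sorted(name for name in names if substring in name)


# Example with an explicit list of names (no download or library needed).
print(find_builders("mnist", names=["cifar10", "kmnist", "mnist"]))
```

Calling find_builders("mnist") without the names argument would search the names reported by the installed library.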

Loading a Dataset

Loading a particular dataset is straightforward: call the tfds.load function with the dataset name and any other keyword arguments. The example code below loads the kmnist dataset.

As stated on the documentation page, calling the function with split=None (the default) returns a dictionary of the splits; for kmnist, the keys are train and test.

In [ ]:
# Load the kmnist dataset

kmnist = tfds.load(name="kmnist", split=None)
kmnist_train = kmnist['train']
kmnist_test = kmnist['test']
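
The load function accepts further keyword arguments beyond name and split. As a rough sketch, the wrapper below requests a single split, asks for (image, label) tuples instead of dictionaries, and also returns the dataset metadata. The function name load_kmnist_train is hypothetical; split, as_supervised, with_info and data_dir are keyword arguments of tfds.load.

```python
def load_kmnist_train(data_dir=None):
    """Load only the KMNIST training split as (image, label) pairs."""
    # Imported lazily; calling this function requires tensorflow-datasets
    # and downloads the data on first use.
    import tensorflow_datasets as tfds

    dataset, info = tfds.load(
        name="kmnist",
        split="train",       # request one split instead of a dict of splits
        as_supervised=True,  # yield (image, label) tuples rather than dicts
        with_info=True,      # also return a DatasetInfo object
        data_dir=data_dir,   # None uses the default download location
    )
    return dataset, info
```

A call such as dataset, info = load_kmnist_train() followed by print(info.features) would show the image shape and the number of label classes.
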
In [ ]:
# View some examples from the dataset

import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf

fig, axes = plt.subplots(3, 3, figsize=(8, 8))
fig.subplots_adjust(hspace=0.2, wspace=0.1)

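for elem, ax in zip(kmnist_train, axes.flat):
    image = tf.squeeze(elem['image'])
    # Convert the scalar tensor to a plain integer so the caption shows
    # the class index rather than the tensor's repr
    label = elem['label'].numpy()
    
    # KMNIST labels are character classes, so the caption says "Label"
    ax.imshow(image, cmap='gray')
    ax.text(0.7, -0.12, f'Label = {label}', ha='right',
            transform=ax.transAxes, color='black')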
    ax.set_xticks([])
    ax.set_yticks([])