TensorFlow Datasets
In this reading notebook, we will take a look at the tensorflow-datasets library.
We have previously made use of the tf.keras.datasets package, which gave us access to a variety of useful datasets such as the IMDB movie dataset, the CIFAR-100 small image classification dataset, and the MNIST handwritten digits dataset.
The tensorflow-datasets library gives us another means of accessing a variety of useful datasets.
Installation
The tensorflow-datasets library is installed independently of TensorFlow itself. It can be installed using pip, by running the following command in a terminal (assuming that tensorflow has already been installed):
pip install tensorflow-datasets
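After installing, it can be useful to confirm which versions are present. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is our own, not part of either package.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of both packages mentioned above
for pkg in ("tensorflow", "tensorflow-datasets"):
    print(pkg, installed_version(pkg) or "not installed")
```

If tensorflow-datasets is reported as not installed, run the pip command above in the same environment that the notebook kernel uses.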
Listing the Available Datasets
The list_builders function can be used to list the available datasets.
# List available datasets
import tensorflow_datasets as tfds
tfds.list_builders()
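Since list_builders returns a long list of names, it can help to filter it. The sketch below is one possible way to do so; find_builders is a hypothetical helper of our own, and the short fallback list is only an illustrative sample used when the library is not installed.

```python
def find_builders(names, keyword):
    """Return the sorted dataset names that contain `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(name for name in names if keyword in name.lower())


try:
    import tensorflow_datasets as tfds
    available = tfds.list_builders()
except ImportError:
    # Illustrative sample only, so the sketch still runs without the library
    available = ["cifar10", "imdb_reviews", "kmnist", "mnist"]

# Show every available dataset whose name mentions "mnist"
print(find_builders(available, "mnist"))
```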
Loading a Dataset
Loading a particular dataset is straightforward; simply use the load function, specifying the name and any other keyword arguments. In the example code below, we demonstrate how to load the kmnist dataset using this function.
As stated on the documentation page, running the function with split=None (the default) returns a dictionary of the splits, with the keys test and train.
# Load the kmnist dataset
kmnist = tfds.load(name="kmnist", split=None)
kmnist_train = kmnist['train']
kmnist_test = kmnist['test']
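The load call above can also take other values for split and further keyword arguments. As a hedged sketch, the code below carves a validation set out of the train split using the split slicing syntax and passes as_supervised=True so that elements are (image, label) tuples rather than dictionaries. The helper names split_spec and load_train_val and the 80/20 ratio are our own choices for illustration.

```python
def split_spec(percent_train=80):
    """Build a (train, validation) pair of split strings in TFDS slicing syntax."""
    if not 0 < percent_train < 100:
        raise ValueError("percent_train must be strictly between 0 and 100")
    return f"train[:{percent_train}%]", f"train[{percent_train}%:]"


def load_train_val(name="kmnist", percent_train=80):
    """Load a train/validation pair taken from the 'train' split.

    Calling this downloads the dataset on first use.
    """
    # Imported lazily so that split_spec can be used without the library
    import tensorflow_datasets as tfds

    train_split, val_split = split_spec(percent_train)
    # Passing a list of splits returns a list of datasets in the same order
    train_ds, val_ds = tfds.load(
        name=name, split=[train_split, val_split], as_supervised=True
    )
    return train_ds, val_ds


# Inspect the split strings that would be passed to tfds.load
print(split_spec(80))
```

Calling load_train_val() then returns two tf.data.Dataset objects. Note that the plotting code in this notebook indexes elements by the keys image and label, so it expects the dictionary form returned without as_supervised.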
# View some examples from the dataset
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf

fig, axes = plt.subplots(3, 3, figsize=(8, 8))
fig.subplots_adjust(hspace=0.2, wspace=0.1)

# zip stops once the nine axes are used, so only nine examples are read
for elem, ax in zip(kmnist_train, axes.flat):
    image = tf.squeeze(elem['image'])  # drop the channel axis for imshow
    label = elem['label'].numpy()      # convert the scalar tensor to a plain integer
    ax.imshow(image, cmap='gray')
    ax.text(0.7, -0.12, f'Label = {label}', ha='right',
            transform=ax.transAxes, color='black')
    ax.set_xticks([])
    ax.set_yticks([])