Adding weight initialisers

Weight and bias initialisers

In this reading we investigate different ways to initialise weights and biases in the layers of neural networks.

In [1]:
%matplotlib inline
import tensorflow as tf
import pandas as pd
print(tf.__version__)
2.0.0

Default weights and biases

In the models we have worked with so far, we have not specified the initial values of the weights and biases in each layer of our neural networks.

The default values of the weights and biases in TensorFlow depend on the type of layers we are using.

For example, in a Dense layer, the biases are set to zero (zeros) by default, while the weights are set according to glorot_uniform, the Glorot uniform initialiser.

The Glorot uniform initialiser draws the weights uniformly at random from the closed interval $[-c,c]$, where $$c = \sqrt{\frac{6}{n_{input}+n_{output}}}$$

and $n_{input}$ and $n_{output}$ are the number of inputs to, and outputs from, the layer respectively.
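
As a quick, hedged illustration (not part of the original reading; the layer sizes of 128 inputs and 64 outputs are assumed purely for this check), we can build a Dense layer with its default initialisers and confirm that the sampled weights lie within the Glorot bound and that the biases are zero:

import numpy as np

# Assumed sizes for this check only
n_input, n_output = 128, 64

# Glorot uniform bound: c = sqrt(6 / (n_input + n_output))
c = np.sqrt(6 / (n_input + n_output))

# Build a Dense layer with its default kernel and bias initialisers
# (tf was imported at the top of this notebook)
layer = tf.keras.layers.Dense(n_output)
layer.build((None, n_input))

print("Glorot bound c:", c)
print("Max |weight|:", np.abs(layer.kernel.numpy()).max())  # should not exceed c
print("All biases zero:", (layer.bias.numpy() == 0).all())  # biases default to zeros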

Initialising your own weights and biases

We often would like to initialise our own weights and biases, and TensorFlow makes this process quite straightforward.

When we construct a model in TensorFlow, each layer has the optional arguments kernel_initializer and bias_initializer, which are used to set the weights and biases respectively.

If a layer has no weights or biases (e.g. it is a max pooling layer), then trying to set either kernel_initializer or bias_initializer will throw an error.
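
As a brief sketch of this behaviour (not from the original reading), passing an initialiser argument to a pooling layer, which has no weights, raises a TypeError:

# A max pooling layer has no weights, so the initialiser keyword is not understood
try:
    tf.keras.layers.MaxPooling1D(pool_size=4, kernel_initializer='zeros')
except TypeError as e:
    print(e)  # the error reports the unrecognised keyword argument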

Let's see an example, which uses some of the different initialisations available in Keras.

In [10]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Conv1D, MaxPooling1D 
In [16]:
# Construct a model

model = Sequential([
    Conv1D(filters=16, kernel_size=3, input_shape=(128, 64),
           kernel_initializer='random_uniform', bias_initializer='zeros',
           activation='relu'),
    MaxPooling1D(pool_size=4),
    Flatten(),
    Dense(64, kernel_initializer='he_uniform', bias_initializer='ones',
          activation='relu'),
])

We can also instantiate initialisers in a slightly different manner, which allows us to set optional arguments of the initialisation method. Let's first view a summary of the model so far; the cell that follows it illustrates this approach.

In [17]:
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_2 (Conv1D)            (None, 126, 16)           3088      
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 31, 16)            0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 496)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 64)                31808     
=================================================================
Total params: 34,896
Trainable params: 34,896
Non-trainable params: 0
_________________________________________________________________
In [18]:
# Add some layers to our model

model.add(Dense(64, 
                kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05), 
                bias_initializer=tf.keras.initializers.Constant(value=0.4), 
                activation='relu'))

model.add(Dense(8, 
                kernel_initializer=tf.keras.initializers.Orthogonal(gain=1.0, seed=None), 
                bias_initializer=tf.keras.initializers.Constant(value=0.4), 
                activation='relu'))

Custom weight and bias initialisers

It is also possible to define your own weight and bias initialisers. A custom initialiser must take two arguments: the shape of the tensor to be initialised, and its dtype.

Here is a small example, which also shows how you can use a custom initialiser in a layer.

In [19]:
import tensorflow.keras.backend as K
In [20]:
# Define a custom initializer

def my_init(shape, dtype=None):
    return K.random_normal(shape, dtype=dtype)

model.add(Dense(64, kernel_initializer=my_init))

Let's take a look at the summary of our finalised model.

In [21]:
# Print the model summary

model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_2 (Conv1D)            (None, 126, 16)           3088      
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 31, 16)            0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 496)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 64)                31808     
_________________________________________________________________
dense_10 (Dense)             (None, 64)                4160      
_________________________________________________________________
dense_11 (Dense)             (None, 8)                 520       
_________________________________________________________________
dense_12 (Dense)             (None, 64)                576       
=================================================================
Total params: 40,152
Trainable params: 40,152
Non-trainable params: 0
_________________________________________________________________

Visualising the initialised weights and biases

Finally, we can see the effect of our initialisers on the weights and biases by plotting histograms of the resulting values. Compare these plots with the selected initialisers for each layer above.

In [22]:
import matplotlib.pyplot as plt
In [23]:
# Plot histograms of weight and bias values

fig, axes = plt.subplots(5, 2, figsize=(12,16))
fig.subplots_adjust(hspace=0.5, wspace=0.5)

# Filter out the pooling and flatten layers, which don't have any weights
weight_layers = [layer for layer in model.layers if len(layer.weights) > 0]

for i, layer in enumerate(weight_layers):
    for j in [0, 1]:
        axes[i, j].hist(layer.weights[j].numpy().flatten(), align='left')
        axes[i, j].set_title(layer.weights[j].name)