Medical imaging is one of the major applications of machine learning, and the clinically relevant task of metastatic breast cancer detection can be framed as a straightforward binary image classification task. In this post we design a custom CNN model for cancer detection using TensorFlow/Keras. In a following post we will train and evaluate it, alongside predefined architectures, on the PatchCamelyon dataset.

Architecture

We saw in a previous post that most modern CNN models trace their heritage back to the VGG architecture. It remains a good idea to follow the general architectural principles of the VGG models as a starting point, because the modular structure of the architecture is easy to understand and implement.

VGG Blocks

The basic building block of VGG is a sequence of (i) a convolutional layer with small 3×3 filters, (ii) a nonlinearity such as a ReLU, and (iii) a pooling layer. These blocks are repeated, with the number of filters increasing with the depth of the network, for example 32, 64, 128. Padding is used on the convolutional layers so that the height and width of the output feature maps match those of the inputs. Together these blocks form the feature detector part of the model. A three-block VGG-style architecture can be defined in Keras as follows:

# VGG-Block 1:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

# VGG-Block 2:
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

# VGG-Block 3:
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
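
To see what these three blocks do to the input shape, the sketch below traces the feature-map dimensions in plain Python. It assumes a 96×96 RGB input patch (an assumption for illustration, the input size is not stated above), "same" padding on every convolution and 2×2 max pooling with a stride of 2. The helper name trace_vgg_shapes is hypothetical.

```python
def trace_vgg_shapes(height, width, channels, filters=(32, 64, 128), pool=2):
    """Return the feature-map shape at the input and after each conv + pool block.

    Assumes "same" padding (the convolution keeps height and width) and
    non-overlapping pool x pool max pooling, which floor-divides each
    spatial dimension by the pool factor.
    """
    shapes = [(height, width, channels)]
    for n_filters in filters:
        # The convolution only changes the channel count;
        # pooling then shrinks height and width.
        height, width = height // pool, width // pool
        shapes.append((height, width, n_filters))
    return shapes


# Assumed input: a 96x96 RGB patch (not stated in this post).
shapes = trace_vgg_shapes(96, 96, 3)
for shape in shapes:
    print(shape)

# Number of values a Flatten layer would hand to the classification head.
h, w, c = shapes[-1]
flattened_size = h * w * c
print(flattened_size)
```

Under these assumptions the spatial size halves in every block while the channel count grows, so the last feature map is 12×12×128.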

This must be coupled with a classification head that interprets the features and makes a class prediction. First, the feature maps output by the feature extraction part of the model must be flattened. We can then interpret them with one or more fully connected layers and output a prediction. For binary classification with the sigmoid activation function, the output layer needs only a single node, whose output is the predicted probability of the positive class.

# Head:
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(1))  # single sigmoid unit for binary classification
model.add(Activation("sigmoid"))
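
As a rough illustration of what a sigmoid output unit does, the plain-Python sketch below maps a raw score (logit) to a probability and then to a class label. The function names and the 0.5 decision threshold are assumptions made for this example; they are not taken from the model code.

```python
import math


def sigmoid(logit):
    """Map a raw score (logit) to a value in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def predict_label(logit, threshold=0.5):
    """Return 1 (positive class) if the probability reaches the threshold, else 0."""
    return int(sigmoid(logit) >= threshold)


# Negative logits fall below 0.5, positive logits above it.
for logit in (-2.0, 0.0, 3.0):
    print(logit, round(sigmoid(logit), 3), predict_label(logit))
```

In practice the threshold is a tunable choice. For a screening task one might lower it to favour sensitivity over specificity.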

Dropout Regularization

Dropout is a technique that randomly drops nodes out of the network during training in order to regularize it and reduce overfitting. It has a regularizing effect because the remaining nodes must adapt to compensate for the removed ones. Dropout is added to a model as Dropout layers, with the fraction of nodes to drop given as a parameter. No fixed rules exist for where to place Dropout; in this case we add a Dropout layer after each max pooling layer, with a dropout rate of 25% after the first two blocks and 50% after the third.

# VGG-Block 1:
model.add(Conv2D(32, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# VGG-Block 2:
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# VGG-Block 3:
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

# Head:
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(1))  # single sigmoid unit for binary classification
model.add(Activation("sigmoid"))
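
To make the mechanics concrete, here is a minimal plain-Python sketch of inverted dropout, the variant in which the surviving activations are scaled by 1 / (1 - rate) during training (the Keras Dropout layer documents the same rescaling). The function name, the seeded random generator and the toy activations are assumptions for illustration only.

```python
import random


def dropout(values, rate, training=True, rng=None):
    """Inverted dropout on a list of activations.

    During training each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), so the expected sum is unchanged.
    At inference time the values pass through untouched.
    """
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


activations = [1.0] * 8
train_out = dropout(activations, rate=0.25, rng=random.Random(0))
infer_out = dropout(activations, rate=0.25, training=False)
print(train_out)  # a mix of zeros and rescaled values
print(infer_out)  # identical to the input
```

Because the rescaling happens at training time, nothing needs to change at inference time: the layer simply becomes the identity.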

Batch Normalization

Getting a network to converge in a reasonable amount of time can be difficult. Batch normalization is a popular technique that consistently accelerates the convergence of deep networks. In each training iteration it normalizes the inputs of a layer by subtracting their mean and dividing by their standard deviation, both estimated from the statistics of the current minibatch. It then applies a learned scale coefficient and a learned offset.

# VGG-Block 1:
model.add(Conv2D(32, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# VGG-Block 2:
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# VGG-Block 3:
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

# Head:
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(1))  # single sigmoid unit for binary classification
model.add(Activation("sigmoid"))
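
The training-time computation can be sketched in a few lines of plain Python for a single feature across one minibatch. The names gamma and beta stand in for the learned scale and offset, and the small eps constant guards against division by zero; all three names and the toy minibatch are assumptions for this illustration.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across a minibatch, then scale and shift it."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta for x in batch]


minibatch = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(minibatch)
print([round(x, 3) for x in normalized])

# With gamma and beta the layer can undo or reshape the normalization.
shifted = batch_norm(minibatch, gamma=2.0, beta=1.0)
print([round(x, 3) for x in shifted])
```

At inference time Keras replaces the minibatch statistics with moving averages collected during training, which this sketch does not model.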

Depthwise Separable Convolution

A spatially separable convolution divides a kernel into two smaller kernels, e.g. a 3×3 kernel into a 3×1 and a 1×3 kernel. A depthwise separable convolution also deals with the depth dimension (the channels, e.g. RGB) and not just the spatial dimensions: it first filters each input channel separately with its own spatial kernel (the depthwise step) and then mixes the channels with a 1×1 convolution (the pointwise step). This greatly reduces the number of multiplications and the number of trainable parameters, making it an attractive option for smaller datasets.

# VGG-Block 1:
model.add(SeparableConv2D(32, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# VGG-Block 2:
model.add(SeparableConv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# VGG-Block 3:
model.add(SeparableConv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

# Head:
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(1))  # single sigmoid unit for binary classification
model.add(Activation("sigmoid"))
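
The parameter savings can be estimated with simple arithmetic. The sketch below compares a standard 3×3 convolution with a depthwise separable one for the channel counts of the three blocks, assuming a 3-channel RGB input, a depth multiplier of 1 and one bias per output channel. The helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus biases of a standard convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


def separable_conv2d_params(kernel, in_channels, out_channels):
    """Depthwise kernel + 1x1 pointwise kernel + biases (depth multiplier of 1)."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise + out_channels


# (input channels, output channels) of the three blocks, assuming RGB input.
for in_ch, out_ch in [(3, 32), (32, 64), (64, 128)]:
    standard = conv2d_params(3, in_ch, out_ch)
    separable = separable_conv2d_params(3, in_ch, out_ch)
    print(in_ch, out_ch, standard, separable)
```

For the second block, for instance, this gives 18496 parameters for the standard convolution against 2400 for the separable one, and the gap widens as the channel counts grow.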

Applying the model

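Although the architecture was described above using the Keras Sequential API, it was actually battle tested as a subclassed Keras model, i.e. a class deriving from tf.keras.Model. The class version differs slightly from the snippets above: it stacks two convolutions in block 2 and three in block 3, uses a 25% dropout rate after every block, and applies a 50% dropout after the flatten layer. For completeness the code is pasted below; it can also be found in the project it is part of on GitHub.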

import tensorflow as tf
from tensorflow.keras.layers import (
    Activation,
    BatchNormalization,
    Dense,
    Dropout,
    Flatten,
    MaxPooling2D,
    SeparableConv2D,
)

class CancerNet(tf.keras.Model):

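    def __init__(self, classes=1, **kwargs):
        # Forward extra keyword arguments (e.g. name) to tf.keras.Model.
        super().__init__(**kwargs)

        # Channels-last data: the channel axis is the last one.
        chanDim = -1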

        # VGG-Block 1:
        self.block1_conv = SeparableConv2D(32, (3, 3), padding="same")
        self.block1_act = Activation("relu")
        self.block1_norm = BatchNormalization(axis=chanDim)
        self.block1_pool = MaxPooling2D(pool_size=(2, 2))
        self.block1_drop = Dropout(0.25)

        # VGG-Block 2:
        self.block2_conv1 = SeparableConv2D(64, (3, 3), padding="same")
        self.block2_act1 = Activation("relu")
        self.block2_norm1 = BatchNormalization(axis=chanDim)
        self.block2_conv2 = SeparableConv2D(64, (3, 3), padding="same")
        self.block2_act2 = Activation("relu")
        self.block2_norm2 = BatchNormalization(axis=chanDim)
        self.block2_pool = MaxPooling2D(pool_size=(2, 2))
        self.block2_drop = Dropout(0.25)

        # VGG-Block 3:
        self.block3_conv1 = SeparableConv2D(128, (3, 3), padding="same")
        self.block3_act1 = Activation("relu")
        self.block3_norm1 = BatchNormalization(axis=chanDim)
        self.block3_conv2 = SeparableConv2D(128, (3, 3), padding="same")
        self.block3_act2 = Activation("relu")
        self.block3_norm2 = BatchNormalization(axis=chanDim)
        self.block3_conv3 = SeparableConv2D(128, (3, 3), padding="same")
        self.block3_act3 = Activation("relu")
        self.block3_norm3 = BatchNormalization(axis=chanDim)
        self.block3_pool = MaxPooling2D(pool_size=(2, 2))
        self.block3_drop = Dropout(0.25)

        # Head:
        self.flatten = Flatten()
        self.dropout = Dropout(0.5)
        self.dense = Dense(256, activation='relu')
        self.block5_dense = Dense(classes)
        self.block5_classifier = Activation("sigmoid")


    def call(self, inputs, **kwargs):
        # Forward pass: three VGG-style blocks followed by the dense head.

        # VGG-Block 1:
        x = self.block1_conv(inputs)
        x = self.block1_act(x)
        x = self.block1_norm(x)
        x = self.block1_pool(x)
        x = self.block1_drop(x)

        # VGG-Block 2:
        x = self.block2_conv1(x)
        x = self.block2_act1(x)
        x = self.block2_norm1(x)
        x = self.block2_conv2(x)
        x = self.block2_act2(x)
        x = self.block2_norm2(x)
        x = self.block2_pool(x)
        x = self.block2_drop(x)

        # VGG-Block 3:
        x = self.block3_conv1(x)
        x = self.block3_act1(x)
        x = self.block3_norm1(x)
        x = self.block3_conv2(x)
        x = self.block3_act2(x)
        x = self.block3_norm2(x)
        x = self.block3_conv3(x)
        x = self.block3_act3(x)
        x = self.block3_norm3(x)
        x = self.block3_pool(x)
        x = self.block3_drop(x)

        # Head:
        x = self.flatten(x)
        x = self.dropout(x)
        x = self.dense(x)
        x = self.block5_dense(x)
        return self.block5_classifier(x)