Training functions

Now it’s time to write the functions that will be called at every turn of the training loop. We need two of them: one to train the model and another to validate its accuracy, which we will also use to test it.

Important

It’s possible to use two different functions to evaluate the model during the validation and test phases, but this isn’t recommended, since using the same evaluation method lets us compute a consistent confidence score for our network.

Train

All we have to do here is move our samples (now turned into tensors, remember) onto a device for computation, compute a loss and backpropagate the gradient through our network. In PyTorch we can do that easily with one loop and a few instructions. We write this function inside training/train.py.

def train(model, loader, f_loss, optimizer, device):

    # Switch the model to "train mode"
    model.train()

    for inputs, targets in loader:
        # Move the batch onto the same device as the model
        inputs, targets = inputs.to(device), targets.to(device)

        # Compute the forward pass through the network up to the loss
        outputs = model(inputs)
        loss = f_loss(outputs, targets)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
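
As a quick sanity check, here is one way this function could be wired up. Everything below (the toy model, the dummy dataset and the hyperparameters) is illustrative, standing in for your own setup:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from training.train import train

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy classifier and random data standing in for the real dataset
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
dataset = TensorDataset(torch.randn(256, 1, 28, 28),
                        torch.randint(0, 10, (256,)))
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

f_loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One call performs a full pass over the training set
train(model, train_loader, f_loss, optimizer, device)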

Wait … where is my model?

If you go further in this tutorial, you will notice that we never give the loss object a reference to the model or its parameters. Moreover, we use two unexplained method calls: model.train() and optimizer.zero_grad(). To get a good understanding of all of this, we must talk about Autograd.

Note

This part is important only if you keep using PyTorch in your project; it only concerns an implementation detail.

Autograd

Autograd is PyTorch’s automatic differentiation module, which allows us to compute the gradients of tensors. As operations are applied to tensors, autograd records them in a graph of functions (exposed through each result tensor’s grad_fn attribute), building an acyclic graph where the inputs are the leaves and the outputs the roots. Every time a tensor whose requires_grad flag is set (which is the case for a model’s parameters; plain tensors have it disabled by default) goes through an operation, the graph is extended. That is why we have to move our tensors to the device before computation. The optimizer holds a reference to the model’s parameters, so when we call optimizer.zero_grad(), all the previously computed gradients are reset to 0, in order not to accumulate gradients across iterations. Similarly, the loss is the root of the graph: calling its backward() method backpropagates the gradients through all the tensors previously used to compute it. You can take a look at this good example if you want to know more about Autograd and grad_fn.
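
To make this concrete, here is a minimal, self-contained sketch of autograd at work on plain tensors (nothing below is specific to our project):

import torch

# On plain tensors, requires_grad must be set explicitly;
# parameters created through nn.Module already have it set to True
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Each operation extends the graph: the result keeps a reference
# to the operation that created it through its grad_fn attribute
y = (x ** 2).sum()
print(y.grad_fn)  # <SumBackward0 object at ...>

# backward() walks the graph from the root (y) to the leaves (x)
# and accumulates dy/dx = 2x into x.grad
y.backward()
print(x.grad)  # tensor([4., 6.])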

Note

model.train() instructs the model to activate its training-specific layers, such as BatchNorm or Dropout, which are useful for training but behave differently at inference. This mode is activated by default, but later we will deactivate it inside our evaluation loop, so we must ensure that train mode is switched back on before the train loop.
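
The effect of this switch is easy to observe on a Dropout layer, for example (a standalone sketch, unrelated to our project’s code):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()    # train mode: about half the values are zeroed, the
print(drop(x))  # survivors scaled by 1/(1-p), e.g. tensor([2., 0., 2., 0., 2., 2., 0., 0.])

drop.eval()     # eval mode: Dropout is the identity
print(drop(x))  # tensor([1., 1., 1., 1., 1., 1., 1., 1.])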

Eval

For the evaluation function, we will keep a similar structure. But remember the note just above: when doing inference, we want to avoid useless computation. So we will deactivate the model’s training mode and tell the autograd module not to track the upcoming operations nor compute gradients for our tensors. To do that, we write our piece of code under the torch.no_grad() context.

We write this function inside training/evaluation.py.

import torch


def test(model, loader, f_loss, device):

    # Switch the model to "eval mode"
    model.eval()
    with torch.no_grad():
        N = 0
        tot_loss, correct = 0.0, 0.0
        for i, (inputs, targets) in enumerate(loader):

            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)

            # We accumulate the exact number of processed samples
            N += inputs.shape[0]

            # We accumulate the loss; the multiplication by
            # inputs.shape[0] is needed because our loss criterion
            # averages over the samples of its batch
            tot_loss += inputs.shape[0] * f_loss(outputs, targets).item()

            predicted_targets = outputs.argmax(dim=1)
            correct += (predicted_targets == targets).sum().item()

        return tot_loss / N, correct / N

When we evaluate a model, we want to know more than just the loss. Here we also compute the accuracy.
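
Putting the two functions together, a typical loop over epochs could look like the sketch below; the number of epochs and the loader names are illustrative, and model, f_loss, optimizer and device are assumed to be set up as in the earlier sketch:

from training.train import train
from training.evaluation import test

for epoch in range(10):
    train(model, train_loader, f_loss, optimizer, device)
    val_loss, val_acc = test(model, valid_loader, f_loss, device)
    print(f"Epoch {epoch}: val loss {val_loss:.4f}, val accuracy {val_acc:.2%}")

# As noted above, the very same function scores the final model on the test set
test_loss, test_acc = test(model, test_loader, f_loss, device)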

Tip

For more clarity while the training is running, we can add a progress bar as below, using the built-in JAW progress_bar helper.

from jaw.utils.progress_bar import progress_bar

...
for i, (inputs, targets) in enumerate(loader):

    ...
    progress_bar(i, len(loader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                 % (tot_loss/N, 100.*correct/N, correct, N))

return tot_loss / N, correct / N