I was reviewing the train function in the quickstart-pytorch example and noticed a logical issue in how the average training loss is calculated.

In the current implementation (`task.py`), `running_loss` is initialized outside the epoch loop and accumulates the loss across all epochs. However, when calculating the average at the end, it is divided only by `len(trainloader)` (the number of batches in a single epoch), not by the total number of batches processed.

While the default configuration uses `epochs = 1` (making the current calculation correct), if a user increases the `epochs` hyperparameter, the reported `avg_trainloss` will be inflated by a factor of `epochs`.
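To make the issue concrete, here is a minimal, torch-free sketch of the averaging logic. The per-batch loss values and the `avg_loss` helper are hypothetical stand-ins for what the example's `train` function does; the point is only that dividing the accumulated `running_loss` by `len(trainloader)` alone over-reports the average once `epochs > 1`:

```python
def avg_loss(per_batch_losses, epochs, fix=False):
    """Mimic the example's loop: running_loss accumulates across ALL epochs."""
    running_loss = 0.0
    for _ in range(epochs):
        for loss in per_batch_losses:  # one pass over the "trainloader"
            running_loss += loss
    num_batches = len(per_batch_losses)  # len(trainloader)
    # Buggy version divides by batches in ONE epoch; the fix divides by
    # the total number of batches actually processed.
    divisor = num_batches * epochs if fix else num_batches
    return running_loss / divisor

batch_losses = [1.0, 2.0, 3.0]  # hypothetical losses; len(trainloader) == 3
print(avg_loss(batch_losses, epochs=2))            # 4.0 -> inflated by 2x
print(avg_loss(batch_losses, epochs=2, fix=True))  # 2.0 -> true mean loss
```

With `epochs = 1` both divisors coincide, which is why the default configuration masks the bug.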


Hi @lefthook, thanks for posting this here!

This is definitely something we should take a look at. Could you open an issue for this on our GitHub and reference the relevant examples?

I noticed the same thing and didn't see an issue yet, so I opened one and included a PR in case it helps.


Thank you, @junsimons!
