Error while running evaluate function to calculate the loss on validation set before training

Hi team,
I have verified that the model is instantiated, and val_loader is created, yet I get this error. Any help is appreciated. TIA.

Do the val_percent (0.15) and the batch size (126) have anything to do with the error?

My best guess is that the error originates somewhere in the model’s forward method, or that you initialize the datasets incorrectly.

Hi @Sebgolos
Thanks for looking into it!

I was able to rectify the mistake I made. It was in how I defined output_cols (screenshots omitted).

Instead of defining output_cols as a list, I had created the string ‘charges’. That in turn messed up the output_size variable.

Instead of getting (5, 1), I was getting (5, 7), where 7 was len(‘charges’). How silly of me!
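
For anyone who hits the same thing, here’s a minimal sketch of the bug (hypothetical names, following the assignment’s convention, not my exact cells):

```python
output_cols = 'charges'          # the bug: a string, i.e. a sequence of 7 characters
output_size = len(output_cols)   # len('charges') == 7, so the model expected 7 targets

output_cols = ['charges']        # the fix: a one-element list of column names
output_size = len(output_cols)   # == 1, one target column as intended
```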

Anyway, thank you for your time and response! Have a good one!!

Yeah, I’d just been looking at the older versions of the notebook and found this about a minute ago :smiley: Since you fixed it yourself, there’s no need to point it out anymore :stuck_out_tongue:

Although I would change the loss function to L1 loss and then try using the Adam optimizer.
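
Roughly something like this (a sketch only; the stand-in model and the 6 input features are my assumptions, adjust to your notebook):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(6, 1)   # stand-in for your InsuranceModel (6 features assumed)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# one training step with L1 loss (mean absolute error)
inputs, targets = torch.randn(8, 6), torch.randn(8, 1)
loss = F.l1_loss(model(inputs), targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```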

Thank you for the suggestion @Sebgolos!!

Ah, I’ve been wondering why my ‘val_loss’ is so high! Here’s the answer, I suppose. I will make the changes.

Appreciate it, Seb!

The choice of L1 loss here is mainly about avoiding NaN: with targets as large as insurance charges, the squared errors in MSE can grow huge and destabilize training, while L1 keeps the loss on the scale of the data.
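
A toy comparison of the two losses at roughly the scale of insurance charges (made-up numbers):

```python
import torch
import torch.nn.functional as F

preds   = torch.tensor([ 1000.0, 50000.0])
targets = torch.tensor([12000.0, 40000.0])

print(F.mse_loss(preds, targets))  # ~1.1e8: squaring blows large errors up
print(F.l1_loss(preds, targets))   # 10500.0: stays on the scale of the data
```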

Also, val_loss is just a number used mainly by the training phase; what’s important is the predictions.
I see people fighting over who gets the lower loss, which isn’t that important overall :stuck_out_tongue:

If you had, for example, some magic-awesome-overlord loss function that produced errors in the hundreds of thousands but still forced the model to learn its weights so well that the predictions got very close to the targets, then it would mean nothing that the final loss was, let’s say, 967494.54.

Thank you for your great insights, @Sebgolos!

May I ask your opinion on the hyperparameters used to train the model? What would be appropriate numbers for the epochs and the learning rate? In the forum, I see people using anywhere from 1000 to 30000 epochs, and learning rates from 1e-2 to 1e-7; some even used 5e-7.

Also, now that we are iterating from history1 to history5, do you suggest starting from a higher number and moving to the lowest, or vice versa, for both the epochs and the learning rates?

Thank you!

As for the learning rate, I would apply a decreasing one here: 1e-1 for the first training, 1e-2 for the second, etc. If you are adventurous you might try a scheduler, so it changes the learning rate as the training progresses. But this would be sort of like shooting a fly with a cannon :stuck_out_tongue:
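
If you do go the scheduler route, a minimal sketch with PyTorch’s built-in lr_scheduler (the placeholder model and loss are just there to show the mechanics):

```python
import torch
import torch.nn as nn

model = nn.Linear(6, 1)  # stand-in model, 6 features assumed
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

for epoch in range(1000):
    loss = model(torch.randn(8, 6)).pow(2).mean()  # placeholder training step
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # lr shrinks by 1% per epoch: 0.1, 0.099, 0.098, ...

print(optimizer.param_groups[0]['lr'])  # ~0.1 * 0.99**1000 ≈ 4.3e-6
```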

The number of epochs - no idea, just whatever seems good for the model. In my case I’ve used 1000 epochs for each training phase.

I’m not sure I understand what you mean by iterating over the history variables. history1 is the first training phase, history2 the second one. And I would plot them in this order :stuck_out_tongue:

I thought I was iterating through the history variables, since the val_loss seemed to be carried forward from history1 to history2 when initiated with different epochs and learning rates.

I just got started with PyTorch; I guess it will take some time for me to make sense of this black box (that’s how it feels, as of now)!

Also, I took the liberty of looking into your notebook for assignment 2. I see that you’ve used nn.Sequential() while creating the InsuranceModel class. I looked up the PyTorch documentation, which says it’s a sequential container. Is this what they call ‘adding a layer’? Is it similar to Pipelines in scikit-learn?

TIA!

Well, I’ve been playing around in the later versions of the notebook. I’ve used nn.Sequential to get a 2-layer model, but with no activation function, to keep it a linear regression (a composition of two linear layers is still a linear transformation). No idea about scikit-learn, though, so I can’t say whether it’s the same.
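
Schematically it looks like this (a sketch from memory, layer sizes assumed, not my exact notebook code):

```python
import torch.nn as nn

class InsuranceModel(nn.Module):
    def __init__(self, input_size=6, output_size=1):
        super().__init__()
        # Two Linear layers back to back with NO activation in between,
        # so the whole network is still a single linear transformation.
        self.network = nn.Sequential(
            nn.Linear(input_size, 16),
            nn.Linear(16, output_size),
        )

    def forward(self, xb):
        return self.network(xb)
```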

The loss seems to be carried over from the previous training because you actually train the same model, just with a different learning rate and/or number of epochs. If you ran the same cell multiple times, the loss would still pick up from where the previous training phase ended (the model “stays” inside the notebook, and if any cell updates it, the changes remain).
I’ve seen people do this when optimizing the model (running different cells multiple times to train it without changing the hyperparams) before deciding on the final epochs and learning rates.
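
A toy demonstration of that carry-over (a stand-in fit function, not the course’s actual one):

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)

def fit(epochs, lr):
    # Stand-in for the course's fit function: it trains `model` in place.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = (model(torch.ones(4, 1)) - 2.0).pow(2).mean()
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item()

print(fit(100, 1e-1))  # phase 1
print(fit(100, 1e-2))  # phase 2 starts from phase 1's weights,
                       # so its loss continues where phase 1 ended
```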

Hey @Sebgolos

I’m trying to visualize how val_loss changes with the number of epochs. I see that history1 is a list of dictionaries, but I have no clue how to plot it. TIA!

Well, you would have to turn this into a list of val_loss values. You can then iterate over this list with no problem.

I would try something like this (written from memory, no responsibility for mistakes :smiley:):
val_losses = [h['val_loss'] for h in history1]
In theory val_losses should now contain only the losses (using a new name so history1 itself stays intact).

You could also modify the fit function instead, so that it appends the loss value instead of a dictionary.

When you have the list representing losses, you can just plot it with matplotlib :stuck_out_tongue:
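
Put together, the plotting part could look like this (made-up history values, just so it runs):

```python
import matplotlib.pyplot as plt

# example history, shaped like what fit returns (made-up numbers)
history1 = [{'val_loss': 9000.0}, {'val_loss': 7500.0}, {'val_loss': 6800.0}]

val_losses = [h['val_loss'] for h in history1]
plt.plot(val_losses)
plt.xlabel('epoch')
plt.ylabel('val_loss')
plt.show()
```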

Thank you @Sebgolos

Worked without a flaw! :grinning: