The dataset is quite small to begin with (the “base” examples have been augmented to pad it out).
You’re creating a validation set out of that augmented dataset, splitting it into training and validation examples.
This is ok, but have a look at the images:
The only difference between many of them is their size.
So, let’s say you split the augmented dataset 80%/20% into training/validation.
Because there are so many near-identical images that differ only in size, many images in the validation set are effectively copies of images in the training set. The model can OVERFIT by memorizing the training examples, and the validation score won’t catch it, since the validation set leaks the same images back.
While dataset augmentation is a useful technique, creating the validation set from the already-augmented data gives the false impression that the model performs very well.
Since it’s been overfitting, it never really learned to generalize what the sign represents; it just memorized the training examples.
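One way to avoid this (a sketch, not the only fix): split by *base image* rather than by individual example, so all augmented variants of the same image land on the same side of the split. Below is a minimal illustration using scikit-learn’s `GroupShuffleSplit` with made-up synthetic data; the `base_ids` bookkeeping is an assumption about how you’d track which original image each augmented example came from.

```python
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in for the real dataset: 10 base images,
# each with 4 augmented (e.g. resized) variants -> 40 examples.
X = list(range(40))
base_ids = [i // 4 for i in range(40)]  # group label = index of the base image

# Split on groups, not on individual examples, so every variant of a
# base image stays entirely in either train or validation.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(X, groups=base_ids))

train_groups = {base_ids[i] for i in train_idx}
val_groups = {base_ids[i] for i in val_idx}

# No base image appears on both sides of the split.
assert train_groups.isdisjoint(val_groups)
print(f"train groups: {sorted(train_groups)}")
print(f"val groups:   {sorted(val_groups)}")
```

An even simpler option is to split the *original* images 80/20 first and only then augment the training portion; then the validation set contains no augmented duplicates at all.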