Training dataset preparation for a CNN when both input and output are images

Hi all,
I am trying to set up a CNN model (ResNet). I started from the code for CIFAR10 image recognition and am trying to modify it for my purpose. In the CIFAR10 case, the input is an image and the output is a label (or a probability). In my case, the input is a 480x640 image and the output is also a 480x640 image.
Any suggested approach, or a reference to similar code for preparing the dataset and setting up the model, would be great.
Many thanks in advance.

What is (or would be) the purpose of this model?

To me it sounds like you’re trying to implement an autoencoder.
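
Whatever the goal turns out to be, the main change from the CIFAR10 setup is that the dataset returns an image as the target rather than a label, and the network has to produce an output with the same height and width as its input. Here is a minimal sketch, assuming PyTorch (the thread does not say which framework is used). `PairedImageDataset` and `TinyImageToImageNet` are hypothetical names, the random tensors stand in for real image pairs, the two-layer network is a toy rather than a ResNet, and the MSE loss is only an assumption that may not suit the actual task.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


class PairedImageDataset(Dataset):
    """Yields (input_image, target_image) pairs instead of (image, label)."""

    def __init__(self, inputs, targets):
        # Both tensors are assumed to have shape (N, C, H, W).
        assert inputs.shape[0] == targets.shape[0]
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        return self.inputs.shape[0]

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]


class TinyImageToImageNet(nn.Module):
    """Fully convolutional toy model: 3x3 kernels with padding=1 keep H and W."""

    def __init__(self, channels=3, hidden=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.body(x)


# Random placeholder data standing in for real 480x640 image pairs.
inputs = torch.rand(4, 3, 480, 640)
targets = torch.rand(4, 3, 480, 640)
dataset = PairedImageDataset(inputs, targets)
loader = DataLoader(dataset, batch_size=2)

model = TinyImageToImageNet()
loss_fn = nn.MSELoss()  # pixel-wise loss; an assumption, choose one that fits the task
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One pass over the data: the target is an image, so the loss compares images.
for batch_inputs, batch_targets in loader:
    optimizer.zero_grad()
    outputs = model(batch_inputs)
    loss = loss_fn(outputs, batch_targets)
    loss.backward()
    optimizer.step()
```

With real data, `__getitem__` would load the input file and its matching target file from disk and convert both to tensors; the rest of the training loop stays the same as in the classification case apart from the loss.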