How do we use nn.MaxUnpool2d

I am trying to build an autoencoder model using maxpool2d and maxunpool2d but I am not sure how to use MaxUnpool2d with a sequential model.

In my decoder, when I try to do the max unpooling, it complains that it needs the indices from the max pooling call, but that call is part of the sequential model, so I am not sure what to do here.

This is the error message I am getting:
TypeError: forward() missing 1 required positional argument: 'indices'

Below is my model code:

class AutoencoderMaxPooling(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1),
            nn.MaxPool2d(2, 2),              # -> N, 16, 14, 14
            nn.Conv2d(16, 32, 3, padding=1),
            nn.MaxPool2d(2, 2),              # -> N, 32, 7, 7
            nn.Conv2d(32, 64, 7)             # -> N, 64, 1, 1
        )
        self.decoder = nn.Sequential(
            nn.MaxUnpool2d(3, stride=3),  # raises: forward() missing 'indices'
            # nn.ReLU(),
            # nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            # nn.ReLU(),
            # nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            # nn.Sigmoid()
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded


I doubt this will work the way you intend.

First, your pooling and unpooling layers have different arguments: you pool with a 2x2 kernel (stride 2), but you try to unpool with a 3x3 kernel (stride 3). Even if you had the indices, the shapes wouldn't match.

Secondly, you have additional layers between the pool and unpool layers. According to the docs, MaxUnpool2d accepts the output of MaxPool2d, so the latter would have to be the last layer in your encoder.
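To make the indices requirement concrete, here is a minimal standalone demo (not from your model): the unpool layer only works when it is given the indices tensor produced by a matching MaxPool2d created with return_indices=True.

```python
import torch
import torch.nn as nn

# Pool and unpool must use the same kernel size and stride.
pool = nn.MaxPool2d(2, 2, return_indices=True)
unpool = nn.MaxUnpool2d(2, 2)

x = torch.randn(1, 16, 14, 14)
pooled, indices = pool(x)           # pooled: (1, 16, 7, 7), plus the argmax indices
restored = unpool(pooled, indices)  # back to (1, 16, 14, 14); non-max positions are zero
print(restored.shape)  # torch.Size([1, 16, 14, 14])
```

Calling `unpool(pooled)` without the indices is exactly what produces the TypeError above.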

Solution (probably?)

If you made MaxPool2d the last layer of your encoder, with return_indices set to True, your encoder would have two outputs: the result and the indices.
You could unpack these inside the forward method (a Sequential only returns the tuple if the pooling layer is last).
The bigger problem is that MaxUnpool2d needs those indices passed in explicitly, which means you would have to pull it out of the Sequential module, apply it by hand, and then decode as usual. This is not a big deal, but it makes the model a bit more fragmented.
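A sketch of that wiring (my own minimal version, assuming 28x28 single-channel inputs; the class and layer names are mine): the pool and unpool layers live outside any Sequential so that forward can capture the indices and feed them back in.

```python
import torch
import torch.nn as nn

class UnpoolingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # return_indices=True makes the pool emit (output, indices)
        self.pool = nn.MaxPool2d(2, 2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(2, 2)  # must mirror the pool's kernel/stride
        self.deconv1 = nn.Conv2d(32, 16, 3, padding=1)
        self.deconv2 = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, x):
        x = torch.relu(self.conv1(x))          # N, 16, 28, 28
        x, idx1 = self.pool(x)                 # N, 16, 14, 14
        x = torch.relu(self.conv2(x))          # N, 32, 14, 14
        x, idx2 = self.pool(x)                 # N, 32, 7, 7  (latent)
        x = self.unpool(x, idx2)               # N, 32, 14, 14
        x = torch.relu(self.deconv1(x))        # N, 16, 14, 14
        x = self.unpool(x, idx1)               # N, 16, 28, 28
        return torch.sigmoid(self.deconv2(x))  # N, 1, 28, 28

model = UnpoolingAutoencoder()
out = model(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 1, 28, 28])
```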

Also, if you go this way, your autoencoder would probably have an input-dependent latent space size. This usually leads to good reconstruction results, but the generation of new images is just "meh" (because of the locality of the features in the latent space).

Ok, makes sense.
I saw the same thing when I checked the docs: we have to call MaxPool2d with return_indices=True and then pass the indices back to MaxUnpool2d.
I guess this use case is not what MaxUnpool2d is meant for, so I will stick with the deconvolution approach for now.

Thanks, Sebastian.

Found an example on Kaggle that did what I was describing: use max pooling to reduce the dimensions and then just deconvolution to reconstruct the image:
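For completeness, a sketch of that pattern (my own minimal version, not the actual Kaggle notebook): plain MaxPool2d in the encoder (no indices needed), and ConvTranspose2d in the decoder to upsample back to the input size.

```python
import torch
import torch.nn as nn

class DeconvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1),   # N, 16, 28, 28
            nn.ReLU(),
            nn.MaxPool2d(2, 2),               # N, 16, 14, 14
            nn.Conv2d(16, 32, 3, padding=1),  # N, 32, 14, 14
            nn.ReLU(),
            nn.MaxPool2d(2, 2),               # N, 32, 7, 7
        )
        self.decoder = nn.Sequential(
            # output_padding=1 recovers the exact pre-pool size after stride 2
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # N, 16, 14, 14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # N, 1, 28, 28
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DeconvAutoencoder()
out = model(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 1, 28, 28])
```

Since nothing here needs a second output threaded through, both halves fit cleanly into Sequential modules.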