Course project: Zero to GANs

I downloaded a dataset from Kaggle and uploaded it to Google Colab, but I am facing the following two major problems with the dataset:

  1. The dataset does not have train and test folders; instead it has only 3 folders, one for each of the 3 classes.
  2. Even though the images belong to three different classes, all of them have the same label, i.e. 0.

Can someone tell me what I should do?

A link to this dataset would be useful.

I don’t see any problem with this. If you have different folders for different classes, then you can just build train/validation/test sets from them. I think ImageFolder would be helpful here.

No idea what you mean by each image having the same label. Do you mean the filenames, or an external file?
If you have an external file, then that file contains all the information you would need to process and prepare this dataset for training.

By “same label” I mean: after converting each image to a PyTorch tensor, I get a tensor and a label for each image. The label indicates which class the image belongs to, and in my case I am getting the same label for every image.

image, label = dataset[0]
print(label)

This is the label I am talking about.

Still, I have no idea what this dataset is.

No idea how you load the images currently.

If you already have labels, I would check whether they are really all 0. Perhaps the way you load this dataset already gives you the correct labels.

OK.
The dataset is the COVID-19 Radiography dataset, which is available on Kaggle.
It has 3 folders:

  1. COVID-19 (contains 1143 images).
  2. NORMAL (contains 1341 images).
  3. Viral Pneumonia (contains 1345 images).
To download this dataset I followed the steps given in this blog post :point_right: https://medium.com/analytics-vidhya/how-to-fetch-kaggle-datasets-into-google-colab-ea682569851a
Then I created a dataset using the following code:

dataset = ImageFolder(data_dir_path, transform=ToTensor())

Then I checked the length of the dataset using the len function and found it to be 3829, which is the total number of images in the three folders mentioned above.

Now the main problem: when I checked the label of each image, I found that all of them have the same label, equal to zero.

for i in range(len(dataset)):
    image, label = dataset[i]
    print(label)

output: 0, 0, 0, 0, 0, 0, … 0
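As an aside, a quicker way to inspect the label distribution than printing every label is collections.Counter. A sketch, run here on a stand-in list of (image, label) pairs rather than the real dataset:

```python
from collections import Counter

# Stand-in for the real ImageFolder dataset: a list of (image, label) pairs.
dataset = [("img", 0), ("img", 0), ("img", 1), ("img", 2), ("img", 2)]

# Count how many samples carry each label.
counts = Counter(label for _, label in dataset)
print(counts)  # Counter({0: 2, 2: 2, 1: 1})
```

For a healthy three-class ImageFolder dataset this should show three keys (0, 1, 2); seeing only the key 0 means only one class was discovered.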

I hope this makes it clear what the issue is.

I’ve decided to take a look at your notebook.

I had a feeling you’d passed the wrong path as the directory (since there’s no other place to make a mistake here).

I see that you first make a dataset out of a folder that is one level higher than the actual dataset. So your dataset really has only one class: the folder containing the dataset, not the three class folders with images.

Next you create a new variable covid19_ds with the valid path, but then you again use the dataset variable, without ever touching covid19_ds.

To confirm my suspicion, I made a dataset with the incorrect path and one with the correct path, and compared the two (screenshots omitted).
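ImageFolder derives its class list from the immediate subdirectories of the path you give it, sorted alphabetically, so pointing it one level too high yields a single class whose label is 0. A pure-Python sketch of that discovery logic (the directory names here are made up, and find_classes is my own mimic, not the torchvision function):

```python
import os
import tempfile

def find_classes(directory):
    """Mimic of how ImageFolder discovers classes: each immediate
    subdirectory becomes one class, sorted alphabetically."""
    classes = sorted(d.name for d in os.scandir(directory) if d.is_dir())
    return classes, {cls: idx for idx, cls in enumerate(classes)}

# Layout: <root>/covid19-radiography/{COVID-19, NORMAL, Viral Pneumonia}
root = tempfile.mkdtemp()
data_dir = os.path.join(root, "covid19-radiography")
for cls in ["COVID-19", "NORMAL", "Viral Pneumonia"]:
    os.makedirs(os.path.join(data_dir, cls))

# Wrong path (one level too high): a single class, so every label is 0.
print(find_classes(root))      # (['covid19-radiography'], {'covid19-radiography': 0})

# Correct path: three classes, labels 0, 1, 2.
print(find_classes(data_dir))
```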

Thanks a lot.
I realised my mistake .
Again thank you :grin:

In the same COVID-19 Radiography dataset we have 3 different image sizes:

  1. 256 × 256
  2. 1024 × 1024
  3. 331 × 331

So how do we overcome this? How do we make the sizes equal? And if we do, will it affect performance, since all the images look quite similar?

Use transforms.Resize() to transform the images to an identical size,
or
nn.AdaptiveMaxPool2d(1) as the layer before the fully connected part of the model.


class Cifar10CnnModel(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 64 x 16 x 16

            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 128 x 8 x 8

            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 256 x 4 x 4

            nn.Flatten(),
            nn.Linear(256*4*4, 1024),
            nn.ReLU(),
            nn.Linear(1024, 256),
            nn.ReLU(),
            nn.Linear(256, 3))

    def forward(self, xb):
        return self.network(xb)

:point_up_2: This is my model for image classification using a CNN,

and I am getting this error :point_down:

RuntimeError Traceback (most recent call last)
in ()
1 for images, labels in train_dl:
2 print(images.shape)
----> 3 out = model(images)
4 print(out.shape)
5 print(out[0])

6 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

in forward(self, xb)
29
30 def forward(self, xb):
---> 31 return self.network(xb)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py in forward(self, input)
115 def forward(self, input):
116 for module in self:
--> 117 input = module(input)
118 return input
119

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py in forward(self, input)
91
92 def forward(self, input: Tensor) -> Tensor:
---> 93 return F.linear(input, self.weight, self.bias)
94
95 def extra_repr(self) -> str:

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1688 if input.dim() == 2 and bias is not None:
1689 # fused op is marginally faster
-> 1690 ret = torch.addmm(bias, input, weight.t())
1691 else:
1692 output = input.matmul(weight.t())

RuntimeError: mat1 dim 1 must match mat2 dim 0

Can someone tell me where I am going wrong?
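For what it’s worth, nn.Linear(256*4*4, 1024) assumes the input images are 32 × 32 (three MaxPool2d(2, 2) halvings: 32 → 16 → 8 → 4); feeding larger images makes the flattened vector bigger than 256*4*4, hence the mat1/mat2 mismatch. A minimal sketch of one possible fix, assuming only torch: insert nn.AdaptiveAvgPool2d((4, 4)) before nn.Flatten() so the classifier head always sees 256 × 4 × 4 regardless of input size (alternatively, resize inputs to 32 × 32 at load time). This is a standalone nn.Sequential, not the course’s ImageClassificationBase:

```python
import torch
import torch.nn as nn

network = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.AdaptiveAvgPool2d((4, 4)),  # collapse any spatial size down to 4 x 4
    nn.Flatten(),
    nn.Linear(256 * 4 * 4, 1024), nn.ReLU(),
    nn.Linear(1024, 256), nn.ReLU(),
    nn.Linear(256, 3),
)

# Works for 256 x 256 (or any other) inputs, not just 32 x 32.
out = network(torch.randn(2, 3, 256, 256))
print(out.shape)  # torch.Size([2, 3])
```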