Help for my project on Computer Vision / OpenCV

Hello all,

I would like to set up an algorithm that can detect the appearance of a pop-up on the local PC.

I have training, validation and test datasets.

I would like some guidance to get started on my model, i.e. what type of model should I use? Which algorithm?

In the end, as soon as a pop-up corresponding to an anomaly is displayed, it must be detected and correctly classified by our module.

If someone from this community has already worked on this kind of subject, I would appreciate your insight, and we can discuss. Links that could help me are also welcome.

Thanks to all

You need to set a clear goal for the project, because from the description I already see two tasks mentioned:

You want to detect if the pop-up showed up
OR
You want to detect the type of the pop-up which showed up.

If you have the dataset, then you can start with a simple model built from convolutional modules, with cross-entropy as the loss.
The number of outputs depends on the number of possible classes, which in turn depends on the task.
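
A minimal sketch of such a starting point, assuming PyTorch and screenshots resized to 128x128 RGB. The class name `SmallPopupNet`, the layer sizes, the input size and `NUM_CLASSES` are all hypothetical choices for illustration, not something taken from your dataset:

```python
import torch
from torch import nn

# Hypothetical: 2 classes for "pop-up / no pop-up"; use more for pop-up types.
NUM_CLASSES = 2


class SmallPopupNet(nn.Module):
    """Tiny CNN: two conv + max-pool stages, then one fully connected layer."""

    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 128x128 -> 64x64
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 64x64 -> 32x32
        )
        # 32 channels * 32 * 32 pixels after two pooling steps
        self.classifier = nn.Linear(32 * 32 * 32, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


model = SmallPopupNet()
loss_fn = nn.CrossEntropyLoss()

# Random stand-in batch; real code would load images from the dataset.
images = torch.randn(4, 3, 128, 128)
labels = torch.randint(0, NUM_CLASSES, (4,))

logits = model(images)          # one score per class for each image
loss = loss_fn(logits, labels)  # cross-entropy expects raw logits
loss.backward()
```

Switching between the two tasks only changes `NUM_CLASSES` and the labels; the rest of the model stays the same.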

You also need to consider these problems:

  • Do the images always have the same resolution?
  • A bigger resolution means bigger memory consumption → using max pooling is a must.

You could start with this, from one of the courses.

The resolution of your images will probably be much bigger, though, so you might need to increase the number of modules.

There are 3 MaxPool2d modules used in this model, so each image dimension shrinks by a factor of at least 8 (2^3).
Assuming 1920x1080 resolution, the images will have a resolution of 240x135 before being flattened. They will also have 256 channels.
This gives around 8.3 × 10^6 inputs to the fully connected layer (240 × 135 × 256), which is a bit too much for training on home computers.

Adding even one additional MaxPool2d would reduce the inputs to 120x67x256 (I’m assuming you won’t change the number of channels) → around 2 million inputs. Still a lot, but becoming manageable with a GPU that has a lot of memory.
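
The numbers above can be checked with a few lines of plain Python. This is only a sketch: the helper `flattened_inputs` is a hypothetical name, and it assumes every pooling layer is a 2x2 max pool with stride 2 (which rounds odd sizes down) and that the convolutions keep the spatial size unchanged:

```python
def flattened_inputs(width, height, channels, num_pools):
    """Return (width, height, total inputs) seen by the fully connected layer.

    Assumes each pool halves both dimensions with floor division and that
    convolutions are padded so they do not change the spatial size.
    """
    for _ in range(num_pools):
        width //= 2
        height //= 2
    return width, height, width * height * channels


three_pools = flattened_inputs(1920, 1080, 256, 3)
four_pools = flattened_inputs(1920, 1080, 256, 4)

print(three_pools)  # (240, 135, 8294400)
print(four_pools)   # (120, 67, 2058240)
```

Changing the starting resolution or the number of pooling layers in the call shows quickly how much each choice affects the size of the fully connected layer.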

As I’ve mentioned anyway, the resolution might be a problem, so you have to decide on it as well.