How can I use a CNN for audio classification? Is it similar to NLP? Can I use a CNN for NLP? If yes, how?
You would have to use 1D convolution, which slides over an audio input.
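To make "slides over an audio input" concrete, here is a minimal plain-Python sketch of a single 1D convolution filter with no padding. The function name `conv1d`, the toy waveform, and the 3-tap kernel are all made up for illustration; a real model would use a library layer with learned weights.

```python
def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal` (no padding) and return one dot product per position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


samples = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy waveform (6 samples)
kernel = [1.0, 0.0, -1.0]                 # hypothetical 3-tap filter

out = conv1d(samples, kernel)
print(out)       # [-2.0, -2.0, -2.0, -2.0]
print(len(out))  # 4, i.e. len(samples) - len(kernel) + 1
```

Note that the output (4 values) is already shorter than the input (6 samples), and a larger kernel or stride shortens it further.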
The problem with this approach is that the more convolutions you stack (or the bigger the window you use), the shorter the output of the final conv layer (before the FC layer) becomes. You would have to find the shortest audio length in your data, and then adjust the model so that the final conv output never becomes too small. Then you can use adaptive pooling (applied along the time axis of each channel) to get same-length vectors, which you feed into an FC layer that classifies the audio.
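A rough sketch of that idea, assuming PyTorch and raw mono waveforms as input. The class name `AudioCNN`, the layer sizes, and the number of classes are arbitrary choices for illustration, not a recommended architecture.

```python
import torch
import torch.nn as nn


class AudioCNN(nn.Module):
    """Hypothetical 1D CNN: conv stack -> adaptive pooling over time -> FC classifier."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4),
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4),
            nn.ReLU(),
        )
        # Collapses the time axis to length 1, whatever the input length was.
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        # x: (batch, 1, samples)
        x = self.features(x)          # (batch, 32, T), T depends on input length
        x = self.pool(x).squeeze(-1)  # (batch, 32), same size for any T >= 1
        return self.fc(x)             # (batch, num_classes)


model = AudioCNN()
short_clip = torch.randn(2, 1, 4000)   # random stand-in for short audio
long_clip = torch.randn(2, 1, 16000)   # random stand-in for longer audio
print(model(short_clip).shape)  # torch.Size([2, 10])
print(model(long_clip).shape)   # torch.Size([2, 10])
```

With these particular kernel sizes and strides, the shortest input the conv stack accepts is 41 samples, since the second convolution needs at least 9 time steps from the first. Clips shorter than that would have to be padded, or the layers made smaller.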