How does a convolutional layer decrease channel count?

Increasing the number of filters increases the number of channels, but how can it decrease?
Another question: does each filter get applied to the output of the last filter, or to the original input?

Your input = 3 channels

Your first conv layer: accepts 3 channels, outputs 5 channels (increasing the number of channels).
Inside: for each output channel there's a filter that looks at all 3 input channels. There are 5 such filters (not a single 3 → 5 channel filter). Each filter produces a feature map with 1 channel. Since there are 5 filters, your output will have 5 channels.

The result now has 5 channels.

Your second conv layer: accepts 5 channels, outputs 2 channels (decreasing the number of channels).
Inside: for each output channel there's a filter that looks at all 5 input channels. There are 2 such filters (not a single 5 → 2 channel filter). Each filter produces a feature map with 1 channel. Since there are 2 filters, your output will have 2 channels.

The result now has 2 channels.
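
For example, a minimal PyTorch sketch (the kernel size, padding, and input size here are just illustrative assumptions):

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, padding=1)
conv2 = nn.Conv2d(in_channels=5, out_channels=2, kernel_size=3, padding=1)

# One filter per output channel, each filter spanning all input channels:
print(conv1.weight.shape)  # torch.Size([5, 3, 3, 3]) -> 5 filters, each looking at 3 channels
print(conv2.weight.shape)  # torch.Size([2, 5, 3, 3]) -> 2 filters, each looking at 5 channels

x = torch.randn(1, 3, 64, 64)  # 1 image, 3 channels, 64x64
h = conv1(x)                   # (1, 5, 64, 64) -- channel count increased
y = conv2(h)                   # (1, 2, 64, 64) -- channel count decreased
```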

Not sure what you mean by original input, but the input to a single conv layer stays the same for all of its filters. There is no chain like x → filter1 → filter2 → output. It's more like x → filter1 → filter1_output, x → filter2 → filter2_output, and then you concatenate along the channel dimension, giving you the final output. See the sketch below.
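
To make that concrete, here is a rough sketch showing that applying each filter separately to the same input and concatenating the results matches the full layer (the shapes and kernel size are just illustrative assumptions):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 5, 64, 64)      # 5-channel input to the second layer
weight = torch.randn(2, 5, 3, 3)   # 2 filters, each spanning all 5 input channels
bias = torch.randn(2)

# Full layer: both filters applied to the same x
full = F.conv2d(x, weight, bias, padding=1)            # (1, 2, 64, 64)

# Filter by filter: x -> filter1 -> out1, x -> filter2 -> out2
out1 = F.conv2d(x, weight[0:1], bias[0:1], padding=1)  # (1, 1, 64, 64)
out2 = F.conv2d(x, weight[1:2], bias[1:2], padding=1)  # (1, 1, 64, 64)

stacked = torch.cat([out1, out2], dim=1)  # concatenate along the channel dimension
print(torch.allclose(full, stacked))      # True
```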

So 2 filters get applied to the 5 channels, then the result of each filter gets averaged?

No.

I said concatenated.

You mean added? I don’t get what you mean by concatenate.

concatenate('a', 'b', 'c') == 'abc'

You could also call it “stacking together”

so each element becomes sort of a list?

Assuming BCHW order:
5 tensors of shape (16, 1, 64, 64)
become one tensor of shape (16, 5, 64, 64).
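
In code it would look like this (again just a sketch, using the shapes from above):

```python
import torch

# 5 single-channel feature maps in BCHW order
maps = [torch.randn(16, 1, 64, 64) for _ in range(5)]

stacked = torch.cat(maps, dim=1)  # stack along the channel dimension
print(stacked.shape)              # torch.Size([16, 5, 64, 64])
```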