Pandas | value_counts()

Hello,
I’m hoping someone could please help me with the understanding of counting unique valued elements in a Data Frame. I have a df structured like this; (" Imagine these are winning lotto tickets each day and I want to know…, what is the frequency of each unique values occurrence across the entire df and or by specific dates; days/months/years? ")

In [ ]: df
Out[ ]:
date col1 col2 col3
0 10/10/2021 10 3 5
1 10/09/2021 8 1 2
2 10/08/2021 10 3 3
3 10/07/2021 6 5 1
4 10/06/2021 6 10 7
5 10/05/2021 6 9 8
6 10/04/2021 6 7 8
7 10/03/2021 6 7 8
8 10/02/2021 4 7 9
9 10/01/2021 1 7 9

I have learned and understand that I can use .apply & pd.Series.value_counts
In [ ]: total_counts = df[[‘col1’, ‘col2’, ‘col3’]].apply(pd.Series.value_counts)
total_counts
Out [ ]:

    col1	  col2    col3

1 1.0 1.0 1.0
2 NaN NaN 1.0
3 NaN 2.0 1.0
4 1.0 NaN NaN
5 NaN 1.0 1.0
6 5.0 NaN NaN
7 NaN 4.0 1.0
8 1.0 NaN 3.0
9 NaN 1.0 2.0
10 2.0 1.0 NaN

This code does give me the desired outcome that I’m looking for. I can clearly see the unique values and their counts across multiple columns.
unique_value | Counts
1 3
2 1
3 3
4 1
5 2
6 5
7 5
8 4
9 3
10 3
(“But this is a small frame, with only 10 unique values, 3 columns and 10 rows!, What if the df is 40 unique values, 5 columns and 5,000 rows?” It would result in being ineligible.)

Question One (Aggregation);
As there is no index for the unique values in the output, how can I aggregate this? How to create a column for or groupby if there is no index? How to make a reference to saying I have three (1’s)? five(7’s) and two(5’s) in this Dateset? ex; Say I want to plot these in a Dataviz and show for the month of March I had 100(6’s), 50(1’s), 25(4’s)? If I plot the current code as is,it makes a reference to the columns and not the unique values that I’m interested in.

test_bar

Question Two (Reference);
I think this just maybe a continuation of the first question but let’s say I want to reference unique values and ask which max/min value occurred this month/year? What is the percentage of this values occurrence over a specified timeline? How to run Statistical questions on the Unique values?

Question Three (Merging);
“As if I’m not in enough pain and confusion already at the moment”
(“Let’s say for the sake of argument, I’m sadistic”) An the above df only represents a single location. Would following the Lesson #4 achieve this? I have tried merging multiple df’s in a spreadsheet but as soon as you ‘Concat’ or ‘Text_Join’ numerical_values into a single cell, it converts to TEXT! Pandas reads this as an object. I’ve tried to convert object with .astype(Float) but with no success.

I hope I was able to convey what i’m trying to achieve here. I’m sorry if this was lengthy, I’m very new to programming/data_analysis/data_science/python/pandas etc…
Any reference or links is very much appreciated
—Cheers!

You can convert the series from .value_counts() into a dataframe using .to_frame().
df.value_counts().to_frame()

this will provide you with an index, you can also sort the series first and then convert it into a dataframe If you wish to have them in some order.

For 2nd Ques

Once sorted, you can get the max/min of the list like any other dataframe.
and can have graphs(like pie chart) to display percentage of vaue_counts() series.

Thank you for your response. I learned that what I was looking for was .stack()
So I could plot the desired results across multiple columns. I was definitely way over thinking this.

totals = df[[‘col1’,‘col2’,‘col3’]].stack().value_counts()
total.plot.barh()

stacked_plot

1 Like