source
value_counts_plus
value_counts_plus (data, dropna=False, show_top=10, sort_others=False,
style=True, size=10, thousands=',', decimal='.',
name='data', background_gradient='cividis')
Provide a few ways of showing counts of values of items in series
.
data
dropna
bool
False
Whether or not to drop missing values.
show_top
int
10
How many of the top rows to display.
sort_others
bool
False
Whether or not to place “Others” in the bottom (default) or in its sorted order position.
style
bool
True
Whether or not to style values for easier reading. If set to True
the result would not be a DataFrame, and cannot be further manipulated. Set the value to False
to get aDataFrame as the return value.
size
int
10
The size in points of the font of the table. This results in the whole table being resized.
thousands
str
,
The character to use to separate thousands if style=True
. Defaults to,
but you can change to .
or space, or any oher character you want.
decimal
str
.
The character to use to display decimal number if style=True
. Defaults to.
but you can change to ,
or any oher character you want.
name
str
data
The name of the column that you want displayed in the final table. It appears in the caption and defaults to “data”.
background_gradient
str
cividis
Returns
pandas.io.formats.style.Styler
A DataFrame showing counts based on the provided arguments
Counting a list of random colors - default
import pandas as pd
import plotly.express as px
import random
import numpy as np
import matplotlib as mpl
import plotly.express as px
colors = list (mpl.colors.cnames.keys())
colors = random.choices(colors, weights= [0.9 , 0.04 , 0.05 , 0.09 ]* 37 , k= 10_000 )
colors += [np.nan for i in range (240 )]
colors[:20 ]
['goldenrod',
'darkmagenta',
'cyan',
'olivedrab',
'navajowhite',
'papayawhip',
'salmon',
'mediumslateblue',
'azure',
'plum',
'indigo',
'peru',
'goldenrod',
'red',
'brown',
'lightcoral',
'palegoldenrod',
'dodgerblue',
'dodgerblue',
'cyan']
value_counts_plus(colors)
Counts of data
data
count
cum. count
%
cum. %
1
lightseagreen
258
258
2.5%
2.5%
2
burlywood
255
513
2.5%
5.0%
3
cyan
252
765
2.5%
7.5%
4
slategray
252
1,017
2.5%
9.9%
5
lightsteelblue
251
1,268
2.5%
12.4%
6
aliceblue
245
1,513
2.4%
14.8%
7
steelblue
243
1,756
2.4%
17.1%
8
azure
242
1,998
2.4%
19.5%
9
sienna
242
2,240
2.4%
21.9%
10
indigo
241
2,481
2.4%
24.2%
11
Others:
7,759
10,240
75.8%
100.0%
Changing the number of displayed rows with show_top
value_counts_plus(colors, show_top= 15 )
Counts of data
data
count
cum. count
%
cum. %
1
lightseagreen
258
258
2.5%
2.5%
2
burlywood
255
513
2.5%
5.0%
3
cyan
252
765
2.5%
7.5%
4
slategray
252
1,017
2.5%
9.9%
5
lightsteelblue
251
1,268
2.5%
12.4%
6
aliceblue
245
1,513
2.4%
14.8%
7
steelblue
243
1,756
2.4%
17.1%
8
azure
242
1,998
2.4%
19.5%
9
sienna
242
2,240
2.4%
21.9%
10
indigo
241
2,481
2.4%
24.2%
11
nan
240
2,721
2.3%
26.6%
12
goldenrod
237
2,958
2.3%
28.9%
13
dodgerblue
232
3,190
2.3%
31.2%
14
mediumblue
231
3,421
2.3%
33.4%
15
lightgreen
231
3,652
2.3%
35.7%
16
Others:
6,588
10,240
64.3%
100.0%
Sorting “Others:”
value_counts_plus(colors, sort_others= True )
Counts of data
data
count
cum. count
%
cum. %
1
Others:
7,759
7,759
75.8%
75.8%
2
lightseagreen
258
8,017
2.5%
78.3%
3
burlywood
255
8,272
2.5%
80.8%
4
cyan
252
8,524
2.5%
83.2%
5
slategray
252
8,776
2.5%
85.7%
6
lightsteelblue
251
9,027
2.5%
88.2%
7
aliceblue
245
9,272
2.4%
90.5%
8
steelblue
243
9,515
2.4%
92.9%
9
azure
242
9,757
2.4%
95.3%
10
sienna
242
9,999
2.4%
97.6%
11
indigo
241
10,240
2.4%
100.0%
Changing the name of the data and caption title with name
value_counts_plus(
colors,
name= 'Status codes' )
Counts of Status codes
Status codes
count
cum. count
%
cum. %
1
lightseagreen
258
258
2.5%
2.5%
2
burlywood
255
513
2.5%
5.0%
3
cyan
252
765
2.5%
7.5%
4
slategray
252
1,017
2.5%
9.9%
5
lightsteelblue
251
1,268
2.5%
12.4%
6
aliceblue
245
1,513
2.4%
14.8%
7
steelblue
243
1,756
2.4%
17.1%
8
azure
242
1,998
2.4%
19.5%
9
sienna
242
2,240
2.4%
21.9%
10
indigo
241
2,481
2.4%
24.2%
11
Others:
7,759
10,240
75.8%
100.0%
Use the function on DataFrame
s to get multi-level counts
Check counts of the first two directories’ combinations
value_counts_plus(urldf[['dir_1' , 'dir_2' ]], name= 'apple.com URL directories<code>dir_1/dir_2</code>' )
Counts of apple.com URL directoriesdir_1/dir_2
dir_1
dir_2
count
cum. count
%
cum. %
1
today
event
12,118
12,118
4.1%
4.1%
2
xf
shop
7,539
19,657
2.5%
6.6%
3
ca
shop
7,450
27,107
2.5%
9.2%
4
de
shop
7,201
34,308
2.4%
11.6%
5
uk
shop
7,080
41,388
2.4%
14.0%
6
au
shop
6,915
48,303
2.3%
16.3%
7
it
shop
6,875
55,178
2.3%
18.6%
8
es
shop
6,789
61,967
2.3%
20.9%
9
fr
shop
6,716
68,683
2.3%
23.2%
10
sg
shop
6,714
75,397
2.3%
25.5%
11
Others:
220,613
296,010
74.5%
100.0%
Filter for a country (de) and show counts of combinations of dir_2
and dir_3
value_counts_plus(urldf[urldf['dir_1' ].eq('de' )][['dir_2' , 'dir_3' ]], name= 'apple.com/de URL directories' , show_top= 20 )
Counts of apple.com/de URL directories
dir_2
dir_3
count
cum. count
%
cum. %
1
shop
accessories
1,964
1,964
25.6%
25.6%
2
shop
product
1,866
3,830
24.3%
49.9%
3
shop
buy-watch
865
4,695
11.3%
61.2%
4
shop
refurbished
464
5,159
6.0%
67.2%
5
shop
mac
461
5,620
6.0%
73.2%
6
shop
ipad
424
6,044
5.5%
78.7%
7
shop
iphone
419
6,463
5.5%
84.2%
8
shop
watch
299
6,762
3.9%
88.1%
9
shop
buy-iphone
131
6,893
1.7%
89.8%
10
tv-pr
originals
106
6,999
1.4%
91.2%
11
shop
buy-ipad
106
7,105
1.4%
92.5%
12
shop
buy-mac
91
7,196
1.2%
93.7%
13
shop
smart-home
56
7,252
0.7%
94.5%
14
business
success-stories
26
7,278
0.3%
94.8%
15
legal
sales-support
23
7,301
0.3%
95.1%
16
shop
help
16
7,317
0.2%
95.3%
17
legal
internet-services
10
7,327
0.1%
95.4%
18
legal
intellectual-property
10
7,337
0.1%
95.6%
19
support
products
9
7,346
0.1%
95.7%
20
education
k12
7
7,353
0.1%
95.8%
21
Others:
324
7,677
4.2%
100.0%
Change the size of table: size
value_counts_plus(colors, size= 5 )
Counts of data
data
count
cum. count
%
cum. %
1
lightseagreen
258
258
2.5%
2.5%
2
burlywood
255
513
2.5%
5.0%
3
cyan
252
765
2.5%
7.5%
4
slategray
252
1,017
2.5%
9.9%
5
lightsteelblue
251
1,268
2.5%
12.4%
6
aliceblue
245
1,513
2.4%
14.8%
7
steelblue
243
1,756
2.4%
17.1%
8
azure
242
1,998
2.4%
19.5%
9
sienna
242
2,240
2.4%
21.9%
10
indigo
241
2,481
2.4%
24.2%
11
Others:
7,759
10,240
75.8%
100.0%
value_counts_plus(colors, size= 20 )
Counts of data
data
count
cum. count
%
cum. %
1
lightseagreen
258
258
2.5%
2.5%
2
burlywood
255
513
2.5%
5.0%
3
cyan
252
765
2.5%
7.5%
4
slategray
252
1,017
2.5%
9.9%
5
lightsteelblue
251
1,268
2.5%
12.4%
6
aliceblue
245
1,513
2.4%
14.8%
7
steelblue
243
1,756
2.4%
17.1%
8
azure
242
1,998
2.4%
19.5%
9
sienna
242
2,240
2.4%
21.9%
10
indigo
241
2,481
2.4%
24.2%
11
Others:
7,759
10,240
75.8%
100.0%
Completely change the caption using an HTML string with set_caption
caption = '<h4>Status codes</h4>Top 5 values <a href="https://example.com">raw data</a>'
value_counts_plus(
colors,
name= 'Statuses' ,
show_top= 5 ).set_caption(caption)
Status codes Top 5 values raw data
Statuses
count
cum. count
%
cum. %
1
lightseagreen
258
258
2.5%
2.5%
2
burlywood
255
513
2.5%
5.0%
3
cyan
252
765
2.5%
7.5%
4
slategray
252
1,017
2.5%
9.9%
5
lightsteelblue
251
1,268
2.5%
12.4%
6
Others:
8,972
10,240
87.6%
100.0%
Removing table styling if you want a pure DataFrame
:
Counting in non-styled DataFrames is 0-based in case you want to further process it
Columns are displayed in a slightly different manner
value_counts_plus(colors, style= False )
data
count
cum_count
perc
cum_perc
0
lightseagreen
258
258
0.025195
0.025195
1
burlywood
255
513
0.024902
0.050098
2
cyan
252
765
0.024609
0.074707
3
slategray
252
1017
0.024609
0.099316
4
lightsteelblue
251
1268
0.024512
0.123828
5
aliceblue
245
1513
0.023926
0.147754
6
steelblue
243
1756
0.023730
0.171484
7
azure
242
1998
0.023633
0.195117
8
sienna
242
2240
0.023633
0.218750
9
indigo
241
2481
0.023535
0.242285
0
Others:
7759
10240
0.757715
1.000000
Counting continuous data
We can easily make continuous data discrete by binning them. One easy way to do that is the pandas.cut
function.
For example, if we want to count countries’ populations, we can first bin them under 50M intervals, and run value_counts_plus
.
gm = px.data.gapminder().query('year == 2007' )
gm['pop_interval' ] = pd.cut(gm['pop' ], range (0 , 1_500_000_000 , 50_000_000 ))
gm.sort_values('pop' , ascending= False ).head(15 )
country
continent
year
lifeExp
pop
gdpPercap
iso_alpha
iso_num
pop_interval
299
China
Asia
2007
72.961
1318683096
4959.114854
CHN
156
(1300000000, 1350000000]
707
India
Asia
2007
64.698
1110396331
2452.210407
IND
356
(1100000000, 1150000000]
1619
United States
Americas
2007
78.242
301139947
42951.653090
USA
840
(300000000, 350000000]
719
Indonesia
Asia
2007
70.650
223547000
3540.651564
IDN
360
(200000000, 250000000]
179
Brazil
Americas
2007
72.390
190010647
9065.800825
BRA
76
(150000000, 200000000]
1175
Pakistan
Asia
2007
65.483
169270617
2605.947580
PAK
586
(150000000, 200000000]
107
Bangladesh
Asia
2007
64.062
150448339
1391.253792
BGD
50
(150000000, 200000000]
1139
Nigeria
Africa
2007
46.859
135031164
2013.977305
NGA
566
(100000000, 150000000]
803
Japan
Asia
2007
82.603
127467972
31656.068060
JPN
392
(100000000, 150000000]
995
Mexico
Americas
2007
76.195
108700891
11977.574960
MEX
484
(100000000, 150000000]
1223
Philippines
Asia
2007
71.688
91077287
3190.481016
PHL
608
(50000000, 100000000]
1655
Vietnam
Asia
2007
74.249
85262356
2441.576404
VNM
704
(50000000, 100000000]
575
Germany
Europe
2007
79.406
82400996
32170.374420
DEU
276
(50000000, 100000000]
467
Egypt
Africa
2007
71.338
80264543
5581.180998
EGY
818
(50000000, 100000000]
515
Ethiopia
Africa
2007
52.947
76511887
690.805576
ETH
231
(50000000, 100000000]
value_counts_plus(
gm['pop_interval' ],
name= 'Population intervals - 2007' ,
background_gradient= 'RdBu' )
Counts of Population intervals - 2007
pop_interval
count
cum. count
%
cum. %
1
(0, 50000000]
120
120
84.5%
84.5%
2
(50000000, 100000000]
12
132
8.5%
93.0%
3
(100000000, 150000000]
3
135
2.1%
95.1%
4
(150000000, 200000000]
3
138
2.1%
97.2%
5
(1100000000, 1150000000]
1
139
0.7%
97.9%
6
(200000000, 250000000]
1
140
0.7%
98.6%
7
(300000000, 350000000]
1
141
0.7%
99.3%
8
(1300000000, 1350000000]
1
142
0.7%
100.0%
9
(950000000, 1000000000]
0
142
0.0%
100.0%
10
(1000000000, 1050000000]
0
142
0.0%
100.0%
11
Others:
0
142
0.0%
100.0%
Now we can see that there are 120 countries with a population less than 50M. You can play around with the bin interval to get an appropriate value for your data.
Changing the theme used with theme
value_counts_plus(colors, background_gradient= 'Greens' )
Counts of data
data
count
cum. count
%
cum. %
1
lightseagreen
258
258
2.5%
2.5%
2
burlywood
255
513
2.5%
5.0%
3
cyan
252
765
2.5%
7.5%
4
slategray
252
1,017
2.5%
9.9%
5
lightsteelblue
251
1,268
2.5%
12.4%
6
aliceblue
245
1,513
2.4%
14.8%
7
steelblue
243
1,756
2.4%
17.1%
8
azure
242
1,998
2.4%
19.5%
9
sienna
242
2,240
2.4%
21.9%
10
indigo
241
2,481
2.4%
24.2%
11
Others:
7,759
10,240
75.8%
100.0%
value_counts_plus(colors, background_gradient= 'cool' )
Counts of data
data
count
cum. count
%
cum. %
1
lightseagreen
258
258
2.5%
2.5%
2
burlywood
255
513
2.5%
5.0%
3
cyan
252
765
2.5%
7.5%
4
slategray
252
1,017
2.5%
9.9%
5
lightsteelblue
251
1,268
2.5%
12.4%
6
aliceblue
245
1,513
2.4%
14.8%
7
steelblue
243
1,756
2.4%
17.1%
8
azure
242
1,998
2.4%
19.5%
9
sienna
242
2,240
2.4%
21.9%
10
indigo
241
2,481
2.4%
24.2%
11
Others:
7,759
10,240
75.8%
100.0%
Convert the table to raw HTML for embedding in a blog or web page
print (value_counts_plus(colors).to_html()[:700 ])
<style type="text/css">
#T_c9396_row0_col1, #T_c9396_row0_col2, #T_c9396_row0_col3, #T_c9396_row0_col4, #T_c9396_row1_col1, #T_c9396_row1_col3, #T_c9396_row2_col1, #T_c9396_row2_col3, #T_c9396_row3_col1, #T_c9396_row3_col3, #T_c9396_row4_col1, #T_c9396_row4_col3, #T_c9396_row5_col1, #T_c9396_row5_col3, #T_c9396_row6_col1, #T_c9396_row6_col3, #T_c9396_row7_col1, #T_c9396_row7_col3, #T_c9396_row8_col1, #T_c9396_row8_col3, #T_c9396_row9_col1, #T_c9396_row9_col3 {
background-color: #00224e;
color: #f1f1f1;
}
#T_c9396_row1_col2, #T_c9396_row1_col4 {
background-color: #002758;
color: #f1f1f1;
}
#T_c9396_row2_col2, #T_c9396_row2_col4 {
background-color: #002c64;
color: #f1f1f1;
}
#T_c93