Racing Bar Chart

Get the top n values per period, and create a “racing” bar chart across periods.

source

racing_chart

 racing_chart (df, n=10, title='Racing Chart', frame_duration=500,
               theme='none', width=None, height=600, font_size=12)

Create a racing bar chart showing the top n values for each period.

Type Default Details
df pandas.DataFrame A DataFrame containing three columns for entity, metric, and period. The
names can be anything bu they have to be in this specific order.
n int 10 The number of top items to display for each period.
title str Racing Chart The title of the chart.
frame_duration int 500 The duration of each frame during animation, before transitioning to the
following frame, in milliseconds.
theme str none The theme of the chart. Any of the following values can be used:
width NoneType None The width of the chart in pixels.
height int 600 The height of the chart in pixels.
font_size int 12 The size of the fonts in the chart.
Returns plotly.graph_objects.Figure

Entity-Metric-Period DataFrame

Code
# import random
# [random.randint(10, 35) for i in range(20)]

df = pd.DataFrame({
    'entity': ['blue', 'green', 'yellow', 'red', 'orange'] * 4,
    'metric': [16, 35, 10, 25, 13, 35, 25, 25, 27, 19, 10, 18, 34, 20, 25, 20, 24, 25, 14, 21],
    'period': [i for i in range(1, 5) for x in range(5)],
}).sort_values(['period', 'metric'],
              ascending=[True, False]).reset_index(drop=True)
display_html(#'<div style="margin-left: 20%">' + 
             df
 .style
 .hide()
 .bar(subset=['metric'], color='lightgray')
 .background_gradient(subset=['period'], cmap='Blues')
 .to_html() 
             # + '</div>'
, raw=True)
entity metric period
green 35 1
red 25 1
blue 16 1
orange 13 1
yellow 10 1
blue 35 2
red 27 2
green 25 2
yellow 25 2
orange 19 2
yellow 34 3
orange 25 3
red 20 3
green 18 3
blue 10 3
yellow 25 4
green 24 4
orange 21 4
blue 20 4
red 14 4

Get the top three values for each period

racing_chart(df, n=3)

Important

The DataFrame supplied to the racing_chart function needs to have the three columns containing entity, metric, and period, in exactly this particular order. Their names don’t matter, but the order does

Entities

Some examples:

  • countries
  • URLs
  • keywords
  • product names

Metrics

  • clicks
  • impressions
  • sales
  • conversions
  • population
  • count

Period

  • days
  • months
  • weeks
  • quarters
  • years

Example: Google Search Console data

First three contries and months by clicks:

gsc = pd.read_csv('data/gsc_country_month_report.csv')
gsc['flag'] = [adviz.flag(cc) for cc in gsc['country']]
gsc.groupby('date').head(3).head(9)
country clicks impressions ctr position date flag
0 usa 185 11397 0.016232 38.121435 2022-01-31 🇺🇸
1 ind 157 6777 0.023167 33.000443 2022-01-31 🇮🇳
2 deu 69 1538 0.044863 20.850455 2022-01-31 🇩🇪
197 usa 230 12642 0.018193 32.848837 2022-02-28 🇺🇸
198 ind 178 7025 0.025338 31.743203 2022-02-28 🇮🇳
199 deu 67 1701 0.039389 20.089947 2022-02-28 🇩🇪
394 usa 285 15394 0.018514 36.662986 2022-03-31 🇺🇸
395 ind 185 7142 0.025903 30.034584 2022-03-31 🇮🇳
396 gbr 114 3029 0.037636 32.233080 2022-03-31 🇬🇧

Modifying the animation speed with animation_duration (in milliseconds)

racing_chart(
    gsc[['country', 'clicks', 'date']],
    frame_duration=1500,
    height=700,
    title='Google Search Console <b>mywebsite.com</b><br>clicks per month - top 10<br><b>frame_duration=1500</b>',
    theme='plotly_dark')

Make it faster and use flags instead of country codes

fig = racing_chart(
    gsc[['flag', 'clicks', 'date']],
    frame_duration=500,
    height=700,
    title='Google Search Console <b>mywebsite.com</b><br>clicks per month - top 10<br><b>frame_duration=500</b>',
    theme='plotly_dark')
fig.layout.yaxis.tickfont.size = 25
for frame in fig.frames:
    frame.data[0].marker.color = 'white'
fig.data[0].marker.color = 'white'
fig
queries = pd.read_csv('data/gsc_query_month_report.csv')
queries
query clicks impressions ctr position date
0 advertools 156 246 0.634146 1.016260 2022-01-31
1 advertools python 27 43 0.627907 1.000000 2022-01-31
2 python advertools 19 24 0.791667 1.000000 2022-01-31
3 python seo crawler 5 7 0.714286 1.000000 2022-01-31
4 advertools github 4 26 0.153846 3.846154 2022-01-31
... ... ... ... ... ... ...
30166 🤟 emoji 0 1 0.000000 43.000000 2022-12-31
30167 🥑 meaning in text 0 1 0.000000 45.000000 2022-12-31
30168 🥔 meaning in text 0 1 0.000000 38.000000 2022-12-31
30169 🥕 meaning in text 0 1 0.000000 36.000000 2022-12-31
30170 🫐 meaning in text 0 1 0.000000 27.000000 2022-12-31

30171 rows × 6 columns

Explore the top queries by month (impressions)

Queries can be long and take a lot of space, so we can set the left margin of the Figure object to a larger number to fit them.

fig = racing_chart(
    queries[['query', 'impressions', 'date']],
    title='Google Search Console <b>mywebsite.com</b><br>impressions per month - top 15',
    n=15,
    theme='seaborn',
    height=800)
fig.layout.margin.l = 250
fig

Filtering entities

Taking a look at the top queries is definitely interesting, but many times you may have tens or hundreds of thousands of queries, and you want to go deeper.

One way is to filter those based on some criterion.

For example, let’s see which are the top queries that contain “log” and see how we are doing on log file analysis queries:

fig = racing_chart(
    queries[queries['query'].str.contains('log')][['query', 'impressions', 'date']],
    title='Google Search Console <b>mywebsite.com</b> - queries containing "log"<br>impressions per month - top 15',
    n=15,
    theme='seaborn',
    height=800)
fig.layout.margin.l = 250
fig