Racing Bar Chart

Get the top n values per period, and create a “racing” bar chart across periods.

racing_chart

 racing_chart (df, n=10, title='Racing Chart', subtitle=None,
               frame_duration=500, template='none', width=None,
               height=600, font_size=12, **kwargs)

Create a racing bar chart showing the top n values for each period.

| Parameter | Type | Default | Details |
|---|---|---|---|
| df | pandas.DataFrame | | A DataFrame containing three columns for entity, metric, and period. The names can be anything, but the columns have to be in this specific order. |
| n | int | 10 | The number of top items to display for each period. |
| title | str | Racing Chart | The title of the chart. |
| subtitle | NoneType | None | The subtitle of the chart. |
| frame_duration | int | 500 | The duration of each frame during the animation, before transitioning to the following frame, in milliseconds. |
| template | str | none | The template of the chart. |
| width | NoneType | None | The width of the chart in pixels. |
| height | int | 600 | The height of the chart in pixels. |
| font_size | int | 12 | The size of the fonts in the chart. |
| kwargs | VAR_KEYWORD | | |
| Returns | plotly.graph_objects.Figure | | |

Entity-Metric-Period DataFrame

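import pandas as pd
from IPython.display import display_html

# The metric values below were generated once with:
# import random
# [random.randint(10, 35) for i in range(20)]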

df = (
    pd.DataFrame(
        {
            "entity": ["blue", "green", "yellow", "red", "orange"] * 4,
            "metric": [
                16,
                35,
                10,
                25,
                13,
                35,
                25,
                25,
                27,
                19,
                10,
                18,
                34,
                20,
                25,
                20,
                24,
                25,
                14,
                21,
            ],
            "period": [i for i in range(1, 5) for x in range(5)],
        }
    )
    .sort_values(["period", "metric"], ascending=[True, False])
    .reset_index(drop=True)
)  # noqa
# Render the DataFrame as a styled HTML table (index hidden).
display_html(
    df.style.hide()
    .bar(subset=["metric"], color="lightgray")
    .background_gradient(subset=["period"], cmap="Blues")
    .to_html(),
    raw=True,
)

entity metric period
green 35 1
red 25 1
blue 16 1
orange 13 1
yellow 10 1
blue 35 2
red 27 2
green 25 2
yellow 25 2
orange 19 2
yellow 34 3
orange 25 3
red 20 3
green 18 3
blue 10 3
yellow 25 4
green 24 4
orange 21 4
blue 20 4
red 14 4

Get the top three values for each period

racing_chart(df, n=3)
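
To see which rows end up in each frame, the top-n selection can be reproduced with plain pandas. This is a minimal sketch of the idea, not necessarily how racing_chart computes it internally; top_n_per_period is a hypothetical helper name, and the data is the same sample DataFrame as above.

```python
import pandas as pd

# Same sample data as the entity-metric-period DataFrame above.
df = pd.DataFrame({
    "entity": ["blue", "green", "yellow", "red", "orange"] * 4,
    "metric": [16, 35, 10, 25, 13, 35, 25, 25, 27, 19,
               10, 18, 34, 20, 25, 20, 24, 25, 14, 21],
    "period": [p for p in range(1, 5) for _ in range(5)],
})


def top_n_per_period(data, n):
    """Hypothetical helper: keep the n largest metric rows in each period."""
    # Columns are taken by position (entity, metric, period), not by name.
    metric, period = data.columns[1], data.columns[2]
    return (
        data.sort_values([period, metric], ascending=[True, False])
        .groupby(period)
        .head(n)
        .reset_index(drop=True)
    )


top3 = top_n_per_period(df, n=3)
print(top3)
```

With n=3 and four periods, the result has twelve rows. Ties (such as green and yellow in period 2) are kept in whatever order the sort leaves them.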

Important

The DataFrame supplied to the racing_chart function must have exactly three columns containing entity, metric, and period, in that order. Their names don’t matter, but the order does.
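
When a report has more columns, or has them in a different order, selecting the three columns by name puts them in the expected positions. A minimal sketch, assuming a hypothetical report DataFrame with made-up values:

```python
import pandas as pd

# Hypothetical report with extra columns in an arbitrary order.
report = pd.DataFrame({
    "date": ["2022-01-31", "2022-01-31", "2022-02-28"],
    "country": ["us", "de", "us"],
    "impressions": [900, 400, 950],
    "clicks": [120, 45, 130],
})

# Select by name in entity, metric, period order.
chart_input = report[["country", "clicks", "date"]]
print(chart_input.columns.tolist())
```

The resulting chart_input can then be passed as the df argument.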

Entities

Some examples:

  • countries
  • URLs
  • keywords
  • product names

Metrics

  • clicks
  • impressions
  • sales
  • conversions
  • population
  • count

Period

  • days
  • months
  • weeks
  • quarters
  • years
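
If the raw data is more granular than the period you want to animate, it can be aggregated first. The following is a sketch under assumptions: the daily DataFrame and its values are hypothetical, and summing is only appropriate for additive metrics such as clicks or sales.

```python
import pandas as pd

# Hypothetical daily data: one row per entity per day.
daily = pd.DataFrame({
    "country": ["us", "us", "de", "us", "de"],
    "clicks": [10, 15, 7, 20, 5],
    "date": pd.to_datetime(
        ["2022-01-03", "2022-01-20", "2022-01-20", "2022-02-01", "2022-02-14"]
    ),
})

# Label each row with its calendar month, e.g. "2022-01".
daily["month"] = daily["date"].dt.to_period("M").astype(str)

# Sum the metric per entity and month, then put the columns
# in entity, metric, period order.
monthly = (
    daily.groupby(["country", "month"], as_index=False)["clicks"]
    .sum()[["country", "clicks", "month"]]
)
print(monthly)
```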

Example: Google Search Console data

First three countries per month, by clicks:

gsc = pd.read_csv("data/gsc_country_month_report.csv")
gsc["flag"] = [adviz.flag(cc) for cc in gsc["country"]]
gsc.groupby("date").head(3).head(9)

Modifying the animation speed with frame_duration (in milliseconds)

racing_chart(
    gsc[["country", "clicks", "date"]],
    frame_duration=1500,
    height=700,
    title="Google Search Console <b>mywebsite.com</b><br>clicks per month - top 10<br><b>frame_duration=1500</b>",
    template="plotly_dark",
)
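
As a rough guide for choosing frame_duration, the total playback time is approximately the number of periods multiplied by the frame duration. This is a back-of-the-envelope sketch that assumes one frame per period and ignores any extra rendering time; estimated_playback_seconds is a hypothetical helper, and the twelve periods are an assumed example of one year of monthly data.

```python
def estimated_playback_seconds(n_periods, frame_duration_ms):
    """Rough estimate: one frame per period, nothing else counted."""
    return n_periods * frame_duration_ms / 1000


# Assumed example: twelve monthly periods.
slow = estimated_playback_seconds(12, 1500)
fast = estimated_playback_seconds(12, 500)
print(slow, fast)
```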

Make it faster and use flags instead of country codes

fig = racing_chart(
    gsc[["flag", "clicks", "date"]],
    frame_duration=500,
    height=700,
    title="Google Search Console <b>mywebsite.com</b><br>clicks per month - top 10<br><b>frame_duration=500</b>",
    template="plotly_dark",
)
fig.layout.yaxis.tickfont.size = 25
for frame in fig.frames:
    frame.data[0].marker.color = "white"
fig.data[0].marker.color = "white"
fig

queries = pd.read_csv("data/gsc_query_month_report.csv")
queries
query clicks impressions ctr position date
0 advertools 156 246 0.634146 1.016260 2022-01-31
1 advertools python 27 43 0.627907 1.000000 2022-01-31
2 python advertools 19 24 0.791667 1.000000 2022-01-31
3 python seo crawler 5 7 0.714286 1.000000 2022-01-31
4 advertools github 4 26 0.153846 3.846154 2022-01-31
... ... ... ... ... ... ...
30166 🤟 emoji 0 1 0.000000 43.000000 2022-12-31
30167 🥑 meaning in text 0 1 0.000000 45.000000 2022-12-31
30168 🥔 meaning in text 0 1 0.000000 38.000000 2022-12-31
30169 🥕 meaning in text 0 1 0.000000 36.000000 2022-12-31
30170 🫐 meaning in text 0 1 0.000000 27.000000 2022-12-31

30171 rows × 6 columns

Explore the top queries by month (impressions)

Queries can be long and take a lot of space, so we can set the left margin of the Figure object to a larger number to fit them.

fig = racing_chart(
    queries[["query", "impressions", "date"]],
    title="Google Search Console <b>mywebsite.com</b><br>impressions per month - top 15",
    n=15,
    template="seaborn",
    height=800,
)
fig.layout.margin.l = 250
fig
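
Instead of hard-coding the margin, it can be estimated from the longest label. This is a heuristic sketch, not part of the library: estimate_left_margin is a hypothetical helper, and the 0.6 character-width ratio is an assumption that varies by font.

```python
def estimate_left_margin(labels, font_size=12, char_width_ratio=0.6, padding=20):
    """Heuristic pixel width needed for the longest label (hypothetical helper)."""
    longest = max((len(str(label)) for label in labels), default=0)
    return int(longest * font_size * char_width_ratio) + padding


labels = ["advertools", "python seo crawler", "log file analysis with python"]
margin = estimate_left_margin(labels)
print(margin)
```

The result could be assigned to fig.layout.margin.l. Passing only the labels that actually appear in the chart keeps one unusually long but low-ranking query from inflating the margin.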

Filtering entities

Taking a look at the top queries is definitely interesting, but many times you may have tens or hundreds of thousands of queries, and you want to go deeper.

One way is to filter those based on some criterion.

For example, let’s see which are the top queries that contain “log” and see how we are doing on log file analysis queries:

fig = racing_chart(
    queries[queries["query"].str.contains("log")][["query", "impressions", "date"]],
    title='Google Search Console <b>mywebsite.com</b> - queries containing "log"<br>impressions per month - top 15',
    n=15,
    template="seaborn",
    height=800,
)
fig.layout.margin.l = 250
fig
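
Note that a plain substring match on "log" also matches queries such as "blog" or "login", and it is case-sensitive. If that is not what you want, a regular expression with word boundaries is one option. A minimal sketch using a small hypothetical Series of queries:

```python
import pandas as pd

# Hypothetical queries, including a missing value.
sample_queries = pd.Series(
    ["log file analysis", "python blog", "Log parser", "login page", None],
    name="query",
)

# Substring match, as in the example above: also matches "blog" and "login".
substring = sample_queries.str.contains("log", na=False)

# Whole-word, case-insensitive match ("log" or "logs");
# na=False treats missing queries as non-matches.
whole_word = sample_queries.str.contains(
    r"\blogs?\b", case=False, na=False, regex=True
)

print(sample_queries[substring].tolist())
print(sample_queries[whole_word].tolist())
```

The resulting boolean mask can be used to filter the DataFrame in the same way as in the example above, before selecting the three columns.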