Create a racing bar chart showing the top n values for each period.
|  | Type | Default | Details |
|---|---|---|---|
| df | pandas.DataFrame |  | A DataFrame containing three columns for entity, metric, and period. The names can be anything, but the columns have to be in this specific order. |
| n | int | 10 | The number of top items to display for each period. |
| title | str | Racing Chart | The title of the chart. |
| subtitle | NoneType | None | The subtitle of the chart. |
| frame_duration | int | 500 | The duration of each frame during animation, before transitioning to the following frame, in milliseconds. |
| template | str | none | The template of the chart. |
| width | NoneType | None | The width of the chart in pixels. |
| height | int | 600 | The height of the chart in pixels. |
| font_size | int | 12 | The size of the fonts in the chart. |
| kwargs | VAR_KEYWORD |  |  |
| Returns | plotly.graph_objects.Figure |  |  |
Entity-Metric-Period DataFrame
import pandas as pd
from IPython.display import display_html

# Sample data: five entities measured over four periods.
df = (
    pd.DataFrame(
        {
            "entity": ["blue", "green", "yellow", "red", "orange"] * 4,
            "metric": [
                16, 35, 10, 25, 13,
                35, 25, 25, 27, 19,
                10, 18, 34, 20, 25,
                20, 24, 25, 14, 21,
            ],
            "period": [i for i in range(1, 5) for x in range(5)],
        }
    )
    # Order rows by period, then by metric from largest to smallest.
    .sort_values(["period", "metric"], ascending=[True, False])
    .reset_index(drop=True)
)

# Render the DataFrame as a styled HTML table without the index.
display_html(
    df.style.hide()
    .bar(subset=["metric"], color="lightgray")
    .background_gradient(subset=["period"], cmap="Blues")
    .to_html(),
    raw=True,
)
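| entity | metric | period |
|---|---|---|
| green | 35 | 1 |
| red | 25 | 1 |
| blue | 16 | 1 |
| orange | 13 | 1 |
| yellow | 10 | 1 |
| blue | 35 | 2 |
| red | 27 | 2 |
| green | 25 | 2 |
| yellow | 25 | 2 |
| orange | 19 | 2 |
| yellow | 34 | 3 |
| orange | 25 | 3 |
| red | 20 | 3 |
| green | 18 | 3 |
| blue | 10 | 3 |
| yellow | 25 | 4 |
| green | 24 | 4 |
| orange | 21 | 4 |
| blue | 20 | 4 |
| red | 14 | 4 |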
Get the top three values for each period
racing_chart(df, n=3)
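To make clear what "the top three values for each period" selects, here is a minimal pandas sketch that computes that subset by hand. It only illustrates the idea and is not necessarily how racing_chart works internally; the small DataFrame repeats the first two periods of the sample data above.

```python
import pandas as pd

# First two periods of the sample data shown above.
df = pd.DataFrame(
    {
        "entity": ["green", "red", "blue", "orange", "yellow",
                   "blue", "red", "green", "yellow", "orange"],
        "metric": [35, 25, 16, 13, 10, 35, 27, 25, 25, 19],
        "period": [1] * 5 + [2] * 5,
    }
)

# Sort by metric within each period, then keep the first 3 rows per period.
top3 = (
    df.sort_values(["period", "metric"], ascending=[True, False])
    .groupby("period")
    .head(3)
    .reset_index(drop=True)
)
print(top3)
```

Each period contributes at most three rows, so the chart shows three bars per frame.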
Important
The DataFrame supplied to the racing_chart function must have three columns containing the entity, metric, and period, in exactly that order. Their names don't matter, but the order does.
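When a report has more columns, or has them in a different order, one way to meet this requirement is to select the three columns explicitly before charting. The sketch below uses a hypothetical DataFrame named report with made-up values; only the column selection matters.

```python
import pandas as pd

# Hypothetical report whose columns are not in entity, metric, period order.
report = pd.DataFrame(
    {
        "date": ["2023-01", "2023-01", "2023-02", "2023-02"],
        "country": ["us", "de", "us", "de"],
        "impressions": [1000, 800, 1200, 700],
        "clicks": [120, 95, 130, 80],
    }
)

# Select exactly three columns, in the order entity, metric, period.
chart_df = report[["country", "clicks", "date"]]
print(chart_df.columns.tolist())
```

The resulting chart_df can then be passed as the first argument, for example racing_chart(chart_df, n=3).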
Entities
Some examples:
countries
URLs
keywords
product names
Metrics
clicks
impressions
sales
conversions
population
count
Period
days
months
weeks
quarters
years
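If the raw data is more granular than the period you want to animate, it can be aggregated first. The following is a rough sketch, assuming a hypothetical daily DataFrame named daily with made-up values, that rolls days up into months and returns the columns in entity, metric, period order.

```python
import pandas as pd

# Hypothetical daily data: one row per entity per day.
daily = pd.DataFrame(
    {
        "country": ["us", "de", "us", "de", "us", "de"],
        "clicks": [10, 7, 12, 5, 20, 9],
        "date": pd.to_datetime(
            [
                "2023-01-30", "2023-01-30",
                "2023-01-31", "2023-01-31",
                "2023-02-01", "2023-02-01",
            ]
        ),
    }
)

# Label each row with its month, then sum the metric per entity and month.
monthly = (
    daily.assign(month=daily["date"].dt.to_period("M").astype(str))
    .groupby(["country", "month"], as_index=False)["clicks"]
    .sum()
    .loc[:, ["country", "clicks", "month"]]  # entity, metric, period
)
print(monthly)
```

Summing suits additive metrics such as clicks or sales; for a metric like population, another aggregation (for example the last value in the period) may be more appropriate.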
Example: Google Search Console data
First three countries and months by clicks:
gsc = pd.read_csv("data/gsc_country_month_report.csv")
gsc["flag"] = [adviz.flag(cc) for cc in gsc["country"]]
gsc.groupby("date").head(3).head(9)
Modifying the animation speed with frame_duration (in milliseconds)
racing_chart(
    gsc[["country", "clicks", "date"]],
    frame_duration=1500,
    height=700,
    title="Google Search Console <b>mywebsite.com</b><br>clicks per month - top 10<br><b>frame_duration=1500</b>",
    template="plotly_dark",
)
Make it faster and use flags instead of country codes
fig = racing_chart(
    gsc[["flag", "clicks", "date"]],
    frame_duration=500,
    height=700,
    title="Google Search Console <b>mywebsite.com</b><br>clicks per month - top 10<br><b>frame_duration=500</b>",
    template="plotly_dark",
)
# Enlarge the y-axis tick labels so the flags are easy to read.
fig.layout.yaxis.tickfont.size = 25
# Color the bars white in every animation frame and in the initial trace.
for frame in fig.frames:
    frame.data[0].marker.color = "white"
fig.data[0].marker.color = "white"
fig
Queries can be long and take a lot of space, so we can set the left margin of the Figure object to a larger number to fit them.
fig = racing_chart(
    queries[["query", "impressions", "date"]],
    title="Google Search Console <b>mywebsite.com</b><br>impressions per month - top 15",
    n=15,
    template="seaborn",
    height=800,
)
fig.layout.margin.l = 250
fig
Filtering entities
Looking at the top queries is interesting, but you will often have tens or hundreds of thousands of queries and want to dig deeper.
One way is to filter those based on some criterion.
For example, let’s see which are the top queries that contain “log” and see how we are doing on log file analysis queries: