Visualizing distributions: histograms and cumulative distributions
Create a dual distribution visualization of the desired column in a DataFrame. The top chart is a histogram showing the distribution of observations, and the bottom chart plots every observation along its cumulative distribution. Optionally, you can set hover_name to show the actual value of each observation on hover.
The ecdf function is a thin wrapper for the plotly.express function of the same name. It adds a few minor options to it.
It can be used to visualize how numeric variables are distributed, using both a histogram and the cumulative distribution (ecdf: empirical cumulative distribution function).
While the histogram shows us how many observations we have for each interval, the ecdf shows each observation and its particular position in the ranking order.
Note that you have access to all the parameters of the ecdf function, so please check them if you want to see what else can be modified with:
Mousing over any of the circles, you see the query it represents, its value (impressions in this case), the percentage of observations equal to or below it, and the counts of observations above and below it.
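To make the hover figures concrete, the sketch below computes them by hand for a single observation. The function name hover_stats and the sample numbers are hypothetical, and the exact definitions used in the chart (for example, how ties are counted) are an assumption here.

```python
from bisect import bisect_left, bisect_right


def hover_stats(values, value):
    """Rank-based figures for one observation (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    at_or_below = bisect_right(ordered, value)  # observations <= value
    below = bisect_left(ordered, value)         # observations strictly < value
    above = n - at_or_below                     # observations strictly > value
    return {
        "share_at_or_below": at_or_below / n,
        "count_below": below,
        "count_above": above,
    }


# Made-up impressions for five queries.
impressions = [10, 20, 20, 30, 50]
stats = hover_stats(impressions, 20)
```

In this sample, three of the five observations are at or below 20, one is strictly below it, and two are above it.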
Let’s do the same with URLs in an XML sitemap. We can visualize the cumulative distribution of the loc tags, and give it more context by showing each URL when we mouseover. This becomes a rich report with a lot of data on each URL.
We can immediately see in the above chart that the content on this website spans the period from September 2023 to June 2024. Looking at the top histogram, we can clearly see that most updates happened early in that period.
When a set of dots forms a nearly vertical line, we know that many updates happened in a very short period of time. These pages were likely updated in a batch.
With a simple option we can split and color the chart by the website segment.
I took the top five values in /dir_1/ and labelled all other values as “Others”.
By using facet_row="segment" we have six charts showing us the trend for each segment of the website separately.
The same applies to keywords, as it is crucial to know how they are distributed. We can also gain more insight after categorizing the keywords and applying the same technique we applied in the previous example.