Window Function Calls in Postgres - A Visual Introduction

December 31, 2020

What is a window function call? My explanation:

adds a column to query results that is calculated based on other rows in the table

Ok, that’s probably still not immediately understandable, so let’s walk through an example. Imagine you’ve been tracking expenses in Postgres:

description	category	created_at	cost
bus ride	transport	2020-01-01	3
lunch	food & drinks	2020-01-01	15
…

Now, say you wanted to see how much each expense compares to the average:

description	category	created_at	cost	avg
bus ride	…	…	3	17.5
lunch	…	…	15	17.5
…

You could use a subquery to calculate the average:

select *, (select avg(cost) from expenses)
from expenses;

But we could also be a little fancier and use a window function call:

select *, avg(cost) over()
from expenses;

Although we have some new syntax to understand (which we’ll get into later), that’s noticeably shorter! What just happened?

Hello, Window Frames

Let’s try to visualise the window function call. For each row, Postgres calculates the average cost of the rest of the table, which is the default “window frame”:

Now if that was all you could do with window function calls, I’d be disappointed too. Thankfully, it’s not! Aside from being shorter, there are some key features that are harder to replicate with subqueries:

calculations can be done over a subset of the table (i.e. a smaller window frame) instead of the whole table
in addition to the usual aggregation functions like avg or max, there are window functions, which can only be used as part of a window function call

Reframing the Window

Let’s try calculating over a subset. Knowing the average across all expenses wasn’t very helpful - a big purchase could skew the calculation, so let’s instead compare each expense to the category average:

select *, avg(cost) over(partition by category) as "avg in category"
from expenses;

Our result now looks something like:

description	category	created_at	cost	avg in category
tea break	food & drinks	…	5	24
lunch	food & drinks	…	15	24
…
taxi to home	transport	…	20	7.5
bus ride home	transport	…	3	7.5
…

That’s a bit more helpful! Let’s visualise this:

For each row, our new window frame includes only other rows with the same category, and the average is calculated based on those rows.

This is also a good time to take a brief look at the syntax we’ve used so far:

..., avg(cost) OVER(PARTITION BY category), ...

In the above:

avg(cost) is the aggregation function that we want to execute over the window frame - we could use other functions such as max or min
OVER is the keyword which tells Postgres we were using a window function call. Everything in parentheses specifies how our table will be divided into different window frames
in this case, we are using PARTITION BY category, which means we are dividing the table into different window frames based on their category

Window Functions

Now that we know a little more about the syntax, let’s look at the other hidden power of window functions calls… window functions themselves!

It’s coming to the end of the month, and you want to know the biggest expense in each category, for each day. There’s a window function for that - rank!

With rank, we can calculate the order of each row in it’s window frame (Note that the following query still returns every row in the table, not just the top expenses - we’ll fix that later):

select
  *,
  rank() over(
    partition by category, created_at
    order by cost desc
  )
from expenses;

Notice the additional syntax in our window function call:

... OVER(PARTITION BY category, created_at ORDER BY cost DESC) ...

we’ve added a second condition to PARTITION BY to further divide each window frame by the day each entry was created_at (this is very similar to GROUP BY)
there’s now an ORDER BY ... DESC that orders the rows within each frame from highest to lowest cost, so we’ll know that a rank of 1 would be the most expensive

Another quick visual:

So for each row, our window frame is other rows with the same category and created_at date, and we order the rows from highest to lowest cost.

The query gives us a result like:

description	category	created_at	cost	rank
groceries	food & drinks	2020-01-01	60	1
dinner	food & drinks	2020-01-01	35	2
lunch	food & drinks	2020-01-01	15	3
…
taxi	transport	2020-01-01	20	1
bus ride	transport	2020-01-01	3	2
bus ride	transport	2020-01-02	4	1
…

Now that we have a ranking, we could use it as a subquery and filter out expenses that are rank 2 or higher.

select *
from (
  -- our earlier query
  ...
) as "ranked_expenses"
where rank = 1;

Which leaves us with exactly what we want - nice!

description	category	created_at	cost	rank
groceries	food & drinks	2020-01-01	60	1
lunch	food & drinks	2020-01-02	15	1
taxi to home	transport	2020-01-01	20	1
bus ride home	transport	2020-01-02	4	1
…

I hoped that helped you grok window functions! If you want to play around with the data in this example, there’s an sqlfiddle - in case that doesn’t exist at the time of reading, you can also get the SQL to create the table and query the data from this gist.

In the next few articles, we’ll look into the individual window functions, starting with those for ranking rows.

Psst - if this was useful, consider sponsoring a coffee (or sushi) for me 🙇‍♂️:

Meta-Postgres: What constraints does a table have?

Window Functions, Visualised - Rankings

wasabigeek

Window Function Calls in Postgres - A Visual Introduction

Hello, Window Frames

Reframing the Window

Window Functions

Related Posts

Comments