Introduction to Window functions
Window functions operate on a set of rows and return a single aggregated value for each row. The term Window describes the set of rows in the database on which the function will operate.
Window (or Windowing) functions are a great way to get different perspectives on a set of data without having to make repeat calls to the server for that data. For example, we can gather the sum of a column and display it side-by-side with the detail-level data, such that “SalesAmount” and “SUM(SalesAmount)” can appear in the same row. We can also do analytical functions like PERCENT_RANK and ranking functions like ROW_NUMBER, all without altering the granularity of the result set or making additional trips to get the same source data again and again.
We define the Window (set of rows on which functions operates) using an OVER() clause. We will discuss more about the OVER() clause in the article below.
Types of Window functions
- Aggregate Window Functions
SUM(), MAX(), MIN(), AVG(). COUNT()
- Ranking Window Functions
RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE()
- Value Window Functions
LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
|window_function ( [ ALL ] expression ) OVER ( [ PARTITION BY partition_list ] [ ORDER BY order_list] )|
Window functions all use the OVER() clause, which is used to define how the function is evaluated. The OVER() clause accepts three different arguments:
- PARTITION BY: Defines the window (set of rows on which window function operates) for window functions. We need to provide a field or list of fields for the partition after PARTITION BY clause. Multiple fields need be separated by a comma as usual. If PARTITION BY is not specified, grouping will be done on entire table and values will be aggregated accordingly. Resets its counter every time the stated column(s) changes values.
- ORDER BY: Orders the rows the function will evaluate. This does not order the entire result set, only the way the function proceeds through the rows.
- OVER: Specifies the window clauses for aggregate functions.
- ROWS BETWEEN: Specifies how to further limit the rows evaluated by the function.
Let’s pretend we’re looking at simplified data from a weight-lifting competition. Here are some code examples (we’ll do them all in one SELECT statement because adding/removing window functions in no way changes the number of rows we get back):
/* ROW_NUMBER will list the number of the row, ordered by LiftID.
The counter will reset with every new combination of LiftDate and LiftPersonID */
, ROW_NUMBER() OVER (PARTITION BY LiftDate, LiftPersonID ORDER BY LiftID) AS LiftNumForToday
/* SUM will add up the weights lifted.
The first SUM will show the grand total for the entire result set.
The second SUM will show the total lift weight for that row's lift date.
The third SUM will show the total lift weight for that row's lift date and person. */
, SUM(LiftWeight) OVER () AS WeightGrandTotal
, SUM(LiftWeight) OVER (PARTITION BY LiftDate) AS WeightTotal
, SUM(LiftWeight) OVER (PARTITION BY LiftDate, LiftPersonID) AS PersonWeightTotal
/* AVG will show the average weight lifted.
The first AVG will show the average lift weight for that row's lift date.
The second AVG will show the average lift weight for that row's lift date and person. */
, AVG(LiftWeight) OVER (PARTITION BY LiftDate) AS PersonWeightAvg
, AVG(LiftWeight) OVER (PARTITION BY LiftDate, LiftPersonID) AS PersonDayWeightAvg
/* LAG and LEAD allow the current row to report on data in rows behind or ahead of it.
This LAG function will return the LiftWeight from 1 row behind it (in order of LiftID) and if no value is found, it will return 0 instead of NULL.
The LEAD function will get the LiftWeight from 3 rows ahead. Since we didn't specify the optional default value (like the "0" we gave the LAG function, it will return NULL if there is no row 3 rows ahead. */
, LAG(LiftWeight, 1, 0) OVER (ORDER BY LiftID) AS PrevLift
, LEAD(LiftWeight, 3) OVER (ORDER BY LiftID) AS NextLift
/* FIRST_VALUE AND LAST_VALUE will return the specified column's first and last value in the result set.
This FIRST_VALUE function will return the first LiftWeight in the result set.
This LAST_VALUE function will return the last LiftWeight in the result set.
WARNING: without the ROWS BETWEEN in the LAST_VALUE, you may get unexpected results.
, FIRST_VALUE(LiftWeight) OVER (ORDER BY LiftDate) AS FirstLift
, LAST_VALUE(LiftWeight) OVER (ORDER BY LiftDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastLift
/* SUM using ROWS BETWEEN will narrow the scope evaluated by the window function.
The function will begin and end where the ROWS BETWEEN specify.
The first SUM will add all the LiftWeight values in rows up to and including the current row.
The second SUM will add all the LiftWeight values in rows between the current row and the 3 rows before it.
, SUM(LiftWeight) OVER (ORDER BY LiftDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS WeightRunningTotal
, SUM(LiftWeight) OVER (ORDER BY LiftDate ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS WeightSumLast4
Considerations for Window Functions
If you don’t have SQL Server 2012 or later, your window function cupboard is pretty bare; SQL Server 2005 through 2008 R2 only allowed PARTITION BY in the OVER clause of an aggregate function, and you got RANK() and ROW_NUMBER(). That was about it. If you’re a developer still on one of those earlier versions, this is a compelling case for moving to 2012 or later. Think how much time you could be saving not writing multiple CTEs and how much faster your queries will go.
Speaking of fast…
By avoiding round-trips to the server for the same data, we reduce I/O on those tables. If we’re hitting indexes, we can really whittle down the reads involved. There is a trade-off, but it’s usually a very favorable one. Window functions require SQL Server to construct the window and compute the function (shown as tasks like Window Spool, Segment, Sequence Project, and Compute Scalar). In doing so, it adds reads to the Worktable. Still, this is generally less expensive than going back to get the source data multiple times, aggregating if necessary, and joining it all together. Also, Worktable exists in tempdb, which — ideally — is on your fastest storage tier.
Finally, remember that the limitations you put on a window function — PARTITION BY, ORDER BY, or ROWS BETWEEN — are there to apply context to the window function and in no way apply to the result set as a whole. In other words, your SELECT statement will be unaffected by anything you tell a window function to do.