As Tom Kyte has often said, Analytic functions Rock and Roll!

Back when I first came across them just over 3 years ago, I thought they were the most confusing things ever to have been invented – I just could *not* get my head around the syntax. It wasn’t until I started answering questions at the OTN SQL and PL/SQL forum based on people’s real-life sample data and requirements that I started to make sense of it all.

So, firstly, what is an analytic function and what makes it different to an aggregate function? An aggregate function, such as min() or count(), produces a summary of the rows being worked on. An analytic fuction allows you to access information from other rows within the current row. It does not produce a summary – if you had 5 rows in your table, using an analytic function to find the count of all the rows would give you 5 rows back, whereas the aggregate version of sum would give you just 1.

Eg: With analytics:

SQL> with my_tab as (select 1 col1 from dual union all 2 select 2 col1 from dual union all 3 select 3 col1 from dual union all 4 select 4 col1 from dual union all 5 select 5 col1 from dual) 6 select col1, 7 sum(col1) over () sum_col1 8 from my_tab 9 / COL1 SUM_COL1 ---------- ---------- 1 15 2 15 3 15 4 15 5 15

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all 2 select 2 col1 from dual union all 3 select 3 col1 from dual union all 4 select 4 col1 from dual union all 5 select 5 col1 from dual) 6 select sum(col1) 7 from my_tab; SUM(COL1) ---------- 15

You can see there that both both functions correctly gave the sum of 1 to 5 as 15, but with the analtyic function, you could access each row in the table. This allows you to do such things as “What is the difference between each col1 and the sum of the total?” without having to query the table again:

With analytics:

SQL> with my_tab as (select 1 col1 from dual union all 2 select 2 col1 from dual union all 3 select 3 col1 from dual union all 4 select 4 col1 from dual union all 5 select 5 col1 from dual) 6 select col1, 7 sum(col1) over () sum_col1, 8 sum(col1) over () - col1 diff 9 from my_tab; COL1 SUM_COL1 DIFF ---------- ---------- ---------- 1 15 14 2 15 13 3 15 12 4 15 11 5 15 10

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all 2 select 2 col1 from dual union all 3 select 3 col1 from dual union all 4 select 4 col1 from dual union all 5 select 5 col1 from dual), 6 sum_mt as (select sum(col1) sum_col1 7 from my_tab) 8 select mt.col1, 9 smt.sum_col1, 10 smt.sum_col1 - mt.col1 diff 11 from my_tab mt, 12 sum_mt smt; COL1 SUM_COL1 DIFF ---------- ---------- ---------- 1 15 14 2 15 13 3 15 12 4 15 11 5 15 10

I won’t be going over each of the analytic functions – you could always visit the documentation to see what functions there are, but I did want to go through the syntax to try and make it a little easier to understand.

To do this, I am assuming that you, the reader, is familiar with the syntax of aggregate functions.

The basic structure of an analytic function is:

function_name(arguments) over (partition by columns order by columns)

The partition by clause is similar to the group by clause of an aggregate function. It simply says “I want the function to be done on rows partitioned (or grouped) by these columns”.

The order by clause simply tells the analytic function the order it is meant to process the rows within the groups mentioned in the partition by clause (if one is specified). The order of the data makes a great deal of difference to the results of the function, for example, if you wanted a running total. More on this later.

You can have an analytic function without a partition by or an order by clause – this simply says “apply across all the rows in the table at once”.

You can just have the partition by clause – “apply across all the rows in each group at once”.

You can just have the order by clause – “apply across all the rows in the table in this order”.

You can have both partition and order by clause – “apply across the rows in each group in this order”.

Eg.

with my_tab as (select 1 id, 10 value from dual union all select 1 id, 20 value from dual union all select 1 id, 30 value from dual union all select 2 id, 5 value from dual union all select 2 id, 15 value from dual union all select 3 id, 2 value from dual) select id, value, sum(value) over (order by id, value) running_tot_id_asc, sum(value) over (order by id desc, value) running_tot_id_desc, sum(value) over (partition by id) tot_by_id, sum(value) over (partition by id order by value) running_tot_by_id1, sum(value) over (partition by id order by value desc) running_tot_by_id2, sum(value) over () total_value from my_tab; ID VALUE RUNNING_TOT_ID_ASC RUNNING_TOT_ID_DESC TOT_BY_ID RUNNING_TOT_BY_ID1 RUNNING_TOT_BY_ID2 TOTAL_VALUE ---------- ---------- ------------------ ------------------- ---------- ------------------ ------------------ ----------- 1 10 10 32 60 10 60 82 1 20 30 52 60 30 50 82 1 30 60 82 60 60 30 82 2 5 65 7 20 5 20 82 2 15 80 22 20 20 15 82 3 2 82 2 2 2 2 82 6 rows selected.

You can hopefully see the difference that a) the inclusion/exclusion of the partition by / order by clauses affects the results and b) the final ordering of the results has on how the running totals are displayed. In reality, you wouldn’t calculate running totals with different ordering to how the final output would be output, I’ve just done that here for demonstration purposes.

I have used them to answer questions such as:

- What is the difference between the value in the previous row and in the current row?
- How can I retrieve rows from table2 that have the exact set of rows in table1 per some join conditions?
- How can I generate a running sequence over each group of rows in my resultset?

etc…

In short, if you’re not familiar with analytic functions yet, you would be wise to research and play around with them. If aggregate functions are the ratchet screwdrivers of the SQL toolbox, then analytic functions are the cordless power screwdrivers!

## Recent Comments