Home › Monthly Archives › February 2010

Oracle Tips and Tricks – Part 2: Analytic Functions

As Tom Kyte has often said, Analytic functions Rock and Roll!

Back when I first came across them just over 3 years ago, I thought they were the most confusing things ever to have been invented – I just could *not* get my head around the syntax. It wasn’t until I started answering questions at the OTN SQL and PL/SQL forum based on people’s real-life sample data and requirements that I started to make sense of it all.

So, firstly, what is an analytic function and what makes it different to an aggregate function? An aggregate function, such as min() or count(), produces a summary of the rows being worked on. An analytic fuction allows you to access information from other rows within the current row. It does not produce a summary – if you had 5 rows in your table, using an analytic function to find the count of all the rows would give you 5 rows back, whereas the aggregate version of sum would give you just 1.

Eg: With analytics:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select col1,
  7         sum(col1) over () sum_col1
  8  from   my_tab
  9  /

      COL1   SUM_COL1
---------- ----------
         1         15
         2         15
         3         15
         4         15
         5         15

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select sum(col1)
  7  from   my_tab;

 SUM(COL1)
----------
        15 

You can see there that both both functions correctly gave the sum of 1 to 5 as 15, but with the analtyic function, you could access each row in the table. This allows you to do such things as “What is the difference between each col1 and the sum of the total?” without having to query the table again:

With analytics:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select col1,
  7         sum(col1) over () sum_col1,
  8         sum(col1) over () - col1 diff
  9  from   my_tab;

      COL1   SUM_COL1       DIFF
---------- ---------- ----------
         1         15         14
         2         15         13
         3         15         12
         4         15         11
         5         15         10

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual),
  6       sum_mt as (select sum(col1) sum_col1
  7                  from   my_tab)
  8  select mt.col1,
  9         smt.sum_col1,
 10         smt.sum_col1 - mt.col1 diff
 11  from   my_tab mt,
 12         sum_mt smt;

      COL1   SUM_COL1       DIFF
---------- ---------- ----------
         1         15         14
         2         15         13
         3         15         12
         4         15         11
         5         15         10

I won’t be going over each of the analytic functions – you could always visit the documentation to see what functions there are, but I did want to go through the syntax to try and make it a little easier to understand.

To do this, I am assuming that you, the reader, is familiar with the syntax of aggregate functions.

The basic structure of an analytic function is:

function_name(arguments) over (partition by columns order by columns)

The partition by clause is similar to the group by clause of an aggregate function. It simply says “I want the function to be done on rows partitioned (or grouped) by these columns”.
The order by clause simply tells the analytic function the order it is meant to process the rows within the groups mentioned in the partition by clause (if one is specified). The order of the data makes a great deal of difference to the results of the function, for example, if you wanted a running total. More on this later.

You can have an analytic function without a partition by or an order by clause – this simply says “apply across all the rows in the table at once”.
You can just have the partition by clause – “apply across all the rows in each group at once”.
You can just have the order by clause – “apply across all the rows in the table in this order”.
You can have both partition and order by clause – “apply across the rows in each group in this order”.

Eg.

with my_tab as (select 1 id, 10 value from dual union all
                select 1 id, 20 value from dual union all
                select 1 id, 30 value from dual union all
                select 2 id, 5 value from dual union all
                select 2 id, 15 value from dual union all
                select 3 id, 2 value from dual)
select id,
       value,
       sum(value) over (order by id, value) running_tot_id_asc,
       sum(value) over (order by id desc, value) running_tot_id_desc,
       sum(value) over (partition by id) tot_by_id,
       sum(value) over (partition by id order by value) running_tot_by_id1,
       sum(value) over (partition by id order by value desc) running_tot_by_id2,
       sum(value) over () total_value
from   my_tab;

        ID      VALUE RUNNING_TOT_ID_ASC RUNNING_TOT_ID_DESC  TOT_BY_ID RUNNING_TOT_BY_ID1 RUNNING_TOT_BY_ID2 TOTAL_VALUE
---------- ---------- ------------------ ------------------- ---------- ------------------ ------------------ -----------
         1         10                 10                  32         60                 10                 60          82
         1         20                 30                  52         60                 30                 50          82
         1         30                 60                  82         60                 60                 30          82
         2          5                 65                   7         20                  5                 20          82
         2         15                 80                  22         20                 20                 15          82
         3          2                 82                   2          2                  2                  2          82

6 rows selected.

You can hopefully see the difference that a) the inclusion/exclusion of the partition by / order by clauses affects the results and b) the final ordering of the results has on how the running totals are displayed. In reality, you wouldn’t calculate running totals with different ordering to how the final output would be output, I’ve just done that here for demonstration purposes.

I have used them to answer questions such as:

  • What is the difference between the value in the previous row and in the current row?
  • How can I retrieve rows from table2 that have the exact set of rows in table1 per some join conditions?
  • How can I generate a running sequence over each group of rows in my resultset?
  • etc…

In short, if you’re not familiar with analytic functions yet, you would be wise to research and play around with them. If aggregate functions are the ratchet screwdrivers of the SQL toolbox, then analytic functions are the cordless power screwdrivers!

Oracle Tips and Tricks – Part 1 in a series (I hope!)

Over the past couple of years or so, I’ve been hanging round the OTN SQL and PL/SQL forum, trying to help out where I can. Originally, I started doing this as a way of gaining experience of different “real world” examples, so that I could consolidate my knowledge (such as how to work with analytic functions!) and also expose myself to a lot of different issues. And also, where possible, to help people!

Since then, I’ve learnt much more than I could ever have hoped to, from various tips and tricks that I use often, to things that I haven’t yet needed to use in my day-to-day job. Of this latter group, there are things that I haven’t got the set up or access to try out, but even though I haven’t used them, I *know* that they exist and can refer back to them if I ever do come across a situation that requires them.

I cannot recommend enough to people that if you want to consolidate your SQL and PL/SQL knowledge and improve your skills, start helping out on the OTN SQL and PL/SQL forum!

Anyway, I thought that I would start writing about some of the things that I have picked up from my fellow forum helpers (they know who they are! *waves*) in the hopes that more people will come across them.

Subquery Factoring

Subquery factoring has been around since Oracle 9i, and is an incredibly useful tool. This is one thing that I picked up from the forums that I use at least once every day whilst at work!

In essence, it’s a way of giving each subquery you use within a SELECT statement a name and giving the scope of that subquery to the whole query. Whilst you can name inline views, you have also restricted the scope of that subquery to the bit of the query that it’s in.

The format looks like:

with subq1 as (select ....),
     subq2 as (select ....),
     ...
     subqn as (select ....)
select ...
from   subq1,
       subq2,
       ...
       subqn
where  ....;

You can have one or multiple subqueries defined, and you can also refer to previous subqueries within later subqueries (eg. subq2 could select from subq1, etc). All subqueries must be selected from at least one somewhere in the overall query.

The beauty of subquery factoring is that if you have a subquery repeated throughout a query, like:

select 'first_query' name_col, col1, col3
from   (select col1, col2, sum(col3) col3
        from   a_table)
where  col2 != 'hello'
union all
select 'second_query' name_col, col1, col3
from   (select col1, col2, sum(col3) col3
        from   a_table)
where  col2 = 'hello';

where the subquery gets run twice, when you factor it out, it only gets run once and then accessed twice:

with my_sub_query as (select col1, col2, sum(col3) col3
                      from   a_table)
select 'first_query' name_col, col1, col3
from   my_sub_query
union all
select 'second_query' name_col, col1, col3
from   my_sub_query;

Oracle will quite happily create a Global Temporary Table behind the scenes to hold the data from the subquery if it thinks it is beneficial to do so.

As well as performance benefits, it helps to make the query easier to read by modularising the SQL statement. And as we all know, modularising your code and making it resuable is Good Practice (TM)!

As well as all those benefits, subquery factoring has become very popular as a means of simulating a table with data, like so:

with my_tab as (select 1 col1, 'A' col2, 'hi' col3 from dual union all
                select 2 col1, 'B' col2, 'bye' col3 from dual)
select *
from   my_tab;

This comes in very, very handy when needing to set up test cases that other people can run, as there’s no longer a need for them to create a test table – the data’s already there!