Oracle Tips and Tricks – Part 3: Tabibitosan

Tabibitosan is a technique that I picked up from, guess where? Yup, the OTN Forums! It was Aketi Jyuuzou who first introduced me to the idea, and it’s a very useful technique for a certain set of problems. Aketi’s forum post has lots of information on it, but I just wanted to demonstrate a simple example here, to give you a taste of what you can use it for.

So, suppose we have a set of data, which is unique for id and dt:

        ID DT
---------- ----------
         1 01/01/2011
         1 02/01/2011
         1 03/01/2011
         1 04/01/2011
         1 05/01/2011
         1 10/01/2011
         1 11/01/2011
         1 12/01/2011
         1 14/01/2011
         1 15/01/2011
         1 16/01/2011
         2 01/01/2011
         2 02/01/2011
         2 03/01/2011
         2 04/01/2011
         2 06/01/2011
         2 07/01/2011
         2 11/01/2011
         2 12/01/2011
         2 13/01/2011
         2 28/01/2011
         2 30/01/2011
         2 31/01/2011
         2 01/02/2011

and you want to group the results to find the start and end dates of consecutive rows, eg:

        ID MIN_DT     MAX_DT    
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011

Where do we start? Well, tabibitosan works by assigning two numbers to each row in the result set, either over the whole set or over partitioned subsets – in our case, we’ll be doing this per id. The first number follows the values we care about (here, the dates), and the second is a simple consecutive row number.

Because we’re using dates, we’ll label each row by converting those to a number in Julian format. We’ll also use the row_number() analytic function to label each row with a consecutive number for each id in ascending dt order. Finally, we’ll subtract one from the other:

with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual)
select id,
       dt,
       to_number(to_char(dt, 'j')) main_rn,
       row_number() over (partition by id order by dt) partitioned_rn,
       to_number(to_char(dt, 'j'))
         - row_number() over (partition by id order by dt) grp
from   sample_data;

        ID TO_CHAR(DT,'DD/MM/YYYY')    MAIN_RN PARTITIONED_RN        GRP
---------- ------------------------ ---------- -------------- ----------
         1 01/01/2011                  2455563              1    2455562
         1 02/01/2011                  2455564              2    2455562
         1 03/01/2011                  2455565              3    2455562
         1 04/01/2011                  2455566              4    2455562
         1 05/01/2011                  2455567              5    2455562
         1 10/01/2011                  2455572              6    2455566
         1 11/01/2011                  2455573              7    2455566
         1 12/01/2011                  2455574              8    2455566
         1 14/01/2011                  2455576              9    2455567
         1 15/01/2011                  2455577             10    2455567
         1 16/01/2011                  2455578             11    2455567
         2 01/01/2011                  2455563              1    2455562
         2 02/01/2011                  2455564              2    2455562
         2 03/01/2011                  2455565              3    2455562
         2 04/01/2011                  2455566              4    2455562
         2 06/01/2011                  2455568              5    2455563
         2 07/01/2011                  2455569              6    2455563
         2 11/01/2011                  2455573              7    2455566
         2 12/01/2011                  2455574              8    2455566
         2 13/01/2011                  2455575              9    2455566
         2 28/01/2011                  2455590             10    2455580
         2 30/01/2011                  2455592             11    2455581
         2 31/01/2011                  2455593             12    2455581
         2 01/02/2011                  2455594             13    2455581

You can see that we now have the same number (the grp column) for rows with consecutive dt’s, and each group has a different grp value within each id. This is because as each dt increases by 1, so does the row number, so the difference between the two stays the same from one row to the next. As soon as the dt jumps by more than 1, the difference increases, as the row number will only ever increase by 1.
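As an aside, the same trick carries over to other languages. Here’s a small sketch in Python (my illustration, not part of the original example) that does exactly what the grp column does for id 1’s dates: subtract each row’s position from its day number and group on the difference.

```python
from datetime import date
from itertools import groupby

# Sample rows for id 1, mirroring the article's data (dd/mm/yyyy).
dates = [date(2011, 1, d) for d in (1, 2, 3, 4, 5, 10, 11, 12, 14, 15, 16)]

def consecutive_ranges(ds):
    # Tabibitosan: day-number minus row-number is constant within a run
    # of consecutive dates, so it can serve as the grouping key.
    keyed = ((d.toordinal() - i, d) for i, d in enumerate(sorted(ds)))
    ranges = []
    for _, grp in groupby(keyed, key=lambda t: t[0]):
        run = [d for _, d in grp]
        ranges.append((run[0], run[-1]))  # min_dt, max_dt of the run
    return ranges

for lo, hi in consecutive_ranges(dates):
    print(lo, hi)
```

This prints the same three ranges for id 1 that the SQL below produces.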

Having identified the numbers that separate the rows into consecutive chunks, we can group the result set using them, to find the min and max dt’s for each set:

with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual),
     tabibitosan as (select id,
                            dt,
                            to_number(to_char(dt, 'j'))
                              - row_number() over (partition by id order by dt) grp
                     from   sample_data)
select id,
       min(dt) min_dt,
       max(dt) max_dt
from   tabibitosan
group by id, grp
order by id, grp;

        ID MIN_DT     MAX_DT    
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011

Neat, huh?

Whenever I see this kind of “grouping consecutive rows together” problem, the tabibitosan method immediately leaps to mind. Thanks Aketi!

Also, note how I’ve used both analytic functions and subquery factoring in this – both powerful tools in their own right. You can use them as building blocks to come up with queries that may look fairly complicated, but are easy to break down to see what each constituent part is doing. Pack a few tools in your SQL toolkit and you can easily combine them to build complex queries.

Oracle Tips and Tricks – Part 2: Analytic Functions

As Tom Kyte has often said, Analytic functions Rock and Roll!

Back when I first came across them just over 3 years ago, I thought they were the most confusing things ever to have been invented – I just could *not* get my head around the syntax. It wasn’t until I started answering questions at the OTN SQL and PL/SQL forum based on people’s real-life sample data and requirements that I started to make sense of it all.

So, firstly, what is an analytic function and what makes it different from an aggregate function? An aggregate function, such as min() or count(), produces a summary of the rows being worked on. An analytic function allows you to access information from other rows alongside the current row. It does not collapse the rows into a summary – if you had 5 rows in your table, using an analytic function to find the sum over all the rows would give you 5 rows back, whereas the aggregate version of sum would give you just 1.

Eg: With analytics:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select col1,
  7         sum(col1) over () sum_col1
  8  from   my_tab
  9  /

      COL1   SUM_COL1
---------- ----------
         1         15
         2         15
         3         15
         4         15
         5         15

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select sum(col1)
  7  from   my_tab;

 SUM(COL1)
----------
        15 

You can see there that both functions correctly gave the sum of 1 to 5 as 15, but with the analytic function, you could still access each row in the table. This allows you to answer questions such as “What is the difference between each col1 and the sum of the total?” without having to query the table again:

With analytics:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select col1,
  7         sum(col1) over () sum_col1,
  8         sum(col1) over () - col1 diff
  9  from   my_tab;

      COL1   SUM_COL1       DIFF
---------- ---------- ----------
         1         15         14
         2         15         13
         3         15         12
         4         15         11
         5         15         10

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual),
  6       sum_mt as (select sum(col1) sum_col1
  7                  from   my_tab)
  8  select mt.col1,
  9         smt.sum_col1,
 10         smt.sum_col1 - mt.col1 diff
 11  from   my_tab mt,
 12         sum_mt smt;

      COL1   SUM_COL1       DIFF
---------- ---------- ----------
         1         15         14
         2         15         13
         3         15         12
         4         15         11
         5         15         10

I won’t be going over each of the analytic functions – you could always visit the documentation to see what functions there are, but I did want to go through the syntax to try and make it a little easier to understand.

To do this, I am assuming that you, the reader, are familiar with the syntax of aggregate functions.

The basic structure of an analytic function is:

function_name(arguments) over (partition by columns order by columns)

The partition by clause is similar to the group by clause of an aggregate function. It simply says “I want the function to be done on rows partitioned (or grouped) by these columns”.
The order by clause simply tells the analytic function the order it is meant to process the rows within the groups mentioned in the partition by clause (if one is specified). The order of the data makes a great deal of difference to the results of the function, for example, if you wanted a running total. More on this later.

You can have an analytic function without a partition by or an order by clause – this simply says “apply across all the rows in the table at once”.
You can just have the partition by clause – “apply across all the rows in each group at once”.
You can just have the order by clause – “apply across all the rows in the table in this order”.
You can have both partition and order by clause – “apply across the rows in each group in this order”.

Eg.

with my_tab as (select 1 id, 10 value from dual union all
                select 1 id, 20 value from dual union all
                select 1 id, 30 value from dual union all
                select 2 id, 5 value from dual union all
                select 2 id, 15 value from dual union all
                select 3 id, 2 value from dual)
select id,
       value,
       sum(value) over (order by id, value) running_tot_id_asc,
       sum(value) over (order by id desc, value) running_tot_id_desc,
       sum(value) over (partition by id) tot_by_id,
       sum(value) over (partition by id order by value) running_tot_by_id1,
       sum(value) over (partition by id order by value desc) running_tot_by_id2,
       sum(value) over () total_value
from   my_tab;

        ID      VALUE RUNNING_TOT_ID_ASC RUNNING_TOT_ID_DESC  TOT_BY_ID RUNNING_TOT_BY_ID1 RUNNING_TOT_BY_ID2 TOTAL_VALUE
---------- ---------- ------------------ ------------------- ---------- ------------------ ------------------ -----------
         1         10                 10                  32         60                 10                 60          82
         1         20                 30                  52         60                 30                 50          82
         1         30                 60                  82         60                 60                 30          82
         2          5                 65                   7         20                  5                 20          82
         2         15                 80                  22         20                 20                 15          82
         3          2                 82                   2          2                  2                  2          82

6 rows selected.

You can hopefully see a) the difference that including or excluding the partition by and order by clauses makes to the results, and b) the effect the final ordering of the results has on how the running totals are displayed. In reality, you wouldn’t calculate running totals ordered differently from the final output; I’ve just done that here for demonstration purposes.
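If it helps the intuition, here’s roughly what `sum(value) over (partition by id order by value)` computes, sketched in Python (my illustration; the rows mirror the example above): a cumulative sum that restarts at each new id.

```python
from itertools import accumulate, groupby

rows = [(1, 10), (1, 20), (1, 30), (2, 5), (2, 15), (3, 2)]  # (id, value)

# Like sum(value) over (partition by id order by value):
# one output per input row, with the running sum reset at each id.
running = []
for _, grp in groupby(sorted(rows), key=lambda r: r[0]):
    running.extend(accumulate(v for _, v in grp))

print(running)
```

Compare the result with the RUNNING_TOT_BY_ID1 column in the output above.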

I have used them to answer questions such as:

  • What is the difference between the value in the previous row and in the current row?
  • How can I retrieve rows from table2 that have the exact set of rows in table1 per some join conditions?
  • How can I generate a running sequence over each group of rows in my resultset?
  • etc…

In short, if you’re not familiar with analytic functions yet, you would be wise to research and play around with them. If aggregate functions are the ratchet screwdrivers of the SQL toolbox, then analytic functions are the cordless power screwdrivers!

Oracle Tips and Tricks – Part 1 in a series (I hope!)

Over the past couple of years or so, I’ve been hanging round the OTN SQL and PL/SQL forum, trying to help out where I can. Originally, I started doing this as a way of gaining experience of different “real world” examples, so that I could consolidate my knowledge (such as how to work with analytic functions!) and also expose myself to a lot of different issues. And also, where possible, to help people!

Since then, I’ve learnt much more than I could ever have hoped to, from various tips and tricks that I use often, to things that I haven’t yet needed to use in my day-to-day job. Of this latter group, there are things that I haven’t got the set-up or access to try out, but even though I haven’t used them, I *know* that they exist and can refer back to them if I ever do come across a situation that requires them.

I cannot recommend enough to people that if you want to consolidate your SQL and PL/SQL knowledge and improve your skills, start helping out on the OTN SQL and PL/SQL forum!

Anyway, I thought that I would start writing about some of the things that I have picked up from my fellow forum helpers (they know who they are! *waves*) in the hopes that more people will come across them.

Subquery Factoring

Subquery factoring has been around since Oracle 9i, and is an incredibly useful tool. This is one thing that I picked up from the forums that I use at least once every day whilst at work!

In essence, it’s a way of giving each subquery you use within a SELECT statement a name, and extending the scope of that subquery to the whole query. Whilst you can name inline views, their scope is restricted to the part of the query they appear in.

The format looks like:

with subq1 as (select ....),
     subq2 as (select ....),
     ...
     subqn as (select ....)
select ...
from   subq1,
       subq2,
       ...
       subqn
where  ....;

You can have one or multiple subqueries defined, and you can also refer to previous subqueries within later subqueries (eg. subq2 could select from subq1, etc). Every subquery you define must be selected from at least once somewhere in the overall query.
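The “later subqueries can reference earlier ones” part isn’t Oracle-specific – most databases with CTE support behave the same way. Here’s a small runnable sketch (my example, using SQLite via Python’s standard library rather than Oracle) where subq2 selects from subq1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# subq2 selects from subq1; the main query selects from subq2.
rows = conn.execute("""
    with subq1 as (select 1 as col1 union all select 2 union all select 3),
         subq2 as (select col1 * 10 as col1 from subq1)
    select col1 from subq2 order by col1
""").fetchall()
print(rows)
```

The same chaining works in Oracle’s WITH clause, subject to the rule above about every subquery being used somewhere.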

The beauty of subquery factoring is that if you have a subquery repeated throughout a query, like:

select 'first_query' name_col, col1, col3
from   (select col1, col2, sum(col3) col3
        from   a_table)
where  col2 != 'hello'
union all
select 'second_query' name_col, col1, col3
from   (select col1, col2, sum(col3) col3
        from   a_table)
where  col2 = 'hello';

where the subquery gets run twice, when you factor it out, it only gets run once and then accessed twice:

with my_sub_query as (select col1, col2, sum(col3) col3
                      from   a_table)
select 'first_query' name_col, col1, col3
from   my_sub_query
union all
select 'second_query' name_col, col1, col3
from   my_sub_query;

Oracle will quite happily create a Global Temporary Table behind the scenes to hold the data from the subquery if it thinks it is beneficial to do so.

As well as performance benefits, it helps to make the query easier to read by modularising the SQL statement. And as we all know, modularising your code and making it reusable is Good Practice (TM)!

As well as all those benefits, subquery factoring has become very popular as a means of simulating a table with data, like so:

with my_tab as (select 1 col1, 'A' col2, 'hi' col3 from dual union all
                select 2 col1, 'B' col2, 'bye' col3 from dual)
select *
from   my_tab;

This comes in very, very handy when needing to set up test cases that other people can run, as there’s no longer a need for them to create a test table – the data’s already there!
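The same test-data trick works anywhere CTEs are supported. For instance, here’s a sketch of it against an in-memory SQLite database from Python (my example, not Oracle – SQLite has no dual, so the CTE lists its column names instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A CTE standing in for a real table -- handy for sharable test cases.
rows = conn.execute("""
    with my_tab(col1, col2, col3) as (
        select 1, 'A', 'hi' union all
        select 2, 'B', 'bye'
    )
    select * from my_tab order by col1
""").fetchall()
print(rows)
```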