
Oracle Tips and Tricks – Part 3: Tabibitosan

Tabibitosan is a technique that I picked up from, guess where? Yup, the OTN Forums! It was Aketi Jyuuzou who first introduced me to the idea, and it’s a very useful technique for a certain set of problems. Aketi’s forum post has lots of information on it, but I just wanted to demonstrate a simple example here, to give you a taste of what you can use it for.

So, suppose we have a set of data, which is unique for id and dt:

        ID DT
---------- ----------
         1 01/01/2011
         1 02/01/2011
         1 03/01/2011
         1 04/01/2011
         1 05/01/2011
         1 10/01/2011
         1 11/01/2011
         1 12/01/2011
         1 14/01/2011
         1 15/01/2011
         1 16/01/2011
         2 01/01/2011
         2 02/01/2011
         2 03/01/2011
         2 04/01/2011
         2 06/01/2011
         2 07/01/2011
         2 11/01/2011
         2 12/01/2011
         2 13/01/2011
         2 28/01/2011
         2 30/01/2011
         2 31/01/2011
         2 01/02/2011

and you want to group the results to find the start and end dates of consecutive rows, eg:

        ID MIN_DT     MAX_DT    
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011

Where do we start? Well, tabibitosan works by assigning two numbers to each row in the resultset: one that follows the values we’re checking for consecutiveness, and another that is simply a consecutive row number – calculated either over the whole set or over partitioned result sets (in our case, per id). The difference between the two numbers then stays constant for each run of consecutive rows.
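To see the idea in miniature, take a run of made-up integers rather than our dates:

value:  1  2  3  5  6  9
rn:     1  2  3  4  5  6
diff:   0  0  0  1  1  3

The consecutive runs (1,2,3), (5,6) and (9) each end up with their own constant difference, and that difference is what we’ll group by.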

Because we’re using dates, we’ll label each row by converting its dt to a number in Julian format. We’ll also use the row_number() analytic function to label each row with a consecutive number per id, in ascending dt order. Finally, we’ll subtract one from the other:

with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual)
select id,
       to_char(dt, 'dd/mm/yyyy'),
       to_number(to_char(dt, 'j')) main_rn,
       row_number() over (partition by id order by dt) partitioned_rn,
       to_number(to_char(dt, 'j'))
         - row_number() over (partition by id order by dt) grp
from   sample_data;

        ID TO_CHAR(DT,'DD/MM/YYYY')    MAIN_RN PARTITIONED_RN        GRP
---------- ------------------------ ---------- -------------- ----------
         1 01/01/2011                  2455563              1    2455562
         1 02/01/2011                  2455564              2    2455562
         1 03/01/2011                  2455565              3    2455562
         1 04/01/2011                  2455566              4    2455562
         1 05/01/2011                  2455567              5    2455562
         1 10/01/2011                  2455572              6    2455566
         1 11/01/2011                  2455573              7    2455566
         1 12/01/2011                  2455574              8    2455566
         1 14/01/2011                  2455576              9    2455567
         1 15/01/2011                  2455577             10    2455567
         1 16/01/2011                  2455578             11    2455567
         2 01/01/2011                  2455563              1    2455562
         2 02/01/2011                  2455564              2    2455562
         2 03/01/2011                  2455565              3    2455562
         2 04/01/2011                  2455566              4    2455562
         2 06/01/2011                  2455568              5    2455563
         2 07/01/2011                  2455569              6    2455563
         2 11/01/2011                  2455573              7    2455566
         2 12/01/2011                  2455574              8    2455566
         2 13/01/2011                  2455575              9    2455566
         2 28/01/2011                  2455590             10    2455580
         2 30/01/2011                  2455592             11    2455581
         2 31/01/2011                  2455593             12    2455581
         2 01/02/2011                  2455594             13    2455581

You can see that we now have the same number (the grp column) for rows with consecutive dts, and a different grp value for each group within an id. This is because while each dt increases by 1, so does the row number, so the difference between the two stays the same from row to row. As soon as a dt jumps by more than 1, the difference grows, because the row number only ever increases by 1.

Having identified the numbers that separate the rows into consecutive chunks, we can group the result set by them to find the min and max dts for each set:

with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual),
     tabibitosan as (select id,
                            dt,
                             to_number(to_char(dt, 'j'))
                              - row_number() over (partition by id order by dt) grp
                     from   sample_data)
select id,
       min(dt) min_dt,
       max(dt) max_dt
from   tabibitosan
group by id, grp
order by id, grp;

        ID MIN_DT     MAX_DT    
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011

Neat, huh?

Whenever I see this kind of “grouping consecutive rows together” problem, the tabibitosan method immediately leaps to mind. Thanks Aketi!
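
As an aside, the same trick works on plain integers, with no Julian conversion needed. Here’s a minimal sketch (the sample_data values are made up purely for illustration), collapsing consecutive seq values into ranges:

with sample_data as (select 1 seq from dual union all
                     select 2 seq from dual union all
                     select 3 seq from dual union all
                     select 7 seq from dual union all
                     select 8 seq from dual)
select min(seq) min_seq,
       max(seq) max_seq
from   (select seq,
               seq - row_number() over (order by seq) grp
        from   sample_data)
group by grp
order by min_seq;

   MIN_SEQ    MAX_SEQ
---------- ----------
         1          3
         7          8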

Also, note how I’ve used both analytic functions and subquery factoring in this. Both are powerful tools in their own right, and you can use them as building blocks to come up with queries that may look fairly complicated, but are easy to break down to see what each constituent part is doing. Pack a few tools in your SQL toolkit and you can easily combine them to build complex queries.

Oracle Tips and Tricks – Part 2: Analytic Functions

As Tom Kyte has often said, Analytic functions Rock and Roll!

Back when I first came across them just over 3 years ago, I thought they were the most confusing things ever to have been invented – I just could *not* get my head around the syntax. It wasn’t until I started answering questions at the OTN SQL and PL/SQL forum based on people’s real-life sample data and requirements that I started to make sense of it all.

So, firstly, what is an analytic function and what makes it different to an aggregate function? An aggregate function, such as min() or count(), produces a summary of the rows being worked on. An analytic function allows you to access information from other rows whilst remaining on the current row. It does not produce a summary – if you had 5 rows in your table, using an analytic function to count all the rows would give you 5 rows back, whereas the aggregate version of count would give you just 1.

Eg: With analytics:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select col1,
  7         sum(col1) over () sum_col1
  8  from   my_tab
  9  /

      COL1   SUM_COL1
---------- ----------
         1         15
         2         15
         3         15
         4         15
         5         15

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select sum(col1)
  7  from   my_tab;

 SUM(COL1)
----------
        15 

You can see there that both functions correctly gave the sum of 1 to 5 as 15, but with the analytic function, you could still access each row in the table. This allows you to do such things as “What is the difference between each col1 and the total?” without having to query the table again:

With analytics:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select col1,
  7         sum(col1) over () sum_col1,
  8         sum(col1) over () - col1 diff
  9  from   my_tab;

      COL1   SUM_COL1       DIFF
---------- ---------- ----------
         1         15         14
         2         15         13
         3         15         12
         4         15         11
         5         15         10

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual),
  6       sum_mt as (select sum(col1) sum_col1
  7                  from   my_tab)
  8  select mt.col1,
  9         smt.sum_col1,
 10         smt.sum_col1 - mt.col1 diff
 11  from   my_tab mt,
 12         sum_mt smt;

      COL1   SUM_COL1       DIFF
---------- ---------- ----------
         1         15         14
         2         15         13
         3         15         12
         4         15         11
         5         15         10

I won’t be going over each of the analytic functions – you could always visit the documentation to see what functions there are, but I did want to go through the syntax to try and make it a little easier to understand.

To do this, I am assuming that you, the reader, are familiar with the syntax of aggregate functions.

The basic structure of an analytic function is:

function_name(arguments) over (partition by columns order by columns)

The partition by clause is similar to the group by clause of an aggregate function. It simply says “I want the function to be done on rows partitioned (or grouped) by these columns”.
The order by clause simply tells the analytic function the order in which to process the rows within the groups specified by the partition by clause (if one is specified). The order of the data makes a great deal of difference to the results of the function – for example, if you wanted a running total. More on this later.

You can have an analytic function without a partition by or an order by clause – this simply says “apply across all the rows in the table at once”.
You can just have the partition by clause – “apply across all the rows in each group at once”.
You can just have the order by clause – “apply across all the rows in the table in this order”.
You can have both partition and order by clause – “apply across the rows in each group in this order”.

Eg.

with my_tab as (select 1 id, 10 value from dual union all
                select 1 id, 20 value from dual union all
                select 1 id, 30 value from dual union all
                select 2 id, 5 value from dual union all
                select 2 id, 15 value from dual union all
                select 3 id, 2 value from dual)
select id,
       value,
       sum(value) over (order by id, value) running_tot_id_asc,
       sum(value) over (order by id desc, value) running_tot_id_desc,
       sum(value) over (partition by id) tot_by_id,
       sum(value) over (partition by id order by value) running_tot_by_id1,
       sum(value) over (partition by id order by value desc) running_tot_by_id2,
       sum(value) over () total_value
from   my_tab;

        ID      VALUE RUNNING_TOT_ID_ASC RUNNING_TOT_ID_DESC  TOT_BY_ID RUNNING_TOT_BY_ID1 RUNNING_TOT_BY_ID2 TOTAL_VALUE
---------- ---------- ------------------ ------------------- ---------- ------------------ ------------------ -----------
         1         10                 10                  32         60                 10                 60          82
         1         20                 30                  52         60                 30                 50          82
         1         30                 60                  82         60                 60                 30          82
         2          5                 65                   7         20                  5                 20          82
         2         15                 80                  22         20                 20                 15          82
         3          2                 82                   2          2                  2                  2          82

6 rows selected.

You can hopefully see a) how the inclusion/exclusion of the partition by and order by clauses affects the results, and b) the effect the final ordering of the results has on how the running totals are displayed. In reality, you wouldn’t calculate running totals ordered differently from the final output; I’ve just done that here for demonstration purposes.

I have used them to answer questions such as:

  • What is the difference between the value in the previous row and in the current row? (see the lag() sketch after this list)
  • How can I retrieve rows from table2 that have the exact set of rows in table1 per some join conditions?
  • How can I generate a running sequence over each group of rows in my resultset?
  • etc…
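
The first of those, for instance, is a one-liner with the lag() analytic function. A minimal sketch, with made-up sample data:

with my_tab as (select 1 col1 from dual union all
                select 3 col1 from dual union all
                select 6 col1 from dual union all
                select 10 col1 from dual)
select col1,
       lag(col1) over (order by col1) prev_col1,
       col1 - lag(col1) over (order by col1) diff
from   my_tab;

      COL1  PREV_COL1       DIFF
---------- ---------- ----------
         1
         3          1          2
         6          3          3
        10          6          4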

In short, if you’re not familiar with analytic functions yet, you would be wise to research and play around with them. If aggregate functions are the ratchet screwdrivers of the SQL toolbox, then analytic functions are the cordless power screwdrivers!

UKOUG Tech and EBS Conference – what have I learnt for next time?

So, UKOUG Tech and EBS 2009 Conference was the first conference I’ve ever been to. I headed out to it full of nerves over whether I’d get lost, have the right materials to take notes with, etc, so I thought I’d write a list of things I’d do differently next year. It may come in handy for next year’s first-timers, who knows?!

So, for my next UKOUG conference, I will:

  • make plans as to when to meet up with people I know, if they’re also going to be attending, especially for lunch and in the evenings
  • compare agendas with said people
  • talk to my optician re. what I can do to make it easier to see the slides; different glasses just for the long distance viewing, maybe?
  • make sure I have plenty of ibuprofen and paracetamol at hand, just in case!
  • pick up the conference goody-bag prior to my first presentation, making sure to go round the entire exhibition hall, as this time they were handed out at the very back!
  • not worry about having to buy a separate notebook; one is included in the conference goody-bag
  • try not to fill every available slot with a presentation; it will be information overload!
  • try to get enough sleep!
  • be prepared to be overwhelmed and try to take it in my stride when that happens
  • make much more effort to overcome my shyness and *talk* to more people, and especially the people I have talked to online that I really want to say hi to!

Things that I already did which turned out to be really good ideas:

  • wear comfortable shoes
  • read (and comment on!) lots of blogs throughout the year from different people (to a) gain a wider perspective and b) get familiar with the wider Oracle community)
  • become friends with someone who has been to previous conferences and knows a few of the regular attendees
  • come prepared to learn
  • follow the signs to the halls and pay attention to the most excellent and helpful ICC staff!
  • explore the area before the conference starts
  • Twitter like mad and use the appropriate hash tags! This is a useful means of introducing yourself to other conference attendees and also the wider Oracle community *{;-)
  • work out what sessions you’re going to attend prior to the conference start; having a plan of action for the day in advance helps a lot!
  • be prepared to enjoy yourself!

All in all, I had a great time and I learnt a lot. I also met lots of people that I’d never dreamed I would actually meet, and hopefully have begun broadening my network of contacts!

I would wholeheartedly recommend attending the UKOUG Tech and EBS Conference, and would happily go again next year, if I get the opportunity!

Learning things

So today, I attended Doug Burns’ “10g/11g Performance Analysis – How I learned to love pictures” presentation at work. Very interesting; I want to get my hands on Grid Control now, and play around with the pretty graphs! It’s a shame that our area’s DBAs are both absent (one’s on holiday I think, the other is unfortunately unwell), so it’ll be a while before I get my grubby little paws on it!

I really enjoyed the presentation – it really didn’t feel like an hour had gone past, but it had *{:-( He kept threatening to keep going until 4 or 5pm on the subject, but I could have quite happily stayed and listened (and learnt!) for that long!

One interesting fact that cropped up that I wasn’t aware of is that CPUs only process one instruction at a time. Or at least, that’s how Oracle views it (probably a highly simplified view of things, but I read the Wikipedia article on CPUs and a couple of other pages and thought it best to stop before my brain leaked out of my ear!); in the case of a CPU that can handle multiple instructions at once, that just counts as X CPUs.

I vaguely thought that CPUs automatically handled several instructions at once, thereby serving several processes at once (ie. that the concept of a CPU being 60% busy would be true). Instead, a CPU is either busy or not busy, and anything that arrives whilst it is busy is put in a queue – akin to going down to the post office and joining the queue. “Please go to window 3 -> -> ->”!

Just goes to show how woefully lacking my general computing knowledge is! It took Doug a while to work out what I was asking about, because that’s such a basic, fundamental thing to know! Still, at least I know now! A day in which you learn something is never a waste!