Tabibitosan is a technique that I picked up from, guess where? Yup, the OTN Forums! It was Aketi Jyuuzou who first introduced me to the idea, and it’s a very useful technique for a certain set of problems. Aketi’s forum post has lots of information on it, but I just wanted to demonstrate a simple example here, to give you a taste of what you can use it for.
So, suppose we have a set of data, which is unique for id and dt:
        ID DT
---------- ----------
         1 01/01/2011
         1 02/01/2011
         1 03/01/2011
         1 04/01/2011
         1 05/01/2011
         1 10/01/2011
         1 11/01/2011
         1 12/01/2011
         1 14/01/2011
         1 15/01/2011
         1 16/01/2011
         2 01/01/2011
         2 02/01/2011
         2 03/01/2011
         2 04/01/2011
         2 06/01/2011
         2 07/01/2011
         2 11/01/2011
         2 12/01/2011
         2 13/01/2011
         2 28/01/2011
         2 30/01/2011
         2 31/01/2011
         2 01/02/2011
and you want to group the results to find the start and end dates of consecutive rows, eg:
        ID MIN_DT     MAX_DT
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011
Where do we start? Well, tabibitosan works by assigning a number to each row in the result set, either over the whole set or over partitions of it; in our case, we’ll be doing this per id. It then assigns a second, consecutive number to the rows.
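To see the trick in miniature before we bring dates into it, take the values 1, 2, 3, 5 and 6, line them up against a running row number and subtract:

value        1   2   3   5   6
row_number   1   2   3   4   5
difference   0   0   0   1   1

The difference stays put for as long as the values are consecutive and shifts as soon as there’s a gap, which gives us something to group by.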
Because we’re working with dates, we’ll label each row by converting its dt to a number, using the Julian (‘j’) format. We’ll also use the row_number() analytic function to label each row with a consecutive number for each id, in ascending dt order. Finally, we’ll subtract one from the other:
with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual)
select id,
       dt,
       to_number(to_char(dt, 'j')) main_rn,
       row_number() over (partition by id order by dt) partitioned_rn,
       to_number(to_char(dt, 'j')) - row_number() over (partition by id order by dt) grp
from   sample_data;

        ID TO_CHAR(DT,'DD/MM/YYYY')    MAIN_RN PARTITIONED_RN        GRP
---------- ------------------------ ---------- -------------- ----------
         1 01/01/2011                  2455563              1    2455562
         1 02/01/2011                  2455564              2    2455562
         1 03/01/2011                  2455565              3    2455562
         1 04/01/2011                  2455566              4    2455562
         1 05/01/2011                  2455567              5    2455562
         1 10/01/2011                  2455572              6    2455566
         1 11/01/2011                  2455573              7    2455566
         1 12/01/2011                  2455574              8    2455566
         1 14/01/2011                  2455576              9    2455567
         1 15/01/2011                  2455577             10    2455567
         1 16/01/2011                  2455578             11    2455567
         2 01/01/2011                  2455563              1    2455562
         2 02/01/2011                  2455564              2    2455562
         2 03/01/2011                  2455565              3    2455562
         2 04/01/2011                  2455566              4    2455562
         2 06/01/2011                  2455568              5    2455563
         2 07/01/2011                  2455569              6    2455563
         2 11/01/2011                  2455573              7    2455566
         2 12/01/2011                  2455574              8    2455566
         2 13/01/2011                  2455575              9    2455566
         2 28/01/2011                  2455590             10    2455580
         2 30/01/2011                  2455592             11    2455581
         2 31/01/2011                  2455593             12    2455581
         2 01/02/2011                  2455594             13    2455581
You can see that we now have the same number (the grp column) for rows with consecutive dt’s, and a different grp value for each group within each id. This is because, while the dt’s are consecutive, each dt increases by 1 from row to row and so does the row number, so the difference between the two stays the same. As soon as the dt jumps by more than 1, the difference gets bigger, because the row number only ever increases by 1.
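To make that concrete with the numbers above: for id 1, 05/01/2011 is Julian day 2455567 and row 5, giving 2455567 - 5 = 2455562; the next row, 10/01/2011, is Julian day 2455572 but only row 6, giving 2455572 - 6 = 2455566. The five-day jump in dt is matched by only a one-row jump in the row number, so the difference moves on by four and a new group begins.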
Having identified the numbers that separate the rows into consecutive chunks, we can group the result set using them, to find the min and max dt’s for each set:
with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual),
     tabibitosan as (select id,
                            dt,
                            to_number(to_char(dt, 'j')) - row_number() over (partition by id order by dt) grp
                     from   sample_data)
select id,
       min(dt) min_dt,
       max(dt) max_dt
from   tabibitosan
group by id, grp
order by id, grp;

        ID MIN_DT     MAX_DT
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011
Neat, huh?
Whenever I see this kind of “grouping consecutive rows together” problem, the tabibitosan method immediately leaps to mind. Thanks Aketi!
Also, note how I’ve used both analytic functions and subquery factoring in this. Both are powerful tools in their own right, and you can use them as building blocks to come up with queries that may look fairly complicated, but are easy to break down to see what each constituent part is doing. Pack a few tools like these in your SQL toolkit and you can easily combine them to build complex queries.
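As a hint of how reusable the building block is, here’s the same pattern sketched against a plain numeric sequence instead of dates; note that account_transactions, account_id and seq_no are made-up names, purely for illustration:

-- Hypothetical table and column names, just to show the shape of the pattern.
-- seq_no is already a number, so there's no need for the to_char()/to_number() step.
with tabibitosan as (select account_id,
                            seq_no,
                            seq_no - row_number() over (partition by account_id order by seq_no) grp
                     from   account_transactions)
select account_id,
       min(seq_no) first_seq,
       max(seq_no) last_seq
from   tabibitosan
group by account_id, grp
order by account_id, grp;

The grouping step is identical; all that changes is where the first of the two numbers comes from.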
Hi Boneist,
Your example doesn’t work when the dates cross a month or a year. You can fix that quite easily by changing the date format mask element ‘yyyymmdd’ to ‘j’.
Regards,
Rob van Wijk.
Thanks Rob; corrected now! There was no excuse for that mistake, really *hangs head* *{;-)
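For anyone wondering exactly why the old mask caused problems: ‘yyyymmdd’ produces numbers that jump by far more than 1 whenever the month (or year) rolls over, whereas the Julian ‘j’ format always moves on by exactly 1 per day. A quick check against dual shows it:

select to_number(to_char(date '2011-01-31', 'yyyymmdd')) jan31_yyyymmdd,
       to_number(to_char(date '2011-02-01', 'yyyymmdd')) feb01_yyyymmdd,
       to_number(to_char(date '2011-02-01', 'j'))
       - to_number(to_char(date '2011-01-31', 'j')) julian_gap
from   dual;

That returns 20110131, 20110201 and 1: the ‘yyyymmdd’ values differ by 70 across the month end, so consecutive dates straddling it would have ended up in different groups.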