
Writing “complex” SQL – a how to, pt1b

As a follow-up to the previous entry in this series, we can take the question further.

When I first heard the requirement:

I need to display the numbers 1-12 in 3 rows of 4 columns, like so:

1 2 3 4
5 6 7 8
9 10 11 12

my first reaction was “really? Just the numbers 1 to 12? You won’t be given other numbers or values to display in that particular way?”

The answer to that was “no”, so the solution I gave in the last post did the trick.

What if the answer had been “yes – I need to display <some other list of twelve bits of info> in that format”?

Thankfully, we’ve already done most of the work – we know how to display twelve things in 3 rows of 4 columns. The trick with the new requirement is identifying which values go in which place.

We’d do that by artificially labelling the rows with a number from 1 to 12.

Let’s take our sample data as:

with sample_data as (select 12 col1 from dual union all
                     select -1 col1 from dual union all
                     select 3 col1 from dual union all
                     select 13 col1 from dual union all
                     select -20 col1 from dual union all
                     select 31 col1 from dual union all
                     select 0 col1 from dual union all
                     select -9 col1 from dual union all
                     select -2 col1 from dual union all
                     select 18 col1 from dual union all
                     select 7 col1 from dual union all
                     select 4 col1 from dual)
select col1
from   sample_data
order by col1;

which gives us:

      COL1
----------
       -20
        -9
        -2
        -1
         0
         3
         4
         7
        12
        13
        18
        31

Reaching into our SQL toolbox, we’ll use the ROW_NUMBER() analytic function to assign a number for each row:

with sample_data as (select 12 col1 from dual union all
                     select -1 col1 from dual union all
                     select 3 col1 from dual union all
                     select 13 col1 from dual union all
                     select -20 col1 from dual union all
                     select 31 col1 from dual union all
                     select 0 col1 from dual union all
                     select -9 col1 from dual union all
                     select -2 col1 from dual union all
                     select 18 col1 from dual union all
                     select 7 col1 from dual union all
                     select 4 col1 from dual)
select col1,
       row_number() over (order by col1) rn
from   sample_data
order by col1;

which gives us:

      COL1         RN
---------- ----------
       -20          1
        -9          2
        -2          3
        -1          4
         0          5
         3          6
         4          7
         7          8
        12          9
        13         10
        18         11
        31         12

Aha, now we’re in familiar territory – back to the numbers 1-12 again!

So, amending the solution from the previous post slightly, we can add it in to give us the results we want:

with sample_data as (select 12 col1 from dual union all
                     select -1 col1 from dual union all
                     select 3 col1 from dual union all
                     select 13 col1 from dual union all
                     select -20 col1 from dual union all
                     select 31 col1 from dual union all
                     select 0 col1 from dual union all
                     select -9 col1 from dual union all
                     select -2 col1 from dual union all
                     select 18 col1 from dual union all
                     select 7 col1 from dual union all
                     select 4 col1 from dual),
         results as (select col1,
                            ceil(row_number() over (order by col1)/4) grp,
                            mod(row_number() over (order by col1), 4) col_pos
                     from   sample_data)
select max(case when col_pos = 1 then col1 end) col1,
       max(case when col_pos = 2 then col1 end) col2,
       max(case when col_pos = 3 then col1 end) col3,
       max(case when col_pos = 0 then col1 end) col4
from   results
group by grp
order by grp;

giving us:

      COL1       COL2       COL3       COL4
---------- ---------- ---------- ----------
       -20         -9         -2         -1
         0          3          4          7
        12         13         18         31

If you had to adjust the format of the columns – eg. you wanted the results in 4 rows of 3 columns, or 6 rows of 2 columns, etc – then you’d have to amend the number you’re using in the ceil and mod functions in the “results” subquery to reflect the number of columns you’re expecting.

Then, in the main query, you’d have to extend or reduce the number of columns accordingly.
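For illustration, here’s a minimal sketch of the 4-rows-of-3-columns variant, using the numbers 1-12 as stand-in sample data – the constant in the ceil and mod calls becomes 3 (the number of columns), and the main query has three columns instead of four:

with sample_data as (select level col1
                     from   dual
                     connect by level <= 12),
         results as (select col1,
                            ceil(row_number() over (order by col1)/3) grp,
                            mod(row_number() over (order by col1), 3) col_pos
                     from   sample_data)
select max(case when col_pos = 1 then col1 end) col1,
       max(case when col_pos = 2 then col1 end) col2,
       max(case when col_pos = 0 then col1 end) col3
from   results
group by grp
order by grp;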

I haven’t restricted the number of rows to 12 in my queries above, because my sample data had exactly 12 rows. If your requirement was to display only the first (or last!) 12 values of a larger set of data, you’d have to restrict the rows accordingly. I’d simply expose the row_number() as a column in the “results” subquery, call it something like “rn”, and then use “rn <= 12” in the main query, as sketched below.
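Here’s a sketch of that restriction, assuming a larger stand-in data set of 50 values – note the extra “rn” column in the “results” subquery and the “rn <= 12” predicate in the main query:

with sample_data as (select level * 2 col1   -- stand-in for a larger table
                     from   dual
                     connect by level <= 50),
         results as (select col1,
                            row_number() over (order by col1) rn,
                            ceil(row_number() over (order by col1)/4) grp,
                            mod(row_number() over (order by col1), 4) col_pos
                     from   sample_data)
select max(case when col_pos = 1 then col1 end) col1,
       max(case when col_pos = 2 then col1 end) col2,
       max(case when col_pos = 3 then col1 end) col3,
       max(case when col_pos = 0 then col1 end) col4
from   results
where  rn <= 12
group by grp
order by grp;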

Writing “complex” SQL – a how to, pt1

Not so long ago, someone on our work Oracle IM channel asked a question, and then several people were amazed at the solution provided. I’m not sure why; it doesn’t seem so difficult to me, but it strikes me that the practice I’ve had via the SQL & PL/SQL OTN Forum probably has a lot to do with that.

So, anyway, I thought I would start a little series on writing complicated SQL, by taking you through the steps I would typically take in order to solve the requirement.

I’ll start with that simple request from my colleague:

I need to display the numbers 1-12 in 3 rows of 4 columns, like so:

1 2 3 4
5 6 7 8
9 10 11 12

Looking in my SQL toolbox, the very first thing I reach for is how to create the list of numbers from 1 to 12:

with nums as (select level rn
              from   dual
              connect by level <= 12)
select *
from   nums;

which gives us:

   RN
-----
    1
    2
    3
    4
    5
    6
    7
    8
    9
   10
   11
   12

Next thing, I know I need to pivot the results (I should add that I don't have 11g yet, so can't use its nice new PIVOT functionality!), so now I know I need two things:

  • how to identify the row a number belongs to
  • how to identify the column a number belongs to

Let's take the first item - identifying the row each number belongs to. We know that the numbers 1-4 will go on the first row, 5-8 on the second and 9-12 on the third. I can achieve that by dividing each number by 4 and then taking the ceiling of the result, like so:

with nums as (select level rn, ceil(level/4) grp
              from   dual
              connect by level <= 12)
select *
from   nums;

which gives us:

   RN        GRP
----- ----------
    1          1
    2          1
    3          1
    4          1
    5          2
    6          2
    7          2
    8          2
    9          3
   10          3
   11          3
   12          3

Next, I need to work out the column position each number belongs in - that's where the mod() function comes in handy!

with nums as (select level rn, ceil(level/4) grp, mod(level, 4) col_pos
              from   dual
              connect by level <= 12)
select *
from   nums;

which gives us:

   RN        GRP    COL_POS
----- ---------- ----------
    1          1          1
    2          1          2
    3          1          3
    4          1          0
    5          2          1
    6          2          2
    7          2          3
    8          2          0
    9          3          1
   10          3          2
   11          3          3
   12          3          0

So, having identified the row and the column position, we can now use pivoting to display the numbers in the desired format:

with nums as (select level rn, ceil(level/4) grp, mod(level, 4) col_pos
              from   dual
              connect by level <= 12)
select max(case when col_pos = 1 then rn end) col1,
       max(case when col_pos = 2 then rn end) col2,
       max(case when col_pos = 3 then rn end) col3,
       max(case when col_pos = 0 then rn end) col4
from   nums
group by grp
order by grp;

Giving us:

 COL1       COL2       COL3       COL4
----- ---------- ---------- ----------
    1          2          3          4
    5          6          7          8
    9         10         11         12

Et voilà!
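As an aside – since I mentioned not having 11g: on 11g and later, the new PIVOT clause can do that last cross-tab step for you instead of the max(case...) trick. A sketch:

with nums as (select level rn,
                     ceil(level/4) grp,
                     mod(level, 4) col_pos
              from   dual
              connect by level <= 12)
select col1, col2, col3, col4
from   nums
pivot (max(rn) for col_pos in (1 as col1, 2 as col2, 3 as col3, 0 as col4))
order by grp;

The non-pivoted grp column acts as the implicit grouping, just as in the manual version.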

Oracle Tips and Tricks – Part 3: Tabibitosan

Tabibitosan is a technique that I picked up from, guess where? Yup, the OTN Forums! It was Aketi Jyuuzou who first introduced me to the idea, and it’s a very useful technique for a certain set of problems. Aketi’s forum post has lots of information on it, but I just wanted to demonstrate a simple example here, to give you a taste of what you can use it for.

So, suppose we have a set of data, which is unique for id and dt:

        ID DT
---------- ----------
         1 01/01/2011
         1 02/01/2011
         1 03/01/2011
         1 04/01/2011
         1 05/01/2011
         1 10/01/2011
         1 11/01/2011
         1 12/01/2011
         1 14/01/2011
         1 15/01/2011
         1 16/01/2011
         2 01/01/2011
         2 02/01/2011
         2 03/01/2011
         2 04/01/2011
         2 06/01/2011
         2 07/01/2011
         2 11/01/2011
         2 12/01/2011
         2 13/01/2011
         2 28/01/2011
         2 30/01/2011
         2 31/01/2011
         2 01/02/2011

and you want to group the results to find the start and end dates of consecutive rows, eg:

        ID MIN_DT     MAX_DT    
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011

Where do we start? Well, tabibitosan works by assigning two numbers to each row in the result set: one derived from the data itself, and one a consecutive row number – either over the whole set or over partitioned result sets (in our case, we’ll be doing this per id). Subtracting one from the other then gives a value that stays constant for consecutive rows.

Because we’re using dates, we’ll label each row by converting its dt to a number in Julian format. We’ll also use the row_number() analytic function to label each row with a consecutive number per id, in ascending dt order. Finally, we’ll subtract one from the other:

with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual)
select id,
       dt,
       to_number(to_char(dt, 'j')) main_rn,
       row_number() over (partition by id order by dt) partitioned_rn,
       to_number(to_char(dt, 'j'))
         - row_number() over (partition by id order by dt) grp
from   sample_data;

        ID DT                          MAIN_RN PARTITIONED_RN        GRP
---------- ------------------------ ---------- -------------- ----------
         1 01/01/2011                  2455563              1    2455562
         1 02/01/2011                  2455564              2    2455562
         1 03/01/2011                  2455565              3    2455562
         1 04/01/2011                  2455566              4    2455562
         1 05/01/2011                  2455567              5    2455562
         1 10/01/2011                  2455572              6    2455566
         1 11/01/2011                  2455573              7    2455566
         1 12/01/2011                  2455574              8    2455566
         1 14/01/2011                  2455576              9    2455567
         1 15/01/2011                  2455577             10    2455567
         1 16/01/2011                  2455578             11    2455567
         2 01/01/2011                  2455563              1    2455562
         2 02/01/2011                  2455564              2    2455562
         2 03/01/2011                  2455565              3    2455562
         2 04/01/2011                  2455566              4    2455562
         2 06/01/2011                  2455568              5    2455563
         2 07/01/2011                  2455569              6    2455563
         2 11/01/2011                  2455573              7    2455566
         2 12/01/2011                  2455574              8    2455566
         2 13/01/2011                  2455575              9    2455566
         2 28/01/2011                  2455590             10    2455580
         2 30/01/2011                  2455592             11    2455581
         2 31/01/2011                  2455593             12    2455581
         2 01/02/2011                  2455594             13    2455581

You can see that we now have the same number (the grp column) for rows with consecutive dt’s, and a different grp value for each group within each id. This is because as each dt increases by 1, so does the row number, so the difference between the two stays the same from row to row – for id 1, 2455563 - 1 = 2455562, 2455564 - 2 = 2455562, and so on. As soon as the dt jumps by more than 1, the difference becomes greater, as the row number will only ever increase by 1 – hence 2455572 - 6 = 2455566 starts a new group.

Having identified the numbers that separate the rows into consecutive chunks, we can group the result set using them, to find the min and max dt’s for each set:

with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual),
     tabibitosan as (select id,
                            dt,
                            to_number(to_char(dt, 'j'))
                              - row_number() over (partition by id order by dt) grp
                     from   sample_data)
select id,
       min(dt) min_dt,
       max(dt) max_dt
from   tabibitosan
group by id, grp
order by id, grp;

        ID MIN_DT     MAX_DT    
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011

Neat, huh?

Whenever I see this kind of “grouping consecutive rows together” problem, the tabibitosan method immediately leaps to mind. Thanks Aketi!

Also, note how I’ve used both analytic functions and subquery factoring here. Both are powerful tools in their own right, and you can use them as building blocks to come up with queries that may look fairly complicated, but are easy to break down to see what each constituent part is doing. Pack a few tools in your SQL toolkit and you can easily combine them to build complex queries.

Oracle Tips and Tricks – Part 2: Analytic Functions

As Tom Kyte has often said, Analytic functions Rock and Roll!

Back when I first came across them just over 3 years ago, I thought they were the most confusing things ever to have been invented – I just could *not* get my head around the syntax. It wasn’t until I started answering questions at the OTN SQL and PL/SQL forum based on people’s real-life sample data and requirements that I started to make sense of it all.

So, firstly, what is an analytic function and what makes it different from an aggregate function? An aggregate function, such as min() or count(), produces a summary of the rows being worked on. An analytic function allows you to access information from other rows alongside the current row, and it does not produce a summary – if you had 5 rows in your table, using an analytic function to find the sum of all the rows would give you 5 rows back, whereas the aggregate version would give you just 1 row.

Eg: With analytics:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select col1,
  7         sum(col1) over () sum_col1
  8  from   my_tab
  9  /

      COL1   SUM_COL1
---------- ----------
         1         15
         2         15
         3         15
         4         15
         5         15

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select sum(col1)
  7  from   my_tab;

 SUM(COL1)
----------
        15 

You can see there that both functions correctly gave the sum of 1 to 5 as 15, but with the analytic function, you could still access each row in the table. This allows you to answer questions such as “What is the difference between each col1 and the sum of the total?” without having to query the table again:

With analytics:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual)
  6  select col1,
  7         sum(col1) over () sum_col1,
  8         sum(col1) over () - col1 diff
  9  from   my_tab;

      COL1   SUM_COL1       DIFF
---------- ---------- ----------
         1         15         14
         2         15         13
         3         15         12
         4         15         11
         5         15         10

With aggregate:

SQL> with my_tab as (select 1 col1 from dual union all
  2                  select 2 col1 from dual union all
  3                  select 3 col1 from dual union all
  4                  select 4 col1 from dual union all
  5                  select 5 col1 from dual),
  6       sum_mt as (select sum(col1) sum_col1
  7                  from   my_tab)
  8  select mt.col1,
  9         smt.sum_col1,
 10         smt.sum_col1 - mt.col1 diff
 11  from   my_tab mt,
 12         sum_mt smt;

      COL1   SUM_COL1       DIFF
---------- ---------- ----------
         1         15         14
         2         15         13
         3         15         12
         4         15         11
         5         15         10

I won’t be going over each of the analytic functions – you could always visit the documentation to see what functions there are, but I did want to go through the syntax to try and make it a little easier to understand.

To do this, I am assuming that you, the reader, are familiar with the syntax of aggregate functions.

The basic structure of an analytic function is:

function_name(arguments) over (partition by columns order by columns)

The partition by clause is similar to the group by clause of an aggregate function. It simply says “I want the function to be done on rows partitioned (or grouped) by these columns”.
The order by clause simply tells the analytic function the order it is meant to process the rows within the groups mentioned in the partition by clause (if one is specified). The order of the data makes a great deal of difference to the results of the function, for example, if you wanted a running total. More on this later.

You can have an analytic function without a partition by or an order by clause – this simply says “apply across all the rows in the table at once”.
You can just have the partition by clause – “apply across all the rows in each group at once”.
You can just have the order by clause – “apply across all the rows in the table in this order”.
You can have both partition and order by clause – “apply across the rows in each group in this order”.

Eg.

with my_tab as (select 1 id, 10 value from dual union all
                select 1 id, 20 value from dual union all
                select 1 id, 30 value from dual union all
                select 2 id, 5 value from dual union all
                select 2 id, 15 value from dual union all
                select 3 id, 2 value from dual)
select id,
       value,
       sum(value) over (order by id, value) running_tot_id_asc,
       sum(value) over (order by id desc, value) running_tot_id_desc,
       sum(value) over (partition by id) tot_by_id,
       sum(value) over (partition by id order by value) running_tot_by_id1,
       sum(value) over (partition by id order by value desc) running_tot_by_id2,
       sum(value) over () total_value
from   my_tab;

        ID      VALUE RUNNING_TOT_ID_ASC RUNNING_TOT_ID_DESC  TOT_BY_ID RUNNING_TOT_BY_ID1 RUNNING_TOT_BY_ID2 TOTAL_VALUE
---------- ---------- ------------------ ------------------- ---------- ------------------ ------------------ -----------
         1         10                 10                  32         60                 10                 60          82
         1         20                 30                  52         60                 30                 50          82
         1         30                 60                  82         60                 60                 30          82
         2          5                 65                   7         20                  5                 20          82
         2         15                 80                  22         20                 20                 15          82
         3          2                 82                   2          2                  2                  2          82

6 rows selected.

You can hopefully see a) how the inclusion/exclusion of the partition by / order by clauses affects the results, and b) the effect the final ordering of the results has on how the running totals are displayed. In reality, you wouldn’t calculate running totals with a different ordering from that of the final output – I’ve just done that here for demonstration purposes.

I have used them to answer questions such as:

  • What is the difference between the value in the previous row and in the current row? (see the sketch after this list)
  • How can I retrieve rows from table2 that have the exact set of rows in table1 per some join conditions?
  • How can I generate a running sequence over each group of rows in my resultset?
  • etc…
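A minimal sketch for that first question – lag() pulls the previous row’s value alongside the current row:

with my_tab as (select 1 col1 from dual union all
                select 3 col1 from dual union all
                select 7 col1 from dual)
select col1,
       lag(col1) over (order by col1) prev_col1,
       col1 - lag(col1) over (order by col1) diff
from   my_tab;

The first row has no previous row, so lag() returns NULL there (you can supply a default as its third argument).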

In short, if you’re not familiar with analytic functions yet, you would be wise to research and play around with them. If aggregate functions are the ratchet screwdrivers of the SQL toolbox, then analytic functions are the cordless power screwdrivers!

Oracle Tips and Tricks – Part 1 in a series (I hope!)

Over the past couple of years or so, I’ve been hanging round the OTN SQL and PL/SQL forum, trying to help out where I can. Originally, I started doing this as a way of gaining experience of different “real world” examples, so that I could consolidate my knowledge (such as how to work with analytic functions!) and also expose myself to a lot of different issues. And also, where possible, to help people!

Since then, I’ve learnt much more than I could ever have hoped to, from various tips and tricks that I use often, to things that I haven’t yet needed to use in my day-to-day job. Of this latter group, there are things that I haven’t got the setup or access to try out, but even though I haven’t used them, I *know* that they exist and can refer back to them if I ever do come across a situation that requires them.

I cannot recommend enough to people that if you want to consolidate your SQL and PL/SQL knowledge and improve your skills, start helping out on the OTN SQL and PL/SQL forum!

Anyway, I thought that I would start writing about some of the things that I have picked up from my fellow forum helpers (they know who they are! *waves*) in the hopes that more people will come across them.

Subquery Factoring

Subquery factoring has been around since Oracle 9i, and is an incredibly useful tool. This is one thing that I picked up from the forums that I use at least once every day whilst at work!

In essence, it’s a way of giving each subquery you use within a SELECT statement a name, and extending the scope of that subquery to the whole query. Whilst you can alias an inline view, its scope is restricted to the part of the query it appears in.

The format looks like:

with subq1 as (select ....),
     subq2 as (select ....),
     ...
     subqn as (select ....)
select ...
from   subq1,
       subq2,
       ...
       subqn
where  ....;

You can have one or multiple subqueries defined, and you can also refer to previous subqueries within later subqueries (eg. subq2 could select from subq1, etc). Each subquery you define must be selected from at least once somewhere in the overall query.
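For example, a minimal sketch of one subquery building on another:

with subq1 as (select level n
               from   dual
               connect by level <= 5),
     subq2 as (select n, n * n n_squared
               from   subq1)
select *
from   subq2;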

The beauty of subquery factoring is that if you have a subquery repeated throughout a query, like:

select 'first_query' name_col, col1, col3
from   (select col1, col2, sum(col3) col3
        from   a_table
        group by col1, col2)
where  col2 != 'hello'
union all
select 'second_query' name_col, col1, col3
from   (select col1, col2, sum(col3) col3
        from   a_table
        group by col1, col2)
where  col2 = 'hello';

where the subquery gets run twice, when you factor it out, it only gets run once and then accessed twice:

with my_sub_query as (select col1, col2, sum(col3) col3
                      from   a_table
                      group by col1, col2)
select 'first_query' name_col, col1, col3
from   my_sub_query
where  col2 != 'hello'
union all
select 'second_query' name_col, col1, col3
from   my_sub_query
where  col2 = 'hello';

Oracle will quite happily create a Global Temporary Table behind the scenes to hold the data from the subquery if it thinks it is beneficial to do so.
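If you want to check whether that happened, one way (a sketch, reusing the hypothetical a_table from the example above) is to look for a “TEMP TABLE TRANSFORMATION” step in the execution plan:

explain plan for
with my_sub_query as (select col1, col2, sum(col3) col3
                      from   a_table
                      group by col1, col2)
select 'first_query' name_col, col1, col3
from   my_sub_query
where  col2 != 'hello'
union all
select 'second_query' name_col, col1, col3
from   my_sub_query
where  col2 = 'hello';

select * from table(dbms_xplan.display);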

As well as performance benefits, it helps to make the query easier to read by modularising the SQL statement. And as we all know, modularising your code and making it reusable is Good Practice (TM)!

As well as all those benefits, subquery factoring has become very popular as a means of simulating a table with data, like so:

with my_tab as (select 1 col1, 'A' col2, 'hi' col3 from dual union all
                select 2 col1, 'B' col2, 'bye' col3 from dual)
select *
from   my_tab;

This comes in very, very handy when needing to set up test cases that other people can run, as there’s no longer a need for them to create a test table – the data’s already there!

Having (your cake and eating it!)

I tried to help someone out today on a Bulletin Board System that I’ve been on for years.

Their question was:

Here is my dataset:

ID, Flag
1 1
1 0
1 0
2 1
2 1
2 1
3 0
3 0
3 0

Basically I want a list of IDs where Flag is ONLY 1, so in this case, 2.

Currently doing

select distinct id from table where id not in
  (select id from table where flag=0)
and  flag=1;

Seems overkill. Is there a better way?

I proposed:

select id
from   (select id, count(id) flag_count, sum(decode(flag, 1, 1, 0)) flag1_count
        from   my_tab
        group by id)
where  flag_count = flag1_count;

Which as any fule knows is exactly what the HAVING clause was meant to do:

select id
from   my_tab
group by id
having count(id) = sum(decode(flag, 1, 1, 0));

I seem to have a blind spot for this when it comes to writing the SQL – sure, I know about the HAVING clause, but remembering it when I need it is an entirely different matter!

Sheesh!