Home › Monthly Archives › January 2015

Moving from Procedural to Set-Based Thinking

Most people who are new to working with databases tend to think procedurally. That is: I have a table that I need to update, so:

  • Find the first row
  • Update the relevant columns with the relevant values
  • Find the next row
  • Update the relevant columns with the relevant values
  • Find the last row
  • Update the relevant columns with the relevant values

People who are familiar with databases and set-based thinking would simply do:

  • Update all the rows in the table, setting the relevant columns to the relevant values

I think you’ll agree that this set-based (i.e. taking the whole set of data and then applying the changes in one go) is a much neater approach than the procedural, step-by-step approach! Also, it will be faster, since you don’t have to spend time context switching between – in the case of Oracle – the PL/SQL and SQL “engines”.

Having helped out on the SQL and PL/SQL Oracle forum, I’ve seen lots of different people ask questions around how to fix a syntax error in their horrendously procedural, row-by-row (aka slow-by-slow, or in SQL Server land, rbar (row-by-agonising-row) code, only for people (myself included!) to say “don’t do that, do it like this: <set-based code>”.

I remember having difficulty with the concept of set-based when I first started working with PL/SQL way back when, so I thought I’d put together a bit of a guide as to how to spot when you can use set-based processing and, more importantly, how to convert from the procedural code.

Remember, as the great Tom Kyte says, “you should do [the work] in a single SQL statement if at all possible”.

Spotting Procedural Code

In PL/SQL, the number one flag for procedural code that can be converted to set-based code is: code looping through a cursor (yes, even if you’re taking advantage of bulk processing (BULK COLLECT/FORALL)!).

It’s not always possible to convert procedural code to set-based, of course – if the work you’re doing against each row is not DML on another table (e.g. you’re sending a mail, or writing out to a file), then there’s nothing you can do to make it set-based. (There are other options that may be available to you in this scenario, but that’s not the subject of this post.)

Ok, so I’ve found some procedural code… what next?

Let’s take an example that popped up recently on the forum. Rewriting to remove the poster’s syntax and logic errors, here’s the procedure they intended to end up with:

create or replace procedure valid_proc (start_date in date,
                                        end_date in date)  
is  
  cursor c1 is   
    select *
    from emp  
    where hiredate between start_date and end_date;
begin
  for c_row in c1
  loop
    insert into emp_back (empno,
                          ename,
                          job,
                          mgr,
                          hiredate,
                          sal,
                          comm,
                          deptno)  
    values (c_row.empno,
            c_row.ename,
            c_row.job,
            c_row.hiredate,
            c_row.sal,
            c_row.comm,
            c_row.deptno);  
  end loop;
  commit;
end;  
/

The poster started off by asking what was wrong with his code, and immediately, what leapt to my mind was: “You’re using row-by-row processing! Why not do it in a single insert?” If you look through the post to reply #16, you can see that BluShadow had the same thought and has provided the solution, which I present, slightly reformatted, here:

create or replace procedure valid_proc(start_date in date,
                                       end_date in date) as
begin
  insert into emp_back (empno,
                        ename,
                        job,
                        mgr,
                        hiredate,
                        sal,
                        comm,
                        deptno)
  select empno,
         ename,
         job,
         mgr,
         hiredate,
         sal,
         comm,
         deptno
  from  emp
  where  hiredate between start_date and end_date;

  commit; -- if appropriate to business/transaction logic
end; 
/

The steps to get from the original, row-by-row procedure to the set-based procedure are:

  1. Look at the cursor
  2. Look at the insert … values () statement
  3. Think “all the rows being selected in the cursor are being inserted by the insert statement – this calls for an insert-as-select statement!”
  4. Make sure the select statement in the cursor lists the columns being inserted by the insert statement
  5. Remove the values() clause of the insert statement and replace with the amended select statement from the cursor

For those who are new to set-based thinking, I believe that step 3 is the hardest to grasp – part of it is familiarity with INSERT / UPDATE / DELETE / MERGE statements, but part of it is experience (which following the SQL and PL/SQL forum will help you get!)

But my procedure is much more complicated than that! I laugh at your single loop and give you my code with nested cursor loops!

First off, congratulations for reinventing the wheel – you have succeeded in replicating the NESTED LOOP join. Did you really intend to do that? If so, you’ve successfully hobbled Oracle’s optimizer to the possible detriment of your code (what if one of the other types of join condition was more efficient?). Whoops…!

Let’s look at a simple example of this type of processing:

declare 
  cursor c1 is select col_id, col_name from temp1; 
  cursor c2 is select col_id, col_name from temp2; 
begin 
  for i in c1 loop 
    for j in c2 loop 
      if i.col_name != j.col_name then 
        insert into temp1 values(j.col_id,j.col_name); 
      end if; 
    end loop; 
  end loop; 
  commit;
end;
/

The above code can be read as:

  1. For each row in cursor c1, find all rows in cursor c2 where the col_name in c1 doesn’t equal the col_name in c2
  2. Then insert the col_id and col_name from c2 into the temp1 table

Step 1 is basically describing a join between the two cursors, and we know that a join can be written in a single SQL statement:

select c1.col_id c1_col_id,
       c1.col_name c1_col_name,
       c2.col_id c2_col_id,
       c2.col_name c2_col_name
from   temp1 c1,
       temp2 c2
where  c1.col_name != c2.col_name;

Step 2 just inserts all the rows that we just selected, so the final statement becomes:

insert into temp1 (col_id, col_name)
select c2.col_id c2_col_id,
       c2.col_name c2_col_name
from   temp1 c1,
       temp2 c2
where  c1.col_name != c2.col_name;

If you need to, you can then slot the above statement into a PL/SQL procedure.

In reality, most of the code that I’ve seen involving nested cursor loops is not as simple as the above example, but with a bit of patience, it’s possible to untangle the nested cursors and, at the very least, turn it into a single cursor. If you can’t merge the cursor and the code that’s doing the work into a single SQL statement (e.g. sending a mail), at least you will have improved the performance by reducing the code to a single cursor loop!

Okay, so now I know a bit about refactoring existing procedure, what about if I have to write my own set-based procedure from scratch?

The very first step that I do is to try and build a SQL statement that matches the requirements. Once you have that (which you can build up in stages, depending on how complex the logic is), you can then use it to plug into an INSERT, UPDATE, MERGE or DELETE statement. You may have to modify the SELECT to fit in with the final statement, but once you have that base statement, you’re 90% of the way there.

The trick is to work out how to write that initial SELECT statement, which may involve aggregated queries, joins, analytic functions, etc. If you’re not very comfortable with SQL, then I suggest you invest some time in learning more about the various things you can do in SQL – it’s a very powerful language that will be faster at getting results from the database than anything you could write yourself in PL/SQL.