The SQL WITH Clause, (Order Your Thoughts, Reuse Your Code)

Kevin Meade's picture
articles: 

So here I am writing even more documentation for my current Gig, and thinking once again, why not post it to OraFAQ and get double duty out of the document. So here is a discussion of the WITH clause that comes with the SELECT statement now. It is easy to use, and handy as all get out, but seems many people have not yet noticed it. Hmm... a SELECT statement that does not start with SELECT.

I like examples as a learning tools, so lets start off with some seemingly silly code.

col dummy format a10

select dummy from dual
/

DUMMY
----------
X

1 row selected.

So where is this going?. Check this out.

WITH
   stupid_is_as_stupid_does as (
                                select dummy from dual
                               )
select *
from stupid_is_as_stupid_does
/

DUMMY
----------
X

1 row selected.

What just happened? Conceptually you can think of it like this: we created a temporary table called STUPID_IS_AS_STUPID_DOES and then selected from it. I say, conceptually you can think of it like this because although Oracle could use temporary storage as a place to keep rows from our “named” SELECT statement, it usually does not do that but instead merges SQL before executing anything. But that just confuses what we want to know here which is, what the WITH clause does for us.

So what we are saying is, conceptually, the WITH clause allows us to give names to predefined SELECT statements inside the context of a larger SELECT statement. We can then reference the NAMED SELECT statements later. Now consider this next sequence of SELECT statements:

with
   temp_t1 as (
               select dummy c1 from dual
              )
select *
from temp_t1 a
    ,temp_t1 b
/

C C
- -
X X

1 row selected.

with
   temp_t1 as (
               select dummy c1 from dual
              )
  ,temp_t2 as (
               select dummy c1 from dual
              )
select *
from temp_t1 a
    ,temp_t2 b
/

C C
- -
X X

1 row selected.

with
   temp_t1 as (
               select dummy c1 from dual
              )
  ,temp_t2 as (
               select dummy c1 from dual a,temp_t1 b where b.c1 = a.dummy
              )
select *
from temp_t2 a
/

C
-
X

1 row selected.

SQL> select *
  2  from temp_t2 a
  3  /
from temp_t2 a
     *
ERROR at line 2:
ORA-00942: table or view does not exist

SQL>

From these four SELECT statements we learn that the WITH clause lets us do the following:

1) we can reference a named query any number of times

2) we can create any number of named queries

3) named queries can reference other named queries that came before them and even correlate to previous named queries

4) named queries are only good for the SELECT statement that names them, their scope is local to the SELECT in which they are defined, hence no sharing across statements.

So what does the WITH clause really do for us? Well, there are two main benefits.

The most obvious thing one gets from use of the WITH clause is the ability to construct reusable components inside a SELECT. We can give a name to a set of rows (no matter how complicated the SELECT is to build them), and then use the name of the NAMED SELECT, anywhere we would have otherwise had to duplicate the text we named. The examples above show this in action. But as nice as it is to be able to reduce the amount of code one has to write, there is still a much more important reason to use the WITH clause, LOGICAL THINKING.

A better way of using the WITH clause is as a mechanism for exposing the thought process behind a SELECT. These days people are writing some pretty fancy stuff; long queries with many parts. Interestingly enough, it is typical for large queries like this to be constructed by a thought process where in the pieces of the query build upon each other in a stepwise fashion. Unfortunately large SQL written this way can be difficult to parse mentally. WITH clause to the rescue.

To see how the WITH clause can help in this situation consider the following scenario. You are building a data warehouse, and you want to create what is called a DENSE, MONTHLY INCEPTION TO DATE SUMMARY FACT. The data below will show you what this is:

This data from our daily fact table:

CALENDAR_DATE          CLAIM_ID AMOUNT_PAID
-------------------- ---------- -----------
08-jan-2007 00:00:00          1         100
11-jan-2007 00:00:00          1         100
23-feb-2007 00:00:00          1         100
14-jun-2007 00:00:00          1         100

4 rows selected.

Becomes this data in a DENSE, MONTHLY INCEPTION TO DATE SUMMARY FACT:

FILLED_CALENDAR_MONT   CLAIM_ID ITD_AMOUNT_PAID
-------------------- ---------- ---------------
01-jan-2007 00:00:00          1             200
01-feb-2007 00:00:00          1             300
01-mar-2007 00:00:00          1             300
01-apr-2007 00:00:00          1             300
01-may-2007 00:00:00          1             300
01-jun-2007 00:00:00          1             400

6 rows selected.

To get from point A to point B, the data had to be summed, then analytically summed, then filled in with a densification step. Here is the statement that takes the first dataset and produces the second dataset.

with
   simple_sum as (
                  select trunc(calendar_date,'mm') calendar_month_date
                        ,claim_id
                        ,sum(amount_paid) amount_paid
                  from temp_claim_history
                  group by
                           trunc(calendar_date,'mm')
                          ,claim_id
                 )
  ,end_month as (
                 select trunc(sysdate,'mm') calendar_month_date
                 from dual
                )
  ,itd_sum as (
               Select simple_sum.calendar_month_date
                     ,simple_sum.claim_id
                     ,simple_sum.amount_paid
                     ,sum(simple_sum.amount_paid)
                              over (
                                    partition by simple_sum.claim_id
                                    order by simple_sum.calendar_month_date
                                   ) itd_amount_paid
                     ,nvl(
                          lead(simple_sum.calendar_month_date)
                              over (
                                    partition by simple_sum.claim_id
                                    order by simple_sum.calendar_month_date
                                   )
                         ,add_months(end_month.calendar_month_date,1)
                         ) next_calendar_month_date
               from simple_sum
                   ,end_month
              )
  ,month_list as (
                  select
                         add_months(
                                     trunc(
                                           to_date('2010'
                                                  ,'rrrr'
                                                  )
                                          ,'rrrr'
                                          )
                                   ,-rownum+1
                                   ) some_month_date
                  from dual
                  connect by level <= 240
                 )
  ,dense_itd_summary as (
                         select itd_sum.*
                               ,month_list.some_month_date filled_calendar_month_date
                         from itd_sum
                             ,month_list
                         where month_list.some_month_date >= itd_sum.calendar_month_date
                         and month_list.some_month_date < itd_sum.next_calendar_month_date
                        )
select
       filled_calendar_month_date
      ,claim_id
      ,itd_amount_paid
from dense_itd_summary
order by
         claim_id
        ,filled_calendar_month_date
/

OK, this might seem a little intimidating at first but remember we said that real power of the WITH clause was in its ability to let you expose the thought process underneath your SELECT. Clearly this SELECT is making heavy use of the WITH clause. The WITH clause achieves this magic by letting you create your SELECT in steps.

We can think of the processing done by this SELECT as happening from top down. Each NAMED SELECT will transform the data that came before it in some way. All we need to do is look at the data output by each step, to see what is happening, and this is easily done by running the pieces of the SELECT one WITH clause at a time in an incremental fashion. You will find this makes it easy to debug your queries. So let us do that. Let us execute WITH clauses in pieces to see what they do.

Starting with this data:

SQL> select *
  2  from temp_claim_history
  3  /

CALENDAR_DATE          CLAIM_ID AMOUNT_PAID
-------------------- ---------- -----------
08-jan-2007 00:00:00          1         100
11-jan-2007 00:00:00          1         100
23-feb-2007 00:00:00          1         100
14-jun-2007 00:00:00          1         100

4 rows selected.

We do an ordinary GROUP BY month.

SQL> with
  2     simple_sum as (
  3                    select
  4                           trunc(calendar_date,'mm') calendar_month_date
  5                          ,claim_id
  6                          ,sum(amount_paid) amount_paid
  7                    from temp_claim_history
  8                    group by
  9                             trunc(calendar_date,'mm')
 10                            ,claim_id
 11                   )
 12  select *
 13  from simple_sum
 14  /

CALENDAR_MONTH_DATE    CLAIM_ID AMOUNT_PAID
-------------------- ---------- -----------
01-jan-2007 00:00:00          1         200
01-feb-2007 00:00:00          1         100
01-jun-2007 00:00:00          1         100

3 rows selected.

No rocket science there. All we did was group rows from the same month together. This creates a simple sum. We see next a WITH clause that grabs an EPOCH date. In this case the date represents the last month for which we want to create rows in our final DENSE SUMMARY. WE are going to need this in the step that follows.

SQL> with
  2     end_month as (
  3                   select trunc(sysdate,'mm') calendar_month_date
  4                   from dual
  5                  )
  6  select *
  7  from end_month
  8  /

CALENDAR_MONTH_DATE
--------------------
01-jun-2007 00:00:00

1 row selected.

Next we roll up the simple sum into a running sum using Oracle Analytics. Notice we include on these rows a second analytic that looks to the next row in its group to see the next month with real data. We are going to use that in a minute.

SQL> with
  2     simple_sum as (
  3                    select
  4                           trunc(calendar_date,'mm') calendar_month_date
  5                          ,claim_id
  6                          ,sum(amount_paid) amount_paid
  7                    from temp_claim_history
  8                    group by
  9                             trunc(calendar_date,'mm')
 10                            ,claim_id
 11                   )
 12    ,end_month as (
 13                   select trunc(sysdate,'mm') calendar_month_date
 14                   from dual
 15                  )
 16    ,itd_sum as (
 17                 select
 18                        simple_sum.calendar_month_date
 19                       ,simple_sum.claim_id
 20                       ,simple_sum.amount_paid
 21                       ,sum(simple_sum.amount_paid)
 22                                over (
 23                                      partition by simple_sum.claim_id
 24                                      order by simple_sum.calendar_month_date
 25                                     ) itd_amount_paid
 26                       ,nvl(
 27                            lead(simple_sum.calendar_month_date)
 28                                over (
 29                                      partition by simple_sum.claim_id
 30                                      order by simple_sum.calendar_month_date
 31                                     )
 32                           ,add_months(end_month.calendar_month_date,1)
 33                           ) next_calendar_month_date
 34                 from simple_sum
 35                     ,end_month
 36                )
 37  select *
 38  from itd_sum
 39  /

CALENDAR_MONTH_DATE    CLAIM_ID AMOUNT_PAID ITD_AMOUNT_PAID NEXT_CALENDAR_MONTH_
-------------------- ---------- ----------- --------------- --------------------
01-jan-2007 00:00:00          1         200             200 01-feb-2007 00:00:00
01-feb-2007 00:00:00          1         100             300 01-jun-2007 00:00:00
01-jun-2007 00:00:00          1         100             400 01-jul-2007 00:00:00

3 rows selected.

Yes, this ITD_AMOUNT_PAID is a running total, and the NEXT_CALENDAR_MONTH_DATE is the next date for which there is real data. Please notice that in the case of the last row, we defaulted the NEXT_CALENDAR_MONTH_DATE to be the end month + 1. Each last row in a group has no next row (doh!) and therefore no NEXT_CALENDAR_MONTH_DATE, so we took this default so that the date math we are going to do next does not deal with nulls.

In the end you recall we wanted to create a dense summary. Notice in the data above, there are no rows for MARCH, APRIL, or MAY. In a dense summary there would be. So we have to create one for each of these months. We can do this by copying the FEB row and changing its date. For example: if someone asked this question “what is the ITD_AMOUNT_PAID for march”, we would have to do a nearest neighbor lookup best fit low lookup (also known as the BOB BARKER LOOKUP) and show them FEB numbers, but with a MAR date. The reason for a DENSE summary is to skip the BOB BARKER LOOKUP by making sure there is always a row for the month requested. Thus we need to duplicate the FEB row for each month that is missing up to the next real row. This is what we call a fill. We are going to fill in missing periods with a copy of the BOB BARKER row of the missing period, but with the date that corresponds to the period we are filling in. In this case this means copying the FEB row to MAR, APR, MAY and tagging each copy with the appropriate date.

But to do this we first need a list of months so lets make a list of months.

SQL> with
  2     month_list as (
  3                    select
  4                           add_months(trunc(to_date('2010','rrrr'),'rrrr'),-rownum+1) some_month_date
  5                    from dual
  6                    connect by level <= 240
  7                   )
  8  select *
  9  from month_list
 10  /

SOME_MONTH_DATE
--------------------
01-jan-2010 00:00:00
01-dec-2009 00:00:00
01-nov-2009 00:00:00
…
01-feb-1990 00:00:00

240 rows selected.

You can see we used the connect by trick to generate two hundred and forty rows with which using JAN of 2010 as an anchor, and a little date math, we can create a list of month dates. Now we can join to this list of dates to DENSIFY our data.

SQL> with
  2     simple_sum as (
  3                    select
  4                           trunc(calendar_date,'mm') calendar_month_date
  5                          ,claim_id
  6                          ,sum(amount_paid) amount_paid
  7                    from temp_claim_history
  8                    group by
  9                             trunc(calendar_date,'mm')
 10                            ,claim_id
 11                   )
 12    ,end_month as (
 13                   select trunc(sysdate,'mm') calendar_month_date
 14                   from dual
 15                  )
 16    ,itd_sum as (
 17                 select
 18                        simple_sum.calendar_month_date
 19                       ,simple_sum.claim_id
 20                       ,simple_sum.amount_paid
 21                       ,sum(simple_sum.amount_paid)
 22                                over (
 23                                      partition by simple_sum.claim_id
 24                                      order by simple_sum.calendar_month_date
 25                                     ) itd_amount_paid
 26                       ,nvl(
 27                            lead(simple_sum.calendar_month_date)
 28                                over (
 29                                      partition by simple_sum.claim_id
 30                                      order by simple_sum.calendar_month_date
 31                                     )
 32                           ,add_months(end_month.calendar_month_date,1)
 33                           ) next_calendar_month_date
 34                 from simple_sum
 35                     ,end_month
 36                )
 37    ,month_list as (
 38                    select
 39                           add_months(trunc(to_date('2010','rrrr'),'rrrr'),-rownum+1) some_month_date
 40                    from dual
 41                    connect by level <= 240
 42                   )
 43    ,dense_itd_summary as (
 44                           select
 45                                  itd_sum.*
 46                                 ,month_list.some_month_date filled_calendar_month_date
 47                           from itd_sum
 48                               ,month_list
 49                           where month_list.some_month_date >= itd_sum.calendar_month_date
 50                           and month_list.some_month_date < itd_sum.next_calendar_month_date
 51                          )
 52  select
 53         filled_calendar_month_date
 54        ,claim_id
 55        ,itd_amount_paid
 56  from dense_itd_summary
 57  order by
 58           claim_id
 59          ,filled_calendar_month_date
 60  /

FILLED_CALENDAR_MONT   CLAIM_ID ITD_AMOUNT_PAID
-------------------- ---------- ---------------
01-jan-2007 00:00:00          1             200
01-feb-2007 00:00:00          1             300
01-mar-2007 00:00:00          1             300
01-apr-2007 00:00:00          1             300
01-may-2007 00:00:00          1             300
01-jun-2007 00:00:00          1             400

6 rows selected.

And there you have it. A somewhat sophisticated process. But using the WITH clause, we can easily break it down into simple steps, each of which is a data oriented transform. How wonderful it is that we can get away from a row by row (or should I say slow by slow) perspective on manipulating data to get a result like this, and instead think of it all in terms of sets of rows, and how we add to or change one row set in order to create the next.

You will find a process like this is in amazingly fast. After a little practice you will relish using the WITH clause, and eventually I hope find the way you write code changing to exploit a data set oriented approach to transforming information. Thanks WITH clause.

Kevin

Comments

Completely off-topic - I know - but another (less intuitive?) way to obtain a dense fact is to use Partitioned Outer Joins

We have been working with the WITH clause ... but I didn't really understand it, was just following the pattern we had found and developed ... I REALLY REALLY liked your "See Spot! See Spot run!" example ... gave me an "Aha!" moment.

Thanks for the time it took to type it ...
Suzanne

Thanks Kevin :-)

Hi Kevin,

The article was very informative. However, I have a doubt regarding the utility of WITH clause with DML statements like UPDATE or INSERT?

For example, say I create a named query for a huge query that I have written.

WITH namedQuery
AS
(
SELECT row_number,a,b,c............ from table ta,tb,tc,td
)

Now, I have to join the namedQuery with the same dataset
i.e. select * from namedQuery, namedQuery n2 where
namedQuery.row_number=namedQuery.rownumber+1;

And then I want to insert the dataset created above into a table.

INSERT INTO newTable
select * from namedQuery, namedQuery n2 where
namedQuery.row_number=namedQuery.rownumber+1;

Unfortunately, this doesn't work and gives errors. Could you please help me with this? If WITH clause has the restriction, do we have any other way of doing it?

Thanks,
Pratyush

Pratyush,

You don't specify your entire statement, but I would guess that it is:

WITH ...
INSERT into ...

WITH is an optional prologue to a SELECT, so it can't be used directly with INSERT, UPDATE, DELETE, etc. However, all of these can be written to use a SELECT in place of a table, and wherever a SELECT can appear, a WITH-SELECT can be used. If you get syntax errors, try wrapping it in another SELECT; i.e., SELECT * from (WITH... SELECT ...).

You should be able to write your statement as:

INSERT into newTable
SELECT *
FROM (
WITH namedQuery
AS
(
SELECT row_number,a,b,c............ from table ta,tb,tc,td
)
select * from namedQuery, namedQuery n2 where
namedQuery.row_number=namedQuery.rownumber+1
)
;

Russ

Hi Kevin,

Firstly, my congratulations for your excellent article. Very useful indeed.

We've been working with "with" clause for a long time and our queries are pretty big by now. Some of them go beyond one thousand lines. The problem is when we need to make small adjustments and we get this ugly ORA-32035 error. For example:

SQL> with
2 temp_t1 as (
3 select dummy c1 from dual
4 )
5 ,temp_t2 as (
6 select dummy c1 from dual
7 )
8 select *
9 from temp_t1
10 /
from temp_t1
*
ERROR at line 9:
ORA-32035: unreferenced query name defined in WITH clause

Would you have any piece of magic to debug enormous queries having WITH clause?

Thanks,

Eduardo Morelli

Change the statement "select * from temp_t1" to at least "select * from temp_t1,temp_t2".

i.e., You've created 2 queries with the "with cluase" temp_t1 & temp_t2
But you are referencin only the temp_t1 in the select statement not temp_t2.

Basic funda is you should be referencing all the views declared with the "with cluase".

Regards,
Vishwa.

Hi kevin,

By any chance, Can this "with cluase" boost performance.?

1.
with emp_t as
(select empno from emp)
select x from dual,emp_t
union
select x from dual,emp_t

2.
select x from dual,emp
union
select x from dual,emp

How does performance differs for the above statements?

Regards,
Vishwa

Kevin Meade's picture

Oracle almost never excutes the query you give it.

Oracle takes the query you give it and trys every time to re-write. It looks for ways to simplify the query by removing reducnancies, and it looks for ways to add extra information to open up access paths, and it considers how to minimize intermediary rowset sizes as it goes, and lots of other stuff. What that means is this and all three of these queries and the same:

with
      emp_t as (select * from emp)
select * from emp_t

select * from (select * from emp) emp_t

select * from emp emp_t

regardless of which variation you give to Oracke, Oracle will re-write the first two versions of the query into this last version.

With this understanding in mind, it should be clear that if Oracle did its re-write as we expect, the query plans and performance should be exaclty the same because in the end, all three queries will be executed the same because the all become that same query.

Hope this helps, Kevin

Hi Kevin,

Example:
with XX_orders as
(select * from order_detail where supplier='XX')
select item_desc from item_master a,XX_orders b
where a.item=b.item
union
select item_desc from item_master_new a,XX_orders b
where a.item=b.item

In the above case,It makes sense to scan the order_detail only once and join the result set (storing it in temp table) with the other tables rather than scanning the table order_detail twice with each query it is referenced.

Doesn't it make sense?

Can with clause do this..? if no is there any way can this be done in oracle(by oracle itself or by developers)?


Many Thanks
Vishwa...!

Actually, from a practical perspective, we've found that the use of the WITH clause can produce a significant performance boost. From what I've read, it relates to the fact that later versions of Oracle materialize (effectively, run once and then hold in memory) the subqueries identified in the WITH clause. If the queries are nestled securely in the FROM portion of the SQL there appears to be the possibility that they can be run multiple times. If there is any significant return time required for those subqueries, and for many of ours there is, that time can pile up massively over a large dataset. Simply moving the subqueries up to the WITH clause eliminates all of that reduplicative running and treats those subqueries as if they were standing tables. We've seen runtimes reduced on some of our more massive SQLs by orders of magnitude. Now granted, this is mostly anecdotal, as I've not run the updates through tracing or analyzing to examine exactly what Oracle is doing differently, but it is definitely doing something differently, and that something can be loads faster.

Thanks.

David H.