SQL Server 2005 introduced the APPLY operator, which is very much like a join clause
and which allows joining between two table expressions i.e. joining a
left/outer table expression with a right/inner table expression. The
difference between join and APPLY operator becomes evident when you have
a table-valued expression on the right side and you want this
table-valued expression to be evaluated for each row from the left table
expression. In this tip I am going to demonstrate what APPLY operator
is, how it differs from regular JOINs and what are few of its applications.
Solution
The APPLY operator allows you to join two table
expressions; the right table expression is processed every time for each
row from the left table expression. As you might have guessed, the left
table expression is evaluated first and then right table expression is
evaluated against each row of the left table expression for final
result-set. The final result-set contains all the selected columns from
the left table expression followed by all the columns of right table
expression.
The APPLY operator comes in two variants, the CROSS
APPLY and the OUTER APPLY. The CROSS APPLY operator returns only those
rows from left table expression (in its final output) if it matches with
right table expression. In other words, the right table expression
returns rows for left table expression match only. Whereas the OUTER
APPLY operator returns all the rows from left table expression
irrespective of its match with the right table expression. For those
rows for which there are no corresponding matches in right table
expression, it contains NULL values in columns of right table
expression. So you might now conclude, the CROSS APPLY is semantically
equivalent to INNER JOIN (or to be more precise its like a CROSS JOIN
with a correlated sub-query) with a implicit join condition of 1=1
whereas OUTER APPLY is semantically equivalent to LEFT OUTER JOIN.
You might be wondering if the same can be achieved with
regular JOIN clause then why and when to use APPLY operator? Though the
same can be achieved with normal JOIN,
the need of APPLY arises if you have table-valued expression on right
part and also in some cases use of APPLY operator boost the performance of your query. Let me explain you with help of some examples.
Script #1 creates a Department table to hold information
about departments. Then it creates an Employee table which hold
information about the employees. Please note, each employee belongs to a
department, hence the Employee table has referential integrity with the
Department table.
Script #1 - Creating some temporary objects to work on...
|
USE [tempdb] |
First query in Script #2 selects data from Department
table and uses CROSS APPLY to evaluate the Employee table for each
record of the Department table. Second query simply joins the Department
table with the Employee table and all the matching records are
produced.
Script #2 - CROSS APPLY and INNER JOIN
|
SELECT * FROM Department D CROSS APPLY
|
If you look at the results they produced, it is the
exact same result-set; not only that even the execution plan for these
queries are similar to each other and has equal query cost, as you can
see in the image below. So what is the use of APPLY operator? How does
it differ from a JOIN and how does it help in writing more efficient
queries. I will discuss this later, but first let me show you an example
of OUTER APPLY also.
The first query in Script #3 selects data from
Department table and uses OUTER APPLY to evaluate the Employee table for
each record of the Department table. For those rows for which there is
not a match in Employee table, those rows contains NULL values as you
can see in case of row 5 and 6. The second query simply uses a LEFT
OUTER JOIN between the Department table and the Employee table. As
expected the query returns all rows from Department table; even for
those rows for which there is no match in the Employee table.
Script #3 - OUTER APPLY and LEFT OUTER JOIN
|
SELECT * FROM Department D OUTER APPLY |
Even though the above two queries return the same
information, the execution plan is a bit different. Although cost wise
there is not much difference, the query with the OUTER APPLY uses a
Compute Scalar operator (which has an estimated operator cost of
0.0000103 or almost 0% of total query cost) before Nested Loops operator
to evaluate and produce the columns of Employee table.
Now comes the time to see where the APPLY operator is
really required. In Script #4, I am creating a table-valued function
which accepts DepartmentID as its parameter and returns all the
employees who belong to this department. The next query selects data
from Department table and uses CROSS APPLY to join with the function we
created. It passes the DepartmentID for each row from the outer table
expression (in our case Department table) and evaluates the function for
each row similar to a correlated subquery.
The next query uses the OUTER APPLY in place of CROSS APPLY and hence
unlike CROSS APPLY which returned only correlated data, the OUTER APPLY
returns non-correlated data as well, placing NULLs into the missing
columns.
Script #4 - APPLY with table-valued function
|
IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[fn_GetAllEmployeeOfADepartment]') AND type IN (N'IF')) BEGIN
|
So now if you are wondering, can we use a simple join
in place of the above queries? Then the answer is NO, if you replace
CROSS/OUTER APPLY in the above queries with INNER JOIN/LEFT OUTER JOIN,
specify ON clause (something as 1=1) and run the query, you will get
"The multi-part identifier "D.DepartmentID" could not be bound." error.
This is because with JOINs the execution context of outer query is
different from the execution context of the function (or a derived table),
and you can not bind a value/variable from the outer query to the
function as a parameter. Hence the APPLY operator is required for such
queries.
So in summary the APPLY operator is required when you
have to use table-valued function in the query, but it can also be used
with an inline SELECT statements.
Now let me show you another query with a Dynamic Management Function (DMF).
Script #5 returns all the currently executing user queries except for
the queries being executed by the current session. As you can see the
script below, the sys.dm_exec_requests dynamic management view is being CROSS APPLY'ed with the sys.dm_exec_sql_text
dynamic management function which accepts a "plan handle" for the query
and the same "plan handle" is being passed from the left/outer
expression to the function to work and to return the data.
Script #5 - APPLY with Dynamic Management Function (DMF)
|
USE master |
Please note the [text] column in the above query returns
the all queries submitted in a batch; if you want to see only active
(currently executing) query you can use statement_start_offset and
statement_end_offset columns to trim the active part of the query. Tim
Ford has provided a very good explanation of usage of these columns in
his How to isolate the current running commands in SQL Server tip.
As I told you before there are certain scenarios where a
query with APPLY operator performs better than a query with regular
joins but I am not going to delve into much details rather here are some
articles which discuss this topic in greater details.
Please note, APPLY operator is not an ANSI operator but
rather an extension of SQL Server T-SQL (available in SQL Server 2005
and above), so if you plan to port your database to some other DBMS take
this into your considerations.
No comments:
Post a Comment