As some of you may know, I used to be the Product Development Director for Hotsos. Part of my job entailed work the Laredo product, which helps identify potentials performance problem areas when making changes to your database structure, indexes, statistics, init parameters, etc.
Recently I was called back in on a particularly sticky problem that one of their customers was having in using Laredo. Without going into a long and arduous explanation, the problem boiled down to the fact that Laredo was using DBMS_XPLAN to emit formatted explain plans to the user and they were coming out wrong.
I suppose you would have expected me to learn my lesson by now, but I kind of expect the Oracle documentation to be correct for the most part. So when we looked up DBMS_XPLAN.DISPLAY in the PL/SQL Packages and Types Reference for 10g, I kind of expected to be able to take the code example and use it verbatim.
Here is what it said:
This table function displays the contents of the plan table.
In addition, you can use this table function to display any plan (with or without statistics) stored in a table as long as the columns of this table are named the same as columns of the plan table (or V$SQL_PLAN_STATISTICS_ALL if statistics are included). You can apply a predicate on the specified table to select rows of the plan to display.
DBMS_XPLAN.DISPLAY( table_name IN VARCHAR2 DEFAULT ‘PLAN_TABLE’, statement_id IN VARCHAR2 DEFAULT NULL, format IN VARCHAR2 DEFAULT ‘TYPICAL’, filter_preds IN VARCHAR2 DEFAULT NULL);
Specifies the statement_id of the plan to be displayed. This parameter defaults to NULL, which is the default when the EXPLAIN PLAN command is executed without a set statement_id clause.If no statement_id is specified, the function will show you the plan of the most recent explained statement.
And even came with an example that showed how to use it:
To display the plan for a statement identified by ‘foo’, such as statement_id=’foo’:
SELECT * FROM table (DBMS_XPLAN.DISPLAY(‘plan_table’, ‘foo’));
So, when we coded a statement that went into the product that looked like the following, I would have expected to be shown the explain plan output for the correct statement.
select * from table(DBMS_XPLAN.DISPLAY(‘LAR_SCENARIO_PLAN_VW’,’82-2476-34’, ‘BASIC’))
But instead what we got was THIS:
Notice that we get two “0” IDs and two “1” IDs. And they are OBVIOUSLY from two different statements! WHAT THE….
Now I know I specifically told it to get me only the the plan where “82-2476-34” was that STATEMENT_ID. Did I get the ordinal position of the parameters wrong? I tried again with the following statement:
select * from table(DBMS_XPLAN.DISPLAY(TABLE_NAME =>’LAR_SCENARIO_PLAN_VW’, STATEMENT_ID =>;’82-2476-34’, FORMAT=>;’BASIC’))
Unfortunately, I received the same result. SO, I decided to go digging into the underlying plan table to see what was going on. I decided to select all rows where the STATEMENT_ID was equal to ‘82-2476-34’ to see if there was a duplicate STATEMENT_ID in the table. No luck.
Hmm… That doesn’t make any sense! There has to be something going on that makes DBMS_XPLAN bring back extraneous rows. My search then turned to the extraneous rows to try to find what they had in common with the rows that should have been coming back. So I started looking at the expanded output of DBMS_XPLAN and trudged through the base table examining the individual attributes to see where DMBS_XPLAN might have been losing it’s mind.
The search finally lead me to PLAN_ID. If I selected all the records from the base table where the PLAN_ID = 7191, the same PLAN_ID as my original statement, here is what I got.
EUREKA! But WHY would DBMS_XPLAN bring back other statement ID’s when I specifically passed in statement id “82-2476-34”? Luckily DBMS_XPLAN is not wrapped so I was able to use SQL-DEVELOPER’s debug feature to walk through the execution of DMBS_XPLAN and grab the explain plan SQL.
SELECT /*+ opt_param('parallel_execution_enabled','false') */ /* EXEC_FROM_DBMS_XPLAN */ id, position, depth, operation, OPTIONS, object_name, cardinality, bytes, temp_space, cost, io_cost, cpu_cost, TIME, partition_start, partition_stop, object_node, other_tag, distribution, projection, access_predicates, filter_predicates, NULL, qblock_name, object_alias, NVL( other_xml, remarks ) other_xml, NULL sql_profile, NULL sql_plan_baseline, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM LAREDO.LAR_SCENARIO_PLAN_VW WHERE plan_id =(SELECT MAX( plan_id ) FROM LAREDO.LAR_SCENARIO_PLAN_VW WHERE id = 0 AND statement_id = :stmt_id) ORDER BY id
The problem shows up very clearly in the generated SQL. The WHERE clause is taking the STATEMENT_ID I passed and getting the MAX(PLAN_ID) and selecting ALL records that have that PLAN_ID. But what if, as in my case, there are multiple statements with the same plan ID?
OK. Well, since DBMS_XPLAN provides the ability to add filter predicates, I’ll try call the procedure as follows:
select * from table(DBMS_XPLAN.DISPLAY(‘LAR_SCENARIO_PLAN_VW’,’82-2476-34’, ‘BASIC’, 'STATEMENT_ID= ''82-2476-34'''))
Adding the filter predicate changed the where clause, but not as I thought it might.
... FROM LAREDO.LAR_SCENARIO_PLAN_VW</span> WHERE 1=1 AND statement_id = :stmt_id AND STATEMENT_ID = '82-2476-34' ORDER BY id
WHAT THE %*@#? …
Ok… I walked through the code again a little more carefully and came to the conclusion that this is just a run of the mill code error. Oracle are building pieces of the where clause and pasting them together based on certain criteria, mixing and matching based on what your PLAN_TABLE looks like. When it comes to the part where they paste in the STATEMENT_ID that you pass in, if you haven’t passed anything to FILTER_PRED, they forget to paste the predicates limiting the selection to your chosen STATEMENT ID in all the necessary places.
While the resulting WHERE CLAUSE you get by actually passing something in to FILTER_PRED is not technically correct, it does work.
I hope this might save some people some time, if they run into something similar. In the mean time I’ll be looking into METALINK to see it the bug is logged, and if not I’ll log it.