SQL Tuning in Real Life: From Four Minutes to Eight Seconds

There is a lot of information about SQL tuning—tons of presentations, books, and blog posts. But what do people usually think about when talking about SQL tuning? When I bring this up in presentations the answers are usually the same: "the optimizer, the execution plan, indexes, statistics," and so forth. This is all true, but when I look at SQL tuning there is so much more to consider than these technical tuning details.

Yes, sometimes we can solve issues by creating additional indexes, and sometimes we discover that the optimizer is making a mistake and we can fix it. But what about the design of the schema? What about the logic of the query? I have encountered cases where by first understanding what the query should do, I could suggest a different way of doing it, and the new way ended up improving performance much more than any index or statistic I could change.
So, when we look at a new SQL tuning case, these are the things I think we should consider and investigate in this order:

  • Schema design
  • Query logic (i.e., what the query’s author was actually trying to achieve)
  • Query code (SQL)
  • Execution plan

Then—and only then!—should we try to improve the statement’s performance.

In this article I will describe a real case where understanding the environment in which the query was executing was crucial to improving performance, and even helped to fix a bug.

Schema Design
The system I'm going to explain here was a monitoring system. It had many endpoint components that sent alert information to a management server which then inserted all of this data into the database. The development team realized that the online table was getting large very quickly, so they decided to move history data to a history table.


The online table (we'll call it ALERT_CURR for current alerts) was receiving all the inserts from all the endpoints. The older information was transferred to the ALERT_HIST table to keep the ALERT_CURR table relatively small. The decision they made was to keep 100 rows per endpoint in the ALERT_CURR table, while a specific process was responsible for moving the older rows from ALERT_CURR to ALERT_HIST.
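
The actual archiving process was part of the application and I never saw its code, but conceptually it could have looked something like the following sketch, which keeps only the newest 100 rows per endpoint in ALERT_CURR (everything beyond the table and key column names described above, including the exact statements, is my own assumption):

-- illustration only: move everything older than the newest 100 rows per
-- endpoint into ALERT_HIST, then delete those rows from ALERT_CURR
-- (ignoring concurrency for brevity)
insert into alert_hist (seq, insert_date, endpoint_id)
select seq, insert_date, endpoint_id
from  (select c.seq, c.insert_date, c.endpoint_id,
              row_number() over (partition by c.endpoint_id
                                 order by c.seq desc) as rn
       from   alert_curr c)
where  rn > 100;

delete from alert_curr
where  rowid in (select rid
                 from  (select rowid as rid,
                               row_number() over (partition by endpoint_id
                                                  order by seq desc) as rn
                        from   alert_curr)
                 where  rn > 100);

commit;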

The reason they chose to keep 100 rows per endpoint is unknown to me; I personally would probably have used a time-based partitioning scheme, as partitioning is the perfect solution for these types of problems. Unfortunately, however, this was an Oracle Standard Edition database, so partitioning was not an option because it’s only available in Oracle Enterprise Edition.
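
Just to illustrate the alternative (it was not applicable here, and the table name, interval, and boundary are my own assumptions), on an Enterprise Edition database a time-based scheme could be as simple as daily interval partitioning:

-- illustration only: not used in this system (Standard Edition)
create table alert
( seq          number  not null,
  insert_date  date    not null,
  endpoint_id  number  not null
)
partition by range (insert_date)
interval (numtodsinterval(1, 'DAY'))
( partition p_initial values less than (to_date('2020-01-01', 'YYYY-MM-DD')) );

-- old data could then be aged out with a cheap metadata operation:
-- alter table alert drop partition <partition_name>;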

 


Figure 1. Schema Design

The next thing in the design to consider was the columns of the alert tables. Since these were time-based alerts, they had obviously implemented ENDPOINT_ID and INSERT_DATE columns. However, the alerts were being inserted so rapidly that multiple alerts from the same endpoint could arrive at the very same time. In this case, order was critical; we had to know which row was inserted first, so they added a SEQ column containing an incremental number to guarantee the order.

With this table structure, the primary key that was defined on each table was SEQ, INSERT_DATE, and ENDPOINT_ID.
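
To make the structure concrete, here is a minimal sketch of the two tables. The SEQ, INSERT_DATE, and ENDPOINT_ID columns and the primary key definition come from the description above; the datatypes and the payload column are assumptions:

create table alert_curr
( seq          number         not null,  -- incremental number, guarantees insert order
  insert_date  date           not null,
  endpoint_id  number         not null,
  alert_text   varchar2(4000),           -- hypothetical payload column
  constraint alert_curr_pk primary key (seq, insert_date, endpoint_id)
);

create table alert_hist
( seq          number         not null,
  insert_date  date           not null,
  endpoint_id  number         not null,
  alert_text   varchar2(4000),
  constraint alert_hist_pk primary key (seq, insert_date, endpoint_id)
);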

When I got involved with this project, my client’s team pretty much allowed me to change anything except one thing—this alert history architecture—so I couldn't change or touch the archiving mechanism because that was already built deeply into their application.

Query Logic
After I understood the ALERT_CURR and ALERT_HIST table structures, the relationship between them, and their primary keys, I next turned my attention to the query logic. The application had a dashboard and this dashboard was customizable by the people who used it; it showed the last 300 rows that fulfilled its predicates.

The application did something like this:

  1. Built a query with the relevant predicates.
  2. Executed the query against ALERT_CURR and checked how many rows were returned.
  3. If the query returned 300 rows for the selection criteria, the application proceeded no further.
  4. If the query returned fewer than 300 rows, the application executed a similar query against ALERT_HIST to retrieve the remaining rows (a rough code sketch of this flow follows the list).
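
The application code itself was not something I could see, but the flow above might look roughly like this PL/SQL sketch (the 300-row page size comes from the article; everything else, including the variable name and the placeholder filter, is an assumption):

declare
  v_rows_found pls_integer;
begin
  -- steps 1-2: run the query against ALERT_CURR and count what came back
  select count(*)
  into   v_rows_found
  from  (select *
         from   alert_curr
         where  1 = 1              -- the <filter> built by the application
         order by seq desc)
  where  rownum <= 300;

  -- steps 3-4: only when fewer than 300 rows were found, go to ALERT_HIST
  if v_rows_found < 300 then
    null;  -- run the similar query against alert_hist for the remaining rows
  end if;
end;
/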

The Problem
Now that I understood the query logic, I was able to figure out the very problem they faced, which was the reason they had asked for my help. Whenever the application couldn't find 300 rows in ALERT_CURR and had to query ALERT_HIST, the query took more than four minutes (240 seconds) to execute. The application was configured to time out after four minutes, so when this query exceeded this time, the application simply cancelled the operation, leaving the user screen empty. This situation was completely unacceptable. My tuning goal was now tightly defined: Improve this query’s performance to something more reasonable, especially when the application had to access ALERT_HIST to retrieve the minimum 300 rows.

Query Code
The next step in the investigation was the query code. A simplification of the SQL looks something like this:
select * from
     (select *
      from <tab>
      where <filter>
      order by seq desc
)
where rownum <= 300;

This query simply applies the filter to the table, orders the result by SEQ (the only column that can guarantee returning the data in its original insertion order), and returns the latest 300 rows.
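
As a side note, and not something that existed in this system (which predates the feature), on Oracle 12c and later the same top-N pattern can be written with the row-limiting clause instead of the ROWNUM wrapper:

select *
from   alert_curr
where  insert_date < sysdate - 1   -- example predicate, not the application's real filter
order by seq desc
fetch first 300 rows only;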

SEQ Column and the Execution Plan
As I explained above, the SEQ column was created only to preserve the order of insertion. Today, I would probably use a TIMESTAMP datatype instead (to be honest, I don't remember why we didn't leverage that TIMESTAMP strategy back then). The value of SEQ was internal only and was used only in the ORDER BY clause of the query. In essence, the users weren't even aware of that column's existence, and most predicates in the query were applied either to the INSERT_DATE column alone or to INSERT_DATE in conjunction with criteria on the ENDPOINT_ID column.

In terms of available indexes, there were two: the primary key index (SEQ, INSERT_DATE, ENDPOINT_ID) and a secondary index on INSERT_DATE. Here's where things get interesting: when we think about the query with the predicate on INSERT_DATE and the ORDER BY applied to SEQ, there are two possible ways the Oracle optimizer could use these indexes.
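
For reference, the secondary index could have been created with something like this (the index name is hypothetical; the primary key index was created implicitly by the constraint):

create index alert_curr_insdate_ix on alert_curr (insert_date);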

Case 1 - INSERT_DATE

The optimizer can leverage the index on INSERT_DATE in the following way:

  1. Use a range scan on the index to filter the rows that match the INSERT_DATE predicate.
  2. Fetch the rows from the table by the ROWIDs from the index.
  3. Sort the result set by SEQ.
  4. Return the first 300 rows.           

Figure 2. INSERT_DATE Index Usage: RANGE SCAN

Case 2 - Primary Key
The optimizer’s second alternative is to use the primary key index:

  1. Perform a full index scan in descending order (ordered by SEQ, then INSERT_DATE, then ENDPOINT_ID).
  2. During the index scan, check if the date meets the predicate criteria.
  3. Once we have 300 rows that match the predicate, stop.
  4. Fetch the rows from the table by the ROWIDs from the index.

Figure 3. PK Index Usage: FULL SCAN

In our case, Oracle decided to go with method #2 and use the primary key. When Oracle scans the index, it reads the data ordered by the SEQ column (the first column in the index), but it needs to filter the rows by the INSERT_DATE column. The way Oracle works in this case is that it starts scanning the rows from the most recent SEQ down, and for each row it checks the INSERT_DATE to see if it meets the required predicate. When the predicate does not match the most recent rows, Oracle has to scan a large portion of the index, row by row, until it retrieves 300 rows that meet the predicate. For example, if the predicate is INSERT_DATE < SYSDATE - 1, then Oracle will first have to scan all the rows from the last 24 hours, one by one, before it reaches any row that satisfies the predicate, and this can be a very large number of rows. This row-by-row scan is relatively slow, and when we need to scan many rows, the query execution will be long.
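
I no longer have the original execution plans, but this behavior is easy to check with something like the following (using the example predicate from above):

explain plan for
select * from
     (select *
      from   alert_curr
      where  insert_date < sysdate - 1
      order by seq desc)
where rownum <= 300;

select * from table(dbms_xplan.display);
-- in this system the plan walked the primary key index in descending order
-- (method #2 above) instead of using the INSERT_DATE index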

But we knew something that the optimizer didn't know. First, the SEQ column meant nothing besides the order. Second, by design, ordering by SEQ meant ordering by INSERT_DATE as well. A newer row will get a higher SEQ number, and that was guaranteed.

Because of this, we could change the column order of the primary key to INSERT_DATE, SEQ, ENDPOINT_ID. With INSERT_DATE as the leading column of the index, looking for rows that meet the predicate is fast. All we then need to do is add INSERT_DATE to the ORDER BY clause, which still preserves the order in which the rows are returned (a newer row always has a higher SEQ). With this tiny change, the optimizer could leverage the new primary key index for both the predicate and the ORDER BY, and we would get much better performance.
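
Rebuilding the primary key with the new column order could be done with something like this (the constraint names are assumptions, and on a busy system you would schedule it carefully, since it recreates the index; both tables are shown, as the same query pattern ran against each):

alter table alert_curr drop primary key;
alter table alert_curr add constraint alert_curr_pk
  primary key (insert_date, seq, endpoint_id);

alter table alert_hist drop primary key;
alter table alert_hist add constraint alert_hist_pk
  primary key (insert_date, seq, endpoint_id);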

The new version of the query looked like this:
select * from
     (select *
      from <tab>
      where <filter>
      order by insert_date desc, seq desc
)
where rownum <= 300;

As you can see, the only difference in the query is in the ORDER BY clause. Together with the change to the physical structure of the index, Oracle can now use the index more efficiently for both the predicate and the sort.

A BUG!
At that point I realized that there was a bug in the logic of the query. The application first tried to retrieve 300 rows from ALERT_CURR, and if it didn't find at least 300 rows, it would proceed to retrieve the remainder from ALERT_HIST.

But this logic might actually lead to getting the wrong alerts! Here is an example:

  • I have 5 endpoints sending data all the time (at 100 alerts per hour).
  • Today at 12 PM, endpoints 1-4 stopped sending alerts completely, but endpoint 5 continues.
  • Now it's 5 PM and I open the dashboard asking for the last 300 alerts without any filters.
  • The design is that ALERT_CURR will contain 100 rows for each endpoint (and not the last alerts by time), so the dashboard will show the last 100 from endpoint 5 and the other 200 alerts will be from the other endpoints.
  • However, this is wrong! The expected list should contain the last 300 from endpoint 5, as all alerts from endpoint 5 after 12 PM are actually newer than alerts from any other endpoint.

To fix this logic error, the solution is to query ALERT_HIST as well as ALERT_CURR every single time. Of course, this would also have the unintended side-effect of making the query’s performance issue occur even more frequently.

The updated query looks something like this:

select * from
    (select * from
        (select * from alert_curr
         union all
         select * from alert_hist
        )
     where <filter>
     order by insert_date desc, seq desc
    )
where rownum <= 300;

In this query, the UNION ALL combines the rows from both tables. Only then do we apply the filter and the ORDER BY to get the first 300 rows.

Note that we used UNION ALL and not the UNION operator, as the UNION operator eliminates duplication by sorting the returned result set, thus increasing query overhead unnecessarily. In our case, we know that the tables are mutually exclusive, so both UNION and UNION ALL will return exactly the same result set, which makes the duplicate elimination a redundant overhead.
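
A tiny standalone example illustrates the difference:

-- UNION deduplicates the combined result set, so this returns a single row:
select 1 as n from dual
union
select 1 as n from dual;

-- UNION ALL just concatenates the branches, so this returns two rows:
select 1 as n from dual
union all
select 1 as n from dual;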

Reducing the Number of Rows Retrieved
At this point the query was returning correct results, but it was still running quite slowly. Checking the execution plan showed that the indexes were not being used as efficiently as they were when querying only a single table. It was also quite obvious that the ORDER BY after the UNION ALL required the database to work a lot harder than it needed to: Oracle retrieved the needed data from both tables, combined the resulting rows, and then still had to sort the entire combined result set. Oracle may well have improved the optimizer since then so that it handles cases like this better (for example, with predicate pushing), but back in database release 9i, the plan was far from optimal.

Since large sort operations put a huge load on the system and are relatively slow, we usually want to avoid them. One way is to create an index sorted by the query's ORDER BY clause. This is basically what the optimizer did originally: it leveraged the original primary key index on SEQ, INSERT_DATE, and ENDPOINT_ID to scan data that was already sorted instead of using the INSERT_DATE index and sorting the resulting set afterwards.

But another method to avoid the overhead of large sort operations is simpler: what if we could reduce the number of rows that need to be sorted? In our scenario we only need to present 300 rows, but they have to be the most recent entries across both tables, and we don't know in advance how those 300 rows will split between them: all 300 may come from ALERT_CURR alone, or from ALERT_HIST alone, or from a combination of the two (say, 175 from ALERT_CURR and the remaining 125 from ALERT_HIST).

The solution I proposed was to simply retrieve a maximum of the most recent 300 rows from each table and only then combine those two result sets. The final query looked like this:
select * from
    (select * from
        ((select * from
            (select * from alert_curr
             where <filter>
             order by insert_date desc, seq desc)
          where rownum <= 300)
         union all
         (select * from
            (select * from alert_hist
             where <filter>
             order by insert_date desc, seq desc)
          where rownum <= 300))
     order by insert_date desc, seq desc)
where rownum <= 300;


This query first filters each table, orders the results appropriately, and returns only the latest 300 rows from each table separately; only then does it combine the two result sets via UNION ALL. The combined set still has to be ordered again so that the top 300 rows can be selected from it, but now there are at most 600 rows to sort, which avoids the original, much larger sort over the combined result set.

Summary
When we started working to improve the dashboard query’s performance, it took more than four minutes (240 seconds) in many cases. When we finished, the final query was executed in about eight seconds, which was completely reasonable—and a pretty dramatic performance improvement of 30X.

Let's revisit the steps of this SQL tuning process:

  • Schema design. We took the time to understand exactly what the SEQ column was used for.
  • Query logic. By delving into the logic to understand it better, we managed to find a bug that caused wrong results in the dashboard.
  • Query code (SQL). Changing the query code reduced the number of rows in the sort and let Oracle use the indexes efficiently, simply by pushing the predicate into each branch and changing the ORDER BY clause.
  • Execution plan. Examining the execution plan helped us understand where the problem really was: the primary key scan wasn't efficient because it didn't allow filtering the rows using the index, and that in turn made it clear that the main problem was actually the huge sort.

SQL tuning is not only about performing technical activities; it is also about understanding them. Although during this tuning process our team ended up changing the primary key and, with it, its index (which is one of the things people always think about when talking about tuning), we also used our knowledge and understanding of the environment and the query logic to solve this problem.

When we combine the understanding of how the data is stored within the database with what the application is actually trying to do and then apply our understanding of how the optimizer works, we often achieve the best results.

About the Author

Liron is an Oracle ACE and a senior Oracle DBA consultant, with more than 20 years of experience. During these years Liron worked as a senior consultant with a large number of companies in various fields and managed an Oracle Professional Services Team. He mainly specializes in high availability solutions, performance, backup and recovery, and other infrastructure and application database areas.
Liron is the president of BCOUG (British Columbia Oracle User Group), and he is also a well-known instructor and lectures in Oracle courses, events and forums.
