Historical Details, ASH Reports, and Digging for the Truth

kellyn-PotVin.jpg

Guest Blogger: Kellyn Pot’Vin

Short Introduction:

I’m Kellyn Pot’Vin, pronounced Poet-vaughn, and I’m a senior technical consultant with Enkitec, an Oracle-centric consulting partner, (www.Enkitec.com).  I have a technical blog,dbakevlar.com, am part of the board of directors for RMOUG, and thanks to Chet Justice, (ORACLENERD.com) also part of the database track for Kscope12. I live in Westminster, CO with my partner, Tim, along with my three children, Sam, Cait, and Josh.

You can reach me via gmail at dbakevlar or I can be followed on Twitter @DBAKevlar and Facebook or LinkedIn as Kellyn Pot’Vin.

Now for the post:

All DBAs have experienced this. We come in after a weekend or a couple days have simply gone by when suddenly, a user or your manager approaches you and says, “You know, we had a performance problem on XYZ at 00 o’clock. I’d like to know what happened and why the database performed so poorly…”

The database is always guilty until proven innocent and the DBA must be the one to represent it. The more you know about how to find historical information that can justify the database is innocent; the better off your database environment is from becoming the source of blame every time something goes awry. The additional benefit of having the users and managers coming to you to find out what *REALLY* happened when performance challenges occur vs. having assumptions rule the day.

The example below has occurred just recently for a client. The manager wants to know why he continues to have blocked sessions in his APEX application. Even though this example is from an APEX issue, I felt it was a good one to use for this example.

As the DBA, we start by viewing Top Activity from Enterprise Manager and it’s easy enough to capture the issue in the grid historical view (click  to see image full size):

I know they say “Girls love pink,” but no matter what gender DBA you are, you don’t like anything in the pink, red, orange, or brown family on our grid. As the red section is easy enough to pinpoint the area of concern and the complaint, you can direct all of your attention to this point in time.

As the Active Session History is aggregated, the specifics seen in the left side is obscured. We clearly see the two sessions on the right that are also in red and can drill down, making the easy assumption that they are related:

 

Having the same SQL_ID substantiates the assumption and we can move forward.

You could drill down and come up with your own data, but today, my preference is to have a report that will clearly link all data and leave less to assumption. You have the time of the impact by the issue, running an ASH report is an easy way to see what occurred.

An ASH report is a simple task to run from the Enterprise manager, but we’re going to run this from the command line for the example.

Log into the database server, set your environment and change directories to the one you would like to write the report to. Log into the database with SQL Plus, as a user who has the privileges to execute an ASH report and execute the following:

You will only need the snapshot in time you want the report for:

Yes, the format for the snapshot is not the same as for an ADDM or AWR report but not difficult and Oracle ensures to even remind you of the format to use. My snapshot is from 9:09 AM and the duration I chose is thirty minutes. I then simply name the report and that’s all there is to the command line version to generating an ASH report.

The report is then deposited in the current directory you have run the report from and you can view/vi the report from a Linux/unix prompt.

The top wait event is something we would expect to see and the second not so much- CPU and Waits for CPU.

The second wait event line is less common and a serious concern for any DBA- transaction row lock contention.

The TCP socket (KGAS) is the network waits we could see at the top and are part of a separate export process, (can be seen as a light brown section in the Grid of the first graphic). It should not be considered as part of the scenario at and justifies why having a second view, such as EM, is handy when looking into an issue. It often quickly helps you rule out what is NOT part of the problem.

The next section of the ASH report shows us a bit more info:

Again, we see the transaction row lock contention. This section also shows us what is seen as the cause of the wait. Top service/modules show the problem from APEX and which “page” is the heaviest percent of the activity.

We can see what applications are experiencing the contention and which area of the application, in this case APEX, is the heaviest percentage of the action. This can be very valuable for the developer who may be assisting you on resolving the issue. I strongly believe that to come to the right solution, you should involve all the right skilled individuals. I’m the DBA- yes, I can and have been in the development role, but I would be wise to include the developer who knows this code, this application best. To not include him would be arrogant and ask for assumptions to be made.

We have now approached the section of our report that lists out the sid/serial# for the top sessions.  This offers us a clear view, which happens to include our two SIDs that we saw on the first grid panel in red, of the wait TX, (transaction) row lock contention. Note that there is one sid that is shown above it for top sessions. This is something you definitely want to take note of. We can then see in the next section why there is the wait and why sid 412 is listed first:

Well, no wonder SIDs 423 and 458 aren’t happy-  412 is blocking them! Now we need to know why 412 is blocking.

It is easy to see the matched % Event on the XXX_RPTS table and the 412 blocking session above.

We also know that the TX Row Lock Contention is showing on a sequence. A sequence should not be holding up a transaction, unless odd circumstances-

Check the sequence…

Hmmm…not as good as I would prefer…  This we can fix up, but THERE has to be something more…

So what is the sequence being used for?  Check for triggers-

Now I’m really curious and want to see all of this trigger’s code, so let’s take a look:

Now I’ve experienced triggers with more complex logic coded into them and this is packaged code from Oracle, too, but this is a “hidden” wait that often is difficult to find for most DBAs and Developers. What started out as a simple update to a table becomes much more when a trigger has a bit of complexity added into it, just a bit of logic that might be better put elsewhere. (click on image to enlarge).

Triggers are powerful, but they must be gauged wisely and understand the performance risk that can arise when too much complexity occurs either in the trigger or in the application logic or due to database growth, (or all three.)

The questions to ask the developer or application support person could be similar to the following:

  • Why can’t these columns have default values when null?
  • What is the business reason for the insert/update to the secondary table, (many times the trigger may have outlived its use, ask them to justify it.)
  • Note the indices- are indexes present that may help the process and/or are their indices impacting it that are not used, (need to monitor a few indexes)?
  • Could the complex logic be moved to a procedure or package that can be called from the application?
  • If the trigger is used for auditing or reporting tables, would a materialized view or other option be a better choice at this stage in the database’s life?
  • Can the logic in the trigger be simplified?

This is a lot of data for any developer to take in and the developer may get frustrated when they fully understand what is occurring inside the database each time this trigger fires vs. what they may have assumed was a simple transaction. Your job as a DBA is to support him/her and if the design/code can be changed to improve performance to stop the blocking session issue, then proceed forward to do so.

I will be speaking/attending the following conferences the next couple of months:

Miracle World 2012, Denmark, http://mow2012.dk/

April 19th: ASH Analytics- Top Activity, the Next Generation

April 20th: EM12c, Making it work for you!

OUG Harmony, Finland, http://www.ougf.fi/

May 30st: EM12c, Making it work for you!

OUG Harmony, Latvia, http://www.ougf.fi/

June 1st: EM12c, Making it work for you!

Kscope12, San Antonio, TX, http://kscope12.com/

June 27th: EM12c, Making it work for you!

E4, Enkitec Extreme Exadata Expo, Los Colinas, TX, www.enkitec.com

August 13-14, New Date and More Details to Come!

 

Recent Stories
ODTUG 2020 Leadership Program Application Now Open

ODTUG July 2019 Newsletter

2020 Leadership Program - Now Accepting Applications