When you are on the conference bridge, troubleshooting a slowdown or outage, it is necessary to have some understanding of what the DBAs and UNIX admins should be or are looking at to measure how the database is performing (or not performing). Here is an overview of that.
What is important to know is not just the current statistics, but the change in these statistics. Remember that the first thing the manager always says when he jumps on the bridge is “what changed?”
Some of this data is maintained in historical views in Oracle, like V$ACTIVE_SESSION_HISTORY, but other information is not.
UNIX commands like sar can capture system stats. But you need some way to accumulate and aggregate this information, which is what the performance monitoring tool does. You can also dump the beginning and end stats into an Excel spreadsheet and do you own time series analysis or other modelling against that to look for trends and statistically significant event and establish baselines and thresholds (i.e. all of this together is the science of analytics). I recommend that you try to learn something about statistics as this is what analytics is all about. Take a couple of lessons from the Khan Academy; they are free.
Oracle Performance Stats
Oracle collects ongoing stats plus it does sampling at certain intervals and then collects that. One example metric that they save is database calls per second.
Oracle actually defines a “metric” as a change in value, while I would call that a “delta,” which is what statisticians call it. To me a “metric” is just another name for a measurable value or “statistic,” like database calls per second.
Oracle allows you to capture a snapshot of statistics and establish that as a baseline. But that is not the analytics approach, which would be use statistics to determine the baseline. Because the question is: at what point do you capture the baseline? If you do that at what empirical evidence tells you is optimal performance, then that is the opposite of using analytics. That is a hunch. If you saw “Moneyball,” you know what I mean. The practiioneer of analytics would say you need some analysis of the data to determine what is “normal.”
Oracle captures session and system stats in the V$SESSTAT and V$SYSSTAT views. A view is a select statement that joins two or more data to bring data together in another easier to understand “view.” In other words it is the aggregate of some other information selecting what is only the most relevant pieces of information.
Here are some relevant Oracle metrics and operating system, disk, and network stats that allow you to measure and troubleshoot Oracle performance issues.
A process or thread is waiting due to some kind of contention. A process is running program that you can see when you run the ps command in UNIX. A thread is instructions running inside a process, meaning the process is multithreaded. Threads are supposed to be thread-safe, meaning that one stops when another is holding up needed data. To violate this principle would allow data to get updated out of order, leading to data or referential integration problems. Referential integrity means something is missing in the child-parent structure of the relational database, leaving orphaned records.
Record locking occurs when processes looking to update the same data. Oracle also locks data when it writes to the redo log after a commit. There are network and system I/O waits as well: for example, if the disk controller is busy there is the need to wait for that to free up to read or write data to disk.
Oddly enough, Oracle classifies an “idle” event as a wait event. I would say that is “idle,” meaning not waiting.
Time Model Statistics
Statistics over time put into a common scale are stored in the V$SESS_TIME_MODEL and V$SYS_TIME_MODEL views. DB time stored here shows the total time spent in database calls. It indicates workload. This is the sum or all CPU and wait times (subtracting the idle ones mentioned above.)
Active Session History
Each row in the view V$ACTIVE_SESSION_HISTORY is one active session. Active sessions are those that are not marked as waiting. The latest session is listed at the top.
Below are some UNIX commands grouped by the category that they measure. The UNIX admins and dba should be looking at this in additional to the Oracle performance views when measuring or troubleshoot performance issues.
sar, vmstat, mpstat, iostat
Virtual Memory Stats
When people use these command, they often do not know what they all the columns mean. So you should tell your staff to document what each of the columns mean and discuss that with everyone can everyone can have a common understanding of that.