If you have been in the IT industry long enough, you probably know this story well. The application works fine and then, suddenly, it hangs for no apparent reason. You restart it and the problem goes away. A week passes, maybe two, and then the application hangs again. Another restart and you’re back. It doesn’t crash or fail (no crash dump or thread dump); it just sits there and hangs. No users are being served.
Eventually you decide to just restart every night, hoping it will not hang again. It doesn’t matter if you’re using a Tomcat application server, WebSphere, WebLogic, JBoss or whatever: if you have been in the software development business long enough, you have almost certainly experienced this problem. This is where application monitoring can help.
Below are the top 3 reasons why an application server hangs:
Reason #1: it’s a database problem.
This may sound strange, but the main reason an application server hangs is not directly related to the application server itself. The location of the symptom is rarely the location of the root cause. The following scenario is quite common:
- The database is bottlenecked, causing queries to run slower than usual.
- Requests that used to take 1 second now take 5 seconds to complete.
- The average number of concurrent requests slowly increases (due to backlog).
- The application server runs out of threads and hangs.
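The arithmetic behind this scenario can be sketched with Little's law (average requests in flight = arrival rate x average latency). This is a back-of-the-envelope illustration only: the pool size of 200 threads and the rate of 50 requests per second are assumed numbers, and `ThreadPoolBacklog` is a hypothetical name, not part of any application server.

```java
final class ThreadPoolBacklog {
    // Little's law: average requests in flight = arrival rate x average latency.
    static double requestsInFlight(double requestsPerSecond, double latencySeconds) {
        return requestsPerSecond * latencySeconds;
    }

    // True when the steady-state demand meets or exceeds the worker threads available.
    static boolean poolExhausted(double requestsPerSecond, double latencySeconds, int maxThreads) {
        return requestsInFlight(requestsPerSecond, latencySeconds) >= maxThreads;
    }

    public static void main(String[] args) {
        int maxThreads = 200;            // assumed pool size, not taken from the article
        double requestsPerSecond = 50.0; // assumed steady arrival rate
        // Compare the healthy case (1 second) with the slow-database case (5 seconds).
        for (double latency : new double[] {1.0, 5.0}) {
            System.out.printf("latency %.0fs -> %.0f threads busy, exhausted=%b%n",
                    latency,
                    requestsInFlight(requestsPerSecond, latency),
                    poolExhausted(requestsPerSecond, latency, maxThreads));
        }
    }
}
```

With these assumed numbers, the same traffic that keeps 50 threads busy at 1 second per request would need 250 threads at 5 seconds per request, which is more than the pool has.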
If you manage to get a thread dump, you’ll just see a bunch of threads waiting and another group that’s actually running. Another possibility is that the waiting (or queued) threads gobble up all available memory and eventually lead to an OutOfMemoryError.
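When a full thread dump is hard to obtain or to read, a rough summary of thread states can show the same picture of many waiting threads and a few running ones. The following is a minimal sketch that uses only the standard `Thread.getAllStackTraces()` call; the class name `ThreadStateSummary` is hypothetical, and the code would have to run inside the affected JVM (for example from an admin endpoint) to be useful.

```java
import java.util.EnumMap;
import java.util.Map;

final class ThreadStateSummary {
    // Counts live threads in this JVM, grouped by their current state
    // (RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, ...).
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            counts.merge(thread.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A large and growing WAITING or BLOCKED count hints at a backlog.
        System.out.println(countByState());
    }
}
```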
Reason #2: deadlocks.
If it seems that the application server is doing nothing, look for deadlocks. These can be database deadlocks that cause your SQL queries, or even your update statements, to hang. For example, a transaction log that is written to the database for each request may easily hang the entire application if the log table is locked. It can also be a deadlock caused by the application calling back into itself. Do you have any HTTP SOAP calls from one application server to another? Also check for shared objects, such as an operating system file that is written to from multiple threads at once.
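For deadlocks between Java threads inside the application server, the JVM can report lock cycles itself. The sketch below uses the standard `java.lang.management.ThreadMXBean` API; the class name `DeadlockCheck` is hypothetical. Note the limitation: it only sees threads blocked on Java monitors and `java.util.concurrent` locks, so it will not detect database deadlocks or server-to-server call loops.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class DeadlockCheck {
    // Returns one line per deadlocked thread, or an empty string if none are found.
    static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when nothing is deadlocked
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.append(info.getThreadName())
                  .append(" waits for ").append(info.getLockName())
                  .append(" held by ").append(info.getLockOwnerName())
                  .append('\n');
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "no JVM-level deadlock found" : report);
    }
}
```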
Reason #3: runaway threads.
In cases where the application server is indeed to blame, you should look for a runaway thread. These are hard to detect because they rarely show up in logs: log entries are usually written only when a request has completed, and a runaway thread will probably not return until it has already affected the entire application. The hanging request is therefore never written to the log. Runaway threads typically involve infinite loops or unbounded work in code. For example, a query page that offers no paging between result pages suddenly needs to display a large number of results. The page takes forever to render and clobbers the application server, eventually causing it to hang.
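One way to make such requests visible is to record each request when it starts, not only when it completes, and to ask periodically which ones have been running too long. The following is a minimal sketch of that idea; `InFlightRequests` and its methods are hypothetical names, and wiring it into a real server (for example through a request filter and a scheduled check) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class InFlightRequests {
    private static final class Entry {
        final String description;
        final long startMillis;

        Entry(String description, long startMillis) {
            this.description = description;
            this.startMillis = startMillis;
        }
    }

    private final Map<Long, Entry> active = new ConcurrentHashMap<>();

    // Call when a request starts, before any work is done.
    void begin(long requestId, String description, long nowMillis) {
        active.put(requestId, new Entry(description, nowMillis));
    }

    // Call when the request completes, whether it succeeded or failed.
    void end(long requestId) {
        active.remove(requestId);
    }

    // Descriptions of requests still running for longer than thresholdMillis.
    List<String> olderThan(long thresholdMillis, long nowMillis) {
        List<String> stuck = new ArrayList<>();
        for (Entry entry : active.values()) {
            if (nowMillis - entry.startMillis > thresholdMillis) {
                stuck.add(entry.description);
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        InFlightRequests tracker = new InFlightRequests();
        tracker.begin(1L, "GET /report?all=true", 0L); // never finishes
        tracker.begin(2L, "GET /home", 9_000L);
        tracker.end(2L);                               // finishes normally
        System.out.println(tracker.olderThan(5_000L, 10_000L));
    }
}
```

Time is passed in as a parameter here so the logic can be checked without sleeping; in a real deployment the caller would typically supply `System.currentTimeMillis()`.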
These types of application hangs are extremely difficult to detect and diagnose. The hardest part is isolating the root cause of the problem. If the application server hangs, it doesn’t necessarily mean that the problem resides there. It usually doesn’t. End-to-end transaction management tools, such as SharePath by Correlsense, help to pinpoint the reasons for an application hang by providing a real-time view into the behavior of the entire application.