Ever heard of "context switches"? If so, you are probably an administrator of a Citrix Presentation Server farm. This is not a Citrix-specific problem, but context switches are one of the default performance counters in Citrix Resource Manager and easily the most frequently raised alert. Other systems may have issues with context switches too, but without a monitor that alerts on them you will never know.

Understanding Context Switches

I don't want to go into too much detail here, so this is only a very basic description of context switches, but it should help you understand what the term really means.

Processes contain "threads" that do the work; they are scheduled and run on a system CPU (on one CPU at a time, not on all available CPUs). A process can have multiple threads, but only one thread can run on a CPU at a time. The amount of time a thread is allowed to run is called a "quantum", and when that time is over the system "switches" to the next thread in line (this is the normal case for a switch): a "context switch" has happened.
If the performance counter shows a high rate of context switches, it means that threads get less time to do their work between switches, and system performance might go down. At that point Citrix Resource Manager or any other monitor will raise an alert to inform the administrator that something is wrong.
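
To make the quantum idea concrete, here is a minimal sketch of a round-robin scheduler on a single CPU that counts how often the CPU moves from one thread to another. It is a toy model, not how the Windows scheduler works (no priorities, no waiting on I/O), and the thread names and time units are made up for illustration.

```python
from collections import deque


def count_context_switches(work_units, quantum):
    """Simulate round-robin scheduling on ONE CPU and count context switches.

    work_units: dict mapping a (hypothetical) thread name to the CPU time it needs.
    quantum:    the longest a thread may run before the next thread gets its turn.
    """
    ready = deque(work_units.items())  # threads waiting in line for the CPU
    switches = 0
    current = None  # name of the thread that ran last

    while ready:
        name, remaining = ready.popleft()
        if current is not None and name != current:
            switches += 1  # a different thread gets the CPU: one context switch
        current = name

        remaining -= min(quantum, remaining)  # run for one quantum at most
        if remaining > 0:
            ready.append((name, remaining))  # quantum expired, back of the line

    return switches


# Same total work, different quantum lengths.
print(count_context_switches({"thread_a": 4, "thread_b": 4}, quantum=4))
print(count_context_switches({"thread_a": 4, "thread_b": 4}, quantum=1))
```

With the same amount of work, the shorter quantum produces more switches, which is the effect a high counter value hints at: more time spent switching, less time spent working.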

 

Two other definitions of context switches

Microsoft
"The average rate per second at which the processor switches context among threads. A high rate can indicate that many threads are contending for processor time."


Windows Internals
"When Windows selects a new thread to run, it performs a context switch to it. A context switch is the procedure of saving the volatile state associated with a running thread, loading another thread’s volatile state, and starting the new thread’s execution."

 

Cause of high Context Switches

A common cause I have encountered is a page file that is too small or that is allowed to grow dynamically (initial and maximum size not set to the same value). Another candidate is the write cache of a (RAID) controller, which you might want to change using Microsoft's dskcache utility (or the vendor's tool). High activity rates can also result from inefficient hardware or poorly designed applications.

 

Troubleshooting Context Switches

As always, there are different ways to troubleshoot such a problem, but the main goal is to find the process(es) generating the high number of context switches. Keep in mind that you might need better hardware. So how can I find the number of context switches on the system? The answer is the Microsoft Performance Monitor (perfmon.msc), with the counters System\Context Switches/sec or Thread\Context Switches/sec. With the thread-based counter, however, it is hard to figure out which process(es) are causing the high rate.
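
If you would rather script the system-wide number than watch a graph, the rate is simply the difference between two readings of the running total, divided by the time between them. The sketch below shows that arithmetic. The sampling helper assumes the third-party psutil package is installed (it is not mentioned in the original article), and both function names are my own.

```python
import time


def switches_per_second(first_total, second_total, interval_seconds):
    """Turn two readings of a running context switch total into a rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (second_total - first_total) / interval_seconds


def sample_system_rate(interval_seconds=1.0):
    """Sample the system-wide total twice and return switches per second.

    Assumption: psutil is installed; cpu_stats().ctx_switches is its
    running total of context switches since boot.
    """
    import psutil  # third-party, imported here so the rest works without it

    first = psutil.cpu_stats().ctx_switches
    time.sleep(interval_seconds)
    second = psutil.cpu_stats().ctx_switches
    return switches_per_second(first, second, interval_seconds)


# Two made-up readings taken two seconds apart.
print(switches_per_second(1000, 29000, 2))
```

Calling sample_system_rate() a few times over the day gives a rough baseline to compare against, but like the performance counter it only tells you that the rate is high, not which process is responsible.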

A better utility is Sysinternals Process Explorer. By default Process Explorer doesn't show context switches; enable them under View | Select Columns | Process Performance by activating Context Switches and Context Switch Delta.



You should now see both columns in the main view of Process Explorer. The Context Switches column shows the cumulative number of switches for each process. Sort by this column and look for a) high values and b) fast-growing values; both are good indicators of a high switch rate for the process(es).

 

[Image: Process Explorer - Context Switches column]

 

Next, check the CSwitch Delta column for a high value; it shows the context switches made per Process Explorer refresh interval (if the "update speed" is set to one second, you get context switches per second). Once you have found the process(es), you should find out why they are generating those context switches.
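
The delta value is nothing more than the difference between two snapshots of the cumulative count. As a rough sketch of that calculation (not Process Explorer's actual code), the function below takes two snapshots and ranks processes by how many switches they added in between. The process names and numbers are invented for the example.

```python
def cswitch_deltas(previous, current):
    """Rank processes by context switches added between two snapshots.

    previous, current: dicts mapping a process name to its cumulative
    context switch count at the time of each snapshot.
    Returns a list of (process, delta) pairs, largest delta first.
    """
    deltas = {
        # A process missing from the earlier snapshot started in between,
        # so its whole count is new.
        name: total - previous.get(name, 0)
        for name, total in current.items()
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical snapshots taken one second apart.
before = {"winword.exe": 1000, "svchost.exe": 50000}
after = {"winword.exe": 9000, "svchost.exe": 50100}

for process, delta in cswitch_deltas(before, after):
    print(process, delta)
```

Note how the process with the largest cumulative total is not the one with the largest delta here, which is why it pays to look at both columns.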

 

[Image: Process Explorer - CSwitch Delta column]

 

Values for "bad" Context Switches / sec

The default context switches "red alert" value in Citrix Resource Manager is 14,000, which is meant for a single CPU. The value is per CPU: if the system has two CPUs you should change it to 28,000, to 42,000 for three CPUs, and to 56,000 for a quad-CPU system. Still, these values are only basic suggestions; for a "good" value you have to monitor your system over time.
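
The scaling is plain multiplication, sketched here so the numbers can be reproduced for any CPU count. The function and constant names are my own, and the 14,000 figure is only the default quoted above, not a recommendation.

```python
PER_CPU_RED_ALERT = 14_000  # default "red alert" value quoted above, per CPU


def red_alert_threshold(cpu_count, per_cpu=PER_CPU_RED_ALERT):
    """Scale the per-CPU alert value to a system with cpu_count CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return cpu_count * per_cpu


for cpus in (1, 2, 3, 4):
    print(cpus, red_alert_threshold(cpus))
```

If you feed this from os.cpu_count(), keep in mind that it reports logical processors, which may or may not be the CPU count you want to scale by.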


Anyway, I talked to Tim Mangan at BriForum 2006 EU, and he said that context switches are not something you should constantly keep an eye on.
