Recently, I was asked to investigate high CPU on a production server. I began my routine, which is a combination of wait stats, performance counters, and resource-hogging statements. I identified a CPU-intensive stored procedure that had a cardinality issue in its plan. I was about to present my findings when… POOF.
Another DBA had just executed DBCC FREEPROCCACHE and cleared the procedure cache. Immediately following this statement, the server hit 100% CPU and stayed there for 15-20 minutes. So, I started my investigation over and found the server was being flooded by a rarely seen wait type:
This wait type means SQL Server is trying to create an execution plan but doesn’t have enough memory, so the request waits until more memory is available.
On a stable system, you probably won’t see this wait type much unless you’re starved for memory. The server I was investigating had 64GB, which seemed appropriate for the workload. So why were we getting hammered with these waits? And why was it taking so long to stabilize?
The key here was to look at the plan cache after the system stabilized. I use the SQLSkills script found on Kimberly Tripp’s blog post.
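As a rough illustration of how a flood of waits like this can be spotted while it is happening, the query below groups currently executing requests by wait type. It is a minimal sketch against `sys.dm_exec_requests`, not the routine used in this investigation, and it only shows a point-in-time snapshot, so it would need to be run several times during the incident.

```sql
-- Snapshot of what active requests are waiting on right now.
-- Excludes the session running this query.
SELECT r.wait_type,
       COUNT(*)         AS waiting_requests,
       SUM(r.wait_time) AS total_wait_ms   -- wait_time is reported in milliseconds
FROM sys.dm_exec_requests AS r
WHERE r.wait_type IS NOT NULL
  AND r.session_id <> @@SPID
GROUP BY r.wait_type
ORDER BY waiting_requests DESC;
```

If one wait type accounts for most of the waiting requests across repeated runs, that is a reasonable signal of where the server is stuck.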
Notice the high count of adhoc plans; they completely dominate the plan cache. One third of the plans are single use, and the average is 58 uses per plan. There are far fewer procedure plans, but they have extremely high re-use, averaging 100k uses per plan.
After the plan cache was cleared, all of these plans had to compete for the same compilation memory, hence the memory waits. Having to re-compile tens of thousands of plans contributed to the high CPU, which in turn led to dreadful database performance. In this scenario, a single-use adhoc plan was using resources that would have been better spent on a high-use procedure plan. Had this been a system with fewer plans (or fewer adhoc plans), the performance degradation might have been shorter lived.
The lesson learned here is that blowing up the procedure cache on a system with a large number of plans will lead to pain. In our case, we could not differentiate between the initial high CPU issue and the high CPU caused by clearing the cache. So in the end, we don’t know if DBCC FREEPROCCACHE helped the issue.
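For readers who want a similar breakdown without the original script, here is a simplified sketch that summarizes `sys.dm_exec_cached_plans` by object type. It is not the SQLSkills script itself, and the column aliases are my own choices for illustration.

```sql
-- Plan cache summary by object type: plan count, size, average reuse,
-- and how many plans have been used exactly once.
SELECT cp.objtype                                   AS cache_type,
       COUNT_BIG(*)                                 AS total_plans,
       SUM(CAST(cp.size_in_bytes AS decimal(18, 2)))
           / 1024 / 1024                            AS total_mb,
       AVG(CAST(cp.usecounts AS bigint))            AS avg_use_count,
       SUM(CASE WHEN cp.usecounts = 1 THEN 1 ELSE 0 END) AS single_use_plans
FROM sys.dm_exec_cached_plans AS cp
GROUP BY cp.objtype
ORDER BY total_mb DESC;
```

A large `Adhoc` row with a high `single_use_plans` count next to a small `Proc` row with a very high `avg_use_count` would match the pattern described above.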
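If the goal is only to force a fresh plan for one suspect procedure, `DBCC FREEPROCCACHE` also accepts a plan handle, which evicts a single plan rather than the whole cache. The sketch below assumes the procedure's plan is currently cached and that the query runs in the procedure's database; `dbo.usp_SuspectProcedure` is a hypothetical name used only for illustration, and this is not something we tried during the incident.

```sql
-- Evict the cached plan for one procedure instead of clearing everything.
-- dbo.usp_SuspectProcedure is a hypothetical name for illustration.
DECLARE @plan_handle varbinary(64);

SELECT TOP (1) @plan_handle = ps.plan_handle
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.database_id = DB_ID()
  AND ps.object_id = OBJECT_ID(N'dbo.usp_SuspectProcedure');

-- Only call DBCC when a cached plan was actually found.
IF @plan_handle IS NOT NULL
    DBCC FREEPROCCACHE (@plan_handle);
```

Only that one plan has to be recompiled afterwards, so the rest of the cache keeps serving the high-reuse plans.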