At work today I ran into a new reason for not keeping open MySQL connections for a long time. It involves dynamic session variables like long_query_time.
I wanted to capture a couple of hours worth of all queries in the slow query log so I could analyze them with pt-query-digest from the excellent Percona Toolkit. So, I used
set global long_query_time=0;
After a couple of hours I had a lot of data. I discovered later that I was actually missing some of the early queries.
Then I set long_query_time back to 1 so only queries longer than 1 second would be logged. To my initial amazement, lots of very short queries continued to be logged to the slow query log.
A little research turned up the fact that the long_query_time session variable for a connection is initialized from the global variable only when a connection is opened. So, any connections that were open when I set long_query_time to 0 continued on as if it were still set to 1. Therefore I missed capturing those queries.
Worse, though, is that some of our code uses connection pools that can keep alive connections for a few hours. Short queries on those connections continued to be written to the slow query log after I set long_query_time = 1. Fortunately, I was tailing the slow query log, so I noticed this before it got too big. At this point, you can temporarily disable slow query logging, let it go and hope those connections don’t last too much longer, or go all club and hammer on the connections and kill them and hope your code handles it gracefully. The last one generally isn’t so great of an idea for a production app.
As I mentioned above, my slow query log capture didn’t include sub-second queries from connections that were open when I set long_query_time to 0. That would obviously affect the results of my analysis by leaving out the queries from long-lived connections.
So, if you’re doing something like this, you should capture data for a long enough time to offset this factor or at least throw out the data from the earlier parts of the log. You should also check out the open connections with show processlist; before changing long_query_time.