2016-09-14

Enhanced T-SQL Error Handling With Extended Events

T-SQL TRY...CATCH with Extended Events

In the previous post, I examined a couple of error handling scenarios in T-SQL that don't work as expected with TRY...CATCH. The first problem involved multiple errors for a single statement in the TRY block: when control transferred to the CATCH block, only one error was accounted for. The second problem involved multiple errors caused by DBCC CHECKDB on a corrupt database. The TRY block "ate" all the errors and control never transferred to the CATCH block, leaving the impression there were no errors at all. Now let's look at a way to enhance error handling with Extended Events to overcome the shortcomings of TRY...CATCH for the two scenarios.

The design pattern involves these steps:

  1. Create and start an Extended Events Session: it will capture sqlserver.error_reported events, filtered primarily by SPID.
  2. Execute a statement in a TRY block.
  3. Within (or after) the CATCH block, read the XEvents session data.
  4. Use the available data to respond to the error(s) as appropriate.
  5. Stop and drop the XEvents session.


Here's an example that can be run manually in a single batch from SSMS. Just make sure to replace all occurrences of "2016" in the script with your SPID.

CREATE EVENT SESSION [Error Handling Session(2016)]
ON SERVER 
ADD EVENT sqlserver.error_reported
(
    ACTION(
        sqlserver.session_id,
        sqlserver.sql_text
    )
    WHERE [package0].[not_equal_unicode_string]([message],N'''''') 
    AND [severity]>(10) 
    AND [sqlserver].[session_id]=(2016)
) 
ADD TARGET package0.ring_buffer
WITH (
    --ALLOW_SINGLE_EVENT_LOSS
    --NO_EVENT_LOSS
    EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,

    MAX_MEMORY=4096 KB,
    MAX_DISPATCH_LATENCY=1 SECONDS,
    MAX_EVENT_SIZE=0 KB,
    MEMORY_PARTITION_MODE=NONE,
    TRACK_CAUSALITY=ON,
    STARTUP_STATE=OFF
);

ALTER EVENT SESSION [Error Handling Session(2016)] ON SERVER STATE=START;

BEGIN TRY
    BACKUP DATABASE master
    TO DISK = 'master.diff.bak'
    WITH DIFFERENTIAL;
END TRY
BEGIN CATCH
    DECLARE @XEData XML

    SELECT @XEData = CAST(xet.target_data AS XML)
    FROM sys.dm_xe_session_targets AS xet
    JOIN sys.dm_xe_sessions AS xe
        ON (xe.address = xet.event_session_address)
    WHERE xe.name = 'Error Handling Session(2016)';

    /*
    Check value of "totalEventsProcessed" to ensure events have been 
    dispatched to event session target (ring_buffer).
    If no events have been processed, delay for a period of MAX_DISPATCH_LATENCY +1 (in seconds).
    */
    IF @XEData.value('(/RingBufferTarget/@totalEventsProcessed)[1]', 'INT') = 0
    BEGIN
        WAITFOR DELAY '00:00:02';

        SELECT @XEData = CAST(xet.target_data AS XML)
        FROM sys.dm_xe_session_targets AS xet
        JOIN sys.dm_xe_sessions AS xe
            ON (xe.address = xet.event_session_address)
        WHERE xe.name = 'Error Handling Session(2016)';
    END

    --Comment/uncomment as desired to show the formatted XML session data.
    SELECT @XEData;

    --Shred the XML. Do whatever you want with this data: 
    -- log it to a table, create and send an email alert, etc.
    SELECT 
        x.c.value(N'(@name)[1]', N'NVARCHAR(MAX)') AS EventName,
        x.c.value(N'(@timestamp)[1]', N'datetime') AS EventTime,
        x.c.value(N'(data[@name="error_number"]/value)[1]', N'NVARCHAR(MAX)') AS ErrorNumber,
        x.c.value(N'(data[@name="severity"]/value)[1]', N'NVARCHAR(MAX)') AS Severity,
        x.c.value(N'(data[@name="state"]/value)[1]', N'NVARCHAR(MAX)') AS [State],
        x.c.value(N'(data[@name="user_defined"]/value)[1]', N'NVARCHAR(MAX)') AS UserDefined,
        x.c.value(N'(data[@name="category"]/text)[1]', N'NVARCHAR(MAX)') AS Category,
        x.c.value(N'(data[@name="destination"]/text)[1]', N'NVARCHAR(MAX)') AS Destination,
        x.c.value(N'(data[@name="is_intercepted"]/value)[1]', N'NVARCHAR(MAX)') AS IsIntercepted,
        x.c.value(N'(data[@name="message"]/value)[1]', N'NVARCHAR(MAX)') AS [Message],
        x.c.value(N'(action[@name="sql_text"]/value)[1]', N'NVARCHAR(MAX)') AS SqlText,
        x.c.value(N'(action[@name="session_id"]/value)[1]', N'NVARCHAR(MAX)') AS SessionId
    FROM @XEData.nodes('//RingBufferTarget/event') AS x(c)
    --WHERE x.c.value(N'(data[@name="destination"]/text)[1]', N'NVARCHAR(MAX)') = 'USER'

END CATCH

ALTER EVENT SESSION [Error Handling Session(2016)] ON SERVER STATE=STOP;
DROP EVENT SESSION [Error Handling Session(2016)] ON SERVER;
GO

Below is the output from my server (modified for display purposes). There is one row for each error event that occurred as a result of the BACKUP DATABASE statement. Columns correspond to metadata items from each event occurrence. Notice in the [Destination] column, errors 3024 and 3013 are sent to 'USER', while error 3041 is sent to 'ERRORLOG' and 'EVENTLOG'. IMO, error 3024 (sent to the user) is more informative than what gets written to the SQL error log and the Windows event log (error 3041). Yet it's the odd man out when you use TRY...CATCH. There's a commented WHERE clause that filters on just the 'USER' errors--uncomment as you see fit (or create your own).

T-SQL TRY...CATCH with Extended Events - Results1

 

Here's an example for DBCC CHECKDB on a corrupt database. Remember from the last post that in this scenario, control never passes to the CATCH block. So we'll need to check the Event Session data after END CATCH. You can also run this as a single batch in SSMS, but you'll need a corrupt database to get similar results. As before, replace all occurrences of "2016" in the script with your SPID.

CREATE EVENT SESSION [Error Handling Session(2016)]
ON SERVER 
ADD EVENT sqlserver.error_reported
(
    ACTION(
        sqlserver.session_id,
        sqlserver.sql_text
    )
    WHERE [package0].[not_equal_unicode_string]([message],N'''''') 
    AND [severity]>(10) 
    AND [sqlserver].[session_id]=(2016)
) 
ADD TARGET package0.ring_buffer
WITH (
    --ALLOW_SINGLE_EVENT_LOSS
    --NO_EVENT_LOSS
    EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,

    MAX_MEMORY=4096 KB,
    MAX_DISPATCH_LATENCY=1 SECONDS,
    MAX_EVENT_SIZE=0 KB,
    MEMORY_PARTITION_MODE=NONE,
    TRACK_CAUSALITY=ON,
    STARTUP_STATE=OFF
);
GO

ALTER EVENT SESSION [Error Handling Session(2016)] ON SERVER STATE=START;
GO

BEGIN TRY
    DBCC CHECKDB ('CorruptionTest')
    WITH NO_INFOMSGS;
END TRY
BEGIN CATCH
    PRINT 'Error occurred and we made it to the CATCH block.'
END CATCH

/*
 Delay for a period of MAX_DISPATCH_LATENCY +1 (in seconds).
 If there are any events that need to be dispatched to the event session 
 target (ring_buffer), this will ensure that we have alotted enough time.
*/    
WAITFOR DELAY '00:00:02';

DECLARE @XEData XML    

SELECT @XEData = CAST(xet.target_data AS XML)
FROM sys.dm_xe_session_targets AS xet
JOIN sys.dm_xe_sessions AS xe
    ON (xe.address = xet.event_session_address)
WHERE xe.name = 'Error Handling Session(2016)';

--Check value of "totalEventsProcessed".
IF @XEData.value('(/RingBufferTarget/@totalEventsProcessed)[1]', 'INT') > 0
BEGIN
    --Comment/uncomment as desired to show the formatted XML session data.
    SELECT @XEData;

    --Shred the XML. Do whatever you want with this data: 
    -- log it to a table, create and send an email alert, etc.
    SELECT 
        x.c.value(N'(@name)[1]', N'NVARCHAR(MAX)') AS EventName,
        x.c.value(N'(@timestamp)[1]', N'datetime') AS EventTime,
        x.c.value(N'(data[@name="error_number"]/value)[1]', N'NVARCHAR(MAX)') AS ErrorNumber,
        x.c.value(N'(data[@name="severity"]/value)[1]', N'NVARCHAR(MAX)') AS Severity,
        x.c.value(N'(data[@name="state"]/value)[1]', N'NVARCHAR(MAX)') AS [State],
        x.c.value(N'(data[@name="user_defined"]/value)[1]', N'NVARCHAR(MAX)') AS UserDefined,
        x.c.value(N'(data[@name="category"]/text)[1]', N'NVARCHAR(MAX)') AS Category,
        x.c.value(N'(data[@name="destination"]/text)[1]', N'NVARCHAR(MAX)') AS Destination,
        x.c.value(N'(data[@name="is_intercepted"]/value)[1]', N'NVARCHAR(MAX)') AS IsIntercepted,
        x.c.value(N'(data[@name="message"]/value)[1]', N'NVARCHAR(MAX)') AS [Message],
        x.c.value(N'(action[@name="sql_text"]/value)[1]', N'NVARCHAR(MAX)') AS SqlText,
        x.c.value(N'(action[@name="session_id"]/value)[1]', N'NVARCHAR(MAX)') AS SessionId
    FROM @XEData.nodes('//RingBufferTarget/event') AS x(c)
END

ALTER EVENT SESSION [Error Handling Session(2016)] ON SERVER STATE=STOP;
DROP EVENT SESSION [Error Handling Session(2016)] ON SERVER;
GO

Below is the output from my server (modified for display purposes). There is one row for each error event that occurred as a result of the DBCC CHECKDB statement. Columns correspond to metadata items from each event occurrence.

T-SQL TRY...CATCH with Extended Events - Results2


 

Notes

I wish I knew more about the nuances of Extended Events. I made some arbitrary decisions for the session I created, making changes as testing unfolded. Here are some other observations and comments:

  • I chose ring buffer as the event session target, because I figured it would be fast (and I didn't want to go out to the file system). I didn't realize it was an asynchronous target at first. In my early testing on SQL 2012 and 2014, results came back instantly, even with the default value of MAX_DISPATCH_LATENCY=30 SECONDS. I wasn't using WAITFOR DELAY in those early rounds of testing. Looking back, I'm surprised it worked. It wasn't until I tested on SQL 2008 R2 that I discovered the latency issue. I lowered the default setting to the minimum MAX_DISPATCH_LATENCY=1 SECONDS. It still wasn't enough, so I added the WAITFOR DELAY logic.
  • When creating the EVENT SESSION, I was unable to specify the session_id dynamically with @@SPID. You could take the code above and make something automated out of it, but it looks like some dynamic T-SQL will be required.
  • I filtered out errors with a severity of 10 or less. This mimics the behavior of TRY...CATCH.
  • The "best" setting for EVENT_RETENTION_MODE would logically be NO_EVENT_LOSS. This setting worked on SQL 2012 and SQL 2008 R2. But it was not allowed on SQL 2014. For that version, ALLOW_SINGLE_EVENT_LOSS was used.


Closing Thoughts

I don't know what to call all of this. A workaround? A hack? A gratuitous hot mess? I suppose it looks complicated and verbose as-is. I'll address that in my next post by isolating some of the code within stored procs that will create the EVENT SESSION dynamically. That should hide some of the complexity and improve the usability of the code. I'll also show how to iterate through a large number of databases, *not* stopping if an error is encountered. And of course, if an error is encountered, getting all the pertinent error information and sending an email alert.

UPDATE:
A humble and heartfelt thank you to SQL Server MVP Erland Sommarskog. In his epic series Error and Transaction Handling in SQL Server, Erland pays me a compliment for my efforts here in this post.


SHARE