Downgrading A SQL Server Database To An Older Version

One of the recurring questions I see on Stack Overflow is "How do I restore a SQL Server backup to a previous version of SQL Server?" The answer, of course, is you don't. Upgrading a database to a newer (major) version is a one-way ticket--at least as far as the database files and subsequent backups go. I recently found myself in a similar position to all those hapless Stack Overflow questioners. I had a customer who had migrated to a newer version of SQL Server and wanted to roll back to the previous version. What to do?

A couple of thoughts immediately came to mind. There's the SQL Server Import and Export Wizard and the Generate and Publish Scripts Wizard. Neither of these sounded convenient. In particular, generating a script with both schema and 500 GB of data sounded like a fruitless endeavor. Two other options sounded much more appealing. So I focused on those.


BACPAC

The first was the Export Data-tier Application, which would create a BACPAC file from the newer version. In theory, I would be able to import that to a lower version of SQL Server. I started the wizard from SSMS, made two simple choices (name and path of the file, along with the list of tables I wanted) and clicked "finish". Any optimism I had was short lived. My database had cross-database references in numerous views and stored procedures. The BACPAC export failed. Much to my surprise, I found no option to pick and choose what objects (other than tables) were to be included/excluded with the export. On to plan B.


Something Old, Something New

The option I settled on was to bulk load tables one at a time. I'd have to truncate the tables in the old version database first. Truncating tables in a particular order due to foreign keys gave me pause. I didn't want to deal with that (go ahead and chuckle now--I had to revisit this later). The "older" version of SQL Server in my circumstance was SQL 2014. I enlisted the use of DBCC CLONEDATABASE to get an "empty" database as my starting point. This was merely the second time I'd ever used this new-ish feature. To connect to the "new" version of SQL Server that held the current source data, I opted for a linked server. Now all I had to do was figure out the steps to load the data. With a little trial and error I settled on the list below (a T-SQL sketch for a single table follows it):

  1. Put database into SIMPLE recovery mode
  2. Disable nonclustered indexes
  3. Disable FOREIGN KEY and CHECK constraints
  4. Disable triggers
  5. Set AUTO_CREATE_STATISTICS off
  6. Iterate through the tables
    • Turn IDENTITY_INSERT on (if necessary)
    • Insert into old version table from new version table, (excluding computed columns, if any)
    • Ensure rows are inserted by order of cluster key column(s)
    • Turn IDENTITY_INSERT off (if necessary)
  7. Set AUTO_CREATE_STATISTICS on
  8. Enable triggers
  9. Enable FOREIGN KEY and CHECK constraints
  10. Rebuild/enable nonclustered indexes
  11. Update non-index statistics?
  12. Revert recovery mode to FULL
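
Here's a minimal T-SQL sketch of the constraint/trigger handling and the per-table load, assuming a linked server named [NEWSQL] that points at the newer instance. The server, database, table, and column names are placeholders--adjust for your schema, and skip the IDENTITY_INSERT statements for tables without an identity column.

--Load one table from the newer instance via a linked server (illustrative names).
ALTER TABLE dbo.SomeTable NOCHECK CONSTRAINT ALL;   --disable FOREIGN KEY/CHECK constraints
DISABLE TRIGGER ALL ON dbo.SomeTable;

SET IDENTITY_INSERT dbo.SomeTable ON;

INSERT INTO dbo.SomeTable WITH (TABLOCK) (Id, Col1, Col2)   --exclude computed columns, if any
SELECT Id, Col1, Col2
FROM [NEWSQL].[SourceDb].dbo.SomeTable
ORDER BY Id;   --insert in cluster key order

SET IDENTITY_INSERT dbo.SomeTable OFF;

ENABLE TRIGGER ALL ON dbo.SomeTable;
ALTER TABLE dbo.SomeTable WITH CHECK CHECK CONSTRAINT ALL;   --re-enable and re-validate constraints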

This seemed to work fairly well for a smaller database of about 15 GB. In the end, all of this turned out to be an academic exercise. The customer ultimately decided to stay on the newer version of SQL Server, and I never did downgrade the large 500 GB database.


Hat Tips

When I first realized the challenge ahead of me, I sought out some advice from the community. Special thanks to Peter Schott, Constantine Kokkinos, Enrique Argüelles, and Patrick (aka "Paddyrick" on Slack) for all the healthy discussion and sharing.


Data Migration Assistant for SQL Server Upgrades

I recently took advantage of an opportunity to try Microsoft's Data Migration Assistant. It was a good experience and I found the tool quite useful. As the documentation tells us, the DMA "helps you upgrade to a modern data platform by detecting compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL Database. DMA recommends performance and reliability improvements for your target environment and allows you to move your schema, data, and uncontained objects from your source server to your target server." For my use case, I wanted to assess a SQL 2008 R2 environment with more than a hundred user databases for an on-premises upgrade to SQL 2017.

There's not much to the Welcome screen. Simply click the + button as indicated on the left side toolbar.

Pick a Project Type (Assessment in this example), enter a Project Name, and choose the Target server type. Note the four options:

  • Azure SQL Database
  • Azure SQL Database Managed Instance
  • SQL Server on Azure Virtual Machine
  • SQL Server

Clicking next brings us to the Options page. Choices here will vary slightly, based on the Target server type. Note the target versions listed--the oldest supported target version is SQL 2012. But since SQL 2008 and SQL 2008 R2 fall out of support soon, that's fine by me. For the report type, I was only interested in Compatibility issues.

On the next screen, specify the SQL instance to connect to, and one or more databases to assess. (If you want to assess databases from more than one instance, additional sources can be added by clicking the "Add sources" toolbar button/icon.) Here, I specified a named instance on my local machine, and chose three of the five user databases.

Review the choices and click Back to change them, or click Start Assessment. I'll go with the latter.

There are some animated gif-like graphics while the assessment is running. By default, the app will use up to eight threads/connections (this behavior can be controlled by a parameter in the dma.exe.config file).

After the DMA finishes assessing all the databases, a report of findings is displayed. All of the compatibility issues discovered will be displayed here. Click on a database to show what issues were found. Issues are broken out into the following types:

  • Breaking changes
  • Behavior changes
  • Deprecated features
  • Information issues

Clicking/selecting a specific issue shows the Issue details, Impacted objects, and Recommended Fix(es). In the example here, there were only two issues found: unqualified joins and deprecated data types. There are tabs for multiple versions of SQL Server, from 140 (SQL Server 2017) downward. These show the impact of each issue, depending on what compatibility level the database would be placed in after an upgrade or migration. That both issues are present across the board tells us we can't make either issue "go away" by running the Northwind database on SQL 2017 in any of the lower compatibility levels.

The Northwind, Pubs, and XYZ databases didn't turn up much. Since they didn't, here are a handful of other issues I found in another database (a short repro of two of them follows the list):

  • Information
    • Remove references to undocumented system tables.
  • Behavior Change
    • SET ROWCOUNT used in the context of DML statements such as INSERT, UPDATE, or DELETE.
    • Full-Text Search has changed since SQL Server 2008.
    • ORDER BY specifies integer ordinal.
  • Breaking Change
    • CLR assemblies marked as SAFE or EXTERNAL_ACCESS are considered UNSAFE.
    • Constant expressions are not allowed in the ORDER BY clause in 90 or later compatibility modes.
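
As a quick illustration of two of those findings, here's a hedged T-SQL sketch (hypothetical table and column names) showing the older pattern alongside the commonly recommended replacement:

--SET ROWCOUNT with DML is deprecated; TOP is the recommended replacement.
SET ROWCOUNT 1000;
DELETE FROM dbo.StagingRows WHERE IsProcessed = 1;
SET ROWCOUNT 0;

DELETE TOP (1000) FROM dbo.StagingRows WHERE IsProcessed = 1;

--ORDER BY with an integer ordinal vs. ORDER BY the column name.
SELECT CustomerId, CustomerName FROM dbo.Customers ORDER BY 2;

SELECT CustomerId, CustomerName FROM dbo.Customers ORDER BY CustomerName;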


Assessment Report

You can export a report to either a *.csv or *.json file. I didn't find either format especially helpful (I created a *.csv report file, then opened, edited, and saved it as an Excel spreadsheet), although either one can be ingested by Power BI.


Feedback

The little smiley icon in the lower left is for user feedback. I clicked it and submitted a few items. Along the way, I also encountered an error with three databases that prevented the assessment from completing. I ended up contacting Microsoft via email. To my surprise, I received a response in less than an hour. They surmised there was a stored procedure in each database that caused the error. Removing it enabled the assessment to complete successfully. Thanks, Microsoft!


Recap - SQLSaturday Tampa 2019

SQLSaturday #859 in Tampa was my first event for the year. It unofficially begins "SQLSaturday season" for me. There's a handful of events in Florida and the Southeast that are relatively close and at agreeable times of the year. But back to Tampa--this was just my second time at their event, and I'm sure I'll be back again.

I debated driving to Tampa on Friday vs driving on Saturday morning: take my time and pay for a hotel, or be in a rush but save some money. I opted for the latter. I stayed up late Friday night watching basketball and I paid the price on Saturday--I was tired. Traffic on I-4 was surprisingly good Saturday morning, and I arrived with plenty of time to spare.

I like the venue in Tampa, which is on the campus of USF. There's plenty of parking outside, and plenty of space inside, including numerous tables & chairs. It's great to have a place to sit and relax, whether it's breakfast, lunch, or just networking. I won't speak for the vendors, but it looked like they had enough elbow room too.


Sessions

There was one session in particular I was most interested in: Introduction to SQL Server Big Data Clusters by Buck Woody. Buck's session hit on a lot of topics I don't know much about: Linux, PolyBase, Containers, Kubernetes, Spark, and HDFS. This was my second time seeing Buck speak. He's equally entertaining and informative.

Another notable session I attended was Storytelling for Machine Learning and Advanced Analytics by Jen Underwood. This session also covered some topics I have an interest in, but not much personal experience with. Jen's session was a "replay" of the one she presented at PASS Summit in 2018, which was nice for me since I didn't make it to Seattle last year. It was my first time seeing her speak.


No-Show

There was one session I went to where the speaker didn't show up. This gets talked about from time to time in the SQL community, especially amongst SQLSaturday organizers. A volunteer arrived after about 15 minutes to let us know the speaker wasn't coming. I felt bad for the volunteer, who seemed sheepish to make the announcement, almost as if he felt he had let us down. Obviously, it's not an ideal situation. But it's not the end of the world. Sometimes life gets in the way. I hope the speaker is safe and sound.


An Opportunity?

There were fewer than 10 of us in the room. I wondered if any of the attendees would want to do something impromptu. The topic for the session was something I've had a few experiences with, including a recent one. So I asked the others if they wanted to tell a problem story about Topic X. Did they fix the problem? If so, how? If they didn't fix the problem, was there someone in the room who knew how to fix it or could offer some insight?

I tried to get a conversation going a couple of times, but no one seemed interested. So I let it go.


Lunch

SQLSaturday Tampa opted for a catered lunch. It works well in their venue with the buffet/serving line in the main building adjacent to all the tables & chairs. It's hard to please everyone with lunch options, but I was a happy camper. Pork, chicken, beans & rice, along with some side items (yum!). I got to hang out with Buck for a bit, along with Jeff Taylor and Ron Dameron.


Friends

I met John Wells in person for the first time. We'd first connected online back in December and had mostly talked about college football. (His LSU Tigers were playing my UCF Knights in the Fiesta Bowl.) Like me, John's career with SQL Server and IT got started a bit later in life than many others'. John attends and speaks at a lot of SQLSaturday events in a lot of different parts of the US. He's easy to talk to. Go say Hi to him if you get a chance.

Robert Preseau and I went to a couple of the same sessions and got to hang out a bit. And there were many others in the #SQLFamily I bumped into.

Staying up late and waking up early makes for a sleepy attendee. I'd gone through two energy drinks during the day. After the last session, I was drained. So I skipped the raffle and hit the road for the drive home to Seminole County. Despite some pockets of congestion on I-4, I made it home in time for a late dinner.


Shredding XML Data From Extended Events

Querying the data of an Extended Events session has never been easy. My XEvent sessions typically store event data in a target file, which means using sys.fn_xe_file_target_read_file. To get something of value, you need to shred the event data XML.

I might open the XEvent session properties in SSMS and make a list of the pertinent Event Fields and selected Global Fields (Actions) or take a look at the XML of the event_data (or both). Then I'll begin building my query with sys.fn_xe_file_target_read_file and end up with something like this:

;WITH XEvents AS
(
 SELECT object_name, CAST(event_data AS XML) AS event_data
 FROM sys.fn_xe_file_target_read_file ( 'system_health*.xel', NULL, NULL, NULL )  
)
SELECT object_name AS EventName,
 event_data.value ('(/event/@timestamp)[1]', 'DATETIME') AS [Time],
 event_data.value ('(/event/action[@name=''server_principal_name'']/value)[1]', 'VARCHAR(128)') AS login_name,
 event_data.value ('(/event/action[@name=''sql_text'']/value)[1]', 'VARCHAR(MAX)') AS sql_text,
 event_data.value ('(/event/action[@name=''session_id'']/value)[1]', 'BIGINT') AS session_id,
 event_data.value ('(/event/data[@name=''object_name'']/value)[1]', 'VARCHAR(128)') AS object_name,
 event_data.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT') AS Duration,
 event_data.value ('(/event/data[@name=''physical_reads'']/value)[1]', 'BIGINT') AS physical_reads,
 event_data.value ('(/event/data[@name=''logical_reads'']/value)[1]', 'BIGINT') AS logical_reads,
 event_data.value ('(/event/data[@name=''writes'']/value)[1]', 'BIGINT') AS writes,
 event_data.value ('(/event/data[@name=''statement'']/value)[1]', 'VARCHAR(MAX)') AS statement
FROM XEvents 

It's effective and gives me what I want. But even with a reusable script and the benefits of copy/paste, building the query is time consuming and sometimes frustrating. What I've really wanted for a while is a way to autogenerate the query. I'll proceed to show one possibility. Feel free to skip to the end, if you wish. Otherwise, let's start with a couple of building block queries, using the system_health session as an example.

Here is a list of all the events selected for the session:

--Events selected for the XEvent session.
SELECT s.name SessionName, se.name EventName
FROM sys.server_event_sessions s
JOIN sys.server_event_session_events se
 ON se.event_session_id = s.event_session_id
WHERE s.name = 'system_health'

We can build off of the query above to include the Global Fields (Actions) that have been chosen for each event:

--Events selected for the XEvent session
--and the Global Fields (Actions) chosen per event.
SELECT s.name SessionName, se.name EventName, sa.name GlobalFieldName, 
 o.description, o.type_name, o.type_size
FROM sys.server_event_sessions s
JOIN sys.server_event_session_events se
 ON se.event_session_id = s.event_session_id
JOIN sys.server_event_session_actions sa
 ON sa.event_session_id = s.event_session_id
 AND sa.event_id = se.event_id
JOIN sys.dm_xe_objects o
 ON o.name = sa.name
 AND o.object_type = 'action'
WHERE s.name = 'system_health'
ORDER BY se.name, sa.name

Now for a list of Event Fields (including the Optional ones) for each event in the XEvent session:

--Events selected for the XEvent session and the 
--Event Fields (including the Optional ones) for each event.
SELECT s.name SessionName, se.name EventName, 
 c.name EventField, c.type_name 
FROM sys.server_event_sessions s
JOIN sys.server_event_session_events se
 ON se.event_session_id = s.event_session_id
JOIN sys.dm_xe_object_columns c
 ON c.object_name = se.name
 AND c.column_type = 'data' 
WHERE s.name = 'system_health'
ORDER BY se.name, c.name

For both the Global Fields and Event Fields, the type_name (data type) will be important: when calling the XML value() method, we'll want to specify an appropriate T-SQL data type. A view will be used for the mapping. (A little bit of guesswork went into this.)

USE master;
GO

IF OBJECT_ID('XE_TSQL_TypeXref') IS NULL
 EXEC('CREATE VIEW XE_TSQL_TypeXref AS SELECT 1 AS Alias');
GO

/*
 This view attempts to map each Extended Events data type
 to a TSQL data type.
*/
ALTER VIEW dbo.XE_TSQL_TypeXref
AS
SELECT  
 o.name XE_type, o.description XE_type_description, 
 o.capabilities, o.capabilities_desc, o.type_size XE_type_size,
 CASE type_name
  --These mappings should be safe.
  --They correspond almost directly to each other.
  WHEN 'ansi_string' THEN 'VARCHAR(MAX)'
  WHEN 'binary_data' THEN 'VARBINARY(MAX)'
  WHEN 'boolean' THEN 'BIT'
  WHEN 'char' THEN 'VARCHAR(MAX)'
  WHEN 'guid' THEN 'UNIQUEIDENTIFIER'
  WHEN 'int16' THEN 'SMALLINT'
  WHEN 'int32' THEN 'INT'
  WHEN 'int64' THEN 'BIGINT'
  WHEN 'int8' THEN 'SMALLINT'
  WHEN 'uint16' THEN 'INT'
  WHEN 'uint32' THEN 'BIGINT'
  WHEN 'uint64' THEN 'BIGINT' --possible overflow?
  WHEN 'uint8' THEN 'SMALLINT'
  WHEN 'unicode_string' THEN 'NVARCHAR(MAX)'
  WHEN 'xml' THEN 'XML'

  --These mappings are based off of descriptions and type_size.
  WHEN 'cpu_cycle' THEN 'BIGINT'
  WHEN 'filetime' THEN 'BIGINT'
  WHEN 'wchar' THEN 'NVARCHAR(2)'

  --How many places of precision?
  WHEN 'float32' THEN 'NUMERIC(30, 4)'
  WHEN 'float64' THEN 'NUMERIC(30, 4)'

  --These mappings? Not sure. Default to NVARCHAR(MAX).
  WHEN 'activity_id' THEN 'NVARCHAR(MAX)'
  WHEN 'activity_id_xfer' THEN 'NVARCHAR(MAX)'
  WHEN 'ansi_string_ptr' THEN 'NVARCHAR(MAX)'
  WHEN 'callstack' THEN 'NVARCHAR(MAX)'
  WHEN 'guid_ptr' THEN 'NVARCHAR(MAX)'
  WHEN 'null' THEN 'NVARCHAR(MAX)'
  WHEN 'ptr' THEN 'NVARCHAR(MAX)'
  WHEN 'unicode_string_ptr' THEN 'NVARCHAR(MAX)'
 END AS SqlDataType
FROM sys.dm_xe_objects o
WHERE o.object_type = 'type'

Putting It All Together

Now for the big finish. This script combines the VIEW and parts of the other queries to dynamically build and execute a query string. Plug in the name of the XEvent session at the top.

DECLARE @XESessionName SYSNAME = 'system_health';
DECLARE @Tsql NVARCHAR(MAX) = '';

;WITH AllSessionEventFields AS
(
 --Unique Global Fields (Actions) across all events for the session.
 SELECT DISTINCT sa.name EventField, 'action' AS XmlNodeName, 
  CASE WHEN x.SqlDataType IS NULL THEN 'text' ELSE 'value' END AS XmlSubNodeName,
  'Global Fields (Action)' AS FieldType, o.type_name XE_type, 
  COALESCE(x.SqlDataType, 'NVARCHAR(MAX)') AS SqlDataType
 FROM sys.server_event_sessions s
 JOIN sys.server_event_session_events se
  ON se.event_session_id = s.event_session_id
 JOIN sys.server_event_session_actions sa
  ON sa.event_session_id = s.event_session_id
  AND sa.event_id = se.event_id
 JOIN sys.dm_xe_objects o
  ON o.name = sa.name
  AND o.object_type = 'action'
 LEFT JOIN master.dbo.XE_TSQL_TypeXref x
  ON x.XE_type = o.type_name
 WHERE s.name = @XESessionName

 UNION

 --Unique Event Fields across all events for the session.
 SELECT DISTINCT c.name EventField, 'data' AS XmlNodeName, 
  CASE WHEN x.SqlDataType IS NULL THEN 'text' ELSE 'value' END AS XmlSubNodeName,
  'Event Fields' AS FieldType, c.type_name XE_type, 
  COALESCE(x.SqlDataType, 'NVARCHAR(MAX)') AS SqlDataType
 FROM sys.server_event_sessions s
 JOIN sys.server_event_session_events se
  ON se.event_session_id = s.event_session_id
 JOIN sys.dm_xe_object_columns c
  ON c.object_name = se.name
  AND c.column_type = 'data' 
 LEFT JOIN master.dbo.XE_TSQL_TypeXref x
  ON x.XE_type = c.type_name
 WHERE s.name = @XESessionName
)
SELECT @Tsql = @Tsql + CHAR(9) + 
 CASE
  WHEN f.SqlDataType = 'XML' THEN
   'event_data.query (''(/event/' + f.XmlNodeName + '[@name=''''' + f.EventField + ''''']/' +
    f.XmlSubNodeName + ')[1]'') AS [' + f.EventField + '],' + CHAR(13) + CHAR(10)
  ELSE
   'event_data.value (''(/event/' + f.XmlNodeName + '[@name=''''' + f.EventField + ''''']/' +
    f.XmlSubNodeName + ')[1]'', ''' + f.SqlDataType + ''') AS [' + f.EventField + '],' + CHAR(13) + CHAR(10)
  END
FROM AllSessionEventFields f
ORDER BY f.EventField

SELECT @Tsql = LEFT(@Tsql, LEN(@Tsql) - 3);
SELECT @Tsql = ';WITH XEvents AS
(
 SELECT object_name, CAST(event_data AS XML) AS event_data
 FROM sys.fn_xe_file_target_read_file ( ''' + @XESessionName + '*.xel'', NULL, NULL, NULL )  
)
SELECT object_name, event_data,' + CHAR(13) + CHAR(10) + @Tsql + '
FROM XEvents;';

PRINT @Tsql;
EXEC(@Tsql);

The output for the system_health session is really "wide"--there are 19 events and over a hundred different Global Fields and Event Fields. Since those fields vary a bit from one event to the next, there will be a lot of NULLs in the result set.


When a SQL Server Data File Reaches Maximum Capacity

Did you know the maximum capacity of a SQL Server data file is 16 TB? I didn't either. And I recently learned the hard way. When I got the call, I got as much information as I could, and started sleuthing. There was some blocking, and the head of the blocking chain was consistently the same stored procedure--let's call it dbo.SaveData. I ran a few more scripts to find "expensive" queries via sys.dm_exec_query_stats and sys.dm_exec_procedure_stats and sure enough, dbo.SaveData had its fingerprints there too.
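
The exact scripts vary, but a minimal sketch of the sys.dm_exec_procedure_stats check (sorted by average elapsed time) looks something like this:

--Rough sketch: most "expensive" cached procedures by average elapsed time (microseconds).
SELECT TOP (20)
 DB_NAME(ps.database_id) AS DatabaseName,
 OBJECT_NAME(ps.object_id, ps.database_id) AS ProcedureName,
 ps.execution_count,
 ps.total_elapsed_time / ps.execution_count AS avg_elapsed_time_us,
 ps.total_logical_reads / ps.execution_count AS avg_logical_reads
FROM sys.dm_exec_procedure_stats ps
ORDER BY ps.total_elapsed_time / ps.execution_count DESC;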

I was told there had been a not-so-recent failover. The maintenance jobs for index defragmentation had not been run in a while. This particular instance used maintenance plans along with index reorganization operations. There were no statistics updates. Index reorganizations had been started manually and were still running. Based on past performance, I was told it could take around 24 hours to complete.

Execution plans for the stored proc showed SELECTs as really efficient compared to INSERTs, UPDATEs, and DELETEs. There were a couple of tables involved: a queue table that had less than 100 rows, and a permanent table that had more than 50 million. The permanent table (let's call it dbo.Docs) had a UNIQUEIDENTIFIER column as the clustered index key with a 90% FILL FACTOR. The table was in the PRIMARY FILEGROUP on D:\; however, there was also an IMAGE column that was on a separate FILEGROUP on G:\. Normally I'd be very suspicious of any table with a UNIQUEIDENTIFIER column as a cluster key. But this table only had three columns: the third column was a ROWVERSION data type. Even if there were crazy page splits happening, each row only required 24 bytes for the UNIQUEIDENTIFIER/ROWVERSION columns (assuming the IMAGE column was always stored off-row in the other FILEGROUP). I didn't look up the full calculation, but I think that's something in the neighborhood of 300 rows per page. As slow as the system was running, it would have been a miracle if there were 300 inserts in an hour.

There was another DMV I had checked a number of times: sys.dm_exec_requests. I was looking for open transactions. I found them on a regular basis, but they would eventually "close" (presumably commits). I also noticed SOS_SCHEDULER_YIELD waits were showing up almost every time dbo.SaveData was executed. I knew there were a couple of common knee-jerk reactions associated with that wait type. I also remember looking that one up on SQLSkills on numerous occasions. This wait type often is related to SQL doing a lot of scanning of pages in memory.
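
The spot-check query was along these lines (a sketch, not the exact script I ran):

--Active requests: open transactions, waits, and logical reads.
SELECT r.session_id, r.status, r.command, r.wait_type, r.wait_time,
 r.open_transaction_count, r.logical_reads, r.cpu_time,
 OBJECT_NAME(t.objectid, t.dbid) AS ObjectName
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id <> @@SPID;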

Everything I've covered so far was what I found before I took action. So, what did I do? I humbly admit I had no idea what the problem was. My first order of business was to update statistics on the queue table. I felt confident that this was very likely unnecessary because of the small number of rows in the table. But because it was so small, I figured why not? Then I updated statistics on dbo.Docs. That took a long time. Afterwards, there was no appreciable change in performance. Nor was there a change in execution plans for the stored proc. I took multiple second looks at the stored proc, its code, and its execution plan. I didn't sense any code smells to indicate parameter sniffing. Nor were there signs of implicit conversions in the execution plans. There was about 0.1% fragmentation on the clustered index of dbo.Docs.

I could see that most of the stored proc execution calls were coming from the same server, which was a processing server running a web service. The customer decided to shut down the service and have a quick discussion with his dev team to see if there had been any recent changes deployed. We called it a night. I got a text the next morning. The server was rebooted, and performance had improved noticeably. None to my surprise, I got another call later that day: performance had deteriorated again.

I got back online and looked at all the same things I'd looked at before and still was puzzled. Something new I noticed, though was a huge volume of logical reads for the INSERT statement to dbo.Docs. The logical_reads column of sys.dm_exec_requests was showing a ginormous number, possibly as much as 6 digits worth of page reads. I knew that meant something, but I didn't know what. I mentioned this to the client and they told me the G:\ drive was getting thrashed.

Hmmm. Maybe the page splits that I dismissed earlier *were* a problem. I wasn't sure how the off-row IMAGE data storage was handled. A "document" was often several megabytes, so if it was stored on pages like other data, each would span multiple pages. That didn't sound like it would lead to page splits. But even as I write this, I'm not sure. I had another thought at this point: could the autogrowth events be happening in really small increments like 1 MB? Is IFI enabled for the Windows account running the database engine service? I checked the file properties for probably the second or third time. Yes, I had remembered correctly. Autogrowth was set to 5 GB. I also learned IFI was indeed enabled. I ran a query to check free space in the data file on G:\. Wait! What's this?
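
For reference, the check was roughly the query below; the output for the G:\ file is what prompted the next thought.

--File size and internal free space for the current database (converted from 8 KB pages to MB).
SELECT f.name, f.physical_name, f.type_desc,
 CAST(f.size AS BIGINT) * 8 / 1024 AS FileSizeMB,
 CAST(FILEPROPERTY(f.name, 'SpaceUsed') AS BIGINT) * 8 / 1024 AS SpaceUsedMB,
 (CAST(f.size AS BIGINT) - FILEPROPERTY(f.name, 'SpaceUsed')) * 8 / 1024 AS FreeSpaceMB
FROM sys.database_files f;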

Why is there zero free space available in that file? I suppose that's technically possible, if unlikely. And then I looked at the file size. It's a lot larger than the others. If it had started with 15 or 17, I might have thought nothing further. But 16... That's 2 to the 4th power. Did I hit an upper limit on data file size? Why yes, yes I did! After a few attempts, I was able to add another file to the existing FILEGROUP. Within moments, performance returned to "normal".
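
The eventual fix was an ALTER DATABASE statement along these lines (the database, file, and filegroup names, path, and sizes are placeholders):

--Add a second data file to the full filegroup.
ALTER DATABASE [MyDatabase]
ADD FILE ( NAME = N'MyDatabase_Docs2',
           FILENAME = N'G:\Data\MyDatabase_Docs2.ndf',
           SIZE = 100GB, FILEGROWTH = 5GB )
TO FILEGROUP [DOCS];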

Looking back I feel humbled that it took me so long to find the problem. I've come away from the experience thinking a better DBA might have solved it much more quickly. Heck, in a way I got lucky there at the end. I almost didn't recognize the solution when it was in front of me. I was very close to moving on. I suspect when the existing file could no longer grow, there was still some internal free space due to the 90% FILL FACTOR. SQL Server performed an enormous amount of IO (which included all those logical reads I observed) trying to find space for new rows in that 10% of unallocated space. I'm also assuming it was widely scattered and non-contiguous. Every insert may have led to a full table scan. I wish I could prove that supposition. It's speculation at this point. While there is still some frustration, I am pleased that I found the solution. The client (and their clients) are all happy now. I should also add I found no evidence in the error log of any issue with the maxed out data file. When I get a chance, I'll review sys.messages to see if there's any related error message. I'm not optimistic.

So that's what happened to me when a SQL Server data file reached its maximum capacity. What would you have done? Where do you think I went wrong? What did I get right? Comments welcome!


CSV Row Counts

This is one of those posts where if my opening premise is wrong, the whole thing is a moot point. I've been wrong before, so let's roll! There doesn't seem to be an efficient way to get a count of the number of records in a CSV file. Every record can be a different number of characters/bytes. So in my not-so-expert conclusion, you have to open the CSV file and read it into memory in its entirety to get the record count. This can be quite inefficient for large files.

I was reminded of this recently as I was working with R, trying to read a nearly 2 GB data file. I wanted to read in 5% of the data and output it to a smaller file that would make the test code run faster. The particular function I was working with needed a row count as one of its parameters. For me, that meant I had to determine the number of rows in the source file and multiply by 0.05. I tied the code for all of those tasks into one script block.

Now, none to my surprise, it was slow. In my short experience, I've found R isn't particularly snappy--even when the data can fit comfortably in memory. I was pretty sure I could beat R's record count performance handily with C#. And I did. I found some related questions on StackOverflow. A small handful of answers discussed the efficiency of various approaches. I only tried two C# variations: my original attempt, and a second version that was supposed to be faster (the improvement was nominal).

Sometime later I remembered OPENROWSET for SQL Server. And I wondered...how fast would SQL Server count records in a CSV file? Pretty fast, it turns out. I didn't spend a ton of time on the R, C#, or T-SQL code. It's below if you want to take a look. In short, R was the slowest, C# was pretty fast, and T-SQL with OPENROWSET was the fastest.

R:                 122.60 seconds
C# (1st version):  12.7982 seconds
C# (2nd version):  12.2451 seconds
T-SQL:             6.996 seconds

R

> begin <- proc.time()
> setwd("C:/Data/")
> filepath <- file("yellow_tripdata_2016-01.csv",open="r")
> maxLinesToRead <- 20000
> counter <- 0
> ( while((linesread <- length(readLines(con = filepath, n = maxLinesToRead))) > 0 ) 
+   counter <- counter+linesread )
NULL
Warning message:
closing unused connection 3 (yellow_tripdata_2016-01.csv) 
> close(filepath)
> counter
[1] 10906859
> proc.time() - begin
   user  system elapsed 
 116.90    3.08  122.60 
> paste("Record Count", counter, sep = ": ")
[1] "Record Count: 10906859"


C#

//Version 1
int counter = 0;
DateTime start = DateTime.Now;

using (System.IO.StreamReader file = 
    new System.IO.StreamReader(@"C:\Data\yellow_tripdata_2016-01.csv"))
{
    while (file.ReadLine() != null)
    {
        counter++;
    }

    file.Close();
}

DateTime end = DateTime.Now;
TimeSpan elapsed = end - start;
Console.WriteLine("Record count: " + counter.ToString());
Console.WriteLine(elapsed.ToString("c"));

//Version 2
int counter = 0;
DateTime start = DateTime.Now;
counter = System.IO.File.ReadLines(@"C:\Data\yellow_tripdata_2016-01.csv").Count();
DateTime end = DateTime.Now;
TimeSpan elapsed = end - start;
Console.WriteLine("Record count: " + counter.ToString());
Console.WriteLine(elapsed.ToString("c"));


T-SQL

DECLARE @Start DATETIME2(7) = CURRENT_TIMESTAMP;

SELECT COUNT(*)
FROM OPENROWSET (
    BULK 'C:\Data\yellow_tripdata_2016-01.csv',
    FORMATFILE = 'C:\Data\VarCharMax.xml'
) AS d

SELECT DATEDIFF_BIG(MILLISECOND, @Start, CURRENT_TIMESTAMP) / 1000.0


OPENROWSET Format File

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="VarCharMax" xsi:type="SQLVARYCHAR"/>
 </ROW>
</BCPFORMAT>


T-SQL Tuesday #110 - Automation

For T-SQL Tuesday #110, Garry Bargsley (b|t) asks, what does “Automate All the Things” mean to you? What do you want to automate or what automation are you proud of completing? What is your go-to technology for automation?


2018 Year End Review

2018 was a busy year for me. There were some misses and lost opportunities, to be sure. But overall, I had some good productivity and growth. I'm not a New Year's resolution person, so there's no checklist of goals to review from a year ago. I'll go mostly from memory. Here goes...


New Job

Towards the end of 2017, I left a full time job to give consulting a try. As January and February unfolded, I finally started getting enough billable hours to make a go of it. A full time job offer found its way to me a couple of months later and I decided to take it. I never announced it here or on social media, but my employer is now Ntirety. I'm a "SQL Server consultant" according to the HR department. It's mostly DBA-type work, but I am doing some occasional SQL development, C# development, and PowerShell development. I'm also doing a fair amount of T-SQL optimization, which I enjoy greatly.


New Blog

Like many of you, I've taken an interest in Data Science. I began learning R earlier this year, and I started a new blog Zero To R to document some of my learning. I've only posted 16 times so far, and to be honest, the blog has stalled a bit. Hopefully that changes soon.


Certification

At the end of June, I passed Exam 70-767: Implementing a Data Warehouse. If you wish, summon the vast powers of certification and read about it here.


PASS Summit

Sadly, I missed PASS Summit 2018. It's the second year in a row I've not been able to attend. The dates for 2019 look agreeable, and I should have enough PTO to make it work. Maybe with a little luck, I'll even make it to SQL Saturday Portland the weekend before.


SQLSaturday

For 2018, I made it to a handful of SQLSaturday events: Tampa (1st time), Jacksonville (3rd?), Atlanta (2nd), and Louisville (2nd). Geographically, I'm in a decent location with many SQLSaturday locations within driving distance. Maybe South Florida and Panama Beach in 2019? Oh, and then there's SQLSaturday Orlando. 2018 was the second time I lent a hand with Andy Warren and the other organizers. It's tough work--and kind of stressful when you're responsible for the big event.


Code Contributions

I haven't been much of a GitHub user and contributor over the years. But I did make my modest SQL to Excel project available there. It seems to be well received--special thanks to everyone at Brent Ozar Unlimited for promoting the project in your newsletter. There were also a few snippets of PowerShell I included with my SQL Server Features Discovery Report post. The folks at dbatools turned it into the Get-DbaFeature cmdlet. Very cool!


SQL Family

I got to meet a number of people face-to-face for the first time this year, mostly at SQLSaturday events. And of course it's always nice to catch up with old friends too. A lot of you are on Twitter and listened to my rants about UCF and ... and ... OK, mostly UCF stuff. Thanks for sticking it out with this college football fanatic. Slack has been a good place to stay in touch with friends and make new friends too. The conversations about SQL, R, PowerShell, and anything else imaginable are fun and informative. There are so many people there willing to share their knowledge and experience in support of others in the community.



Prepared T-SQL Statements

I recently started reading Grant Fritchey's book SQL Server Execution Plans. I didn't get too many pages deep before the topic of query parameterization and prepared statements came up. As Grant notes, prepared statements "are built from the application code". Prepared T-SQL statements in SQL Server have been somewhat mysterious to me for many years. It's something I felt I should have a better understanding of. I thought to myself, "I'm a developer. I should be able to figure this out." So I decided to go down the rabbit hole and see if I could cobble together an example of a prepared statement using C#.

I quickly found my way to Microsoft documentation for the SqlCommand.Prepare method. To my surprise, that method dates all the way back to version 1.1 of the .NET Framework, which was released in 2003. I was a new C# developer at that time. I didn't know about the method then. And as you're probably figuring out, I just learned about it now.

Before we dive into some code, I want to make a plea to the developers that may be reading. If you arrived here in search of a "How-To" article, please stop and reconsider. In my opinion, prepared statements via sp_prepexec and sp_execute are a poor choice for query parameterization. There are better options, including using stored procedures, parameterizing in T-SQL with sp_executesql, and parameterizing ad-hoc queries with the SqlCommand object. I'm writing from the perspective of a DBA, sharing what I've learned with other DBAs.
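
For comparison, here's a minimal sp_executesql sketch of the same kind of parameterized insert the upcoming C# example performs (the dbo.Stooges table comes from this post; the values are just illustrative):

--Parameterized ad-hoc T-SQL via sp_executesql.
EXEC sp_executesql
 N'INSERT INTO dbo.Stooges (StoogeName, StoogeNumber) VALUES (@name, @number);',
 N'@name VARCHAR(128), @number TINYINT',
 @name = 'Shemp', @number = 4;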

OK, DBAs. Are you ready for some C# code? I'll keep it simple and painless. Here's what the code is going to do: connect to SQL Server and insert three rows to dbo.Stooges. That's it. I'll use an ad-hoc query that is parameterized--there should be no risk of SQL injection. The query will be re-used for all three inserts. And SQL Server will use the same execution plan for each insert. Here is the code:


using (SqlConnection conn = new SqlConnection(GetConnectionString()))
{
    conn.Open();

    using (SqlCommand cmd = new SqlCommand())
    {
        cmd.Connection = conn;

        // Create and prepare a SQL statement.
        cmd.CommandText =
            "INSERT INTO dbo.Stooges (StoogeName, StoogeNumber) " +
            "VALUES (@name, @number)";
        SqlParameter parName = new SqlParameter("@name", SqlDbType.Text, 128);
        SqlParameter parNumber = new SqlParameter("@number", SqlDbType.TinyInt, 0); 
        parName.Value = "Larry";
        parNumber.Value = 1;
        cmd.Parameters.Add(parName);
        cmd.Parameters.Add(parNumber);

        // Call Prepare after setting the Commandtext and Parameters.
        cmd.Prepare();
        cmd.ExecuteNonQuery();

        // Change parameter values and call ExecuteNonQuery.
        cmd.Parameters[0].Value = "Curly";
        cmd.Parameters[1].Value = 2;
        cmd.ExecuteNonQuery();

        // Change parameter values and call ExecuteNonQuery again.
        cmd.Parameters[0].Value = "Moe";
        cmd.Parameters[1].Value = 3;
        cmd.ExecuteNonQuery();
    }
}

At line 1 a connection to SQL Server is created. The connection string identifies the connectivity info. In this example, GetConnectionString() is a user-defined method that builds and returns our connection string. I've omitted the code for that method for brevity. At line 3 the connection is opened.

At line 5 a SqlCommand object is created. You can think of this as a "query" for SQL Server. The query string for the INSERT statement is created starting at line 10. You'll notice the @name and @number placeholders for the parameter values. Next on lines 13 and 14, we create SqlParameter objects for @name and @number. Each SqlParameter has a parameter name, a SqlDbType (aka a data type), and a size. The SqlParameter variables are identified as parName and parNumber.

At lines 15 and 16, parameter values are set in anticipation of our first INSERT for Larry. Then at lines 17 and 18, parName and parNumber are each added to cmd.Parameters, which is a collection of zero or more SqlParameters.

Line 21 is where we "prepare" our query. Specifically, it creates a prepared, reusable version of the command (our INSERT statement) on an instance of SQL Server. Line 22 INSERTs our first stooge by invoking cmd.ExecuteNonQuery(). Most of the hard work is done now. Adding more stooges gets easier from here on out.

Our SqlCommand, cmd still has two parameters in the collection. But they contain the values for Larry. On lines 25 and 26, we assign new values for Curly, who is INSERTed on line 27 where we once again invoke cmd.ExecuteNonQuery(). The process is repeated for Moe on lines 30 - 32.

And that's it from the application perspective. There's not really that much application code involved. If you were monitoring this activity via an Extended Events session (or a Profiler trace) it would look similar to this:

(Screenshot: Extended Events capture of the prepared statement calls.)

I'd seen stored procedures sp_prepexec and sp_execute during monitoring here and there throughout the years. But I never had a good grasp on how those stored procs work or what they were doing. Now I have a much better idea.
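
Roughly speaking, the calls the client API sends look like this (an approximation for illustration, not an exact trace capture):

--Approximation of the prepare/execute calls sent by the client API.
DECLARE @handle INT;

EXEC sp_prepexec @handle OUTPUT,
 N'@name VARCHAR(128), @number TINYINT',
 N'INSERT INTO dbo.Stooges (StoogeName, StoogeNumber) VALUES (@name, @number)',
 'Larry', 1;

EXEC sp_execute @handle, 'Curly', 2;
EXEC sp_execute @handle, 'Moe', 3;

EXEC sp_unprepare @handle;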

What about the execution plan(s)? How many did the prepared statement actually use? A quick look at sys.dm_exec_query_stats tells us that as advertised, only one execution plan was used.
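
The check was something along these lines (a sketch; adjust the filter for your environment):

--Confirm one cached plan and three executions for the prepared INSERT.
SELECT qs.execution_count, qs.creation_time, st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE '%INSERT INTO dbo.Stooges%'
AND st.text NOT LIKE '%dm_exec_query_stats%';   --exclude this query itself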
