Rounding to 15 Minute Intervals

Dave Mason - TSQL - SQL Server

A recent Power BI experience presented me with a situation where I wanted to round Date/Time data to the nearest 15 minute increment. Although my Power BI experience isn't vast, I don't find the DAX language difficult (especially for those who have written formulas in Excel). Still, I was discouraged at how unhelpful the DAX Date/Time functions were. After the code was written, I stepped back, pondered what I had written, and concluded it was a verbose and unintuitive mess. Surely I could do better in T-SQL. Or could I?


Query 1

Here's the first T-SQL query I came up with. (I decided rounding up, rounding down, or rounding to the nearest 15 minute increment would all meet the rounding criteria.) It calculates the difference in minutes between the DATE part of a DATETIME and the DATETIME itself, subtracts the minutes beyond the most recent 15 minute boundary, and adds the resulting number of minutes back to the DATE part. This is arguably the most intuitive query of the bunch. But it feels like an excessive amount of code to achieve such a simple request:

DECLARE @Now DATETIME = CURRENT_TIMESTAMP

--Round down to previous 15 min increment.
SELECT 
 DATEADD(
  MINUTE,
  DATEDIFF(MINUTE, CAST(CAST(@Now AS DATE) AS DATETIME), @Now) - 
   (DATEDIFF(MINUTE, CAST(CAST(@Now AS DATE) AS DATETIME), @Now) % 15),
  CAST(CAST(@Now AS DATE) AS DATETIME)
 )
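
For what it's worth, rounding up follows the same pattern: add the complement of the remainder instead of subtracting the remainder. Here's a sketch along those lines (not one of the queries I originally wrote; like Query 1, it works at minute granularity and ignores any seconds):

DECLARE @Now DATETIME = CURRENT_TIMESTAMP

--Round up to the next 15 min increment (no change if already on a boundary).
SELECT 
 DATEADD(
  MINUTE,
  DATEDIFF(MINUTE, CAST(CAST(@Now AS DATE) AS DATETIME), @Now) + 
   (15 - DATEDIFF(MINUTE, CAST(CAST(@Now AS DATE) AS DATETIME), @Now) % 15) % 15,
  CAST(CAST(@Now AS DATE) AS DATETIME)
 )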

The next few queries use a tactic that might be considered hack-ish: adding a number to a DATETIME. Adding a whole number X to a DATETIME results in a DATETIME exactly X days later (or earlier, for a negative number). A fractional number adds (or subtracts) portions of a day:

SELECT 
 CURRENT_TIMESTAMP + 3 AS [3 Days Ahead],
 CURRENT_TIMESTAMP + 0.5 AS [½ Day Ahead],
 CURRENT_TIMESTAMP AS [Now]
GO

3 Days Ahead             ½ Day Ahead               Now
-----------------------  -----------------------   -----------------------
2019-08-20 16:49:18.340  2019-08-18 04:49:18.340   2019-08-17 16:49:18.340

(1 row affected)


Query 2

The second query truncates a DATETIME (removes the TIME portion) by casting it to a DATE and then back to a DATETIME. Then it calculates the number of minutes in the TIME part of the DATETIME, subtracts the number of minutes beyond the most recent 15 minute interval, divides that value by 1440 (the number of minutes in a day), and adds the value to the truncated DATETIME. There's a little less code, but it's far from intuitive:

DECLARE @Now DATETIME = CURRENT_TIMESTAMP

--Round down to previous 15 min increment.
SELECT
 CAST(CAST(@Now AS DATE) AS DATETIME) + 
  ((DATEDIFF(MINUTE, CAST(CAST(@Now AS DATE) AS DATETIME), @Now) - 
  (DATEDIFF(MINUTE, CAST(CAST(@Now AS DATE) AS DATETIME), @Now) % 15)) / 1440.000)


Query 3

The third query is similar to Query 2. To calculate the number of minutes (rounded down to the nearest 15 minute increment), it makes use of the TIME data type, which reduces the number of calls to the CAST function. The code gets a little bit smaller than the previous query, but it's equally unintuitive.

DECLARE @Now DATETIME = CURRENT_TIMESTAMP

--Round down to previous 15 min increment.
SELECT 
 CAST(CAST(@Now AS DATE) AS DATETIME) + 
  ((DATEDIFF(MINUTE, 0, CAST(@Now AS TIME)) - 
  (DATEDIFF(MINUTE, 0, CAST(@Now AS TIME)) % 15)) / 1440.000)


Query 4

Unlike the others, this query (appropriately) rounds time both up and down to the nearest 15 minute increment. Like the two queries before it, it truncates a DATETIME (removes the TIME portion) by casting it to a DATE and then back to a DATETIME. Then the number of minutes in the TIME portion is calculated, divided by 60 to convert to hours, multiplied by four, rounded to the nearest whole number, divided by four to convert back to hours, and divided by 24 to convert to days. The amount of code is reduced once more, and made even less intuitive:

DECLARE @Now DATETIME = CURRENT_TIMESTAMP

--Round to nearest 15 minute increment.
SELECT
 CAST(CAST(@Now AS DATE) AS DATETIME) + 
  ROUND(DATEDIFF(MINUTE, 0, CAST(@Now AS TIME)) / 60.0 * 4, 0) / 4.0 / 24.0


Query 5

The last query is the smallest and the weirdest. It casts a DATETIME to NUMERIC(38, 22). The result represents the number of days since Jan 1st, 1900, with the digits to the right of the decimal point representing the fraction of a day (hours, minutes, seconds, and fractions of a second, if any) past midnight. This is multiplied by 24 to convert to hours, multiplied by 4 to convert to quarter-hours, rounded to zero decimal places, divided by 4 to convert back to hours, divided by 24 to convert back to days, and finally cast back to a DATETIME.

DECLARE @Now DATETIME = CURRENT_TIMESTAMP;

--Round to nearest 15 minute increment.
SELECT 
 CAST(ROUND(CAST(@Now AS NUMERIC(38, 22)) * 24 * 4, 0) / 4 / 24 AS DATETIME)


If I had to choose any of these options for production, I'd probably go with Query 1, just because I think it would make the most sense to any other developer who might encounter it (including future me, who probably wouldn't remember writing the code). But I'm not really thrilled with any of the options. The one thing T-SQL has going for it, though, is the relative ease of truncating the TIME off of a DATETIME by casting to DATE and back to DATETIME. I couldn't find anything like this in DAX. Speaking of which, here's what the DAX function in my Power BI report looked like:

Nearest15 = DATE(YEAR([SomeDate]), MONTH([SomeDate]), DAY([SomeDate])) + 
HOUR([SomeDate])/24 + 
INT( MINUTE([SomeDate]) /15) * 15 / 1440

That doesn't make any more or less sense to me than the T-SQL options. Truncating the time from a Date/Time in DAX felt really awkward and uncomfortable: I had to use the clumsy DATE function, passing in the individual date parts via the YEAR, MONTH, and DAY functions. If DAX has a better way, I couldn't find it in the documentation.

.NET Framework developers are probably chuckling at all of this hand-wringing. Here's a PowerShell example showing how easy it is for them:

 #Round to nearest 15 minute interval.
[DateTime]::Today.AddHours([Math]::Round([DateTime]::Now.TimeOfDay.TotalHours * 4) / 4) 

Thanks for dropping by. May all your code be concise and easy to read!


Nonprintable ASCII Characters And Your Data

Dave Mason - SQL Server - ASCII

The ASCII character set consists of 128 characters, and many of them are nonprintable. When those characters appear within your data, it can get a little confusing. Especially when querying the data with SSMS, or outputting it to a file, or... Well, you get the idea.

The nonprintable ASCII codes are 0 through 31, plus 127. We can "see" what those characters look like in a few different ways with SSMS. Here is a query to run:

;WITH NonPrintableAsciiCodes AS
(
 SELECT 0 AS IntCode
 UNION ALL
 SELECT 127 AS IntCode
 UNION ALL
 SELECT IntCode + 1
 FROM NonPrintableAsciiCodes
 WHERE IntCode < 31
)
SELECT IntCode,
 CHAR(IntCode) CharValue,
 '%' + CHAR(IntCode) + '%' EnclosedChar,
 '%' + RTRIM(CHAR(IntCode)) + '%' EnclosedTrimmedChar
FROM NonPrintableAsciiCodes
ORDER BY IntCode

Below is the query result with output results to grid in SSMS (CTRL + D). The first column shows the integer code of the ASCII character. The second column attempts to show the actual ASCII character. The third column shows the ASCII character enclosed in percent (%) characters. The last column attempts to "trim" the ASCII character, also enclosed in percent (%) characters: this demonstrates that what may appear to be a space character really isn't one.

Here is the same query result, this time with SSMS results to text (CTRL + T). Notice the integer code 9 is significantly out of alignment with the others, while integer codes 10 and 13 spill onto multiple lines:

Integer codes 9, 10, and 13 correspond to the Tab, Line Feed, and Carriage Return characters respectively. If I encounter them within SQL Server data, I generally leave those intact, although it's fair to question why formatting characters are stored in a relational database in the first place. If any of the other nonprintable characters are discovered, I find they are usually accidental outliers. Those can be replaced, usually with a single space character or an empty string '', depending on the data.
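
Something along these lines can be used to hunt them down and clean them up. (This is just a sketch: dbo.SomeTable and SomeColumn are hypothetical names, and the binary collation keeps the LIKE range working by code point. CHAR(0) is omitted for simplicity.)

--Find rows containing nonprintable ASCII characters (1-31, 127).
--dbo.SomeTable and SomeColumn are hypothetical names.
SELECT t.SomeColumn
FROM dbo.SomeTable t
WHERE t.SomeColumn LIKE '%[' + CHAR(1) + '-' + CHAR(31) + CHAR(127) + ']%' COLLATE Latin1_General_BIN;

--Replace an accidental outlier (CHAR(26) here, as an example) with an empty string.
UPDATE dbo.SomeTable
SET SomeColumn = REPLACE(SomeColumn, CHAR(26), '')
WHERE SomeColumn LIKE '%' + CHAR(26) + '%';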

Be wary of nonprintable characters in your data. They can be really deceptive.


R Script Visual in Power BI

Dave Mason - Power BI

Power BI desktop provides an option to visualize data with R. I decided to give this a try after exhausting the limits of my knowledge with the other built-in Power BI visualizations. My experience with both R and Power BI is fairly limited, so I covered a lot of new territory with this exercise. There may have been other, better options to reach the goal I was aiming for. But so be it--I learned a lot in the process. Hopefully you get something of value from this recap.

R Environment

The R engine isn't included with the installation of Power BI desktop. I won't go into detail on this, so just know you'd need to install that separately. I had already installed the R component as part of Machine Learning Services for SQL Server 2017. I also had RStudio installed. Within Power BI desktop, take a moment to click File | Options and settings | Options to open the Options page. Then click R scripting in the list of Global Options. Here you'll see options to set the R home directory and the desired R IDE.

Power BI - R script options

In the Visualizations pane, the block "R" icon should be easy to spot. The first time you click it in a Power BI report, you'll be prompted to enable script visuals:

Power BI - Enable script visuals

Click Enable, and you'll see a placeholder for the R script visual on the canvas, with the R script editor below it:

Power BI - R script visual

Data Fields

In the images above, note the existing "Query1" with three data fields: Backup Finish Date, BackupType, and Database. The source of this data is a simple SQL Server query:

SELECT bs.database_name, 
 bs.backup_finish_date,
 CASE bs.type
  WHEN 'D' THEN 'Full'
  WHEN 'I' THEN 'Differential'
  WHEN 'L' THEN 'Log'
 END AS BackupType
FROM msdb.dbo.backupset bs;

With the R script visual selected on the canvas, tick the checkbox next to each data field you want to use for the visual. (You can also drag & drop fields into the Values pane, or onto the R script visual on the canvas.) This auto-populates the R script editor, where you'll see the names of the fields that were selected.

Power BI - R script visual

Note the hierarchy for Backup Finish Date. The individual hierarchy fields are referenced in the R script editor code. I want the Backup Finish Date as a whole, though. So I address this in the Values pane:

Power BI - Hierarchy

The fields in the R script editor are updated accordingly:

Power BI - R script editor

Next, I write the R code. The R script editor pane is small and doesn't look user-friendly for development activity. But Power BI gives us the option to use an external development environment. Click the arrow icon for this option.

Power BI - Edit script in external IDE

For me, this opened RStudio with a few lines of code pre-written to create an R data frame. This mimics the dataset in Power BI. (Code is slightly formatted for readability.)

# Input load. Please do not change #
`dataset` = read.csv(
  'C:/Users/Dave/REditorWrapper_.../input_df_....csv',
    check.names = FALSE,
    encoding = "UTF-8",
    blank.lines.skip = FALSE);
# Original Script. Please update your script content here and once 
# completed copy below section back to the original editing window #
##############################

The rest of the R code is outside the scope of this blog post, but it's included for reference. The visual I create uses the ggplot function from the ggplot2 package:

library(ggplot2)
library(scales)

dataset$"Backup Finish Date" <- as.POSIXct(dataset$"Backup Finish Date", format = "%Y-%m-%dT%H:%M:%S")
dataset <- dataset[order(dataset$BackupType, decreasing = TRUE),]

colorValues <- c("Full" = "blue", "Differential" = "magenta", "Log" = "yellow")

ggplot(dataset, aes(
        dataset$"Backup Finish Date", 
        dataset$Database, 
        colour = dataset$BackupType)) + 
    geom_point(aes(size = BackupType != "Log")) + 
    scale_size_manual(values=c(1,3), guide = FALSE) +
    xlab("Backup Date") +
    ylab("Database Name") +
    labs(colour = "Backup Type") +
    scale_color_manual(values=colorValues) + 
    scale_y_discrete(limits = rev(levels(dataset$Database))) +
    scale_x_datetime(labels = date_format("%b-%d")) +
    theme(axis.text = element_text(size=16),
          axis.title = element_text(size=18),
          legend.title = element_text(size=18),
          legend.text = element_text(size=16))

The R code above is copied from RStudio and pasted into the R script editor in Power BI. To view the visual, click the Run Script icon in the R script editor:

Power BI - R script editor, run script

Power BI - R script visual

Lessons Learned

I stubbed my toe a number of times learning how to make this work. Numerous times I clicked the Run Script icon only for Power BI to tell me there was a script error. Most of my issues were attributed to my inexperience with R. But there may have been a Power BI-related issue or two as well. I'll look to blog about this in another post.


Splitting Strings With OPENJSON

SQL Server - String Split - OPENJSON - JSON_VALUE - JSON

Starting with SQL Server 2016, Microsoft provided a STRING_SPLIT function. It is a table-valued function that splits a string into rows of substrings, based on a specified separator character. It's been a welcome addition that we waited a long time for. It has one shortcoming, though: the order of the output rows is not guaranteed to match the order of the substrings in the input string.
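
For reference, here's what basic STRING_SPLIT usage looks like (a quick sketch). There's a value column in the output, but nothing that tells you where each substring appeared in the original input:

--Split a comma-delimited string with STRING_SPLIT (SQL Server 2016+).
--The output rows are not guaranteed to be in the original substring order.
DECLARE @Csv VARCHAR(100) = '42,0,7,11,36,20,13,12';
SELECT value
FROM STRING_SPLIT(@Csv, ',');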

Microsoft also provided support for parsing JSON data starting with SQL Server 2016. I discovered the OPENJSON function can be used to split strings, and it can also return the ordinal position of each substring from the original input string. Here is a simple example of the OPENJSON function:

DECLARE @JsonData NVARCHAR(MAX) = '{  
   "Team":"Boston Celtics",  
   "Season":"2018-2019",   
   "Players":["Al Horford","Jayson Tatum","Jaylen Brown","Kyrie Irving","Marcus Smart", "Gordon Hayward", "Marcus Morris","Terry Rozier"],  
   "Jersey Numbers":[42, 0, 7, 11, 36, 20, 13, 12]
}';
SELECT j.[key], j.value
FROM OPENJSON(@JsonData) AS j

SQL Server - String Split - OPENJSON - JSON_VALUE - JSON

The "Players" and "Jersey Numbers" keys both have arrays for values. The arrays themselves are JSON data and can be passed directly to the OPENJSON function. Let's start with the "Jersey Numbers" array:

--Split integer string.
DECLARE @JsonArray NVARCHAR(MAX) = '[42, 0, 7, 11, 36, 20, 13, 12]';
SELECT j.[key] AS ArrayIndex, j.value AS ElementValue
FROM OPENJSON(@JsonArray) AS j

SQL Server - String Split - OPENJSON - JSON_VALUE - JSON

The query returns the key and value columns of the OPENJSON function (aliased as ArrayIndex and ElementValue respectively). Each integer is returned as a separate row, along with the zero-based ordinal position of the integer from the original string. The integer element values can be ordered, keeping the proper array index key:

--Split integer string, order by integer value.
DECLARE @JsonArray NVARCHAR(MAX) = '[42, 0, 7, 11, 36, 20, 13, 12]';
SELECT j.[key] AS ArrayIndex, j.value AS ElementValue
FROM OPENJSON(@JsonArray) AS j
ORDER BY CAST(j.value AS INT)

SQL Server - String Split - OPENJSON - JSON_VALUE - JSON

If we wanted to select a specific element from the array, we can use a WHERE clause:

--Split integer string, select only the 4th element.
DECLARE @JsonArray NVARCHAR(MAX) = '[42, 0, 7, 11, 36, 20, 13, 12]';
SELECT j.[key] AS ArrayIndex, j.value AS ElementValue
FROM OPENJSON(@JsonArray) AS j
WHERE j.[key] = 3

SQL Server - String Split - OPENJSON - JSON_VALUE - JSON

Perhaps even better, we can use the JSON_VALUE function to select a specific element from an array, and without splitting the string:

--Select the 4th element directly via JSON_VALUE
DECLARE @JsonArray NVARCHAR(MAX) = '[42, 0, 7, 11, 36, 20, 13, 12]';
SELECT JSON_VALUE(@JsonArray, '$[3]') AS ElementValue;

SQL Server - String Split - OPENJSON - JSON_VALUE - JSON


Strings

All of the examples above work with string arrays. Here is an example with player names:

DECLARE @JsonArray NVARCHAR(MAX) = '["Al Horford","Jayson Tatum","Jaylen Brown","Kyrie Irving","Marcus Smart", "Gordon Hayward", "Marcus Morris","Terry Rozier"]';
SELECT j.[key] AS ArrayIndex, j.value AS ElementValue
FROM OPENJSON(@JsonArray) AS j

SQL Server - String Split - OPENJSON - JSON_VALUE - JSON

The strings must be enclosed in double quotes to be properly formatted JSON data, however. If the quotes are omitted, an error occurs:

DECLARE @JsonArray NVARCHAR(MAX) = '[Al Horford,Jayson Tatum,Jaylen Brown,Kyrie Irving,Marcus Smart, Gordon Hayward, Marcus Morris,Terry Rozier]';
SELECT j.[key] AS ArrayIndex, j.value AS ElementValue
FROM OPENJSON(@JsonArray) AS j
Msg 13609, Level 16, State 4, Line 39
JSON text is not properly formatted. Unexpected character 'A' is found at position 1.

Using the REPLACE function to fix up the string data into properly formatted JSON is an option that comes to mind. Admittedly, additional lines of code (and possibly precious CPU cycles) are necessary. You'll have to decide if this is worth the extra effort.
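
For example, a comma-delimited list of names could be fixed up into a JSON array something like this (a sketch--it assumes the data contains no embedded double quotes, commas within values, or other characters that would need escaping):

--Fix up a plain delimited string into a JSON array, then split it with OPENJSON.
--Assumes no embedded double quotes or other characters that need escaping.
DECLARE @Csv NVARCHAR(MAX) = N'Al Horford,Jayson Tatum,Jaylen Brown,Kyrie Irving';
DECLARE @JsonArray NVARCHAR(MAX) = N'["' + REPLACE(@Csv, ',', '","') + N'"]';

SELECT j.[key] AS ArrayIndex, j.value AS ElementValue
FROM OPENJSON(@JsonArray) AS j;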

Splitting strings with JSON is an interesting option for a problem that has mostly been solved with STRING_SPLIT. I might use this option, but only if I need to guarantee ordinal position of substring elements. Otherwise, I'll stick with STRING_SPLIT. Happy coding!


Parsing JSON Data In SQL Server

SQL Server - OPENJSON - JSON_VALUE - JSON

Microsoft added support for JSON data beginning with SQL Server 2016. JSON is an open-standard file format consisting of attribute–value pairs and array data types. It is commonly used to transmit data objects for asynchronous browser–server communication. But it is also used for storing unstructured data in files or NoSQL databases such as Microsoft Azure Cosmos DB. For most of us, SQL Server's support for JSON probably means two things: we can convert relational data to JSON and vice versa. In this post, I'll focus on converting JSON to relational data and share what I've learned from a recent experience.

Let's begin with a look at some sample data. This example from Microsoft's documentation (slightly modified) shows the two JSON data structures: key/value pairs and arrays (an ordered list of values).

{  
   "NullValue":null,  
   "StringValue":"John",  
   "IntValue":45,  
   "DecimalValue":6.7,
   "TrueValue":true,  
   "FalseValue":false,  
   "ArrayValue":["a","r","r","a","y"],  
   "ObjectValue":{"obj":"ect"}  
}

Objects are enclosed in curly braces. The entire block of data above is enclosed in curly braces--it is a JSON object. It consists of key/value pairs including "StringValue":"John", "IntValue":45, and "FalseValue":false. The last key, "ObjectValue", has a value that is itself another object--a very simple one consisting of a single key/value pair: {"obj":"ect"}.

Arrays are enclosed in square brackets within JSON data. The "ArrayValue" key has an array for a value, consisting of five single-character strings: ["a","r","r","a","y"]. We'll get to JSON types in a moment, but for now I'll quickly note that array elements don't have to be of the same type.
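
For example, an array like the one below is perfectly valid--each element keeps its own JSON type, which the OPENJSON function (covered below) reports in its type column. A quick sketch:

--A JSON array can mix types; OPENJSON reports the type of each element.
--(0 = null, 1 = string, 2 = number, 3 = true/false)
DECLARE @MixedArray NVARCHAR(MAX) = N'["a", 45, 6.7, true, null]';
SELECT [key], value, type
FROM OPENJSON(@MixedArray);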


OPENJSON

SQL Server provides the OPENJSON function to "read" JSON data. It is a table-valued function that parses JSON text and returns objects and properties from the JSON input as rows and columns. Here is the data sample from above, parsed by the OPENJSON function:

DECLARE @json NVARCHAR(4000) = N'
{  
   "NullValue":null,  
   "StringValue":"John",  
   "IntValue":45,  
   "DecimalValue":6.7,
   "TrueValue":true,  
   "FalseValue":false,  
   "ArrayValue":["a","r","r","a","y"],  
   "ObjectValue":{"obj":"ect"}  
}'

SELECT *
FROM OPENJSON(@json) j

SQL Server - OPENJSON - JSON_VALUE - JSON

The JSON keys and values are returned as columns in a result set, along with the JSON type. JSON documentation indicates there are only six JSON types, far fewer than the number of SQL data types. For instance, both 45 and 6.7 have the same JSON type (type 2), whereas in SQL Server there are TINYINT, SMALLINT, INT, BIGINT, and NUMERIC/DECIMAL data types.
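
If you'd rather get specific SQL Server data types back instead of the generic key/value/type columns, OPENJSON also accepts a WITH clause that maps JSON properties to typed columns. Here's a minimal sketch using a trimmed-down version of the sample data:

--Map JSON properties to specific SQL Server data types with a WITH clause.
DECLARE @json NVARCHAR(4000) = N'{"StringValue":"John","IntValue":45,"DecimalValue":6.7}';

SELECT *
FROM OPENJSON(@json)
WITH (
 StringValue  VARCHAR(30)   '$.StringValue',
 IntValue     INT           '$.IntValue',
 DecimalValue DECIMAL(9, 2) '$.DecimalValue'
);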


OPENROWSET

Reading JSON data from a hard-coded variable is probably not a real world use case for most of us. To read data out of a *.json file, we can enlist the help of our old friend, the OPENROWSET function. Here we'll read the JSON data out of the Sample.json file, store the contents of the file into a @json variable, and parse it with OPENJSON:

DECLARE @json AS NVARCHAR(MAX);

SELECT @json = r.BulkColumn
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r

SELECT * FROM OPENJSON(@json);

The CROSS APPLY operator in the next example produces the same output as the previous example. There's less code, and it's arguably more clear:

SELECT j.*
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r
CROSS APPLY OPENJSON(BulkColumn) AS j

Once again, here is the output. Did you notice how different the values are for the last two rows? The OPENJSON function returns array values and object values as JSON text.

SQL Server - OPENJSON - JSON_VALUE - JSON


JSON_VALUE

How are values of arrays and nested objects accessed? One way is to use the JSON_VALUE function, which extracts a scalar value from a JSON string. It takes two arguments: a JSON text expression and a path that specifies the property to extract. The query below extracts each element of the ArrayValue array, and the value of the obj key within the ObjectValue object.

--Use the JSON_VALUE function to extract scalar values.
--Note the JSON path expressions with the "$" character.
SELECT j.[key], j.value, 
 JSON_VALUE(BulkColumn, '$.ArrayValue[0]') AS Element1,
 JSON_VALUE(BulkColumn, '$.ArrayValue[1]') AS Element2,
 JSON_VALUE(BulkColumn, '$.ArrayValue[2]') AS Element3,
 JSON_VALUE(BulkColumn, '$.ArrayValue[3]') AS Element4,
 JSON_VALUE(BulkColumn, '$.ArrayValue[4]') AS Element5,
 JSON_VALUE(BulkColumn, '$.ObjectValue.obj') AS ObjValue
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r
CROSS APPLY OPENJSON(BulkColumn) AS j

SQL Server - OPENJSON - JSON_VALUE - JSON

The path string in this context starts with a dollar sign ($), which represents the context item (the entire JSON object string). It is followed by the dot operator, which indicates a member of an object. $.ArrayValue is the path to the key named "ArrayValue", and $.ObjectValue is the path to the key named "ObjectValue". Both are members of the JSON text object. For the array, elements are referenced by an index number in square brackets. Index numbers are zero-based, so the first array element is referenced by index 0, the second by index 1, and so on. For the object, there is only one key/value pair. The value is referenced by a second dot operator and the name of the key ("obj").
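
If you don't have a sample file handy, the same path expressions work against a plain variable. Here's a quick, self-contained sketch:

--JSON_VALUE path expressions against a variable instead of a file.
DECLARE @json NVARCHAR(4000) = N'{"ArrayValue":["a","r","r","a","y"],"ObjectValue":{"obj":"ect"}}';

SELECT 
 JSON_VALUE(@json, '$.ArrayValue[0]') AS Element1,
 JSON_VALUE(@json, '$.ObjectValue.obj') AS ObjValue;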

What happens if we specify a path to an array element or key name that doesn't exist? Let's try:

--There is no 6th element at index 5.
--"Obj" is spelled with the wrong case.
SELECT j.[key], j.value, 
 JSON_VALUE(BulkColumn, '$.ArrayValue[5]') AS Element6,
 JSON_VALUE(BulkColumn, '$.ObjectValue.Obj') AS ObjValue
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r
CROSS APPLY OPENJSON(BulkColumn) AS j

SQL Server - OPENJSON - JSON_VALUE - JSON

NULL is returned where we tried to access the nonexistent 6th element and the value of a key that doesn't exist (we spelled it wrong by capitalizing the first letter). Be aware that there are two path modes: lax and strict. Lax is the default. In lax mode, the function returns NULL if the path expression contains an error, as is the case in the example above. In strict mode, the function raises an error if the path expression contains an error:

--There is no 6th element at index 5.
--"Obj" is spelled with the wrong case.
SELECT j.[key], j.value, 
 JSON_VALUE(BulkColumn, 'strict $.ArrayValue[5]') AS Element6,
 JSON_VALUE(BulkColumn, 'strict $.ObjectValue.Obj') AS ObjValue
FROM OPENROWSET (BULK 'C:\Data\Sample.json', SINGLE_CLOB) AS r
CROSS APPLY OPENJSON(BulkColumn) AS j
Msg 13608, Level 16, State 2, Line 51
Property cannot be found on the specified JSON path.

Although there is one nested object ("ObjectValue"), the JSON data in these examples is quite simple. In my short time working with JSON data, I encountered several levels of nesting with much greater complexity than is shown here. Look for another post or two on this, along with a string splitting approach with OPENJSON that works for a use case not covered by STRING_SPLIT.


A Preamble To JSON

Dave Mason - SQL Server - JSON

I've had a few more opportunities to use the Data Migration Assistant. Along with that experience, I have some additional thoughts on the assessment report. If you've used the tool yourself (or you read my recent post about the DMA), you may remember there is an "Export report" option available after an assessment finishes. (See images here and here to jog your memory, if you like.)

More recently, the CSV export format has been a problem. Much of the data has line breaks in it and I got tired of fighting with Excel trying to get a clean tabular output. I even tried reading the data into SQL via OPENROWSET, but quickly concluded I wasn't getting anywhere. I decided to bite the bullet and give the JSON export format a try.

I spotted some DMA documentation that mentions using Power BI to analyze a JSON formatted assessment report. Dustin Ryan created a Power BI template that makes it fairly easy. Just double-click on the *.PBIT file, enter the path where the *.JSON report file is, and watch the Power BI magic. By all accounts, it's pretty cool. The end result is attractive, meaningful, and interactive--everything you'd expect from Power BI. But is it useful? I haven't come to a final conclusion, but for now, I'm saying no.

What I really need is an actionable list of compatibility issues in a tabular (there's that word again) format that I can pass along to development. It was at this point that I decided to try my hand at parsing JSON text with SQL Server, OPENJSON, and OPENROWSET. I'd seen a smattering of comments on social media from those who found it challenging and sometimes frustrating, so I tempered my expectations. The outcome for me was very positive, though. I found a solution that solved a real-world problem for me, and I've added a few scripts to my T-SQL tool belt. I'll elaborate and share some code in my next post.


Downgrading A SQL Server Database To An Older Version

Dave Mason - Downgrade SQL Server Database

One of the recurring questions I see on Stack Overflow is "How do I restore a SQL Server backup to a previous version of SQL Server?" The answer, of course, is you don't. Upgrading a database to a newer (major) version is a one-way ticket--at least as far as the database files and subsequent backups go. I recently found myself in a similar position to all those hapless Stack Overflow questioners: I had a customer who had migrated to a newer version of SQL Server and wanted to roll back to the previous version. What to do?

A couple of thoughts immediately came to mind. There's the SQL Server Import and Export Wizard and the Generate and Publish Scripts Wizard. Neither of these sounded convenient. In particular, generating a script with both schema and 500 GB of data sounded like a fruitless endeavor. Two other options sounded much more appealing. So I focused on those.


BACPAC

The first was the Export Data-tier Application, which would create a BACPAC file from the newer version. In theory, I would be able to import that to a lower version of SQL Server. I started the wizard from SSMS, made two simple choices (name and path of the file, along with the list of tables I wanted) and clicked "finish". Any optimism I had was short lived. My database had cross-database references in numerous views and stored procedures. The BACPAC export failed. Much to my surprise, I found no option to pick and choose what objects (other than tables) were to be included/excluded with the export. On to plan B.


Something Old, Something New

The option I settled on was to bulk load tables one at a time. I'd have to truncate tables on the old version database first. Truncating tables in a particular order due to foreign keys gave me pause. I didn't want to deal with that (go ahead and chuckle now--I had to revisit this later). The "older" version of SQL in my circumstance was SQL 2014. I enlisted the use of DBCC CLONEDATABASE to get an "empty" database as my starting point. This was merely the second time I'd ever used this new-ish feature. To connect to the "new" version of SQL that held the current source data, I opted for a linked server. Now all I had to do was figure out the steps to load the data. With a little trial and error, I settled on the steps below (a rough sketch of a few of them follows the list):

  1. Put database into SIMPLE recovery mode
  2. Disable nonclustered indexes
  3. Disable FOREIGN KEY and CHECK constraints
  4. Disable triggers
  5. Set AUTO_CREATE_STATISTICS off
  6. Iterate through the tables
    • Turn IDENTITY_INSERT on (if necessary)
    • Insert into old version table from new version table (excluding computed columns, if any)
    • Ensure rows are inserted in order of cluster key column(s)
    • Turn IDENTITY_INSERT off (if necessary)
  7. Set AUTO_CREATE_STATISTICS on
  8. Enable triggers
  9. Enable FOREIGN KEY and CHECK constraints
  10. Rebuild/enable nonclustered indexes
  11. Update non-index statistics?
  12. Revert recovery mode to FULL
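
Here's a rough sketch of how steps 3, 4, 6, 8, and 9 might look for a single table. (The linked server name [NewSqlServer], the database and table names, and the columns are all hypothetical--they're not from my actual environment.)

--Disable constraints and triggers on the old version table (steps 3 and 4).
ALTER TABLE dbo.Orders NOCHECK CONSTRAINT ALL;
DISABLE TRIGGER ALL ON dbo.Orders;

--Load the table from the new version instance via the linked server (step 6).
SET IDENTITY_INSERT dbo.Orders ON;

INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate)
SELECT OrderID, CustomerID, OrderDate
FROM [NewSqlServer].[SourceDb].dbo.Orders
ORDER BY OrderID;  --cluster key column

SET IDENTITY_INSERT dbo.Orders OFF;

--Re-enable triggers and constraints (steps 8 and 9).
ENABLE TRIGGER ALL ON dbo.Orders;
ALTER TABLE dbo.Orders WITH CHECK CHECK CONSTRAINT ALL;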

This seemed to work fairly well for a smaller database of about 15 GB. In the end, all of this was an academic exercise: the customer ultimately decided to stay on the newer version of SQL and I never did downgrade the large 500 GB database.


Hat Tips

When I first realized the challenge ahead of me, I sought out some advice from the community. Special thanks to Peter Schott, Constantine Kokkinos, Enrique Argüelles, and Patrick (aka "Paddyrick" on Slack) for all the healthy discussion and sharing.


Data Migration Assistant for SQL Server Upgrades

I recently took advantage of an opportunity to try Microsoft's Data Migration Assistant. It was a good experience, and I found the tool quite useful. As the documentation tells us, the DMA "helps you upgrade to a modern data platform by detecting compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL Database. DMA recommends performance and reliability improvements for your target environment and allows you to move your schema, data, and uncontained objects from your source server to your target server." For my use case, I wanted to assess a SQL 2008 R2 environment with more than a hundred user databases for an on-premises upgrade to SQL 2017.

There's not much to the Welcome screen. Simply click the + button as indicated on the left side toolbar.

Dave Mason - SQL Server - Data Migration Assistant

Pick a Project Type (Assessment in this example), enter a Project Name, and choose the Target server type. Note the four options:

  • Azure SQL Database
  • Azure SQL Database Managed Instance
  • SQL Server on Azure Virtual Machine
  • SQL Server

Dave Mason - SQL Server - Data Migration Assistant

Clicking Next brings us to the Options page. Choices here will vary slightly, based on the Target server type. Note the target versions listed--the oldest supported target version is SQL 2012. But since SQL 2008 and SQL 2008 R2 fall out of support soon, that's fine by me. For the report type, I was only interested in Compatibility issues.

Dave Mason - SQL Server - Data Migration Assistant

Dave Mason - SQL Server - Data Migration Assistant

On the next screen, specify the SQL instance to connect to, and one or more databases to assess. (If you want to assess databases from more than one instance, additional sources can be added by clicking the "Add sources" toolbar button/icon.) Here, I specified a named instance on my local machine and chose three of the five user databases.

Dave Mason - SQL Server - Data Migration Assistant

Dave Mason - SQL Server - Data Migration Assistant

Review the choices and click Back to change them, or click Start Assessment. I'll go with the latter.

Dave Mason - SQL Server - Data Migration Assistant

There are some animated gif-like graphics while the assessment is running. By default, the app will use up to eight threads/connections (this behavior can be controlled by a parameter in the dma.exe.config file).

Dave Mason - SQL Server - Data Migration Assistant

After the DMA finishes assessing all the databases, a report of findings is displayed, showing all of the compatibility issues discovered. Click on a database to show what issues were found. Issues are broken out into the following types:

  • Breaking changes
  • Behavior changes
  • Deprecated features
  • Information issues

Dave Mason - SQL Server - Data Migration Assistant

Clicking/selecting a specific issue shows the Issue details, Impacted objects, and Recommended Fix(es). In the example here, there were only two issues found: unqualified joins and deprecated data types. There are tabs for multiple versions of SQL Server, from 140 (SQL Server 2017) downward. These show the impact of each issue, depending on what compatibility level the database would be placed in after an upgrade or migration. That both issues are present across the board tells us we can't make either issue "go away" by running the Northwind database on SQL 2017 in any of the lower compatibility levels.

Northwind, Pubs, and XYZ databases didn't turn up much. Since they didn't, here's a handful of other issues I found in another database:

  • Information
    • Remove references to undocumented system tables.
  • Behavior Change
    • SET ROWCOUNT used in the context of DML statements such as INSERT, UPDATE, or DELETE.
    • Full-Text Search has changed since SQL Server 2008.
    • ORDER BY specifies integer ordinal.
  • Breaking Change
    • CLR assemblies marked as SAFE or EXTERNAL_ACCESS are considered UNSAFE.
    • Constant expressions are not allowed in the ORDER BY clause in 90 or later compatibility modes.


Assessment Report

You can export a report to either a *.csv or *.json file. I didn't find either that helpful (I created a *.csv report file and opened, edited, and saved it as an Excel spreadsheet), although either can be ingested by Power BI.

Dave Mason - SQL Server - Data Migration Assistant


Feedback

The little smiley icon in the lower left is for user feedback. I clicked it and submitted a few items. Along the way, I also encountered an error with three databases that prevented the assessment from completing. I ended up contacting Microsoft via email. To my surprise, I received a response in less than an hour. They surmised there was a stored procedure in each database that caused the error. Removing it enabled the assessment to complete successfully. Thanks, Microsoft!

Dave Mason - SQL Server - Data Migration Assistant


Recap - SQLSaturday Tampa 2019

Dave Mason - SQLSaturday - SQL Server

SQLSaturday #859 in Tampa was my first event for the year. It unofficially begins "SQLSaturday season" for me. There's a handful of events in Florida and the Southeast that are relatively close and at agreeable times of the year. But back to Tampa--this was just my second time at their event, and I'm sure I'll be back again.

I debated driving to Tampa on Friday vs driving on Saturday morning: take my time and pay for a hotel, or be in a rush but save some money. I opted for the latter. I stayed up late Friday night watching basketball and I paid the price on Saturday--I was tired. Traffic on I-4 was surprisingly good Saturday morning, and I arrived with plenty of time to spare.

I like the venue in Tampa, which is on the campus of USF. There's plenty of parking outside, and plenty of space inside, including numerous tables & chairs. It's great to have a place to sit and relax, whether it's breakfast, lunch, or just networking. I won't speak for the vendors, but it looked like they had enough elbow room too.


Sessions

There was one session in particular I was most interested in: Introduction to SQL Server Big Data Clusters by Buck Woody. Buck's session hit on a lot of topics I don't know much about: Linux, PolyBase, Containers, Kubernetes, Spark, and HDFS. This was my second time seeing Buck speak. He's equally entertaining and informative.

Another notable session I attended was Storytelling for Machine Learning and Advanced Analytics by Jen Underwood. This session also covered some topics I have an interest in, but not much personal experience with. Jen's session was a "replay" of the one she presented at PASS Summit in 2018, which was nice for me since I didn't make it to Seattle last year. It was my first time seeing her speak.


No-Show

There was one session I went to where the speaker didn't show up. This gets talked about from time to time in the SQL community, especially amongst SQLSaturday organizers. A volunteer arrived after about 15 minutes to let us know the speaker wasn't coming. I felt bad for the volunteer, who seemed sheepish to make the announcement, almost as if he felt he had let us down. Obviously, it's not an ideal situation. But it's not the end of the world. Sometimes life gets in the way. I hope the speaker is safe and sound.


An Opportunity?

There were fewer than 10 of us in the room, and I wondered if any of the attendees would want to do something impromptu. The topic for the session was something I've had a few experiences with, including a recent one. So I asked the others if they wanted to tell a problem story about Topic X. Did they fix the problem? If so, how? If they didn't fix the problem, was there someone in the room that knew how to fix it or could offer some insight?

I tried to get a conversation going a couple of times, but no one seemed interested. So I let it go.


Lunch

SQLSaturday Tampa opted for a catered lunch. It works well in their venue with the buffet/serving line in the main building adjacent to all the tables & chairs. It's hard to please everyone with lunch options, but I was a happy camper. Pork, chicken, beans & rice, along with some side items (yum!). I got to hang out with Buck for a bit, along with Jeff Taylor and Ron Dameron.


Friends

I met John Wells in person for the first time. We had met online back in December and had mostly talked about college football. (His LSU Tigers were playing my UCF Knights in the Fiesta Bowl.) Like mine, John's career with SQL Server and IT got started a bit later in life than many others'. John attends and speaks at a lot of SQLSaturday events in a lot of different parts of the US. He's easy to talk to. Go say Hi to him if you get a chance.

Robert Preseau and I went to a couple of the same sessions and got to hang out a bit. And there were many others in the #SQLFamily I bumped into.

Staying up late and waking up early makes for a sleepy attendee. I'd gone through two energy drinks during the day. After the last session, I was drained. So I skipped the raffle and hit the road for the drive home to Seminole County. Despite some pockets of congestion on I-4, I made it home in time for a late dinner.


Shredding XML Data From Extended Events

Dave Mason - SQL Server - Extended Events

Querying the data of an Extended Events session has never been easy. My XEvent sessions typically store event data in a target file, which means using sys.fn_xe_file_target_read_file. To get something of value, you need to shred the event data XML.

I might open the XEvent session properties in SSMS and make a list of the pertinent Event Fields and selected Global Fields (Actions) or take a look at the XML of the event_data (or both). Then I'll begin building my query with sys.fn_xe_file_target_read_file and end up with something like this:

;WITH XEvents AS
(
 SELECT object_name, CAST(event_data AS XML) AS event_data
 FROM sys.fn_xe_file_target_read_file ( 'system_health*.xel', NULL, NULL, NULL )  
)
SELECT object_name AS EventName,
 event_data.value ('(/event/@timestamp)[1]', 'DATETIME') AS [Time],
 event_data.value ('(/event/action[@name=''server_principal_name'']/value)[1]', 'VARCHAR(128)') AS login_name,
 event_data.value ('(/event/action[@name=''sql_text'']/value)[1]', 'VARCHAR(MAX)') AS sql_text,
 event_data.value ('(/event/action[@name=''session_id'']/value)[1]', 'BIGINT') AS session_id,
 event_data.value ('(/event/data[@name=''object_name'']/value)[1]', 'VARCHAR(128)') AS object_name,
 event_data.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT') AS Duration,
 event_data.value ('(/event/data[@name=''physical_reads'']/value)[1]', 'BIGINT') AS physical_reads,
 event_data.value ('(/event/data[@name=''logical_reads'']/value)[1]', 'BIGINT') AS logical_reads,
 event_data.value ('(/event/data[@name=''writes'']/value)[1]', 'BIGINT') AS writes,
 event_data.value ('(/event/data[@name=''statement'']/value)[1]', 'VARCHAR(MAX)') AS statement
FROM XEvents 

It's effective and gives me what I want. But even with a reusable script and the benefits of copy/paste, building the query is time consuming and sometimes frustrating. What I've really wanted for a while is a way to autogenerate the query. I'll proceed to show one possibility. Feel free to skip to the end, if you wish. Otherwise, let's start with a couple of building block queries, using the system_health session as an example.

Here is a list of all the events selected for the session:

--Events selected for the XEvent session.
SELECT s.name SessionName, se.name EventName
FROM sys.server_event_sessions s
JOIN sys.server_event_session_events se
 ON se.event_session_id = s.event_session_id
WHERE s.name = 'system_health'

We can build off of the query above to include the Global Fields (Actions) that have been chosen for each event:

--Events selected for the XEvent session
--and the Global Fields (Actions) chosen per event.
SELECT s.name SessionName, se.name EventName, sa.name GlobalFieldName, 
 o.description, o.type_name, o.type_size
FROM sys.server_event_sessions s
JOIN sys.server_event_session_events se
 ON se.event_session_id = s.event_session_id
JOIN sys.server_event_session_actions sa
 ON sa.event_session_id = s.event_session_id
 AND sa.event_id = se.event_id
JOIN sys.dm_xe_objects o
 ON o.name = sa.name
 AND o.object_type = 'action'
WHERE s.name = 'system_health'
ORDER BY se.name, sa.name

Now for a list of Event Fields (including the Optional ones) for each event in the XEvent session:

--Events selected for the XEvent session and the 
--Event Fields (including the Optional ones) for each event.
SELECT s.name SessionName, se.name EventName, 
 c.name EventField, c.type_name 
FROM sys.server_event_sessions s
JOIN sys.server_event_session_events se
 ON se.event_session_id = s.event_session_id
JOIN sys.dm_xe_object_columns c
 ON c.object_name = se.name
 AND c.column_type = 'data' 
WHERE s.name = 'system_health'
ORDER BY se.name, c.name

For both the Global Fields and Event Fields, the type_name (data type) will be important: when we call the XML value() method, we'll want to specify an appropriate T-SQL data type. A view will be used for the mapping. (A little bit of guesswork went into this.)

USE master;
GO

IF OBJECT_ID('XE_TSQL_TypeXref') IS NULL
 EXEC('CREATE VIEW XE_TSQL_TypeXref AS SELECT 1 AS Alias');
GO

/*
 This view attempts to map each Extended Events data type
 to a TSQL data type.
*/
ALTER VIEW dbo.XE_TSQL_TypeXref
AS
SELECT  
 o.name XE_type, o.description XE_type_description, 
 o.capabilities, o.capabilities_desc, o.type_size XE_type_size,
 CASE type_name
  --These mappings should be safe.
  --They correspond almost directly to each other.
  WHEN 'ansi_string' THEN 'VARCHAR(MAX)'
  WHEN 'binary_data' THEN 'VARBINARY(MAX)'
  WHEN 'boolean' THEN 'BIT'
  WHEN 'char' THEN 'VARCHAR(MAX)'
  WHEN 'guid' THEN 'UNIQUEIDENTIFIER'
  WHEN 'int16' THEN 'SMALLINT'
  WHEN 'int32' THEN 'INT'
  WHEN 'int64' THEN 'BIGINT'
  WHEN 'int8' THEN 'SMALLINT'
  WHEN 'uint16' THEN 'INT'
  WHEN 'uint32' THEN 'BIGINT'
  WHEN 'uint64' THEN 'BIGINT' --possible overflow?
  WHEN 'uint8' THEN 'SMALLINT'
  WHEN 'unicode_string' THEN 'NVARCHAR(MAX)'
  WHEN 'xml' THEN 'XML'

  --These mappings are based off of descriptions and type_size.
  WHEN 'cpu_cycle' THEN 'BIGINT'
  WHEN 'filetime' THEN 'BIGINT'
  WHEN 'wchar' THEN 'NVARCHAR(2)'

  --How many places of precision?
  WHEN 'float32' THEN 'NUMERIC(30, 4)'
  WHEN 'float64' THEN 'NUMERIC(30, 4)'

  --These mappings? Not sure. Default to NVARCHAR(MAX).
  WHEN 'activity_id' THEN 'NVARCHAR(MAX)'
  WHEN 'activity_id_xfer' THEN 'NVARCHAR(MAX)'
  WHEN 'ansi_string_ptr' THEN 'NVARCHAR(MAX)'
  WHEN 'callstack' THEN 'NVARCHAR(MAX)'
  WHEN 'guid_ptr' THEN 'NVARCHAR(MAX)'
  WHEN 'null' THEN 'NVARCHAR(MAX)'
  WHEN 'ptr' THEN 'NVARCHAR(MAX)'
  WHEN 'unicode_string_ptr' THEN 'NVARCHAR(MAX)'
 END AS SqlDataType
FROM sys.dm_xe_objects o
WHERE o.object_type = 'type'

Putting It All Together

Now for the big finish. This script combines the VIEW and parts of the other queries to dynamically build and execute a query string. Plug in the name of the XEvent session at the top.

DECLARE @XESessionName SYSNAME = 'system_health';
DECLARE @Tsql NVARCHAR(MAX) = '';

;WITH AllSessionEventFields AS
(
 --Unique Global Fields (Actions) across all events for the session.
 SELECT DISTINCT sa.name EventField, 'action' AS XmlNodeName, 
  CASE WHEN x.SqlDataType IS NULL THEN 'text' ELSE 'value' END AS XmlSubNodeName,
  'Global Fields (Action)' AS FieldType, o.type_name XE_type, 
  COALESCE(x.SqlDataType, 'NVARCHAR(MAX)') AS SqlDataType
 FROM sys.server_event_sessions s
 JOIN sys.server_event_session_events se
 ON se.event_session_id = s.event_session_id
 JOIN sys.server_event_session_actions sa
 ON sa.event_session_id = s.event_session_id
 AND sa.event_id = se.event_id
 JOIN sys.dm_xe_objects o
 ON o.name = sa.name
 AND o.object_type = 'action'
 LEFT JOIN master.dbo.XE_TSQL_TypeXref x
 ON x.XE_type = o.type_name
 WHERE s.name = @XESessionName

 UNION

 --Unique Event Fields across all events for the session.
SELECT DISTINCT c.name EventField, 'data' AS XmlNodeName, 
 CASE WHEN x.SqlDataType IS NULL THEN 'text' ELSE 'value' END AS XmlSubNodeName,
 'Event Fields' AS FieldType, c.type_name XE_type, 
 COALESCE(x.SqlDataType, 'NVARCHAR(MAX)') AS SqlDataType
FROM sys.server_event_sessions s
JOIN sys.server_event_session_events se
 ON se.event_session_id = s.event_session_id
JOIN sys.dm_xe_object_columns c
 ON c.object_name = se.name
 AND c.column_type = 'data' 
LEFT JOIN master.dbo.XE_TSQL_TypeXref x
 ON x.XE_type = c.type_name
WHERE s.name = @XESessionName
)
SELECT @Tsql = @Tsql + CHAR(9) + 
 CASE
  WHEN f.SqlDataType = 'XML' THEN
   'event_data.query (''(/event/' + f.XmlNodeName + '[@name=''''' + f.EventField + ''''']/' +
   f.XmlSubNodeName + ')[1]'') AS [' + f.EventField + '],' + CHAR(13) + CHAR(10)
  ELSE
   'event_data.value (''(/event/' + f.XmlNodeName + '[@name=''''' + f.EventField + ''''']/' +
   f.XmlSubNodeName + ')[1]'', ''' + f.SqlDataType + ''') AS [' + f.EventField + '],' + CHAR(13) + CHAR(10)
 END
FROM AllSessionEventFields f
ORDER BY f.EventField

SELECT @Tsql = LEFT(@Tsql, LEN(@Tsql) - 3);
SELECT @Tsql = ';WITH XEvents AS
(
 SELECT object_name, CAST(event_data AS XML) AS event_data
 FROM sys.fn_xe_file_target_read_file ( ''' + @XESessionName + '*.xel'', NULL, NULL, NULL )  
)
SELECT object_name, event_data, 
 event_data.value (''(/event/@timestamp)[1]'', ''DATETIME2(0)'') AS EventTime,' + 
 CHAR(13) + CHAR(10) + @Tsql + '
FROM XEvents;';

PRINT @Tsql;
EXEC(@Tsql);

The output for the system_health session is really "wide"--there are 19 events and over a hundred different Global Fields and Event Fields. Since those fields vary a bit from one event to the next, there will be a lot of NULLs in the result set.
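
If you only care about one of the events, you could swap the final PRINT/EXEC for something like this, which narrows the generated query before executing it. (This is a sketch that assumes the @Tsql variable built by the script above; wait_info is one of the events in the system_health session.)

--Narrow the generated query to a single event before executing it.
--Assumes @Tsql holds the query string built by the script above.
SELECT @Tsql = REPLACE(@Tsql, 'FROM XEvents;', 'FROM XEvents WHERE object_name = ''wait_info'';');

PRINT @Tsql;
EXEC(@Tsql);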