OPENROWSET and Quoted Data

Dave Mason - SQL Server OPENROWSET

OPENROWSET has been one of my favorite tools for "querying" data in files that are external to a SQL Server instance. I've written about it a couple of times (here and here). One of the challenges I noted was dealing with source data files that have values enclosed in double quotes.

Let's take a look at an example. If you're not familiar with OPENROWSET, here is the basic usage:

SELECT *
FROM OPENROWSET (
 BULK '[Data_Source_File]',
 FORMATFILE = '[Format_File]'
) AS p

For this post, the data source file will be a Performance Monitor log file with sample data from server "Ozymandias". It's available on GitHub. For the format file, I typically start by creating a table in SQL Server, then running the bcp utility with -f and the format option.

Here is a table definition with columns that correspond to the header row of the Perfmon log file (note I've shortened and tidied up the column names):

CREATE TABLE tempdb.guest.PermonStats (
 PerfMonDate VARCHAR(MAX),
 Total_Read_Latency VARCHAR(MAX),
 Total_Write_Latency VARCHAR(MAX),
 Total_IOPS VARCHAR(MAX),
 C_Read_Latency VARCHAR(MAX),
 C_Write_Latency VARCHAR(MAX),
 C_IOPS VARCHAR(MAX),
 D_Read_Latency VARCHAR(MAX),
 D_Write_Latency VARCHAR(MAX),
 D_IOPS VARCHAR(MAX),
 E_Read_Latency VARCHAR(MAX),
 E_Write_Latency VARCHAR(MAX),
 E_IOPS VARCHAR(MAX),
 F_Read_Latency VARCHAR(MAX),
 F_Write_Latency VARCHAR(MAX),
 F_IOPS VARCHAR(MAX),
 G_Read_Latency VARCHAR(MAX),
 G_Write_Latency VARCHAR(MAX),
 G_IOPS VARCHAR(MAX),
 H_Read_Latency VARCHAR(MAX),
 H_Write_Latency VARCHAR(MAX),
 H_IOPS VARCHAR(MAX),
 I_Read_Latency VARCHAR(MAX),
 I_Write_Latency VARCHAR(MAX),
 I_IOPS VARCHAR(MAX),
 J_Read_Latency VARCHAR(MAX),
 J_Write_Latency VARCHAR(MAX),
 J_IOPS VARCHAR(MAX),
 Available_MBytes VARCHAR(MAX),
 Pct_Processor_Time VARCHAR(MAX),
 Processor_Queue_Length VARCHAR(MAX),
 Buffer_cache_hit_ratio VARCHAR(MAX),
 Page_life_expectancy VARCHAR(MAX),
 Page_lookups_per_sec VARCHAR(MAX),
 Page_reads_per_sec VARCHAR(MAX),
 Tansactions_per_sec VARCHAR(MAX),
 Active_Temp_Tables VARCHAR(MAX),
 User_Connections VARCHAR(MAX),
 Target_Server_Memory_KB VARCHAR(MAX),
 Total_Server_Memory_KB VARCHAR(MAX),
 Batch_Requests_per_sec VARCHAR(MAX),
 SQL_Compilations_per_sec VARCHAR(MAX),
 SQL_ReCompilations_per_sec VARCHAR(MAX)
)

Next, I run bcp to generate my XML format file: the format argument with nul creates a format file without copying any data, -c specifies character data, -x produces an XML (rather than legacy) format file, -f names the output file, -t, sets a comma as the field terminator, and -T uses a trusted connection. For reference, the format file (PerfmonStats.ver1.xml) is available on GitHub (we'll modify this file in a bit).

bcp tempdb.guest.PermonStats format nul -c -x -f PerfmonStats.ver1.xml -t, -S .\SQL2017 -T

Ok, now that we've got our data file and format file squared away, let's see what we get with OPENROWSET. Below is the query, followed by the SSMS output.

SELECT *
FROM OPENROWSET (
 BULK 'C:\PerfLogs\SQLServerPerf_000001.csv',
 FORMATFILE = 'C:\PerfLogs\PerfmonStats.ver1.xml',
 FIRSTROW = 2 --skip header row
) AS p
[Screenshot: SSMS query results]

I haven't shown all the columns, but you get the idea--every column in the result set has data enclosed in double quotes. That's exactly how it appears in the source data file.


Field Terminators

Let's go back to the XML format file. The <RECORD> element describes the data as it is stored in the data file. It contains a set of one or more <FIELD> elements. These elements correspond to fields in the data file. Note that our field terminators are commas, as highlighted below:

<RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="," COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="," COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    ...
    <FIELD ID="41" xsi:type="CharTerm" TERMINATOR="," COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="42" xsi:type="CharTerm" TERMINATOR="," COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="43" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>

We can specify different TERMINATOR values, though. Instead of a comma (,), let's change it to the three-character sequence double-quote, comma, double-quote (","). Double-quotes are already used for string literals within the XML format file, but XML allows single-quotes as well, so we can enclose the new terminator in single-quotes. With that in mind, let's change the XML format file to this (available on GitHub as ver2):

<RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    ...
    <FIELD ID="41" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="42" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="43" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
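To see why this works outside of SQL Server, here's a rough Python simulation of the terminator change (the record values are made up, not from the actual Perfmon log). Splitting on the `","` sequence consumes the quotes *between* fields, but not the very first opening quote or the very last closing quote:

```python
# Simulate TERMINATOR='","' for the interior fields and
# TERMINATOR="\r\n" for the final field.
record = '"04/01/2019 09:00:00.000","0.001","0.002","150"\r\n'

# Strip the last field's terminator, then split on the '","' sequence:
fields = record[:-len("\r\n")].split('","')

# The first field keeps its opening quote; the last keeps its closing quote.
print(fields)
```

This mirrors the result set we get from version 2 of the format file: clean values everywhere except the first and last columns.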

If we rerun our OPENROWSET query with version 2 of the XML format file, we've eliminated all of the double-quotes except for two: an opening double-quote in the first field/column, and a closing double-quote in the last field/column. (The image has been altered to display both the first and last columns of the result set in the SSMS grid.)

SELECT *
FROM OPENROWSET (
 BULK 'C:\PerfLogs\SQLServerPerf_000001.csv',
 FORMATFILE = 'C:\PerfLogs\PerfmonStats.ver2.xml',
 FIRSTROW = 2 --skip header row
) AS p
[Screenshot: SSMS query results]


Last Field Terminator

Getting rid of the double-quote from the last column is easy enough. Again, we'll change a TERMINATOR, but this time just for the final FIELD (ID="43"). The old value is "\r\n" (\r is a carriage return character and \n is a line feed character). The new value will be '"\r\n'. This variant of the XML format file is available on GitHub as ver3:

<RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    ...
    <FIELD ID="41" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="42" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
    <FIELD ID="43" xsi:type="CharTerm" TERMINATOR='"\r\n' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>

Here's our OPENROWSET query with version 3 of the XML format file. (Once again, the image has been altered to display both the first and last columns of the result set in the SSMS grid.)

SELECT *
FROM OPENROWSET (
 BULK 'C:\PerfLogs\SQLServerPerf_000001.csv',
 FORMATFILE = 'C:\PerfLogs\PerfmonStats.ver3.xml',
 FIRSTROW = 2 --skip header row
) AS p
[Screenshot: SSMS query results]


First Field

The remaining double-quote has always been problematic. I've not been able to find a way to "remove" it as we've done with the others. So instead, I'll "hide" it. In the format file, I'll need to insert a new <FIELD> at the first position (ID="1"). That means I'd have to renumber all the remaining fields, incrementing the ID by one. In the <ROW> element, I'd also have to insert a new <COLUMN> at the first position (SOURCE="1") and renumber all the remaining columns, incrementing the SOURCE by one. That's a lot of manual editing. Instead, I'll let bcp do most of the work for me. Drop table tempdb.guest.PermonStats and recreate it with a new column in the first ordinal position:

DROP TABLE tempdb.guest.PermonStats 
GO

CREATE TABLE tempdb.guest.PermonStats (
 Quote CHAR(1),
 PerfMonDate VARCHAR(MAX),
 Total_Read_Latency VARCHAR(MAX),
 Total_Write_Latency VARCHAR(MAX),
 Total_IOPS VARCHAR(MAX),
 C_Read_Latency VARCHAR(MAX),
 C_Write_Latency VARCHAR(MAX),
 C_IOPS VARCHAR(MAX),
 D_Read_Latency VARCHAR(MAX),
 D_Write_Latency VARCHAR(MAX),
 D_IOPS VARCHAR(MAX),
 E_Read_Latency VARCHAR(MAX),
 E_Write_Latency VARCHAR(MAX),
 E_IOPS VARCHAR(MAX),
 F_Read_Latency VARCHAR(MAX),
 F_Write_Latency VARCHAR(MAX),
 F_IOPS VARCHAR(MAX),
 G_Read_Latency VARCHAR(MAX),
 G_Write_Latency VARCHAR(MAX),
 G_IOPS VARCHAR(MAX),
 H_Read_Latency VARCHAR(MAX),
 H_Write_Latency VARCHAR(MAX),
 H_IOPS VARCHAR(MAX),
 I_Read_Latency VARCHAR(MAX),
 I_Write_Latency VARCHAR(MAX),
 I_IOPS VARCHAR(MAX),
 J_Read_Latency VARCHAR(MAX),
 J_Write_Latency VARCHAR(MAX),
 J_IOPS VARCHAR(MAX),
 Available_MBytes VARCHAR(MAX),
 Pct_Processor_Time VARCHAR(MAX),
 Processor_Queue_Length VARCHAR(MAX),
 Buffer_cache_hit_ratio VARCHAR(MAX),
 Page_life_expectancy VARCHAR(MAX),
 Page_lookups_per_sec VARCHAR(MAX),
 Page_reads_per_sec VARCHAR(MAX),
 Tansactions_per_sec VARCHAR(MAX),
 Active_Temp_Tables VARCHAR(MAX),
 User_Connections VARCHAR(MAX),
 Target_Server_Memory_KB VARCHAR(MAX),
 Total_Server_Memory_KB VARCHAR(MAX),
 Batch_Requests_per_sec VARCHAR(MAX),
 SQL_Compilations_per_sec VARCHAR(MAX),
 SQL_ReCompilations_per_sec VARCHAR(MAX)
)

I run bcp again to generate the fourth version of the format file:

bcp tempdb.guest.PermonStats format nul -c -x -f PerfmonStats.ver4.xml -t, -S .\SQL2017 -T

Once again, the TERMINATOR of every FIELD needs to be changed. Field ID="1" corresponds to the new "Quote" column we added to the SQL table; change its terminator from a comma "," to a double-quote '"'. As in the previous examples, change the terminators for the remaining fields (except the last one) from "," to '","', and the terminator for the last field from "\r\n" to '"\r\n'.

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
      <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='"' MAX_LENGTH="1" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
      <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
      <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
      ...
      <FIELD ID="42" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
      <FIELD ID="43" xsi:type="CharTerm" TERMINATOR='","' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
      <FIELD ID="44" xsi:type="CharTerm" TERMINATOR='"\r\n' COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
 </RECORD>
 ...
</BCPFORMAT>
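Here's a rough Python sketch (again with made-up values) of what version 4 of the format file accomplishes: the dummy first field is terminated by a lone `"`, the interior fields by `","`, and the last field by `"\r\n`, leaving every real value quote-free:

```python
record = '"04/01/2019 09:00:00.000","0.001","0.002","150"\r\n'

# Field 1 ("Quote") is everything before the first '"' -- an empty string,
# because the opening quote is consumed as that field's terminator.
quote_field, rest = record.split('"', 1)

# The last field's terminator is '"\r\n'; interior terminators are '","'.
fields = rest[:-len('"\r\n')].split('","')

print(quote_field, fields)
```

Every double-quote character is now consumed by some field's terminator, which is exactly the result set we see from the ver4 query below.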

Here's our OPENROWSET query with version 4 of the XML format file (available on GitHub as ver4):

SELECT *
FROM OPENROWSET (
 BULK 'C:\PerfLogs\SQLServerPerf_000001.csv',
 FORMATFILE = 'C:\PerfLogs\PerfmonStats.ver4.xml',
 FIRSTROW = 2 --skip header row
) AS p
[Screenshot: SSMS query results]

We're almost there. The final double-quote character is gone. But we have a pesky "Quote" column in the mix. We could omit it in the result set by specifying all of the remaining columns in our SELECT statement. But that's a lot of typing. And I'm lazy. Best practices be damned, I want to use a SELECT * query!


Hiding A Column

Let's make one more trip back to the format file. Scroll down to the ROW element and comment out the first COLUMN element (SOURCE="1"):

 <ROW>
    <!--COLUMN SOURCE="1" NAME="Quote" xsi:type="SQLCHAR"/-->
    <COLUMN SOURCE="2" NAME="PerfMonDate" xsi:type="SQLVARYCHAR"/>
    <COLUMN SOURCE="3" NAME="Total_Read_Latency" xsi:type="SQLVARYCHAR"/>
    ...
    <COLUMN SOURCE="42" NAME="Batch_Requests_per_sec" xsi:type="SQLVARYCHAR"/>
    <COLUMN SOURCE="43" NAME="SQL_Compilations_per_sec" xsi:type="SQLVARYCHAR"/>
    <COLUMN SOURCE="44" NAME="SQL_ReCompilations_per_sec" xsi:type="SQLVARYCHAR"/>
</ROW>

Here's our final OPENROWSET query with version 5 of the XML format file (available on GitHub as ver5):

SELECT *
FROM OPENROWSET (
 BULK 'C:\PerfLogs\SQLServerPerf_000001.csv',
 FORMATFILE = 'C:\PerfLogs\PerfmonStats.ver5.xml',
 FIRSTROW = 2 --skip header row
) AS p
[Screenshot: SSMS query results]

All the double quote characters are gone, the "Quote" column is hidden, and we have the convenience of using SELECT * in our OPENROWSET query.


Schema-on-Read for SQL Server?


I've grumbled that SQL Server doesn't natively provide a way to infer JSON schema in a query. It was pointed out to me that it's "schema on read or schema on write" when dealing with JSON data in SQL Server. I'd never heard that phrase before, and it certainly spelled things out in a clear and concise way.

Nonetheless, I had also concluded dynamic T-SQL could be used for the OPENJSON function...and that it would be icky. And here we are. Yes, I went down that path. And yes, it was sorta icky. Let's take a look.

In my estimation, I would need to use one of two options along with the OPENJSON function:

  1. Explicitly specify schema using the WITH clause to format the output.
  2. Define schema in the SELECT list.

Here are examples of each:

DECLARE @JsonData NVARCHAR(MAX) = '[
  {
    "ContactTypeID": 1,
    "Name": "Accounting Manager",
    "ModifiedDate": "2002-06-01T00:00:00"
  },
  {
    "ContactTypeID": 2,
    "Name": "Assistant Sales Agent",
    "ModifiedDate": "2002-06-01T00:00:00"
  },
  {
    "ContactTypeID": 3,
    "Name": "Assistant Sales Representative",
    "ModifiedDate": "2002-13-01T00:00:00"
  }
]'

--Option 1
SELECT *
FROM OPENJSON(@JsonData) 
WITH (
 ContactTypeID INT,
 Name NVARCHAR(50),
 ModifiedDate DATETIME
)

--Option 2
SELECT 
 TRY_CAST(JSON_VALUE(j.value, '$.ContactTypeID') AS INT) AS ContactTypeID,
 TRY_CAST(JSON_VALUE(j.value, '$.Name') AS NVARCHAR(50)) AS Name,
 TRY_CAST(JSON_VALUE(j.value, '$.ModifiedDate') AS DATETIME) AS ModifiedDate
FROM OPENJSON(@JsonData) j

Option #1 results in an error due to the invalid date (highlighted):

Msg 242, Level 16, State 3, Line 19
The conversion of a nvarchar data type to a datetime data type resulted in an out-of-range value.

On the other hand, Option #2 is more forgiving. The use of TRY_CAST does not cause any errors when invalid data is encountered (NULL is returned instead), and all data "rows" are in the result set.

[Screenshot: OPENJSON query results]
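TRY_CAST's behavior has a direct analogue in most languages: attempt the conversion and return NULL (None) on failure. Here's a minimal Python sketch for the ModifiedDate column; the month-13 value is the same deliberately invalid date used above:

```python
from datetime import datetime

def try_cast_datetime(value):
    """Return a datetime, or None if the value can't be converted,
    mirroring T-SQL's TRY_CAST returning NULL instead of erroring."""
    try:
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return None

print(try_cast_datetime("2002-06-01T00:00:00"))  # valid date
print(try_cast_datetime("2002-13-01T00:00:00"))  # month 13 -> None
```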

OpenJsonTabular

The code for my stored procedure is below. It has a single input parameter for a string of JSON data. The name/value pairs of the first "row" of JSON data are parsed to obtain column names and types for the result set. A query string for the OPENJSON function is constructed and executed via EXEC. JSON's support for data types is pretty sparse compared to SQL Server (notice the big CASE expression where I attempt to do some mapping). If you're using SQL 2016, you'll have to replace the STRING_AGG function (it requires SQL Server 2017 or later) with a workaround such as FOR XML PATH.

USE tempdb;
GO

CREATE OR ALTER PROCEDURE dbo.OpenJsonTabular
 @Json NVARCHAR(MAX)
AS
/*
 Dynamically returns a resultset from the input JSON data.
 The JSON data is assumed to be tabular/flat, with schema
 inferred from the first "row" of JSON key/value pairs.
 The JSON data is assumed to be in the format produced by the
  FOR JSON clause with the AUTO option:

 [   
  { "Row1Column1Name": "Row1Column1Value", "Row1Column2Name": "Row1Column2Value", ... "Row1Column(n)Name": "Row1Column(n)Value" },   
  { "Row2Column1Name": "Row2Column1Value", "Row2Column2Name": "Row2Column2Value", ... "Row2Column(n)Name": "Row2Column(n)Value" },   
  ...
  { "Row(n)Column1Name": "Row(n)Column1Value", "Row(n)Column2Name": "Row(n)Column2Value", ... "Row(n)Column(n)Name": "RowColumn(n)Value" },   
 ]

*/
BEGIN
 DECLARE @Tsql NVARCHAR(MAX) = CHAR(9);

 SELECT @Tsql = @Tsql + STRING_AGG(
  'TRY_CAST(JSON_VALUE(j.value, ''$."' + CAST(k.[key] AS VARCHAR(MAX)) + '"'') AS ' + 
  
  --Try to map the JSON type to a SQL Server type.
  CASE
   --JSON null
   WHEN k.type = 0 THEN 'VARCHAR(MAX)'

   --JSON string (double-quoted Unicode with backslash escaping)
   WHEN k.type = 1 THEN 
    CASE 
     WHEN TRY_CAST(k.[value] AS DATETIME) IS NOT NULL THEN 'DATETIME' 
     ELSE 'VARCHAR(MAX)' 
    END

   --JSON number (double-precision floating-point format in JavaScript)
   WHEN k.type = 2 THEN 
    CASE
     WHEN k.[value] LIKE '%.%' THEN 'NUMERIC(38, 5)'
     WHEN k.[value] LIKE '%,%' THEN 'NUMERIC(38, 5)'
     ELSE 'BIGINT'
    END

   --JSON boolean ("true" or "false")
   WHEN k.type = 3 THEN 'BIT'
   
   --JSON array (ordered sequence of values)
   WHEN k.type = 4 THEN 'VARCHAR(MAX)'

   --JSON object (an unordered collection of key:value pairs)
   WHEN k.type = 5 THEN 'VARCHAR(MAX)'

   ELSE 'VARCHAR(MAX)'  --null
  END + ') AS ' + QUOTENAME(k.[key]), ', ' + CHAR(13) + CHAR(10) + CHAR(9) )

 FROM OPENJSON(@Json) j
 CROSS APPLY OPENJSON(j.value) k
 WHERE j.[key] = 0

 SELECT @Tsql = 'SELECT ' + CHAR(13) + CHAR(10) +
  @Tsql + CHAR(13) + CHAR(10) +
 'FROM OPENJSON(''' + @Json + ''') j';

 --SELECT @Tsql;
 EXEC (@Tsql);
END
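The schema-inference step of the procedure -- read the first row's keys and guess a SQL type from each value -- can be sketched in Python, assuming flat, tabular JSON. The mapping below mirrors the CASE expression above, though for brevity it omits the DATETIME sniffing on string values:

```python
import json

def infer_sql_columns(json_text):
    """Infer a {column name: SQL type} mapping from the first JSON 'row',
    roughly mirroring OpenJsonTabular's CASE expression."""
    first_row = json.loads(json_text)[0]
    columns = {}
    for key, value in first_row.items():
        if isinstance(value, bool):      # JSON true/false (check before int!)
            columns[key] = "BIT"
        elif isinstance(value, int):     # JSON number, no decimal point
            columns[key] = "BIGINT"
        elif isinstance(value, float):   # JSON number with decimal point
            columns[key] = "NUMERIC(38, 5)"
        else:                            # strings, nulls, arrays, objects
            columns[key] = "VARCHAR(MAX)"
    return columns

data = '[{"ContactTypeID": 1, "Name": "Accounting Manager", "Rate": 1.5}]'
print(infer_sql_columns(data))
```

Note the bool check must come before the int check (bool is a subclass of int in Python), much like the JSON type codes in OPENJSON distinguish booleans from numbers.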

Examples

Here are a couple of code samples that show how it works.

DECLARE @Json NVARCHAR(MAX) =
(
 SELECT TOP(5) * 
 FROM AdventureWorks.Production.ProductListPriceHistory
 FOR JSON AUTO
);

EXEC tempdb.dbo.OpenJsonTabular @Json;

SET @Json = 
(
 SELECT TOP(5) *
 FROM msdb.sys.objects
 FOR JSON AUTO
)

EXEC tempdb.dbo.OpenJsonTabular @Json;

As you may have expected, the result sets from each execution are different:

[Screenshot: OpenJsonTabular result sets]

I don't know if any of this could be considered "schema on read". Maybe it's automated schema on read. Or maybe it's just an abomination. It may be icky, but I still had fun working on it. Converting relational data to JSON and then back to a tabular result set isn't terribly useful, of course. But I have at least one use case that involves serializing external data to JSON and then "querying" it with the OpenJsonTabular procedure. Perhaps more on this in another post.


The Bridge From NoSQL to SQL Server


Starting with SQL Server 2016, Microsoft offers support for JSON. JSON functions in SQL Server enable us to combine NoSQL and relational concepts in the same database. We can combine classic relational columns with columns that contain documents formatted as JSON text in the same table, parse and import JSON documents in relational structures, or format relational data to JSON text. Collectively, this has been referred to as a "bridge" to the NoSQL world.

Converting relational data to JSON hasn't posed many challenges. In most instances, appending a query with FOR JSON AUTO is sufficient. Here's a simple query with the JSON output:

SELECT TOP(3) *
FROM AdventureWorks.Person.ContactType
FOR JSON AUTO;
[
  {
    "ContactTypeID": 1,
    "Name": "Accounting Manager",
    "ModifiedDate": "2002-06-01T00:00:00"
  },
  {
    "ContactTypeID": 2,
    "Name": "Assistant Sales Agent",
    "ModifiedDate": "2002-06-01T00:00:00"
  },
  {
    "ContactTypeID": 3,
    "Name": "Assistant Sales Representative",
    "ModifiedDate": "2002-06-01T00:00:00"
  }
]

Parsing JSON data to a tabular result set is more complicated. (Although JSON data is commonly hierarchical, I'll focus on tabular data in this post.) The OPENJSON function will be one of the primary functions used to parse JSON data. The schema of the result set returned by the function can be the default schema or an explicit schema. Let's parse the data from above with the default schema:

DECLARE @JsonData NVARCHAR(MAX) = '[
  {
    "ContactTypeID": 1,
    "Name": "Accounting Manager",
    "ModifiedDate": "2002-06-01T00:00:00"
  },
  {
    "ContactTypeID": 2,
    "Name": "Assistant Sales Agent",
    "ModifiedDate": "2002-06-01T00:00:00"
  },
  {
    "ContactTypeID": 3,
    "Name": "Assistant Sales Representative",
    "ModifiedDate": "2002-06-01T00:00:00"
  }
]'

SELECT j.*
FROM OPENJSON(@JsonData) j
[Screenshot: OPENJSON default schema result set]

The default schema returns key/value pairs, along with a numeric "type" for the JSON value. Above, we see each "row" as a value of type 5 (JSON object type), itself comprised of key/value pairs, with the keys representing column names from the original query. Perhaps you were expecting the result set to look something like this:

[Screenshot: expected tabular result set in SSMS]

For that, we'll need an explicit schema where we specify the column names and data types (for brevity, I'll reuse the @JsonData variable and the JSON data value shown earlier):

SELECT *
FROM OPENJSON(@JsonData) 
WITH (
 ContactTypeID INT,
 Name NVARCHAR(50),
 ModifiedDate DATETIME
)

Here, the column names match the key names of the JSON data. We can also use the JSON_VALUE function to achieve the same result set:

SELECT 
 JSON_VALUE(j.value, '$.ContactTypeID') AS ContactTypeID,
 JSON_VALUE(j.value, '$.Name') AS Name,
 JSON_VALUE(j.value, '$.ModifiedDate') AS ModifiedDate
FROM OPENJSON(@JsonData) j

In both situations, we need to know something about the JSON schema to query it in a meaningful way: in the first example, column names and types are hard-coded; in the second example, column names are hard-coded as path parameter values for the JSON_VALUE function. Even though JSON data is self-describing, SQL Server doesn't have a way to infer schema. (I would be quite happy to be wrong about this--please add a comment if you know something I don't!) About the time I came to this realization, I commented on Twitter that JSON might be fool's gold. You don't need to know schema to store JSON data in SQL Server. But you do if you want to query it. "It's pay me now or pay me later."

If I could wave my magic wand, I'd make some enhancements. OPENJSON already has a default schema (no WITH clause) and an explicit schema (a WITH clause that specifies column names and types). How about an implicit or inferred schema? With this option, SQL would use the keys and types from the first "row" and those would become the column names and types of the result set. Syntax might look something like this:

SELECT *
FROM OPENJSON(@JsonData, 'lax/strict $.<path>') 
WITH SCHEMA IMPLICIT;

The OPENJSON function already has options for lax vs strict. With lax mode, data that can't be cast to the inferred/implicit data type would be returned as NULL, whereas with strict mode, an error would occur.

Another enhancement idea is to allow dynamic explicit schemas. With the use of CROSS APPLY, we can get the key names and types from JSON:

SELECT k.[key], k.[type]
FROM OPENJSON(@JsonData) j
CROSS APPLY OPENJSON(j.value) k
WHERE j.[key] = 0
[Screenshot: key and type query results]

As it is, the column names and types for an explicit schema must be hard-coded at query time. Wouldn't it be great if we could do this?

DECLARE @Schema NVARCHAR(MAX);
SELECT @Schema = STRING_AGG( 
 QUOTENAME(k.[key]) + ' ' +
  CASE
   WHEN k.type = 0 THEN 'VARCHAR(MAX)'  
   WHEN k.type = 1 THEN 'VARCHAR(MAX)'  
   WHEN k.type = 2 THEN 'NUMERIC(38, 5)' 
   WHEN k.type = 3 THEN 'BIT'    
   WHEN k.type = 4 THEN 'VARCHAR(MAX)'  
   WHEN k.type = 5 THEN 'VARCHAR(MAX)'  
   ELSE 'VARCHAR(MAX)'  
  END, 
 ', ')
FROM OPENJSON(@JsonData) j
CROSS APPLY OPENJSON(j.value) k
WHERE j.[key] = 0

SELECT *
FROM OPENJSON(@JsonData) 
WITH (
 SCHEMA = @Schema
)
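Since WITH (SCHEMA = @Schema) isn't real syntax, the string-building half of the idea is the only part we can demonstrate today. Here's a small Python sketch (a hypothetical helper, using the same first-row type sniffing) of the column list that @Schema would hold:

```python
import json

def explicit_schema(json_text):
    """Build the column list an explicit OPENJSON WITH clause would need,
    inferred from the first 'row' of key/value pairs."""
    first_row = json.loads(json_text)[0]
    parts = []
    for key, value in first_row.items():
        if isinstance(value, bool):            # check bool before int
            sql_type = "BIT"
        elif isinstance(value, float):
            sql_type = "NUMERIC(38, 5)"
        elif isinstance(value, int):
            sql_type = "BIGINT"
        else:
            sql_type = "VARCHAR(MAX)"
        parts.append(f"[{key}] {sql_type}")
    return ", ".join(parts)

data = '[{"ContactTypeID": 1, "Name": "Accounting Manager"}]'
print(explicit_schema(data))
```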

What if we used an existing object definition for an explicit schema? Any of these options would be pretty sweet, too:

--Table definition for Explicit Schema
SELECT *
FROM OPENJSON(@JsonData) 
WITH (
 SCHEMA = OBJECT_DEFINITION(OBJECT_ID('dbo.MyTable'))
);

--View definition for Explicit Schema
SELECT *
FROM OPENJSON(@JsonData) 
WITH (
 SCHEMA = OBJECT_DEFINITION(OBJECT_ID('dbo.MyView'))
);

--User-Defined Table Type definition for Explicit Schema
SELECT *
FROM OPENJSON(@JsonData) 
WITH (
 SCHEMA = OBJECT_DEFINITION(OBJECT_ID('dbo.MyUserDefinedTableType'))
);

Outside of dynamic T-SQL (which I love...but ick!) I've not found a way to query an unknown piece of JSON data in a manner that returns a tabular result set. We have a bridge from NoSQL to SQL Server, but it's a little shaky. We need some reinforcements.


There is already an object named 'RSExecRole' in the database


When migrating an instance of SSRS, I performed a backup of the [ReportServer] and [ReportServerTemp] SSRS databases from a SQL Server 2008 R2 instance and restored them to a SQL Server 2017 instance. After installing SSRS 2017 on the target machine, I ran SSRS configuration and attempted to set the Current Report Server Database to the existing [ReportServer] database I had already restored:

[Screenshots: Report Server Configuration Manager, Change Database wizard]


At the last step (Progress and Finish), here is the text of the error message:

System.Data.SqlClient.SqlException (0x80131904): There is already an object named 'RSExecRole' in the database.
CREATE SCHEMA failed due to previous errors.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean async, Int32 timeout, Boolean asyncWrite)
   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at Microsoft.ReportingServices.Common.DBUtils.ApplyScript(SqlConnection conn, String script, ICommandWrapperFactory commandWrapper)
   at Microsoft.ReportingServices.Common.DBUtils.ApplyScript(String connectionString, String script)
   at ReportServicesConfigUI.SqlClientTools.SqlTools.ApplyScript(String connectionString, String script)
ClientConnectionId:4f47a341-9eca-4963-bfcd-ba707d8882a0
Error Number:2714,State:6,Class:16

I was somewhat familiar with the RSExecRole database role. Knowing it had to exist for SSRS to function, I was initially puzzled. After running an Extended Events session for the Error_Reported event, I found this SQL statement that was failing in the context of the [master] database:

if not exists (select * from sysusers where issqlrole = 1 and name = 'RSExecRole')  
BEGIN   
 EXEC sp_addrole 'RSExecRole'  
END

That role definitely did not exist in [master]. In fact, there was not an "object" in [master] by that name:

SELECT o.name, o.type_desc
FROM master.sys.objects o
WHERE o.name = 'RSExecRole'

name        type_desc
----------  -----------------

(0 rows affected)

Upon further inspection, I realized it was an existing schema named RSExecRole that was causing the issue.

[Screenshot: RSExecRole schema in SSMS Object Explorer]


If I had paid just a wee bit more attention to the error message I would have discovered this sooner:

CREATE SCHEMA failed due to previous errors.

For a quick turnaround, I deleted the RSExecRole schema from [master] and also from [msdb] (the Extended Events session tipped me off to this), re-ran "Change Database" in SSRS Configuration and finally got the results I was expecting. Happy migrating, everyone!

[Screenshot: Report Server Configuration Manager]



Documenting SSIS Catalogs


In the last post, we looked at query options for documenting SSIS packages that are deployed via the legacy package deployment model. What about SSIS packages deployed to a catalog via the project deployment model? Is the package XML accessible so that we can query and shred it? I don't know with certainty, but I think it is not.

However, there are .NET Framework interfaces we can use to programmatically obtain SSIS catalog metadata. So for this post, I'll soldier on without my dear friend T-SQL and proceed with PowerShell. Here's a look at the projects and packages I was working with as I developed and debugged the script code. It's a short list with just two projects, having one package each. The "DailyETLMain.dtsx" package is a sample from Microsoft's GitHub repository.

[Screenshot: SSIS catalog in SSMS Object Explorer]

As for the script itself, I waffled over whether to include it here or not. In the end, I decided to include it (at the end of the post), but it's also on GitHub as part of a PowerShell repository. Note there are a number of assemblies referenced. I'm not sure where all of them originate from; I've got a lot of software installed on my laptop: SQL Server 2017 (including the Integration Services feature), Visual Studio, SSDT, and SSMS, to name a few.

The script output is a table of names and descriptions including the catalog, folders, projects, packages, and package tasks (type, name, order, and description). The task order reflects the package precedence constraints, if any. There is no recursion for container objects--the script doesn't list tasks within a Sequence or a ForEachLoop container, for instance. (For documentation purposes, that seemed like overkill, so I opted against it.)

[Screenshot: Get-SsisCatalogMetadata output]


Acknowledgments

I did want to pass along a thank you to StackOverflow user Jonathan Garvey. His answer to a question got me past a roadblock that I wasn't able to overcome. Thanks, Jonathan!

 <#
    MIT License
    Copyright (c) 2019 Dave Mason
    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to deal
    in the Software without restriction, including without limitation the rights
    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the Software is
    furnished to do so, subject to the following conditions:
    The above copyright notice and this permission notice shall be included in all
    copies or substantial portions of the Software.
    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
    SOFTWARE.
#>

<#
    .SYNOPSIS
        Gets SSIS metadata.
    .DESCRIPTION
        Gets SSIS metadata including catalogs, folders, projects, packages, and tasks.
    .INPUTS
        Name of the SQL Server instance where the SSIS catalog database(s) reside(s).
    .OUTPUTS
        List of SSIS catalogs, folders, projects, packages, package task types, task names, task order, and task descriptions.
    .NOTES
        Version:        1.0
        Author:         Dave Mason
        Creation Date:  2019/09/27
#>

<#
    Assembly versions that worked on my laptop:
    GAC    Version        Location                                                                                                                                                                                        
    ---    -------        --------                                                                                                                                                                                        
    True   v4.0.30319     C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\Microsoft.SqlServer.ManagedDTS\v4.0_14.0.0.0__89845dcd8080cc91\Microsoft.SqlServer.ManagedDTS.dll                                                    
    True   v2.0.50727     C:\WINDOWS\assembly\GAC_MSIL\Microsoft.SqlServer.Management.IntegrationServices\14.0.0.0__89845dcd8080cc91\Microsoft.SqlServer.Management.IntegrationServices.dll                               
    True   v2.0.50727     C:\WINDOWS\assembly\GAC_MSIL\Microsoft.SqlServer.Management.Sdk.Sfc\14.0.0.0__89845dcd8080cc91\Microsoft.SqlServer.Management.Sdk.Sfc.dll                                                       
    True   v2.0.50727     C:\WINDOWS\assembly\GAC_MSIL\Microsoft.SqlServer.Smo\14.0.0.0__89845dcd8080cc91\Microsoft.SqlServer.Smo.dll                                                                                     
    True   v4.0.30319     C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.IO.Compression\v4.0_4.0.0.0__b77a5c561934e089\System.IO.Compression.dll    
#>

# LoadWithPartialName is deprecated and returns $null (rather than throwing)
# if an assembly isn't present or is the wrong version.
foreach ($assemblyName in "Microsoft.SqlServer.ManagedDTS", "Microsoft.SqlServer.Management.IntegrationServices",
    "Microsoft.SqlServer.Management.Sdk.Sfc", "Microsoft.SqlServer.Smo", "System.IO.Compression")
{
    if ($null -eq [System.Reflection.Assembly]::LoadWithPartialName($assemblyName))
    {
        Write-Host "Unable to load assembly '$assemblyName'." -ForegroundColor Magenta
        return
    }
}


<#
    Adds a row to the meta data table.
    (Params should be self-explanatory.)
#>
function Add-MetaDataTableRow([string]$catalog, [string]$folder, [string]$project, [string]$package, [string]$taskType, [string]$taskName, [string]$taskOrder, [string]$taskDescription)
{
    [System.Data.DataRow]$row = $SsisMetaData.NewRow()
    $row["Catalog"] = $catalog
    $row["Folder"] = $folder
    $row["Project"] = $project
    $row["Package"] = $package
    $row["Task Type"] = $taskType
    $row["Task Name"] = $taskName
    $row["Task Order"] = $taskOrder
    $row["Task Description"] = $taskDescription
            
    $SsisMetaData.Rows.Add($row)
}

function Get-SortedPrecedenceConstraints()
{
    [OutputType([System.Collections.Generic.List[Microsoft.SqlServer.Dts.Runtime.PrecedenceConstraint]])]
    Param (
        [parameter(Mandatory=$true)]
        [Microsoft.SqlServer.Dts.Runtime.PrecedenceConstraints]
        $precedenceConstraints
    )
    # The PrecedenceConstraint objects in the collection are chained together. 
    # PrecedenceExecutable is the "previous" executable. 
    # ConstrainedExecutable is the "next" executable.

    $ret = [System.Collections.Generic.List[Microsoft.SqlServer.Dts.Runtime.PrecedenceConstraint]]::new()
    $ConstrainedExecutableIds = [System.Collections.Generic.List[string]]::new()

    #Create list of constrained executable Ids.
    for ([int]$i = 0; $i -lt $precedenceConstraints.Count; $i++)
    {
        [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$container = [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$precedenceConstraints[$i].ConstrainedExecutable
        $ConstrainedExecutableIds.Add($container.ID)
    }

    # Find the PrecedenceExecutable whose ID isn't in the list:
    # it has no predecessor, so it's the start of the chain.
    for ([int]$i = 0; $i -lt $precedenceConstraints.Count; $i++)
    {
        [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$container = [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$precedenceConstraints[$i].PrecedenceExecutable

        if ( -not $ConstrainedExecutableIds.Contains($container.ID) )
        {
            $ret.Add($precedenceConstraints[$i])
            break
        }
    }
    
    while ($ret.Count -lt $precedenceConstraints.Count)
    {
        [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$lastContainer = [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$ret[$ret.Count - 1].ConstrainedExecutable

        # Although the PrecedenceConstraints chain may be out of order,
        # don't assume the chain is unbroken. 
        [bool]$found = $false

        for ([int]$i = 0; $i -lt $precedenceConstraints.Count; $i++)
        {
            [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$container = [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$precedenceConstraints[$i].PrecedenceExecutable

            if ($container.ID -eq $lastContainer.ID)
            {
                $ret.Add($precedenceConstraints[$i])
                $found = $true
                break
            }
        }

        # TODO: throw an exception?
        if( -not $found )
        {
            break
        }
    }

    return $ret
}

function Get-TaskType()
{
    [OutputType([string])]
    Param (
        [parameter(Mandatory=$true)]
        [Microsoft.SqlServer.Dts.Runtime.EventsProvider]
        $container
    )

    [string]$ret = $container.GetType().Name

    if ([string]::Compare($ret, "TaskHost", $true) -eq 0)
    {
        $ret = ([Microsoft.SqlServer.Dts.Runtime.TaskHost]$container).InnerObject.GetType().ToString()
        $ret = $ret.Split('.')[-1]  # keep only the last segment of the type name

        if([string]::Compare($ret, "__ComObject", $true) -eq 0)
        {
            #This seems to happen for Data Flow Tasks. 
            $ret = $container.GetType().Name
        }
    }

    return $ret
}


#region Create/reset DataTable, add columns.
[System.Data.DataTable]$SsisMetaData = [System.Data.DataTable]::new()
$SsisMetaData.Columns.Add("Catalog") | Out-Null
$SsisMetaData.Columns.Add("Folder") | Out-Null
$SsisMetaData.Columns.Add("Project") | Out-Null
$SsisMetaData.Columns.Add("Package") | Out-Null
$SsisMetaData.Columns.Add("Task Type") | Out-Null
$SsisMetaData.Columns.Add("Task Name") | Out-Null
$SsisMetaData.Columns.Add("Task Order") | Out-Null
$SsisMetaData.Columns.Add("Task Description") | Out-Null
#endregion

# Prompt user.
Write-Host "Enter the SQL Server instance" -ForegroundColor Yellow
[string]$SqlInstance = Read-Host " (where the SSIS catalog database(s) reside(s))"

if([string]::IsNullOrEmpty($SqlInstance))
{
    Write-host "Nothing entered." -ForegroundColor Magenta
    return
}

[System.Data.SqlClient.SqlConnection]$conn = [System.Data.SqlClient.SqlConnection]::new()
[System.Data.SqlClient.SqlConnectionStringBuilder]$csb = [System.Data.SqlClient.SqlConnectionStringBuilder]::new()
$csb["Initial Catalog"] = "tempdb"
#$csb["Data Source"] = ".\SQL2017"
$csb["Data Source"] = $SqlInstance
$csb["Integrated Security"] = $true
$conn.ConnectionString = $csb.ToString()
[Microsoft.SqlServer.Management.IntegrationServices.IntegrationServices]$intSvcs = [Microsoft.SqlServer.Management.IntegrationServices.IntegrationServices]::new($conn)
           
foreach ($cat in $intSvcs.Catalogs)
{
    foreach ($folder in $cat.Folders)
    {
        foreach ($proj in $folder.Projects)
        {
            <#
                Code enhanced/adapted from Jonathan Garvey's StackOverflow answer:
                https://stackoverflow.com/questions/40439662/get-package-xml-from-ssis-catalog-with-powershell/#43368494
            #>
            [byte[]]$projectBytes = $intSvcs.Catalogs[$cat.Name].Folders[$folder.Name].Projects[$proj.Name].GetProjectBytes()
            [System.IO.Stream]$stream = [System.IO.MemoryStream]::new($projectBytes)
            [System.IO.Compression.ZipArchive]$za = [System.IO.Compression.ZipArchive]::new($stream)

            foreach ($pkgInfo in $proj.Packages)
            {
                foreach ($zipEntry in $za.Entries)
                {
                    if ($zipEntry.FullName -eq $pkgInfo.Name)
                    {
                        [Microsoft.SqlServer.Dts.Runtime.Package]$pkg = [Microsoft.SqlServer.Dts.Runtime.Package]::new()
                        [System.IO.StreamReader]$sr = [System.IO.StreamReader]::new($zipEntry.Open())
                        $pkg.LoadFromXML($sr.ReadToEnd(), $null)
                        $sr.Dispose()
        
                        [System.Collections.Generic.List[string]]$constrainedExeId = [System.Collections.Generic.List[string]]::new()
                        [System.Collections.Generic.List[Microsoft.SqlServer.Dts.Runtime.PrecedenceConstraint]]$precedenceConstraints = `
                            Get-SortedPrecedenceConstraints $pkg.PrecedenceConstraints
                        [string]$TaskType = $null

                        #region Iterate through the PrecedenceConstraint objects.
                        # Find the executables that are part of precedence constraints.
                        # We'll want to output those first (in order).
                        if ($precedenceConstraints.Count -gt 0)
                        {
                            # Cast the Executable to an EventsProvider class so we can get 
                            # the .Name and .Description property values.
                            [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$container = `
                                [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$precedenceConstraints[0].PrecedenceExecutable
                            $TaskType = Get-TaskType $container
                                            
                            Add-MetaDataTableRow $cat.Name $folder.Name $proj.Name $pkgInfo.Name $TaskType $container.Name "1" $container.Description
                            $constrainedExeId.Add($container.ID)

                            for ([int]$i = 0; $i -lt $precedenceConstraints.Count; $i++)
                            {
                                $container = [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$precedenceConstraints[$i].ConstrainedExecutable
                                $TaskType = Get-TaskType $container
                                Add-MetaDataTableRow $cat.Name $folder.Name $proj.Name $pkgInfo.Name $TaskType $container.Name ($i+2).ToString() $container.Description
                                $constrainedExeId.Add($container.ID)
                            }
                        }
                        #endregion

                        #region Find remaining executables that are not part of a precedence constraint.
                        foreach ($exe in $pkg.Executables)
                        {
                            # Cast the Executable to an EventsProvider class so we can get 
                            # the .Name and .Description property values.
                            [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$container = [Microsoft.SqlServer.Dts.Runtime.EventsProvider]$exe

                            if ( -not $constrainedExeId.Contains($container.ID))
                            {
                                $TaskType = Get-TaskType $container
                                Add-MetaDataTableRow $cat.Name $folder.Name $proj.Name $pkgInfo.Name $TaskType $container.Name "" $container.Description
                            }
                        }
                        #endregion
                        break
                    }
                }
            }
            $za.Dispose()
            $stream.Dispose()
        }
    }
}

$SsisMetaData | Format-Table | Out-Host