2015-10-02

SQL Server: Attack Of The Clones

Virtualization is wonderful, isn't it? Among other things, "spinning up" a new VM by cloning a preconfigured template or another existing VM is pretty cool. Installing an OS, various roles & features, and a boatload of patches & updates doesn't rank very high on the fun index. But I won't clone a VM that's a SQL host. There's a lot that can go wrong. To begin with, there are the following considerations:

  • Was the source VM part of a SQL Server failover cluster?
  • Was it involved in replication?
  • Was it configured to use SSRS?
  • Was it configured to use database mirroring?
  • Are there remote logins and/or linked servers?
  • Are there client aliases that use named pipes?
SQL Server Clones

That's a lot of things to consider 1 It's not an exhaustive list--I'm sure there are many other things that could be added. . And it's just native SQL Server features and functionality. Ya' know, the documented stuff that we should know to look for. What about the undocumented stuff? Things like the SQL Agent jobs, scripts on the file system, custom code from the dev team, etc. All of that can have some weird ramifications that can be unpleasant to learn about the hard way.

On two occasions, I've had a SQL host cloned. Both times, I was unaware that the cloning had occurred. @@SERVERNAME was not changed via sp_dropserver/sp_addserver. SQL Agent jobs were stepping on each other (including backup jobs, which were running on the same schedule and trying to write out the same file names to the same UNC path). Sometimes the jobs on the original VM succeeded while the jobs on the cloned VM failed, and sometimes vice versa. I'd get email alerts for failed jobs. When I went to the original SQL host 2 I had no idea that the clone existed yet. and looked at the SQL Agent log, the failed jobs didn't synch up with the email alerts. Imagine my confusion.

The clones existed for days. If not for some overheard sysadmin office chatter, it could have gone on much longer. I got lucky both times. No data was lost or contaminated. I was able to get good backups soon after discovering what had happened. It makes me feel helpless knowing how easy it is for someone else to clone one of my SQL boxes without my knowledge. I suppose I could have a heartfelt conversation with everyone in the hypervisor group. But would they care? And would they remember the next time?

I'm not sure if I can be completely self-reliant to avert a cloned SQL host. But maybe I can be alerted to it--or at the very least, prevent some damage during that time when I don't know what has happened. So here's what I came up with: a stored proc that runs automatically at startup and compares the original machine name and domain name to the values observed dynamically at startup. If either of them don't match, do something drastic.

USE master
GO

IF EXISTS (
 SELECT *
 FROM INFORMATION_SCHEMA.ROUTINES r
 WHERE r.ROUTINE_SCHEMA = 'dbo' AND r.ROUTINE_NAME = 'ClonedServerCheck'
)
 DROP PROCEDURE dbo.ClonedServerCheck 
GO

DECLARE @Tsql NVARCHAR(MAX)
SET @Tsql = '
CREATE PROCEDURE dbo.ClonedServerCheck
AS
/*
 Purpose:
 Check the current host name & current host domain to the hard-coded values
 that existed for the original host at install time.  Ensure neither has changed.
 
 Inputs:
 None

 History:
 10/01/2015 DMason Created
*/
DECLARE @OrigHost VARCHAR(128) = ''' + CAST(SERVERPROPERTY(N'MachineName') AS VARCHAR) + '.' + DEFAULT_DOMAIN() + '''
DECLARE @CurrHost VARCHAR(128) = CAST(SERVERPROPERTY(N''MachineName'') AS VARCHAR) + ''.'' + DEFAULT_DOMAIN()

IF @OrigHost <> @CurrHost
BEGIN
 /*
  Get creative here.  There are lots of possibilities:
  1.  Try sending an email/alert.
  2.  Take all user databases offline.
  3.  Disable all SQL Agent jobs.
  4.  RAISERROR ... WITH LOG (so there''s something a sysadmin might see in the Windows Event Log.)
  5.  SHUTDOWN WITH NOWAIT (extreme measure-proceed with caution)
 */
END
'

/*
 Take a look at the output.  Does everything look ok?
 Then uncomment the EXEC line and re-run the script.
*/

PRINT @Tsql
--EXEC sp_executesql @Tsql;
GO

When you're ready to proceed, configure the stored proc to run automatically at startup.

EXEC sp_procoption @ProcName = 'dbo.ClonedServerCheck',
 @OptionName = 'startup',
 @OptionValue = 'true' 
GO

It may be months or years before I get an opportunity to assess how well this works (if at all). If it happens, I'll update this post with my findings.

UPDATE 02/24/2016

It works! Here's an event log screen shot of a SQL host that was cloned:



The code I had in place attempted to "phone home" via email, but that part failed. Everything else worked as I hoped:
  • All user databases were taken offline.
  • All SQL Agent jobs were disabled.
  • RAISERROR sent SEVERITY level 19 error messages to the SQL log every 10 min. Those errors also went into the Windows Event Log (see above). One of the system admins found it and contacted me.

  • SHARE