2017-03-17

Big Data Processing and McKayla Maroney Face

Dave Mason - SQL Server Big Data

I've been taking a few steps down a path that's new for me. Keep that in mind if you've been down this path too: you've likely traveled farther than I have, and the lens you look through offers more clarity than mine.

What's prompted this post is my reaction to Big Data processing. The examples I've seen involve a bunch of source data files stored in HDFS or in HDFS-compatible storage, commonly a blob container in Azure Storage. The data is unstructured or semi-structured, and often consists of delimited fields. Data is transformed via MapReduce jobs using tools like Hive or Pig. What happens after that? Probably reporting, analysis, and/or data analytics. But I haven't gotten that far yet.

If you're a data professional, a DBA, a developer, or the like, then after being around long enough you're bound to encounter a "new" technology that makes you stop and say "Hey! That's not new. I've been doing that for years!" (or something like that). It's an off-putting feeling, maybe even to the point of feeling insulted or offended.

With Big Data processing, one of the recurring commonalities I've seen is taking data from a group (folder?) of files and returning the data as a set, much like the result of a T-SQL query. And every time I see it, I keep thinking it looks like a larger scale, much slower variation of OPENROWSET with a format file. I'm not feeling impressed. I could do the same thing with T-SQL (maybe with the help of an SSIS package) and keep everything within the SQL Server realm. Yeah, I suppose I'd probably have a hard time scaling out. No, I can't compete with Azure and Microsoft's assets. But my employer has three data centers with some nice hardware. And I know how to write multi-threaded apps (it's a hell of a lot easier than it used to be). I bet I could herd a hugely impressive number of cats...if I had to.
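To make the "folder of files in, one set of rows out" idea concrete, here's a minimal sketch in Python. It isn't how Hive, Pig, or SQL Server do it. The function names, the pipe delimiter, and the `*.txt` pattern are all made up for illustration, and it assumes the files are small enough to fit in memory. It reads each file on a worker thread and stitches the rows together into a single list.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_delimited_file(path, delimiter="|"):
    """Parse one delimited text file into a list of rows (lists of strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter))


def read_folder_as_set(folder, pattern="*.txt", delimiter="|", workers=4):
    """Read every matching file in a folder and return all rows as one list.

    Hypothetical helper: files are read concurrently on a thread pool,
    and rows come back grouped by file, in sorted file-name order.
    """
    paths = sorted(Path(folder).glob(pattern))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `paths`.
        per_file_rows = pool.map(
            lambda p: read_delimited_file(p, delimiter), paths
        )
        # Flatten the per-file lists into a single result set.
        return [row for rows in per_file_rows for row in rows]
```

Calling `read_folder_as_set("some_folder")` on a folder of pipe-delimited files would hand back every row from every file as one list, which is about as far as the resemblance to a query result goes. There are no column types, no filtering, and no scaling beyond one machine.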

No, I'm not looking to build a better mousetrap. It's just my gut reaction to seeing something new. I've spent many years happily working with SQL Server and T-SQL. It's a great way to make a living. But so far, seeing other aspects of Big Data that accomplish what SQL Server can already do isn't swaying me or getting me excited. Indeed, I've got McKayla Maroney face. If Big Data were a puzzle, it would surely have many pieces. I've only seen a couple of them. I'm gonna keep looking.

