Press "Enter" to skip to content

Curated SQL Posts

Building A SQL Server Dockerfile

Andrew Pruski builds up a custom Dockerfile with his SQL Server configuration and custom databases:

And there you have it. One newly built SQL container from a custom image running our databases.

Imagine being able to spin up new instances of SQL with a full set of databases ready to go in minutes. This is the main advantage that container technology gives you: no more waiting to install SQL and then restore databases. Your dev or QA person can simply run one script and off they go.

I really think this could be of significant benefit to many companies and we’re only just starting to explore what this can offer.

The best part is that it’s quite easy to do, meaning you could set up a guaranteed-clean QA image for each continuous integration deployment and know that some process oddity didn’t fail to clean up after itself and thereby wreck the automated build process.
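For a sense of what such an image looks like, here is a minimal sketch of the technique using the current Linux images (the image tag, database name, and file names are placeholders, not Andrew’s actual script):

    FROM mcr.microsoft.com/mssql/server:2019-latest

    ENV ACCEPT_EULA=Y
    # Example password only; inject a real secret at build or run time.
    ENV MSSQL_SA_PASSWORD=P@ssw0rdExample1

    # Bake the pre-built database files into the image.
    COPY --chown=mssql DevDB.mdf DevDB_log.ldf /var/opt/mssql/data/

    # Start SQL Server just long enough to attach the database; the attached
    # state is then captured in the image layer.
    RUN ( /opt/mssql/bin/sqlservr & ) && sleep 30 && \
        /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$MSSQL_SA_PASSWORD" \
        -Q "CREATE DATABASE DevDB ON (FILENAME = '/var/opt/mssql/data/DevDB.mdf'), (FILENAME = '/var/opt/mssql/data/DevDB_log.ldf') FOR ATTACH;"

From there, docker build -t dev-sql . followed by docker run -d -p 1433:1433 dev-sql gets anyone on the team a fully populated instance in minutes.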


Threading And Learning

Jay Robinson has a two-pronged tale:

This reminds me of an old saying: If you’re the smartest person in the room, then you’re in the wrong room.*

Now, this is not a commentary on my current team. I work with some really smart people, and I’m very grateful for that. But while my teammate may be one of the best PHP or Node.js coders I know, that doesn’t necessarily translate to expertise with the .NET Framework. The true test is this – no matter how smart they are, if they’re not catching my mistakes, then I’m not being held accountable.

There is some good advice here on threading (yes, definitely use the newer threading libraries), but also good advice on surrounding yourself with intelligent people who can catch your mistakes.
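On the threading point, the contrast looks roughly like this (my illustration in C#, not Jay’s code):

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class ThreadingExamples
    {
        // Older pattern: create and manage a dedicated thread yourself.
        static void OldWay()
        {
            var thread = new Thread(() => Console.WriteLine("work"));
            thread.Start();
            thread.Join();
        }

        // Newer pattern: let the Task Parallel Library schedule the work on
        // the thread pool and flow the result (and any exception) back.
        static async Task NewWay()
        {
            int result = await Task.Run(() => 6 * 7);
            Console.WriteLine(result);
        }
    }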


Articles On R And SQL Server

Tomas Kastrun links to a series of his published articles on working with SQL Server and R:

In the past couple of months, I have prepared several articles on R and SQL Server that have been published on SQL Server Central.

The idea was to have a couple of articles covering everything from an introduction to R, to the basics of R Server, to some practical cases of using R with SQL Server.

There’s a nice flow here, building up from the basics to practical applications.
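For a taste of what “R with SQL Server” means in practice, here is a minimal example of calling R from T-SQL via sp_execute_external_script (requires SQL Server 2016+ with R Services installed and external scripts enabled; the query itself is illustrative):

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'OutputDataSet <- data.frame(mean_val = mean(InputDataSet$val));',
        @input_data_1 = N'SELECT CAST(number AS FLOAT) AS val
                          FROM master.dbo.spt_values
                          WHERE type = ''P'''
    WITH RESULT SETS ((mean_val FLOAT));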


DBCC Checks For Large Databases

David Alcock gives a couple of methods for performing consistency checks against gigantic databases:

One way to achieve this is to split the consistency checks up into smaller objects, and native functionality allows us to do just that: we can perform the checks at the table level or, if filegroups are implemented, at the filegroup level using the DBCC CHECKFILEGROUP command.

How to go about this is pretty straightforward: take the list of tables and split them into equal(ish) groups. The groups now form a pool of objects, and within a nightly (or daily) window you perform the check on each object in the pool. This effectively spreads a database consistency check over multiple days: you avoid the impact on production activities while also ensuring all objects are checked over time.
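As a rough T-SQL illustration of that bucketing idea (my sketch, not David’s actual script), you could deal the tables into seven groups with NTILE and check one group per night:

    -- Pick tonight's bucket (1-7) from the current date.
    DECLARE @bucket INT = (DATEDIFF(DAY, 0, GETDATE()) % 7) + 1;
    DECLARE @sql NVARCHAR(MAX) = N'';

    WITH buckets AS
    (
        SELECT s.name AS schema_name,
               t.name AS table_name,
               NTILE(7) OVER (ORDER BY t.name) AS bucket_no
        FROM sys.tables t
        JOIN sys.schemas s ON s.schema_id = t.schema_id
    )
    SELECT @sql += N'DBCC CHECKTABLE (''' + QUOTENAME(schema_name)
                 + N'.' + QUOTENAME(table_name) + N''') WITH NO_INFOMSGS;' + NCHAR(10)
    FROM buckets
    WHERE bucket_no = @bucket;

    EXEC sys.sp_executesql @sql;

Splitting by table count is the simplest version; weighting the groups by page counts from sys.dm_db_partition_stats would even them out further.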

Read on for the solution.  I’m also a big fan of Minion CheckDB, which is designed to handle this type of scenario as well.


Custom Power BI Shapes Using R

Koen Verbeeck uses R to create dynamically changing images in Power BI:

You can insert images into Power BI Desktop, but these are static images. If you want them to change dynamically, you need the Image Viewer custom visual. Unfortunately, it doesn’t support measures, only columns. Since we want dynamic changes, fixed column values are not going to work. Someone proposed a workaround on the Power BI forums, but it only works if you have a fixed set of attributes you want to slice on (for example, 4 categories). I want a totally flexible solution (e.g. each month we have a couple of new weeks to slice on), so again, not possible.

The only solution I could think of that would still work: using R visuals.

Read on for the solution.
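To give a flavor of the technique, here is a minimal sketch of an R visual that renders whatever image the slicers select (my assumption of a dataset with an image_path column narrowed to a single row; not Koen’s exact script):

    # 'dataset' is the data frame Power BI passes to an R visual; this assumes
    # slicers have narrowed it to one row whose image_path points at a PNG
    # reachable from the machine rendering the visual.
    library(png)    # readPNG
    library(grid)   # grid.raster

    img <- readPNG(as.character(dataset$image_path[1]))
    grid.raster(img)   # draw the bitmap to fill the visual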


Using Sparklyr To Analyze Flight Data

Aki Ariga uses sparklyr on Apache Spark 2.0 to analyze flight data living in S3:

Using sparklyr enables you to analyze big data on Amazon S3 with R smoothly. You can build a Spark cluster easily with Cloudera Director. sparklyr turns Spark into a backend for dplyr: you can create tidy data from huge, messy data; plot complex maps from big data the same way as from small data; and build a predictive model on big data with MLlib. I believe sparklyr helps all R users perform exploratory data analysis faster and more easily on large-scale data. Let’s try!

You can see the R Markdown of this analysis on RPubs. With RStudio, you can share R Markdown documents there easily.

Sparklyr is an exciting technology for distributed data analysis.
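A minimal sketch of the workflow the post describes (the master URL, bucket, and column names are placeholders):

    library(sparklyr)
    library(dplyr)

    # Connect to the cluster (a local master also works for experimenting).
    sc <- spark_connect(master = "yarn-client")

    # Read the flight data straight from S3 into a Spark DataFrame.
    flights <- spark_read_csv(sc, name = "flights",
                              path = "s3a://my-bucket/flight-data/*.csv")

    # dplyr verbs are translated to Spark SQL and run on the cluster;
    # collect() pulls only the small aggregated result back into R.
    delays <- flights %>%
      group_by(carrier) %>%
      summarise(avg_dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
      collect()

    spark_disconnect(sc)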


Wall Clocks Lie

Lonny Niederstadt digs into some strange behavior on one of his VMs:

Rats.  That was a really good idea, too.  Nothin’.  Or….
(I want to call attention to the change in maximum axis value for CPU utilization. I nearly *always* use 100% as the maximum value; in these last two graphs I’ve used a maximum value of 24% because utilization of each of the 4 vCPUs is really low and I felt it made the graphs slightly more readable.)
Hey!! Perfmon was collecting 30-second samples. But two intervals after 2:06:22 is 2:07:52 rather than 2:07:22. The system lost 30 seconds in the span of a minute. And two intervals past 2:07:52 is 2:11:22 – the system lost 150 seconds in a minute?

It would appear that there are a couple of issues with this box, one of which is a lack of appreciation of the current time.


Azure Price Cuts

Brad Sams reports that Azure VM and Azure Blob Storage prices are going down:

Microsoft, Amazon and now Google are in a heated cloud race to grab as much market share as they can as they know that once a company starts using their service, the likelihood of switching platforms is low. With more services being offered via cloud vendors and more companies diving into these platforms, Microsoft and Amazon are frequently cutting prices to create a competitive advantage.

On this edition of ‘cloud cuts’, Microsoft is slashing prices on some of its Azure Virtual Machines and its Blob storage. The company is dropping prices on compute-optimized instances (the F series) and general-purpose instances (A1), and it says price cuts on its D-series general-purpose instances will happen in the near future.

Blob storage is down to 2 cents per GB per month for hot storage.  That’s slightly below S3’s 2.3 cents per GB per month.


Integrating Data Lake Storage With SQL Data Warehouse

Sachin Sheth alerts us to a new integration point between Azure Data Lake Store and Azure SQL Data Warehouse via PolyBase:

Most common patterns using Azure Data Lake Store (ADLS) involve customers ingesting and storing raw data into ADLS. This data is then cooked and prepared by analytic workloads like Azure Data Lake Analytics and HDInsight. Once cooked, this data is then explored using engines like Azure SQL Data Warehouse. One key pain point for customers has been having to wait a substantial time after the data was cooked to be able to explore it and gather insights, because the data stored in ADLS would have to be loaded into SQL Data Warehouse using row-by-row insertion tools.

But now you don’t have to wait that long anymore. With the new SQL Data Warehouse PolyBase support for ADLS, you will be able to load and access the cooked data rapidly and lessen your time to start performing interactive analytics. PolyBase support will allow you to access unstructured/semi-structured files in ADLS faster because of a highly scalable loading design. You can load the files stored in ADLS into SQL Data Warehouse to perform analytics with fast response times, or you can use the files in ADLS as external tables. So get ready to unlock the value stored in your petabytes of data in ADLS.

I’ve been waiting for this support, and I’m happy that they were able to integrate the two products.
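In broad strokes, the setup looks like this (a hedged sketch for ADLS Gen1; the credential, URI, and table definition are placeholders, so check the official documentation for the exact steps):

    -- A credential for the Azure AD application that can access the lake.
    CREATE DATABASE SCOPED CREDENTIAL ADLSCredential
    WITH IDENTITY = '<client_id>@<OAuth_2.0_token_endpoint>',
         SECRET   = '<client_secret>';

    CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
    WITH (TYPE = HADOOP,
          LOCATION = 'adl://mydatalake.azuredatalakestore.net',
          CREDENTIAL = ADLSCredential);

    CREATE EXTERNAL FILE FORMAT TextFileFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

    -- The cooked files now behave like a table...
    CREATE EXTERNAL TABLE dbo.CookedData_ext
    (id INT, payload NVARCHAR(4000))
    WITH (LOCATION = '/cooked/',
          DATA_SOURCE = AzureDataLakeStore,
          FILE_FORMAT = TextFileFormat);

    -- ...which you can query in place or load in parallel via CTAS.
    CREATE TABLE dbo.CookedData
    WITH (DISTRIBUTION = ROUND_ROBIN)
    AS SELECT * FROM dbo.CookedData_ext;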


Upgrading Replication To SQL Server 2016

Amit Banerjee explains the steps for upgrading replicated SQL Server instances to 2016:

  • A Distributor can be any version as long as it is greater than or equal to the Publisher version (in many cases the Distributor is the same instance as the Publisher).

  • A Publisher can be any version as long as it is less than or equal to the Distributor version.

  • Subscriber version depends on the type of publication:

    • A Subscriber to a transactional publication can be any version within two versions (n-2) of the Publisher version. For example: a SQL Server 2012 Publisher can have SQL Server 2014 and SQL Server 2016 Subscribers; and a SQL Server 2016 Publisher can have SQL Server 2014 and SQL Server 2012 Subscribers.

    • A Subscriber to a merge publication can be any version less than or equal to the Publisher version.

There are several options available here, and it’s worth reading through if you’re looking to upgrade soon.
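Before kicking off an upgrade, it’s worth inventorying the topology; a quick query like this (run against the Publisher, Distributor, and each Subscriber) confirms the version rules above hold:

    SELECT SERVERPROPERTY('ServerName')     AS server_name,
           SERVERPROPERTY('ProductVersion') AS product_version, -- e.g. 13.0.x = SQL Server 2016
           SERVERPROPERTY('ProductLevel')   AS product_level;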
