Press "Enter" to skip to content

Curated SQL Posts

OpenShift and SQL Server Big Data Clusters

Chris Adkin explains why support for OpenShift is important for SQL Server Big Data Clusters:

One thing that should become immediately apparent when installing and administering an OpenShift cluster is that it is a lot more prescriptive and opinionated than vanilla Kubernetes. The simple reason for this is that OpenShift is intended to be deployed to environments that require enterprise-grade levels of hardening and security. For example, Red Hat mandates the operating system distributions you must use, to the extent that when deploying a cluster on VMware, Red Hat’s documentation recommends the use of OVAs, compressed files containing installable virtual machines.

Read on for the full story.

VM Firmware and Windows Secure Boot

David Klee gives us the lowdown on firmware specifications in virtual machines:

The Register is reporting that future versions of the Windows Server OS are going to require the TPM 2.0 chip and Secure Boot enabled by default. Secure Boot is quite helpful to validate that servers boot into trusted environments. It sounds basic and straightforward, but if your VM administrators are not preparing for this change now, a much-overlooked setting in the hypervisor might backfire and you might not be able to enable this setting. That scenario would be a disaster if your security team suddenly issued a decree stating that you must enable this setting by some date.

Read on to see what this means if you’re using Hyper-V or VMware.

Partitioning Tricks

Raul Gonzalez shows us five things you can do with partitioning in SQL Server:

Once we have rebuilt that old data to minimise its footprint and moved it to a cheaper storage tier, if we know no one will have to modify it, it’d be a good idea to make it READ_ONLY.

By making the data READ_ONLY, we can not only prevent accidental deletion or modification, but also reduce the workload required to maintain it, because as we’ve seen before, we can action index maintenance only on the READ_WRITE parts (partitions) of the data where fragmentation might still happen.

Read on for the rest of the tips and note that none of these are directly of the “Make your queries faster” variety, though a couple can have positive performance implications.
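To make that concrete, here’s a minimal sketch of the idea (my own illustration, not from Raul’s post) in Python with pyodbc, using hypothetical database, filegroup, index, and partition names:

import pyodbc

# ALTER DATABASE cannot run inside a user transaction, hence autocommit.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=Sales;Trusted_Connection=yes",
    autocommit=True,
)
cur = conn.cursor()

# Once the old data has been rebuilt and moved to its cheaper filegroup,
# mark that filegroup READ_ONLY (this requires exclusive access to the
# database while the change is made).
cur.execute("ALTER DATABASE Sales MODIFY FILEGROUP ArchiveFG READ_ONLY;")

# Index maintenance then only needs to touch the read-write partitions,
# e.g. the current partition where fragmentation can still happen.
cur.execute("ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD PARTITION = 5;")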

Calculating Partitions for Processing Data Files in Apache Spark

Ajay Gupta digs into how to calculate the number of partitions the different Spark APIs use when reading from files:

Until recently, the process of picking a certain number of partitions for a set of data files always looked mysterious to me. However, recently, during an optimization routine, I wanted to change the default number of partitions picked by Spark for processing a set of data files, and that is when I started to decode this process comprehensively, along with proofs. Hopefully, the description of this decoded process will also help readers understand Spark a bit more deeply and enable them to design efficient and optimized Spark routines.

This is important information if you’re tuning Spark cluster performance.
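As a rough illustration of the mechanics Ajay walks through, Spark’s file-based data sources derive a split size from spark.sql.files.maxPartitionBytes, spark.sql.files.openCostInBytes, and the default parallelism. The following is my back-of-the-envelope approximation of that calculation in plain Python, not a reimplementation of Spark’s internal FilePartition logic:

def max_split_bytes(file_sizes,
                    max_partition_bytes=128 * 1024 * 1024,  # spark.sql.files.maxPartitionBytes
                    open_cost_in_bytes=4 * 1024 * 1024,     # spark.sql.files.openCostInBytes
                    default_parallelism=8):                 # spark.default.parallelism
    # Each file is padded with an "open cost" so that lots of tiny files
    # don't get packed into a single enormous partition.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes / default_parallelism
    # The split size is capped at maxPartitionBytes but shrinks when there
    # is little data relative to the available parallelism.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))

# Example: ten 1 GB files with a default parallelism of 8.
sizes = [1024 * 1024 * 1024] * 10
split = max_split_bytes(sizes)
print(f"split size ~ {split / 1024**2:.0f} MB")    # 128 MB
print(f"partitions ~ {int(sum(sizes) // split)}")  # roughly 80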

Apache Spark Connector for SQL Server

The SQL Server team announces an open-sourced Apache Spark connector for SQL Server:

The Apache Spark Connector for SQL Server and Azure SQL is based on the Spark DataSourceV1 API and SQL Server Bulk API and uses the same interface as the built-in JDBC Spark-SQL connector. This allows you to easily integrate the connector and migrate your existing Spark jobs by simply updating the format parameter! 

This appears to be different from the old Spark connector to Azure SQL Database and SQL Server. Also, for anyone potentially confused between it and PolyBase, this is going in the opposite direction: the Spark connector lets you access a SQL Server from an Apache Spark cluster, reading SQL Server’s data and processing it across a number of executor nodes. By contrast, PolyBase lets you read data stored in Spark SQL tables from SQL Server, virtualizing it so that it looks like a regular SQL Server table.
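To see what “simply updating the format parameter” looks like in practice, here’s a minimal PySpark sketch. The format name comes from the connector’s documentation; the server, table, and credentials are placeholders, and the connector jar is assumed to be on the cluster’s classpath:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-server-read").getOrCreate()

# Same DataFrame reader API as the built-in JDBC source; only the format
# string changes to pick up the bulk-capable connector.
df = (spark.read
      .format("com.microsoft.sqlserver.jdbc.spark")
      .option("url", "jdbc:sqlserver://myserver:1433;databaseName=Sales")
      .option("dbtable", "dbo.Orders")
      .option("user", "spark_reader")
      .option("password", "<password>")
      .load())

df.show(5)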

Using PowerShell to Build Excel Files

Mike Bronowski takes us through the ImportExcel PowerShell module:

Using Add-PivotTable is straightforward (and from now on I am closing the Excel file entirely). Note the Activate switch at the end. When you open the Excel file, the worksheet that was used in the command will show up first.

A long while ago, I had put together PowerShell code to do this kind of work with Excel, but back then we needed to use COM. This looks much simpler.

Power BI On-Premises Data Gateway Error SpooledOperationMissing

Gilbert Quevauvilliers encounters an error:

I got the following error: "DM_GWPipeline_Gateway_SpooledOperationMissing","parameters":{},"details":[],"exceptionCulprit":1

This error was caused because the virtual machine where the On-Premises Data Gateway was running had been cloned and then started up.

Read on to learn about the implications and how Gilbert was able to solve this issue.

Breaking Changes in Microsoft.Data.SqlClient 2.0

Erik Ejlskov Jensen goes over the list of breaking changes with the most recent version of Microsoft.Data.SqlClient:

Microsoft.Data.SqlClient version 2 has just been released. This library is the latest and greatest .NET client driver for SQL Server and Azure SQL Database – and will be used by EF Core 5. In addition to a number of new features (which I blogged about earlier), this major version release also includes a number of breaking changes.

Click through for the list.

Against Encrypted Stored Procedures

Denny Cherry explains why encrypting stored procedures is a fool’s errand:

To this I point out that if you’ve encrypted your code so that I won’t look at it by accident, you are actually getting the exact opposite result. Because you are encrypting code, that means I can’t see it. That means I want to make sure that you aren’t hiding any stupid practices from me. That means that as soon as I see your encrypted procedures, I’m decrypting them to see what is going on with this code.

Along with this, because you’ve bothered to encrypt the stored procedures, that means I can’t get an execution plan, and Query Store can’t be used for the queries within the stored procedure. And since I’m guessing that I can performance tune your database better than your developers can, I’m going to be decrypting the procedures so that I can tune the system.

They’re trivial to decrypt and Denny points out a few reasons why it’s just a bad idea.
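If you want to see how little WITH ENCRYPTION actually buys you, here’s a quick sketch (hypothetical names, Python with pyodbc) showing that it merely blanks out the module text in the catalog views:

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=Sandbox;Trusted_Connection=yes",
    autocommit=True,
)
cur = conn.cursor()

# WITH ENCRYPTION obfuscates the procedure text in the catalog.
cur.execute("""
CREATE PROCEDURE dbo.SecretSauce
WITH ENCRYPTION
AS
    SELECT 42 AS TheAnswer;
""")

# The definition comes back NULL for encrypted modules. That is also why
# you lose execution plans and Query Store coverage for the queries inside,
# while the obfuscation itself remains trivially reversible.
cur.execute("""
SELECT definition
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.SecretSauce');
""")
print(cur.fetchone())  # (None, )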

Custom Parameters in Azure Data Factory Deployments

Rayis Imayev shows us how to use custom parameters in ARM templates when deploying Azure Data Factory pipelines:

If I needed to visually explain how this custom parameterization works for an Azure Data Factory resource, I would picture it this way. Before, you solely relied on publishing your ADF code from your collaboration Git branch to the adf_publish branch, where the ARMTemplateForFactory.json and ARMTemplateParametersForFactory.json files live and get further deployed to other environments. You had some flexibility to parameterize your deployment or run some custom code to update ARM templates before they got deployed.

With the introduction of ADF custom parameterization, you have an additional JSON file, arm-template-parameters-definition.json, that you can use to define rules to add supplementary parameters to the main ARMTemplateParametersForFactory.json file. There is a very important statement on the Microsoft documentation site that explains how this new file operates: “A definition can’t be specific to a resource instance. Any definition applies to all resources of that type.” It’s like a garden rake that will collect all the leaves or none, i.e. if your rule defines a JSON property, let’s say the “timeout” of your ForEach loop container, then all timeouts will be scooped into the ARM template parameter file.

Read on for the full explanation as well as an example.
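As a sketch of what such a rule file can look like (condensed from the shape in Microsoft’s documentation; the property and action string here are illustrative rather than Rayis’s own example), this Python snippet emits an arm-template-parameters-definition.json that promotes one activity property to an ARM template parameter:

import json

# Rules are keyed by resource type and apply to every resource of that
# type, per the documentation statement quoted above.
definition = {
    "Microsoft.DataFactory/factories/pipelines": {
        "properties": {
            "activities": [{
                "typeProperties": {
                    # "<action>:<default name>:<data type>": "-" parameterizes
                    # the property without keeping its current value as the
                    # ARM parameter default, and "int" sets the parameter type.
                    "waitTimeInSeconds": "-::int"
                }
            }]
        }
    }
}

with open("arm-template-parameters-definition.json", "w") as f:
    json.dump(definition, f, indent=4)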
