Curated SQL Posts

PolyBase and Azul Zulu OpenJDK

I have a post looking at one of the more interesting changes in SQL Server 2019 CTP 3.2:

One of the more interesting parts of SQL Server 2019 CTP 3.2’s release notes is the relationship between Microsoft and Azul Systems. Travis Wright covers it in some detail, as well as what it means for customers.

Prior to SQL Server 2019 CTP 3.2, installing PolyBase required an installation of Oracle’s Java Runtime Environment 7 Update 51 or higher, either directly from Oracle or through OpenJDK.

Java is still required if you want to read from or write to Hadoop or Azure Blob Storage. Oracle’s flavor of Java is no longer required, however.
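
As a concrete illustration, here is a minimal sketch (with a hypothetical data source name and namenode address) of the kind of Hadoop external data source that still needs a JRE behind the scenes:

-- Hadoop connectivity is the PolyBase scenario which still requires Java,
-- whether from Oracle's JRE or from Azul Zulu OpenJDK.
CREATE EXTERNAL DATA SOURCE HadoopSource
WITH
(
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020'
);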

Keeping Bash Scripts Reusable

Kellyn Pot’vin-Gorman explains some of the concepts behind scripting for longevity:

I’m going to admit that the reason I didn’t embrace PowerShell at first was that most of the examples I found were full of hardcoded values. I found it incredibly obtuse, but I started to realize that it came from many sources who might not have the scripting history of those from other shells (this is just my theory, without a lot of evidence to prove it, so keep that in mind). As PowerShell scripts have matured, I’ve noticed that many are now built with more dynamic values and advanced scripting options, and with this, I’ve become more comfortable with PowerShell.

I think the best way to learn is to see real examples, so let’s demonstrate.

Read on for those examples.

Tearing Down and Rebuilding Replication

Pamela Mooney takes us through tearing down replication, restoring a database, and rebuilding transactional replication with scripts:

If you use replication, you have doubtless been paged to restore a replicated database. You have experienced the ineffable joy of tearing down replication-dependent indexed views (if you have them), blowing away replication, doing the restore, putting replication and indexing back together again, and finally redeploying your indexed views. I know I have.

In fact, I’ve done it enough times that I didn’t want to do it anymore. So, you may ask, did I go to a different modality of replicating my data? Did I go to Availability Groups or mirroring instead? No. I actually like replication. It’s invaluable when you need to write code around real-time data (especially from a third-party database) but you aren’t able to index the original copy. It’s been around for a long time, is well vetted, and is pretty forgiving once you understand how it works. So, no need to reinvent the wheel. I decided to automate replication instead.

This is specific to transactional replication. There’s a whole ‘nother kettle of fish for merge replication.
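
To give a flavor of what gets automated, here is a rough sketch of the teardown-restore-rebuild cycle for transactional replication. The database, publication, and backup names are hypothetical placeholders, and the indexed-view handling plus article and subscription setup from the post are elided:

-- 1) Tear down: drop the subscription and publication on the publisher.
EXEC sp_dropsubscription
    @publication = N'SalesPub',
    @article = N'all',
    @subscriber = N'all';
EXEC sp_droppublication @publication = N'SalesPub';

-- 2) Restore the previously published database.
RESTORE DATABASE SalesDB
    FROM DISK = N'D:\Backups\SalesDB.bak'
    WITH REPLACE, RECOVERY;

-- 3) Rebuild: re-enable publishing and recreate the publication.
EXEC sp_replicationdboption
    @dbname = N'SalesDB',
    @optname = N'publish',
    @value = N'true';
EXEC sp_addpublication
    @publication = N'SalesPub',
    @repl_freq = N'continuous',  -- continuous = transactional
    @status = N'active';
-- sp_addarticle and sp_addsubscription calls would follow here.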

Getting PowerShell 7 Preview on Ubuntu

Max Trinidad shows us how to get the PowerShell 7 preview without using apt:

First, look under the release documentation and search for the deb package. In my case, I’m installing the amd64 version.

Then, right-click on “powershell-preview_7.0.0-preview.2-1.ubuntu.18.04_amd64.deb” and select “Copy link address”.

This is also something you could script out. It’s quite useful in scenarios where your apt repositories don’t have the latest version of something but you still need or want it.

Exporting Large Data Sets from Power BI

Imke Feldmann shows how you can export data sets which exceed Power BI’s native export limits:

Although the method is fairly simple, there are simpler methods if you just need the raw data from your data model (and not the specific aggregations or measures that the visual contains):

– Use DAX Studio to download all tables from your data model at once
– Use DAX Studio to download specific tables from your data model (one by one)
– Or use R or Python to download specific tables if you’re comfortable with these languages. This method also allows scheduled refreshes in the service.

In short, this is probably a fifth-best solution, but it does work.

Updated Power BI Violin Chart

Meagan Longoria reviews an update to Daniel Marsh-Patrick’s violin chart custom visual in Power BI:

First, the violin plot is now a certified custom visual. This means that it has been tested by the Power BI team to ensure it meets certain requirements, one of which is that the visual does not access external services or resources. You can be confident your data isn’t being sent externally when you use the violin plot.

As for the functional enhancements, a new legend has been added. This is a great addition, making the chart clearer and easier to read, especially for audiences that may not be familiar with how the violin plot works. The customizable legend calls out which markers represent the mean, median, and quartiles.

Meagan is quite pleased with these updates.

Explaining Tree-Based Algorithms

Stephanie Glen takes us through quick explanations of decision trees, random forests, and gradient boosting:

The three methods are similar, with a significant amount of overlap. In a nutshell:

– A decision tree is a simple decision-making diagram.
– Random forests are a large number of trees, combined (using averages or “majority rules”) at the end of the process.
– Gradient boosting machines also combine decision trees, but start the combining process at the beginning, instead of at the end.

Read on for more details. All three are useful algorithms serving similar but slightly different purposes.

Query Store Required Permissions

Andreas Wolter takes us through exactly which permissions are required for Query Store to work:

Typically, there are three aspects to working with the Query Store, which can be reflected in roles:
 
1) Configuration – turning Query Store on and off, clearing the contents, flushing its contents to disk, and changing its settings.
2) Viewing the reports or using the DMVs to analyze the Query Store contents (queries, plans, and wait statistics) to gain insights, but not necessarily having the authority to change anything.
3) Actively changing plans by forcing or un-forcing them, based on the information obtained from (2).
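
As a rough illustration of how those three roles might map to permissions (user names are hypothetical placeholders; read Andreas’s post for the full details):

-- (2) Viewing: the Query Store DMVs and built-in reports
-- require VIEW DATABASE STATE.
GRANT VIEW DATABASE STATE TO [QueryStoreReader];

-- (1) and (3): changing Query Store settings (ALTER DATABASE ... SET
-- QUERY_STORE) and forcing/un-forcing plans both require ALTER
-- permission on the database.
GRANT ALTER TO [QueryStoreOperator];

-- Example of (3): pin a known-good plan (the IDs are placeholders).
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 17;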

This is a nice overview of the problem and a fair amount of the solution.

Kubernetes on Windows

Elton Stoneman helps us get started with Kubernetes on Windows boxes:

Now you can take older .NET Framework apps and run them in Kubernetes, which is going to help you move them to the cloud and modernize the architecture. You start by running your old monolithic app in a Windows container, then you gradually break features out and run them in .NET Core on Linux containers.

Organizations have been taking that approach with Docker Swarm for a few years now. I cover it in my book Docker on Windows and in my Docker Windows Workshop. It’s a very successful way to do migrations – breaking up monoliths to get the benefits of cloud-native architecture, without a full-on rewrite project.

Now you can do those migrations with Kubernetes. That opens up some interesting new patterns, and the option of running containerized Windows workloads in a managed Kubernetes service in the cloud.

Elton’s not kidding about this support being new. I’m not sure I’d entrust my production work to it just yet, but I’m glad to see people working on the problem.

Explaining Column Statistics

Bert Wagner takes us through column statistics in SQL Server:

Statistics are the primary metadata used by the query optimizer to help estimate the costs of retrieving data for a specific query plan.

The reason SQL Server uses statistics is to avoid having to calculate information about the data during query plan generation. For example, you don’t want to have the optimizer scan a billion row table to learn information about it, only to then scan it again when executing the actual query.

Instead, it’s preferable to have those summary statistics pre-calculated ahead of time. This allows the query optimizer to quickly generate and compare multiple candidate plans before choosing one to actually execute.

These statistics aren’t perfect, but life is almost always better when you have accurate, up-to-date statistics on relevant columns.
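
If you want to see what the optimizer is working from, here is a quick sketch (with hypothetical table and statistic names):

-- Inspect the histogram and density information behind one statistic.
DBCC SHOW_STATISTICS ('dbo.Sales', 'IX_Sales_OrderDate');

-- Refresh statistics which have drifted from the underlying data.
UPDATE STATISTICS dbo.Sales WITH FULLSCAN;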
