Press "Enter" to skip to content

Author: Kevin Feasel

Automating Columnstore Index Partition Rebuilds

Brett Powell has a procedure for us:

This post provides an example of a stored procedure which A) identifies the last two partitions of an Azure Synapse Analytics SQL pool table (which uses the columnstore index (default)) and B) rebuilds the index for these two partitions. Additionally, a sample PowerShell-based Azure Automation runbook is included for scheduling the execution of this procedure.

This post follows up on the previous post regarding a Power BI template to be used to analyze the health or quality of a columnstore index. For example, the template shared may help you find that the last one or two partitions such as partition numbers 39 and 40 out of 40 partitions may have many open (uncompressed) and/or not-optimized rowgroups. The cause of these low quality partitions could be that recent and ongoing data processing events are impacting these partitions (inserts,updates). Perhaps partitions 39 and 40 refer to the current and prior month for example.

Read on for the link to the script, as well as details on how to use it.

Comments closed

Traits in Scala

Sanjana Aggarwal explains the notion of Traits in Scala:

Traits are a fundamental unit of code reuse in Scala. Trait encapsulates method and field definitions, which can be reused by mixing into classes.

Two most important concept about Traits are :-

– Widening from thin interface to rich interface
– Defining stackable modifications.

There are some rather powerful things you can do with traits in Scala, though it’s important to be careful not to overdo it.

Comments closed

Free Data Sets

Chris Taylor puts on the Santa Claus getup:

This is just a short overview page of various dataset sources I’ve used in the past for usage in my SQL Server, Azure, AWS and Power BI demo’s. Most if not all of these are free as I don’t like paying for stuff and there are some whereby you can generate your own data if there is something more specific you need……..just be careful as although its “randomly” generated, I have had issues where I pushed the files to github and as it happens, one of the email addresses randomly generated was that of an employee of a company which I’d never heard of. Pure chance (I believe) but something to be mindful of if that is your intention for use.

Chris also has sub-links to Google and Kaggle data sets.

Comments closed

Ambiguous Columns in Queries when Using One Table

Dave Bland shows how easy it is to get the “Ambiguous column name” error message when querying from a single table:

When I added the “*”, this is where I received unexpected results.  All I did was add the “*”.  Looking at the code below, you can see SQL Server is having issues with the Name column in the ORDER BY.

I do wish SQL had a symbol representing “everything else,” where the engine of choice would include all columns except those explicitly named. I know there’d be trickiness around things like “LTRIM(ColumnA) AS TrimmedColumnA” but that’d be for the language designers to figure out…

Comments closed

Using Docker Volumes to Hold SQL Server Databases

John Morehouse shows how to use volumes to expose data—such as SQL Server data and log files—to a Docker container:

Over the past couple of blog posts, I have been talking about the versatility of deploying SQL Server with Docker.  This combination is a great way to quickly and easily spin up local SQL Server instances.  In the most recent post, I talked about a method to copy and restore a sample database into a Docker container that is running SQL Server.   In this post, I am going to talk about an easier way to accomplish this by attaching a persistent volume to the container.   With this method you don’t have to copy any files into the container, and it makes the overall process easier and repeatable.

First, before we get into the code, let’s talk about what a volume is.  Essentially, a volume is a location on the host machine that can be referenced by the container.  I think of this as a shared folder that the container can see.  Once attached to the container, it can then read or write to the volume.   You can easily declare the volume when you create the container with a simple switch in the command.

John’s examples are on a Mac, but the concepts are essentially the same for Windows or Linux.

Comments closed

Using VALUES for Multi-Record Operations

Daniel Hutmacher explains some of what you can do with the VALUES clause:

Note the commas at the end of each line, denoting that a new row begins here. Because this runs as a single statement, the INSERT runs as an atomic operation, meaning that all rows are inserted, or none at all (like if there’s a syntax issue or a constraint violation).

I use this construct all the time to generate scripts to import data from various external sources, like Excel, or even a result set in Management Studio or Azure Data Studio.

Daniel also has a new app for us to try out.

Comments closed

Key Lookups and Self-Joins

Erik Darling has an interesting method for eliminating key lookups:

This post isn’t going to go terribly deep into anything, but I do want to make a few things about them more clear, because I don’t usually see them mentioned anywhere.

1. Lookups are joins between two indexes on the same table
2. Lookups can only be done via nested loops joins
3. Lookups can’t be moved around in the execution plan

I don’t want you to think that every lookup is bad and needs to be fixed, but I do want you to understand some of the limitations around optimizing them.

Definitely worth the read.

Comments closed

Kubernetes on Virtualized Hardware

Chris Adkin gives us the pros and cons of running Kubernetes on virtual hardware:

A full discussion on Kubernetes security is beyond the scope of this blog post. However, the Mitre Att&ck Framework provides a comprehensive matrix of security attack patterns. Microsoft have produced a similar style of matrix to cover Kubernetes in this blog. As per the blog, resource hijacking and lateral movement have ramifications for multi-tenant platforms and Kubernetes application delivery techniques via things such as GitOps – where you may have one Kubernetes cluster per code branch. Putting nodes in their own virtual machines, provides an extra layer of defense that can reduce the impact of pods that might become malicious as the result of an attack. VMware vSphere 7.0 (more on this later) takes this concept further by running each pod in its own light weight virtual machine.

Click through for a breakdown of each side’s arguments.

Comments closed

Good Practices when Combining Spark with Cassandra

Valerie Parham-Thompson shares some insights for working with Spark and Cassandra together:

Although we are focusing on Cassandra as the data storage in this presentation, other storage sources and destinations are possible. Another frequently used data storage option is Hadoop HDFS. The previously mentioned spark-cassandra-connector has capabilities to write results to Cassandra, and in the case of batch loading, to read data directly from Cassandra.

Native data output formats available include both JSON and Parquet. The Parquet format in particular is useful for writing to AWS S3. See https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-s3-announces-new-features-for-s3-select/ for more information on querying S3 files stored in Parquet format. A good use case for this is archiving data from Cassandra.

Read on for more advice.

Comments closed