
Author: Kevin Feasel

Digging Into Batch Mode And Parameter Sniffing

Erik Darling has mixed news on the efficacy of using batch mode for rowstore as a way of eliminating problems arising from parameter sniffing:

SQL Server 2019 introduced batch mode over row store, which allows for batch mode processing to kick in on queries when the optimizer deems it cost effective to do so, and also to open up row store queries to the possibility of Adaptive Joins, and Memory Grant Feedback.

These optimizer tricks have the potential to help with parameter sniffing, since the optimizer can change its mind about join strategies at run time, and adjust memory grant issues between query executions.

But of course, the plan that compiles initially has to qualify to begin with. In a way, that just makes parameter sniffing even more frustrating.

Read on for both the good and the bad.
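
If you want to reproduce the underlying problem yourself, here is a minimal parameter sniffing sketch using pyodbc. The server, database, table, and procedure names are all my own inventions, not anything from Erik's post.

```python
# Minimal parameter sniffing sketch, assuming a SQL Server 2019 instance and a
# hypothetical Sales.Orders table with a heavily skewed CustomerID column.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=SalesDemo;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

# A procedure whose ideal plan differs wildly depending on the parameter value.
cur.execute("""
CREATE OR ALTER PROCEDURE dbo.OrdersByCustomer @CustomerID int
AS
BEGIN
    SELECT o.OrderID, o.OrderDate, o.TotalDue
    FROM Sales.Orders AS o
    WHERE o.CustomerID = @CustomerID;
END;
""")

# The first execution compiles the plan, so whichever value arrives first gets sniffed.
cur.execute("EXEC dbo.OrdersByCustomer @CustomerID = ?;", 42)   # selective customer
cur.fetchall()

# A huge customer now reuses that same plan; this is where adaptive joins and
# memory grant feedback can (sometimes) soften the blow on 2019.
cur.execute("EXEC dbo.OrdersByCustomer @CustomerID = ?;", 1)
cur.fetchall()
```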


Creating Firewall Rules With Azure Cloud Shell

Kellyn Pot’vin-Gorman shows how you can add a firewall rule for Azure SQL Database from the Azure Cloud Shell:

With my use of scripting and Azure Cloud Shell, I’m automating and building my environment, including SQL Database resources, and then have a requirement to access and build the logical objects. This means that I need a firewall rule built for the Azure Cloud Shell I’m working from. The IP for this cloud shell is unique to the session I’m running at that moment.

The requirements to add this enhancement to my script are:

  1. Capture and read the IP Address for the Azure Cloud shell session.

  2. Populate the IP Address to a Firewall rule

  3. Log into the new SQL Server database that was created as part of the bash script and then execute SQL scripts.

Click through for instructions.
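
For a rough idea of the shape before you click through, here is a sketch of those three steps in Python, shelling out to the Azure CLI. The resource group, server name, credentials, and IP lookup endpoint are placeholders, and Kellyn's actual implementation is a bash script.

```python
# Rough sketch of the three steps above, run from an Azure Cloud Shell session.
# Resource group, server, database, credentials, and the IP lookup endpoint
# are placeholders for illustration only.
import subprocess
import urllib.request

# 1. Capture the outbound IP address of the current Cloud Shell session.
shell_ip = urllib.request.urlopen("https://ifconfig.me/ip").read().decode().strip()

# 2. Populate that IP address into a firewall rule on the logical server.
subprocess.run(
    [
        "az", "sql", "server", "firewall-rule", "create",
        "--resource-group", "rg-demo",
        "--server", "sqlsrv-demo",
        "--name", "cloud-shell-session",
        "--start-ip-address", shell_ip,
        "--end-ip-address", shell_ip,
    ],
    check=True,
)

# 3. Connect to the new database and run SQL scripts (assuming sqlcmd is available).
subprocess.run(
    ["sqlcmd", "-S", "sqlsrv-demo.database.windows.net", "-d", "demodb",
     "-U", "sqladmin", "-P", "<password>", "-i", "build_objects.sql"],
    check=True,
)
```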


Optimizing DAX SWITCH Statements With Variables

Marco Russo gives us some advice on optimizing IF and SWITCH statements in DAX:

Though the DAX engine might reuse the result obtained for the same measures in the same filter context (Sales Amount and Sales LY), this is not always the case. In this scenario, variables are a good way to ensure a better optimized code execution.

However, variables should only be used within their respective scope. For example, if a variable is defined before a conditional statement, then the variable will be evaluated regardless of the condition. This has a strong performance impact in case there are disconnected slicers in the report. To elaborate on this, consider the following report where a Time Selection table is used to define a slicer that controls which columns of the matrix should be visible. The matrix contains a single measure called Sales, whose content depends on the period selected in the column.

Read on for more.


Azure SQL Database Hyperscale Tier

Chris Seferlis looks at a new service tier offering for Azure SQL Database:

The Hyperscale service tier provides the following capabilities:

  • Support for up to 100 terabytes of database size (and this will grow over time)

  • Faster large database backups which are based on file snapshots

  • Faster database restores (also based on file snapshots)

  • Higher overall performance due to higher log throughput and faster transaction commit time regardless of the data volumes

  • The ability to rapidly scale out. You can provision one or more read only nodes for offloading your read workload for use as hot standbys.

  • You can rapidly scale up your compute resources (in constant time) to accommodate heavy workloads, so you can scale compute up and down as needed just like Azure Data Warehouse

At what cost?  I like Chris’s “not inexpensive” understatement here.
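
For reference, provisioning a Hyperscale database looks just like any other tier from the Azure CLI. Here is a minimal sketch; the names and the service objective are placeholders.

```python
# Sketch: provisioning a Hyperscale database via the Azure CLI from Python.
# Resource group, server, and database names are placeholders; HS_Gen5_2 is
# one of the Gen5 Hyperscale service objectives (2 vCores).
import subprocess

subprocess.run(
    [
        "az", "sql", "db", "create",
        "--resource-group", "rg-demo",
        "--server", "sqlsrv-demo",
        "--name", "hyperscaledb",
        "--edition", "Hyperscale",
        "--service-objective", "HS_Gen5_2",
    ],
    check=True,
)
```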


Slipstream Installation Of SQL Server

Randolph West shows how to install a pre-patched version of SQL Server:

For this example, we will be using the SQL Server 2017 Developer Edition RTM (called en_sql_server_2017_developer_x64_dvd_11296168.iso), and Cumulative Update 11 (called SQLServer2017-KB4462262-x64.exe), which was the latest CU available at the time of this writing.

Place the Cumulative Update in a folder that will contain the patch files. On older versions of SQL Server, this could comprise the latest Service Pack, Cumulative Updates, as well as additional hotfixes you may wish to apply. For instance, as of this writing, SQL Server 2016 requires Service Pack 2, Cumulative Update 3 for Service Pack 2, and two more hotfixes to bring it up to date. We would have to have all four files in this folder.

The actual path does not matter as long as we keep track of where they are. For the purposes of this post, we will assume they are stored in C:\Temp.

Definitely a good idea if you’re installing SQL Server regularly.
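
The payoff comes when you launch setup and point it at that folder. Here is a minimal sketch of the pattern; the drive letter of the mounted ISO is an assumption, while /UpdateEnabled and /UpdateSource are the documented setup switches.

```python
# Minimal slipstream sketch: point SQL Server setup at the folder holding the CU.
# The drive letter of the mounted ISO is an assumption; C:\Temp is the folder
# from the post that contains the patch files.
import subprocess

subprocess.run(
    [
        r"D:\setup.exe",            # setup.exe on the mounted installation media
        "/ACTION=Install",
        "/UpdateEnabled=TRUE",      # tell setup to look for product updates
        r"/UpdateSource=C:\Temp",   # the folder containing the Cumulative Update
    ],
    check=True,
)
```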


Using Table-Valued Parameters In SQL Server

Ben Richardson has a post showing how to create user-defined table types and pass them into stored procedures:

Table-valued parameters were introduced in SQL Server 2008. Before that, there were limited options to pass tabular data to stored procedures. Most developers used one of the following methods:

  1. Data in multiple columns and rows was represented in the form of a series of parameters. However, the maximum number of parameters that can be passed to a SQL Server stored procedure is 2,100. Therefore, in the case of a large table, this method could not be used. Furthermore, preprocessing is required on the server side in order to format the individual parameters into a tabular form.

  2. Create multiple SQL statements that can affect multiple rows, such as UPDATE. The statements can be sent to the server individually or in the batched form. Even if they are sent in the batched form, the statements are executed individually on the server.

  3. Another way is to use delimited strings or XML documents to bundle data from multiple rows and columns and then pass these text values to parameterized SQL statements or stored procedures. The drawback of this approach was that you needed to validate the data structure in order to unbundle the values.

The .NET framework then makes it easy to pass in an IEnumerable as a table-valued parameter.
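
On the T-SQL side, it boils down to a table type plus a READONLY parameter. Here is a minimal sketch driven from Python via pyodbc; all of the object names are invented, and the final call fills a table variable in the batch rather than using a client-side TVP binding.

```python
# Minimal table-valued parameter sketch via pyodbc; all object names are invented.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=DemoDB;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

# 1. A user-defined table type describing the rows we want to pass.
cur.execute("""
CREATE TYPE dbo.OrderLineList AS TABLE
(
    ProductID int   NOT NULL,
    Quantity  int   NOT NULL,
    UnitPrice money NOT NULL
);
""")

# 2. A stored procedure that accepts the type; TVP parameters must be READONLY.
#    (Assumes a dbo.OrderLines target table already exists.)
cur.execute("""
CREATE PROCEDURE dbo.AddOrderLines
    @OrderID int,
    @Lines   dbo.OrderLineList READONLY
AS
BEGIN
    INSERT INTO dbo.OrderLines (OrderID, ProductID, Quantity, UnitPrice)
    SELECT @OrderID, ProductID, Quantity, UnitPrice
    FROM @Lines;
END;
""")

# 3. Call it by filling a table variable of that type in the same batch.
cur.execute("""
DECLARE @l dbo.OrderLineList;
INSERT INTO @l VALUES (1, 2, 9.99), (7, 1, 24.50);
EXEC dbo.AddOrderLines @OrderID = 42, @Lines = @l;
""")
```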


Whither Running Kafka On Kubernetes

Gwen Shapira walks through some of the costs and benefits of using Kubernetes to host your Apache Kafka brokers:

First, if you are running most of your other applications and microservices on Kubernetes, it becomes the organizational path of least resistance. This is just like how organizations who standardized on VMs have found it very difficult to allocate physical machines with local disks for Kafka.

I see situations with larger organizations where deploying Kafka outside of Kubernetes causes significant organizational headache that involves many approvals. When this is the case, I usually say that this isn’t a good hill to die on. It is possible to run Kafka on Kubernetes, so just do it. You’ll get your environment allocated faster and will be able to use your time to do productive work rather than fight an organizational battle. And if things go wrong, you’ll get much better service from your internal infrastructure teams, because you’ll be running in an environment that is familiar to them.

Read on for more benefits as well as a few drawbacks.


Medium-Term Effects Of The Cloudera-Hortonworks Merger

Alex Woodie describes some of the ramifications of Cloudera’s merger with Hortonworks:

Whatever camp you sit in, the merger undoubtedly caught the attention of the 2,500 organizations that have adopted Cloudera’s Distribution of Hadoop (CDH) or the Hortonworks Data Platform (HDP) over the years — not to mention the thousands of other companies that have adopted open source Apache Hadoop platforms or Hadoop ecosystem components in the cloud. These Global 2000 companies have invested billions of dollars into building giant clusters to store and process many exabytes worth of data, and they’re not going to just turn them off overnight because the two biggest players suddenly decided to merge.

At the same time, these customers need to be reassured that Cloudera has a plan to maintain the investments they’ve already made in HDP and CDH platforms, both in a short-term or tactical sense, as well as in terms of Cloudera’s long-range strategy to evolve its platform to meet emerging future compute and storage needs.

Read on for more detail.


Spark Streaming On Azure Databricks

Tristan Robinson shows us how to run Spark Streaming within Azure Databricks:

Real-time stream processing is becoming more prevalent on modern day data platforms, and with a myriad of processing technologies out there, where do you begin? Stream processing involves the consumption of messages from either queues or files, doing some processing in the middle (querying, filtering, aggregation) and then forwarding the result to a sink – all with a minimal latency. This is in direct contrast to batch processing, which usually occurs on an hourly or daily basis. As is often the case, both of these will need to be combined to create a new data set.

In terms of options for real-time stream processing on Azure you have the following:

  • Azure Stream Analytics

  • Spark Streaming / Storm on HDInsight

  • Spark Streaming on Databricks

  • Azure Functions

Click through for more.
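
To make the consume-process-sink shape concrete, here is a minimal PySpark Structured Streaming sketch using the built-in rate source and console sink; on Databricks you would swap those out for something like Event Hubs or Kafka and a real sink. Nothing here comes from Tristan's post.

```python
# Minimal Structured Streaming sketch: source -> filter/aggregate -> sink.
# Uses the built-in rate source and console sink so it runs anywhere with PySpark.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, count

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Consume: a test source that emits (timestamp, value) rows continuously.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Process: filter, then aggregate into one-minute tumbling windows.
counts = (
    events.filter(events.value % 2 == 0)
          .groupBy(window(events.timestamp, "1 minute"))
          .agg(count("*").alias("even_events"))
)

# Sink: write the running aggregates to the console.
query = (
    counts.writeStream
          .outputMode("complete")
          .format("console")
          .start()
)
query.awaitTermination()
```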


Data Modeling In Cassandra

Charmy Garg walks us through some of the basics of modeling tables in Cassandra:

Two basic goals in Cassandra which we should keep in mind:

  • Spread data evenly around the cluster – You want every node in the cluster to have roughly the same amount of data. Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key.

  • Minimize the number of partitions read – Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible. Why is this important? [Each partition may reside on a different node. The coordinator will generally need to issue separate commands to separate nodes for each partition you request. This adds a lot of overhead and increases the variation in latency. Furthermore, even on a single node, it’s more expensive to read from multiple partitions than from a single one due to the way rows are stored.]

Charmy also has a couple of pitfalls that people used to the relational database model may hit.
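
To put the two goals into a concrete table design, here is a small sketch using the DataStax Python driver; the keyspace and table are invented for illustration, with the partition key spreading users across the cluster and the clustering column keeping each user's rows in a single partition.

```python
# Sketch of the two modeling goals using the DataStax Python driver.
# Keyspace, table, and replication settings are invented for illustration.
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# user_id is the partition key (spreads data evenly across nodes);
# posted_at is a clustering column (orders rows within each partition).
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.posts_by_user (
        user_id   uuid,
        posted_at timestamp,
        title     text,
        body      text,
        PRIMARY KEY ((user_id), posted_at)
    ) WITH CLUSTERING ORDER BY (posted_at DESC)
""")

# A typical read touches exactly one partition: recent posts for a single user.
user_id = uuid.uuid4()
rows = session.execute(
    "SELECT posted_at, title FROM demo.posts_by_user WHERE user_id = %s LIMIT 10",
    (user_id,),
)
for row in rows:
    print(row.posted_at, row.title)
```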
