Press "Enter" to skip to content

Curated SQL Posts

Stream Analytics And Power BI

Brad Llewellyn gives us a demo on connecting a Stream Analytics stream to Power BI for data analysis:

We understand that streaming data isn’t typically considered “Data Science” by itself.  However, it’s often associated with data science, and setting up this background now opens up some cool applications in later posts.  For this post, we’ll cover how to sink streaming data to Power BI using Stream Analytics.

The previous posts in this series used Power BI Desktop for all of the showcases.  This post will be slightly different in that we will leverage the Power BI Service instead.  The Power BI Service is a collaborative web interface that has most of the same reporting capabilities as Power BI Desktop, but lacks the ability to model data at the time of writing.  However, we have heard whispers that data modeling capabilities may be coming to the service at some point.  The Power BI Service is also the standard method for sharing datasets, reports and dashboards across organizations.  For more information on the Power BI Service, read this.

Brad has a nice demo, so check it out.
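
If you haven't seen a Stream Analytics job before, the query side of the setup is a small SQL-like statement that reads from an input alias and writes into an output alias pointing at a Power BI dataset. Here's a minimal sketch in the Stream Analytics query language; the input, output, and column names are placeholders, not anything from Brad's demo:

SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature,
    System.Timestamp() AS WindowEnd
INTO [powerbi-output]    -- an output configured with the Power BI sink type
FROM [eventhub-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY DeviceId, TumblingWindow(second, 10)

Once the Power BI output has been authorized against a workspace, the resulting dataset shows up there automatically and can back a streaming tile on a dashboard.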


Using SWITCHOFFSET

Doug Kline has a video and T-SQL script around date/time offsets and particularly the SWITCHOFFSET function:

-- so, before SWITCHOFFSET existed, ...
SELECT
    SWITCHOFFSET(SYSDATETIMEOFFSET(), '-05:00') AS [EST the easy way],
    TODATETIMEOFFSET(DATEADD(HOUR, -5, SYSDATETIMEOFFSET()), '-05:00') AS [EST the hard way]

-- so, thinking of a DATETIMEOFFSET data type as a complex object
-- with many different parts: year, month, day, hour, time zone, etc.
-- it looks like SWITCHOFFSET changes two things: time zone and hour

This was an interesting video. I typically think entirely in UTC and let the calling application convert to time zones as needed, but if that’s not an option for you, knowing about SWITCHOFFSET() is valuable.
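
As a quick sketch of that pattern (my example, not Doug's): keep everything in UTC and only shift at presentation time. Note that the fixed offset you hand to SWITCHOFFSET() ignores daylight savings, while AT TIME ZONE (SQL Server 2016 and later) adjusts for it:

-- Convert the current UTC time to US Eastern two ways.
DECLARE @utc DATETIMEOFFSET = SYSDATETIMEOFFSET() AT TIME ZONE 'UTC';

SELECT
    @utc                                       AS [UTC],
    SWITCHOFFSET(@utc, '-05:00')               AS [Fixed -05:00 offset],
    @utc AT TIME ZONE 'Eastern Standard Time'  AS [Eastern, DST-aware];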


Azure Kubernetes LoadBalancer External IP Woes

Andrew Pruski writes up some issues he had with creating a LoadBalancer service in Azure Kubernetes:

I logged a case with MS Support and when they came back to me, they advised that the service principal that is spun up in the background had expired. This service principal is required to allow the cluster to interact with the Azure APIs in order to create other Azure resources.

When a service is created within AKS with a type of LoadBalancer, a Load Balancer is created in the background which provides the external IP I was waiting on to allow me to connect to the cluster.

Because this principal had expired, the cluster was unable to create the Load Balancer and the external IP of the service remained in the pending state.

There were a lot of steps here; click through to see just how many.
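
For context, the service definition that kicks all of this off is tiny. This is a generic sketch with placeholder names rather than Andrew's actual manifest; applying something like it is what prompts AKS, via that service principal, to call the Azure APIs and provision the load balancer and public IP:

apiVersion: v1
kind: Service
metadata:
  name: mssql-service        # placeholder name
spec:
  type: LoadBalancer         # asks AKS to provision an Azure Load Balancer and public IP
  ports:
    - port: 1433
      targetPort: 1433
  selector:
    app: mssql               # matches the labels on the pods being exposed

Until that provisioning succeeds, kubectl get service shows the EXTERNAL-IP column stuck at pending, which is exactly the symptom Andrew describes.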


Password Protect Everything, Including Hadoop

George Leopold summarizes a recent Securonix report:

The malware spreads via brute-force attacks on weak passwords “or by exploiting one of three vulnerabilities found on Hadoop YARN Resource Manager, Redis [in-memory key-value store service] and ActiveMQ,” Securonix said. Once logged into database services, the malware can for example delete existing databases stored on a server and create another with a ransom note specifying a bitcoin payment.

The security analyst recommends continuous review of cloud-based services like Hadoop and YARN instances and their exposure to the Internet. Along with strong passwords, companies should “restrict access whenever possible to reduce the potential attack surface.”

It’s pretty standard advice: secure your data, password-protect your systems, and minimize the number of computers that get to touch your computers.


Preparing Text Data For Natural Language Processing

Shirin Glander takes us through the process of preparing natural language data for machine learning using Keras:

As with any neural network, we need to convert our data into a numeric format; in Keras and TensorFlow we work with tensors. The IMDB example data from the keras package has been preprocessed to a list of integers, where every integer corresponds to a word arranged by descending word frequency.

So, how do we make it from raw text to such a list of integers? Luckily, Keras offers a few convenience functions that make our lives much easier.

This is a very nice tutorial if you’re new to the process.
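
Shirin works in R, but the same tokenization step is easy to sketch in Python Keras. The sample sentences below are made up; the point is just that the Tokenizer builds a word-to-integer index ordered by word frequency, which is the same list-of-integers format the preprocessed IMDB data arrives in:

from tensorflow.keras.preprocessing.text import Tokenizer

texts = ["the movie was great", "the movie was terrible"]   # made-up sample data

tokenizer = Tokenizer(num_words=10000)    # keep only the 10,000 most frequent words
tokenizer.fit_on_texts(texts)             # build the word -> integer index
sequences = tokenizer.texts_to_sequences(texts)

print(tokenizer.word_index)               # e.g. {'the': 1, 'movie': 2, 'was': 3, ...}
print(sequences)                          # e.g. [[1, 2, 3, 4], [1, 2, 3, 5]]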


Daylight Savings Time Calculations In Power BI

Fred Kaffenberger shows us how to convert UTC to local time zones with daylight savings time:

Quick tip for DST Refresh Date function Power BI Service. I’ll put the code up front, and explain it below. I’ll also say a bit about how to use it at the end. The United States and other places, like Australia, have a pesky thing called Daylight Savings Time. This means that in Central Time US, the offset from Universal Time Coordinated (UTC) is sometimes -6 and other times it’s -5. While Power Query can convert time zones, it doesn’t handle DST. And, my users like to see when the reports were refreshed as a step in evaluating data quality. In 2019, US DST is from March 10 – November 3 (2 AM local time). So, the functions here need to be updated every year.

As promised, here’s the custom function. 

Click through for the custom function and a nice explanation of how it works.
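
As a side note from me rather than Fred: if the timestamp you care about lives in SQL Server, T-SQL's AT TIME ZONE (SQL Server 2016 and later) already knows the DST transition rules for named Windows time zones, so the yearly maintenance goes away. A quick sketch:

-- Convert a UTC refresh time to US Central time, with DST handled automatically.
SELECT SYSUTCDATETIME()
       AT TIME ZONE 'UTC'
       AT TIME ZONE 'Central Standard Time' AS RefreshTimeCentral;

If the conversion has to happen in Power Query during a refresh in the Power BI Service, though, you still need a function like Fred's.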


Reporting Services Scale-Out With Docker

Paul Stanton architects out a scenario using Windocks to create cloned Reporting Services containers in order to scale out Reporting Services:

Database cloning is a key aspect of the SSRS scale-out architecture, with database clones providing each container a complete set of databases.  Two or more VMs operated behind a load balancer deliver a highly available and scalable reporting service.  This article focuses on Windows SQL Server containers and Windows Virtual Hard Drive (VHD) based cloning, but the same architecture can support SQL Server Linux containers or conventional instances (Windows or Linux).  Redgate SQL Clone, for example, would support SQL Server instances.  Other options include the use of storage arrays instead of Windows VHD based clones.  The trade-offs between SQL containers and instances, and between VHDs and storage arrays, are covered in separate sections below.

The combination of SSRS containers with database cloning is appealing for simplicity and operational savings.  SSRS containers are also drawing interest as part of public cloud strategies, as SSRS containers can be integrated with AWS RDS or SQL Azure databases to provide a horizontally scalable reporting solution.

This is a bit more complex than Reporting Services scale-out with Enterprise Edition, but if you’re on Standard Edition and can’t use scale-out, it’s an interesting alternative.


QueryMemoryLimit In SSAS 2019

Shabnam Watson covers a new setting in Analysis Services 2019:

The purpose of this setting is to limit the amount of memory any single query can take. This setting is extremely useful when you want to limit the amount of memory consumption per query across the board. Before this setting, it was possible to have an extremely poorly written query eat up all of a server’s memory and bring all other queries to a halt. You can see an example of such a query and SSAS memory settings in my previous post here.

Read on for details about what it does and what happens when a query reaches the memory limit.


The Importance of Cardinality

Bert Wagner shows us why cardinality is important to understand when indexing data:

When building indexes for your queries, the order of your index key columns matters.  SQL Server can make the most effective use of an index if the data in that index is stored in the same order as what your query requires for a join, where predicate, grouping, or order by clause.

But if your query requires multiple key columns because of multiple predicates (e.g. WHERE Color = 'Red' AND Size = 'Medium'), in what order should you define the columns in your index key?

One of my favorite books for query tuning is a bit long in the tooth at this point but remains quite relevant, and a key point there is to look for ways to drop the largest percentage of rows as soon as possible. This applies to good indexes as well: they’ll let you ignore as large a percentage of your irrelevant data as possible, as soon as possible.
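
To make that concrete, here is a sketch against a hypothetical dbo.Products table (the table and the selectivity numbers are invented, not Bert's). If Color = 'Red' matches 2% of rows while Size = 'Medium' matches 40%, leading the index key with Color means the seek discards the vast majority of rows before the Size predicate is ever evaluated:

-- Hypothetical table; key column order chosen by selectivity.
CREATE TABLE dbo.Products
(
    ProductId INT IDENTITY PRIMARY KEY,
    Color     VARCHAR(20) NOT NULL,
    Size      VARCHAR(20) NOT NULL,
    Price     DECIMAL(10, 2) NOT NULL
);

-- Color is the more selective predicate, so it leads the key; Price is included to cover the query.
CREATE NONCLUSTERED INDEX IX_Products_Color_Size
    ON dbo.Products (Color, Size)
    INCLUDE (Price);

-- Both predicates resolve with a single narrow seek on IX_Products_Color_Size.
SELECT ProductId, Price
FROM dbo.Products
WHERE Color = 'Red'
  AND Size = 'Medium';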
