Press "Enter" to skip to content

Author: Kevin Feasel

Query Store Hints in SQL Server 2022

Erik Darling has thoughts:

When you’re dealing with untouchable vendor code full of mistakes, ORM queries that God has turned away from, and other queries that for some reason can’t be tinkered with, we used to not have a lot of options.

In SQL Server 2022, Query Store gains a new super power: you can add hints to queries without intercepting the code in some other manner.

There are a couple of useful hints which won’t be available but Erik seems mostly upbeat about what is there.

Comments closed

Multidimensional Bloom Filters

The Instaclustr team talks bloom filters:

Bloom filters are space-efficient probabilistic data structures that can yield false positives but not false negatives. They were initially described by Burton Bloom in his 1970 paper  “Space/Time Trade-offs in Hash Coding with Allowable Errors“. They are used in many modern systems including the internals of the Apache® projects Cassandra®, Spark™, Hadoop®, Accumulo®, ORC™, and  Kudu™.

Multidimensional Bloom filters are data structures to search collections of Bloom filters for matches. The simplest implementation of a Multidimensional Bloom filter is a simple list that is iterated over when searching for matches. For small collections (n < 1000) this is the most efficient solution. However, when working with collections at scale other solutions can be more efficient. 

Read on to learn more, including some discussion about an implementation in Cassandra.

Comments closed

Refreshing SQL Managed Instances which Use TDE

Bradley Ball keeps the dev environment up to date:

Hello Dear Reader!  I was working with some friends lately and we needed to set up a process to refresh their Development Environment databases from Production.  Additionally, the databases are encrypted using Transparent Data Encryption, The SQL MI instances are in different regions, and the SQL MI Instances are in different subscriptions.  To duplicate the environment, in order to match our friends, we did the following setup.

Click through for a high-level overview, step-by-step guidance, and a whole lot of detail.

Comments closed

Extracting Data from DAX Measures into CSV

Gilbert Quevauvilliers builds a process:

In this blog post I am going to demonstrate how to use the new Power Automate Flow to extract data from a DAX measure into a SharePoint CSV file.

I got this idea after reading the blog post from the Microsoft Power BI Team: Unlocking new self-service BI scenarios with ExecuteQueries support in Power Automate | Microsoft Power BI Blog | Microsoft Power BI

The great news is that this works on Power BI Pro, Premium Per User and Premium.

Read on to see how.

Comments closed

Reading the SQL Server Error Log

Lee Markum has two ways to read the SQL Server error log:

Reading the SQL Server Error Log is important when troubleshooting many issues for SQL Server. Some example issues would be errors related to Always On Availability Groups, long IO occurrence messages, and login failures.

Unfortunately, the SQL Server Error Log can be very full of information, making specific things hard to find, especially if you’re just visually scrolling the Error Log. Even if you’re recycling the Error Log each day and keeping 30 or more days of error log, on a busy system, the error log can still be quite full, even for a single day.

Click through for those techniques.

Comments closed

Searching for SQL Server Backup Locations

David Fowler can’t remember where those backups went:

Sometimes I find remembering where a particular server sends its backups to a nightmare.

You might have servers backing up to different locations, you might have different locations for individual databases and different locations for your fulls, diffs and logs. You might be trying to get your head around a customer’s set up, where the backups make no logical sense at all.

Whatever you’re up to, at some point, for some reason you’re going to need to access your backup location to get at the files.

Read on for a Powershell script which can help out with this task.

Comments closed

Normalization Layers in Deep Learning Models

Zhe Ming Chng explains why data normalization matters in data science:

You’ve probably been told to standardize or normalize inputs to your model to improve performance. But what is normalization and how can we implement it easily in our deep learning models to improve performance? Normalizing our inputs aims to create a set of features that are on the same scale as each other, which we’ll explore more in this article.

Also, thinking about it, in neural networks, the output of each layer serves as the inputs into the next layer, so a natural question to ask is: If normalizing inputs to the model helps improve model performance, does standardizing the inputs into each layer help to improve model performance too?

Click through for the tutorial.

Comments closed

Comparing Data Analysis in Java and Python

Manu Barriola does some data analysis in a pair of quite different languages:

Python is a dynamically typed language, very straightforward to work with, and is certainly the language of choice to do complex computations if we don’t have to worry about intricate program flows. It provides excellent libraries (Pandas, NumPy, Matplotlib, ScyPy, PyTorch, TensorFlow, etc.) to support logical, mathematical, and scientific operations on data structures or arrays.

Java is a very robust language, strongly typed, and therefore has more stringent syntactic rules that make it less prone to programmatic errors. Like Python provides plenty of libraries to work with data structures, linear algebra, machine learning, and data processing (ND4J, Mahout, Spark, Deeplearning4J, etc.).

In this article, we’re going to focus on a narrow study of how to do simple data analysis of large amounts of tabular data and compute some statistics using Java and Python. We’ll see different techniques on how to do the data analysis on each platform, compare how they scale, and the possibilities to apply parallel computing to improve their performance.

Read on to see how the two compare. Note that this is base Java and Python+Pandas, not Spark/PySpark, Koalas, etc.

Comments closed

When SELECT * Doesn’t

Chad Callihan protects the reputation of SELECT *:

There are plenty of scenarios where using SELECT * can be an issue. Using SELECT * with EXISTS isn’t one of them.

When using EXISTS and SELECT * together, SQL Server is smart enough to realize what you’re doing and knows you don’t care about what’s in the SELECT.

Read on for an example. I’ve trained myself (been trained?) still to use SELECT 1. The reason is, I know SELECT * works exactly the same way but the benefit of using SELECT 1 is that doing this consistently allows you to do a search for SELECT * in your code base to find actual perpetrators (people writing queries expecting to return the entire result set and which may be susceptible to performance problems or future maintainability problems). Using SELECT 1 in the EXISTS clause means you get fewer false positives in that search as a result.

That said, Joe Celko chimes in to provide some of the history behind SELECT * as the convention for references in the EXISTS clause.

Comments closed

Against Tibbling

Hugo Kornelis hates tibbling:

Probably the one I hate most. And one that is stubbornly persistent. Object name prefixing.

Or, to be more precise, the standard that enforces that all table names need to start with a prefix that designates them as a table, and all view names with a different prefix to clearly mark them as a view. Typically tbl_ and vw_ are used, though I have also seen just the letters t and v, and I have seen them with or without underscores.

I hate this coding standard (or rather, naming standard) with a vengeance. For a few reasons. The perceived benefit is in fact not a benefit at all. It is detrimental to a quick understanding of what I see on the screen. But my biggest objection is that it negates one of the greatest benefits of views.

Read on to understand why this is a bad idea. I completely agree with Hugo on this.

Comments closed