
Curated SQL Posts

Learning Experiences from Transactional Replication

Ned Otter shares war stories:

I’ve dealt with SQL replication for decades, and in a sense, not a lot has changed. I mean this from a basic configuration and troubleshooting perspective, though it has in some ways been extended a bit through the years, for new SQL Server features (like In-Memory OLTP, Azure, etc.).

Many refer to replication as the Swiss Army Knife of SQL Server, and I can understand why, but with this “extreme flexibility” comes “extreme shortcomings”, and this post will delve into some of the issues you should be aware of.

Click through for plenty of useful tips.

Comments closed

Strongly Type those Parameters

Erik Darling has a recommendation:

When working with ORMs, care has to be taken to strongly type your parameters to match the data type, length, precision, and scale of the columns those parameters will be compared to. Time and time again, I see the same patterns with string parameters:

– They’re unnecessarily typed as Unicode/nvarchar

– They’re not defined with an appropriate length

– They’re used as catch-all parameters for temporal types (dates, etc.)

Spoiler: these aren’t benefits.


Give Only Table Creators Access to Tables

Ronen Ariely takes us to crazy town:

From time to time someone comes to the forum with an interview question that does not reflect a real scenario on live servers. In other cases the requirement does exist on a live server, but you would be better off redesigning your system so that you no longer need it. Whatever the reason, you might want to know how the task can be done, and that is what the following Stack Overflow question asks for.

So.. if someone asks, let’s provide the answer…

I find this interesting in a macabre fashion. I’d really hate to be in a position where the information is useful, though.


Importing Azure Active Directory Users into Power BI

Reza Rad gets an assist:

There are two main methods to fetch the Azure Active Directory information; Microsoft Graph, or PowerShell Cmdlets. Both methods are very useful. However, explaining both in one article will be overwhelming. In this article, I’ll focus on how you can fetch the information using PowerShell Cmdlets. The method I explain here is manual. However, the PowerShell scripts can be automated to run as a scheduled process (I might explain that later in another article too). Let’s see how it works.

The method explained here exports the AAD users into a CSV file first, and then Power BI imports the data from the CSV. You can use any other intermediate data source, such as Excel, SQL Server, etc., if you want to. You just need to use their PowerShell cmdlets or parameters to do that.

Special thanks to Aaron Nelson for helping prepare the demo for this article. Anytime I have a PowerShell question, he is the master who just finds a way to do it in a few seconds. Connect with him using his blog, Twitter, GitHub, or LinkedIn profile.

Click through for the PowerShell-based solution.


Composite Models via DirectQuery over Power BI Datasets

Paul Turley is living the dream:

Last year I wrote this post about the new composite model feature in Power BI that enables datasets to be chained together using a feature called “DirectQuery for PBI Datasets and AS”. The prospect of creating a data model that virtually layers existing data models together without having to repeat the design sounds like nothing less than Utopia. We can leverage existing datasets, with no duplicate models, no duplication of business logic, and no duplication of effort. The promise of this capability is that data models may be referenced from other data models without duplicating data. So, is this really possible?

Read on for Paul’s thoughts and how they’ve changed over the past couple of months with new updates.


Learning about RDDs in Spark

Tomaz Kastrun continues a series on Spark. Part 7 ties in R and gives us sample plotting in R and Python:

Let’s look into the local use of Spark. For the R language, the sparklyr package is available, and for Python, pyspark is available.

Part 8 gets us into the key data structure behind Spark’s success, the Resilient Distributed Dataset:

Spark is built around the concept of resilient distributed datasets (RDDs). An RDD is a fault-tolerant collection of elements that can be operated on in parallel. RDDs can be created in two ways:
– parallelising an existing data collection in the driver program
– referencing a dataset in external storage (HDFS, blob storage, shared filesystem, Hadoop InputFormat,…)

In simple terms, a Spark RDD has two kinds of operations:
– transformations – creating a new RDD on top of an already existing one
– actions – returning a value to the driver program after running a computation on the dataset.

Part 9 looks a bit more at transformations and actions:

Two types of operations are available with RDDs: transformations and actions. Transformations are lazy operations, meaning that each one prepares a new RDD but does not show or return anything. Rather than updating an existing RDD, these operations create another RDD. Actions, on the other hand, trigger the computations on the RDD and show (return) the result of the transformations.
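To make the lazy-versus-eager split concrete, here is a minimal sketch in plain Python. It does not use Spark at all: ToyRDD and its methods are hypothetical stand-ins written for this illustration, and they only borrow the names of the real RDD operations. The point is simply that chaining transformations does no work until an action asks for a result.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not part of pyspark)."""

    def __init__(self, source: Callable[[], Iterator]):
        # Store a recipe for producing the data, not the data itself.
        self._source = source

    @classmethod
    def parallelize(cls, data: Iterable) -> "ToyRDD":
        items = list(data)
        return cls(lambda: iter(items))

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, fn: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the whole chain and return a value to the caller.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


calls = []  # records each time the mapped function actually runs


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = ToyRDD.parallelize([1, 2, 3, 4])
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # nothing has been computed yet
result = evens_squared.collect()  # the action triggers the computation
```

Note that numbers itself is never modified: each transformation wraps the previous recipe in a new object, which mirrors the "create another RDD" behaviour described in the excerpt.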

Most modern work in Spark won’t directly use RDDs, though everything is built on top of them and it’s good to understand the foundation even if you don’t need to write all of those map(), fold(), and reduceByKey() operations yourself.
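For a rough feel of what those three operations compute, here is a single-process sketch in plain Python. The helper names fold and reduce_by_key are hypothetical and written for this example; they imitate the semantics only, and ignore the partitioning and shuffling that Spark performs (for instance, Spark applies the fold zero value once per partition, which this sketch does not model).

```python
from functools import reduce
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def fold(values: Iterable[V], zero: V, op: Callable[[V, V], V]) -> V:
    """Combine all values with op, starting from a neutral zero value."""
    return reduce(op, values, zero)


def reduce_by_key(
    pairs: Iterable[Tuple[K, V]], op: Callable[[V, V], V]
) -> Dict[K, V]:
    """Merge the values of each key with op, like a per-key reduce."""
    merged: Dict[K, V] = {}
    for key, value in pairs:
        merged[key] = op(merged[key], value) if key in merged else value
    return merged


words = ["spark", "rdd", "spark", "action", "rdd", "spark"]

# map: turn every word into a (word, 1) pair
pairs = list(map(lambda w: (w, 1), words))

# reduceByKey: add up the 1s for each distinct word
counts = reduce_by_key(pairs, lambda a, b: a + b)

# fold: add up all per-word counts, starting from 0
total = fold(counts.values(), 0, lambda a, b: a + b)
```

This is the classic word-count shape: a map to key/value pairs, a per-key reduction, and a final aggregate.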


Restoring a Database in Standby Mode

David Alcock points out a useful database restoration mode:

Here’s a scenario. A user has made several modifications to a database and now needs to restore the database back to a particular point. The problem is that they don’t know the particular time to restore back to, just that they need the database back to before a particular change was made.

If the database is in simple recovery then there are no options to play with: the database can only go back to the last full and maybe differential backup if they’ve been taken. If the database is using full recovery (I’m skipping over BULK-LOGGED for this post) then we can apply the transaction log backups taken after the full backup to get back to a point in time by restoring the database with NORECOVERY and then restoring the necessary log backup files until we reach a particular point.

But one of the disadvantages of NORECOVERY is that it doesn’t give us a readable database until we restore with RECOVERY, and at that point we can’t restore further log backups to our database, so if we have missed anything we’d need to start the whole restore process from the beginning.
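As a rough illustration of the NORECOVERY sequence described above, the following Python sketch picks which log backups to apply for a known target time and prints the matching RESTORE LOG statements. Everything specific here is an assumption for the example: the LogBackup type, the restore_plan helper, the database name, the file names, and the timestamps are all hypothetical, and the sketch assumes the full backup has already been restored WITH NORECOVERY. It also shows why guessing the wrong target is costly: the final statement recovers the database, after which no further logs can be applied.

```python
from datetime import datetime
from typing import List, NamedTuple


class LogBackup(NamedTuple):
    path: str              # hypothetical backup file path
    finished_at: datetime  # when the log backup completed


def restore_plan(database: str, logs: List[LogBackup], target: datetime) -> List[str]:
    """Build RESTORE LOG statements to reach target, assuming the full
    backup was already restored WITH NORECOVERY."""
    statements = []
    for log in sorted(logs, key=lambda b: b.finished_at):
        if log.finished_at < target:
            # Entirely before the target: apply it and stay in restoring state.
            statements.append(
                f"RESTORE LOG [{database}] FROM DISK = N'{log.path}' WITH NORECOVERY;"
            )
        else:
            # First backup that covers the target: stop there and recover.
            stop = target.strftime("%Y-%m-%dT%H:%M:%S")
            statements.append(
                f"RESTORE LOG [{database}] FROM DISK = N'{log.path}' "
                f"WITH STOPAT = N'{stop}', RECOVERY;"
            )
            return statements
    raise ValueError("no log backup covers the requested point in time")


# Hypothetical hourly log backups and a target between the first two.
logs = [
    LogBackup("log_1000.trn", datetime(2022, 1, 1, 10, 0)),
    LogBackup("log_1100.trn", datetime(2022, 1, 1, 11, 0)),
    LogBackup("log_1200.trn", datetime(2022, 1, 1, 12, 0)),
]
plan = restore_plan("Sales", logs, datetime(2022, 1, 1, 10, 30))
for statement in plan:
    print(statement)
```

If the chosen time turns out to be wrong, the whole sequence has to be rerun from the full backup, which is the pain point the excerpt is leading up to.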

Read on for an alternative restore mode which fits the bill.


Timeouts in Power Query Functions

Chris Webb reminds us to look at timeouts in Power Query functions:

In the first post in this series I showed how the Power BI Service applies a limit on the total amount of time it takes to refresh a dataset in the Power BI Service, except when you initiate your refresh via an XMLA Endpoint. In this post I’ll look at the various timeouts that can be configured in Power Query functions that are used to access data.

Every time a Power BI Import mode dataset connects to a data source it goes through a Power Query query, and inside the code of that Power Query query will be an M function that connects to a specific type of data source. Most – but not all – of these M functions have the option to set timeouts. 

Read on to learn more about these timeouts, as well as other Power Query functions which have timeouts by default.


Clearing a Data File with EMPTYFILE

Chad Callihan gets rid of secondary data files:

As I was working on a recent tempdb blog post, I encountered an error when trying to remove data files. Let’s look into the issue you may have removing data files and the solution to get those files cleaned up.

Click through to see how you can empty a data file and remove it without receiving error messages. I’m going to guess that this works better on lightly-used databases than on slammed ones.


A Data Governance by any other Name

Matthew Roche wants a re-naming:

To successfully implement managed self-service business intelligence at any non-trivial scale, you need data governance. To build and nurture a successful data culture, data governance is an essential part of the success.

Despite this fact, and despite the obvious value that data governance can provide, data governance has a bad reputation. Many people – likely including the leaders you need to be your ally if you’re working to build a data culture in your organization – have had negative experiences with data governance in the past, and now react negatively when the topic of data governance is raised.

They now treat data governance as a four-letter word.

Read the whole thing, though I do disagree with Matthew. Changing the name does not change the underlying problems; all it does is make the new name just as hated as the old one because the problems are still there. Call it Data Enablement if you’d like, but if the process is the same and the tools are the same, the outcome is the same, regardless of the name.
