Press "Enter" to skip to content

Author: Kevin Feasel

Get the Previous Non-NULL Value in DAX

Kristyna Hughes calculates the lagged value in DAX:

If you are an avid report designer or user, you may have wanted to see the percent difference between one row and the previous one. Sounds easy right? Well let’s break it down. Percent difference follows a consistent formula: (current value – previous value)/previous value. Current value? Easy peasy. Previous value? That’s where this blog comes into play. It’s fairly simple to get the previous month value or previous date period value because Power BI can tell it exactly what the filter would be (September minus one month is always August ). However, there may not have been any invoices in August, so how can we compare September with the last month that had an invoice?

Click through to see how.

Comments closed

Preventing Concurrent Pipeline Execution in Azure Data Factory

Dave Ruijter and Laura de Bruin want to prevent concurrent runs of a pipeline:

For scheduled triggers, there is nothing out-of-the-box that can help you to prevent concurrent pipeline runs. For tumbling window triggers there is a maxConcurrency property, but keep in mind that this will create a queue/backlog of pipeline runs. It will not cancel any pipeline runs. It depends on your use case if you really want that behavior. 

Instead, the two look at a pair of designs and this post is all about the first one.

Comments closed

Spark DataFrames

Tomaz Kastrun continues a series on working with Apache Spark. Part 10 looks at the DataFrame construct:

We have looked in datasets and seen that a dataset is distributed collection of data. A dataset can be constructer from JVM object and later manipulated with transformation operations (e.g.: filter(), Map(),…). API for these datasets are also available in Scala and in Java. But in both cases of Python and R, you can also access the columns or rows from datasets.

On the other hand, dataframe is organised dataset with named columns. It offers much better optimizations and computations and still resembles a typical table (as we know it from database world). Dataframes can be constructed from arrays or from matrices from variety of files, SQL tables, and datasets (RDDs). Dataframe API is available in all flavours: Java, Scala, R and Python and hence it’s popularity.

Part 11 looks at external R and Python packages and DataFrame support:

When you install Spark, the extension of not only languages but also other packages, systems is huge. For example with R, not only that you can harvest the capabilities of distributed and parallel computations, you can also extend the use of R language.

Part 12 gets into Spark SQL:

Spark SQL is a one of the Spark modules for structured data processing and analysing. Spark provides Spark SQL and also API for execution of SQL queries. Spark SQL can read data from Hive instance, but also from datasets and dataframe. The communication between Spark SQL and execution engine will always result in a dataset or datafrane.

These formats are interchangeable. So interacting with SQL against result from a different API is possible, respectively. Plugging in the Java JDBD or standard ODBC drivers will also give your SQL interface access to different sources. This unification means that developers can easily switch back and forth between different APIs based on which provides the most natural way to express a given transformation.

With API unification, user can access Spark SQL using Scala spark-shell, using Python pyspark or using R sparkR shell.

DataFrames are so popular that they’ve become the de facto standard for working with data in Spark, and .NET languages only work with DataFrames, not with the raw RDDs.

Comments closed

Learning Experiences from Transactional Replication

Ned Otter shares war stories:

I’ve dealt with SQL replication for decades, and in a sense, not a lot has changed. I mean this from a basic configuration and troubleshooting perspective, though it has in some ways been extended a bit through the years, for new SQL Server features (like In-Memory OLTP, Azure, etc.).

Many refer to replication as the the Swiss Army Knife of SQL Server, and I can understand why, but with this “extreme flexibility” comes “extreme shortcomings”, and this post will delve into some of the issues you should be aware of.

Click through for plenty of useful tips.

Comments closed

Finding SQL Servers with the MAP Toolkit

Lee Markum goes to active sonar:

A full tutorial on using the MAP Toolkit is available here on Microsoft Learn. MAP Toolkit installation is fairly straight forward but is also included in the  tutorial from Microsoft that I linked to above. You can install it on your own desktop and then scan Active Directory for SQL Servers. Please be sure to notify your System Administrators BEFORE you run the scan as it will set off intrusion detection alarms.

Sure, it’s no nmap sweep, but any blog post which includes a warning like that is worth the read.

Comments closed

Strongly Type those Parameters

Erik Darling has a recommendation:

When working with ORMs, care has to be taken to strongly type your parameters to match the data type, length, precision, and scale of the columns those parameters will be compared to. Time and time again, I see the same patterns with string parameters:

– They’re unnecessarily typed as Unicode/nvarchar

– They’re not defined with an appropriate length

– They’re used as catch-all parameters for temporal types (dates, etc.)

Spoiler: these aren’t benefits.

Comments closed

Give Only Table Creators Access to Tables

Ronen Ariely takes us to crazy town:

From time to time someone come to the forum with an interview questions which are not a real scenario on live servers and in other cases the requirement on live servers should be implemented and you better re-design your system so you will not need this recruitment. What ever the reason is, you might want to know how the task can be done and this is what we have in the following request on stackoverflow question.

So.. if someone ask, let’s provide the answer…

I find this interesting in a macabre fashion. I’d really hate to be in a position where the information is useful, though.

Comments closed

Composite Models via DirectQuery over Power BI Datasets

Paul Turley is living the dream:

Last year I wrote this post about the new composite model feature in Power BI that enables datasets to be chained together using a featured called “DirectQuery for PBI Datasets and AS“. The prospect of creating a data model that virtually layers existing data models together without having to repeat the design, sounds like nothing less than Utopia. We can leverage existing datasets, with no duplicate models, no duplication of business logic, and no duplication of effort. The promise of this capability is that data models may be referenced from other data models without duplicating data. So, is this really possible?

Read on for Paul’s thoughts and how they’ve changed over the past couple of months with new updates.

Comments closed

Importing Azure active Directory Users into Power BI

Reza Rad gets an assist:

There are two main methods to fetch the Azure Active Directory information; Microsoft Graph, or PowerShell Cmdlets. Both methods are very useful. However, explaining both in one article will be overwhelming. In this article, I’ll focus on how you can fetch the information using PowerShell Cmdlets. The method I explain here is manual. However, the PowerShell scripts can be automated to run as a scheduled process (I might explain that later in another article too). Let’s see how it works.

The method explained here is exporting the AAD users into a CSV file first, and then Power BI imports data from the CSV. You can use any other intermediate data sources such as Excel, SQL Server, etc if you want to. You just need to use their PowerShell cmdlets or parameters to do that.

Special thanks to Aaron Nelson for helping on preparing the demo for this article. Anytime I have a PowerShell question, he is the master who just finds a way to do it in a few seconds. Connect with him using his blogTwitterGitHub, or LinkedIn profile.

Click through for the Powershell-based solution.

Comments closed

Learning about RDDs in Spark

Tomaz Kastrun continues a series on Spark. Part 7 ties in R and gives us sample plotting in R and Python:

Let’s look into the local use of Spark. For R language, sparklyr package is availble and for Python pyspark is availble.

Part 8 gets us into the key data structure behind Spark’s success, the Resilient Distributed Dataset:

Spark is created around the concept of resilient distributed datasets (RDD). RDD is a fault-tolerant collection of files that can be used in parallel. RDDs can be created in two ways:
– parallelising an existing data collection in driver program
– referencing a datasets in external storage (HDFS, blob storage, shared filesystem, Hadoop InputFormat,…)

In a simple way, Spark RDD has two opeartions:
– transformations – creating a new RDD dataset on top of already existing one with the last transformation
– actions – to the action, and return a value to the driver program after running a computation on the dataset.

Part 9 looks a bit more at transformations and actions:

Two types of operations are available with RDD; transformations and actions. Transformations are lazy operations, meaning that they prepare the new RDD with every new operation but now show or return anything. We can say, that transformations are lazy because of updating existing RDD, these operations create another RDD. Actions on the other hand trigger the computations on RDD and show (return) the result of transformations.

Most modern work in Spark won’t directly use RDDs, though everything is built on top of them and it’s good to understand the foundation even if you don’t need to write all of those map(), fold(), and reduceByKey() operations yourself.

Comments closed