Month: July 2020

Comparing Integration Services and Azure Data Factory

Tim Mitchell compares SQL Server Integration Services to Azure Data Factory:

For the better part of 15 years, SQL Server Integration Services (SSIS) has been the go-to enterprise extract-transform-load (ETL) tool for shops running on Microsoft SQL Server. More recently, Microsoft added Azure Data Factory (ADF) to its stable of enterprise ETL tools. In this post, I’ll be comparing SSIS and Azure Data Factory to share how they are alike and how they differ. I’ll also review the strengths and shortcomings of each, including the architectures in which each of these is likely to do well.

Read on for Tim’s thoughts on the subject. Tim lays out his biases up-front but also gives you a good feel for where both products are in their lifecycles.

Getting Started with Redshift

Rahul Mehta has a primer on AWS Redshift:

AWS Redshift is a columnar data warehouse service on the AWS cloud that can scale to petabytes of storage, and the infrastructure for hosting this warehouse is fully managed by AWS. Redshift operates in a clustered model with a leader node and multiple worker nodes, like other clustered or distributed databases. It is based on Postgres, so it shares a lot of similarities with Postgres, including the query language, which is nearly identical to Structured Query Language (SQL). Redshift supports creating almost all of the major database objects, including databases, tables, views, and even stored procedures. In this article, we will explore how to create your first Redshift cluster on AWS and start operating it.

I’m not really the biggest fan of Redshift around, but Rahul does a good job walking us through the basics of the product.
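
The article walks through creating the cluster; if you would rather script that provisioning step, here is a minimal Python sketch using boto3. The cluster identifier, node type, database name, and credentials below are placeholders of my own, not values from the article.

```python
import boto3

# Hypothetical example: provision a small two-node Redshift cluster.
# All identifiers and credentials here are placeholders.
redshift = boto3.client("redshift", region_name="us-east-1")

response = redshift.create_cluster(
    ClusterIdentifier="demo-cluster",
    NodeType="dc2.large",
    ClusterType="multi-node",      # leader node plus worker (compute) nodes
    NumberOfNodes=2,
    DBName="demodb",
    MasterUsername="awsuser",
    MasterUserPassword="ChangeMe123!",
    PubliclyAccessible=False,
)

print(response["Cluster"]["ClusterStatus"])   # typically "creating" at this point
```

Once the cluster reaches the available state, you can connect with any Postgres-compatible client and start creating tables, views, and stored procedures with familiar SQL.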

Handling Time Zones in Power BI

Teo Lachev has some tips on working with time zones in Power BI:

A client has Power BI models connected to Dynamics Online. Dynamics stores all dates in UTC instead of keeping the time offset, such as 7/14/2020 1:21:29 AM +00:00. Naturally, the users want to see dates localized to the US Eastern time zone. Easy, right? Use the Power Query To Local time transformation (in the Transform ribbon, expand Time, and then click To Local) to offset by the desired number of hours. But there are a few issues with this approach:

Read on to understand why this approach won’t cut it, and to get one which will.
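
One illustration of why a fixed-hours shift struggles (my example, not necessarily the exact list of issues Teo covers): US Eastern time is UTC-5 in winter but UTC-4 during daylight saving time, so no single offset is correct year-round. A quick Python sketch shows the moving target:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

eastern = ZoneInfo("America/New_York")

# The same UTC clock time in winter and in summer:
for utc_dt in (datetime(2020, 1, 14, 6, 0, tzinfo=timezone.utc),
               datetime(2020, 7, 14, 6, 0, tzinfo=timezone.utc)):
    print(utc_dt.isoformat(), "->", utc_dt.astimezone(eastern).isoformat())

# 2020-01-14T06:00:00+00:00 -> 2020-01-14T01:00:00-05:00  (EST)
# 2020-07-14T06:00:00+00:00 -> 2020-07-14T02:00:00-04:00  (EDT)
# A hard-coded "subtract 5 hours" (or 4) is therefore wrong for part of the year.
```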

Incremental Pipeline Development with Azure Data Factory

Andy Leonard shows how you can incrementally develop Azure Data Factory pipelines:

A friend pinged me recently to ask about rolling back Azure Data Factory (ADF) pipeline versions. My response was a question: Are you using source control with ADF? That did not help the current situation.

I thought of the way I often build ADF pipelines and shared my methodology, which is relatively simple (it has to be simple for me to understand it!):

Click through for Andy’s approach.

Getting Details by the Max Value of a Column in Power BI

Ed Hansberry solves a problem in a world without window functions:

When you group data in Power Query, you are able to do a number of aggregations, like count, sum, min, max, etc. The problem is that each of those aggregations only operates on that column. Let me show you what I mean. Suppose you have this data:

You need to return a table to the DAX model (or an Excel table) that only has one record per product, and you want the latest date, and the sales person associated with that date. So for product A, it is June 18, 2020 and Suresh, for B it is June 14, 2020 and Ana, and so on.

In the SQL world, this is where a combination of a common table expression and ROW_NUMBER() could get the top record, partitioned by product and ordered by date descending. Click through for Ed’s DAX-based solutions.
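
For comparison, here is a rough Python/pandas analogue of that windowed approach. The column names and the extra rows are assumptions modeled on Ed’s example rather than his actual data:

```python
import pandas as pd

# Hypothetical data shaped like Ed's example (extra rows invented for illustration)
sales = pd.DataFrame({
    "Product":     ["A", "A", "B", "B"],
    "Date":        pd.to_datetime(["2020-06-10", "2020-06-18",
                                   "2020-06-14", "2020-06-01"]),
    "SalesPerson": ["Maria", "Suresh", "Ana", "Carlos"],
})

# Equivalent of ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Date DESC) = 1:
# keep the row holding the maximum date within each product.
latest = sales.loc[sales.groupby("Product")["Date"].idxmax()].reset_index(drop=True)
print(latest)
#   Product       Date SalesPerson
# 0       A 2020-06-18      Suresh
# 1       B 2020-06-14         Ana
```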

Contrasting ANOVA against Regression

Stephanie Glen contrasts ANOVA against typical regression techniques using a picture:

If you scour the internet for “ANOVA vs Regression”, you might be confused by the results. Are they the same? Or aren’t they? The answer is that they can be the same procedure, if you set them up to be that way. But there are differences between the two methods. This one picture sums up those differences.

Click through to see that image.
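
To see the “they can be the same procedure” point concretely, you can run a one-way ANOVA and the equivalent regression on dummy-coded group membership and compare the F statistics. This Python sketch, with made-up data, is my own illustration rather than anything from Stephanie’s picture:

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 30),
    "y": np.concatenate([rng.normal(mu, 1.0, 30) for mu in (0.0, 0.5, 1.0)]),
})

# One-way ANOVA on the three groups
f_stat, p_val = stats.f_oneway(*(g["y"].to_numpy() for _, g in df.groupby("group")))

# The same hypothesis as a regression with dummy-coded group membership
model = smf.ols("y ~ C(group)", data=df).fit()
anova_table = sm.stats.anova_lm(model)

print(f_stat, p_val)          # matches the F and p reported for C(group) below
print(anova_table)
```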

Survival Analysis in Spark

Rab Saker and Bryan Smith hit on a topic close to my heart:

These patterns seem to indicate that KKBox could actually differentiate between customers based on their lifetime potential using information known at the time of acquisition. This information might help inform or steer specific discounts or promotions to customers as they register for a trial. This information might also inform KKBox of which offerings or capabilities to discontinue, as some, e.g., Initial Payment Method 35 or the 7-day payment plan as shown in Figure 3, align with exceptionally high churn rates in the first 30 days with little long-term survivorship.

Of course, there are relationships between these factors, so we should be careful about viewing them in isolation. By deriving a baseline risk (hazard) of customer churn (Figure 4), we can calculate the influence of different factors on the baseline in such a manner that each factor may be considered an independent hazard multiplier. When combined (through simple multiplication) against the baseline, we can plot a specific customer’s chances of abandoning a subscription by a given point in time (Table 1).

Click through for the story as well as a set of notebooks.
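
To make the idea of multiplying factor-specific hazard multipliers against a baseline concrete, here is a small Python sketch. The baseline hazard and the multiplier values are invented for illustration and are not numbers from the post:

```python
import numpy as np

# Invented baseline: a 5% monthly churn hazard over a 12-month horizon
baseline_hazard = np.full(12, 0.05)

# Hypothetical hazard multipliers for one customer's attributes
multipliers = {
    "initial_payment_method_35": 1.8,   # riskier acquisition channel
    "seven_day_plan": 2.2,              # short plan, higher early churn
    "auto_renew_enabled": 0.6,          # protective factor
}

# Combine the factors multiplicatively against the baseline
customer_hazard = baseline_hazard * np.prod(list(multipliers.values()))

# Survival curve: probability the customer is still subscribed after each month
survival = np.cumprod(1.0 - customer_hazard)
print(np.round(survival, 3))
```

This multiplicative structure is the proportional hazards assumption; in a real analysis the multipliers are estimated from data rather than assumed.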

Minimum Permissions Required for Get-DbaDbUser

Shane O’Neill wants to figure out the minimum permissions required for the Get-DbaDbUser cmdlet in dbatools:

I’m not going to sugarcoat things – the person that sent me the request has more access than they rightly need. The “public” access worker did not need any of that access so I wasn’t going to just give her the same level.

Plus, we’re supposed to be a workforce that has embraced the DevOps spirit and DevOps is nothing if it doesn’t include Security in it.

So, if I could find a way to give the user enough permission to run the command and not a lot more, then I would be all the happier.

Shane takes us through the process so we don’t have to.

A Warning on Power BI Custom Visuals

Martin Schoombee gives us a warning around relying upon free custom visuals:

Before you get the impression that I’m against custom visuals, let me say this: I love custom visuals! I myself have used many custom visuals in the past and have been very quick to look for a custom visual when I couldn’t get something to display or work the way I needed it to in Power BI.

Custom visuals fill an important gap where the base product is not yet where it needs to be, and what better way for Microsoft to see what people need and where they need to invest more time from a visualization standpoint? It’s an awesome concept and I like it.

Unfortunately, there are a few BUTs to follow, but let me first tell you my story…

Read the whole thing. I like custom visuals a lot, but there are risks in a corporate world, and I don’t necessarily mean security.
