Press "Enter" to skip to content

Author: Kevin Feasel

Reducing Costs in Azure

Matt Robertshaw has a few tips for saving money in Azure:

4. Enterprise Dev/Test subscriptions

Enterprise Dev/Test subscriptions are provided as part of Microsoft Enterprise Agreements (EA).  They’re designed for teams of Visual Studio subscribers to run development and test workloads in Azure but at discounted rates, specifically on Windows virtual machines and exclusive gallery images.  This is significant because I see many clients using regular Azure subscriptions for development and test, which means they’re potentially paying more than they need to.

One additional point I’d make, though it kind of fits in with points 7 and 9, is to try to make your own services “cloud-first.” In other words, a typical cloud migration takes services built for on-prem data centers, converts the servers into VMs, ships those VMs up into Azure/AWS/Google/whatever, and then you end up paying more than you did on-prem.

Instead, read up a bit on cloud architecture and see how you might be able to change a service to fit that model. Instead of having a server running all the time, is it possible to store messages in a queue and have functions process these messages on a given schedule? Can you use expensive tools like Azure Synapse Analytics to perform nightly data processing and move the results to a much less expensive Azure SQL Database?

Comments closed

Optimizing Power BI Merge Performance with Table.Join

Chris Webb shows us another way to optimize Power BI merge performance:

The SortMerge algorithm, last in the list above, is the focus of this blog post. I mentioned in my earlier posts that the reason that merge operations on non-foldable data sources are often slow is that both of the tables used in the merge need to be held in memory. There is an exception though: if you know that the data in the columns used to join the two tables is sorted in ascending order, you can use the Table.Join function and the SortMerge algorithm and the data from both sources can be streamed rather than held in memory, which in turn results in the merge being much faster.

That’s the same in the relational world: merge joins are the fastest, assuming that your data is pre-sorted in the proper manner.

Comments closed

Using UDFs in Spark without Registration

Sourabh Mehta shows how we can immediately call a user-defined function in Spark without registering it first:

Here, we will demonstrate the use of UDF via a small example.

Use Case: We need to change the value of an existing column of DF/DS to add some prefix or suffix to the existing value in a new column.

I’m actually not sure what benefit you gain from not registering the UDF, but there probably is one.

Comments closed

Building a Stream Processing App with ksql

The Hadoop in Real World team walks us through event streaming with ksql:

ksqlDB is an event streaming database that enables creating powerful stream processing applications on top of Apache Kafka by using the familiar SQL syntax, which is referred to as KSQL. This is a powerful concept that abstracts away much of the complexity of stream processing from the user. Business users or analysts with SQL background can query the complex data structures passing through kafka and get real-time insights. In this article, we are going to see how to set up ksqlDB and also look at important concepts in ksql and its usage.

Event streaming has become a lot easier over the past couple of years, as Kafka, Spark, and Flink have all matured.

Comments closed

Unit Testing Azure Data Factory Pipelines

Richard Swinbank walks us through what it takes to run a unit test against an Azure Data Factory pipeline:

In part three of this series I looked at functional tests for ADF pipelines: verifying, in isolation, that pipelines are “doing things right”. In this post I’ll be testing isolated pipelines to check that they’re “doing the right things” – this is one description of a unit test. In a general-purpose programming language, unit tests might be used to verify that an individual line of code is executed, or that it has a particular effect. In Azure Data Factory, the smallest unit of development – a “line of code” – is a pipeline activity. I will be writing tests to verify that specific activities are executed (or not) and to inspect their results.

There’s a fair bit involved in this sort of test.

Comments closed

Bulk Loading SQL Server from .NET

Adrian Hills walks us through the SqlBulkCopy class:

Ever been in a situation where rumblings of “process X is too slow” suddenly build into a super-high priority ball of urgency when that next step up in data volume hits? Yeah, that can be fun. No, really, it can be fun because we have strategies to sort this stuff out, right?

In this blog post, I’m going to talk about one particular piece of functionality—SqlBulkCopy—that can help you with bulk data loading. If I had to single out my favorite .NET class, SqlBulkCopy would be at the top of the list. My goal is to introduce you to this class so that maybe it can become a part of your tool belt, too.

Click through to see how it works. If you’re familiar with SSIS, you’re already familiar with the concept if not the specifics.

Comments closed

Power Query Performance Differences in When You Remove Columns

Chris Webb continues a series on optimizing Power Query merge performance:

In my last post I demonstrated how the size of a table affects the performance of Power Query merge operations on non-foldable data sources in Power BI. Specifically, I showed that removing columns from the tables involved in a merge before the merge took place improved performance. But does it matter when you remove the columns? Is it enough to only select the columns you need when you expand the nested table returned by a merge, for example, or just to remove columns after the merge step? So, today’s question is:

Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?

Read on for the result, as well as a pleasant surprise around Power BI’s capabilities.

Comments closed

Backing Up SQL Server Instance Configuration

Claudio Silva has started a series on backing up your SQL Server instance configuration. Part 1 walks us through the basic process:

If you have never used this command, you can test for a single instance by running the following:

Export-DbaInstance -SqlInstance "devInstance" -Path "D:\temp"

This will create all scripts in the D:\temp folder. A folder named “devInstance-{date}” will be created.
In this folder, you will find 1 file per ‘object type’. The file names are in the form of “#-.sql” where the # is a number that represents the iterator on the order that the internal calls of the underlying functions happen.

Be sure to read the sections around passwords!

Part 2 introduces parallelism into the mix:

There are a couple of options, like the native PowerShell cmdlets Start-Job/Stop-Job a.k.a background jobs, Runspaces jobs and Thread jobs but I will just mention two of them. One is a nice addition to the most recent version of PowerShell (v7) and the other using a PowerShell module.

In case you don’t know, with PowerShell v7 it’s possible to use a new option -Parallel with ForEach-Object. Check PowerShell’s team blog post PowerShell ForEach-Object Parallel Feature.

However, because I don’t have (yet :-)) PS7, I will keep leveraging on PoshRSJob module, which uses runspaces, created by Boe Prox (T | B).

There’s a lot of meat in that second part, so check it out.

Comments closed

Portfolio Optimization with SAS and Python

Sophia Rowland shows off the sastopypackage:

I started by declaring my parameters and sets, including my risk threshold, my stock portfolio, the expected return of my stock portfolio, and covariance matrix estimated using the shrinkage estimator of Ledoit and Wolf(2003). I will use these pieces of information in my objective function and constraints. Now I will need SWAT, sasoptpy, and my optimization model object.

Read on for a demo.

Comments closed