
Author: Kevin Feasel

Reducing Costs in Azure

Matt Robertshaw has a few tips for saving money in Azure:

4. Enterprise Dev/Test subscriptions

Enterprise Dev/Test subscriptions are provided as part of Microsoft Enterprise Agreements (EA).  They’re designed for teams of Visual Studio subscribers to run development and test workloads in Azure but at discounted rates, specifically on Windows virtual machines and exclusive gallery images.  This is significant because I see many clients using regular Azure subscriptions for development and test, which means they’re potentially paying more than they need to.

One additional point I’d make, though it kind of fits in with points 7 and 9, is to try to make your own services “cloud-first.” In other words, a typical cloud migration takes services built for on-prem data centers, converts the servers into VMs, ships those VMs up into Azure/AWS/Google/whatever, and then you end up paying more than you did on-prem.

Instead, read up a bit on cloud architecture and see how you might be able to change a service to fit that model. Instead of having a server running all the time, is it possible to store messages in a queue and have functions process these messages on a given schedule? Can you use expensive tools like Azure Synapse Analytics to perform nightly data processing and move the results to a much less expensive Azure SQL Database?
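As a rough illustration of the queue-plus-scheduled-function idea, here is a minimal sketch in Python. The in-memory deque is only a stand-in for a real cloud queue, and the message fields (order_id, qty, unit_price) and function names are hypothetical, made up for the example rather than taken from any Azure API.

```python
from collections import deque

# Hypothetical in-memory stand-in for a cloud message queue.
message_queue = deque()


def enqueue(message: dict) -> None:
    """Producers drop work onto the queue instead of calling an always-on server."""
    message_queue.append(message)


def process_batch(max_messages: int = 100) -> list:
    """Body of a scheduled function: drain up to max_messages, then exit."""
    results = []
    while message_queue and len(results) < max_messages:
        msg = message_queue.popleft()
        results.append(
            {"order_id": msg["order_id"], "total": msg["qty"] * msg["unit_price"]}
        )
    return results


# Messages accumulate between runs; nothing needs to be running meanwhile.
enqueue({"order_id": 1, "qty": 2, "unit_price": 5.0})
enqueue({"order_id": 2, "qty": 1, "unit_price": 3.5})

# A scheduler (hypothetically, a timer trigger) would call this periodically.
processed = process_batch()
```

The point of the shape is that compute only exists while process_batch runs, so you pay for the batch rather than for an idle server.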

Comments closed

Using UDFs in Spark without Registration

Sourabh Mehta shows how we can immediately call a user-defined function in Spark without registering it first:

Here, we will demonstrate the use of UDF via a small example.

Use Case: We need to change the value of an existing column of DF/DS to add some prefix or suffix to the existing value in a new column.

I’m actually not sure what benefit you gain from not registering the UDF, but there probably is one.


Building a Stream Processing App with ksql

The Hadoop in Real World team walks us through event streaming with ksql:

ksqlDB is an event streaming database that enables creating powerful stream processing applications on top of Apache Kafka by using the familiar SQL syntax, which is referred to as KSQL. This is a powerful concept that abstracts away much of the complexity of stream processing from the user. Business users or analysts with a SQL background can query the complex data structures passing through Kafka and get real-time insights. In this article, we are going to see how to set up ksqlDB and also look at important concepts in ksql and its usage.

Event streaming has become a lot easier over the past couple of years, as Kafka, Spark, and Flink have all matured.


Bulk Loading SQL Server from .NET

Adrian Hills walks us through the SqlBulkCopy class:

Ever been in a situation where rumblings of “process X is too slow” suddenly build into a super-high priority ball of urgency when that next step up in data volume hits? Yeah, that can be fun. No, really, it can be fun because we have strategies to sort this stuff out, right?

In this blog post, I’m going to talk about one particular piece of functionality—SqlBulkCopy—that can help you with bulk data loading. If I had to single out my favorite .NET class, SqlBulkCopy would be at the top of the list. My goal is to introduce you to this class so that maybe it can become a part of your tool belt, too.

Click through to see how it works. If you’re familiar with SSIS, you’re already familiar with the concept if not the specifics.
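To give a feel for the general idea of batched loading, here is a small sketch. It is not SqlBulkCopy and does not use SQL Server: it swaps in Python's built-in sqlite3 module and executemany purely to contrast one batched load with thousands of individual inserts. The table and column names are hypothetical.

```python
import sqlite3

# Hypothetical data set to load.
rows = [(i, f"name-{i}") for i in range(10_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")

# One batched call inside a single transaction, rather than 10,000
# separate INSERT statements that are each committed on their own.
with conn:
    conn.executemany("INSERT INTO people (id, name) VALUES (?, ?)", rows)

loaded = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
```

The real SqlBulkCopy class goes further than this by using SQL Server's bulk load path, so treat the sketch as an analogy for the batching mindset rather than a performance comparison.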


Unit Testing Azure Data Factory Pipelines

Richard Swinbank walks us through what it takes to run a unit test against an Azure Data Factory pipeline:

In part three of this series I looked at functional tests for ADF pipelines: verifying, in isolation, that pipelines are “doing things right”. In this post I’ll be testing isolated pipelines to check that they’re “doing the right things” – this is one description of a unit test. In a general-purpose programming language, unit tests might be used to verify that an individual line of code is executed, or that it has a particular effect. In Azure Data Factory, the smallest unit of development – a “line of code” – is a pipeline activity. I will be writing tests to verify that specific activities are executed (or not) and to inspect their results.

There’s a fair bit involved in this sort of test.
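The core of the idea, checking which activities ran and inspecting their results, can be sketched in a few lines. This is not Richard's framework: the record shape, activity names, and helper functions below are all hypothetical, standing in for whatever a test harness collects after triggering a pipeline run.

```python
# Hypothetical activity-run records gathered after a pipeline run.
activity_runs = [
    {"activity_name": "Lookup Config", "status": "Succeeded", "output": {"count": 3}},
    {"activity_name": "Copy Files", "status": "Succeeded", "output": {"rowsCopied": 120}},
]


def runs_for(runs, name):
    """Return every recorded run of the named activity."""
    return [r for r in runs if r["activity_name"] == name]


def was_executed(runs, name):
    """True if the named activity ran at least once."""
    return len(runs_for(runs, name)) > 0


def output_of(runs, name):
    """Return the output of an activity that is expected to run exactly once."""
    matches = runs_for(runs, name)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one run of {name!r}, found {len(matches)}")
    return matches[0]["output"]


# Unit-test style checks: one activity ran, another did not,
# and the copy produced the expected result.
assert was_executed(activity_runs, "Copy Files")
assert not was_executed(activity_runs, "Send Failure Email")
assert output_of(activity_runs, "Copy Files")["rowsCopied"] == 120
```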


Power Query Performance Differences in When You Remove Columns

Chris Webb continues a series on optimizing Power Query merge performance:

In my last post I demonstrated how the size of a table affects the performance of Power Query merge operations on non-foldable data sources in Power BI. Specifically, I showed that removing columns from the tables involved in a merge before the merge took place improved performance. But does it matter when you remove the columns? Is it enough to only select the columns you need when you expand the nested table returned by a merge, for example, or just to remove columns after the merge step? So, today’s question is:

Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?

Read on for the result, as well as a pleasant surprise around Power BI’s capabilities.


Backing Up SQL Server Instance Configuration

Claudio Silva has started a series on backing up your SQL Server instance configuration. Part 1 walks us through the basic process:

If you have never used this command, you can test for a single instance by running the following:

Export-DbaInstance -SqlInstance "devInstance" -Path "D:\temp"

This will create all scripts in the D:\temp folder. A folder named “devInstance-{date}” will be created.
In this folder, you will find 1 file per ‘object type’. The file names are in the form of “#-<object type>.sql” where the # is a number that represents the iterator on the order that the internal calls of the underlying functions happen.

Be sure to read the sections around passwords!

Part 2 introduces parallelism into the mix:

There are a couple of options, like the native PowerShell cmdlets Start-Job/Stop-Job a.k.a background jobs, Runspaces jobs and Thread jobs but I will just mention two of them. One is a nice addition to the most recent version of PowerShell (v7) and the other using a PowerShell module.

In case you don’t know, with PowerShell v7 it’s possible to use a new option -Parallel with ForEach-Object. Check PowerShell’s team blog post PowerShell ForEach-Object Parallel Feature.

However, because I don’t have (yet :-)) PS7, I will keep leveraging the PoshRSJob module, which uses runspaces, created by Boe Prox.

There’s a lot of meat in that second part, so check it out.
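If the fan-out pattern is new to you, the general shape looks something like the sketch below. It is written in Python rather than PowerShell and is not Claudio's code: export_instance is a hypothetical placeholder for a per-instance export call, and every instance name other than devInstance is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def export_instance(instance_name: str) -> str:
    """Hypothetical placeholder for exporting one instance's configuration."""
    return f"{instance_name}: exported"


# Only devInstance appears in the original post; the others are made up.
instances = ["devInstance", "testInstance", "qaInstance"]

# Each instance is handled by a worker thread instead of one after another.
# map() returns results in input order even though calls may overlap in time.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(export_instance, instances))
```

Because each instance's export is independent of the others, this kind of work parallelizes cleanly, which is what makes the approach attractive.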


Understanding Scatterplots

Alex Velez describes the nature of the scatterplot:

A scatterplot is a niche chart, but it’s one of my favorites! If you are a statistician or work in a technical field, a scatterplot might be your go-to graph type. However, if you don’t perform a lot of statistical analysis, then these charts may be unfamiliar. Regardless of your current comfort level, scatterplots are extremely useful to focus on the relationship between two series—a scenario that is common in both technical and non-technical fields. Let’s explore some of the basics of scatterplots via an example; I’ll also cover tips for designing more effective ones and discuss common variations (bubble charts, connected scatterplots, etc.), too!

Read on for a good explanation of what scatterplots are, variants on the theme, and when they make sense to use.


Portfolio Optimization with SAS and Python

Sophia Rowland shows off the sasoptpy package:

I started by declaring my parameters and sets, including my risk threshold, my stock portfolio, the expected return of my stock portfolio, and covariance matrix estimated using the shrinkage estimator of Ledoit and Wolf (2003). I will use these pieces of information in my objective function and constraints. Now I will need SWAT, sasoptpy, and my optimization model object.

Read on for a demo.
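To see how those ingredients fit together, here is a small plain-Python sketch that evaluates one candidate portfolio against a risk threshold. It does not use sasoptpy or SWAT and does not solve the optimization; all of the numbers are hypothetical, and the covariance matrix is simply typed in rather than estimated with a shrinkage estimator.

```python
def portfolio_return(weights, expected_returns):
    """Expected portfolio return: the weighted sum of per-stock returns."""
    return sum(w * r for w, r in zip(weights, expected_returns))


def portfolio_variance(weights, cov):
    """Portfolio variance w' * C * w for covariance matrix C."""
    n = len(weights)
    return sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )


# Hypothetical two-stock example.
expected_returns = [0.08, 0.12]
cov = [
    [0.04, 0.01],
    [0.01, 0.09],
]
weights = [0.5, 0.5]      # candidate allocation; weights sum to 1
risk_threshold = 0.05     # maximum acceptable variance

ret = portfolio_return(weights, expected_returns)
var = portfolio_variance(weights, cov)
feasible = var <= risk_threshold
```

An optimizer's job is then to search over the weights for the best expected return among the allocations that stay within the risk constraint.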
