Logging in Azure

Rolf Tesmer has a detailed post covering how and what to log when using Azure for a modern data warehouse:

In my view – what often doesn’t get enough attention up front are the critical aspects of monitoring, auditing and availability. Thankfully, these are generally not too difficult to plug-in at any point in the delivery cycle, but as like with most things in cloud there are just so many different options to consider!

So the purpose of this blog is to focus on the key areas of Azure Services Monitoring and Auditing for the Azure Modern Data Platform architecture.

Click through for examples from a number of different Azure services.

Drawing SSIS Packages as SVGs

Bartosz Ratajczyk continues a series on taking SSIS packages and generating SVGs from their control flows:

To make things harder, the layout of the sequences and tasks is not some nested XML structure. All of the elements have the same parent – <GraphLayout>, meaning all of them are at the same tree level. Also – there is no attribute showing where a particular object belongs. Almost. In the example with the sequences, I see two regularities:
– the outer container is placed later in the XML, than the inner container
– the @Id attributes show the nesting of the objects

I’m not sure how often I’d use this in practice, but if you want to understand some of the internals of SSIS, this is an interesting series to follow.

Getting to DevOps

Grant Fritchey takes us through some of the baby steps in getting started with DevOps:

However, easy by comparison doesn’t mean just simply easy. There’s a lot of work involved and making mistakes early in the process has repercussions for every later step.

For example, where do your put your code?

Yeah, yeah, I know. Source control. I mean, where in source control do you put the code? What do you call the project and solution? Is it in git, github, Azure, or somewhere else? Choose wisely because every single step of automation you set up after this will be completely dependent on that first choice. Further, putting in github, or example, has repercussions for how you implement automation in Azure DevOps Pipelines.

But hey, no pressure.

Removing Text Between Delimiters in Power Query

Imke Feldmann has a new M function for us:

While there is a native function to fetch text between 2 delimiters in Power Query, there is no such function that removes the text instead. Therefore I’ve created a custom function Text.RemoveBetweenDelimiter. It even lets you choose to remove the delimiters themselves as well via the optional 4th parameter.

Click through for the script and sample uses.

SQL Graph and Hierarchies

Hasan Savran shows how you can use SQL Server graph tables to handle hierarchical data:

The > determines the direction of the query. By changing this arrow into other direction, you can use the same relation table to find who manages employee. For example, let’s rewrite this query and look who manages East Manager

SELECT emp.Name as Emp, emp2.Name as Mgr
FROM Hr as emp, ReportsTo as r, Hr as emp2 
WHERE MATCH(emp<-(r)-emp2) and emp.empid = 8

There are a few relational approaches which work well for hierarchies, though they all come with tradeoffs in terms of read complexity, write complexity, and development complexity. Click through for Hasan’s take on the graph-based approach.

Powershell and Windows Terminal Profiles

Jeffery Hicks shows how you can modify your Windows Terminal profile using Powershell:

I recently updated my Windows 10 systems to the 1903 release. One of the reasons is that I wanted to try out the new Windows Terminal preview. You can find it in the Windows Store. This is bleeding edge stuff and far from complete but promises to a great addition. Now you will be able to have all your command terminals, in one tabbed application and easily be able to switch between them. As I said, this is far from being a finished and polished product. Right now, if you want to add a new profile, that is another terminal, you have to manually edit a json file. If you have VS Code installed, the file will open in that.  Otherwise, I’m assuming you’ll get whatever application is associated with the .json extension.

Read on for a Powershell one-liner which lets you create a terminal profile.

How .NET Code Talks to Spark

Ed Elliott has a great diagram showing how user-written .NET code communicates with Spark over the Java VM:

4. Spark-dotnet Java driver listens on tcp port
The spark-dotnet Java driver listens on a TCP socket. This socket is used to communicate between the Java VM and the dotnet code, the dotnet code doesn’t run in the Java VM but is in a separate process communitcating with the Java VM via that TCP postrt. The year is 2019, we serialize and deserialize data all the time and don’t even know it, hell notepad probably even does it.

It’s serialization & deserialization as well as TCP sockets all the way down.

A Quick Keras Example

Shubham Dangare takes us through a quick example using Keras and TensorFlow in Python:

Keras is a high-level neural networks API, written in Python and capable of running on top of Tensorflow, CNTK  or Theano. It was developed with a focus on enabling fast experimentation. In this blog, we are going to cover one small case study for fashion mnist.

Fashion-MNIST is a dataset of Zalando’s article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.

The end result wasn’t that great, but Shubham was using a sequential model rather than a convolutional neural network, so you can probably take this as a starting point and improve upon it.

Diving Into Clustered Index Seeks

Hugo Kornelis looks in detail at the clustered index seek:

The Clustered Index Seek operator uses the structure of a clustered index to efficiently find either single rows (singleton seek) or specific subsets of rows (range seek). Because a clustered index always contains all columns in a table, a Clustered Index Seek is one of the most efficient ways for SQL Server to find single rows or small ranges, provided there is a filter that can be used efficiently.

The behavior of the Clustered Index Seek operator is in fact exactly the same as the behavior of the Index Seek operator, with only a very few differences as noted below. Though these two operators do have different names, not only in the graphical execution plan but also in the underlying XML, I suspect that in reality they are both using the same internal logic and not a copy of it.

Read the whole thing.

SentryOne Plan Explorer in Azure Data Studio

Aaron Bertrand gives us the behind-the-scenes look at SentryOne’s Plan Explorer extension for Azure Data Studio:

Like the desktop application, the Plan Explorer extension is designed to provide you with richer graphical execution plans for your real-time queries against SQL Server. It is based on a modest subset of functionality; we’ve started with just the plan diagram, a basic statements grid, tooltips, and access to the XML (so you can open the plan in other tools). We will add more features to the extension over time to try to get you as close to full parity with the desktop client as possible.

I gave it a quick try this weekend and had to pop XML results into the desktop client to get what I really wanted to see, but I’m excited over what this looks like medium-term.


July 2019
« Jun