

Azure Data Factory and Schema Drift

Mark Kromer walks us through two techniques we can use in Azure Data Factory to deal with schema drift:

Azure Data Factory’s Mapping Data Flows have built-in capabilities to handle complex ETL scenarios that include the ability to handle flexible schemas and changing source data. We call this capability “schema drift”.

When you build transformations that need to handle changing source schemas, your logic becomes tricky. In ADF, you can either build data flows that always look for patterns in the source and utilize generic transformation functions, or you can add a Derived Column that defines your flow’s canonical model.

Click through for the discussion and comparison. Schema drift has been the bane of Integration Services’s existence, so it’s good to see them tackling the idea in Azure Data Factory.


Hidden Modals and SSMS

Jana Sattainathan knows how to make hidden modals rise to the surface in SSMS:

I am not sure if you run into this, but I do often enough on my SSMS 17.9.1 version – suddenly, the SSMS window will not register any mouse clicks although it will seemingly look fine. The reason is that there is a modal window hidden somewhere. Sometimes ALT-TAB followed by ESC will do the trick, but not always. In the beginning, when I did not know about the modal window, I used to kill SSMS and restart it, but you don’t have to.

Click through to see how to do this and save yourself a task killing.


Workload Analysis with Query Store

Erin Stellato shows how you can mine the Query Store tables to learn more about your workload:

The query text and plan are stored in two separate tables, sys.query_store_query_text and sys.query_store_plan, respectively. The text is stored as an nvarchar(max) data type, and the plan is stored as varbinary(max). This means that the data can be mined, looking for patterns and explicit use of objects. Want to know what queries use an index? Look for it in the plans. Want to know what queries have a RECOMPILE hint on them? Look for it in the query text. Anything you want to find with regard to query text or plans is there; you just have to know how to query the data.

It’s hard to tune queries if you don’t know what’s running.
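If you want to try this kind of mining yourself, here is a minimal sketch in the spirit of Erin's post, run through dbatools' Invoke-DbaQuery from PowerShell. The instance, database, and index names are placeholders, and the queries are my own illustration rather than ones lifted from her post.

```powershell
# Find queries whose text carries a RECOMPILE hint;
# "localhost" and "MyDatabase" are hypothetical names
$findRecompile = @"
SELECT q.query_id, qt.query_sql_text
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
WHERE qt.query_sql_text LIKE N'%RECOMPILE%';
"@
Invoke-DbaQuery -SqlInstance localhost -Database MyDatabase -Query $findRecompile

# Find plans which reference a particular index ("IX_MyIndex" is hypothetical);
# the catalog view exposes the plan as XML text, so a LIKE search works
$findIndex = @"
SELECT p.plan_id, p.query_id
FROM sys.query_store_plan AS p
WHERE CAST(p.query_plan AS nvarchar(max)) LIKE N'%IX_MyIndex%';
"@
Invoke-DbaQuery -SqlInstance localhost -Database MyDatabase -Query $findIndex
```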


Custom Formatting Numbers in Power BI

Chris Webb shows how you can use custom formats to display numbers more easily in Power BI:

Now that we can apply custom format strings to fields and measures in Power BI in the September 2019 release, I thought it would be useful to provide some examples of what’s possible with this very flexible new feature because the existing documentation for VBA isn’t easy to make sense of. In fact there’s so much to say I’m going to have to write a series of blog posts to cover everything! In this first post I’m going to look at formatting numbers.

When you need an exact number, a thousands separator goes a long way.
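If you want a quick feel for what these strings do, the basic numeric patterns follow VBA-style conventions, and for simple cases those overlap with .NET's custom numeric format strings, which you can preview straight from PowerShell. Treat this as a rough approximation for basic patterns only; check Chris's post for the real Power BI behavior.

```powershell
# Previewing #,0-style patterns via .NET's custom numeric formats;
# treating these as equivalent to Power BI's VBA-style strings is an
# assumption that holds for simple numeric cases like these
$value = 1234.567
$value.ToString('#,0')      # 1,235      - thousands separator, no decimals
$value.ToString('#,0.00')   # 1,234.57   - thousands separator, two decimals
$value.ToString('0.0%')     # 123456.7%  - percent formats multiply by 100
```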


Flink’s State Processor API

Seth Wiesman and Fabian Hueske show off Apache Flink’s State Processor API:

The State Processor API that comes with Flink 1.9 is a true game-changer in how you can work with application state! In a nutshell, it extends the DataSet API with Input and OutputFormats to read and write savepoint or checkpoint data. Due to the interoperability of DataSet and Table API, you can even use relational Table API or SQL queries to analyze and process state data.

For example, you can take a savepoint of a running stream processing application and analyze it with a DataSet batch program to verify that the application behaves correctly. Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application. It’s now also possible to fix inconsistent state entries. Finally, the State Processor API opens up many ways to evolve a stateful application that were previously blocked by parameter and design choices that could not be changed without losing all the state of the application after it was started. For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on.

Read on to learn more about how this works.


Creating Big Data Clusters with Azure Data Studio

Niels Berglund takes us through the creation of a Big Data Cluster by using Azure Data Studio to generate a notebook:

I wrote a blog post back in November 2018 about how to install and deploy SQL Server 2019 Big Data Cluster on Azure Kubernetes Service. Back then, SQL Server 2019 Big Data Cluster was in private preview (CTP 2.1, I believe), and you had to sign up to get access to the “bits”. Well, you did not really get any “bits”; what you did get was access to Python deployment scripts.

Now, September 2019, the BDC is in public preview (you do not have to sign up), and it has reached Release Candidate (RC) status, RC 1. The install method has changed, or rather, in addition to installing via deployment scripts, you can now also install using Azure Data Studio deployment notebooks, and that is what this blog post is about.

Having gone through this myself, there’s quite a bit of reading involved in the setup, but they make the process pretty smooth. This also shows off one of the key benefits of notebooks: documentation and code together.


Key Concepts of Convolutional Neural Networks

Srinija Sirobhushanam takes us through some of the key concepts around convolutional neural networks:

How are convolution layer operations useful?
CNNs help us look for specific localized image features, such as edges, that the rest of the network can use. Initial layers detect simple patterns, such as horizontal and vertical edges in an image, and deeper layers detect complex patterns.

This is a rather high-level description of the topic, but it gives you a sense of what you need to know before building and using these networks.
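To make the edge-detection idea concrete, here is a toy convolution, written in PowerShell to stay consistent with the other examples in this batch. A vertical-edge kernel slides across a tiny grayscale "image" and responds strongly where pixel intensity changes, staying at zero in flat regions. This is a from-scratch illustration, not code from the original post.

```powershell
# A 4x5 "image": dark (0) on the left, bright (10) on the right
$image = @(
    @(0, 0, 10, 10, 10),
    @(0, 0, 10, 10, 10),
    @(0, 0, 10, 10, 10),
    @(0, 0, 10, 10, 10)
)
# A Prewitt-style 3x3 kernel that detects vertical edges
$kernel = @(
    @(1, 0, -1),
    @(1, 0, -1),
    @(1, 0, -1)
)

# Slide the kernel over every 3x3 window and sum the element-wise products
for ($i = 0; $i -le $image.Count - 3; $i++) {
    $row = foreach ($j in 0..($image[0].Count - 3)) {
        $sum = 0
        for ($ki = 0; $ki -lt 3; $ki++) {
            for ($kj = 0; $kj -lt 3; $kj++) {
                $sum += $image[$i + $ki][$j + $kj] * $kernel[$ki][$kj]
            }
        }
        $sum
    }
    $row -join ' '   # prints "-30 -30 0": nonzero responses mark the edge
}
```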


Using dbatools Instead of RDP

Garry Bargsley shows how we can use dbatools to do some of the things which we might naturally do with Remote Desktop:

So you hear this spread across the Twitterverse and Blogosphere: you should not RDP into your SQL Servers to do administrative work. My nature has always been to troubleshoot issues from the server in an RDP session.

When I received a disk space alert on a development system I was about to RDP and do my thing. But I said wait, let me approach this from a different perspective…

To the rescue comes dbatools, as always seems to be the case these days.

Remoting is much less resource-intensive and it lets you scale out to dozens, hundreds, or thousands of servers without any more effort on your part. It’s rare that you get constant scaling, so take advantage of it where you can.
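As a sketch of what that looks like in practice, here is one way to check disk space from your workstation with dbatools' Get-DbaDiskSpace; the server names and the ten-percent threshold are placeholders, and this is my illustration rather than Garry's exact command.

```powershell
# Check disk space remotely instead of opening an RDP session;
# "dev-sql-01" and "dev-sql-02" are hypothetical server names
Get-DbaDiskSpace -ComputerName dev-sql-01 |
    Where-Object PercentFree -lt 10 |
    Select-Object ComputerName, Name, Capacity, Free, PercentFree

# The same command fans out to any number of servers with no extra effort
Get-DbaDiskSpace -ComputerName dev-sql-01, dev-sql-02 |
    Sort-Object PercentFree |
    Select-Object ComputerName, Name, PercentFree
```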


File Sizes in dbatools

Chrissy LeMaire gives us several ways to format file sizes with dbatools:

Within dbatools, you may notice file sizes are pretty and human-readable.

That was some C#-based magic created by Microsoft PFE and creator of PSFramework, Fred Weinmann. In the background, SQL Server often gives us different types of numbers to represent file sizes. Sometimes it’s bytes, sometimes it’s megabytes. We wanted to standardize the sizing in dbatools, and thus the dbasize type was born.

Human-readable file sizes are great, but they make values difficult to compare when piping sets of data to Format-Table. Knowing how to override this when necessary gets you the best of both worlds.
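As a minimal sketch of both behaviors, assuming the dbasize type exposes a raw byte count through a Byte property the way PSFramework's size type does:

```powershell
# Sizes come back pretty and human-readable by default
Get-DbaDbFile -SqlInstance localhost -Database tempdb |
    Select-Object LogicalName, Size

# For raw, sortable numbers, expand the underlying byte count instead;
# the .Byte property is an assumption based on PSFramework's size type
Get-DbaDbFile -SqlInstance localhost -Database tempdb |
    Select-Object LogicalName, @{ Name = 'SizeBytes'; Expression = { $_.Size.Byte } } |
    Format-Table -AutoSize
```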
