Press "Enter" to skip to content

Month: November 2019

Azure Synapse Analytics, Nee Azure SQL DW

John Macintire explains Azure Synapse Analytics:

A cloud native, distributed SQL processing engine is at the foundation of Azure Synapse and is what enables the service to support the most demanding enterprise data warehousing workloads. This week at Ignite we introduced a number of exciting features to make data warehousing with Azure Synapse easier and allow organizations to use SQL for a broader set of analytics use cases.

There’s a fair amount of marketing-speak in here, but the gist is Azure SQL Data Warehouse + Spark + on-demand serverless queries (so you can, among other things, write T-SQL against your HDFS data). I think it has a better chance of long-lasting success than Azure SQL Data Warehouse.

Comments closed

Licensing for SQL Server Big Data Clusters

Mohammad Darab tackles the licensing question for Big Data Clusters:

One of the biggest questions I had when I first started diving into Big Data Clusters was, “What about licensing….how will that work?” With so many different instances running on the storage pool, data pool and compute pool nodes will licensing cost too much? The answer I got from Microsoft was that it will “be competitive”.

Well, with the general availability of SQL Server as of this week, Microsoft is making it way more financially attractive than I thought. Below is a summation of the SQL Server 2019 Licensing Guide for Big Data Clusters.

Click through for the explanation. It really is pretty simple, all things considered.

Comments closed

Debugging Azure Data Factory Data Flows

Mark Kromer takes us through debugging Azure Data Factory Data Flows:

When you are designing your mapping data flows in ADF, you are working against a live Azure Databricks Spark cluster. The size of that cluster is configurable via the Azure Integration Runtime. If you do not configure a custom Azure IR, then you will use the default Azure IR. That sets a very small cluster size by default of 4 cores for a single worker node and 4 cores for a single driver node. In most cases, while debugging and using data preview, that should be fine. But when you start exploring your data with column statistics or increase the sampling size in debug settings, you may find that you’ve exceeded the capacity on that small default cluster. Below are the steps you need to take to increase the size of your debug cluster.

Click through for step-by-step instructions.

Comments closed

SQL Server Assessment API Generally Available

Ebru Ersan announces the general availability of the SQL Assessment API in Powershell:

SQL Assessment API is a new mechanism to evaluate configuration of your SQL Server for best practices. The API methods are used by means of a SQL Server Management Object (SMO) extension and new cmdlets in SqlServer PowerShell module. API is delivered with a ruleset that is highly customizable and extensible. It can be used to assess SQL Server versions 2012 and higher, both on Windows and Linux, as well as Azure SQL DB Managed Instance. More products will be supported in future releases.
 
SQL Assessment API is shipped as part of SqlServer PowerShell module (21.1.18206) and SMO NuGet Package (150.18208.0).

It’s a set of reasonably good practices and also lets you customize for your own environment, so check it out.

Comments closed

Troubleshooting Tez Performance

Dmitry Tolpeko digs through Tez logs to figure out a performance issue:

Why did it take so long to run the job? Is there any way to improve its performance?

Tez Application Master Log
I am going to use the Tez AM log to investigate vertex performance and find possible bottlenecks.
Note that there is the Timeline Server REST API that you can use to get the statistics for Tez jobs, but the application master log is “event-driven”, shows the exact order of all events and contains much more details in general.

Click through for the process.

Comments closed

Preventing Overfitting in ML Models

Tom Jordan gives us four techniques to reduce the likelihood of overfitting in our models:

Dropout
This technique is exclusively used within the training of neural networks, so isn’t applicable to all machine learning models, however can be used in the production of extremely effective neural network models. During the start of each step in the training process, each sub unit of the model, the neuron, has a probability of being included in that step or not. If it doesn’t make the cut, it is effectively deleted from the network for that step, and then reintroduced on the next step.

There are some good techniques here.

Comments closed

Troubleshooting Deadlocks using Extended Events

Jamie Wick helps us figure out what’s causing deadlocks:

Recently I started getting random alerts that a job on one of the SQL servers was failing because of a deadlock problem.

The source of the problem wasn’t immediately discernible as there wasn’t any pattern to when the job was failing. Troubleshooting was further complicated by the database being written/maintained by a 3rd party vendor that encrypts all of their stored procedures.

So… How to find out what was causing the deadlock?

Extended Events are an ideal solution for this situation.

Read on to learn how.

Comments closed

Fun with Markdown in Azure Data Studio

Dave Bland takes us through some of the formatting options available in Azure Data Studio notebooks:

When working in a Notebook you have two types of cells, text and code.  The focus of this post is how to format the text cell.  Of course text goes into this cell so that part is easy and of course the text can say anything you would like to say.  When we work with text in Word, there is a format tool bar that we can use to make it look like we want it.  The text cells do not have this toolbar.

You might be asking, without the format toolbar, does that mean we can’t format the text?  That answer is no….we can still format the text, we just need to do it slightly different.  Rather than use a toolbar, we need to use characters.

There’s a lot of power in Markdown.

Comments closed