Press "Enter" to skip to content

Month: December 2022

Finding Data Factory Objects in Synapse Studio

Kevin Chant pulls out the magnifying glass and compass:

In this post I want to cover where you can find Azure Data Factory objects in Synapse Studio. I want to do this post for a couple of reasons.

First reason is that at the start of the year I published a post on how to automate a Data Factory pipeline migration to an Azure Synapse Analytics workspace using Azure DevOps.

Even though I showed one way that you can automate the migrations of a Data Factory to a Synapse workspace I did not show where you can view them in Synapse Studio.

Second reason is because I keep telling everybody they can use the same pipelines in Azure Synapse Analytics but the objects can be found in different places.

Read on for the two places to find Data Factory objects.

Comments closed

Degenerate Dimensions and Power BI DirectQuery

Chris Webb provides a warning:

A few weeks ago my colleague Dany Hoter wrote a post on the Azure Data Explorer blog about how using columns on fact tables as dimensions in DirectQuery mode can lead to errors in Power BI. You can read it here:

https://techcommunity.microsoft.com/t5/azure-data-explorer-blog/to-star-or-not-to-star-more-about-data-modeling-in-power-bi/ba-p/3689933

In the post he mentioned that he could reproduce the same behaviour in SQL Server, so I thought it would be good to show an example of this to raise awareness of the issue because I think it’s one that anyone using DirectQuery mode on any data source is likely to run into.

Read on to understand what might lead to (expected) errors and what you can do about it.

Comments closed

Troubleshooting Memory Constraints in SQL Server

Ajay Dwivedi is running low on memory:

Memory Pressure on SQL Server is very common and is presented as Memory Grant Pending metric in the above dashboard screenshot. In my previous blog Troubleshooting SQL Server With High Memory Grants, I discussed what actions should be taken in order to get the server out of constant memory grant issues in real-time.

For long-term resolution, there are various metrics to be validated before making any decision.

Read on for several tips to help you decide the best course of action.

Comments closed

Working with R in AML v2

Tomaz Kastrun ends the advent of Azure ML on a downer:

R language and Azure Machine Learning SDK for R was deprecated a year ago (end of 2021). But R can be still used for training and deployment by using Azure Machine learning CLI 2.0!

Furthermore, R language can be used in Machine Learning Designer, for data preparation, data wrangling and statistical analysis.

You can work with R but they make sure everything is more difficult.

Comments closed

Script Activity Outputs to ForEach Inputs with ADF

Meagan Longoria links in a script:

In early 2022, Microsoft released a new activity in Azure Data Factory (ADF) called the Script activity. The Script activity allows you to execute one or more SQL statements and receive zero, one, or multiple result sets as the output. This is an advantage over the stored procedure activity that was already available in ADF, as the stored procedure activity doesn’t support using the result set returned from a query in a downstream activity.

However, when I went to find examples of how to reference those result sets, the documentation was lacking. 

Click through as Meagan corrects a gap in documentation.

Comments closed

Finding Rogue Line Feeds

Bill Fellows finds out whose line (feed) it was, anyway:

I ran into an issue today that I wanted to write about so that maybe I remember the solution. We ran into a case where the source data in a column had an unprintable character. In this case, it was a line feed character, which is ASCII value 10, and they had 7 instances in this one row. “How did that get in there? Surely that’s an edge case and we can just ignore it,” and dear reader, I’ve been around long enough to know that this is likely a systemic situation.

Click through for two ways of answering this against data already in SQL Server, as well as one additional route explained but sans demonstration.

Comments closed

Apache Spark Performance Tuning Tips

Amit Kumar shares a few tips with us:

RDD does serialisation and de-serialisation of data whenever it distributes the data across clusters such as during repartition and shuffle, and we all know that serialisation and de-serialisation are very expensive operations in spark.
On the other hand, DataFrame stores the data as binary using off-heap storage, no need for deserialization and serialization of data when it distributes to clusters. We see a big performance improvement in DataFrame over RDD

Click through for several additional tips.

Comments closed

Creating an Azure Function for Cosmos DB

Hasan Savran needs a function:

Azure Cosmos DB’s Change Feed feature triggers an event for Inserts and Updates in a collection. The easiest way to handle these events is, by executing an Azure Function. In this post, I will focus on creating an Azure Function for Azure Cosmos DB by using VsCode.

Read on for step-by-step instructions. The wizard for creating Function apps and then Azure Functions is pretty well-designed.

Comments closed