Press "Enter" to skip to content

Day: December 14, 2022

Isolated Spark Testing with lakeFS

Adi Polak demonstrates lakeFS:

This tutorial demonstrates how to build a development and testing environment for validating your logic on a full-blown production data volume and variety, working with lakeFS and Spark. You will walk through the journey of creating a repository and building a Spark application while using lakeFS capabilities. You will learn how to data changes, revert them in cases of mistakes or other hiccups, and lately merge separate branches to reflect data changes from the isolated environments.

Not too long ago, I had a couple conversations with developers and data engineers about decentralized development and devs having their own environments and data. This seems like it would be a good approach to that common problem, and it works for Azure Synapse Analytics as well.

Comments closed

AutoML and Model Registration in AML

Tomaz Kastrun continues an advent of Azure Machine Learning. Day 13 covers the topic of Automated ML:

Automated ML is a no-code automated machine learning task. It iterates over many combinations of algorithms and hyperparameters in order to find the best model for your dataset and your prediction variable(s). The final solution is a model, that can be downloaded and later reused. So Automated ML is not just giving you the best model out of a family of algorithms, but lets you use the model, generate the scripts and create the artefacts.

Day 14 concerns model registration:

Important asset is the “Models” in navigation bar. This feature allows you to work with different model types -> custom, MLflow, and Triton. What you do here is, you register a model from different locations (e.g.: local file, AzureML Datastore, AzureML Job, MLflow Job, Model asset in AzureML workspace, and Model asset in AzureML Registry).

Once you open the Models asset, you will see, that you can do many things here. I have already model register from the running the notebook on day4.

Comments closed

Column Exclusion and Rename in Snowflake

Kevin Wilkie plays duck-duck-goose with columns:

With Snowflake, we could do many different things that we’re not used to seeing with a SELECT statement. We’re all used to seeing this – SELECT * and it shows all kinds of columns.

With Snowflake, we can tell Snowflake NOT to show certain columns by using the EXCLUDE operator.

Read on to see how it works and specific requirements around operation. In addition, Kevin shows a way to perform aliasing.

Comments closed

Identifying Rows in sp_wait_for_database_copy)_sync

Jose Manuel Jurado Diaz troubleshoots a problem:

As you could see in this public documentation Auto-failover groups overview & best practices – Azure SQL Database | Microsoft Learn about sp_wait_for_database_copy_sync “sp_wait_for_database_copy_sync prevents data loss after geo-failover for specific transactions, but does not guarantee full synchronization for read access. The delay caused by a sp_wait_for_database_copy_sync procedure call can be significant and depends on the size of the not yet transmitted transaction log on the primary at the time of the call.”

Our customer asked about several scenarios to understand this behaviour and also, verify if there is possible to identify the rows that have not been synced. For this, I developed a POC to test it:

Read on to see what you’d need to do.

Comments closed

FHIR and Azure Health Services

Steve Hughes provides an overview of FHIR and what Azure has to offer:

With the recent updated mandates in the healthcare environment in the United States, Microsoft has continued to expand its capability to support the FHIR standard for integrating healthcare data. While the standard is well documented and Microsoft’s capabilities are expansive, it falls on data professionals to interpret that data and build meaningful reports and produce meaningful insights from the data as it is collected and integrated across environments. This requires a good working knowledge of JSON in SQL to manipulate complex data models. In the session, we did a short review of the FHIR standard and the overall implementation of FHIR in Azure. From there we reviewed the resulting data in the data lake and in Synapse. That was followed up with an overview into the heart of complex SQL using JSON functions in Synapse. Whether or not you are active in healthcare today, this will be an enlightening session on how to use JSON SQL functions within the Azure SQL platforms.

Read on to learn more.

Comments closed

Finding “Stolen” CPU on SQL Server

Ajay Dwiveldi has a public service announcement for us:

The above panel is showing any SQLServer that needs DBA attention. But there is this server highlighted in the above screenshot showing a significant gap b/w OS CPU and SQL CPU. Any gap of over 15-20% b/w OS & SQL CPU consumption is something that should be investigated. This was an indicator that there is some other non sqlservr process that is consuming high CPU on the host.

Friends don’t let friends run anything other than SQL Server on a SQL Server machine.

Comments closed