Curated SQL Posts

Pipelines and Jobs in Azure ML

Tomaz Kastrun continues an advent on Azure ML. Day 11 covers pipelines:

A pipeline is a set of instructions (or a workflow) for executing a particular part of a machine learning task. The idea behind pipelines is that they help a team of data scientists and machine learning engineers standardize workflows and incorporate best practices for preparing data, training models, executing the models, and deploying them. Pipelines help build workflows efficiently and in a way that makes them reusable.

The idea is to split a machine learning process into smaller tasks, a multistep workflow, where each step is a separate component that can be developed, upgraded, optimised, configured, automated, and deleted separately. These steps, connected through interfaces, form a workflow.
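The idea of steps connected through interfaces can be sketched in plain Python. This is only an illustration, not the Azure ML pipeline API: the `Step` class, the `run_pipeline` helper, and the three toy steps are hypothetical names invented for this example, and the shared "interface" is assumed to be a dictionary passed from one step to the next.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One independently replaceable unit of work in the workflow."""
    name: str
    run: Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(steps: List[Step], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run the steps in order; each step's output is the next step's input."""
    data = dict(inputs)
    for step in steps:
        data = step.run(data)
    return data


def prepare(data: Dict[str, Any]) -> Dict[str, Any]:
    # Drop missing values from the raw input.
    return {"clean": [v for v in data["raw"] if v is not None]}


def train(data: Dict[str, Any]) -> Dict[str, Any]:
    # A toy "model": just the mean of the cleaned values.
    clean = data["clean"]
    return {"model": {"mean": sum(clean) / len(clean)}}


def score(data: Dict[str, Any]) -> Dict[str, Any]:
    # Use the toy model to produce a prediction.
    return {"prediction": data["model"]["mean"]}


pipeline = [Step("prepare", prepare), Step("train", train), Step("score", score)]
result = run_pipeline(pipeline, {"raw": [1.0, None, 3.0]})
print(result)
```

Because each step only depends on the keys it reads and writes, any one of them could be swapped out or reconfigured without touching the others, which is the reuse argument made above.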

Day 12 makes us get a job:

An Azure ML job executes a task against a specified compute target. This is also how the job is created. By configuring a new job, you can also scale out model training, since both single-node and distributed training are available.

A simple job would be to execute a command in a Docker container. Parameter sweeping can also be executed by specifying it in the job itself.
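To make the parameter sweep idea concrete, here is a minimal sketch in plain Python of how one command template fans out into several concrete commands. It does not call Azure ML: the `expand_sweep` helper, the `train.py` script, and the `lr` and `epochs` parameters are assumptions made up for illustration, and an exhaustive grid is only one possible sweep strategy.

```python
from itertools import product
from typing import Dict, List


def expand_sweep(command_template: str, search_space: Dict[str, list]) -> List[str]:
    """Return one concrete command per combination of parameter values."""
    names = sorted(search_space)  # fixed order so the output is deterministic
    commands = []
    for combo in product(*(search_space[name] for name in names)):
        params = dict(zip(names, combo))
        commands.append(command_template.format(**params))
    return commands


# Hypothetical training command and search space.
commands = expand_sweep(
    "python train.py --lr {lr} --epochs {epochs}",
    {"lr": [0.01, 0.1], "epochs": [5, 10]},
)
for cmd in commands:
    print(cmd)
```

In a real job each of these commands would run as its own trial on the compute target; here they are only printed.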

DCountIf in KQL

Robert Cain continues a series on KQL:

In the previous post of this series, Fun With KQL – DCount, we saw how to use the dcount function to get an estimated count of distinct values in an incoming dataset.

It’s common though to want to filter out certain rows from the count. While you could do the filtering before getting to the dcount, there’s an alternative function that allows you to do the filtering right within it: dcountif.
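The semantics of filtering inside the distinct count can be sketched in Python. This is an illustration of the idea only, with two stated assumptions: the `dcountif_exact` helper and the sample rows are hypothetical, and the sketch computes an exact count, whereas dcount in KQL returns an estimate.

```python
from typing import Callable, Dict, Iterable


def dcountif_exact(
    rows: Iterable[Dict[str, str]],
    column: str,
    predicate: Callable[[Dict[str, str]], bool],
) -> int:
    """Count distinct values of `column` among rows where `predicate` holds."""
    return len({row[column] for row in rows if predicate(row)})


# Hypothetical sample data.
rows = [
    {"Computer": "web01", "CounterName": "% Free Space"},
    {"Computer": "web01", "CounterName": "% Free Space"},
    {"Computer": "web02", "CounterName": "% Processor Time"},
    {"Computer": "db01", "CounterName": "% Free Space"},
]

# Distinct computers that reported a "% Free Space" counter.
free_space_computers = dcountif_exact(
    rows, "Computer", lambda r: r["CounterName"] == "% Free Space"
)
print(free_space_computers)
```

The predicate is applied row by row before the values are de-duplicated, which is the same result as filtering first and then taking a distinct count.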

Read on to learn more about how this function works, as well as several useful examples.

Sharing Excel Reports Connected to Power BI Datasets

Chris Webb makes us take a step back and think:

I love Excel as an alternative to Power BI reports, especially now that Excel Online allows you to interact with browser-based Excel reports that use Power BI datasets or the Excel Data Model (aka Power Pivot) as a source. After all, why try to recreate a PivotTable in a Power BI report when you can give your users an actual PivotTable? You can publish Excel reports to Power BI and view them there; however, if you want to share these reports using OneDrive for Business, outside Power BI, you’ll need to think carefully about how you do so before you allow other people to view the report.

Click through for more information.

Disabling Inline Database Log Flushes

Bob Dorr talks about a trace flag:

SQL Server optimizes database log file flush requests, performing these flush requests inline on the active worker. Certain patterns of log record activity may encounter increased spinlock contention while performing the log cache block flush activities.

Trace flag -T8904 (startup only trace flag) disables inline log flush, limiting the contention possibility from many workers to the subset of background LogWriter workers. When the trace flag is enabled, the worker adding log records marks the log cache block to be flushed and a background LogWriter worker performs the flush activity.

Read on to learn more, including a call-out to Curated SQL favorite Lonny Niederstadt.

Changes to the Power BI Datamarts UI

Dennes Torres notices the difference between the two pictures:

I wrote an article about Power BI Datamarts which was released together with the feature, at the exact moment it was being announced at a conference. Of course, this feature has evolved since that time.

I would emphasize the performance. The general UI performance seems way better than when it was first released. Let’s talk about some more news on Power BI Datamarts, especially on the UI.

Read on for an overview of those changes.

Encryption Features in SQL Server

Matthew McGiffen gives us a high-level overview of what’s available in terms of encryption in SQL Server:

SQL Server has had the native ability to encrypt data since SQL Server 2005. This included functionality that could be used to encrypt individual items and columns of data as well as the Transparent Data Encryption (TDE) feature which was available with the enterprise edition of SQL Server and could be used to encrypt all data where it is stored on disk. We then didn’t see significant additions to the features available for encryption until 2016 when Microsoft added Always Encrypted for column encryption. In SQL Server 2019 Microsoft made TDE available in standard edition, and also in SQL Server 2019 the ability to use enclaves was added to Always Encrypted to improve the available functionality for interacting with encrypted data. Finally in SQL Server 2022 further enhancements were made to the set of functionality available when working with Always Encrypted with enclaves.

Click through for a table and some details.

C# Text Classification via ML.NET 2

Matt Eland tries out ML.NET 2.0:

Recently ML.NET 2.0 was released, giving us a bevy of new features for the open source machine learning library for dotnet applications.

The release improved ML.NET’s text processing capabilities and improved some aspects of the already fantastic automated machine learning capabilities it had. Moreover, the release seemed to reaffirm ML.NET’s determination to be relevant for advanced machine learning tasks, including deep learning and transformer-based architectures.

In this article we’ll explore ML.NET 2.0’s new text classification capabilities and see how you can use C# to analyze sentiment, match utterances to intents, or otherwise classify textual data without having to write a lot of custom code.

Read on to learn more about ML.NET and plenty of turtles.

Running Python Code from R via Reticulate

Rick Pack crosses the streams:

I wanted a REPL (read-evaluate-print-loop) so that I could quickly experiment with Python without, for the moment, leaping over what some consider one of the biggest hurdles to Python usage: Work environment set up.

The reticulate R package by Posit enables the use of Python while working within the RStudio IDE. One can find a Posit tutorial here.

Read on for Rick’s notes.

Working with the AML Python SDK

Tomaz Kastrun continues a series on Azure Machine Learning. Day 9 takes us through a piece of the Python SDK:

The Python SDK namespace is azureml.core.environment. Environments specify the set of Python packages, environment variables, and software settings around your training and scoring scripts. In addition to Python, you can also configure PySpark, Docker and R for environments.

You can use the Environment namespace (or a created object/asset) to make deployments and code reusable for training with given Docker images, configurations, and compute types.
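As a rough sketch of what defining such an environment might look like, the snippet below keeps the settings in a plain dictionary and only imports the SDK inside the function that needs it. The environment name, package list, and variable values are made-up examples, and `build_environment` assumes the v1 `azureml-core` package is installed; it is not taken from the linked post.

```python
from typing import Any, Dict


def example_settings() -> Dict[str, Any]:
    """Hypothetical environment definition: packages plus environment variables."""
    return {
        "name": "demo-training-env",
        "pip_packages": ["scikit-learn", "pandas"],
        "environment_variables": {"EXAMPLE_LOG_LEVEL": "INFO"},
    }


def build_environment(settings: Dict[str, Any]):
    """Build an azureml.core Environment from the settings above.

    Assumes the v1 SDK (azureml-core) is installed; the imports are kept
    inside the function so the rest of this sketch runs without it.
    """
    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies

    env = Environment(name=settings["name"])
    env.python.conda_dependencies = CondaDependencies.create(
        pip_packages=settings["pip_packages"]
    )
    env.environment_variables = settings["environment_variables"]
    return env


settings = example_settings()
print(settings["name"], settings["pip_packages"])
```

Where the SDK is available, `build_environment(settings)` would return an object that can be reused across training runs, which is the reuse point made above.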

Day 10 shows us how to work with the Python SDK via VS Code or a local Jupyter notebook:

Let’s continue to explore the power of the SDK and its namespaces. We will look into the namespace that helps you connect to Azure ML resources with the Python SDK.

Managing Azure Log Analytics Data Access

Bruno Gabrielli wants to limit data access to Log Analytics:

I am back with another important topic arising from my customers’ visits. How can I give very specific access to Log Analytics data, whether they be Security or Monitoring data?

Tricky one, isn’t it? A very simplistic answer could be: “manage your access list through IAM on the workspace”, but this is not enough. Say, for instance, that you would like to give scoped access to data coming from specific resources or, even more complicated, that for the same resource you would like one team to see some info and another team to see all the rest.

Looks complicated, but hey … good news: this is doable.

Read on to learn how.
