Author: Kevin Feasel

Restarting Azure Data Factory Triggers

Andy Leonard provides an after-action report:

During delivery of the class, I popped over to a much older data factory and fired up a couple integration runtimes (IRs). You see, on this older data factory, I trigger a couple pipelines that check to see if I’ve left an IR running. If so, each pipeline will shut down its respective IR. The trigger fires each evening. I blogged about the pipeline design almost two years ago in a post titled Stop an Azure-SSIS Files Integration Runtime (Safely).

Read on for the full report, some takeaways on how to limit the risk, and possible next steps if you find yourself in a situation like Andy did.

Connecting Kafka Cross-Network

Praful Khandelwal sets up a hybrid Kafka cluster:

In this article, we will be talking about a simple set-up involving local machine (macOS) and Azure VM. We’ll discuss the step-by-step procedure to produce events from local machine to Kafka broker hosted on Azure VM and also to consume those events back in local machine. While this does not cover the exact scenario described above, it gives a fair idea about how the Kafka messages can be exchanged across the network.

Kafka is pretty chatty, so I’d hope to have really good network connectivity in place, such as Direct Connect (AWS) or ExpressRoute (Azure).

The Basics of Azure Storage Explorer

Manvendra Singh takes us through Azure Storage Explorer:

This article will explain Azure Storage Explorer, its installation, and details of how to start working with this application to access Azure Storage services. Azure Storage provides a flexible solution to store various types of data at a massive scale in the cloud environment. If you have many storage accounts in Azure Storage, then it will be difficult to manage them. Microsoft has recognized this problem and developed a desktop application, Azure Storage Explorer, to manage Azure Storage accounts easily. It can be installed on Windows, Linux, and macOS operating systems.

This is a rather useful tool.

Merging ADX Queries in Power BI

Dany Hoter joins two Azure Data Explorer tables using Power Query:

The merge operation (Table.NestedJoin) is the M language equivalent to creating relationships between tables in the model.

The resulting ADX operation is join.

You can join ADX tables by writing KQL, by using relationships or by merging queries in Power Query.

In this article I’ll show how to use merge in a way that produces efficient KQL queries without the need to write any KQL syntax.

Click through for the process.

Finding the Latest in the PowerShell Gallery

Jeffrey Hicks does some marketing:

It is easy to search the PowerShell Gallery online and from a PowerShell prompt. But I felt there could be more. To address this “marketing” need, I’ve started PSGallery Report.

This is not a PowerShell module. I have set up a process to generate daily reports on what has recently been published in the PowerShell Gallery. I’m generating a set of reports in markdown and posting them to GitHub.

This is an interesting idea, though you might need to browse to yesterday’s version of the code (if you’re reading this today, 2022-04-11) because the latest run appears to have removed everything.

KEEP PLAN: A Not-So-Useful Query Hint

Tom Zika digs into a query hint:

Like Dwarves of Moria, I delved too greedily and too deep. I found the answer in a Plan Caching and Recompilation in SQL Server 2012 whitepaper before trying it out on my own. Relevant excerpt:

KEEP PLAN

The KEEP PLAN query hint changes the recompilation thresholds for temporary tables, and makes them identical to those for permanent tables. Therefore, if changes to temporary tables are causing many recompilations, this query hint can be used.

But since it doesn’t show the proof, I decided to test it out anyway.

And it’s a good thing, too, as it turns out this isn’t quite how things work anymore.

Saving Space with 6NF in SQL Server

Aaron Bertrand has a two-parter. Part one sets up the problem:

We often build logging or other insert-only tables where we store large strings like URLs, host names, or error messages. It’s usually not until the table has terabytes of data that we realize there might have been a better way. If we are logging traffic or exceptions for our own application, it’s likely that we record the same URL, host name, or error message on millions of rows. What if we only had to write that URL or host name or message text once, the first time we saw it? In this tip, I want to share one idea for abstracting away recurring values, reducing storage, and making search queries faster (especially those with wildcards) without requiring immediate changes in the application layer.

Part two maximizes the savings:

In my previous tip, I showed how we can make a growing logging table leaner by moving large, repeating strings to their own dimension tables. The solution there involved an AFTER INSERT trigger and assumed that we could change the applications to recognize the new table structure in relatively short order.

Check out both posts for more details. If you’re confused about my calling this 6NF and Aaron mentioning dimension tables, the answer is that he’s talking about the end result and I’m describing the process.
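
To make the "write each string once" idea concrete, here is a minimal sketch. It is not Aaron's solution: it uses Python's built-in sqlite3 module rather than SQL Server, does the lookup in application code rather than in an AFTER INSERT trigger, and the table and column names (urls, request_log, url_id) are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: each distinct URL is stored exactly once.
    CREATE TABLE urls (
        url_id INTEGER PRIMARY KEY,
        url    TEXT NOT NULL UNIQUE
    );
    -- Log table: stores a small integer key instead of the full string.
    CREATE TABLE request_log (
        log_id INTEGER PRIMARY KEY,
        url_id INTEGER NOT NULL REFERENCES urls(url_id)
    );
""")

def log_request(url):
    """Record one request, writing the URL text only the first time it is seen."""
    conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    (url_id,) = conn.execute(
        "SELECT url_id FROM urls WHERE url = ?", (url,)
    ).fetchone()
    conn.execute("INSERT INTO request_log (url_id) VALUES (?)", (url_id,))

# Hypothetical sample traffic: the first URL repeats.
for url in ["https://example.com/a", "https://example.com/b", "https://example.com/a"]:
    log_request(url)

url_rows = conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
log_rows = conn.execute("SELECT COUNT(*) FROM request_log").fetchone()[0]
print(url_rows, log_rows)
```

The log table grows by one narrow row per request, while the wide string column grows only when a new value appears.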

Logging in Python

Daniel Chung shows off the logging module in Python:

Note that now all five messages were output, so the default level that the root logger logs is now “DEBUG.” The log record attributes (such as %(asctime)s) that can be used to format the output can be found in the logging documentation.

Although there is a default logger, we usually want to make and use other loggers that can be configured separately. This is because we may want a different severity level or format for different loggers. 
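
As a rough illustration of that last point (not code from Daniel's article), the sketch below configures two named loggers with different severity levels and formats. The logger names, format strings, and the build_logger helper are hypothetical; the logging calls themselves are from the standard library.

```python
import io
import logging

def build_logger(name, level, fmt, stream):
    """Return a named logger with its own handler, level, and format."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep records away from the root logger's handlers
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt))
    logger.handlers = [handler]  # replace handlers left over from an earlier call
    return logger

# In-memory streams stand in for files or the console.
audit_stream = io.StringIO()
debug_stream = io.StringIO()

# Hypothetical logger names and formats, chosen only for illustration.
audit_log = build_logger(
    "app.audit", logging.WARNING, "%(levelname)s:%(name)s:%(message)s", audit_stream
)
debug_log = build_logger(
    "app.debug", logging.DEBUG, "%(name)s [%(levelname)s] %(message)s", debug_stream
)

# Send the same two messages to both loggers.
for log in (audit_log, debug_log):
    log.debug("debug message")
    log.warning("warning message")

print(audit_stream.getvalue(), end="")
print(debug_stream.getvalue(), end="")
```

The audit logger drops the DEBUG record because its level is WARNING, while the debug logger keeps both and formats them differently.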

Next on the to-do list is to implement the Reader monad to hide that logging deeper in your code base so that you a) don’t need to see logging code everywhere, and b) don’t forget to include logging in some function.

Topic Modeling with Python

Sanil Mhatre takes us through topic modeling:

Topic modeling is a powerful Natural Language Processing technique for finding relationships among data in text documents. It falls under the category of unsupervised learning and works by representing a text document as a collection of topics (set of keywords) that best represent the prevalent contents of that document. This article will focus on a probabilistic modeling approach called Latent Dirichlet Allocation (LDA), by walking readers through topic modeling using the team health demo dataset. Demonstrations will use Python and a Jupyter notebook running on Anaconda. Please follow instructions from the “Initial setup” section of the previous article to install Anaconda and set up a Jupyter notebook.

The second article of this series, Text Mining and Sentiment Analysis: Power BI Visualizations, introduced readers to the Word Cloud, a common technique to represent the frequency of keywords in a body of text. Word Cloud is an image composed of keywords found within a body of text, where the size of each word indicates its frequency in that body of text. This technique is limited in its ability to discover underlying topics and themes in the text, because it only relies on the frequency of keywords to determine their popularity. Topic modeling overcomes these limitations and uncovers deeper insights from text data using statistical modeling for discovering the topics (collection of words) that occur in text documents.
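
For a sense of what the frequency-only approach behind a word cloud computes, here is a minimal standard-library sketch. It is not taken from Sanil's article: the sample text, the stop-word list, and the keyword_frequencies helper are all invented for illustration, and it performs no topic modeling at all.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real projects usually take one from an NLP library.
STOP_WORDS = {"the", "a", "and", "is", "of", "to", "was", "but"}

def keyword_frequencies(text, top_n=2):
    """Count non-stop-word tokens and return the top_n most frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Invented sample text standing in for survey responses.
sample = "The team was great. Great team, great communication, but the workload is heavy."
top = keyword_frequencies(sample)
print(top)
```

The counts say which words are common but nothing about which words tend to occur together, which is the gap a model such as LDA is meant to fill.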

Read on for an informative article with plenty of code.
