
Author: Kevin Feasel

Merging ADX Queries in Power BI

Dany Hoter joins two Azure Data Explorer tables using Power Query:

The merge operation (Table.NestedJoin) is the M language equivalent to creating relationships between tables in the model.

The resulting ADX operation is join.

You can join ADX tables by writing KQL, by using relationships or by merging queries in Power Query.

In this article I’ll show how to use merge in a way that produces efficient KQL queries without the need to write any KQL syntax.

Click through for the process.

Comments closed

Finding the Latest in the PowerShell Gallery

Jeffrey Hicks does some marketing:

It is easy to search the PowerShell Gallery online and from a PowerShell prompt. But I felt there could be more. To address this “marketing” need, I’ve started PSGallery Report.

This is not a PowerShell module. I have set up a process to generate daily reports on what has recently been published in the PowerShell Gallery. I’m generating a set of reports in markdown and posting them to GitHub.

This is an interesting idea, though you might need to browse to yesterday’s version of the code (if you’re reading this today, 2022-04-11) because the latest run appears to have removed everything.


KEEP PLAN: A Not-So-Useful Query Hint

Tom Zika digs into a query hint:

Like Dwarves of Moria, I delved too greedily and too deep. I found the answer in a Plan Caching and Recompilation in SQL Server 2012 whitepaper before trying it out on my own. Relevant excerpt:

KEEP PLAN

The KEEP PLAN query hint changes the recompilation thresholds for temporary tables, and makes them identical to those for permanent tables. Therefore, if changes to temporary tables are causing many recompilations, this query hint can be used.

But since it doesn’t show the proof, I decided to test it out anyway.

And it’s a good thing, too, as it turns out this isn’t quite how things work anymore.


Saving Space with 6NF in SQL Server

Aaron Bertrand has a two-parter. Part one sets up the problem:

We often build logging or other insert-only tables where we store large strings like URLs, host names, or error messages. It’s usually not until the table has terabytes of data that we realize there might have been a better way. If we are logging traffic or exceptions for our own application, it’s likely that we record the same URL, host name, or error message on millions of rows. What if we only had to write that URL or host name or message text once, the first time we saw it? In this tip, I want to share one idea for abstracting away recurring values, reducing storage, and making search queries faster (especially those with wildcards) without requiring immediate changes in the application layer.

Part two maximizes the savings:

In my previous tip, I showed how we can make a growing logging table leaner by moving large, repeating strings to their own dimension tables. The solution there involved an AFTER INSERT trigger and assumed that we could change the applications to recognize the new table structure in relatively short order.

Check out both posts for more details. If you’re confused about my calling this 6NF and Aaron mentioning dimension tables, the answer is that he’s talking about the end result and I’m describing the process.
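As a rough illustration of the "write each string only once" idea from the excerpts, here is a minimal sketch. It is not Aaron's implementation: his posts target SQL Server and use an AFTER INSERT trigger, while this sketch uses Python with an in-memory SQLite database and does the lookup in application code. The table names (urls, request_log) and the log_request helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup ("dimension") table: each distinct URL is stored once.
    CREATE TABLE urls (
        url_id INTEGER PRIMARY KEY,
        url    TEXT NOT NULL UNIQUE
    );
    -- Log table: stores a small integer key instead of the full string.
    CREATE TABLE request_log (
        log_id INTEGER PRIMARY KEY,
        url_id INTEGER NOT NULL REFERENCES urls (url_id)
    );
""")

def log_request(url: str) -> None:
    """Hypothetical helper: record one request for the given URL."""
    # Insert the string only if it has not been seen before.
    conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    (url_id,) = conn.execute(
        "SELECT url_id FROM urls WHERE url = ?", (url,)
    ).fetchone()
    conn.execute("INSERT INTO request_log (url_id) VALUES (?)", (url_id,))

for url in ["https://example.com/a", "https://example.com/b", "https://example.com/a"]:
    log_request(url)

url_count = conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
log_count = conn.execute("SELECT COUNT(*) FROM request_log").fetchone()[0]
```

After the three calls, the log table holds one row per request while the lookup table holds one row per distinct URL, which is where the storage savings come from when the same string repeats across millions of rows.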


Topic Modeling with Python

Sanil Mhatre takes us through topic modeling:

Topic modeling is a powerful Natural Language Processing technique for finding relationships among data in text documents. It falls under the category of unsupervised learning and works by representing a text document as a collection of topics (set of keywords) that best represent the prevalent contents of that document. This article will focus on a probabilistic modeling approach called Latent Dirichlet Allocation (LDA), by walking readers through topic modeling using the team health demo dataset. Demonstrations will use Python and a Jupyter notebook running on Anaconda. Please follow instructions from the “Initial setup” section of the previous article to install Anaconda and set up a Jupyter notebook.

The second article of this series, Text Mining and Sentiment Analysis: Power BI Visualizations, introduced readers to the Word Cloud, a common technique to represent the frequency of keywords in a body of text. Word Cloud is an image composed of keywords found within a body of text, where the size of each word indicates its frequency in that body of text. This technique is limited in its ability to discover underlying topics and themes in the text, because it only relies on the frequency of keywords to determine their popularity. Topic modeling overcomes these limitations and uncovers deeper insights from text data using statistical modeling for discovering the topics (collection of words) that occur in text documents.

Read on for an informative article with plenty of code.
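To make the Word Cloud limitation concrete, here is a small sketch of the keyword counting that a word cloud relies on. It is not code from Sanil's article; the keyword_frequencies helper and the sample sentence are made up for illustration, and it deliberately skips stop-word removal.

```python
import re
from collections import Counter

def keyword_frequencies(text: str) -> Counter:
    """Hypothetical helper: count how often each lowercase word appears."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

sample = "The team shipped the release. The release went well."
freq = keyword_frequencies(sample)
```

In this sample, "the" is the most frequent word and so would be drawn largest, even though it says nothing about what the text is about. Raw frequency alone does not group words into themes, which is the gap that topic modeling approaches such as LDA aim to fill.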


Logging in Python

Daniel Chung shows off the logging module in Python:

Note that now all five messages were output, so the default level that the root logger logs is now “DEBUG.” The log record attributes (such as %(asctime)s) that can be used to format the output can be found in the logging documentation.

Although there is a default logger, we usually want to make and use other loggers that can be configured separately. This is because we may want a different severity level or format for different loggers. 

Next on the todo list is to implement the Reader monad to hide that logging deeper in your code base so that you a) don’t need to see logging code everywhere, and b) don’t forget to include logging in some function.
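The point about separately configured loggers can be sketched with the standard logging module. This is not Daniel's code: the logger names (app.db, app.api) are hypothetical, and the handlers write to in-memory buffers only so the output is easy to inspect; a StreamHandler created without arguments would write to stderr instead.

```python
import io
import logging

# Logger 1: verbose, with timestamps. "app.db" is a hypothetical name.
db_stream = io.StringIO()
db_handler = logging.StreamHandler(db_stream)
db_handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
db_logger = logging.getLogger("app.db")
db_logger.setLevel(logging.DEBUG)
db_logger.addHandler(db_handler)
db_logger.propagate = False  # keep records out of the root logger

# Logger 2: warnings and above only, with a terse format.
api_stream = io.StringIO()
api_handler = logging.StreamHandler(api_stream)
api_handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
api_logger = logging.getLogger("app.api")
api_logger.setLevel(logging.WARNING)
api_logger.addHandler(api_handler)
api_logger.propagate = False

db_logger.debug("connection opened")      # emitted: level is DEBUG
api_logger.debug("request received")      # dropped: below WARNING
api_logger.warning("rate limit is close")  # emitted
```

Each logger keeps its own severity threshold and format, so the same call site style (logger.debug, logger.warning) produces different output depending on which logger it goes through.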


Accessing Twitter’s V2 API via R

Bryan Shalloway dives into the mess known as Twitter:

academictwitteR is probably the most established package that provides a quickstart entry point to the V2 API. However, it requires creating an academic account in Twitter, i.e. the user must be affiliated with a university. I also stumbled onto RTwitterV2 and voson.tcn which both also provide quickstarts on the V2 API, but did not explore these.

Instead I followed the tutorial Getting started with R and v2 of the Twitter API by Twitter Developer Advocate Jessica Garson that uses {httr} to interact more directly with the API. I highly recommend reading her tutorial. The code below is mostly just copied from there but changed to provide an example of getting the usernames of those that liked a tweet.

Read on for a how-to and some notes.
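For readers who want to see the shape of the request, here is a minimal sketch of building (not sending) the call for the users who liked a tweet. It is written in Python rather than the R and {httr} code Bryan uses, and it is not taken from his post. The endpoint path /2/tweets/:id/liking_users and bearer-token header are assumptions based on the V2 API as I understand it, and the tweet ID, token, and liking_users_request helper are placeholders.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed root of the V2 API; check the Twitter developer docs before relying on it.
API_ROOT = "https://api.twitter.com/2"

def liking_users_request(tweet_id: str, bearer_token: str) -> Request:
    """Hypothetical helper: build a GET request for users who liked a tweet."""
    url = f"{API_ROOT}/tweets/{quote(tweet_id, safe='')}/liking_users"
    return Request(url, headers={"Authorization": f"Bearer {bearer_token}"})

# Placeholder values; nothing is sent over the network here.
req = liking_users_request("1234567890", "PLACEHOLDER_TOKEN")
```

Passing the request to urllib.request.urlopen with a real token would perform the call; the JSON response would then need to be parsed to pull out the usernames.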


Tokenizing Results in SSIS

Peter Schott shows off a handy trick:

I’ve worked with SSIS for some time now, but a recent question and post pointed out an expression I’d never used before. We often have a need to split strings and pick out some portion of that string. Sometimes we have a need to pull in everything in the “nth” occurrence of a string. The TOKEN expression can be used to get that particular value. I tested this out by mocking up a really simple package.

Read on to see how.
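The idea of pulling out the "nth" piece of a delimited string can be sketched outside of SSIS as well. The Python function below is only an analogue, not the SSIS expression itself: it assumes that every character in the delimiter string acts as a separator, that empty tokens are skipped, and that an out-of-range occurrence yields no value. Peter's post and the SSIS documentation are the authority on how TOKEN actually behaves.

```python
import re
from typing import Optional

def token(text: str, delimiters: str, occurrence: int) -> Optional[str]:
    """Return the 1-based nth token of text, or None if there is no such token.

    Hypothetical analogue of the SSIS TOKEN expression, under the
    assumptions stated above.
    """
    if not delimiters:
        # With no delimiters, the whole string is the only token.
        return text if occurrence == 1 and text else None
    # Any run of delimiter characters separates two tokens.
    pattern = "[" + re.escape(delimiters) + "]+"
    parts = [part for part in re.split(pattern, text) if part]
    if 1 <= occurrence <= len(parts):
        return parts[occurrence - 1]
    return None

second_word = token("a little white dog", " ", 2)
day_part = token("2022-04-11", "-", 3)
missing = token("a,b", ",", 5)
```

The occurrence argument is 1-based here, so asking for the second token of a space-delimited phrase returns its second word.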


Building a Gantt Chart with Power BI Paginated Reports

Paul Turley unlocks one of my guilty pleasures:

A Gantt chart is a running list of activities with the duration for each displayed as a horizontal bar depicting the beginning and ending day along a horizontal scale. The challenge is that this is not a standard chart type in either Power BI or SSRS/Paginated Reports. Furthermore, project planners may prefer to see activities as rows in the format of a printed page, as you can see in this example:

I like Gantt charts more than is probably healthy. Paul shows a method which isn’t exactly easy but it does the trick.
