Press "Enter" to skip to content

Month: May 2021

Goals in Power BI

Treb Gatte starts a new series:

Microsoft released a new feature called Power BI Goals to public preview. In this blog series, you’ll learn about using Power BI Goals and Scoreboards to monitor your business, learn ways to structure your Scoreboards, understand how to implement this functionality and be clear on requisite licensing and security questions. I will refer to relevant books and blogs which will benefit you as you implement Power BI Goals.

The image attached to this blog post is a real Scoreboard developed for a manufacturing CEO. It spans the organization and provides a glanceable view of the health of the organization. Many of the due dates are set to the end of the fiscal year as they are annual in nature.

Read on for the first post in the series, which introduces the concept of SMART goals and goals in Power BI.

Comments closed

Securing Amazon Managed Streaming for Kafka

Stephane Maarek has some security advice for us:

AWS launched IAM Access Control for Amazon MSK, which is a security option offered at no additional cost that simplifies cluster authentication and Apache Kafka API authorization using AWS Identity and Access Management (IAM) roles or user policies to control access. This eliminates the need for administrators to run an unfamiliar system to control access to Apache Kafka on Amazon MSK, and learn intricate details and specific commands to manage Apache Kafka access control lists (ACLs).

This is a game-changer from a security perspective for AWS customers who use Apache Kafka: I recommend Amazon MSK customers use IAM Access Control unless they have a specific need for using mutual TLS or SASL/SCRAM authN/Z.

Read on to see how it works.

Comments closed

Finding Jobs Run over a Timeframe

Kenneth Fisher has a query for us:

Short post today. Simple query that will tell you every job that ran between two datetime stamps. Note: this of course assumes that the data exists. Job history is only kept for so long so depending on your settings the data may already have been deleted. Also if a job is currently running it’s first step it’s not going to show up in the history table so obviously it won’t show up in the results for this query.

Click through for the query. It uses CROSS APPLY twice, so I like it twice as much.

Comments closed

Types of Fragmentation on Index Pages in SQL Server

Deepthi Goguri explains what sorts of fragmentation can occur on an index in SQL Server:

Logical Fragmentation occurs when the logical order of the leaf level pages (logical order meaning the next key values in order) no longer the continuous page to the next physical data file page. Because of these pages which are out of order will affect the read ahead mechanism and the scan performance. Because of this logical fragmentation, read ahead have to do smaller read ahead reads.

If the logical fragmentation pages are already in the memory than the read ahead mechanism will not be affected in that case. Logical fragmentation will cause the problem for bigger indexes and not for the smaller ones usually (smaller indexes having pages 1000-5000 pages). You can monitor the amount of the index logical fragmentation by using the DMV sys.dm_db_index_physical_stats.

Read on to learn more about logical fragmentation, as well peers extent fragmentation and low page density.

Comments closed

Creating XML from SQL Server

Barney Lawrence shows off how to build XML from data in SQL Server:

In my experience creating XML is a less common task than reading it with one semi-exception (that being the old trick of using FOR XML PATH(”) to concatenate strings) but it can be an important one. The biggest piece of XML I’ve built was for a government mandated return, it was the only XML I built on that job but a sizeable proportion of the entire department’s work culminated in it. The switch from a set of Excel macros which took over eight hours to run to something that could rebuilt in minutes had a huge impact on our efficiency, particularly when there would be multiple rounds of validation failures against initial attempts at submission.

There are a few variants when it comes to converting queries into XML but in my opinion the only one you really need to know is the XML PATH approach, the others being either fiddly or not providing enough control.

Click through to learn more about the process. FOR XML PATH(): it’s not just for concatenating strings…

Comments closed

Range Filtering in Elasticsearch

The Hadoop in Real World team shows off some Elasticsearch skills:

Filtering based on a range like greater than, less than, greater than equal etc. are pretty common requirements when you work with data. In this post we will see how to perform range based filtering with Elasticsearch.

Knowing the specific syntax makes it easy to follow along. And it does help that Powershell has similar comparison flags with -gt, -gte, and the like.

Comments closed

Functional Data Analysis in R

Joseph Ricker gives us a gentle introduction to a not-so-gentle topic:

This plot might depict 80 measurements for a participant in a clinical trial where each data point represents the change in the level of some protein level. Or it could represent any series of longitudinal data where the measurements are take at irregular intervals. The curve looks like a time series with obvious correlations among the points, but there are not enough measurements to model the data with the usual time series methods. In a scenario like this, you might find Functional Data Analysis (FDA) to be a viable alternative to the usual multi-level, mixed model approach.

This post is meant to be a “gentle” introduction to doing FDA with R for someone who is totally new to the subject. I’ll show some “first steps” code, but most of the post will be about providing background and motivation for looking into FDA. I will also point out some of the available resources that a newcommer to FDA should find helpful.

Read on to learn more.

Comments closed

Selecting Columns Based on Data Types in Power Query

Sandeep Pawar solves an interesting problem:

It’s not uncommon to have a table with hundreds of columns with different column types such as numerical, text, date, percentage etc. You can select columns by using the UI option “Choose Columns” and then selecting which columns to keep. But if you have have tens of columns of the same type, you will have to manually go through the list of columns and select the columns to keep. As far as I know, there is no direct way to only keep, for example, numerical columns. In Pandas, you can use select_dtypes method and pass which columns to choose based on the data types. No such option or function is available in Power Query.

In this example, I will show how I achieved it. You can customize it or turn it into a function based on your use case.

Read on for two classes of technique which do the trick.

Comments closed