Press "Enter" to skip to content

Author: Kevin Feasel

Cumulative Means in R

Steven Sanderson performs a moving average:

The cumulative mean, also known as the running mean or moving average, provides us with a dynamic view of how the average value of a dataset changes as new observations are added incrementally. It is an invaluable tool in time-series analysis, trend identification, and smoothing noisy data.

Imagine you have a series of numeric values, and you want to find the average of the first observation, then the average of the first two observations, followed by the average of the first three, and so on. This iterative process generates the cumulative mean, painting a picture of how the data behaves over time.

Often times, we care about the moving average over a specific window, such as the last n periods. This particular post covers the moving average over the entire set of data.

Comments closed

Full-Text Search in Cosmos DB via Cognitive Services

Hasan Savran performs a search:

Incorporating Full-Text Search functionality into your application can enable users to locate what they are searching for effortlessly. Searching for specific words or phrases within a database has always been a difficulty, particularly for relational databases. Throughout my career, I’ve had countless discussions/arguments with DBAs about the importance of implementing full-text search in a relational database. We are in totally different times, now users want to search by voice, image, or video.

     Full-Text Search functionality is not part of Azure Cosmos DB’s Database Engine. Firstly, we must establish the Azure Cognitive Search service and link the data from Azure Cosmos DB to the Search Service. The process of setting up Azure Cognitive Search is relatively straightforward. Like other Azure services, you will need to answer similar types of questions beforehand. (Subscription, Resource Group, a name for the service, region, and tier)

By the way, Azure Cognitive Search is very similar to Elasticsearch, for those of you familiar with that technology.

Comments closed

A Primer on Postgres Database Security

Murtaza Umair provides guidance:

Keeping your database up to date with the latest PostgreSQL release is vital in maintaining the security of your database. Once every year, PostgreSQL comes out with a new release, which includes new features, security enhancements, and performance improvements. Each major release is supported for five years, during which PostgreSQL releases quarterly minor updates to fix bugs and patch security issues. The schedule for new updates and more information is given on PostgreSQL’s website, at https://www.postgresql.org/developer/roadmap/

Nothing in this is earth-shattering but it is a solid overview.

Comments closed

Azure SQL MI and the WAF: Performance Pillar

Niko Neugebauer looks at one of the pillars of the Well-Architected Framework with respect to Azure SQL Managed Instance:

baseline is a known value against which later measurements and performance can be compared. Baseline helps us define what is a normal database performance and thus comparing against the baseline provides us with insights into any abnormalities. Ideally, one should take performance measurements at regular intervals over time, even when no problems occur, to establish a server performance baseline. Compare each new set of measurements with those taken earlier.

Click through for additional guidance and recommendations.

Comments closed

Near-Real Time Reports with Power BI and KQL

Dany Hoter can’t wait:

Real time and near real time are subjective terms.

For some businesses, real time is up to 1 ms latency and for other cases 10 minutes latency is considered close to real time.

Lately I encountered a few cases in which Page refresh in PBI was used on a dataset using Direct Query against Kusto/ADX/RTA.

In this article I’ll cover a few best practices to ensure that such an implementation will be successful and conserve on resources.

I’m a bit of a stickler for the term “real-time” so I appreciate Dany’s preface here. The way I learned real-time versus online (versus batch) is, online is when you expect a result quickly but real-time is something you’d put in a fighter plane. And it turns out that, when you explain what the bill will look like, very few companies have the need for true real-time results.

Comments closed

Role Analysis: DBA, Data Engineer, DB Reliability Engineer

Kendra Little disambiguates roles:

Much has changed: development patterns transformed from Waterfall to Agile, DevOps drives automation and shared ownership of code, and cloud services have made many more kinds of PAAS databases, data lakes, and data lakehouses available to organizations of all sizes.

These changes have introduced new and varied career paths for data folks which have different emphases on skill sets. In this post, I talk through the commonalities and differences between DBAs, Database Reliability Engineers (DBREs), and Data Engineers (DEs). Whether you’re a hiring manager or data professional, it’s worth knowing about these roles.

And a fourth one I’d include is Database Engineer, which I’ve also called a development DBA (versus a production DBA): focus on one platform, like SQL Server, and specialize in code development and tuning on that platform. I’d expect a Data Engineer to be familiar with at least one or two non-SQL programming languages—Python, Scala, Java, C#, F#, pick your poison—but that wouldn’t necessarily hold for a DBE.

Comments closed

Testing Multi-User PBI Row-Level Security in Excel

Gilbert Quevauvilliers performs a test:

I have been doing a fair amount of work on Row Level Security (RLS) where the requirements were quite complex to implement.

Once I had completed the implementation of RLS there were two outstanding items I needed to test.

  1. Make sure that the users are only viewing the data they should see.
  2. Ensure that the performance of a selected user is still as fast as possible.

I would like to mention that this certainly can be done using DAX studio to get the query performance stats, I wanted to test this using Excel.

Gilbert had to do a fair amount here, but there’s a nice walkthrough indicating how to do it all.

Comments closed

Variety in Power BI Bar Chart Designs

Kurt Buhler builds a series of charts from the Power BI bar chart visual:

These are only a few examples; there are many more. Knowing how to create these different bar chart variants is important to address these use-cases. While many of these visuals can be created custom in Deneb or from SVGs, you can also create them in the core visuals. However, it’s important to know the limitations of the core visuals in Power BI, and when this “Macguyvering” of the core visual formatting settings goes too far.

Kurt gives us a template (in pbib format) and explains various options we have around customizing visuals, not just taking what’s built in.

Comments closed

Backup Buffers in SQL Server

Andy Yun continues a series on backup internals:

Welcome to Part 2 of SQL Server Backup Internals. This blog series is a companion piece to my How to Accelerate Your Database Backups for MSSQLTips.com. In Part 1 of this blog series, I introduced the parts of a BACKUP operation.

Now let’s start focusing on performance characteristics. We can impact the performance of a BACKUP operation by making changes to or more of the following:

Click through for that list, as well as an extended analogy on buffer count and max transfer size.

Comments closed