Press "Enter" to skip to content

Day: July 28, 2023

Cumulative Means in R

Steven Sanderson performs a moving average:

The cumulative mean, also known as the running mean or moving average, provides us with a dynamic view of how the average value of a dataset changes as new observations are added incrementally. It is an invaluable tool in time-series analysis, trend identification, and smoothing noisy data.

Imagine you have a series of numeric values, and you want to find the average of the first observation, then the average of the first two observations, followed by the average of the first three, and so on. This iterative process generates the cumulative mean, painting a picture of how the data behaves over time.

Often times, we care about the moving average over a specific window, such as the last n periods. This particular post covers the moving average over the entire set of data.

Comments closed

Full-Text Search in Cosmos DB via Cognitive Services

Hasan Savran performs a search:

Incorporating Full-Text Search functionality into your application can enable users to locate what they are searching for effortlessly. Searching for specific words or phrases within a database has always been a difficulty, particularly for relational databases. Throughout my career, I’ve had countless discussions/arguments with DBAs about the importance of implementing full-text search in a relational database. We are in totally different times, now users want to search by voice, image, or video.

     Full-Text Search functionality is not part of Azure Cosmos DB’s Database Engine. Firstly, we must establish the Azure Cognitive Search service and link the data from Azure Cosmos DB to the Search Service. The process of setting up Azure Cognitive Search is relatively straightforward. Like other Azure services, you will need to answer similar types of questions beforehand. (Subscription, Resource Group, a name for the service, region, and tier)

By the way, Azure Cognitive Search is very similar to Elasticsearch, for those of you familiar with that technology.

Comments closed

A Primer on Postgres Database Security

Murtaza Umair provides guidance:

Keeping your database up to date with the latest PostgreSQL release is vital in maintaining the security of your database. Once every year, PostgreSQL comes out with a new release, which includes new features, security enhancements, and performance improvements. Each major release is supported for five years, during which PostgreSQL releases quarterly minor updates to fix bugs and patch security issues. The schedule for new updates and more information is given on PostgreSQL’s website, at https://www.postgresql.org/developer/roadmap/

Nothing in this is earth-shattering but it is a solid overview.

Comments closed

Azure SQL MI and the WAF: Performance Pillar

Niko Neugebauer looks at one of the pillars of the Well-Architected Framework with respect to Azure SQL Managed Instance:

baseline is a known value against which later measurements and performance can be compared. Baseline helps us define what is a normal database performance and thus comparing against the baseline provides us with insights into any abnormalities. Ideally, one should take performance measurements at regular intervals over time, even when no problems occur, to establish a server performance baseline. Compare each new set of measurements with those taken earlier.

Click through for additional guidance and recommendations.

Comments closed

Near-Real Time Reports with Power BI and KQL

Dany Hoter can’t wait:

Real time and near real time are subjective terms.

For some businesses, real time is up to 1 ms latency and for other cases 10 minutes latency is considered close to real time.

Lately I encountered a few cases in which Page refresh in PBI was used on a dataset using Direct Query against Kusto/ADX/RTA.

In this article I’ll cover a few best practices to ensure that such an implementation will be successful and conserve on resources.

I’m a bit of a stickler for the term “real-time” so I appreciate Dany’s preface here. The way I learned real-time versus online (versus batch) is, online is when you expect a result quickly but real-time is something you’d put in a fighter plane. And it turns out that, when you explain what the bill will look like, very few companies have the need for true real-time results.

Comments closed