Kevin Feasel – Page 306

Comparing pgvector and Postgres ARRAY

Published 2024-04-02 by Kevin Feasel

Ernst-Georg Schmid makes a comp based on a mass spectrometry database:

As said in the introduction, mass spectrometry is one, if not the tool to identify unknown compounds, to quantify known compounds, and to determine the structure of molecules. But it is a lot of work, and you need reference spectra to compare against.

So, there are curated databases of validated spectra available, like MassBank Japan, MassBank Europe and the NIST mass spectral libraries. Laboratories might also want to store their own libraries for future use.

However, such databases often come in their own formats and with their own retrieval software. If you need to efficiently connect spectra to other data, e.g. chemical structures or genomic data, this calls for central management and a common API.

Read on to see the comparison of the pgvector extension versus built-in functionality with ARRAY.

Comments closed

Finding Duplicate Statistics in SQL Server

Published 2024-04-02 by Kevin Feasel

Jose Manuel Jurado Diaz searches for clones:

Some time ago, we encountered a support case where a customer experienced significant delays in updating auto-created and user-created statistics. I would like to share the insights gained from this experience, including the underlying causes of the issue and the potential solutions we identified and implemented to address the problem effectively.

Read on for a demo to set up the scenario and the cause of the problem, as well as how to fix it.

Comments closed

Maintaining Dynamic IP Rules for Azure Network Security Groups

Published 2024-04-02 by Kevin Feasel

Daniel Hutmacher shares a couple scripts:

Recently, my home ISP has started changing my public IP address. This causes me some headache because I have a couple of Azure Network Security Group rules (think of them as firewall rules) that specifically allow my home IP access to all of my Azure resources. When my home IP changes, those rules have to be updated accordingly.

So I made a PowerShell-based solution to automatically maintain them.

Read on for the process.

Comments closed

Quantile Normalization in R

Published 2024-04-01 by Kevin Feasel

Steven Sanderson has achieved normality:

Before we dive into the code, let’s understand the concept behind quantile normalization. At its core, quantile normalization aims to equalize the distributions of multiple datasets by aligning their quantiles. This ensures that each dataset has the same distribution of values, making meaningful comparisons possible.

This is a bit different from normalizing individual data points in one dataset, as you can see in the post.

Comments closed

Power BI Paginated Reports over Excel and Web Services

Published 2024-04-01 by Kevin Feasel

Chris Webb shares an announcement:

By far the most exciting announcement for me this week was the new release of Power BI Report Builder that has Power Query built in, allowing you to connect to far more data sources in paginated reports than you ever could before. There’s a very detailed blog post and video showing you how this new functionality works here:

Given that this seems to be the spiritual replacement for SSRS, it’s good to see them still working on it.

Comments closed

SSMS 20 Minus Azure Data Studio

Published 2024-04-01 by Kevin Feasel

Erik Darling makes an observation:

I was quite publicly mystified by the coupling of Azure Data Studio into the SQL Server Management Studio installer. They’re different tools for different people.

This isn’t a tirade against Azure Data Studio, nor is it a victory lap since apparently the feedback item I posted was part of the decision to remove it.

This is purely informational, since the announcement that it’s not in there anymore is nearly as quiet as the announcement that it was being included back in SQL Server Management Studio 18.7, and only slightly louder than the availability of a command line switch to skip installing it.

I say this as someone whose primary SQL tool is Azure Data Studio: good. There’s a lot of functionality overlap between the two but it’s better making both of them optional.

Also, I’m subscribed to that same RSS feed Erik mentioned and fully agree with Erik’s assessment of Erin. Even if she is a Michigan fan.

Comments closed

Using Azure SQL Database Elastic Pools

Published 2024-04-01 by Kevin Feasel

Josephine Bush shares some tips on using elastic pools in Azure SQL DB:

Main points about elastic pools

Elastic pools enable you to purchase resources for a pool shared by multiple databases to accommodate unpredictable usage periods by individual databases.

You can create multiple pools on a server, but you can’t add databases from different servers into the same pool.

Pools are well-suited for many databases with specific utilization patterns. These patterns are characterized by low average utilization with infrequent utilization spikes for a given database. Conversely, multiple databases with persistent medium-high utilization shouldn’t be placed in the same elastic pool.

All the databases in the elastic pool share the DTU and storage. This is very apparent when multiple persistent medium-high utilization databases are in the same pool, which is why Microsoft advises against it.

Read on to see information about storage, DTUs, noisy neighbors, and more.

Comments closed

Designing for Direct Lake Mode

Published 2024-04-01 by Kevin Feasel

Paul Turley shares some advice:

Since the introduction of Power Pivot for Excel, SQL Server Analysis Services Tabular, Azure Analysis Services and Power BI; the native mode for storing data in a semantic data model (previously called a “dataset” in Power BI) has been a proprietary file structure consisting of binary and XML files. These file structures were established in the early days of multidimensional SSAS back in 2000 and 2005. When an Import mode model is published to the Power BI service, deployed to an SSAS server or when Power BI Desktop is running, data for the model is loaded into memory where it remains as long as the service is running. When users interact with a report or when DAX queries are run against the model, results are retrieved very quickly from the data residing in memory. There are some exceptions for very large models or when many models in the service don’t all fit into memory at the same time, the service will page some or all of the model in and out of memory to make sure that the most-often used model pages remain in memory for the next user request. But, for argument’s sake, the entire semantic model sits in memory, waiting for the next report or user request.

Rather than the proprietary SSAS file structure, Direct Lake models use the native Delta-parquet files that store structured data tables for a Fabric lakehouse or warehouse in One Lake. And rather than making a copy of the data in memory, the semantic model is a metadata structure that shares the same Delta-parquet file storage. As soon as a report runs against a model, all of the model data is paged into memory which then behaves a lot like an Import mode model. This means than while the model remains in memory, performance should about the same as Import, with a few exceptions.

Read on to see what the capabilities of Direct Lake mode are today, as well as a few design considerations for your Microsoft Fabric architecture.

Comments closed

A Bayesian Approach to CATPCHAs

Published 2024-03-29 by Kevin Feasel

John Cook claims to be human:

I set up a GitHub account for a new employee this morning and spent a ridiculous amount of time proving that I’m human.

The captcha was to listen to three audio clips at a time and say which one contains bird sounds. This is a really clever test, because humans can tell the difference between real bird sounds and synthesized bird-like sounds. And we’re generally good at recognizing bird sounds even against a background of competing sounds. But some of these were ambiguous, and I had real birds chirping outside my window while I was doing the captcha.

You have to do 20 of these tests, and apparently you have to get all 20 right. I didn’t. So I tried again. On the last test I accidentally clicked the start-over button rather than the submit button. I wasn’t willing to listen to another 20 triples of audio clips, so I switched over to the visual captcha tests.

Read on to see how a Bayesian approach to the problem could make things a bit less annoying.

Comments closed

Using the map() Function in purrr

Published 2024-03-29 by Kevin Feasel

Steven Sanderson reads the map():

In the world of data manipulation and analysis with R, efficiency and simplicity are paramount. One function that epitomizes these qualities is map(). Whether you’re a novice or a seasoned R programmer, mastering map() can significantly streamline your workflow and enhance your code readability. In this guide, we’ll delve into the syntax, usage, and numerous examples to help you harness the full power of map().

Click through for examples of how this works in R.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Author: Kevin Feasel