Kevin Feasel – Page 249

The Internals of DATETIME2

Published 2024-08-09 by Kevin Feasel

I noticed in sys.column_store_segments the min_data_id and max_data_id columns store very large bigint values in the segments for datetime2 columns. After doing a bit more googling and tinkering, I found for bit/tinyint/smallint/int/bigint it stores the min/max of the actual values rather than dictionary lookup values. So I assume it’s likely doing the same for date/time/datetime/datetime2 and storing some sort of bigint representation of the actual value.

This post is going to focus on datetime2(7) datatypes mainly because that’s what I was dealing with. Though I’m sure it wouldn’t be much work to figure out the other types.

Click through to learn more about the datatype and see how this wraps into a discussion of temporal table cleanup and columnstore indexes.

Comments closed

Tips for Hyperparameter Tuning

Published 2024-08-07 by Kevin Feasel

Bala Priya C shares some tips and techniques:

If you’re familiar with machine learning, you know that the training process allows the model to learn the optimal values for the parameters—or model coefficients—that characterize it. But machine learning models also have a set of hyperparameters whose values you should specify when training the model. So how do you find the optimal values for these hyperparameters?

You can use hyperparameter tuning to find the best values for the hyperparameters. By systematically adjusting hyperparameters, you can optimize your models to achieve the best possible results.

This tutorial provides practical tips for effective hyperparameter tuning—starting from building a baseline model to using advanced techniques like Bayesian optimization. Whether you’re new to hyperparameter tuning or looking to refine your approach, these tips will help you build better machine learning models. Let’s get started.

Read on for those techniques. Incidentally, one of my “Old man yells at clouds” takes is that I dislike the existence of hyperparameters and consider them a modeling failure, essentially telling the implementer to do part of the researcher’s work. Knowing that they are necessary to work with for so many algorithms, there’s nothing to do but learn how to work with them effectively, but there’s a feel of outsourcing the hard work to users that I don’t like about the process. For that reason, I have extra respect for algorithms that neither need nor offer hyperparameters.

Comments closed

Chat with Your Own Data in Streamlit and Azure Open AI

Published 2024-08-07 by Kevin Feasel

I have a new video:

In this video, I show how we can make a GPT-4 deployment aware of our own custom data, without needing to fine-tune the model. I talk about meta prompts and the Retrieval Augmented Generation (RAG) pattern, and then show how you can set this up using Azure AI Search and Azure OpenAI. Then, I bring it back to Streamlit and give users the option between chatting with a generic GPT-4 deployment and chatting over custom data.

I try to make my videos 10 minutes in length. They usually end up at 15-18 minutes. This one clocks in at more than 30 minutes and there’s very little fluff.

Comments closed

Filesystem Access for Database Restoration via dbatools

Published 2024-08-07 by Kevin Feasel

Andy Levy shares a lesson learned:

While performing an instance migration this spring, I happened upon something I didn’t expect in [dbatools](https://dbatools.io/). It should have been a simple backup/restore copy of the databases, with the backup files residing on a fileshare on the destination server after being copied there. I kept getting a warning that the backup files I was attempting to restore couldn’t be read, and the restores (via Restore-DbaDatabase) wouldn’t execute.

I checked permissions on the server over and over again. Both on the filesystem and for the share that I was attempting to read from. Even more curious, if I executed the restore database statements directly from within Management Studio, the databases restored without issue.

After doing quite a bit of digging, I managed to find the reason.

Read on to learn more about necessary permissions, as well as the issue Andy hit, as well as the solution.

Comments closed

Managed Private Endpoints and Trusted Workspace Access for All

Published 2024-08-07 by Kevin Feasel

Wolfgang Strasser is very pleased with a recent announcement:

In times of data breaches and millions of customer entries breached, the security of your data platform is one of the things you need to consider upfront and – preferably in all your data solutions.

When Microsoft Fabric was announced the concepts of connecting to other parts of your already secured data platform in Azure was not possible. The options to (securely) connect Fabric to other parts of your Azure platform were not available initially.

Read on to learn more about Managed Private Endpoints and Trusted Workspace Access, the initial problem with them both, and how Microsoft has definitely improved things recently.

Comments closed

Microsoft Fabric Lakehouse Access Control

Published 2024-08-07 by Kevin Feasel

Koen Verbeeck lets us into the lakehouse:

We’re doing a proof-of-concept with Microsoft Fabric, building our data model in a lakehouse. We’d like to give people access to the data inside it so they can build their own reports with whatever tool they want. Is there an easy way to share access to a lakehouse (preferably not by giving access to the entire workspace)?

Read on to learn how.

Comments closed

Reading a Lakehouse Table from another Microsoft Fabric Workspace

Published 2024-08-07 by Kevin Feasel

Gilbert Quevauvilliers spans the gap:

I was doing some work recently for a customer and they had data stored in different Lakehouse’s which was in a different App Workspace.

I was pleasantly surprised that this can be quite easy to do.

In my example below I am going to show you how in my notebook I can read a table in a Lakehouse table when it is not attached to any Lakehouse.

It’s good that this is so easy to do, considering that current advice leans toward having multiple workspaces and not cramming everything into one.

Comments closed

Systematic Sampling in R

Published 2024-08-06 by Kevin Feasel

Steven Sanderson continues a series on sampling:

In this post, we will explore systematic sampling in R using base R functions. Systematic sampling is a technique where you select every (k^{th}) element from a list or dataset. This method is straightforward and useful when you want a representative sample without the complexity of more advanced sampling techniques.

Let’s dive into an example to understand how it works.

In very technical circles, this is also known as the “eenie-meenie-meiney-moe technique” and is very similar to the “duck-duck-goose” algorithm, though that has an additional stochastic input.

Comments closed

Fun(?) with CPU Affinity

Published 2024-08-06 by Kevin Feasel

Rod Edwards plays with fire:

Here we have the options, to set the processor affinity mask for all process, and also the I/O affinity mask. The default is to let SQL handle this automatically as above, as it will provide better performance for most workloads.

On a basic level, these settings restrict which CPUs can be used when restricting SQL server processing and/or SQL IO operations.

There are very specific use cases in which setting CPU affinity makes sense. Rod does a really good job of showing just why mucking with CPU affinity is not for the faint of heart.

Comments closed

Instrumenting Postgres

Published 2024-08-06 by Kevin Feasel

Grant Fritchey goes looking:

I’m still learning PostgreSQL and one of the things I’ve been looking at a lot lately is instrumentation. What do I mean? Well, if you’re in SQL Server, think, Dynamic Management Views (DMV), Extended Events, Query Store, <hack, spit> Trace <spit>. How do we know how long a query took to run? PostgreSQL can tell you, but, and this is one of those wild, cool, but, honestly, slightly frustrating things about PostgreSQL, not natively.

Read on for what Grant’s seen.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Author: Kevin Feasel