Press "Enter" to skip to content

Day: October 24, 2024

Comparing Snowflake vs SQL Server E-Mail Configuration

Kevin Wilkie sends two e-mails:

Today, I want to talk about all the effort that goes into setting up the ability to email in SQL Server and Snowflake.

First is our old friend – SQL Server. I’ll leave this one to the experts at Microsoft. As has been the case over the last few years, they have some great documentation at Learn.Microsoft.com – especially when it comes to SQL Server.

I don’t know anything about sending e-mails via Snowflake (other than what Kevin mentions here), though I imagine much of the difference in complexity comes from SQL Server letting you choose an arbitrary SMTP server, which also means you need an existing SMTP server available for it to use.
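To make the comparison concrete, here is a minimal Python sketch of sending a test message from each platform. The Database Mail profile, the Snowflake notification integration, and all connection details are placeholders you would have to set up first, so treat this as an outline rather than working configuration.

```python
# Rough sketch: sending a test e-mail from each platform via Python.
# The 'DBA Mail' profile, the 'my_email_int' notification integration,
# and every connection detail below are placeholders.
import pyodbc
import snowflake.connector

# SQL Server: Database Mail must already be enabled and a profile configured
# (that profile, in turn, points at an SMTP server you supply).
with pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=msdb;Trusted_Connection=yes;",
    autocommit=True,
) as sql_conn:
    cursor = sql_conn.cursor()
    cursor.execute(
        "EXEC msdb.dbo.sp_send_dbmail @profile_name = ?, @recipients = ?, @subject = ?, @body = ?",
        ("DBA Mail", "person@example.com", "Test from SQL Server", "Hello from Database Mail."),
    )

# Snowflake: no SMTP server of your own; you create an e-mail notification
# integration once and then call SYSTEM$SEND_EMAIL.
with snowflake.connector.connect(
    account="my_account", user="my_user", password="...", warehouse="my_wh"
) as sf_conn:
    sf_conn.cursor().execute(
        "CALL SYSTEM$SEND_EMAIL('my_email_int', 'person@example.com', "
        "'Test from Snowflake', 'Hello from Snowflake.')"
    )
```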

RandomWalker 0.2.0 Release

Steven Sanderson makes an announcement:

In the ever-evolving landscape of R programming, packages continually refine their capabilities to meet the growing demands of data analysts and researchers. Today, we’re excited to announce the release of RandomWalker version 0.2.0, a minor update that brings significant enhancements to time series analysis and random walk simulations.

RandomWalker has been a go-to package for R users in finance, economics, and other fields dealing with time-dependent data. This latest release introduces new functions and improvements that promise to streamline workflows and provide deeper insights into time series data.

Read on to see what has changed.
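RandomWalker is an R package and I won’t guess at its API here, but if you’ve never simulated a random walk, the core idea is just cumulative sums of random steps. A tiny Python sketch of that concept:

```python
# Minimal sketch of the idea behind random walk simulation (not the
# RandomWalker R API): cumulate random steps to build several walks.
import numpy as np

rng = np.random.default_rng(seed=42)
n_walks, n_steps = 5, 100

# Each walk is the cumulative sum of normally distributed steps.
steps = rng.normal(loc=0.0, scale=1.0, size=(n_walks, n_steps))
walks = steps.cumsum(axis=1)

# Simple summary of where each walk ends up.
for i, walk in enumerate(walks):
    print(f"walk {i}: final value = {walk[-1]:.2f}, max = {walk.max():.2f}")
```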

Variable Types in Postman

Huyen Maithi talks variables:

Variables enable you to store and reuse values. Postman is a powerful API development tool that offers a feature known as environment variables. These variables help you work efficiently and collaborate with teammates in testing and development by allowing you to easily manage dynamic values across requests.

Click through for an overview of the types of variables you can create for Postman requests.
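Postman manages these variables for you in its UI and scripts, but a rough Python analogue shows why they’re handy: keep per-environment values in one place and substitute them into every request. The URLs and tokens below are made up.

```python
# Rough Python analogue of Postman environment variables: keep per-environment
# values in one place and substitute them into each request. The URLs and
# tokens here are placeholders, not a real API.
import requests

environments = {
    "dev":  {"base_url": "https://dev.example.com/api", "api_token": "dev-token"},
    "prod": {"base_url": "https://api.example.com",     "api_token": "prod-token"},
}

def get_orders(env_name: str):
    env = environments[env_name]
    # Every request reuses the same variables instead of hard-coded values.
    return requests.get(
        f"{env['base_url']}/orders",
        headers={"Authorization": f"Bearer {env['api_token']}"},
        timeout=30,
    )

response = get_orders("dev")
print(response.status_code)
```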

Dynamically Start a Collection of Child Pipelines in Fabric Data Factory

Andy Leonard continues a series on Microsoft Fabric Data Factory:

In this post, I modify the dynamic parent pipeline from the previous post to explore calling several child pipelines from a single parent pipeline. We will:

  • Clone the child pipeline (twice)
  • Copy the cloned child pipeline id values
  • Clone the dynamic parent pipeline from the previous post
  • Add and configure a pipeline variable for an array of child pipeline ids
  • Add and configure a ForEach
    • Move the “Invoke Pipeline (Preview)” activity
    • Configure the “ForEach”
    • Configure the “Invoke Pipeline (Preview)” activity to use “ForEach” items
  • Test the execution of a dynamic collection of child pipelines

Andy’s got quite a bit in this post, so check it out.
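For a sense of the underlying pattern outside the Data Factory designer, here’s a hedged Python sketch that loops over an array of child pipeline ids and starts each one via what I understand to be the Fabric on-demand job API. The workspace id, pipeline ids, endpoint details, and token are all placeholders, so verify against the official REST documentation before relying on it.

```python
# Rough analogue of the ForEach pattern: loop over an array of child pipeline
# ids and start each one. The REST endpoint is the Fabric on-demand job API as
# I understand it; the workspace id, pipeline ids, and token are placeholders.
import requests

workspace_id = "00000000-0000-0000-0000-000000000000"
child_pipeline_ids = [
    "11111111-1111-1111-1111-111111111111",
    "22222222-2222-2222-2222-222222222222",
]
headers = {"Authorization": "Bearer <access-token>"}

for pipeline_id in child_pipeline_ids:
    url = (
        f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
        f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline"
    )
    response = requests.post(url, headers=headers, timeout=30)
    print(pipeline_id, response.status_code)
```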

Prod Data in Dev

Brent Ozar looks at survey results:

No matter which way you slice it, about half are letting developers work with data straight outta production. We’re not masking personally identifiable data before the developers get access to it.

It was the same story about 5 years ago when I asked the same question, and back then, about 2/3 of the time, developers were using production data as-is.

Brent covers some of the challenges involved, and I can add one more: the idea of environments gets really squishy when talking about data science. My development model still needs production data (unless the dev data has the same structural attributes and data distributions as prod), and I don’t really want to train different models in dev/test/prod because, even with the same input data, many algorithms are stochastic in nature: if I train the same model multiple times, I can end up with different results. And even if I can get the same results by re-running with a consistent seed, that introduces a structural instability of its own, because now I’m relying on a specific seed.
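A quick illustration of that stochasticity, using scikit-learn on synthetic data: the same training set with different seeds produces different accuracy numbers, and pinning one seed just hard-codes one of those outcomes.

```python
# Same data, different seeds: the trained model (and its accuracy) shifts,
# which is the stochasticity point above. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for seed in (1, 2, 3):
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)
    print(f"seed={seed}: accuracy={model.score(X_test, y_test):.4f}")
# Re-running with the same seed reproduces the result, but only because
# we pinned that specific seed.
```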

In short, I agree with Brent: this is a tough nut to crack.

The Power of Pre-Attentive Attributes

Elena Drakulevska is seeing pink elephants:

In a world packed with data, how do you make sure your key points don’t get lost in the noise?

Enter the Pink Elephant Principle—a concept that makes sure your most important elements stand out, like a big pink elephant in the middle of a room. It’s impossible to ignore, and that’s exactly what you want for the critical parts of your report!

The irony here is that “seeing pink elephants” historically described someone so drunk that they were hallucinating. Humor of the term aside, Elena drives home a very important principle: take advantage of pre-attentive attributes so that users see what’s important with the least cognitive effort.
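As a small illustration of the idea (not Elena’s example), here’s a matplotlib sketch where color is the only pre-attentive attribute doing any work: one category gets the “pink elephant” treatment and everything else stays muted. The data is made up.

```python
# One pre-attentive attribute (color) doing the work: the highlighted category
# reads instantly while everything else stays muted. Data is made up.
import matplotlib.pyplot as plt

categories = ["North", "South", "East", "West", "Central"]
values = [42, 58, 91, 37, 49]
highlight = "East"

# Muted gray for everything except the one value we want noticed.
colors = ["#e75480" if c == highlight else "#d3d3d3" for c in categories]

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.bar(categories, values, color=colors)
ax.set_title("Sales by region: East is the story")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
plt.tight_layout()
plt.show()
```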

A Primer on Medallion Architecture in Microsoft Fabric

Kenneth Omorodion builds a warehouse:

Data warehouses are essential components of modern analytics systems, offering optimized storage and processing capabilities for large volumes of data. When integrated with a Lakehouse architecture, you can combine the best of both worlds—structured, schema-enforced data storage with the flexibility and scalability of data lakes. Microsoft Fabric provides an excellent environment for implementing the Medallion Architecture, a design pattern for building efficient data processing pipelines by layering data into bronze, silver, and gold zones.

Click through for the process.
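As a rough sketch of what that layering can look like in code, here’s a minimal PySpark flow from bronze to silver to gold. Table names, columns, and file paths are hypothetical; in a Fabric notebook attached to a Lakehouse, the `spark` session is already provided.

```python
# Minimal bronze -> silver -> gold sketch in PySpark. Table and column names
# are hypothetical; in a Fabric notebook the `spark` session already exists.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land the raw file as-is.
bronze = spark.read.option("header", True).csv("Files/raw/sales.csv")
bronze.write.format("delta").mode("overwrite").saveAsTable("bronze_sales")

# Silver: clean, type, and deduplicate.
silver = (
    spark.table("bronze_sales")
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("order_date", F.to_date("order_date"))
    .dropDuplicates(["order_id"])
    .filter(F.col("amount").isNotNull())
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver_sales")

# Gold: business-level aggregate ready for reporting.
gold = (
    spark.table("silver_sales")
    .groupBy("order_date")
    .agg(F.sum("amount").alias("total_sales"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("gold_daily_sales")
```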
