Warehousing – Page 5

Comparing Snowflake vs SQL Server E-Mail Configuration

Published 2024-10-24 by Kevin Feasel

Today, I want to talk about all the effort that goes into setting up the ability to email in SQL Server and Snowflake.

First is our old friend – SQL Server. I’ll leave this one to the experts at Microsoft. As has been the case over the last few years, they have some great documentation at Learn.Microsoft.com – especially when it comes to SQL Server.

I don’t know anything about sending e-mails via Snowflake (other than what Kevin mentions here), though I imagine a lot of the difference in complexity is that SQL Server allows arbitrary SMTP selection and requires an existing SMTP server.

Comments closed

A Primer on Medallion Architecture in Microsoft Fabric

Published 2024-10-24 by Kevin Feasel

Kenneth Omorodion builds a warehouse:

Data warehouses are essential components of modern analytics systems, offering optimized storage and processing capabilities for large volumes of data. When integrated with a Lakehouse architecture, you can combine the best of both worlds—structured, schema-enforced data storage with the flexibility and scalability of data lakes. Microsoft Fabric provides an excellent environment for implementing the Medallion Architecture, a design pattern for building efficient data processing pipelines by layering data into bronze, silver, and gold zones.

Click through for the process.

Comments closed

Data Masking in Snowflake: Tagging Policies

Published 2024-10-17 by Kevin Feasel

Kevin Wilkie creates some tags:

In our last post, we discussed data masking and setting it up for specific columns. Now, I don’t know about y’all, but I deal with hundreds of tables with several columns in each on a daily basis that truly need data masking. If I have to go through and set up a masking policy one by one, I might go crazier than I normally am.

Let’s say that I have an existing table, Employee, in my database that I want to apply a policy. As I’m sure many of you know, any table with Employee data should have at least some of its data masked. The first thing that we need to do is to set up the tag that we’ll be using for, in our case, a column named Salary.

Read on to learn more about how to create tags and why you might want to.

Comments closed

Implementing Role-Playing Dimensions in Power BI

Published 2024-10-15 by Kevin Feasel

Teo Lachev puts on a mask:

Role-playing dimensions are a popular business requirement but yet challenging to implement in Power BI (and Tabular) due to a long-standing limitation that two tables can’t be joined multiple times with active relationships. Declarative relationships are both a blessing and a curse and, in this case, we are confronted with their limitations. Had Power BI allowed multiple relationships, the user must be prompted which path to take. Interestingly, a long time ago Microsoft considered a user interface for the prompting but dropped the idea for unknown reasons.

Given the existing technology limitations, you have two implementation choices for implementing subsequent role-playing dimensions: duplicating the dimension table (either in DW or semantic model) or denormalizing the dimension fields into the fact table. The following table presents pros and cons of each option:

Click through for that table, as well as some thoughts on viable approaches, including an edge case.

Comments closed

Data Masking in Snowflake

Published 2024-10-09 by Kevin Feasel

Kevin Wilkie puts on the mask:

The premise behind data masking is fairly simple – make it so that only certain people can see the data in a certain column. For example, social security numbers or credit card numbers probably shouldn’t be seen by everyone who has access to your database.

I like to set it up as follows: (Obviously, this can change depending on your business’ policies.)

Admins have full access.

Call Center Representatives have access to the last 4 digits of the social.

All others see nothing useful.

Read on to see how. I’ll even save my normal “data masking isn’t really a security feature” rant. This time.

Comments closed

Reading Data from Azure Blob Storage in Snowflake

Published 2024-10-07 by Kevin Feasel

Arun Sirpal explains a common architectural pattern:

Let’s go back to data platforms today and I want to talk about a very common integration I see nowadays, Azure Blob Storage linked to Snowflake via a storage integration which then we can access semi structured files via external tables, it is a good combination of technology I have to say.

Click through for an architecture diagram and example of the code you’d need.

Comments closed

Memoizing Functions with Snowflake

Published 2024-10-02 by Kevin Feasel

Kevin Wilkie is speaking my language (that is, the language of functional programming):

If you’ve been working with data for several years like I have – mostly using the SQL language – then I have a term for you that other languages, like JavaScript or Python, have had for a few years. The term is “memoizable” and it means, in a nutshell, to remember. A memoizable function caches the results so that it can return the resultset in record time, given the same parameters.

Yeah, it’s a fancy term that basically states, “Instead of calculating the result each time, I’ll just create a lookup table of all possible inputs and what the output is.” It’s really helpful when you have a small number of possible inputs and generating a result takes a while.

Read on to learn more about how this works in Snowflake, including several limitations.

Comments closed

Implementing a Star Schema in a Microsoft Fabric Lakehouse

Published 2024-09-30 by Kevin Feasel

Nikola Ilic builds a lakehouse:

But, what is a star schema in the first place? I have good and bad news for you:)…The bad news is that I’m not covering it in this article because this one focuses on explaining how to implement a star schema in Fabric L akehouse (assuming that you already know what star schema is). The good news is: I’ve already written about it, so go and read this article first, if you’re not sure what star schema represents in the world of data modeling…

In one of the previous articles, I also shown how to implement a star schema in Power BI, by leveraging Power Query Editor.

Now, let’s get our hands dirty and build a star schema by using PySpark in the Fabric notebook!

Click through to see how.

Comments closed

Role-Playing Dimensions in Direct Lake

Published 2024-09-30 by Kevin Feasel

Chris Webb puts on a mustache and changes his shirt really quickly:

Note that the Sales fact table has two date columns, OrderDate and ShipDate.

If you create a DirectLake semantic model using the Web Editor and add these two tables you could rename the Date table to Order Date and build a relationship between it and the OrderDate column on the Sales table:

What about analysing by Ship Date though? You could create a physical copy of the Date table in your Lakehouse and add that to the model, but there’s another option.

Read on for that answer. Interesting that, as of right now, the primary way to do this is with third-party software.

Comments closed

Tracking Python Packages in Snowflake

Published 2024-09-25 by Kevin Feasel

Kevin Wilkie takes a peek:

When working with one of the many modern computer languages that use libraries, one of the many things to be aware of – as a developer – is the version of the libraries available for your usage.

Since there are multiple languages in Snowflake that use libraries, let’s go over how to check out the versions that come installed and how to install one yourself.

Read on for those answers. Well, one answer and one conundrum.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Category: Warehousing