Press "Enter" to skip to content

Category: Warehousing

Data Masking in Snowflake

Kevin Wilkie puts on the mask:

The premise behind data masking is fairly simple – make it so that only certain people can see the data in a certain column. For example, social security numbers or credit card numbers probably shouldn’t be seen by everyone who has access to your database.

I like to set it up as follows: (Obviously, this can change depending on your business’ policies.)

  1. Admins have full access.
  2. Call Center Representatives have access to the last 4 digits of the social.
  3. All others see nothing useful.

Read on to see how. I’ll even save my normal “data masking isn’t really a security feature” rant. This time.

Comments closed

Reading Data from Azure Blob Storage in Snowflake

Arun Sirpal explains a common architectural pattern:

Let’s go back to data platforms today and I want to talk about a very common integration I see nowadays, Azure Blob Storage linked to Snowflake via a storage integration which then we can access semi structured files via external tables, it is a good combination of technology I have to say.

Click through for an architecture diagram and example of the code you’d need.

Comments closed

Memoizing Functions with Snowflake

Kevin Wilkie is speaking my language (that is, the language of functional programming):

If you’ve been working with data for several years like I have – mostly using the SQL language – then I have a term for you that other languages, like JavaScript or Python, have had for a few years. The term is “memoizable” and it means, in a nutshell, to remember. A memoizable function caches the results so that it can return the resultset in record time, given the same parameters.

Yeah, it’s a fancy term that basically states, “Instead of calculating the result each time, I’ll just create a lookup table of all possible inputs and what the output is.” It’s really helpful when you have a small number of possible inputs and generating a result takes a while.

Read on to learn more about how this works in Snowflake, including several limitations.

Comments closed

Implementing a Star Schema in a Microsoft Fabric Lakehouse

Nikola Ilic builds a lakehouse:

But, what is a star schema in the first place? I have good and bad news for you:)…The bad news is that I’m not covering it in this article because this one focuses on explaining how to implement a star schema in Fabric Lakehouse (assuming that you already know what star schema is). The good news is: I’ve already written about it, so go and read this article first, if you’re not sure what star schema represents in the world of data modeling…

In one of the previous articles, I also shown how to implement a star schema in Power BI, by leveraging Power Query Editor.

Now, let’s get our hands dirty and build a star schema by using PySpark in the Fabric notebook!

Click through to see how.

Comments closed

Role-Playing Dimensions in Direct Lake

Chris Webb puts on a mustache and changes his shirt really quickly:

Note that the Sales fact table has two date columns, OrderDate and ShipDate.

If you create a DirectLake semantic model using the Web Editor and add these two tables you could rename the Date table to Order Date and build a relationship between it and the OrderDate column on the Sales table:

What about analysing by Ship Date though? You could create a physical copy of the Date table in your Lakehouse and add that to the model, but there’s another option.

Read on for that answer. Interesting that, as of right now, the primary way to do this is with third-party software.

Comments closed

Tracking Python Packages in Snowflake

Kevin Wilkie takes a peek:

When working with one of the many modern computer languages that use libraries, one of the many things to be aware of – as a developer – is the version of the libraries available for your usage.

Since there are multiple languages in Snowflake that use libraries, let’s go over how to check out the versions that come installed and how to install one yourself.

Read on for those answers. Well, one answer and one conundrum.

Comments closed

Discerning a Star Schema from an Existing Report

Kelly Broekstra describes a common flow for business intelligence projects:

I have worked as a business intelligence developer for several years, and I’m always asked: “How do you convert user requirements to a functioning data model?”

I follow the Kimball methodology. For more information, check out the official pages.

But, here are some specific tips on what works for me.

Click through for those tips.

Comments closed

Data Type Changes in Snowflake

Kevin Wilkie makes some changes:

When working with data, I usually have an idea of what type of data I will push into a field. Sometimes, for whatever reason, it is decided to change the type of data allowed in the field. Today, I want to show how that’s done in Snowflake.

Click through to learn how, and how it’s not quite the same as SQL Server.

Comments closed