Press "Enter" to skip to content

Category: Data

Dynamic Data Masking and Granular Unmasking

Dennes Torres points out a change to dynamic data masking in Azure SQL DB:

Dynamic data mask is a very interesting security feature allowing us to mask critical fields such as e-mail, phone number, credit card and so on. We can decide what users will be able to see the value of these features or not.

This feature faced many flaws when it was released, but I believe it’s stable now, although It’s not the main security feature you should care about, it can still be very useful.

However, until very recently, this feature was not very useful. If you mask many fields in many different tables, the fields may require different permission levels in order to be unmasked.

I agree that this is definitely not a security feature. But hey, at least it’s a bit more useful than it was before.

Comments closed

T-SQL Tuesday 144 Roundup

Victoria Holt recaps T-SQL Tuesday #144:

This month’s T-SQL Tuesday attracted some great responses! Thank you to everyone who participated!

My invitation for this month’s #tsql2sday was 3 fold on sharing your experiences on data governance

– The current cost of data governance versus its benefits

– The amazing things data governance has enabled you to achieve or will enable you to achieve in the future

– The potential uses for Azure Purview within your estates and the automated deployment options for that

Read on for the recap.

Comments closed

The Importance of Data Governance

Rob Farley riffs on another T-SQL Tuesday topic:

But the checks that we do are more about things that the database can allow, but are business scenarios that should never happen.

Plenty of businesses seem to recognise these scenarios all too well, and can point them out when they come across them. You hear phrases like “Oh, we know that’s not right, it should be XYZ instead”. And they become reasons why they don’t really trust their data. It’s a data quality issue, and every time someone comes across a data quality issue, they trust the data a little less.

Click through for Rob’s thoughts.

Comments closed

Generating Artificial Data with Databricks Generator

Ust Oldfield shows off a new tool:

Databricks Labs is a relatively new offering from Databricks which showcases what their teams have been creating in the field to help their customers. As a Consultant, this makes my life a lot easier as I don’t have to re-invent the wheel and I can use it to demonstrate value in partnering with Databricks. There’s plenty of use cases that I’ll be using, and extending, with my client but the one I want to focus on in this post is the Data Generator.

Read on for an example of how this works. Something not in Ust’s post but worth mentioning is that you can control the distribution of random numeric features. That’s a piece of functionality you often don’t see in data generators.

1 Comment

Generating Mock Data for SQL Server

Chad Callihan has a few options for creating fake data:

It’s easy enough to create a handful of records for testing in SQL Server. What if you want 100 rows or 1000 rows? What if you want data that looks more legitimate compared to gibberish? In this post, we’ll look at different ways to generate mock data.

One of the trickiest things about creating mock data is getting the distributions right. For example, ABS(CHECKSUM(NEWID()) is great (just as RAND(CHECKSUM(NEWID())), but the results follow a uniform distribution because of the nature of checksums and random number generators. This makes charting numeric values look unnatural. Here’s an example I put together of generating data off of a normal distribution. It does take more effort, but if you’re generating this fake data to show it to users in tools like Power BI or Tableau, having data follow reasonable distributions is a good thing. That is, use whatever distribution makes sense for the particular data element: uniform, normal, Pareto (power law), gamma, etc.

Comments closed

A Show about Nothing

Joe Celko has a moment of zen:

Human beings are not very good at handling nothing. The printing press didn’t just lead to civilization as we know it, but it also changed our mindset about text. When we wrote text manually on paper, a blank or space was not seen as a character. It was just the background upon which characters were written.

It was centuries before the zero was accepted as a number. After all, it represents the absence of a quantity or magnitude or position; how could it possibly be a number? Before it was accepted as a number, it was considered a symbol or mark in a positional notation to indicate that there was nothing in that position.

It’s an interesting riff, so check it out.

Comments closed

Creating a CSV File from a Table via BCP

Kenneth Fisher shows how to use the bcp command to create a file from a table:

This is a pretty handy little tool in your arsenal. I’ve talked about using bcp to transfer data from one instance to another before and this is another really great use for bcp. If you haven’t used it before bcp stands for Bulk Copy Protocol and is a command line tool for transferring data in and out of SQL Server. In this case you can use this command to generate a csv file from DBName.SchemaName.TableName:

Click through to see the command, as well as some helpful hints.

Comments closed

From Excel to SQL Server via Powershell

Kevin Wilkie combines Excel and dbatools like a mad scientist:

Those ways work great if you’re inserting one tab of an Excel spreadsheet. I don’t know about y’all, but I don’t want to have to do all of this work for each tab – especially if I have 30+ tabs to import into SQL Server (and yes, I’ve been asked to do that quite a few times.)

One of the easiest ways I’ve seen to insert a lot of data (once you’ve made sure that the data is how you want it) is to use the power of PowerShell.

Read on to see how.

Comments closed

Data Hubs, Warehouses, and Lakes

Trevor Legg compares and contrasts data hubs, data warehouses, and data lakes:

Data hubs, data warehouses, and data lakes are significant investment areas for data and analytics leaders and are vital to support increasingly complex, distributed, and varied data workloads.

Gartner finds that 57% of data and analytics leaders are investing in data warehouses, 46% are using data hubs, and 39% are using data lakes. However, they also found that these same data and analytics leaders don’t necessarily understand the difference between the three…

To best support specific business requirements, it’s vital to understand the difference and purpose of each type of structure, and the role it can play in modern data management infrastructure.

Click through for the definitions and comparisons.

Comments closed

Power Query Data Profiling

Ed Hansberry takes us through the data profiler in Power Query:

A solid green bar is usually best. It means there are no issues in that column, as shown in the Discounts field. On the far right in the COGS field, there is a tiny bit of gray. That means there is no data, or a null, in one or more records. The bigger the gray bar, the higher percentage of nulls there are. In the middle though we have a problem. Anything in red is an error and must be fixed. These may cause more errors in further transformations downstream, and will usually cause refresh errors.

Before we address this, let’s get a bit more information about our data. Go to the View ribbon, and there are 3 settings we want to look at.

I really like what the data profiler provides us. If you’re a regular Power BI user, I highly recommend checking it out if you haven’t already.

Comments closed