Press "Enter" to skip to content

Author: Kevin Feasel

Searching for Column-Level Metadata

Aaron Bertrand takes us through a few DMVs and catalog views which will help us find SQL Server metadata:

Let’s say we are interested in each column of each table: the ordinal position, the name, the data type, and whether it is nullable. In all currently supported versions of SQL Server (2012 and up), there is a handy function that allows you to get all the columns for a T-SQL query, sys.dm_exec_describe_first_result_set. This function is superior to querying the information from sys.columns, mainly because you don’t have to join to sys.types, or embed complicated logic to deal with translation; for example, CASE expressions that change -1 to max, eliminating types that have synonyms, or cutting nvarchar lengths in half – but only if they aren’t max. (You can see the type of query you end up in an earlier tip, “SQL Server Data Type Consistency.”)

This use of sys.dm_exec_describe_first_result_set() is new to me.

Comments closed

Sorting by Month in Power BI

Cecilia Brusatori shows how you can sort properly by month using Power BI:

Are your visualizations sorting the month field alphabetically? Don’t worry, I’ve been there when I started with Power BI.

If your model has a Dimension Date Table, first congrats, you’re on the right path in what respects to data model, second, you want to use the Month Name from that table, but you noticed that the visualizations don’t sort this column in the way we all wanted, from Jan to Dec.

Click through to see how.

Comments closed

Tracking Security Changes with the Default Trace

Jack Vamvas shows how we can use the default trace with SQL Server to find security-related events:

Question: There were some recent user security changes on a SQL Server login user account. Is there a way to grab this information – via the default trace? I haven’t got a dedicated Extended events monitor set up – so I’m relying on the default trace . Management want to understand the details related to this security change. 
 
Answer: It is possible to report on security changes administered on SQL Server via the default trace. In fact – it’s excellent – except if the default trace configuration is set up to rollover the files.

Read on to see how.

Comments closed

Generating Unique File Names

Slava Murygin gives us unique file names:

That is pretty common task to generate new files with a timestamp in their names.
It gives you ability to easily identify them, sort them and make them pretty unique.
However, if you have a very busy process it is possible that duplicate name will be produced and you might loose some data.

To avoid that situation I’ve came up with following solution.

It is difficult to envision this solution going wrong.

Comments closed

Architecting a Data Lake in AWS

Gaurav Mishra takes us through data lake architecture on AWS:

Landing zone: This is the area where all the raw data comes in, from all the different sources within the enterprise. The zone is strictly meant for data ingestion, and no modelling or extraction should be done at this stage.

Curation zone: Here’s where you get to play with the data. The entire extract-transform-load (ETL) process takes place at this stage, where the data is crawled to understand what it is and how it might be useful. The creation of metadata, or applying different modelling techniques to it to find potential uses, is all done here.

Production zone: This is where your data is ready to be consumed into different application, or to be accessed by different personas. 

This is a nice overview of data lake concepts and worth the read if you’re using AWS. Even if not, the same principles (if not the same technologies) apply for Azure, other clouds, and on-prem.

Comments closed

From Data Lake to Delta Lake

Anmol Sarna explains the benefits of Spark’s Delta Lake:

Traditionally data has been residing in silos across the organization and the ecosystem in which it operations (external data). That’s a challenge: you can’t combine the right data to succeed in a big data project if that data is a bit everywhere in and out of the cloud. This is where the idea – and reality – of (big) data lakes comes from.

data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.

Although the data lakes serve as a central ingestion point for the plethora of data that organizations seek to gather and mine, it still has various limitations or challenges.

Software is made up of tradeoffs. What you gain by dumping relational structure, you lose in dumping relational structure (and this applies in the opposite direction as well).

Comments closed

Thinking Like the SQL Server Engine

Brent Ozar has started a series based on a video of the same name:

7,405 pages is about 15 reams of paper.
You know those 500-page packs of paper that you put into the copier or the printer? (No? Do you remember copiers and printers? Honestly, me neither.) The Users table is one of the smallest tables in the Stack Overflow database export, but it’s still 15 of those packs.

As we work through demos in the upcoming posts, I want you to visualize a stack of 15 reams of paper over in the corner of your room. When I ask you to query the table, I want you to think about how you’d execute that as a human being facing data spread across 15 reams of paper. It’d be a hell of a lot of work, and you wouldn’t be so eager to go grab the first piece of paper to start work. You’d wanna build a really good plan before you go tackle that stack of paper.

That’s a lot of paper.

Comments closed

Installing Kubernetes

Anthony Nocentino has an updated version of his Kubernetes installation guide:

Kubernetes is a distributed system, you will be creating a cluster which will have a master node that is in charge of all operations in your cluster. In this walkthrough we’ll create three workers which will run our applications. This cluster topology is, by no means, production ready. If you’re looking for production cluster builds check out Kubernetes documentation. Here and here. The primary components that need high availability in a Kubernetes cluster are the API Server which controls the state of the cluster and the etcd database which persists the state of the cluster. You can learn more about Kubernetes cluster components here. If you want to dive into Kubernetes more check out my Pluralsight Courses here! Where I have a dedicated course on Installation and Configuration.

In our demonstration here, the master is where the API Server, etcd, and the other control plan functions will live. The workers/nodes, will be joined to the cluster and run our application workloads. 

Read the whole thing.

Comments closed

Keeping Track of ADS Extension Updates

Dave Bland shows how you can find out-of-date extensions in Azure Data Studio:

As you add more and more extensions to Azure Data Studio,  keeping them up to date could present a challenge.  Of course we will want to install all the extensions that we think might be helpful to us.  Luckily Azure Data Studio has a nice way to let us know if there is an extension that is out of date.

Click through to see how.

Comments closed

Bulk Removing Reshare Permissions in Power BI

Gilbert Quevauvilliers takes us through bulk changing dataset permissions in Power BI:

Whilst this sometimes is what is required, I always suggest to people to remove this option and rather manage it by exception.

If you do leave this option enabled, it means if I had to share a report with Bob, Bob would then be able to reshare the report with someone else. This could potentially lead to other people gaining access to a report or data that they should not see.

Fortunately, I found a way to quickly remove the reshare option in the Power BI Service.

Click through to see how it’s done.

Comments closed