Author: Kevin Feasel

Memory Grants for CHAR vs VARCHAR

Erik Darling has a head-slapping moment:

While working with a client recently, we found that someone, at some point in time, probably during the original migration from Access, had chosen CHAR columns rather than VARCHAR columns.

Okay, fine. How bad could it be?

Bad enough that… A whole bunch of columns that only had a single character in them were stored in CHAR(1000) columns.

I like CHAR…well, to be specific, NCHAR. But only when you’ll need exactly that many characters.
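
To see why this stings, here's a minimal hypothetical sketch (table and column names are mine): load both tables with single-character values and compare the memory grants on the sorts.

-- Hypothetical repro: both columns hold a single character per row
CREATE TABLE dbo.FixedWidth (SomeCode CHAR(1000) NOT NULL);
CREATE TABLE dbo.VariableWidth (SomeCode VARCHAR(1000) NOT NULL);

-- Memory grants are sized from the declared width, not the actual contents:
-- a sort on CHAR(1000) reserves 1,000 bytes per row, while VARCHAR(1000)
-- is assumed to be half-full (~500 bytes), so the CHAR grant is roughly double
SELECT SomeCode FROM dbo.FixedWidth ORDER BY SomeCode;
SELECT SomeCode FROM dbo.VariableWidth ORDER BY SomeCode;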

TempDB Configuration Tips

Jeff Iannucci adds on to tempdb configuration advice:

Step 1: Provision a separate drive. Put your TempDB away from your precious user data files, as well as the other system databases. This probably isn’t your job, so ask your friends on your infrastructure team to provision and mount a T: drive or whatever of a size that you gauge appropriately. What’s appropriate? Hey, I don’t know your system – that’s why they pay you the big bucks, friend.

Read on for the full set of tips.
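
Once that T: drive exists, pointing the existing tempdb files at it is a quick bit of T-SQL. A minimal sketch, assuming the default logical file names (the paths are illustrative, and the moves take effect after the next service restart):

-- Re-point the existing logical files at the new drive
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');
-- Restart the SQL Server service for the new locations to take effect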

Creating a CSV File from a Table via BCP

Kenneth Fisher shows how to use the bcp command to create a file from a table:

This is a pretty handy little tool in your arsenal. I’ve talked about using bcp to transfer data from one instance to another before, and this is another really great use for bcp. If you haven’t used it before, bcp stands for Bulk Copy Program and is a command-line tool for transferring data into and out of SQL Server. In this case you can use this command to generate a CSV file from DBName.SchemaName.TableName:

Click through to see the command, as well as some helpful hints.
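
As a rough sketch of the shape the command takes (server name, table, and output path are placeholders; -c exports character data, -t, sets a comma as the field terminator, and -T uses Windows authentication):

bcp DBName.SchemaName.TableName out C:\Temp\TableName.csv -S ServerName -T -c -t,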

Improving Parquet External Table Performance in Dedicated SQL Pools

Jovan Popovic shows us two ways of accessing data in Parquet files in Azure Synapse Analytics dedicated SQL pools:

Azure Synapse Analytics enables you to read Parquet files stored in Azure Data Lake Storage using the T-SQL language and high-performance Parquet readers. The key characteristic of these high-performance Parquet readers is that they use native (C++) code to read Parquet files, unlike the existing PolyBase Parquet reader technology, which uses Java code. These native readers were introduced in the serverless SQL pools in Azure Synapse Analytics workspaces.

In many experiments, this native technology used in the serverless SQL pools has demonstrated better performance than the existing PolyBase external tables in the dedicated SQL pools.

This native technology for reading Parquet files is now also available in the dedicated SQL pools. In the dedicated SQL pools in Azure Synapse Analytics, you can create external tables that use native code to read Parquet files and improve the performance of queries that access external Parquet files.

Click through for the process, as well as what kind of performance differences you can see. Some of the queries ended up being worse for native tables versus PolyBase tables, but the majority were a good bit better.
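
Purely as a sketch of the moving parts (storage account, container, and column list are placeholders, and the exact option set is the part Jovan's post covers):

-- Illustrative only: defining the external data source without TYPE = HADOOP
-- (the PolyBase marker) is reportedly what opts in to the native Parquet reader
CREATE EXTERNAL DATA SOURCE NativeParquetSource
WITH (LOCATION = 'https://myaccount.dfs.core.windows.net/mycontainer');

CREATE EXTERNAL FILE FORMAT NativeParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.SalesParquet
(
    SaleID bigint,
    SaleAmount decimal(18, 2)
)
WITH (LOCATION = '/sales/',
      DATA_SOURCE = NativeParquetSource,
      FILE_FORMAT = NativeParquetFormat);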

Accessing ADLS Gen2 Data in Serverless SQL Pools with SAS Tokens

Neel Ball shows how you can use various techniques, including SAS tokens, to access data stored in Azure Data Lake Storage Gen2 from Azure Synapse Analytics serverless SQL pools:

You have a data lake that contains employee and social feed data. The data resides in an employee folder that is used by HR team members, and a twitter folder with live social feeds that is usually used by marketing folks. If you use a SAS token or RBAC alone, you cannot control access down to the folder level.

How do you allow users to perform data exploration using Synapse serverless with fine-grained control on the underlying storage?

Read on for one solution.
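
For reference, the serverless-side plumbing for a folder-scoped SAS token is a database scoped credential tied to an external data source. A minimal sketch (account, container, and token are placeholders):

-- A master key must exist before creating a database scoped credential
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- The SAS token here would be issued for the employee folder only
CREATE DATABASE SCOPED CREDENTIAL EmployeeSasCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = 'sv=...';  -- placeholder token

CREATE EXTERNAL DATA SOURCE EmployeeData
WITH (LOCATION = 'https://myaccount.dfs.core.windows.net/datalake/employee',
      CREDENTIAL = EmployeeSasCredential);

-- Queries through this data source can only reach what the SAS token allows
SELECT TOP 10 *
FROM OPENROWSET(BULK 'hr/*.parquet',
                DATA_SOURCE = 'EmployeeData',
                FORMAT = 'PARQUET') AS hr_data;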

Using Azure Purview to Catalog S3 Datasets

Gauri Mahajan crosses the clouds:

Data exists in various formats and is hosted in equally varied repositories depending on the format of the data. With the advent of the cloud, generally, every public cloud provider like Amazon AWS, Microsoft Azure, and Google Cloud Platform provides a variety of data hosting avenues. A large-scale enterprise is very likely to have a multi-cloud footprint, where data is hosted on more than one cloud.

As the data footprint on the cloud becomes larger, the need for a data catalog becomes increasingly important. Each cloud provider offers its own format of cloud-native data catalog. But even after employing a cloud provider-specific catalog, an enterprise may struggle to get a 360-degree view of data on the cloud. The reason being, each cloud provider provides the ability to catalog data hosted on its own cloud only, which compels the end-users to reconcile or reference data from multiple catalogs, as a data catalog on one cloud may not integrate with cross-cloud data hosting services.

As client needs evolve, so do the cloud services. Azure Purview is the brand-new data catalog and governance-related service on the Azure cloud. It has introduced features to support cataloging data hosted on AWS Simple Storage Service (S3), which is typically considered the storage layer of a data lake on AWS.

Read on to see how.

When Expressions Beat Local Variables

Erik Darling talks about a pet peeve of mine:

I see people writing stored procedures that use local variables either because they read that all functions in a where clause are dangerous, or because they need to perform a calculation on one of the parameters that get passed in.

In nearly all cases, you’re better off… not doing that.

Here are a couple examples of why.

I think a lot of this stems from advice to avoid non-SARGable predicates, advice which misses the idea that expressions of the sort Erik shows are perfectly cromulent.
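
For the unfamiliar, here's a minimal sketch of the pattern (the dbo.Posts table is hypothetical): both predicates are SARGable, since the column itself is untouched, but the local variable hides the value from the optimizer at compile time.

-- With a local variable, the optimizer can't see the value when it compiles
-- the plan and falls back to a density-based cardinality guess
DECLARE @StartDate datetime2 = DATEADD(DAY, -30, SYSDATETIME());

SELECT COUNT(*)
FROM dbo.Posts              -- hypothetical table
WHERE CreationDate >= @StartDate;

-- The same logic as an inline expression: still SARGable, and the optimizer
-- can evaluate the expression to produce a proper cardinality estimate
SELECT COUNT(*)
FROM dbo.Posts
WHERE CreationDate >= DATEADD(DAY, -30, SYSDATETIME());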

Creating a dacpac for a Dedicated SQL Pool

Kevin Chant shows us how to build out a dacpac file for an Azure Synapse Analytics dedicated SQL Pool:

In reality, you can create a dacpac for a database that’s inside an Azure Synapse Analytics dedicated SQL Pool using a lot of the methods that you use to create them for SQL Server databases.

Azure Data Studio can be an appealing alternative to SQL Server Data Tools (SSDT) for tasks like this, for various reasons. For instance, it’s a multi-platform solution that is easy to install.

With this in mind, I decided in this post to cover how to create a dacpac for an Azure Synapse Analytics dedicated SQL Pool using Azure Data Studio.

Click through to see how.
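
Kevin demonstrates the Azure Data Studio route; purely as a point of comparison (server and database names are placeholders, and authentication arguments are omitted), the command-line equivalent with SqlPackage looks something like this:

REM Extract the dedicated SQL pool's schema into a dacpac
SqlPackage /Action:Extract ^
    /SourceServerName:myworkspace.sql.azuresynapse.net ^
    /SourceDatabaseName:MyDedicatedPool ^
    /TargetFile:MyDedicatedPool.dacpac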
