Press "Enter" to skip to content

Curated SQL Posts

Getting Started with SQL Server

Lee Markum helps out people new to SQL Server:

I see a lot of questions on data related Reddit forums and data science groups on LinkedIn about how to get started with SQL. Certainly these people mean that they want to learn the SQL language. I don’t think you can do that long term without setting up a home lab. You need your own place to experiment beyond what you can do in an online tutorial where you’re typing into a web browser, for example.

By the time you finish this post, you will be able to install SQL Server for use as a home lab for learning T-SQL and administration of the server environment.

Click through for a step-by-step guide. There’s a lot to learn after walking through this guide but you’ve got to start somewhere.

Comments closed

Tips for using Synapse Database Templates

James Serra provides some guidance:

I had previously blogged about Azure Synapse Analytics database templates, and wanted to follow-up with some notes and tips on that feature as I have been involved on a project that is using it:

– Purview does not yet pull in the metadata for database templates (table/field descriptions and table relationships). Right now it pulls in the metadata as if it was a SQL table or as if it was a file in ADLS. Both just have the basic information supported by those types. The SQL one is probably preferred

– Power BI does not import the table and field descriptions when connecting to a lake database (where the database templates are stored), but it does import the table relationships. You can see the table descriptions by hovering over the table names in the navigator when importing tables using the “Azure Synapse Analytics workspace (Beta)” connector. Note you are not able to see the table descriptions when hovering over the table names using the “Azure Synapse Analytics SQL” connector. Also note the “Select Related Tables” button does not work in the navigator

Click through for more notes from the field.

Comments closed

T-SQL Tuesday 154 Recap

Glenn Berry summarizes what people are doing with SQL Server 2022:

Back on September 5, 2022, I posted the invitation for T-SQL Tuesday #154 Invitation – SQL Server 2022, which was due on Tuesday, September 12, 2022. I ended up getting eleven blog post responses that I am aware of (including mine). Thank you to everyone who participated! This post will be my T-SQL Tuesday #154 Recap.

Here are the blog posts for #T-SQL Tuesday #154, in alphabetical order by author.

Glenn goes the extra mile by including author photos as well.

Comments closed

Eager Index Spooling

Chad Callihan has a great analogy for eager spools:

Think of a time when you went to someone for help on a problem. Rather than walk you through how to solve the problem, that someone just groans, speeds through solving it for you, and sends you on your way without any explanation.

Did you learn anything? No.

Will you need help again next time that same problem comes up? You bet.

I also like Erik Darling’s explanation that an eager spool is SQL Server’s passive-aggressive way of telling you that you need an index but no, you’re just too busy to create one so I’ll just keep working here all day and do you even call your poor mother anymore?

Comments closed

Azure Data Explorer September 2022 Updates

Shaf Mahmood has a few updates for us:

The  optimized autoscale feature has been in place and using reactive logic. It has been helping ADX users by adjusting the cluster size when there is an increase in resources due to ingestion or query load.  The optimized autoscale feature has been further improved with predictive logic. This logic monitors the same metrics as the reactive logic and over time builds up the cluster usage pattern and uses this to forecast and plan the size of the cluster.  The reactive logic is still used to ensure any forecast anomalies or usage pattern changes are still autoscaled appropriately.

There are a few cost-related updates but also updates to visuals, data ingestion, and more.

Comments closed

Setting a Loading Message for a DataTable in RMarkdown

Thomas Williams wants you to wait patiently:

Waits are inevitable, whether getting data from an API or database, or manipulating data in an interactive R Markdown document. Showing a “loading” or “updating” message is a beneficial incremental improvement to users’ experience.

The code at https://github.com/thomasswilliams/r-markdown-snippets/blob/main/loading-message-datatable.Rmd, when run from RStudio, demonstrates using CSS and pseudo-elements to display text in a DataTables (DT package).

Read on to see an example of how to use this.

Comments closed

Loading the Synapse Data Explorer Pool

Gauri Mahajan loads some event data for analysis:

In my previous article, Getting started with Data Explorer pools in Azure Synapse, we learned how to create Data Explorer pools in Azure Synapse and the unique value that Data Explorer brings to semi-structured and free-text data. The creation of the Data Explorer pool is the first step in the process. After the pool is created, one can create data structures, ingest data and then use it for consumption. Data Explorer pool interface provides different ways of ingesting data into the pool including one-click ingestion.

Some of these techniques are really straightforward. Others—especially if you’re talking about large amounts of data—do require installing and working with local tooling.

Comments closed

Basic Tier Performance in Azure SQL DB

Reitse Eskens keeps things basic:

When you look at the documentation, the basic tier has low CPU (at max less than one), 1 to 4 IOPS per DTU (translating to 5-20 IOPS in total), a latency of 5ms read and 10 ms write and a maximum of 7 days backup retention. Even though it’s advertised as a production database, I wouldn’t store essential data there. However, I can use it for metadata storage in a Data Factory or Synapse Analytics environment.

About the IOPS, one IOP is a read of a disk cluster, usually 4 Kb. In this case, this means that the database is capable of reading 20 to 80 Kb of data per second.

When they say Basic, they aren’t kidding.

Comments closed

Importing Delta Tables into a Synapse Dedicated SQL Pool

Mark Pryce-Maher does a bit of integration:

In June, Databricks announced that they are open sourcing Delta Lake 2.0Delta Lake is quickly becoming the format of choice in data science and data engineering.

To import Delta Lake into a Synapse dedicated SQL Pool you would need Azure Data Factory/Synapse Pipelines or Spark to handle the Delta Lake files.


This is not ideal because it adds extra overheads of complexity, time, and costs.

As an intellectual challenge, I wondered if it’s possible to import Delta Lake files directly into the dedicated SQL Pool and support features like time-travel. It turned out to be a great little project and a great way of learning about Delta Lake.

This turned out to be a bit more difficult than I would have imagined. Click through for the script and check the comments as well for a preview of upcoming attractions.

Comments closed

sp_helpExpandView

Andy Yun cleans up some nested views:

Yeah, that thing was small. But the processing was a horrific case study in T-SQL worst practices. And the architect that created it LOVED nested views (and scalar functions… and MERGE… on Standard edition).

I spent a good amount of effort trying to unravel those as part of my efforts to improve performance, and as a result, decided to create my own community tool to help with unraveling them – sp_helpExpandView.

I also occasionally deal with a group loving nested views and now I’m going to need to use Andy’s stored procedure because tracking that on my own gets really painful.

Comments closed