Press "Enter" to skip to content

Month: September 2021

Partitioning vs Bucketing in Hive

The Hadoop in Real World team explains the difference between partitioning and bucketing in Apache Hive tables:

Now let’s say you also filter the sales record by sku (stock-keeping unit aka. barcode)  in addition to sale_date and country. Creating a partition on sku will result in many partitions which is not ideal as it might result in uneven and smaller partitions.

Hadoop is not efficient in processing small volumes of data. There is a better way.

Read on to understand when each technique makes sense.

Comments closed

Databricks Notebook Discovery via Notebooks

Darin McBeth creates a meta-noterbook to keep track of notebooks:

Elsevier has been a customer of Databricks for about six years. There are now hundreds of users and tens of thousands of notebooks across their workspace. To some extent, Elsevier’s Databricks users have been a victim of their own success, as there are now too many notebooks to search through to find some earlier work.

The Databricks workspace does provide a keyword search, but we often find the need to define advanced search criteria, such as creator, last updated, programming language, notebook commands and results.

Interestingly, we managed to achieve this functionality using a 100% notebook-based solution with Databricks functionalities. As you will see, this makes it easy to set up in a customer’s Databricks environment.

Read on to see how.

Comments closed

Using Query Labels in Azure Synapse Analytics

Gauri Mahajan shows one of the pieces of functionality in Azure Synapse Analytics dedicated SQL pools that I’d like to see on-premises:

Azure Synapse supports a concept known as “query labels” that allows tagging any DDL or DML queries that are executed on the dedicated SQL pool. These labels can be queried using the dynamic management views (DMVs). One can use these labels to describe the purpose of the query or add any metadata to the query being executed and the same can be used later for instrumenting the queries, specifically to identify the queries that meet the desired search criteria. Let’s walk through a step-by-step exercise to understand this concept practically.

Click through for the process.

Comments closed

Setting File Permissions inside a SQL Server Container

Anthony Nocentino wants to set some permissions:

This post will walk you through setting file permissions on database files copied into a container. The SQL Server process sqlservr running in containers runs as the non-privileged user mssql. The appropriate permissions on files are needed, so the SQL Server process has the proper access to any database files, log files, and backup files.

Click through for the process.

Comments closed

SQL Server Environment Variable Files in Containers

Andrew Pruski shows how you can pre-set environment variables when building a SQL Server on Linux container:

Do we really want to be typing all that out every time we run a container? Ok, we could drop this into a script and execute that but another option is to use environment variable files.

Nice and simple, we take all the environment variables in the statement above and drop them into a file on our Docker host: –

Read on and gauge for yourself how nice and simple it is.

Comments closed

The Limitations of ORMs

Erik Darling gives us some hints on when it might be time to stop using that ORM:

There are, unfortunately, some times when developers refuse to put the ORM down.

I mean, it’s mostly unfortunate for them, because they’ll continue to have performance problems.

Me? I’ll be okay.

The items in this post are issues I’ve run into constantly when working with people who use ORMs, but don’t spend any time looking at the queries they generate.

An important note is that “stored procedure or ORM” is a false choice—most modern ORMs will allow you to generate objects based off of stored procedures, so you can stick with the ORM for the parts it does well but switch to a stored procedure when things get real. Or just use stored procedures across the board and have your ORM act as an auto-mapper for them. That’s an option too.

Comments closed

Toggling Word Wrap in SSMS

Ronen Ariely shows how to enable word wrap in SQL Server Management Studio:

Line breaking, also known as word wrapping, is breaking the displayed of a section of text into lines so that the text will fit into the available width of the editor. When writing queries this feature is not so useful as breaking the script line may make the query less readable, but when writing long comments this feature become one of the most useful feature. 

This post simply shows you how to use word-wrap by default or add a command button to Toggle Word Wrap – it’s a built-in feature which is less known and if you did not used it yet, then it is time to use the power of word wrap

Because T-SQL is not line or whitespace sensitive, my preference is to break lines well before they hit the point where word wrap makes sense. But if you’re working with some lengthy lines of code or on a low-resolution laptop, this can help a lot.

Comments closed

Synapse vs Snowflake

Travis Manning has a throw-down:

Data warehousing has become a hot topic for most organizations as data volume grows exponentially, and yet the capacity to manually manage it all but diminishes. The ecosystem is replete with options, each with a host of features and integrations. In this article, we will discuss two of the most common (and commonly discussed!) data warehousing services, Azure Synapse and Snowflake Data Warehouse (DW). For this article, we will try to focus on use cases, and which option is appropriate in that context.

Click through for the product comparison. One big difference not covered is pricing uncertainty. If you have a good understanding of the number of executions and computational complexity of your queries, as well as data quantities, Snowflake can be very competitively priced. But what can happen is that the competitive price turns into a much-less-competitive price by the time you’re fully up to speed.

1 Comment

Deploying Synapse Artifacts to a Managed vNet Workspace

Rui Cunha takes us through an Azure Synapse Analytics deployment scenario:

In my previous article, I demonstrated how we could easily use the Synapse Workspace Deployment extension to accomplish this second stage of the process. I’m now coming back to this topic as I realized that many of our customers were reporting difficulties in completing this second stage of their Synapse CICD process because they were failing to deploy Synapse artifacts to a Managed VNET Synapse Workspace.

In this particular scenario, the deployment was failing because their target workspace was not allowing access from public networks.

Fortunately, the answer isn’t “Allow access from public networks.” Click through to see what you can do instead.

Comments closed