Curated SQL Posts

Performance Tuning a Dedicated SQL Pool

Sarath Sasidharan has some guidance for us:

Synapse Dedicated pools have been battle tested at enterprise customers across the globe. We deal with data in the magnitude of petabytes. Synapse can provide you with the scale of the cloud and the high performance required for your enterprise-grade requirements. The key to maximizing your performance is to follow best practices; check out the best practices for dedicated SQL pools in Azure Synapse Analytics.

Failure to do so causes performance issues. In such scenarios, it is important to understand where the bottlenecks are. This blog focuses on the different steps a query goes through, from the time the query is fired from the client until it returns. A delay in any of the steps would impact the overall run-time of the query and hence indicate degraded performance.

Click through for a walkthrough of each step along the way, potential problems you could run into, and remediations for those problems. Much of the advice is similar to what you’d get with SQL Server, though there are differences interspersed throughout each level.

Working with Remote Jupyter Books in Azure Data Studio

Steve Hughes reaches across the internet:

When working with Azure Data Studio and its support of Jupyter books, you will find there is an option for remote Jupyter books. As shown in the image below, you can open that Jupyter book and follow through the dialogue for a couple of Microsoft books that are readily available.

Click through to see how this option differs from standard Jupyter books (which are themselves different from Jupyter notebooks) and how you can create one.

Data Pipelines and Data Mesh

Jean-Georges Perrin answers a burning question:

I keep having questions about data pipelines. Data pipelines in Data Mesh is a topic I should tackle. So… Is the data pipeline the root of all evil?

Jean-Georges’s answer is quite in line with one of my favorite phrases: “Short answer: no, with an ‘if’; long answer: yes, with a ‘but.’” Read on for some thoughts on data pipelines and what the data mesh concept does to minimize harm.

Creating an Elasticsearch Pipeline

The Big Data in Real World team builds a pipeline:

A pipeline is a definition of a series of processors that are to be executed in the same order as they are declared. 

Think of a processor as a series of instructions that will be executed.

In this post we are going to create a pipeline to add a field named doc_timestamp to all the documents that are added to the index.

Click through for the process. In Elasticsearch, ingest pipelines aren’t for moving data but rather for performing some common operations or tasks prior to indexing the data.
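
As a rough illustration of the idea in the quote, here is a minimal sketch that builds the JSON bodies such a pipeline might use. It assumes a `set` processor that copies the ingest timestamp into `doc_timestamp`; the pipeline name `add-doc-timestamp` and the helper function names are hypothetical, and the linked post may take a different route. Nothing is sent to a cluster; the script only prints the requests you could issue.

```python
import json


def build_timestamp_pipeline(field="doc_timestamp"):
    """Return an ingest pipeline body with a single `set` processor.

    Processors run in the order they appear in the list, so further
    processors would be appended after this one.
    """
    return {
        "description": f"Adds {field} to every incoming document",
        "processors": [
            # {{_ingest.timestamp}} is the ingest metadata timestamp
            # that Elasticsearch exposes to processors.
            {"set": {"field": field, "value": "{{_ingest.timestamp}}"}}
        ],
    }


def build_default_pipeline_setting(pipeline_name):
    """Return an index settings body that applies the pipeline to all writes."""
    return {"index.default_pipeline": pipeline_name}


# Hypothetical pipeline name, chosen only for this sketch.
PIPELINE_NAME = "add-doc-timestamp"

pipeline = build_timestamp_pipeline()
setting = build_default_pipeline_setting(PIPELINE_NAME)

print(f"PUT _ingest/pipeline/{PIPELINE_NAME}")
print(json.dumps(pipeline, indent=2))
print("PUT <your-index>/_settings")
print(json.dumps(setting, indent=2))
```

Setting `index.default_pipeline` is one way to have every document added to the index pass through the pipeline; naming the pipeline on individual index requests is another.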

Role-Based Access Controls in Amazon OpenSearch

Scott Chang and Muthu Pitchaimani show how to assign rights in Amazon OpenSearch to IAM groups:

Amazon OpenSearch Service is a managed service that makes it simple to secure, deploy, and operate OpenSearch clusters at scale in the AWS Cloud. AWS IAM Identity Center (successor to AWS Single Sign-On) helps you securely create or connect your workforce identities and manage their access centrally across AWS accounts and applications. To build a strong least-privilege security posture, customers also wanted fine-grained access control to manage dashboard permission by user role. In this post, we demonstrate a step-by-step procedure to implement IAM Identity Center to OpenSearch Service via native SAML integration, and configure role-based access control in OpenSearch Dashboards by using group attributes in IAM Identity Center. You can follow the steps in this post to achieve both authentication and authorization for OpenSearch Service based on the groups configured in IAM Identity Center.

Click through for the process.
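
To make the group-to-role idea concrete, here is a small sketch that turns a table of IAM Identity Center groups into role-mapping bodies keyed by OpenSearch role. It assumes the SAML integration surfaces group identifiers as backend roles; the group IDs below are made-up placeholders, and `all_access` and `readall` are used only as example role names. The linked post configures this through the console and Dashboards, so treat this as an illustration of the data shape, not the authors' procedure.

```python
import json

# Hypothetical IAM Identity Center group IDs mapped to OpenSearch roles.
GROUP_TO_ROLE = {
    "group-id-admins-0001": "all_access",
    "group-id-analysts-0002": "readall",
    "group-id-auditors-0003": "readall",
}


def build_role_mappings(group_to_role):
    """Group the backend roles (group IDs) under each OpenSearch role."""
    mappings = {}
    for group_id, role in group_to_role.items():
        body = mappings.setdefault(
            role, {"backend_roles": [], "hosts": [], "users": []}
        )
        body["backend_roles"].append(group_id)
    return mappings


role_mappings = build_role_mappings(GROUP_TO_ROLE)

# Print one request per role; nothing is sent to a cluster.
for role, body in role_mappings.items():
    print(f"PUT _plugins/_security/api/rolesmapping/{role}")
    print(json.dumps(body, indent=2))
```

Keeping the mapping in one table means that adding a group to a role is a one-line change, and the least-privilege intent stays easy to review.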

SQL Server 2022 CU2 Released

Srinivas Kandibanda shares the news:

The 2nd cumulative update release for SQL Server 2022 RTM is now available for download at the Microsoft Downloads site. Please note that registration is no longer required to download Cumulative updates.

Click through for a link to get the latest CU, as well as a link leading to notes on what’s in it. One interesting PolyBase-related note is that SQL Server 2022 CU2 finally supports using TNS files when connecting to Oracle databases. That was the norm the last time I semi-seriously used Oracle (quite a while ago), but for PolyBase, you had to specify all connection details separately.

Tips for Power BI Modeling with ADX

Dany Hoter shares some tips on creating star schema models with Azure Data Explorer:

Relationships between DQ tables are created as M:M by default. This is not a problem, and is even recommended, as long as the relationship is single-direction.

Read on for several tips. What’s interesting as I read this is just how radically different the advice is for ADX utilization versus Power BI utilization, such as using strings to join dimensions to facts. That would be heresy in a Kimball-style model and is a common cause of slowdowns in Power BI. Yet that’s the recommendation here for working with ADX, unless I’m misunderstanding Dany’s post.

Using the Log Replay Service to Migrate to Azure SQL MI

Rob Carrol makes a move:

The Log Replay Service (LRS) is a new Azure service that allows you to migrate your databases from SQL Server on-premises, SQL Server on Azure Virtual Machines, Amazon EC2, Amazon RDS for SQL Server, or Google Compute Engine to Azure SQL Managed Instance. LRS is a free cloud service that uses log shipping technology to enable custom migrations of databases from SQL Server 2008 through 2022.

Read on for some configuration options and tips on how to use the service.

Power BI Group By Columns

Marco Russo and Alberto Ferrari bundle things together:

In Power BI you can specify the unique identifier of a column value by using another column or another set of columns. This feature is currently used by the Fields Parameter feature in Power BI, but it may also be used for other purposes in a model. However, there are several limitations – such as the incompatibility with MDX queries – that reduce one’s ability to use the Group By Columns property in many scenarios, so it cannot be used with Excel as a client.

Read on to learn more about how grouping works in Power BI and some of the limitations.

QA Refreshes via CI/CD

Hiram Fleitas rebuilds the QA environment:

In this post I am going to cover how to automatically refresh a lower environment commonly used for testing as part of your release (CD) pipeline.

Well, why? – you may be asking.

  1. In some cases, developers and testers need to test their application code changes against a fresh copy of production-like data. This helps them do validations prior to publishing their changes to production, where their apps are bombarded by end-user live workloads.
  2. Also, the lower environment may be used for testing, and we can’t overwrite the test data constantly. It needs to be a hot-standby refresh, made available when necessary.

Click through for notes on the process.
