Press "Enter" to skip to content

Day: July 12, 2022

Parameterizing Queries with Amazon Athena

Blayze Stefaniak, et al, architect a service to provide data via Amazon Athena:

Customers tell us they are finding new ways to make effective use of their data assets by providing data as a service (DaaS). In this post, we share a sample architecture using parameterized queries applied in the form of a DaaS application. This is helpful for many types of organizations, whether you’re working with an enterprise making data available to other lines of business, a regulator making reports available to your industry, a company monetizing your data assets, an independent software vendor (ISV) enabling your applications’ tenants to query their data when they need it, or trying to share data at scale in other ways. In DaaS applications, you can provide predefined queries to run against your governed datasets with values your users input. You can expand your DaaS application to break away from monolithic data infrastructure by treating data as a product (DaaP) and providing a distribution of datasets, which have distinct domain-specific data pipelines. You can authorize these datasets to consumers in your DaaS application permissions. You can use Athena parameterized queries as a way to predefine your queries, which you can use to run queries across your datasets, and serve as a layer of protection for your DaaS applications. This post first describes how parameterized queries work, then applies parameterized queries in the form of a DaaS application.

Click through to learn how.

Comments closed

Application Names and Database Queries

Tom Zika does not like those missing application names:

Whenever I’m trying to debug a problem using sp_whoisactive or Extended Events (XE) and I see either Core Microsoft SqlClient Data Provider or .Net SqlClient Data Provider, my blood begins to boil.

It means I’ll probably spend hours asking around to try and find the owner. Sometimes knowing the host_name helps, but there can be a multi-purpose host that runs many applications – which one is having the problem?

How do you set the name? Read the post to find out.

Comments closed

Test Those Backups

Kevin Chant comes at us with important advice:

In reality, not being able to restore a database can damage your company’s reputation and you can end up losing colleagues for various reasons. For example, if they ask for restores to be tested and it never happens.

To save your colleagues from a database restore nightmare you must test restoring your database backups at some level. Otherwise, it can lead to serious issues.

As the saying goes, you don’t have a backup until you’ve successfully restored it.

Comments closed

Another Knock against OFFSET/FETCH

Erik Darling won’t play fetch:

The two methods we’re going to look at are OFFSET/FETCH and a more traditional ROW_NUMBER query.

As you may have picked up from the title, one will turn out better, and it’s not the OFFSET/FETCH variety. Especially as you get larger, or go deeper into results, it becomes a real boat anchor.

Click through for the details.

Comments closed

Database Project Versioning and Identification

Eitan Blumin answers an important question:

“What is SSDT“, you ask? Oh, you didn’t? Well, let me tell you anyway! SSDT is the go-to solution from Microsoft for versioning SQL Server databases and performing state-based deployments (and it’s free!). It has many useful capabilities for developing and publishing changes from your SQL Database project to your SQL Database in production (or wherever).

One of the things that are not so clear about SSDT specifically and database versioning in general, is how should one identify which “version” of your database project was last deployed to your server?

Eitan includes several ways of tracking and controlling database versions.

Comments closed

Join Removal Due to Foreign Key Constraint

David Alcock shows a performance benefit from having a foreign key constraint in place:

Foreign keys are used in database design to enforce referential integrity but they also have some performance benefits as well that you might not necessarily notice unless you’re looking into your execution plans.

Let’s take the following query using the AdventureWorks2019 sample database where I’m selecting the BusinessEntityID and JobTitle from the HumanResources.Employee table and by using an inner join I’m only returning rows that have matching values (BusinessEntityID) in both tables:

There are specific rules to table elimination but if you meet the criteria, it can save you a bit of CPU time and I/O.

Comments closed

Gateways and the CPU Cost of Power BI Dataset Refresh

Chris Webb continues experimenting:

After last week’s post on measuring Power Query CPU usage during dataset refresh, someone asked an obvious question that I should have addressed: does using a gateway change anything? After all, if you’re using a gateway to connect to an on-premises data source then all the Power Query queries transforming the data from that source will be executed on the gateway machine and not in the Power BI Service.

Let’s do a quick test to find out. 

Read on to see what Chris found out.

Comments closed