Author: Kevin Feasel

Spark ELT in Synapse Notebooks

Liliam Leme performs some data movement:

I often receive various requests from customers while working on FastTrack projects, and I have compiled some examples to help you build your solution on top of a data lake using useful tips. Most of the examples in this post use pandas, and I hope they will be helpful for you as they were for me.

Please note that all examples in this post run in a PySpark notebook.

In my scenario, I exported multiple tables from SQLDB to a folder using a notebook and ran the requests in parallel.

Read on for the examples and some of the things you can do with Spark notebooks in Azure Synapse Analytics.
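
As a rough illustration of the "ran the requests in parallel" idea, here is a minimal sketch using a thread pool from the Python standard library. It is not Liliam's code: `export_table` and `export_all` are hypothetical names, and the export itself is a stub that writes a small file so the sketch runs anywhere. In a Synapse notebook, the body of `export_table` is where the actual read from SQL DB and write to the data lake would go.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
import tempfile


def export_table(table_name: str, out_dir: Path) -> Path:
    """Hypothetical stand-in for the real export of one table.

    Assumption: the real version would read the table from the source
    database and write it to a folder in the lake. Here we only write
    a stub file so the example is self-contained.
    """
    target = out_dir / f"{table_name}.csv"
    target.write_text(f"exported {table_name}\n")
    return target


def export_all(table_names, out_dir: Path, max_workers: int = 4) -> dict:
    """Submit one export per table to a thread pool and collect the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the table it is exporting.
        futures = {pool.submit(export_table, t, out_dir): t for t in table_names}
        for future in as_completed(futures):
            # result() re-raises any exception thrown inside the worker.
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = export_all(["customers", "orders", "products"], Path(tmp))
        print(sorted(p.name for p in paths.values()))
```

Threads are a reasonable fit here because each export mostly waits on I/O; the `max_workers` value is an arbitrary default and would need tuning against whatever the source database can tolerate.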

A Love-Hate Relationship with Triggers

Ryan Booz shares some thoughts on triggers:

By design, plain ANSI SQL is declarative (“hey database, this is the data I want, you figure out how to do it”), not procedural (“Hey database, I want this data and I want you to retrieve it like this”). Early on, there wasn’t a standard design for how to add on additional procedural-like features, although that later came with the definition of SQL/PSM sometime in the mid-90s.

However, through the late 80s and most of the 90s, database vendors were trying to keep pace with very quickly changing requirements and needs in the database space. Even though triggers weren’t officially added until the SQL:99 standard, databases like Oracle had already released their own procedural languages and features. Triggers may have been deferred in the SQL-92 standard, but the Standards team couldn’t ignore them (or the complexity that triggers add to transactional consistency).

Click through for a bit more background, some of the pros and cons of triggers, and a few cases where triggers can make sense.

Ordered Columnstore Indexes in SQL Server 2022

Ed Pollack gives us the scoop on ordered columnstore indexes:

One of the more challenging technical details of columnstore indexes that regularly gets attention is the need for data to be ordered to allow for segment elimination. In a non-clustered columnstore index, data order is automatically applied based on the order of the underlying rowstore data. In a clustered columnstore index, though, data order is not enforced by any SQL Server process. This leaves managing data order to us, which may or may not be an easy task.

To assist with this challenge, SQL Server 2022 has added the ability to specify an ORDER clause when creating or rebuilding an index. This feature allows data to be automatically sorted by SQL Server as part of those insert or rebuild processes. This article dives into this feature, exploring both its usage and its limitations.

I’ve seen a couple of places where ordered columnstore indexes make enough sense to use, though not as many as I had first anticipated. That might change over time, as we see additional columnstore development.

Allowing Multiple Users to Use the Same Data Source in a Power BI Dataset

Gilbert Quevauvilliers answers a question:

Recently I got a question on the Power BI Community forum, which was “Is it possible to allow multiple users to use the same data source for their report?”

I thought it would be a great blog post so that other users could also benefit from this knowledge, as once you know how, it is quite a simple process to complete.

I’m mildly surprised by the question itself, as one of the core reasons for having datasets be top-level entities in Power BI is to allow people to re-use them.

Listing Filtered Indexes

Tom Collins has a filter for filtered indexes:

Question: How can I find SQL Server filtered indexes? We are working on some migrations from SQL Server to another engine platform, and part of the process includes locating a range of database objects. The identified objects will be recoded into the target engine platform.

Click through for a query which provides this answer. Tom is even kind enough to include the filter definition.

Audit Operations with Triggers in PostgreSQL

Ryan Lambert creates a trigger:

I rarely use triggers. I don’t hate triggers, I just think they should be used sparingly. Like any specialized tool, you should not expect to use triggers for every occasion where they could be used. However… there is one notable use case where I really like triggers: audit tables. Part of the magic of using triggers for auditing data changes in Postgres is the JSON/JSONB support available.

I think this attitude toward triggers is correct. Triggers are really useful, but they come with a few drawbacks, such as tools not making it readily obvious which tables have triggers associated with them, the effects of triggers being “hidden” until execution, and execution problems (e.g., performance issues, potentially modifying data incorrectly, assuming triggers will only work with one row at a time, etc.).

Well-Architected Framework Cost Optimization

Brandon Wilson cuts costs:

Hi everyone! Brandon Wilson (Cloud Solution Architect/Engineer) here to follow up on the post I authored previously for the Well-Architected Cost Optimization Assessment offering, with another customer offering we have known as the Well-Architected Cost Optimization Implementation. This offering can be considered as a continuation/“part 2” of sorts for the Well-Architected Cost Optimization Assessment, where the goal is to help you implement some of the findings relating to Azure Reservations, Azure Savings Plans, Azure Hybrid Benefits, along with cleaning up some of that cloud waste sitting around.

Just as before (and in case you are a new reader), we’ll touch a little bit on the Azure Well-Architected Framework (WAF), along with the Cloud Adoption Framework (CAF), and then go over what is covered in the Well-Architected Cost Optimization Implementation offering itself.

Some of this is Microsoft-internal tooling, though the WAF assessments themselves are available to the general public and well worth going through.

Building a Time Dimension in Power BI

Allison Kennedy shares a variant of DimTime:

Today’s post is going to be short and sweet. I’m simply publishing my version of a DimTime table for Power BI.

Thanks to Radacad for posting a great article on why we need a DimTime table, along with their script for creating one in Power BI. 

If you don’t already have a DimDate table, you can find my version in my DimDate: What, Why and How blog article. I update the script occasionally with new requests. 

Click through for the script.

Blamestorming with PostgreSQL Errors

Robert Haas points the finger:

If an error message shows up in the PostgreSQL log, what program is malfunctioning? It’s easy to conclude that the answer is PostgreSQL, but that’s too simplistic.

For example, suppose these messages show up in my log file:

ERROR:  new row for relation "orders" violates check constraint "orders_d_check"
DETAIL:  Failing row contains (1975-03-16, 42).
STATEMENT:  insert into orders values ('1975-03-16', 42);

Error logs are always a “reading is fundamental” scenario and an exercise in applied logic. Sometimes the exercise is harder than others, such as the notorious “String or binary data would be truncated” in SQL Server or “Object reference not set to an instance of an object” in .NET. But in many cases, there’s enough information at least to get an idea of culprit and cause.
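
To make the "who is at fault" point concrete, here is a small sketch of a client handing a database a row that a check constraint rejects. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, so the error text will not match the log lines quoted above. The table layout, the column names `d` and `qty`, and the rule behind `orders_d_check` are all my assumptions, since the excerpt doesn't show the real table definition.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        -- Hypothetical rule: order dates must not be before 2000-01-01.
        d   TEXT CONSTRAINT orders_d_check CHECK (d >= '2000-01-01'),
        qty INTEGER
    )
    """
)


def try_insert(d: str, qty: int) -> str:
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (d, qty))
        return "ok"
    except sqlite3.IntegrityError as exc:
        # The database is doing its job here: the client sent bad data.
        return f"rejected: {exc}"


if __name__ == "__main__":
    print(try_insert("1975-03-16", 42))  # violates the check constraint
    print(try_insert("2023-03-16", 1))   # satisfies it
```

The database logs the error, but the thing that needs fixing is whatever produced the 1975 date (or, possibly, a constraint that is stricter than the business intended).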

Provisioning an Azure Key Vault

Andy Leonard takes us through building an Azure Key Vault:

One way to keep confidential information confidential is to store confidential values in Azure Key Vault.

This post describes one way to provision an Azure Key Vault.

In addition to other values, I use key vault to store login usernames – as well as passwords. Why? I don’t like storing half of the login information – the username – in plain text. In case I haven’t shared this with you, you should know I use a password generator to create usernames and passwords. In Azure, it’s common to use the same username and password in multiple locations, so when I change access credentials (You are regularly changing passwords, at least, right?), I can update both values in a central location.

One nice thing about most Azure services is that they make Key Vault access fairly easy, especially if you use the managed identity account to grant vault access.
