Press "Enter" to skip to content

Author: Kevin Feasel

Understanding Key Lookups

Hugo Kornelis continues a series on SQL Server plan operators:

The Key Lookup operator provides a subset of the functionality of the Clustered Index Seek operator, but within a specific context. It is used when another operator (usually an Index Seek, sometimes an Index Scan, rarely a combination of two or more of these or other operators) is used to find rows that need to be processed, but the index used does not include all columns needed for the query. The Key Lookup operator is then used to fetch the remaining columns from the clustered index.

A Key Lookup operator will always be found on the inner input of a Nested Loops operator. It will be executed once for each row found. Since the key values passed in always come from another index, the requested row will always exist (except in rare race scenarios when read uncommitted isolation level is used).

Click through for a great deal of information on key lookups.

Comments closed

Generating Entity Framework Core Classes from a Database Project

Erik Ejlskov Jensen walks us through generating Entity Framework classes from a Visual Studio database project and from a .dacpac file:

EF Core Power Tools adds the ability to generate code directly from a Database project, without having to publish to a live database first, and having a SQL Server database engine running locally. It can also generate code from live SQL Server, Azure SQL DB, MySQL, Postgres and SQLite database. It has a large number of customization options – pluralization, renaming, file and name space choices and more, which is not available via the EF Core commands. And you do not have to install any design time libraries in your own project.

Read on for a demo of that as well as a dacpac reverse engineering tool.

Comments closed

Adding Time Zone-Adjusted Report Execution Times

Brett Powell shows how you can display a report’s execution time in a particular time zone:

For reports being viewed by users around the world, simply modifying the footer text box expression to note that this time is UTC may be a sufficient. However, for many paginated reports the users are all in one time zone and some of these users may ask to have the time zone conversion handled within the BI solution. The example in this post targets this scenario.

Even if the report serves users in multiple time zones, it’s technically feasible to leverage the UserID global field and a simple user to time zone mapping table to provide a local report execution time to all users. However, I tend to think most projects would not want to commit the time/resources for this logic – UTC date/time is what the users would get.

If you do need local report execution time, Brett has you covered.

Comments closed

Mistakes to Avoid in a BI Platform Migration

Chris Webb covers five things to consider when migrating your BI platform, using Power BI as an example:

Every report has a data source and getting source data in the right format for your BI platform is a substantial task – so much so, that you might be tempted to put Power BI on top of the data sources you have created for your previous BI platform with no changes. However different BI platforms need their data in different formats. Many BI platforms like their data munged together in one big table, sometimes even with data at different granularities in the same table. Power BI, on the other hand, likes its source data modelled as a star schema (you can find out what a star schema is and why it’s important here). If you don’t model your data as a star schema you may find that you see incorrect values in your reports, that report performance is poor, and that it’s a lot harder to write the DAX calculations that you need.

Four out of the five fit just as well with any other data platform technology.

Comments closed

Using Azure Functions Inside Azure Data Factory

Rayis Imayev shows how you can call an Azure Function from inside your Azure Data Factory Pipeline:

Creating a data solution with Azure Data Factory (ADF) may look like a straightforward process: you have incoming datasets, business rules of how to connect and change them and a final destination environment to save this transformed data. Very often your data transformation may require more complex business logic that can only be developed externally (scripts, functions, web-services, databricks notebooks, etc.).

In this blog post, I will try to share my experience of using Azure Functions in my Data Factory workflows: my highs and lows of using them, my victories and struggles to make them work.

This includes a description of the options, a demo function, and additional notes for each technique.

Comments closed

Avoiding Loops in Python with NumPy

Swantika Gupta walks us through vectorization and broadcasting with NumPy:

Vectorization is a powerful ability within NumPy which is used to speed up the code execution without using loop. It expresses operations as occurring on entire arrays rather than their individual elements.

Looping over an array or any data structure in Python has a lot of overhead involved. In NumPy, Vectorized Operations delegates the looping internally to highly optimized C and Fortran functions, making for cleaner and faster Python code. So, vectorization refers to the concept of replacing explicit for-loops with array expressions, which can then be computed internally with a low-level language, like C.

Read on for a few examples of this and broadcasting.

Comments closed

Metadata-Only Column Changes with SQL Server 2016

Paul White takes us through several metadata-only changes which SQL Server 2016 introduced:

These changes can be metadata-only because the underlying binary data layout does not change when Column Descriptor row format is used (hence the need for compression). Without compression, row store uses the original FixedVar representation, which cannot accommodate these fixed-length data type changes without rewriting the physical layout.

You may notice that tinyint is omitted from the integer types list. This is because it is unsigned, while the other integer types are all signed, so a metadata-only change is not possible. For example, a value of 255 can fit in one byte for tinyint, but requires two bytes in any of the signed formats. The signed formats can hold -128 to +127 in one byte when compressed.

This is very interesting, but note the long list of requirements for it to work, notably that compression must be enabled on all indexes and partitions.

Comments closed

Dataflows as an Alternative to Incremental Loading in Power BI

Imke Feldmann gives us an alternative to Power BI’s incremental loading using Dataflows:

If you’ve been following my blog for a while, you might have noticed my interest in incremental load workarounds. It took some time before we saw the native functionality for it in Power BI and it was first released for premium workspaces only. Fortunately, we now have it for shared workspaces / pro licenses as well and it is a real live saver for scenarios where the refresh speed is an issue.

However, there is a second use case for incremental refresh scenarios that is not covered ideally with the current implementation. This is where the aim is to harvest and store data in Power BI that will become unavailable in their source in the future or one simply wants to create a track of changes in a data source. Chris Webb has beaten me to this article here and describes in great detail how that setup works. He also mentions that this is not a recommended setup, which I agree. Another disadvantage of that solution is that this harvested data is only available as a shared dataset instead of a “simple” table. This limits the use cases and might force you to set up these incremental refreshes in multiple datasets.

Read on for more information.

Comments closed

Diagnosing Out of Memory Failures with SQL Server 2017

Lonny Niederstadt had a curious issue:

When [Max Server Memory] can be attained by SQL Server, [Target Server Memory] will be equal to [Max Server Memory].  If memory conditions external to SQL Server make [Max Server Memory] unattainable, [Target Server Memory] will be adjusted downward to an attainable value. As workload is placed on the system, [Total  Server Memory] grows toward [Target Server Memory].  That’s typical, expected behavior.

In this case, the story stays boring until after the following midnight.  There wasn’t enough workload to drive much growth of [Total Server Memory] until about 2:15 am, after which [Total Server Memory] grew fairly rapidly.  [Total Server Memory] reached a plateau between 3:00 am and 3:15 am, and then stepped slightly down later.  [Target Server Memory] was never attained.  That’s curious.

Indeed it is. Read the whole thing. And, given that it is labeled as Part 1, stay tuned for Part 2.

Comments closed