Press "Enter" to skip to content

Month: June 2021

Moving Synapse Databases Across Subscriptions

Steve Hughes hits on one of the tricky administrative bits of Azure Synapse Analytics:

So you can copy Azure SQL Database using the Azure Portal, PowerShell, Azure CLI, and T-SQL. However, this functionality is limited to Azure SQL Database and does not work for Azure Synapse databases (a.k.a. SQL Pools). Early in 2021, the ability to use the copy functionality to copy databases between subscriptions is also supported but requires security work to make sure the permissions in the database servers and networking allow that to happen.

There’s a lot involved in the process, leaving me to provide the sage wisdom that it’s easier not to put it in the wrong subscription to begin with if you can avoid it.

Comments closed

Inlined Financial Functions

Erik Darling has some functions for us:

At just about every client site, I see a common set of financial functions being used to calculate various things. The code is all the same, too.

Some of it comes from published government guidelines, and some of it comes straight out of accounting 101 books.

The big problem is that all of these functions were written as scalar UDFs, and performance becomes dead.

Recently, one of my clients was nice enough to agree to let me publish my rewrites of their functions as inline table valued functions.

Check them out on Erik’s GitHub repo.

Comments closed

Working with Multi-Row Headers in Power Query

Ed Hansberry has the solution to a tricky problem:

It is fairly common for users to format Excel reports with headers that are comprised of two or more rows for the header, rather than using a single cell with word wrap on. I’ve seen text files with similar issues as well. Consider the following example:

Getting this info into Power Query can sometimes be a challenge. I’m going to show you two ways to do it. The first way will be to do it manually mostly using the Power Query user interface. The second way will be a custom function that will do all of the work for you. For simplicity’s sake, I’ll use Power Query in Excel for this since my data is in Excel already, but the same logic would hold if you were importing the data into Power Bi. We do not want to consolidate the headers in Excel. You’ll just have to do it manually again the next time you get a file from someone. Remember – never pre-transform your data before you transform it in Power Query.

The nice thing is that Power Query makes this tricky problem fairly easy to solve.

Comments closed

15 tempdb Notes

Deepthi Goguri summarizes a detailed session from Bob Ward:

While I was preparing for my Tempdb presentation, I learned many interesting facts about Tempdb. Thanks so much Bob Ward (t|g) for providing me with the resources to prepare for my presentation. Bob Ward has presented an amazing 3 hour session about Tempdb for the PASS Summit couple of years ago. This information is invaluable.

Read on for 15 notes of interest.

Comments closed

NameNode and Secondary NameNode in Hadoop

The Hadoop in Real World team hit on a naming scheme that I think is bad:

NameNode is the heart of HDFS. NameNode maintains the metadata of HDFS – files, list of blocks, directories, permissions etc. The metadata is persisted on a file named FSIMAGE. During the start up of NameNode, the FSIMAGE file will be read and loaded into memory. 

Any ongoing changes to the files, directories in FSIMAGE will be written to memory and to a temporary log file. NameNode does not save the ongoing changes to FSIMAGE directly and this is because FSIMAGE file could be big for a big HDFS and updating a big file at runtime will be quite expensive and slow.

Read on to learn what the secondary NameNode does. As a hint, it’s not a secondary NameNode in the sense of high availability. If you’re a new Hadoop administrator, the name can be deceiving, letting you think you have high availability when you really don’t.

Comments closed

Dealing with Key Lookups

Jared Poche shows a good technique for removing key lookups:

A key lookup is an operation that occurs when a query has used a nonclustered index on a given table, but needs to access more columns to complete the query. It may need to check columns not in that index for additional filters, or it may just need to return that column as part of its result set.

In the simple query above, we’re retrieving 100 rows from the seek against a nonclustered index, then performing a key lookup against the clustered index. There is a nested loops operator between the two and understanding how that operates is important; for each row we receive from the first table, we perform the second operation once. So, in this query we are seeking 100 rows from the nonclustered index, then performing the key lookup 100 times. We go through the index once for each row we return, and you can see the cost of the key lookup operator is 99% of the query.

Read on for more information, including how to eliminate key lookups.

Comments closed

Whither Code Page 1

Solomon Rutzky unravels a mystery:

Collation names, at least in SQL Server, are structured in that they contain various pieces of information detailing some of the properties of the collation. The collation name of SQL_Latin1_General_CP1_CI_AS can be broken down as follows:

SQL_ This indicates that the collation is a SQL Server collation, while names without this prefix indicate Windows collations

Latin1_General This is the culture (a.k.a. locale) used for the sorting and comparison rules

CP1 This is the 8-bit [C]ode [P]age

CI This indicates [C]ase- [S]ensitive or [I]nsensitive

AS This indicates [A]ccent- [S]ensitive or [I]nsensitive

Ok, but there’s no code page “1”. So, what’s going on there?

Read on to learn the answer behind this mystery.

Comments closed

Enumerating Breaking Changes to Power BI Reports

Brett Powell gives us a list of things which might cause breaking changes in Power BI reports:

A breaking change, which we can define as any change to a dataset which causes either reports to render errors or the dataset to fail to refresh, can severely impact business workflows and reflect poorly on those responsible for the solution. Given significant investments in other areas of the organization’s data estate such as Azure Synapse Analytics, a simple, easily avoidable oversight in a Power BI deployment may not be tolerated.

Read on for the list.

Comments closed

Speeding Up Azure Data Factory Pipelines

Hiram Fleitas doesn’t have all day to wait for that pipeline to finish:

His issue was pretty much as mentioned on the tile. Our bank’s Azure Data Factory pipeline is running slow moving data from on-prem, we’re copying all tables in a SQL Server database, files from ftp sites and network share drives to Azure SQL DB Managed Instance and to blob storage (our datalake) , do you have some recommendations how to make it go faster? Its around 300GBs and takes over 8 hrs.

So I replied with the following and figured to post it here as it may help others.

Hiram has a video, as well as specific advice to offer.

1 Comment

What to Do with Temp Tables in Stored Procedures

Chad Callihan invokes Betteridge’s Law of Headlines:

Generally speaking, it’s best to put things away that aren’t being used. Don’t keep indexes that aren’t getting utilized because they are taking up disk space and still have to be kept up to date with changes. A table is still loaded up with old data that’s not being used but needs kept? Maybe it’s time for options like an archive database or partitioning.

While it’s not on the same level of importance, one related argument I’ve seen and been in is how to handle temp tables in stored procedures. Do you drop them at the end of a stored procedure or do you leave them to be cleaned up by SQL Server? Is one way better for performance than the other? Let’s do some testing and see what we find out.

Read on for the answer.

Comments closed