Press "Enter" to skip to content


Planning a Restore Strategy

Jonathan Kehayias inverts the paradigm:

Do you know how long it takes to RESTORE each database from a FULL backup? Does your run book include critical configuration options like Instant File Initialization or backup compression that can impact how long it takes to restore a database? Does the database have an excessively large transaction log file that will require zero initialization upon creation if the existing database files are not there? Does the database use transparent database encryption (TDE) which will prevent it from being able to instant initialize the size of the data files for the database?

Until you test your RESTORE time for a FULL backup, you don’t know if you can meet your RTO utilizing backups or not, and you may need one of the previously mentioned HA/DR features in SQL Server. However, in an absolute worst case scenario where recovery can only occur from backups, it is still important to know how long each step of the process is going to take so that you can set expectations appropriately. Your minimum recovery time is going to at least be the time required to restore the most recent FULL backup.

Click through for a lot of great advice in this vein.
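If you want to start gathering those numbers, a minimal sketch of timing a restore on a test server might look like the following (the database name, backup path, and file locations here are all hypothetical):

```sql
-- Hypothetical database name, backup path, and target file locations.
-- STATS prints progress as the restore runs, and capturing the start
-- time gives a rough measure of total restore duration.
DECLARE @start datetime2(0) = SYSDATETIME();

RESTORE DATABASE [SalesDB_RestoreTest]
FROM DISK = N'\\backupshare\sql\SalesDB_FULL.bak'
WITH
    MOVE N'SalesDB'     TO N'D:\Data\SalesDB_RestoreTest.mdf',
    MOVE N'SalesDB_log' TO N'L:\Logs\SalesDB_RestoreTest.ldf',
    STATS = 10,
    RECOVERY;

SELECT DATEDIFF(SECOND, @start, SYSDATETIME()) AS restore_seconds;
```

Running the same restore with and without Instant File Initialization enabled (and with and without backup compression) is a quick way to see how much those settings matter for your RTO.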


Exporting Large Tables from Power BI Desktop

Nick Edwards shows us the way:

Have you ever wanted to export a table from Power BI Desktop into Excel just to make sure the DAX you’ve written is performing as expected but ran into this error message: “Data exceeds the limit – Your data is too large. Some data sampling may occur”?

Most probably this has occurred because you’ve got more than 30,000 rows of data in your table that you’re trying to export. In the example shown below I’ve actually got 30,001 rows of data – a row containing column headers plus 30,000 rows of actual data.

Click through to see how you can get around this limitation.


Declarative MLOps with Ludwig

Jacqueline Cardoso announces a new version of Ludwig:

Ludwig abstracts away the complexity of combining all these disparate systems together through its declarative approach to structuring machine learning pipelines. Instead of writing code for your model, training loop, preprocessing, postprocessing, evaluation, and hyperparameter optimization, you only need to declare the schema of your data as a simple YAML configuration:

I’ve long been a fan of declarative approaches to problem-solving, so I’m going to need to dig into this a bit.


Decile Analysis and Logistic Regression

Ridhima Kumar (re-)introduces us to decile analysis:

Decile analysis was once a popular technique; however, the convention of teaching and bucketing machine learning problems into either ‘classification’ or ‘regression’ types has led people to forget decile-style analyses. I am pretty sure most freshly minted data scientists would not even have heard of decile analysis. So, coming back to what decile analysis is:

Decile analysis is used to categorize a dataset from highest to lowest values (based on predicted probabilities), or vice versa.

As is obvious from the name, the analysis involves dividing the dataset into ten equal groups. Each group should have the same number of observations/customers.

It ranks customers in order from most likely to respond to least likely to respond.

Read on to learn the steps and how this ties with the fact that logistic regression is regression.
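To get a feel for the bucketing step, here is a minimal sketch in T-SQL using NTILE(10), assuming a hypothetical table of scored customers with a predicted probability and an actual response flag:

```sql
-- dbo.ModelScores is a hypothetical table of model output.
-- NTILE(10) splits the ordered rows into ten roughly equal groups;
-- decile 1 holds the customers predicted most likely to respond.
WITH Scored AS
(
    SELECT
        CustomerID,
        PredictedProbability,
        Responded,  -- 1 if the customer actually responded, else 0
        NTILE(10) OVER (ORDER BY PredictedProbability DESC) AS Decile
    FROM dbo.ModelScores
)
SELECT
    Decile,
    COUNT(*)                       AS Customers,
    AVG(PredictedProbability)      AS AvgPredictedProbability,
    AVG(CAST(Responded AS float))  AS ActualResponseRate
FROM Scored
GROUP BY Decile
ORDER BY Decile;
```

Comparing the actual response rate per decile against the average predicted probability is the heart of the technique.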


When the Version Store Fills tempdb

David Fowler takes us through a mental exercise:

Well, there is something else that I’ve seen have a habit of filling TempDB. If you’re using Read Committed Snapshot or RCSI then you’ll also have a version store in your database. I’m not going to go into the details of exactly how version store works, there is plenty of documentation on it out there if you’re interested (perhaps I’ll write a post on it sometime).

The key bit that you need to know is that although SQL will keep version store trimmed down and only keep the rows that are needed, it can only clear rows that are older than the oldest transaction. This is because SQL has no way of knowing what rows are going to be needed by the transaction. Do you see a potential issue here?

Read on for enlightenment.
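As a quick check on whether the version store is what's eating tempdb, a sketch like this can help (the version store DMV is available in newer versions of SQL Server):

```sql
-- How much tempdb space the version store holds, broken out by database.
SELECT
    DB_NAME(database_id) AS database_name,
    reserved_page_count,
    reserved_space_kb
FROM sys.dm_tran_version_store_space_usage
ORDER BY reserved_space_kb DESC;

-- The oldest active transaction is what prevents older row versions
-- from being cleaned up, so it is worth checking alongside the above.
DBCC OPENTRAN;
```

If the version store keeps growing, a long-running (or forgotten) open transaction is the usual suspect.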


TDE and Backup Compression

Andy Levy learns the truth:

For years, I thought that native backups of databases using Transparent Data Encryption (TDE) couldn’t be compressed. Between TDE being limited to Enterprise Edition until SQL Server 2019 and my own lack of experience with TDE in prior positions, I hadn’t really experimented with this myself. Some people have even gone so far as to skip compression in their backup jobs for TDE-enabled databases because there’s no need to burn those CPU cycles if you won’t get any compression, right?

But a curious thing happened after I upgraded a portion of my environment to SQL Server 2019 in late 2020. I observed that scheduled backups were compressing for some of my TDE-enabled databases, most notably the newer instances. And when I took ad hoc backups in any environment, they were compressed. So why wasn’t it working everywhere?

Read on for the explanation, though one correction: MAXTRANSFERSIZE is 1MB by default only when the database is not encrypted using TDE (and you aren’t backing up to a tape drive). If the database is encrypted using TDE, the default max transfer size is 64KB, and I think that’s what got Andy.
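In practice, that means that on versions prior to SQL Server 2019 you generally have to raise MAXTRANSFERSIZE above 64KB yourself to get compression on a TDE-enabled database's backup. A minimal sketch (hypothetical database name and path):

```sql
-- Hypothetical database name and backup path.
-- Specifying MAXTRANSFERSIZE above 65536 bytes (here, 1 MB) allows the
-- backup of a TDE-enabled database to be compressed on SQL Server
-- 2016 and 2017; newer builds of SQL Server 2019 handle this automatically.
BACKUP DATABASE [EncryptedDB]
TO DISK = N'\\backupshare\sql\EncryptedDB_FULL.bak'
WITH
    COMPRESSION,
    MAXTRANSFERSIZE = 1048576,
    CHECKSUM,
    STATS = 10;
```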


Bug with Filtered Index on Computed Column

Erik Darling points out a weird bug:

At some point in the past, I blogged about a silent bug with computed columns and clustered column store indexes.

In this post, I’m going to take a quick look at a very loud bug.

Normally, you can’t add a filtered index to a computed column. I’ve always hated that limitation. How nice would that be for so many currently difficult tasks?

Click through to see how you can create a filtered index against a computed column, as well as all of the pain it provides.
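For context, the limitation he's referring to normally looks something like this (hypothetical table): the CREATE INDEX statement below is rejected because the filter expression references a computed column.

```sql
CREATE TABLE dbo.Orders
(
    OrderID  int IDENTITY PRIMARY KEY,
    Quantity int   NOT NULL,
    Price    money NOT NULL,
    TotalDue AS (Quantity * Price)  -- computed column
);

-- Normally this fails: the WHERE clause of a filtered index
-- cannot reference a computed column.
CREATE NONCLUSTERED INDEX IX_Orders_BigOrders
    ON dbo.Orders (OrderID)
    WHERE TotalDue > 100.00;
```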


Nullable Columns and Power Query

Chris Webb gives us another reason to curse NULL:

Recently I’ve been asked by colleagues with various different types of performance problems why Power BI is generating SQL in a particular way, and the answer has been the presence of nullable columns in the underlying database – whether it’s SQL Server, Snowflake or Databricks. Now I’m not a DBA or any kind of database tuning expert so I can’t comment on why a SQL query performs the way it does on any given platform, but what I can do is show you two examples of how the presence of nullable columns changes the way Power BI and Power Query generate SQL.

Click through to see what happens.


Reinvestment Risk and Yield to Maturity

Sang-Heon Lee looks at reinvestment risk:

From this post, we can learn about the reinvestment risk of a coupon bond. It is worth noting that 1) the YTM is attainable only when the reinvestment (roll) rate equals the YTM, and 2) the argument that the coupon rate equals the YTM at issuance (par yield) applies only to a standard coupon bond with interest paid in arrears. Unlike a standard coupon bond, a coupon bond with interest paid in advance has a YTM higher than its coupon rate at issuance.

Click through for the explanation as well as the R code used. H/T R-Bloggers.
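To make the reinvestment point concrete, here is a small worked example of my own (not from the post): a two-year annual-pay bond with a 5% coupon bought at par for 100, so the YTM at purchase is 5%.

\[
\begin{aligned}
\text{coupon reinvested at } 5\%:&\quad 5(1.05) + 105 = 110.25 = 100(1.05)^2 \;\Rightarrow\; \text{realized yield} = 5\% \\
\text{coupon reinvested at } 3\%:&\quad 5(1.03) + 105 = 110.15 \;\Rightarrow\; \text{realized yield} = \sqrt{110.15/100} - 1 \approx 4.95\%
\end{aligned}
\]

The quoted YTM is only achieved when each coupon can be rolled over at the YTM itself, which is exactly the reinvestment risk the post describes.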
