Press "Enter" to skip to content

Month: July 2018

Querying Data In Temporal Tables

Jeanne Combrinck shows us how to query data stored in temporal tables:

When you want to get latest (actual) state of data in a temporal table, you can query the same way as you query a normal table. If the PERIOD columns are not hidden, their values will appear in a SELECT * query. If you specified PERIOD columns as hidden, their values won’t appear in a SELECT * query. When the PERIOD columns are hidden, reference the PERIOD columns specifically in the SELECT clause to return the values for these columns.

To perform any type of time-based analysis, use the new FOR SYSTEM_TIME clause with five temporal-specific sub-clauses to query data across the current and history tables.

  • AS OF <date_time>

  • FROM <start_date_time> TO <end_date_time>

  • BETWEEN <start_date_time> AND <end_date_time>

  • CONTAINED IN (<start_date_time> , <end_date_time>)

  • ALL
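
For example, an AS OF query looks like an ordinary SELECT with the sub-clause attached to the table reference. This is only a minimal sketch with hypothetical table and column names (ValidFrom/ValidTo standing in for the PERIOD columns):

  -- Listing the PERIOD columns explicitly also returns them when they are HIDDEN.
  SELECT  OrderId, Status, ValidFrom, ValidTo
  FROM    dbo.Orders
      FOR SYSTEM_TIME AS OF '2018-07-01T00:00:00'
  WHERE   OrderId = 1;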

Do check it out.  The big downside I’ve seen to temporal tables is that I can’t query a temporal table as of a per-row timestamp in another table.


The Halloween Problem And Deleting From Self-Referencing Tables

Louis Davidson walks us through a case where deleting from a table can be extremely slow:

I have a great interest in hierarchy solutions, as I plan to write a book on implementing hierarchies in SQL Server in the next year or so (after I see what happens in SQL Server vNext after 2017). Something I didn’t include in my design testing when I built my presentation on hierarchies (available from a link here) a few years back was the cost to remove nodes from a tree. I learned a big lesson today about the cost of deleting from a hierarchy (the hard way). It is definitely something I will be digging deeper into at some point in a lab environment, comparing multiple methods of deleting from a hierarchy, but I wanted to write down the issue with self-referencing tables for future use (and hopefully for your use too).

We have a table with hundreds of millions of rows, and deletes from this table were taking forever. There were 17 foreign key constraints to this table, but each of the references was only showing as 1% of the plan. The physical DELETE from the clustered index operator was ~80%, but in the query’s plan an operator showed up that looked ominous.
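
To make the setup concrete, here is a minimal sketch (hypothetical names, not Louis’s actual schema) of the kind of self-referencing table in question; the self-referencing foreign key is what brings Halloween protection, typically an eager spool, into the DELETE plan:

  CREATE TABLE dbo.TreeNode
  (
      NodeId       int NOT NULL CONSTRAINT PK_TreeNode PRIMARY KEY,
      ParentNodeId int NULL
          CONSTRAINT FK_TreeNode_Parent REFERENCES dbo.TreeNode (NodeId)
  );

  -- Deleting rows means SQL Server must also check that no surviving rows
  -- still reference the deleted NodeIds, and on a very large table that
  -- protection can dominate the plan.
  DELETE FROM dbo.TreeNode
  WHERE  ParentNodeId = 42;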

Definitely worth a read.


Using Process Monitor With Power Query

Chris Webb has a couple of interesting posts on using Process Monitor, a venerable Sysinternals tool, to troubleshoot issues with Power Query performance. First, Chris looks at how often Power Query opens and closes a file for data processing:

Troubleshooting Power Query performance issues in Power BI and Excel can be difficult because it’s a bit of a black box: there’s nothing in the UI to tell you what’s going on inside the Power Query engine and the diagnostic logs are very difficult to interpret. With a relational data source like SQL Server you can use tools like SQL Server Profiler to see the queries that are being run by Power Query, and I blogged recently about using Fiddler to troubleshoot OData performance issues; but what about file-based data sources, which often present the most challenges regarding performance?

Process Monitor, a free tool from Microsoft, allows you to monitor file system activity in real time, and even having spent a limited amount of time using it I can already tell that it can provide a lot of information to help identify performance issues with file-based data sources. Take, for example, the scenario I described in my recent post on improving the performance of merge operations. In that post (which I suggest you read before you carry on) I mentioned that it looked as though the Power Query engine was reading data from one of the source files multiple times, and Process Monitor confirms that this is indeed the case.

Then he shows us just how much data Power Query is reading each time it opens the file:

This post is really just a quick follow-on from my post earlier this week on using Process Monitor to troubleshoot Power Query performance issues with file-based data sources, which I suggest you read before carrying on. I realised, after playing around with Process Monitor some more, that the ReadFile operation actually tells you how much data is being read from a file when a Power Query query is running. For example, here’s a sample of some of the ReadFile operations captured while running the unoptimised version of the query I talked about in my last post:

I get the feeling that really learning a few Sysinternals tools can be extremely useful, much like how learning Wireshark can make you a better DBA.


Tableau And Power BI: Complementarities

Jen Stirrup argues that there is a good reason to use both Tableau and Power BI in the same environment:

A question I often hear is this: which tool should I use, Tableau or Power BI? The truth is, they are not mutually exclusive.

Tableau is great at business mysteries: ill-defined questions where you have to surf the data for results. Power BI is particularly great at modelling and cleaning the data, with clean, crisp data visualisation and the ability to use custom and open-source data visualizations. This blog isn’t aimed at the technical user, but at the analyst who needs to get information out quickly. I will do another post, aimed at the geeks, another time.

I am all about embracing the power of And.  The downside is, my corporate budget went bankrupt just thinking about the cost of both of them together…


Visualizing Linear Regression Results

Bernardo Lares gives us a few ways of visually interpreting a linear regression result in R:

The most obvious plot to study for a linear regression model, you guessed it, is the regression itself. If we plot the predicted values vs the real values, we can see how close they are to our reference line of 45° (intercept = 0, slope = 1). If we have a very sparse plot with no clear tendency along that line, then we have a bad regression. On the other hand, if all our points sit exactly on the line, I’d bet you gave the model the results you wished for!

Then, the Adjusted R² on the plot gives us an easy parameter with which to compare models and see how well each fits our reference line. The nearer this value gets to 1, the better. Without getting too technical, if you add more and more useless variables to a model, this value will decrease; but if you add useful variables, the Adjusted R-Squared will improve.

We also get the RMSE and MAE (Root Mean Squared Error and Mean Absolute Error) for our regression’s results. MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. On the other hand we have RMSE, which is a quadratic scoring rule that also measures the average magnitude of the error: it’s the square root of the average of squared differences between prediction and actual observation. Both metrics can range from 0 to ∞ and are indifferent to the direction of errors. They are negatively oriented scores, which means lower values are better.
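
For reference, the standard definitions behind those two metrics, with y_i the observed values, \hat{y}_i the predictions, and n the number of points:

  \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
  \qquad
  \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i \right\rvert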

I like this approach to explaining models.


Test-Driven Database Development

Haroon Ashraf walks us through a simplified example of test-driven database development:

In TDDD, business requirements are encapsulated in database unit tests.

If the requirement is to add a new category to the Category table, TDDD proceeds according to the following steps:

  1. Create a database unit test that checks for the existence of the AddNewCategory database object.

  2. The unit test fails because the database object does not exist.

  3. Create the AddNewCategory object so that the unit test passes.

  4. The unit test then determines whether the AddNewCategory stored procedure is actually adding a new category or not.

  5. That unit test also fails.

  6. Change the AddNewCategory procedure code so that it adds a new category, then rerun the unit test, which is now able to pass.
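
As a rough illustration of step 1, here is what such an existence test can look like in tSQLt (one common unit-testing framework; the article itself may use a different harness, and the names here are purely illustrative):

  EXEC tSQLt.NewTestClass 'CategoryTests';
  GO
  CREATE PROCEDURE CategoryTests.[test AddNewCategory procedure exists]
  AS
  BEGIN
      -- Fails until the AddNewCategory stored procedure is created (step 3),
      -- which is exactly the fail-then-pass cycle described above.
      EXEC tSQLt.AssertObjectExists @ObjectName = N'dbo.AddNewCategory';
  END;
  GO
  EXEC tSQLt.Run 'CategoryTests';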

Laying out my biases, I’m not a fan of TDD for application development and definitely not a fan of it for database development.  “Unit testing” inside a database is extremely limited, particularly when there are so many side effects and encapsulation tends to be actively harmful.


Improving AG Database Level Failover

Sourabh Agarwal announces improvements to Availability Groups when it comes to database level failover:

In addition to the existing checks, the new implementation has the following additional checks.

  1. The new implementation stores and uses a historical snapshot of the database state information to decide if a failover should be initiated. The health check routine caches the database state and associated error information for the last two executions, which is then compared with the state information from the current execution of the health detection routine. If the same error condition (for the below-mentioned error codes) exists in three consecutive runs of the health detection routine, a failover is initiated. This implementation is intended to provide safeguards against transient errors and issues which can be fixed by the automatic page repair capabilities of availability groups.

  2. The new implementation checks for the following additional errors. The majority of these errors are indicative of hardware issues on the server. Please note that this is not an exhaustive list of errors which could impact database availability. There is an outstanding item to add error 824 to this list.
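
For context, these health checks only lead to an automatic failover when database-level failover is enabled on the availability group, which is a per-AG setting; a minimal sketch with a hypothetical AG name:

  ALTER AVAILABILITY GROUP [MyAG] SET (DB_FAILOVER = ON);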

Great news from the Tiger Team.


Managing SQL Logins Across Different Instances

Raul Gonzalez shows us how to maintain the same login across different SQL Server instances—it’s all about the SIDs:

Most servers out there would have both enabled, so sooner or later we DBAs need to deal with SQL logins, but there is more to it than providing a name and a password (a strong one, of course).

If you also have different environments, most likely you want to create different logins to avoid DEV apps or users connecting to LIVE or vice-versa.

But when you have different logins and, by extension, different database users, whenever you need to refresh your DEV (TEST, QA…) environment you’d need to reapply all the granted permissions to the right user, because the login does not exist in that environment. Does it sound familiar?

In this post I will show you how you can handle this problem in a very simple way.
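
The usual approach here (presumably what the post builds on) is to create the login on every instance with the same explicit SID, so the orphaned database user maps straight back after a refresh. A sketch, with an illustrative login name and SID value:

  -- On the source instance, capture the login's SID:
  SELECT name, sid
  FROM   sys.sql_logins
  WHERE  name = N'AppLogin';

  -- On the other instance, create the login with that same SID:
  CREATE LOGIN [AppLogin]
      WITH PASSWORD = N'UseAStrongPasswordHere!',
           SID = 0x241C11948AEEB749B0128BD438977E73;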

Click through to read the whole thing.


It’s 10 O’Clock; Do You Know Where Your Backups Are?

Adrian Buckman has a script which makes sure your backups are where msdb says they are:

Here is the information that the script provides:

  1. Warn of restores over the top of the database since its last FULL backup

  2. Show database snapshots currently against the database

  3. Show the last FULL, DIFF and LOG backup for the database including the backup durations and backup age.

  4. Backup file information such as backup start/finish time, file path, first LSN, last LSN, a status column which states whether the log chain is intact (based on the first and last LSNs and on whether the file exists on disk), and finally a file-exists column which will tell you if the file still exists on disk.
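
This is not Adrian’s script, just a minimal sketch of where that information lives in msdb: backupset holds the history and LSNs, and backupmediafamily holds the file path that a script like this can then check against the file system.

  SELECT  bs.database_name,
          bs.type,                        -- D = full, I = differential, L = log
          bs.backup_start_date,
          bs.backup_finish_date,
          bs.first_lsn,
          bs.last_lsn,
          bmf.physical_device_name        -- the path to verify on disk
  FROM    msdb.dbo.backupset AS bs
  JOIN    msdb.dbo.backupmediafamily AS bmf
          ON bmf.media_set_id = bs.media_set_id
  ORDER BY bs.backup_finish_date DESC;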

This is a great script if you take transaction log backups frequently (typically a good idea).


SSAS Tabular Deployment Fails: Newtonsoft.Json Missing

Alex Whittles walks us through an error deploying a SQL Server Analysis Services tabular model:

The scenario: deploying an Analysis Services Tabular model to SSAS Azure using the Analysis Services Deployment Wizard, with both Visual Studio 2017 and SQL Server 2017 installed on the client.

Trying to click on the ellipses to change the data source connection string or impersonation information results in a Newtonsoft.Json error:

“Could not load file or assembly ‘Newtonsoft.Json, Version=6.0.0.0, Culture=neutral, ……”

As I like to joke, every single .NET project in existence includes Newtonsoft.Json.  As Alex shows, sometimes they don’t reference the right version.
