Press "Enter" to skip to content

Month: March 2017

Collapsable Subqueries

Dmitry Pilugin notes a new query simplification rule in SQL Server vNext:

You may see that in the first plan, there are two clustered index scans of the table SalesOrderDetail, however the subquery is exactly the same “exists (select * from Sales.SalesOrderDetail f where f.SalesOrderID = d.SalesOrderID)” but referenced twice.

In the second case, compiled under next compatibility level, the double reference of the subquery is collapsed and we see only one reference to the SalesOrderDetails table and more efficient plan, despite the query still has two subqueries with SalesOrderDetails.

In the third case, also compiled under vNext level, we see the second branch with the SalesOrderDetail again, but that is because we turned off the rule CollapseIdenticalScalarSubquery with an undocumented hint queryruleoff (which I originally described in my blog post).

I think Dmitry has the expected use case nailed:  ORMs.  But I can see people writing (well, copy-pasting) similar queries, so maybe it’ll be useful in more contexts as well.

Comments closed

Checkpoints And Memory-Optimized Filegroups

Jack Li explains why, even without a memory-optimized table, you can see XTP checkpoints in your database:

What are checkpoint files?

They are data and delta files as documented in Durability for Memory-Optimized Tables. When you use disk based tables, the data is written to data files.  Even though data is stored in memory for memory optimized tables, SQL Server still needs to persists data for disaster recovery.  Data for memory optimized tables is stored in what we call checkpoint files.  Data file contains rows from insert and update operations. Delta file contains deleted rows.  Over time, these files can be ‘merged’ increase efficiency.  Unneeded files after the merge can be removed eventually (but this can only happen after a log backup).

Click through for a demo script to see this in action.

Comments closed

Power BI Themes

Gogula Aralingam notes that Power BI can now support basic skinning:

The March 2017 update of Power BI Desktop comes with a preview of Themes. Right now it is in its simplest of forms: You manually create a JSON file that has a very few attributes that can set basic color themes to your reports. So all you have to do is create file that looks like this:

Click through for an example.  This isn’t a true fix for the lack of Color Vision Deficiency support, but you can plug in safe colors (for example, this article includes some) and skirt the issue until there’s real support.

Comments closed

Saving Docker Images

Andrew Pruski shows how to save a Docker image:

What I’m going to do now is save my custom image to a location [C:\temp] on my host server. So I’ll run: –

1
docker save -o myfirstimage.tar myfirstimage

N.B. – I’m saving the file as a .tar as this seems to be the format that works best for loading the image. I’ve tried exporting to other formats (.zip) but it always seems to become corrupt. The .tar format has always worked for me so I’ll stick with it.

If I’m understanding Andrew correctly, that’s because docker save will only generate a tar archive.  Keep reading for more details, including how to restore a Docker image to another machine.

Comments closed

Using Desired State Configuration To Install SQL Server

Chris Lumnah shows how to use Powershell Desired State Configuration to automate installation and configuration of a new SQL Server instance:

So the installation of SQL Server is now fairly straightforward. The wizard does a nice job of guiding you along the way. 2016 even includes best practice suggestions for tempdb and instance file initialization. Along the way, Microsoft as given us ways to automate the installation of SQL Server. You can sysprep an instance, but this does not really automate the installation. It just helps create a template of an instance. At the end of the day, you still need to do things manually. You can also use a configuration file to assist here. This is a great step forward, but it does not allow for all of the things you need to do to configure a SQL server.

Powershell does. Desired State Configuration (DSC) is functionality built into Powershell that allows for the installation and configuration of a SQL Server.

Chris includes his script as well as a link for more information on DSC in case you aren’t familiar with the concept.

Comments closed

Virtual Function Tables

Ewald Cress continues his descent into the bowels of SQL Server, this time looking at vftables:

The first and simpler GetData() overload doesn’t show up in a vftable, but the second does. Oddly, the vftable for the second one lives at an offset of +0x1448 into the class instance – you’re going to have to trust me on this one. So the rcx passed into either variation will actually be the same one! But if the virtual version is called, it needs to find its position relative to +0x1448 dynamically, by doing a data lookup. We can confirm that by peeking at what is saved four bytes earlier at +0x1444, and that is indeed the value zero.

Ewald explains how this is vital to multiple inheritance and this post is only guaranteed to make your brain hurt a little bit.

Comments closed

Garbage Collection In Hadoop

Ranjan Banerjee explains how the Java garbage collector works, using Hadoop as an example:

The reason why we all love Java is due to the fact that we can be careless with memory creations and the work of cleaning the mess is performed by the JVM. On a high level, Java heap memory is classified into two phases:

1) Young (eden) space

2)Old space

The eden space is where newly created objects goto. There are various algorithms for garbage collection, but all of them try to first free memory from the young space and for those long lasting memory objects, they are transferred to the old space.

One common issue that can be noticed in running Map Reduce Applications are GC overhead limit exceeded.

Read on for more, including where you can find GC logs.

Comments closed

Time Series Errors

Alex Smolyanskaya explains some common errors when doing time series analysis:

Non-zero model error indicates that our model is missing explanatory features. In practice, we don’t expect to get rid of all model error—there will be some error in the forecast from unavoidable natural variation. Natural variation should reflect all the stuff we will probably never capture with our model, like measurement error, unpredictable external market forces, and so on. The distribution of error should be close to normal and, ideally, have a small mean. We get evidence that an important explanatory variable is missing from the model when we find that the model error doesn’t look like simple natural variation—if the distribution of errors skews one way or another, there are more outliers than expected, or if the mean is unpleasantly large. When this happens we should try to identify and correct any missing or incorrect model features.

It’s an interesting article, especially the bit about cross-validation, which is a perfectly acceptable technique in non-time series models.

Comments closed

Data Lake 3.0

Vinod Kumar Vavilapalli describes the modern data lake:

During the past few years though, end-to-end business use-cases have evolved to another level.

  • The end-to-end business problems are now mostly solved by multiple applications working together.
  • As the platform matured, users have increasingly started wanting to solely focus on the business application layers, and getting impatient to get on with developing their main business-logic.
  • However, YARN, and for that matter any other related platform, hasn’t catered to this evolving need, leaving the users to unwillingly get involved in the painstaking details of wiring applications together, keeping them up, manually scaling them as need arises etc.

Manual plumbing of all these different colored services in tiresome! Further, there is a clear need for seamless aggregate deployment, lifecycle management and application wireup. This is the gap that needs to be bridged between what these end-to-end business use-cases need from the platform and what the platform offers today. If these features are provided, then the business use cases authors can singularly focus on the business logic.

This is a higher-level “where are we at?” kind of post which could be helpful if you’re new to the data lake concept.

Comments closed

Disabling VMware In-Guest Clock Updates

David Klee explains when VMware will update the internal clocks for VMs and shows how to disable that:

These time sync actions can move a guest’s time backwards as well as forwards. More details about this conflict of settings are found in VMware KB1189. If the host time is out of sync, such as when a BIOS battery fails, bad things can happen. This action is extremely detrimental to the state of SQL Server high availability features, such as Availability Groups and Failover Cluster Instances, which depend on the in-guest time closely aligning with the Active Directory synchronized time. This action must be explicitly disabled to ensure that these maintenance items do not trigger an unexpected failover of the SQL Server HA solution. To disable this action, perform the following tasks.

I’d imagine that the ideal would be everything being synched to a single NTP source.

Comments closed