Press "Enter" to skip to content

Day: September 5, 2016

Cache Eviction Policies

Dan Luu has a great article from a couple years ago on when a random cache eviction policy might be preferable to Least Recently Used:

Once upon a time, my computer architecture professor mentioned that using a random eviction policy for caches really isn’t so bad. That random eviction isn’t bad can be surprising – if your cache fills up and you have to get rid of something, choosing the least recently used (LRU) is an obvious choice, since you’re more likely to use something if you’ve used it recently. If you have a tight loop, LRU is going to be perfect as long as the loop fits in cache, but it’s going to cause a miss every time if the loop doesn’t fit. A random eviction policy degrades gracefully as the loop gets too big.

In practice, on real workloads, random tends to do worse than other algorithms. But what if we take two random choices and just use LRU between those two choices?

Here are the relative miss rates we get for SPEC CPU1 with a Sandy Bridge-like cache (8-way associative, 64k, 256k, and 2MB L1, L2, and L3 caches, respectively). These are ratios (algorithm miss rate : random miss rate); lower is better. Each cache uses the same policy at all levels of the cache.

Dan writes at a depth I appreciate and on topics I often don’t understand (particularly when he gets into CPU engineering details).

Comments closed

MapReduce

I talk about Hadoop a good bit on Curated SQL.  Therefore, I think it’s worth mentioning the original MapReduce paper that Jeffrey Dean and Sanjay Ghemawat published in 2004:

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.

If you’ve never read this paper before, today might be a good day to do so.

Comments closed

Arrays And Lists In SQL Server

Erland Sommarskog has updated his essay on Arrays and Lists in SQL Server.  He’s broken it down into a few parts.  First, the short version:

Now you know why IN (@list) does not work as you hoped for, but if you have a comma-separated list you still need to know to work with it.

The best approach in my opinion is to reconsider having a comma-separated list at all. After all, you are in a relational database, so why not use a table instead? That is, you should pass the data in a table-valued parameter (TVP) instead of that comma-separated list. If you have never used TVPs before, I have an article, Using Table-Valued Parameters in SQL Server and .NET, where I give a tutorial of passing TVPs from .NET to SQL Server, and there is a detailed description exactly of the case of passing a comma-separated list to a TVP. You will find that it is astonishly simple.

Unfortunately, not all environments support TVPs – Entity Framework has no real support for TVPs, reportedly nor has Reporting Services. The same applies if you are on SQL 2005 or earlier, since TVPs were added in SQL 2008. Or you may just be plain stubborn and want to use your comma-separated list. Or you are simply pressed for time, and don’t have the time to learn something new right now.

If you want a longer article on using table-valued parameters, Erland has one of those as well:

This is an article that is intended to get you started with passing table-valued parameters (TVPs) from SQL Server to .NET. I begin with presenting how you use table-valued parameters in SQL Server itself whereupon I give a quick overview of the mechanisms to pass TVPs from ADO .NET to SQL Server.

The main meat of this article are two real-world examples where I use TVPs. The first example is the classical problem of passing a comma-separated list of values to SQL Server, this time through a table-valued parameter. You will be amazed of how simple it is. In the second example I show two ways to load a file with master-detail data into tables in SQL Server. In addition to the examples, there is also some discussion on how you can improve performance when loading large amounts of data.

Despite the appearance of .NET in the title of this article, there is a final chapter that explores the possibilities in other APIs, of which some and some do not support TVPs. This includes Entity Framework which has no for support TVPs. In this chapter I briefly discuss workarounds when TVPs are not available to you.

And for the advanced look at arrays and lists, you have the long-form article:

A problem that has been popular over the years with SQL Server is how to handle a list of values. In the majority of the cases, people have a comma-separated list, because this format is produced by commonly used tools like multi-choice controls in .NET, Reporting Services and other places.

When I say that the problem is popular, I don’t only mean that the questions are commonplace – but so are solutions. You can find no end of blog posts etc that presents string-splitting functions, including performance tests of such functions and there are function that are known to be the fastest etc.

The aim of this article is two-fold: 1) Give a general discussion of how to design string-splitting functions. 2) Present and discuss each method from the angles I bring up in the general discussion. This includes performance, but not only.

Even if you’ve read this article before, it’s worth checking again to refresh your memory and to see his changes.

Comments closed

Labor Day

Today is Labor Day in the United States.  Because most Curated SQL readers have the day off, I’m going to link to some longer-form and more timeless material.

Comments closed