Locality

Kyle Kingsbury explains that sequential, serializable, and strictly serializable consistency models cannot provide locality:

We often speak of locality as a property of subhistories for a particular object x: “H|x is strictly serializable, but H is not”. This is a strange thing to say indeed, because the transactions in H may not meaningfully exist in H|x. What does it mean to run [(A enq y 1) (A enq x 1)] on x alone? If we restrict ourselves to those transactions that doapply to a single object, we find that those transactions still serialize in the full history.

So in a sense, locality is about the scope of legal operations. If we take single enqueue and dequeue operations over two queues x and y, the space of operations on the composite system of x and y is just the union of operations on x and those on y. Linearizability can also encompass transactions, so long as they are restricted to a single system at a time. Our single-key, multi-operation transactions still satisfied strict serializability even in the composite system. However, the space of transactions on a composite system is more than the union of transactions on each system independently. It’s their product.

Here’s the part where I pretend that of course I understand what Kyle wrote…  Seriously, though, this is a very interesting read.

Min And Max Server Memory

Kevin Hill explains the minimum and maximum server memory options in SQL Server:

Min Server Memory seems to get the most bad information spread around.   SQL Server does NOT automatically grab memory up to the Min setting when it starts.  However, once it gets there, it doesn’t give back.

Back to the car analogy…if you start up and head off down the road at 20 mph, you are above the default (0), but not at the max (100+, depending on the car and the tires…).  If you set the cruise control, you can accelerate up and down above 20, but you won’t go below that unless you hit the brakes.

I do like the car analogy to his post.

VirtualBox Setup

Kevin Feasel

2016-08-16

Hadoop

Jon Morisi has notes on downloading and preparing the Hortonworks sandbox VM via VirtualBox:

Lessons learned:

Use wget for downloads that repeatedly drop the connection.
Don’t try to run VirtualBox inside a VM.
Enable VTx in the BIOS to run VirtualBox.

This post is mostly about troubles in getting the VM and preparing the software.  Definitely take his advice on enabling VTx in your BIOS.

Vendors And Privileges

Dave Mason has a good post about onerous third-party software requirements:

If you’re not familiar with SQL Server, the “sysadmin” server role conveys the highest level of authorization available to a login. “db_owner” also conveys a high level of authorization. Both requirements are far more than what is necessary and violate the Principle of Least Privilege. While I strongly disagree with the install-time requirements, I can at least understand the argument: it’s a one-time activity. But elevated permissions at run time are inexcusable.

Most of the time, software companies publish that because they want to avoid the hassle of support calls when people don’t grant privileges correctly.  I’ve worked with one third-party vendor in the past who sent me the actual permissions requirements after I pestered them a bit, as I wasn’t going to let just anyone have sysadmin on my servers.  But that’s not a scalable approach and does nothing for the next guy who reads the documentation and just gives sysadmin away.

DocumentDB Notes

Kevin Feasel

2016-08-16

Cloud

Victoria Holt has a few handy DocumentDB links:

Query Playground converts T-SQL Syntax to JSON
https://www.documentdb.com/sql/demo

This tool, along with the capacity planner and a querying cheat sheet, are available by clicking through to the post.

Load Testing Analysis Services

Bill Anton provides Powershell code to load test Analysis Services cubes:

One of the more time consuming pieces in this process is step 4 – setting up a solution that can generate a concurrent query workload. Bob Duffy (b | t) has written blogs and delivered presentations on the topic of Analysis Services load testing. However, when it comes to tooling (unless something changed recently) I believe he’s still using a custom .NET Visual Studio Test solution for the test harness. And unless you know .NET, you’re going to have a pretty difficult time getting setup… which is why I was so happy when, earlier this year, Chris Schmidt (b | t) wrote a post over on MSDN demonstrating a method for load testing an Analysis Services database using PowerShell.

This weekend I finally had some time to expand upon Chris’ code sample and add several new features… e.g. parameters (target server/database, level of concurrency, queries per batch, etc) and the ability to retrieve MDX (or DAX) queries from a SQL table. In my experience, it’s quite a bit easier to generate a table of MDX/DAX queries rather than a bunch of txt files with a query in each file.

My first thought was “Well, that doesn’t seem too complicated.”  Which means that Bill did a great job.

Query Store And Dropped Objects

Grant Fritchey gives us a potential nightmare scenario with Query Store:

Along comes our aggressive junior DBA who decides that there are “too many” indexes on the server. No, I don’t know what that means either, but they evidently read it on the internet or something so they drop the index we created before:

1
DROP INDEX TransactionOccurredWhenNCI ON Warehouse.StockItemTransactions;

What now happens to our lovely execution plan and the plan forcing? We’ll take a look at two events in Extended Events, sql_statement_recompile and query_store_plan_forcing_failed. Nothing happens immediately on dropping the index. The plans associated with that object, if any, are marked as invalid in the cache. The next time we call the query it’s going to recompile and we can see the event:

The result is a bit happier than I would have expected; I was looking forward to a “and the world came crashing down” conclusion.

Index-Based Statistics Updates

Michael Bourgon has a script to get information on statistics updates for stats based off of indexes:

Quickie, based off an earlier post. (http://thebakingdba.blogspot.com/2012/02/tuning-statistics-when-were-they.html)

Get the last 4 stat updates for every statistic based on an index. The filter is on the auto_created; flip that to get all the system

This does use the DBCC SHOW_STATISTICS command, which reminds me of a rant (though not about Michael’s code; it’s about the need to use this DBCC command rather than having a nice DMV which returns all of the relevant information).

Making Wide World Importers Bigger

Kevin Feasel

2016-08-15

Data

Koen Verbeeck wants bigger fact tables for Wide World Importers:

Microsoft released a new sample database a couple of months back: Wide World Importers. It’s quite great: not every (unnecessary feature) is included but only features you’d actually use, lots of sample scripts are provided and – most importantly – you can generate data until the current date. One small drawback: it’s quite tiny. Especially the data warehouse is really small. The biggest table, Fact.Order, has about 266,000 rows and uses around 280MB on disk. Your numbers may vary, because I have generated data until the current date (12th of August 2016) and I generated data with more random samples per day. So most likely, other versions of WideWorldImportersDW might be even smaller. That’s right. Even smaller.

260 thousand rows is nothing for a fact table.  I was hoping that the data generator would allow for a bigger range of results, from “I only want a few thousand records” like it does up to “I need a reason to buy a new hard drive.”  Koen helps out by giving us a script to expand the primary fact table.

Connecting Spark and Riak

Kevin Feasel

2016-08-15

Riak, Spark

Pavel Hardak discusses the Riak Connector for Apache Spark:

Modeled using principles from the “AWS Dynamo” paper, Riak KV buckets are good for scenarios which require frequent, small data-sized operations in near real-time, especially workloads with reads, writes, and updates — something which might cause data corruption in some distributed databases or bring them to “crawl” under bigger workloads. In Riak, each data item is replicated on several nodes, which allows the database to process a huge number of operations with very low latency while having unique anti-corruption and conflict-resolution mechanisms. However, integration with Apache Spark requires a very different mode of operation — extracting large amounts of data in bulk, so that Spark can do its “magic” in memory over the whole data set. One approach to solve this challenge is to create a myriad of Spark workers, each asking for several data items. This approach works well with Riak, but it creates unacceptable overhead on the Spark side.

This is interesting in that it ties together two data platforms whose strengths are almost the opposite:  one is great for fast, small writes of single records and the other is great for operating on large batches of data.

Categories

August 2019
MTWTFSS
« Jul  
 1234
567891011
12131415161718
19202122232425
262728293031