
Author: Kevin Feasel

Reporting Services Versus Power BI Report Server

John White compares and contrasts SQL Server Reporting Services versus Power BI Report Server:

Power BI Report Server (PBIRS) was first introduced in May 2017. Based on SQL Server Reporting Services (SSRS), it brings the ability to work with Power BI reports completely on premises in addition to all the other capabilities of SSRS. Given this, it would be reasonable to conclude that PBIRS was the next version of, or a replacement for SSRS, but that is not the case. I have heard people state that SSRS is “going away”, but this is simply not the case. SSRS is still a core part of the Microsoft BI stack. So, what are the differences between the two platforms? The differences boil down to features, licensing, and update cadence.

If you’re in the BI/report writing space, you will want to read the whole thing.

Automatic Retry With Optimistic Concurrency

Vladimir Khorikov explains an anti-pattern when dealing with a model using optimistic concurrency (for example, memory-optimized tables):

Alright, back to the original question. So, how do you combine optimistic locking and automatic retry? In other words, when the application gets an error from the database saying that the versions of a Product don't match, how do you retry the same operation again?

The short answer is: nohow. You don’t want to do that because it defeats the very purpose of having an optimistic lock in the first place.

Remember that the locking mechanism is a way to ensure that all changes are taken into consideration when changing a record in the database. In other words, someone should review the new version of the record and make an informed decision as to whether they still want to submit the update. And that should be the same client who originated the initial request; you can't make that decision for them.

Plenty of systems do this sort of data merging automatically, but I get Vladimir’s point:  if someone else pulled the rug out from under you, it might change your decision on what that data should look like.
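
To make the idea concrete, here is a minimal T-SQL sketch of optimistic concurrency using a rowversion column (the table, column, and procedure names are hypothetical, and this is a general illustration rather than Vladimir's code): the UPDATE only succeeds if the row is unchanged since the client read it, and a conflict is reported back to the caller instead of being retried automatically.

    CREATE OR ALTER PROCEDURE dbo.UpdateProductPrice
        @ProductId      int,
        @NewPrice       money,
        @OriginalRowVer binary(8)  -- rowversion value the client saw when it read the row
    AS
    BEGIN
        SET NOCOUNT ON;

        UPDATE dbo.Product
        SET    Price = @NewPrice
        WHERE  ProductId = @ProductId
          AND  RowVer = @OriginalRowVer;  -- no match means someone else changed the row

        -- Surface the conflict to the caller rather than retrying silently.
        IF @@ROWCOUNT = 0
            THROW 50001, N'The product was changed by another user; review the new version and resubmit.', 1;
    END;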

Max And Min Partition Values

Ken Kaufman explains a major performance problem when trying to get maximum (or minimum) values from a partitioned table:

Now that I've rambled a bit, you want to know why, when using a partitioned table, grabbing the min and max of the primary key takes sooooo long, and how do you fix it.  Theoretically, you would expect SQL to perform the following steps in grabbing the Max Id:

  1. Grab the Max Id from each partition using a seek
  2. Hold the results in temp storage
  3. Get the Max Id from the temp storage, and return that result.

However, SQL doesn't do that; it actually scans each partition and finds the max id after it has examined every record in each partition.  This is very inefficient and could kill a query that depends on this value, as well as impact a busy server low on physical resources.  So what we need to do is manually write the code to perform the steps that SQL Server should actually be doing.

Read on for one workaround Ken uses to deal with this inefficiency.
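
For reference, a common shape for this kind of workaround (not necessarily Ken's exact code; the table and partition function names here are hypothetical) is to enumerate the partitions from sys.partitions and let $PARTITION restrict each aggregate to a single partition, where a seek is possible:

    SELECT MAX(mx.MaxOrderId) AS MaxOrderId
    FROM sys.partitions AS p
    CROSS APPLY
    (
        -- Per-partition MAX; the $PARTITION filter limits the work to one partition
        SELECT MAX(o.OrderId) AS MaxOrderId
        FROM dbo.Orders AS o
        WHERE $PARTITION.pfOrders(o.OrderId) = p.partition_number
    ) AS mx
    WHERE p.object_id = OBJECT_ID(N'dbo.Orders')
      AND p.index_id IN (0, 1);  -- one row per partition: heap or clustered index only

The MIN case is symmetric; swap MAX for MIN in the inner query.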

Schema Comparison With Visual Studio

Arun Sirpal shows off the Schema Compare functionality within Visual Studio:

This is a very common requirement, one which can be satisfied by various tools. Personally, I like using Visual Studio 2017 Community Edition, and I thought I would do a quick overview of it.

First thing: you can find the download from this link: https://www.visualstudio.com/downloads/. Once it is installed (making sure that you select SQL Server Data Tools), go find Visual Studio 2017 and you will be presented with your start screen.

Click through for the process.  This tool is nice for one-off jobs, like when you want to synchronize production down to source control or see the differences between two environments.  But if you're doing these comparisons a lot, I think you're better off scripting it out using SMO and PowerShell.

Generating Fake Company Names

Daniel Hutmacher has a great way of generating fake company names:

This query is actually a lot simpler than it first appears. Here’s how it breaks down:

  • Pick 100 words at random (table “a”)

  • For each word in “a”, if possible, pick a single random word (“b”) that doesn’t start or end with the same three letters as the “a” word.

  • For each word in “b”, if possible, pick a single random word (“c”) that doesn’t start or end with the same three letters as the “a” nor the “b” word.

  • The UPPER(), LEFT() and SUBSTRING() stuff is just to turn the names into title case.

  • As before, the ORDER BY NEWID() randomizes the order in which the TOP (1) row is returned.

My favorite name when running this was Disaster Votes, followed closely by Fail Users Vendor and Terminated Enterprise.  Apparently my SQL Server instance has a very negative impression of my made-up companies' leadership skills.
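
If you want to play with the core trick yourself, here is a stripped-down sketch assuming a hypothetical dbo.Words table (Daniel's actual query adds the prefix/suffix checks and a third word): TOP with ORDER BY NEWID() does the random sampling, and CROSS APPLY pairs each word with a random partner.

    SELECT  UPPER(LEFT(a.Word, 1)) + LOWER(SUBSTRING(a.Word, 2, 100)) + N' ' +
            UPPER(LEFT(b.Word, 1)) + LOWER(SUBSTRING(b.Word, 2, 100)) AS CompanyName
    FROM
    (
        SELECT TOP (100) w.Word
        FROM dbo.Words AS w
        ORDER BY NEWID()                          -- 100 words at random
    ) AS a
    CROSS APPLY
    (
        SELECT TOP (1) w.Word
        FROM dbo.Words AS w
        WHERE LEFT(w.Word, 3) <> LEFT(a.Word, 3)  -- crude stand-in for the "no shared prefix" rule
        ORDER BY NEWID()                          -- one random partner per outer word
    ) AS b;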

Polar Charts In Power BI With R

Leila Etaati shows how to build a polar chart in Power BI using an R component:

I just add a layer to the above formula, “coord_polar()”; this function has also been used for creating pie charts. It takes a “theta” argument; in the example below I set theta to the y axis, so we get the charts below.

Normally I don’t much like this type of polar chart, though I’m a big fan of radar charts, which follow a similar concept.

Keep That Data Raw

Archana Madhavan argues that you should retain your raw data:

When your pipeline already has to read every line of your data, it’s tempting to make it perform some fancy transformations. But you should steer clear of these add-ons so that you:

  • Avoid flawed calculations. If you have thousands of machines running your pipeline in real-time, sure, it’s easy to collect your data — but not so easy to tell if those machines are performing the right calculations.

  • Won’t limit yourself to the aggregates you decided on in the past. If you’re performing actions on your data as it streams by, you only get one shot. If you change your mind about what you want to calculate, you can only get those new stats going forward — your old data is already set in stone.

  • Won’t break the pipeline. If you start doing fancy stuff on the pipeline, you’re eventually going to break it. So you may have a great idea for a new calculation, but if you implement it, you’re putting the hundreds of other calculations used by your coworkers in jeopardy. When a pipeline breaks down, you may never get that data.

The problem is that even if the cost of storage is much cheaper than before, there’s a fairly long tail before you get into potential revenue generation.  I like the idea, but selling it is hard when you generate a huge amount of data.

Long-Term Storage In Kafka

Jay Kreps shows us that you can use Kafka as a primary data store:

The short answer is that it's not insane; people do this all the time, and Kafka was actually designed for this type of usage. But first, why might you want to do this? There are actually a number of use cases; here are a few:

  1. You may be building an application using event sourcing and need a store for the log of changes. Theoretically you could use any system to store this log, but Kafka directly solves a lot of the problems of an immutable log and “materialized views” computed off of that. The New York Times does this for all their article data as the heart of their CMS.

  2. You may have an in-memory cache in each instance of your application that is fed by updates from Kafka. A very simple way of building this is to make the Kafka topic log compacted, and have the app simply start fresh at offset zero whenever it restarts to populate its cache.

  3. Stream processing jobs do computation off a stream of data coming via Kafka. When the logic of the stream processing code changes, you often want to recompute your results. A very simple way to do this is just to reset the offset for the program to zero to recompute the results with the new code. This sometimes goes by the somewhat grandiose name of The Kappa Architecture.

  4. Kafka is often used to capture and distribute a stream of database updates (this is often called Change Data Capture or CDC). Applications that consume this data in steady state just need the newest changes; however, new applications need to start with a full dump or snapshot of the data. Performing a full dump of a large production database is often a very delicate and time-consuming operation, though. Enabling log compaction on the topic containing the stream of changes allows consumers of this data to simply reload by resetting to offset zero.

This is a great article, especially the part about how Kafka is not the data storage system; there are reasons you’d want data in other formats as well (like relational databases, which are great for random access queries).

tSQLt And VS Database Tests

Gavin Campbell combines tSQLt along with a Visual Studio database test project:

There are a few ways of getting the tSQLt objects deployed to where they are needed for testing. The way I use most often is basically this one, whereby we create a .dacpac of just the tSQLt objects (or use one we made earlier!) and create a second database project with a Database Reference of type “Same Database” to the project we are trying to test, plus a reference to our tSQLt .dacpac. The .dacpac file needs to be somewhere in our source folders, as it will be added by path. We also need a reference to the master database, as this is required to build the tSQLt project.

I see the two tools as serving completely different purposes:  tSQLt is a decent unit test framework, whereas a Visual Studio database project is a decent integration test framework.  It's interesting that Gavin was able to combine them here, but aside from having a common test runner, my inclination would be to keep them separated.
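
For anyone who has not seen tSQLt before, a minimal unit test looks something like the sketch below (the test class, table, and scalar function dbo.GetOrderTotal are made up for illustration): tSQLt.FakeTable isolates the table under test and tSQLt.AssertEquals performs the check.

    EXEC tSQLt.NewTestClass @ClassName = N'OrderTests';
    GO
    CREATE PROCEDURE OrderTests.[test GetOrderTotal sums line amounts]
    AS
    BEGIN
        -- Arrange: swap the real table for an empty fake and load known rows
        EXEC tSQLt.FakeTable @TableName = N'dbo.OrderLines';
        INSERT INTO dbo.OrderLines (OrderId, Amount) VALUES (1, 10.00), (1, 15.50);

        -- Act
        DECLARE @Total decimal(10,2) = dbo.GetOrderTotal(1);

        -- Assert
        EXEC tSQLt.AssertEquals @Expected = 25.50, @Actual = @Total;
    END;
    GO
    EXEC tSQLt.Run @TestName = N'OrderTests';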

Long Live The DBA

Kellyn Pot’vin-Gorman notes that the “Gone will be the DBA” trend has hit Oracle as well:

Any DBA who specializes in optimization knows that hardware offers around 15% overall opportunity for improvement.  My favorite quote from Cary Millsap, “You can't hardware your way out of a software problem,” is quite fitting, too.  A hardware upgrade can offer a quick increase in performance, only for the problem to seemingly return after a period of time.  As we've discussed in previous posts, the natural life of a database is growth: growth in data, growth in processing, growth in users.  This growth requires more resources, and if the environment is not performing as optimally and efficiently as possible, more resources will always be required.

Someday I will write my “No, the DBA isn’t going anywhere” opus, but today is not that day.  Anyhow, this is a good post for anyone worried that automation will kill the DBA.
