Press "Enter" to skip to content

Author: Kevin Feasel

Try-Catch Blocks

Dave Mason discusses a major problem with try-catch blocks:

We didn’t just lose one error. We lost them all! DBCC CHECKDB on a corrupt database just came back “clean” and the CATCH block never executed. We’ve moved into dangerous territory now. This is no longer an inconvenience or an annoyance. This could lead to data loss, financial harm to your company, maybe even some risk to your career.

Dave promises a better approach in his next post.

Comments closed

Table Heatmaps

Devin Knight continues his Power BI custom visuals series:

  • In the Table Heatmap the color of the boxes is determined by the value in your measure.

  • Only 1 category field can be used, which will dynamically generate the number of columns based on the number of distinct values your field has.

  • The number and types of colors can be changed using some of the settings we’ll discuss below.

I can see the table heatmap being a good visual for calendars.

Comments closed

Spark At Scale

Sital Kedia, Shuojie Wang, and Avery Ching have an example of how Facebook uses (and has improved) Spark for their ranking system:

Debugging at full scale can be slow, challenging, and resource intensive. We started off by converting the most resource intensive part of the Hive-based pipeline: stage two. We started with a sample of 50 GB of compressed input, then gradually scaled up to 300 GB, 1 TB, and then 20 TB. At each size increment, we resolved performance and stability issues, but experimenting with 20 TB is where we found our largest opportunity for improvement.

While running on 20 TB of input, we discovered that we were generating too many output files (each sized around 100 MB) due to the large number of tasks. Three out of 10 hours of job runtime were spent moving files from the staging directory to the final directory in HDFS. Initially, we considered two options: Either improve batch renaming in HDFS to support our use case, or configure Spark to generate fewer output files (difficult due to the large number of tasks — 70,000 — in this stage). We stepped back from the problem and considered a third alternative. Since the tmp_table2 table we generate in step two of the pipeline is temporary and used only to store the pipeline’s intermediate output, we were essentially compressing, serializing, and replicating three copies for a single read workload with terabytes of data. Instead, we went a step further: Remove the two temporary tables and combine all three Hive stages into a single Spark job that reads 60 TB of compressed data and performs a 90 TB shuffle and sort.

Maybe it’s just a mindset thing, but the part that impressed me was the number of pull requests for system improvements (and the number which were accepted).

Comments closed

Deleting From Heaps

Uwe Ricken discusses a downside to leaving an active table as a heap:

The simplest way of removing empty data pages from a heap is via the exclusive locking of the table. But keep in mind that there is no such thing as a free lunch. Blocking the table from other processes will lead to a system that will not scale. As long as the DELETE process locks the table, no other process can access the table. The only exception is the “SNAPSHOT ISOLATION” but explaining the pros and cons of this technique is outside the scope of this article.

The next example (with the new created table) demonstrates the behavior.

The section on lock escalation was rather interesting.

Comments closed

Pattern Matching

Aaron Bertrand has an article on performance testing various pattern matching techniques:

I wanted to prove to myself that the above procedure was undeniably better overall for all possible searches than any of the queries that use ORconditionals, regardless of whether they use LIKE, LEFT/RIGHT, SUBSTRING, or CHARINDEX. I took David’s basic query structures and put them in stored procedures (with the caveat that I can’t really test “contains” without his input, and that I had to make his OR logic a little more flexible to get the same number of rows), along with a version of my logic. I also planned to test the procedures with and without an index I would create on the search column, and under both a warm and a cold cache.

The conclusion is ambiguous.  Read on for the full story.

Comments closed

Big Play Alerts

Allison Tharp has a Python script to track extremely important events:

First, we get the game data for the game we want.  In this instance, I am getting game data for the Indianapolis vs Cincinnati game in the 4th week of the 2016 preseason and setting it to the variable g.  Next, we will get the current number of scoring plays (scores0), number of home/away team turnovers (home/awayto0), number of home/away penalties (home/awaypenalty0), and finally, the number of yards that resulted from home/away penalties (home/awaypenyds0).

The rest of the script runs while the game is still in progress.  To check if the game is in progress, we use g.game_over().  If this object is False, the game is ongoing:

I did not know about the nflgame module and I think my life has just become better as a result of learning about this.

Comments closed

Superlatches

Ewald Cress discusses superlatch promotion:

There are quite a few pieces of machinery that are involved in our little drama. First, I’ll introduce some instance-global settings:

  • A flag that controls whether latch promotion is enabled at all. Although I don’t have any information about this, let’s assume that it will be enabled on any system that “warrants it”.
  • A flag that controls whether cycle-based promotion is enabled. Again, I can’t currently tell you what determines this setting.
  • sm_promotionThreshold, the current calculated cycle-based promotion threshold described in Part 3.
  • sm_promotionUpperBoundCpuTicks, used as a ceiling value to prevent outliers from skewing stats. As described in Part 3, this is simply sm_promotionThreshold * 5.
  • Trace flag 844, which lowers the threshold for non-cycle-based promotions.
  • Trace flag 827, which causes each latch promotion to be noted in the SQL Server log (“Latch promotion, page %u:%u in database %u, objid %u.”)

Assume that the first flag is set on our system of interest, otherwise promotions won’t happen and we have nothing to talk about.

Read the whole thing.

Comments closed

Powershell Prompts

Thomas Rushton wants to customize his Powershell prompt:

By default, your PowerShell prompt is “PS <<path>> >”. So normal; so dull. However, that can be easily changed. In my case, I want to add:

  • the current time (well, the time at which the prompt was generated)
  • a reminder which account I’m using (I have two – a “normal user” one, and a “DBA/SU” one
  • a countdown of seconds remaining until my next holiday

Oh, and I still need to see the current path.

Very interesting.  By contrast, here’s how to change a Bash prompt.

Comments closed

Biml Relational Hierarchy

Andy Leonard introduces the Biml Relational Hierarchy:

There are two ways to populate the relational hierarchy:

  • Graphically
  • Programmatically

In this post, I am going to cover graphically populating the relational hierarchy. Scott Currie [Varigence] provided a remarkable example of programmatically populating the relational hierarchy in his BimlScript.com article Biml Basics for Relational DBs.

Populating the relational hierarchy graphically is simple; just follow the steps as they are listed from top to bottom in the Logical View’s Relational node in the Mist IDE. Begin with Connections.

Following along does require a copy of Mist/BimlStudio but there’s a free trial.  If your company gets seriously into Biml, Mist is a great purchase.

Comments closed

Azure SQL Database Supports JSON

Jovan Popvic reports that Azure SQL Database now has full JSON support:

JSON is available in all service tiers (basic, standard, and premium) but only in new SQL Database V12. You can see quick  introduction here or more details in Getting Started page. you can also find code samples that JSON functions in Azure Sql Database on official Sql Server/Azure Sql Database GitHub repository.

Note that OPENJSON function requires database compatibility level 130. If all functions work except OPENJSON, you would need to set the latest compatibility level in database.

It will be interesting to see adoption of JSON within Azure SQL Database.  I could see it being a bit more likely due to DocumentDB.

Comments closed