Press "Enter" to skip to content

Author: Kevin Feasel

Missing Foreign Keys—A Cultural Problem

Martin Catherall tells a spooky Halloween story:

By large databases I’m roughly meaning databases with several hundred tables, and I usually see a lot of these tables with several hundred GB’s of data in them.

When I generally ask about the reason for no foreign key, I’m told

  1. they add  overhead
  2. they give no benefit
  3. we can’t enter our data properly when we have them

The last one in the above list is generally down to poor modelling – an example being entering a later part of a financial transaction before the first part of the financial transaction has taken place. Once both parts of the financial transaction have taken place then the database is left in a consistent state – BUT, that generally being handled by the app NOT the database – OUCH!

There are times where key constraints are too much—often-updating fact tables might be one such scenario.  And some of “too much” comes down to hardware quality.  But for the most part, those key constraints are one of the clearest forms of database documentation available, not to mention their normal benefits.

Comments closed

Uncovering Complexity In SQL Objects

Michael J. Swart helps us uncover hidden complexity in database objects:

The other day, Erin Stellato asked a question on twitter about the value of nested SPs. Here’s how I weighed in:

Hidden complexity has given me many problems in the past. SQL Server really really likes things simple and so it’s nice to be able to uncover that complexity. Andy Yun has tackled this problem for nested views with his sp_helpexpandview.

Click through for a script which helps.

Comments closed

Valid WAITFOR Data Types

Dave Mason investigates the valid data types you can use as inputs for WAITFOR:

There are certain design patterns in T-SQL that give me pause. They may not be “code smells” per se, but when I encounter them I find myself thinking “there’s got to be a more sensible way to accomplish this”. WAITFOR DELAY is one example. I’ve used it a few times here and there, mostly in one-off scripts. If I wrote some code that used it regularly in production, I’d be paranoid about putting my SPID to sleep forever. Maybe a little bit of paranoia is a good thing. But I digress.

A recent task found its way to me, and I’ve decided to use WAITFOR DELAY as part of my solution. (It hasn’t been tested or implemented yet–perhaps more on this in another post.) My usage this time has been more complex than in the past. What I already knew is that you can use a string literal for the time_to_pass argument. For example, this will delay for 3½ seconds:

WAITFOR DELAY '00:00:03.500'

Click through for a bunch of testing.

Comments closed

Using Spring Boot To Build A NiFi Operational Dashboard

Tim Spann continues his series on building an Apache NiFi operational dashboard:

To access data to display in our dashboard we will use some Spring Boot 2.06 Java 8 microservices to call Apache Hive 3.1.0 tables in HDP 3.0 on Hadoop 3.1.

We will have our website hosted and make REST Calls to Apache NiFi, our microservices, YARN, and other APIs.

As you can see we can easily incorporate data from HDP 3 — Apache Hive 3.1.0 in Spring Boot Java applications with not much trouble. You can see the Maven build script (all code is in GitHub).

Our motivation is to put all this data somewhere and show it on a dashboard that can use REST APIs for data access and updates. We may choose to use Apache NiFi for all REST APIs or we can do some in Apache NiFi. We are still exploring. We can also decide to change the backend to HBase 2.0, Phoenix, Druid or a combination of these. We will see.

Read on for a series of screenshots and config files showing you how to set this up.

Comments closed

What’s New In Cloudera Enterprise 6.0

The Cloudera Hive team looks at the introduction of Apache Hive 2.1 into Cloudera Enterprise 6:

We are also focusing on efficiency across our platform. While on-premises platform efficiency helps manage costs in the long run, the immediate benefits of in-cloud deployments are realized by reducing total cost of ownership (TCO). We introduced Hive-on-Spark two years ago to meet  this goal in collaboration with Intel which is our strategic partner. We have a longstanding collaboration with Intel to optimize Cloudera’s stack on Intel architecture for our customers’ benefit.

In Enterprise 6.0, taking our strategic partnership with Intel ahead for further efficiency gains in Hive, we introduce a major performance and efficiency enhancement in HoS called Parquet Vectorization. This feature enables the HoS engine to process a vector of columns instead of one row at a time by batching data rows together into column vectors and making each operator work on such column vectors. This leads to better utilization of CPU caches and achieves high instructions per cycle by efficiently using the CPU instruction pipeline. In addition, we include numerous other performance improvements. For example, Hive often scans a given table multiple times during self joins, self-unions, or shared sub-queries. To address this, Dynamic RDD caching in HoS reuses a single scan across all these operations. Similarly, when the same subquery is used repeatedly, HoS executes this only once instead of separately for each subquery invocation.  Overall, with all these enhancements, in Enterprise 6.0 Hive can be up to 2.2X faster than Hive on the latest Enterprise 5.x release. The majority of these gains can be attributed to Parquet Vectorization for Hive-on-Spark.

This is another case where the Cloudera-Hortonworks merger will get interesting:  Cloudera seemed to hitch its wagon to Impala and Hortonworks to Hive; will they support both as much as they each did independently, or will the new corporate overlords settle on one of the two?

Comments closed

Digging Into Batch Mode And Parameter Sniffing

Erik Darling has mixed news on the efficacy of using batch mode for rowstore as a way of eliminating problems arising from parameter sniffing:

SQL Server 2019 introduced batch mode over row store, which allows for batch mode processing to kick in on queries when the optimizer deems it cost effective to do so, and also to open up row store queries to the possibility of Adaptive Joins, and Memory Grant Feedback.

These optimizer tricks have the potential to help with parameter sniffing, since the optimizer can change its mind about join strategies at run time, and adjust memory grant issues between query executions.

But of course, the plan that compiles initially has to qualify to begin with. In a way, that just makes parameter sniffing even more frustrating.

Read on for both the good and the bad.

Comments closed

Creating Firewall Rules With Azure Cloud Shell

Kellyn Pot’vin-Gorman shows how you can add a firewall rule for Azure SQL Database from the Azure Cloud Shell:

With my use of scripting and Azure Cloud Shell, I’m automating and building my environment, including SQL Database resources and then have a requirement to access and build the logical objects.  This means that I need a firewall rule build for the Azure Cloud Shell I’m working from.  The IP for this cloud shell is unique to the session I’m running at that moment.

The requirement to add this enhancement to my script is:

  1. Capture and read the IP Address for the Azure Cloud shell session.

  2. Populate the IP Address to a Firewall rule

  3. Log into the new SQL Server database that was created as part of the bash script and then execute SQL scripts.

Click through for instructions.

Comments closed

Optimizing DAX SWITCH Statements With Variables

Marco Russo gives us some advice on optimizing IF and SWITCH statements in DAX:

Though the DAX engine might reuse the result obtained for the same measures in the same filter context (Sales Amount and Sales LY), this is not always the case. In this scenario, variables are a good way to ensure a better optimized code execution.

However, variables should only be used within their respective scope. For example, if a variable is defined before a conditional statement, then the variable will be evaluated regardless of the condition. This has a strong performance impact in case there are disconnected slicers in the report. To elaborate on this, consider the following report where a Time Selection table is used to define a slicer that controls which columns of the matrix should be visible. The matrix contains a single measure called Sales, whose content depends on the period selected in the column.

Read on for more.

Comments closed

Azure SQL Database Hyperscale Tier

Chris Seferlis looks at a new service tier offering for Azure SQL Database:

The Hyperscale service tier provides the following capabilities:

  • Support for up to 100 terabytes of database size (and this will grow over time)

  • Faster large database backups which are based on file snapshots

  • Faster database restores (also based on file snapshots)

  • Higher overall performance due to higher log throughput and faster transaction commit time regardless of the data volumes

  • The ability to rapidly scale out. You can provision one or more read only nodes for offloading your read workload for use as hot standbys.

  • You can rapidly scale up your compute resources (in constant time) to accommodate heavy workloads, so you can scale compute up and down as needed just like Azure Data Warehouse

At what cost?  I like Chris’s “not inexpensive” understatement here.

Comments closed

Slipstream Installation Of SQL Server

Randolph West shows how to install a pre-patched version of SQL Server:

For this example, we will be using the SQL Server 2017 Developer Edition RTM (called en_sql_server_2017_developer_x64_dvd_11296168.iso), and Cumulative Update 11 (called SQLServer2017-KB4462262-x64.exe), which was the latest CU available at the time of this writing.

Place the Cumulative Update in a folder that will contain the patch files. On older versions of SQL Server, this could comprise the latest Service Pack, Cumulative Updates, as well as additional hotfixes you may wish to apply. For instance, as of this writing, SQL Server 2016 requires Service Pack 2, Cumulative Update 3 for Service Pack 2, and two more hotfixes to bring it up to date. We would have to have all four files in this folder.

The actual path does not matter as long as we keep track of where they are. For the purposes of this post, we will assume they are stored in C:\Temp.

Definitely a good idea if you’re installing SQL Server regularly.

Comments closed