Press "Enter" to skip to content


The Limitations of Metadata-Only Updates

Eitan Blumin does not like the limitations of metadata-only column changes in SQL Server 2016:

This is an excellent mechanism on the one hand…

However, it’s completely useless when the column you want to change has a CLUSTERED INDEX defined on it (regardless of whether it’s also a PRIMARY KEY or not).

Such a scenario would be especially common with IDENTITY columns (which, ironically, is exactly the kind of example that Paul presented in his post).

Click through to understand the scope of this limitation.
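
To make the limitation concrete, here is a minimal sketch using hypothetical names: row compression makes an INT-to-BIGINT change eligible for the metadata-only path, but the clustered primary key on the column blocks the ALTER entirely.

CREATE TABLE dbo.Orders
(
    OrderID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
    OrderDate DATETIME2 NOT NULL
) WITH (DATA_COMPRESSION = ROW);

-- Fails with error 5074 ("The object 'PK_Orders' is dependent on
-- column 'OrderID'"), even though the type change would otherwise
-- qualify as metadata-only:
ALTER TABLE dbo.Orders ALTER COLUMN OrderID BIGINT NOT NULL;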


The Halloween Problem, Continued

Paul White continues a series on the Halloween Problem. Part 2 looks at INSERT and DELETE statements:

In the first part of this series, we saw how the Halloween Problem applies to UPDATE queries. To recap briefly, the problem was that an index used to locate records to update had its keys modified by the update operation itself (another good reason to use included columns in an index rather than extending the keys). The query optimizer introduced an Eager Table Spool operator to separate the reading and writing sides of the execution plan to avoid the problem. In this post, we will see how the same underlying issue can affect INSERT and DELETE statements.
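
To make the INSERT flavor concrete, here is a minimal sketch with hypothetical names: the statement reads from the same index it writes to, so without separating the two phases (typically with an Eager Table Spool), newly inserted rows could be read back and inserted again.

CREATE TABLE dbo.Staging
(
    SomeKey INT NOT NULL PRIMARY KEY,
    Payload VARCHAR(50) NOT NULL
);

-- The SELECT scans the same clustered index the INSERT writes to:
INSERT INTO dbo.Staging (SomeKey, Payload)
SELECT s.SomeKey + 1000000, s.Payload
FROM dbo.Staging AS s;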

Part 3 looks at the train wreck that is the MERGE statement:

In the right circumstances, the SQL Server optimizer can recognize that the MERGE statement is hole-filling, which is just another way of saying that the statement only adds rows where there is an existing gap in the target table’s key.

For this optimization to be applied, the values used in the WHEN NOT MATCHED BY TARGET clause must exactly match the ON part of the USING clause. Also, the target table must have a unique key (a requirement satisfied by the PRIMARY KEY in the present case).

Where these requirements are met, the MERGE statement does not require protection from the Halloween Problem.

If only it weren’t busted in so many other ways!
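
Here is a minimal sketch of a hole-filling MERGE that meets those requirements (all names hypothetical): the WHEN NOT MATCHED BY TARGET branch matches exactly on the key from the ON clause, and that key is the target’s primary key.

CREATE TABLE dbo.Target
(
    SomeKey INT NOT NULL PRIMARY KEY,
    Payload VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Source
(
    SomeKey INT NOT NULL PRIMARY KEY,
    Payload VARCHAR(50) NOT NULL
);

-- Hole-filling: rows are added only where the target key has a gap,
-- so the optimizer can skip Halloween Protection here.
MERGE dbo.Target AS t
USING dbo.Source AS s
    ON t.SomeKey = s.SomeKey
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SomeKey, Payload)
    VALUES (s.SomeKey, s.Payload);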

Part 4 wraps up the series:

The SQL Server optimizer has specific features that allow it to reason about the level of Halloween Protection (HP) required at each point in the query plan, and the detailed effect each operator has. These extra features are incorporated into the same property framework the optimizer uses to keep track of hundreds of other important bits of information during its search activities.

Each operator has a required HP property and a delivered HP property. The required property indicates the level of HP needed at that point in the tree for correct results. The delivered property reflects the HP provided by the current operator and the cumulative HP effects provided by its subtree.

This last one goes into some nice detail.


Using Cognitive Services in Power BI without a Premium Subscription

Marc Lelijveld and Kathrin Borchert show how we can take advantage of Cognitive Services and Power BI without having to pay for Power BI Premium:

Recently, I was presenting my session about AI Capabilities for Power BI to Make AI Accessible for Everyone at the Virtual Power BI Days Hamburg, a great event organized by Kathrin Borchert. Part of my session was about the Artificial Intelligence capabilities offered as part of Power BI Premium. A day later, Kathrin came up with a great idea for how you can leverage these AI capabilities without the need for Power BI Premium.

I was enthusiastic about that idea right away, since I had thought about this in the past as well. Back then, there were some blockers, which have since been sorted out. I asked Kathrin if she was open to co-authoring this blog post and she immediately agreed.

Click through for the technique. Basically, it’s a trade-off between simplicity and cost.


Replicating SQL’s IN Operator with Azure Data Factory

Rayis Imayev shows how we can find values in a group using Azure Data Factory:

However, only this use case for the OR function, with two conditions, is possible: or(equals(variables(‘var1’), ‘A’), equals(variables(‘var1’), ‘B’)) – a limit of two conditions.

But what if we had the ability to check whether a particular variable/parameter/other ADF object value belongs to a range of values (an array of values), similar to what we can do with the IN operator in SQL? That would definitely solve our problem and remove the limitation on the number of logical conditions to check.

Click through for the answer.
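
For reference, this is the SQL behavior being replicated: a single membership test instead of a chain of OR comparisons (table and column names are hypothetical).

SELECT SomeColumn
FROM dbo.SomeTable
WHERE SomeColumn IN ('A', 'B', 'C');  -- equivalent to three OR'd equality checks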


Detecting and Changing Multiple Columns’ Data Types with Power Query

Imke Feldmann has a quick tip for us:

Very often, when you expand a column in Power Query that contains a table or some records, the expanded columns will lose their types (as Chris Webb has described here, for example). Or you might just have accidentally deleted a “Changed Type” step.

Did you know there is actually a superfast and easy way to do it?

Read on to see how you can change the data type for multiple columns all at once.


The Value of Table Visuals

Shannon Holck gives us several scenarios in which tables are a good choice of visual:

Exact numbers are needed
If your report goes to a customer and you need an exact quantity or price or total, absolutely, a table is likely needed.

Displaying a few discrete values
If you need to display 5-10 things and they all represent unique values, a table may be appropriate.

Need to establish trust in the data
If you don’t trust the data (yet) and want to verify it, you can sometimes compare data at a roll-up level to a known source. This is a great way to test not only the data but also that the calculations are accurate.

Click through for more reasons, followed by cases in which you want to avoid table visuals, and finally a few ways to improve tables. I’m not the biggest fan of the improved tables (because I want simpler and denser), but this is good food for thought.


Reading Query Plans in Spark

Daniel Ciocirlan has a primer on query plans in Apache Spark:

Let’s go over some examples of query plans and how to read them. Let’s go back to the one we’ve just shown:

== Physical Plan ==
*(1) Project [(id#0L * 5) AS id#2L]
+- *(1) Range (1, 1000000, step=1, splits=6)

We read this plan backwards, from bottom to top.

Spark does have some UI components which make this a bit easier, but you’ll probably end up in a situation where you need to read it in this format.


Database Integrity in Cloudera Data Platform

Gokul Kamaraj and Liliana Kadar continue a series on operational database tooling in Hadoop:

Referential integrity is supported through the implementation of ‘constraints’, as well as by enforcing business rules for attributes in the table.

Constraints are configurable, and you can use them across different tables. Keep in mind that you have to choose a behavior depending on the specific configuration given to that constraint.

This is rather underdeveloped compared to relational database platforms, but it’s still an improvement over the olden days, in which referential integrity was “write code which does that after the fact.”
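
For context, here is the kind of relational referential integrity being approximated: a FOREIGN KEY constraint with a configured behavior (standard SQL, hypothetical tables).

CREATE TABLE Customers
(
    CustomerID INT NOT NULL PRIMARY KEY
);

CREATE TABLE Orders
(
    OrderID    INT NOT NULL PRIMARY KEY,
    CustomerID INT NOT NULL,
    -- The chosen behavior: deleting a customer removes its orders too.
    CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID)
        REFERENCES Customers (CustomerID)
        ON DELETE CASCADE
);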


Availability Group Offerings in Standard Edition

Guy Glantser notes an issue with Availability Group documentation:

In SQL Server 2017 Microsoft added a new flavor called Read-Scale Availability Groups. This is different, because the goal here is not high availability or disaster recovery, but rather read-scalability. As opposed to the other flavors, in RSAG there is no cluster, and there is also no automatic failover mechanism. But you can set up multiple secondary replicas with read-only access and load balancing, and offload read workloads from the primary replica. This is a great scalability feature, and you can read more about it here.

Now, if you check Microsoft documentation regarding the editions and supported features of SQL Server, you will be happy to see that RSAG is supported in Standard Edition. I was happy to see it too. Unfortunately, if you try to set up a Read-Scale Availability Group on Standard Edition, it will not work. You will only be able to create a Basic Availability Group, as discussed earlier.

Click through for the answer, as well as what you can do in Standard Edition.
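
For reference, a read-scale availability group is the CLUSTER_TYPE = NONE flavor. Here is a minimal sketch (server names, endpoint URLs, and the database are hypothetical), which, per Guy's findings, works as a Read-Scale AG only on Enterprise Edition despite what the documentation says:

-- No cluster and no automatic failover; the secondary exists purely
-- to offload read-only workloads.
CREATE AVAILABILITY GROUP ReadScaleAG
    WITH (CLUSTER_TYPE = NONE)
    FOR DATABASE SalesDB
    REPLICA ON
        N'PrimaryServer' WITH (
            ENDPOINT_URL = N'TCP://PrimaryServer.corp.local:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)),
        N'ReadReplica' WITH (
            ENDPOINT_URL = N'TCP://ReadReplica.corp.local:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));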
