Curated SQL – Page 947 – A Fine Slice Of SQL Server

Obtaining Accurate Totals in DAX

Published 2020-04-02 by Kevin Feasel

Alberto Ferrari explains a nuance of summation in DAX:

In simple DAX measures, the total of a report is the sum of its individual rows. For more sophisticated measures, the total might seem wrong because the formula does not include an aggregation over the rows that are visible in the report. For example, if the total of a measure must be the sum of the values displayed in the rows of a report, we consider the expected result a “visual total”, which is a total that corresponds to the visual aggregation of its values sliced by different rows in the report.

Click through for a straightforward demonstration.

Comments closed

Understanding Point-In-Time Recovery with SQL Server

Published 2020-04-02 by Kevin Feasel

Eduardo Pivaral walks us through what it takes to get point-in-time recovery of data in SQL Server:

Nowadays, data is a precious asset for companies today. If you are a database administrator (by decision or by mistake) or simply you are the “IT guy,” you have the mission of guarantee all the data is backed up and accessible for recovery.
Trust me, even when you could think you have the more reliable hardware on the planet, or you have multiple database replicas around the globe, anything can happen (a user deleting an entire schema by mistake, an application updating the wrong records, some process crashing, a lot of things can happen).
So trust me and don’t question me, just backup all your databases regularly.

During my time as a DBA, I think the most frequent reason for needing point-in-time backups was “We goofed up at 2:20 PM and need to get the database back to that state,” where goof-ups typically involved mass updates or deletes of data.

Comments closed

Auto-Detecting Column Delimiters with Data Factory

Published 2020-04-02 by Kevin Feasel

Mark Kromer shows us a way of dynamically learning what the likely delimiter of a delimited file is:

Processing delimited text files in the data lake is one of the most popular uses of Azure Data Factory (ADF). To define the field delimiter, you set the column delimiter property in an ADF dataset.
The reality of data processing is that delimiter can change often. ADF provides a facility to account for this data drift via parameterization. However, this assumes that you know that the delimiter is changing and what it will change to.
I’m going to briefly describe a sample of how to auto-detect a file delimiter using ADF Data Flows.

Click through for the demo.

Comments closed

Azure AD Passthrough and Password Hash Authentication in SQL DB, DW, MI

Published 2020-04-02 by Kevin Feasel

Mirek Sztajno announces two new security pieces for Azure SQL Database, Azure Synapse Analytics, and Azure SQL Managed Instances:

We are announcing support for Azure AD pass-through and password hash authentication for Azure SQL DB (single database and database pools), Managed Instance, and Azure Synapse (formerly SQL DW).
– Azure AD password hash authentication is the simplest way to enable authentication for on-premises Active Directory users in Azure AD. Users are synchronized with Azure AD and password validation occurs in the cloud using the same username and password that is used in on-premises environments. No additional infrastructure is required.
– Azure AD pass-through authentication provides a password validation mechanism that validate users directly with on-premises Active Directory, outside the cloud. Pass-through authentication does not require ADFS or other third-party federation services.
– Each of these authentication methods can be configured by Azure AD Connect, allowing you to provision users in the cloud.

Read on to see what this means for you.

Comments closed

Which Groups can Set Permissions in Power BI

Published 2020-04-02 by Kevin Feasel

Gilbert Quevauvilliers walks us through the groups which can set permissions in Power BI:

As you can see from above it is good to know which groups can be used to assign permissions in the Power BI Service.
If there is anything I have missed, is wrong or needs updating please let me know via the comments section below.
Thanks for reading!

Gilbert has a nice matrix as well as lots of screenshots establishing the matrix’s veracity.

Comments closed

Using Ephemeral Containers for Debugging Kubernetes-Based Applications

Published 2020-04-01 by Kevin Feasel

Praveen Sripati walks us through the notion of Ephemeral Containers:

It’s always CRITICAL to pack a Container image with the minimal binaries required as this makes the surface area of attack minimal, upgrading the image and testing also becomes easier as there are less variables to be addressed. Distroless Docker images can be used for the same. In the above diagram Container (A) has only the application and the dependent binaries and nothing more. So, if there are no debugging tools in the Container (A) nor any way to check the status of the process then how do we debug any problem in the application? Once a pod is created, it’s even not possible to add Containers to it for additional debugging tools.
That’s where the Ephemeral Containers come into picture as in the Container (B) in the above picture. These Containers are temporary that can be included in the Pod dynamically with additional debugging tools. Once a Ephemeral Container has been created, we can connect to it as usual using the kubectl attach, kubectl exec and kubectl logs commands.

It’s an interesting approach to the problem.

Comments closed

The Flink-Hive Integration

Published 2020-04-01 by Kevin Feasel

Bowen Li takes us through Apache Flink 1.10’s integration with Apache Hive:

On the other hand, Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves as not only a SQL engine for big data analytics and ETL, but also a data management platform, where data is discovered and defined. As business evolves, it puts new requirements on data warehouse.
Thus we started integrating Flink and Hive as a beta version in Flink 1.9. Over the past few months, we have been listening to users’ requests and feedback, extensively enhancing our product, and running rigorous benchmarks (which will be published soon separately). I’m glad to announce that the integration between Flink and Hive is at production grade in Flink 1.10 and we can’t wait to walk you through the details.

Click through to see how it works.

Comments closed

Diving into Kubernetes: a Workshop

Published 2020-04-01 by Kevin Feasel

Chris Adkin has been busy:

I have not blogged for a while, it was my hope to produce part 5 in the series of creating a Kubernetes cluster for production grade Big Data Clusters. However, there is a very good reason for this, and that is because I have been working on a one day workshop to be delivered at SQL Bits in September, the material can be found here, enjoy !

I’ve only looked at the module listings, but Chris does a great job putting long-form articles together, so I’ve already added it to my todos.

Comments closed

The Peril of Local Variables

Published 2020-04-01 by Kevin Feasel

Erik Darling dives into the tradeoffs you make when using local variables in stored procedures to avoid parameter sniffing:

In a stored procedure (and even in ad hoc queries or within dynamic SQL, like in the examples linked above), if you declare a variable within that code block and use it as a predicate later, you will get either a fixed guess for cardinality, or a less-confidence-inspiring guess than when the histogram is used.
The local variable effect discussed in the rest of this post produces the same behavior as the OPTIMIZE FOR UNKNOWN hint, or executing queries with sp_prepare. I have that emphasized here because I don’t want to keep qualifying it throughout the post.

This deserves a careful read-through.

Comments closed

Tracking Object Changes and Views

Published 2020-04-01 by Kevin Feasel

Ed Pollack has a solution for tracking when the underlying objects which make up a view change:

When a view’s underlying objects change, the view itself will not change. This can result in a view where the data types of columns, as well as nullability, precision, and scale can be reported inaccurately. When this happens, it is possible for queries against these columns to return errors, truncate data, perform poorly, or otherwise behave in unexpected ways.
This article will delve into views, how they are defined, and how T-SQL can be used to programmatically test the validity of views and ensure they never become stale.

Click through for an interesting article with plenty of code demos.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Curated SQL Posts