Kevin Feasel – Page 652

Apache Flink Table Store

Published 2022-05-19 by Kevin Feasel

Jingsong Lee and Jiangjie Qin have an announcement:

As of now it is quite common that people deploy a few storage systems to work with Flink for different purposes. A typical setup is a message queue for stream processing, a scannable file system / object store for batch processing and ad-hoc queries, and a K-V store for lookups. Such an architecture posts challenge in data quality and system maintenance, due to its complexity and heterogeneity. This is becoming a major issue that hurts the end-to-end user experience of streaming and batch unification brought by Apache Flink.
The goal of Flink table store is to address the above issues. This is an important step of the project. It extends Flink’s capability from computing to the storage domain. So we can provide a better end-to-end experience to the users.

Click through to see how table storage works.

Comments closed

Summarize in KQL

Published 2022-05-19 by Kevin Feasel

Robert Cain continues a series on KQL:

When data is analyzed, it is seldom done on a row by row basis. Instead, data analysts look at the big picture, looking at total values. For example, the total number of times the disk transfer counter is recorded for a time period may give an indication of disk utilization.
To aggregate these values with KQL, we’ll use the summarize operator.

Read on for plenty of demos.

Comments closed

T-SQL Order of Execution and Aliases

Published 2022-05-19 by Kevin Feasel

Joe Billingham explains why you can’t do that thing you want to do:

So, you have just written a query, hit execute and you have encountered an error: Invalid column name ‘[column name]‘.
The column you’ve used in your WHERE clause cannot be identified by its alias. You’ve defined it at the top of the query and used it fine previously as your ORDER BY condition, so why can’t the engine recognise it?

Read on for the answer. This is why some people I know have wanted a SQL-like language which runs in order of execution, so a query would start with the FROM clause rather than the SELECT clause. Languages like KQL do work that day, so there are examples in the wild.

Comments closed

Left and Right Deep Hash Joins

Published 2022-05-19 by Kevin Feasel

Forrest McDaniel dives into the forest:

There’s a lot already written about left versus right deep hash joins. Unfortunately for us SQL Server nerds, “left” and “right” don’t make as much sense in SSMS query plans – those have a different orientation than the trees of database theory.
But if you rotate plans, you can see left and right that make sense (even if they still don’t match canonical shapes). Just follow the join operators.

Read on to understand the difference and what it means for query performance.

Comments closed

Power BI Field Parameters

Published 2022-05-19 by Kevin Feasel

Matt Allington throws one in for free:

The May 2022 version of Power BI Desktop includes a very interesting and useful feature – Field Parameters. Today I will show you how to use this new feature illustrating with 3 (no, wait, 4) use cases – Chart Elements, Chart Axis, Table Contents and Permanent Ad hoc Hierarchies.

Read on to see how to use this preview feature in Power BI.

Comments closed

823/824 Alerts with SQL Server and VMware

Published 2022-05-19 by Kevin Feasel

David Klee loops us in on a tricky-to-catch problem:

We’ve been tracking a weird state with SQL Server virtual machines on VMware and possible warnings on database corruption while VM backups are running, largely centered around (but not isolated to) the tempdb database.
TLDR: We’ve now got a VMware KB article on this situation that you and your VM admins should read if you hit the condition and fall into the specifics listed below. Reference VMware KB 88201 for more details.

Read on for David’s thoughts and what to do if you hit this problem.

2 Comments

Backups with Checksum

Published 2022-05-19 by Kevin Feasel

Chad Callihan tempts Betteridge’s Law of Headlines:

When you’re specifying WITH CHECKSUM as you’re backing up databases, SQL Server will use checksums to help catch any inconsistencies with pages. This seems like a setting that you should always use and would expect to be a default setting. So why doesn’t SQL Server include it by default?

Using the principle that a backup isn’t valid until it’s verified, CHECKSUM acts as a useful but not sufficient check.

Comments closed

Azure Resource Locks

Published 2022-05-19 by Kevin Feasel

Craig Porteous explains the benefit (and pain) behind resource locks in Azure:

In theory, these are perfect for preventing accidental (or deliberate) deletion of resources in Azure. They don’t prevent the deletion of data though, only operating at the “control plane” of a resource. That still sounds great though. Turn them on everywhere! That’s another layer of security in your cloud data platform. Right?

Yeah, here’s where the pain comes in. I tried using resource group locks but there are some resources which use delete capabilities, such as Azure Media Service. A delete lock means no ability to delete uploaded videos.

Comments closed

Protecting ML Models and IP

Published 2022-05-18 by Kevin Feasel

Pete Warden has some advice:

Over the last decade I’ve helped hundreds of product teams ship ML-based products, inside and outside of Google, and one of the most frequent questions I got was “How do I protect my models?”. This usually came from executives, and digging deeper it became clear they were most worried about competitors gaining an advantage from what we released. This worry is completely understandable, because modern machine learning has become essential for many applications so quickly that best practices haven’t had time to settle and spread. The answers are complex and depend to some extent on your exact threat models, but if you want a summary of the advice I usually give it boils down to:
– Treat your training data like you do your traditional source code.
-Treat your model files like compiled executables.

Read on to see why Pete came to this as the appropriate answer, as well as what I have to consider a sly mention of duck boat tours.

Comments closed

Recursive Subdirectory Reads with Hive

Published 2022-05-18 by Kevin Feasel

The Hadoop in Real World team wants to dig deeper:

Let’s say you have a Hive table and the Hive table is pointing at a location or directory which has several sub directories and each subdirectories has files underneath it.
When you query the table however, Hive is only reading the files at the top level folder and ignoring all the files under the subdirectories.

Click through for the solution.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Author: Kevin Feasel