Press "Enter" to skip to content

Author: Kevin Feasel

Azure Synapse Link for SQL

Chuck Heinzelman makes an announcement:

Azure Synapse Link for SQL is an automated system for replicating data from your transactional databases (both SQL Server 2022 and Azure SQL Database) into a dedicated SQL pool in Azure Synapse Analytics. The process of setting up a link from your SQL data to Azure Synapse takes just a few clicks and a matter of minutes, rather than the hours or days traditional ETL processes require. Once configured, your initial data is replicated into the target dedicated SQL pool. After the initial table seeding, changes made to your source data are replicated in near real-time.

I’ll be interested in seeing how it performs and how efficient that change feed processor is.

JSON Enhancements in Azure SQL DB and SQL Server 2022

Umachandar Jayachandran has an announcement:

Today, we are announcing the public preview of JSON enhancements in Azure SQL Database and SQL Server 2022 CTP 2.0. This preview contains an enhancement to the ISJSON function and three new JSON functions – JSON_PATH_EXISTS, JSON_OBJECT and JSON_ARRAY. Currently, the ISJSON function allows you to test if a string value contains a valid JSON object or array. The new optional json_type_constraint parameter in the ISJSON function can now be used to test conformance of JSON documents to the IETF RFC 8259 specification. This capability allows you to test for strings that contain a JSON value, scalar, object, or array. This functionality is like the IS JSON predicate in the ANSI SQL standard. The new JSON_PATH_EXISTS function allows you to test for the existence of a specific SQL/JSON path expression in a JSON document. This functionality is like the JSON_EXISTS predicate in the ANSI SQL standard. The new ANSI SQL compatible JSON value constructors – the JSON_OBJECT and JSON_ARRAY functions – allow you to construct a JSON object or array from SQL data.

Even if you don’t store data in JSON format, there are good reasons why you might need to accept data in JSON format (or emit data in JSON format), especially when working with languages like R and Python.
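If you want to poke at the new functions yourself, here is a minimal sketch of calling them from Scala over JDBC. The connection string and credentials are hypothetical placeholders, and it assumes a SQL Server 2022 CTP 2.0 (or Azure SQL Database) instance with the Microsoft JDBC driver on the classpath; the T-SQL in the query uses the functions as described in the announcement.

```scala
import java.sql.DriverManager

object JsonFunctionsDemo extends App {
  // Hypothetical connection string and credentials; point this at a
  // SQL Server 2022 CTP 2.0 or Azure SQL Database instance.
  val conn = DriverManager.getConnection(
    "jdbc:sqlserver://localhost:1433;databaseName=Sandbox;encrypt=true;trustServerCertificate=true",
    "sa", "yourStrong(!)Password")

  try {
    val stmt = conn.createStatement()
    // ISJSON with the new optional json_type_constraint (OBJECT here),
    // JSON_PATH_EXISTS to test a SQL/JSON path, and the two new
    // constructors, JSON_OBJECT and JSON_ARRAY.
    val rs = stmt.executeQuery(
      """SELECT
        |  ISJSON('{"a": 1}', OBJECT)                     AS is_object,
        |  JSON_PATH_EXISTS('{"tags": [1, 2]}', '$.tags') AS has_tags,
        |  JSON_OBJECT('name': 'widget', 'qty': 3)        AS obj,
        |  JSON_ARRAY(1, 2, 3)                            AS arr""".stripMargin)

    while (rs.next())
      println(s"is_object=${rs.getInt("is_object")} has_tags=${rs.getInt("has_tags")} " +
        s"obj=${rs.getString("obj")} arr=${rs.getString("arr")}")
  } finally conn.close()
}
```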

Low-Latency Flink

Jun Qin and Nico Kruber have started a series on low-latency streaming in Apache Flink. The first two posts of the series are up, starting with the overview:

Latency can refer to different things. LatencyMarkers in Flink measure the time it takes for the markers to travel from each source operator to each downstream operator. As LatencyMarkers bypass user functions in operators, the measured latencies do not reflect the entire end-to-end latency but only a part of it. Flink also supports tracking the state access latency, which measures the response latency when state is read/written. One can also manually measure the time taken by some operators, or get this data with profilers. However, what users usually care about is the end-to-end latency, including the time spent in user-defined functions, in the stream processing framework, and when state is accessed. End-to-end latency is what we will focus on.
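To illustrate the "measure it manually" option in miniature – plain Scala, no Flink dependencies, with Event and the stages below as stand-ins for a real job's source, user functions, and sink – you can stamp each record at ingress and compute the elapsed time at the sink, which captures exactly the user-function time that LatencyMarkers skip:

```scala
// A minimal, Flink-free sketch of manual end-to-end latency measurement:
// stamp each event when it enters the pipeline, measure when it exits.
final case class Event(payload: String, ingressNanos: Long)

object LatencySketch extends App {
  // Stand-in for a source operator: attach an ingress timestamp.
  def ingest(payload: String): Event = Event(payload, System.nanoTime())

  // Stand-in for user-defined processing (the part LatencyMarkers bypass).
  def process(e: Event): Event = {
    Thread.sleep(5) // simulate work in a user function
    e.copy(payload = e.payload.toUpperCase)
  }

  // Stand-in for a sink: compute end-to-end latency, including user code.
  def emit(e: Event): Unit = {
    val latencyMs = (System.nanoTime() - e.ingressNanos) / 1e6
    println(f"${e.payload}%s took $latencyMs%.2f ms end-to-end")
  }

  List("a", "b", "c").map(ingest).map(process).foreach(emit)
}
```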

Part 2 discusses direct latency optimization techniques:

When interacting with external systems (e.g., RDBMS, object stores, web services) in a Flink job for data enrichment, the latency in getting responses from external systems often dominates the overall latency of the job. With Flink’s Async I/O API (e.g., AsyncDataStream.unorderedWait() or AsyncDataStream.orderedWait()), a single parallel function instance can handle many requests concurrently and receive responses asynchronously. This reduces latencies because the waiting time for responses is amortized over multiple requests.
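The amortization effect is easy to demonstrate with plain Scala Futures. This sketch is not Flink's Async I/O API – lookup is a made-up stand-in for an external call with a roughly 100 ms response time – but it shows why issuing requests concurrently collapses the total waiting time:

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AsyncAmortization extends App {
  // Made-up stand-in for an external lookup with a ~100 ms response time.
  // (blocking { } hints the thread pool to grow instead of starving.)
  def lookup(key: Int): Future[String] = Future {
    blocking(Thread.sleep(100))
    s"value-$key"
  }

  def timed[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    println(f"$label: ${(System.nanoTime() - start) / 1e6}%.0f ms")
    result
  }

  // Sequential: each request waits for the previous response (~1,000 ms).
  timed("sequential") {
    (1 to 10).map(k => Await.result(lookup(k), 5.seconds))
  }

  // Concurrent: all ten requests are in flight at once, so the waiting
  // time is amortized across them (~100 ms total).
  timed("concurrent") {
    Await.result(Future.sequence((1 to 10).map(lookup)), 5.seconds)
  }
}
```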

Stay tuned for more posts in the series.

The Future Object in Scala

Gulshan Singh visits from the future:

You have units of work that you want to run asynchronously, so you don’t block while they’re running. A future gives you a simple way to run an algorithm concurrently. A future starts running concurrently when you create it and returns a result at some point, well, in the future. In Scala, we say that a future returns eventually.

The Future instance is a handle to an eventually available result. You can continue doing other work until the future completes, either successfully or unsuccessfully.

You may also know of Futures as Promises. They’re quite similar to async calls in .NET as well.
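Here is a minimal, self-contained sketch of that lifecycle using only the Scala standard library – create a Future, carry on with other work, then handle its eventual success or failure:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object FutureDemo extends App {
  // The work starts running on another thread as soon as the Future is created.
  val answer: Future[Int] = Future {
    Thread.sleep(500) // simulate a slow computation
    21 * 2
  }

  // Register a callback for the eventual result – success or failure.
  answer.onComplete {
    case Success(n)  => println(s"Computed: $n")
    case Failure(ex) => println(s"Failed: ${ex.getMessage}")
  }

  // The main thread is free to keep working while the Future runs.
  println("Doing other work while we wait...")

  // Block only at the end so the demo doesn't exit before completion,
  // then give the callback a moment to print (the pool uses daemon threads).
  Await.ready(answer, 2.seconds)
  Thread.sleep(100)
}
```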

Allowing for Cross-Database Access via Module Signing

Tom Zika enables cross-database access:

I’ve recently had to revisit this topic and spent a lot of time recalling the details. So I’m writing this blog post mainly as a reminder for myself.

The most helpful parts will be the diagram detailing all the components and their relationships, and a comprehensive example anyone can follow.

I’m not going to cover Module Signing in general (I’ll leave that to Solomon Rutzky), nor will I cover other ways to achieve cross-database access (like cross-database ownership chaining), because module signing is superior from a security standpoint.

Click through to see a good way of providing access to cross-database resources without explicitly granting rights to users.

Minimum Viable Data Mesh in Azure

Paul Andrew was on a podcast:

For Paul, delivering a single data mesh data product on its own is not all that valuable – if you are going to go to the expense of implementing data mesh, you need to be able to satisfy use cases that cross domains. The greater value is in cross-domain interoperability: getting to a data product that wasn’t possible before. And you need to deliver the data platform alongside those first 2-3 data products; otherwise, you create a very hard-to-support data asset, not really a data product.

When thinking about minimum viable data mesh, Paul views an approach leveraging DevOps and CI/CD – Continuous Integration/Continuous Delivery – as crucial. You need repeatability and reproducibility to really call something a data product.

Click through for the interview as well as Scott Hirleman’s summary.

Counting Employees by Period with DAX

Matt Allington solves a common problem:

I’m calling this article, “How many employees by period”. Staff come and go for different reasons. In some companies, the number of staff can change over time. The principles used in this article can also be used in other instances. There can be staff moving in and out of departments, on and off of projects, etc. The technique can also be used to work out how many staff were on leave, how many were off sick, how many tickets were open in a support queue, or any other concept that has a start and end date in a transactional table.

Read on for Matt’s answer but be sure to check out the comments as there are some other good solutions in there.
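Matt’s answer is in DAX, but the underlying logic is worth spelling out: a row counts toward a period if it starts on or before the period’s end and either has no end date or ends on or after the period’s start. A rough Scala sketch with made-up data, just to show the shape of the calculation:

```scala
import java.time.LocalDate

object HeadcountByPeriod extends App {
  // A row from the transactional table: a start date and an optional end date.
  final case class Stint(employee: String, start: LocalDate, end: Option[LocalDate])

  // A stint overlaps a period if it starts on/before the period's end
  // and either has no end date or ends on/after the period's start.
  def activeIn(s: Stint, periodStart: LocalDate, periodEnd: LocalDate): Boolean =
    !s.start.isAfter(periodEnd) && s.end.forall(e => !e.isBefore(periodStart))

  // Illustrative data only.
  val stints = List(
    Stint("Ann",  LocalDate.of(2021, 1, 15), None),
    Stint("Bob",  LocalDate.of(2021, 3, 1),  Some(LocalDate.of(2021, 6, 30))),
    Stint("Caro", LocalDate.of(2021, 7, 10), Some(LocalDate.of(2021, 8, 20)))
  )

  // Headcount per month of 2021.
  for (month <- 1 to 12) {
    val start = LocalDate.of(2021, month, 1)
    val end   = start.plusMonths(1).minusDays(1)
    val count = stints.count(activeIn(_, start, end))
    println(f"2021-$month%02d: $count employee(s)")
  }
}
```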

Object-Level Security in Power BI

Chris Webb checks out Object-Level Security:

If you have sensitive data in your Power BI dataset, you may need to stop some users from seeing the data in certain columns or measures. There is only one way to achieve this: you have to use Object-Level Security (OLS) in your dataset. It’s not enough to exclude those measures or columns from your reports or to hide them, because there will always be ways for enterprising users to see data they shouldn’t be allowed to see. However, the problem with OLS up to now is that it didn’t play nicely with Power BI reports, and so you had to create multiple versions of the same report for different security roles. The good news is that there’s now a way to create one report connected to a dataset with OLS and have it display different columns and measures to users with different permissions.

And then watch as Chris combines Row-Level Security with Object-Level Security to make it nicer for users but probably a mess for maintainers.
