Author: Kevin Feasel

Row-Level Security against Power BI Shared Datasets

Teo Lachev combines two capabilities in Power BI:

In a typical engagement, I create an organizational semantic model(s) and “report packs”, such as Sales Report Pack, Inventory Report Pack, etc. These report packs are typically implemented as Power BI reports connected to the semantic model as a shared dataset using the Power BI Datasets connector. Reports sanctioned by IT are published to a dedicated workspace, such as Corporate BI. Departmental reports are deployed to their respective workspace, such as Sales, to enforce content-level security. Usually, the semantic model has row-level security (RLS) roles defined to enforce restricted access to data depending on the identity of the interactive user.

Read on to see how you can test out the results once you get it working.
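To make the RLS idea concrete, here is a minimal sketch of how role-based row filtering behaves. This is plain Python rather than Power BI or DAX, and every name in it (the `SALES` rows, the role names, the e-mail addresses) is a made-up assumption for illustration, not something from Teo's post.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical fact rows standing in for a table in the semantic model.
SALES: List[Row] = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 250},
    {"region": "East", "amount": 40},
]

# Hypothetical RLS roles: each role is just a row-level filter.
ROLES: Dict[str, Callable[[Row], bool]] = {
    "EastSales": lambda row: row["region"] == "East",
    "WestSales": lambda row: row["region"] == "West",
}

# Hypothetical role membership, keyed by the interactive user's identity.
MEMBERS: Dict[str, List[str]] = {
    "alice@example.com": ["EastSales"],
    "bob@example.com": ["EastSales", "WestSales"],
}


def visible_rows(user: str) -> List[Row]:
    # In this sketch, a user in several roles sees the union of what each
    # role allows, and a user in no role sees nothing.
    filters = [ROLES[name] for name in MEMBERS.get(user, [])]
    return [row for row in SALES if any(f(row) for f in filters)]


def total_sales(user: str) -> int:
    # The same report measure returns a different answer per user.
    return sum(row["amount"] for row in visible_rows(user))
```

The point of the sketch is that the report definition never changes; only the identity of the person running it does, which is why one shared dataset can sit behind report packs in several workspaces.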

Testing Azure SQL DB Hyperscale Performance

Reitse Eskens continues a series on performance testing Azure SQL DB tiers:

So far, my blogs have been on the different Azure SQL DB offerings where there are differences between DTU and CPU based. But in general, the design is recognizable. With the hyperscale tier, many things change. There are still cores and memory of course, but the rest of the design is totally different. I won’t go into all the details; you’re better off reading them here [https://learn.microsoft.com/en-us/azure/azure-sql/database/service-tier-hyperscale?view=azuresql] and here [https://learn.microsoft.com/en-us/azure/azure-sql/database/hyperscale-architecture?view=azuresql], but the main differences are the support of up to 100 TB of data in one database (all the other tiers max out at 4 TB), fast database restores based on file snapshots, rapid scale out and rapid scale up.

There are differences in testing this one versus the others, so buyer beware.

Registered Servers in SQL Server

Kevin Hill has a video for us:

Query multiple SQL Server instances at one time!

I have used registered servers before, though my strong preference is for a Central Management Server. There are limits to how many servers CMS can handle before the initial load gets slow—I know this because we’re over the limit and it causes SSMS to freeze for 3-5 seconds the first time I open CMS. Still, having that server list in a central location means not having to share files around with the rest of the team and trying to figure out whether you have the most recent version of the file.

Software-as-a-Service: Single DB or Per-Client DB

Greg Low makes a choice:

On-premises applications are mostly single-tenant. They support a single organization. We do occasionally see multi-tenant databases. They hold the same types of information for many organizations.

But what about SaaS-based applications? By default, you’ll want to store data for many client organizations. Should you create a large single database that holds data for everyone? Should you create a separate database for each client? Or should you create something in-between?

As with most things in computing, there is no one simple answer to this. Here are the main decision points that I look at:

Click through for Greg’s thoughts on the matter. Most of these factors are also relevant for on-premises SQL Server installations, not just Azure SQL DB/Managed Instance.
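As a rough illustration of the two ends of that spectrum, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for SQL Server. The tenant names, the `orders` table, and the `tenant_id` column are all hypothetical and are not taken from Greg's post.

```python
import sqlite3
from typing import Dict, List

# Option 1: one shared database. Every row carries a tenant_id
# (hypothetical column) and every query must filter on it.
shared = sqlite3.connect(":memory:")
shared.execute("CREATE TABLE orders (tenant_id TEXT, item TEXT)")
shared.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("contoso", "widget"), ("fabrikam", "gear"), ("contoso", "bolt")],
)


def shared_orders(tenant: str) -> List[str]:
    rows = shared.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY item", (tenant,)
    )
    return [r[0] for r in rows]


# Option 2: one database per client. Isolation comes from picking the
# right connection, so the schema needs no tenant column at all.
per_client: Dict[str, sqlite3.Connection] = {}


def client_db(tenant: str) -> sqlite3.Connection:
    if tenant not in per_client:
        conn = sqlite3.connect(":memory:")  # each call is a separate database
        conn.execute("CREATE TABLE orders (item TEXT)")
        per_client[tenant] = conn
    return per_client[tenant]


client_db("contoso").executemany(
    "INSERT INTO orders VALUES (?)", [("widget",), ("bolt",)]
)
client_db("fabrikam").execute("INSERT INTO orders VALUES (?)", ("gear",))


def client_orders(tenant: str) -> List[str]:
    rows = client_db(tenant).execute("SELECT item FROM orders ORDER BY item")
    return [r[0] for r in rows]
```

Both functions return the same data for a given tenant. What differs is where a mistake hurts: a forgotten `WHERE tenant_id = ?` in the shared design leaks data across clients, while the per-client design trades that risk for many more databases to deploy, patch, and monitor.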

Motion Detecting and Alerting with Kafka and ksqlDB

Wei Rui and Yinsidi Jiao take us through a scenario:

Managing IoT (Internet of Things) devices and their produced data or events can be a challenge. On one hand, IoT devices usually generate massive amounts of data. On the other hand, IoT hardware has many limitations to process the data generated, such as cost, physical size, efficiency, and availability. You need a back-end system with high scalability and availability to process the growing volume of data. Things become more challenging when dealing with numerous devices and events in real time, and considering the required availability, latency, scalability, and agility for different usage and scenarios.

For Confluent Hackathon 2022, we built an end-to-end motion detection and alerting system, which currently acts as a home surveillance system, on top of Apache Kafka® and ksqlDB to demonstrate how easy it is to build IoT solutions by leveraging Confluent Cloud.

Read on to see how it works.

An Introduction to Event Sourcing

Aasif Ali provides a high-level introduction to the concept of event sourcing:

Event sourcing is a way to store data as events in an append-only log. Rather than keeping only the latest version of the entity state, this method stores the state of a database object as a sequence of events: essentially a new event each time the object changes state, from the beginning of the object’s existence. An event can be anything that is generated by a user, such as a mouse click or a key press on a keyboard. It is a great way to atomically update the state and publish events. Not only can we query these events, but we can also use the event log to reconstruct past states, and as a foundation to automatically adjust the state to cope with retroactive changes.

Events are immutable: they cannot be changed. This well-known rule of event stores is often the first defining characteristic of event stores and event sourcing.

Read on to see how this concept works and how products like Apache Kafka make event sourcing viable.
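For a feel of the pattern before clicking through, here is a minimal in-memory sketch of an append-only event log with state rebuilt by replay. It is not Aasif's code and has nothing to do with Kafka's API; the `Event` and `EventLog` classes, the account example, and the event kinds are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class Event:
    account_id: str
    kind: str    # "deposited" or "withdrew" (hypothetical event kinds)
    amount: int


class EventLog:
    """Append-only log: events can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def read(self, account_id: str, upto: Optional[int] = None) -> List[Event]:
        # Return this account's events in order, optionally only the first `upto`.
        events = [e for e in self._events if e.account_id == account_id]
        return events if upto is None else events[:upto]


def balance(events: List[Event]) -> int:
    # State is derived by replaying events from the beginning.
    total = 0
    for e in events:
        total += e.amount if e.kind == "deposited" else -e.amount
    return total


log = EventLog()
log.append(Event("acct-1", "deposited", 100))
log.append(Event("acct-1", "withdrew", 30))
log.append(Event("acct-1", "deposited", 5))

current = balance(log.read("acct-1"))            # replay everything
after_two = balance(log.read("acct-1", upto=2))  # reconstruct a past state
```

Nothing here stores a balance. The current value and any earlier value both come from replaying the same log, which is what makes reconstructing past states possible.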

Apache Flink 1.16 Released

Godfrey He makes an announcement:

To reduce the cost of migrating Hive to Flink, we introduce HiveServer2 Endpoint and Hive Syntax Improvements in this version:

The HiveServer2 Endpoint allows users to interact with SQL Gateway with Hive JDBC/Beeline and migrate with Flink into the Hive ecosystem (DBeaver, Apache Superset, Apache DolphinScheduler, and Apache Zeppelin). When users connect to the HiveServer2 endpoint, the SQL Gateway registers the Hive Catalog, switches to Hive Dialect, and uses batch execution mode to execute jobs. With these steps, users can have the same experience as HiveServer2.

Read on for a pretty large hit list.

OpenSSL Patch Incoming

Steven Vaughan-Nichols has bad news for us:

So we should all be concerned that Mark Cox, a Red Hat Distinguished Software Engineer and the Apache Software Foundation (ASF)’s VP of Security, this week tweeted, “OpenSSL 3.0.7 update to fix Critical CVE out next Tuesday 1300-1700UTC.”

How bad is “Critical”? According to OpenSSL, an issue of critical severity affects common configurations and is also likely exploitable. 

There isn’t enough detail yet to know exactly what the issue is. It’s forthcoming, however, so time to get those patch windows ready.

Choosing between Synapse Spark Notebooks or Job Definitions

Arun Sethia and Arshad Ali explain when you might use a Spark notebook versus a job definition:

Synapse Spark Notebook is a web-based (HTTP/HTTPS) interactive interface for creating files that contain live code, narrative text, and visualized output, with rich libraries for Spark-based applications. Data engineers can collaborate, schedule, run, and test their Spark application code using Notebooks. Notebooks are a good place to validate ideas and do quick experiments to get insight into the data. You can integrate the Synapse Notebook into a Synapse pipeline.

The Notebook allows you to combine programming code with markdown text and perform simple visualizations (using Synapse Notebook chart options and open-source libraries). In addition, running code will supply immediate feedback, output, and progress tracking within Notebook.

Click through for the comparison.
