Press "Enter" to skip to content

Day: June 6, 2022

Knowledge Graphs, Data Fabrics, and Data Meshes

Alan Morrison describes the history of three concepts:

By 2014, SAP was using “in-memory data fabric” to describe a virtual data warehouse, a key element of its HANA “360-degree customer view” product line. Gartner for its part uses the term “data fabric” to this day as an all-encompassing means of heterogeneous data integration. From a 2021 post on data fabric architecture: 

Read on for a high-level discussion of what each is and how it fits into the context of data warehouses and data lakes.

Comments closed

Processing Security Logs in Databricks with Delta Live Tables

Silvio Fiortio ingests some data:

Databricks recently introduced Workflows to enable data engineers, data scientists, and analysts to build reliable data, analytics, and ML workflows on any cloud without needing to manage complex infrastructure. Workflows allows users to build ETL pipelines that are automatically managed, including ingestion, and lineage, using Delta Live Tables. The benefits of Workflows and Delta Live Tables easily apply to security data sources, allowing us to scale to any volume or latency required for our operational needs.

In this article we’ll demonstrate some of the key benefits of Delta Live Tables for ingesting and processing security logs, with a few examples of common data sources we’ve seen our customers load into their cyber Lakehouse.

Click through to learn more.

Comments closed

Ingesting Event Hub Telemetry Data with PySpark Streaming

Charles Chukwudozie shows how to read from Event Hubs in Databricks with Python:

Ingesting, storing, and processing millions of telemetry data from a plethora of remote IoT devices and Sensors has become common place. One of the primary Cloud services used to process streaming telemetry events at scale is Azure Event Hub.

Most documented implementations of Azure Databricks Ingestion from Azure Event Hub Data are based on Scala.

So, in this post, I outline how to use PySpark on Azure Databricks to ingest and process telemetry data from an Azure Event Hub instance configured without Event Capture.

Click through for the process.

Comments closed

Making the Case for Azure Analysis Services

Teo Lachev makes the case:

Microsoft BI practitioners have three options for hosting semantic models: SSAS (on prem), Azure Analysis Services (cloud), and Power BI (cloud). AAS is somewhat caught between a rock and a hard place. Given that Power BI gets the most attention for cloud deployment, why would you consider AAS at all? There are two main reasons:

Read on for the reasons. Knowing how much it does cost, it almost feels like trying to thread a needle: if you don’t spent enough or have enough data, Power BI is typically much more efficient; if you have sufficient data, I’d want to do a proper cost analysis between on-prem (or IaaS) Analysis Services and Azure Analysis Services.

Comments closed

Finding Queries with Missing Index Requests in SQL Server 2019

Erik Darling shows off a nicety in SQL Server 2019 and later:

Note that this script does not assemble the missing index definition for you. That stuff is all readily available in the query plans that get returned here, and of course the missing index feature has many caveats and limitations to it.

You should, as often as possible, execute the query and collect the actual execution plan to see where the time is spent before adding anything in.

Read on for the script.

Comments closed

Calculating Weekdays with M

Kristyna Hughes calculates weekdays on the fly:

Knowing the days between events is a fairly common reporting request because a lot of reporting is created to track SLA’s (service level agreement) and other KPI’s (key performance indicators). While getting the days between two dates is fairly easy to achieve, they tend to follow up and ask how many week days there are between two timed events. For example, one company may have a SLA to ship an order within three week days of the order being placed or else a discount is applied to the order. In this case, I would highly recommend that the company have software that calculates these days in the background and stores the actual week days between order date and ship date in a database. Unfortunately, many companies create policies like this without considering future reporting needs and these values have to be calculated on the backend.

Click through for the code but be sure to read the note that this is all weekdays, including holidays.

Comments closed