Curated SQL – Page 508 – A Fine Slice Of SQL Server

How It Works: Power BI Field Parameters Edition

Published 2022-06-01 by Kevin Feasel

Gilbert Quevauvilliers figures out how field parameters work:

In this blog post I want to give a visual representation as to how field parameters work and what the current limitations are.
It is important to be aware of the limitations so that you do not get caught out later or you are trying to figure out why it is not working.
I do hope my descriptions and pictures below help you understand how it works and when it does not work!

Click through for some detailed graphics and explanation.

Loading JSONB into Postgres via Azure Data Factory

Published 2022-06-01 by Kevin Feasel

Rayis Imayev is slinging JSON:

Requirements:
1. Sourcing data comes from a SQL Server database
2. The destination is a PostgreSQL database table
3. Transformation logic is to aggregate several rows from a sourcing table and populate the resulting JSON structured document into a single row JSONB type column
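
As a rough illustration of the third requirement, here is a minimal Python sketch that groups several source rows by a key and serializes each group into one JSON document, the kind of value a JSONB column could hold. The function name, the `order_id` key, and the sample rows are hypothetical; this is not Rayis's Azure Data Factory pipeline, only the shape of the transformation.

```python
import json
from collections import defaultdict


def aggregate_rows_to_json(rows, key_column):
    """Group rows by key_column and return one JSON string per key.

    Hypothetical helper: each group becomes a JSON array of the
    remaining columns, suitable for a single JSONB-typed value.
    """
    grouped = defaultdict(list)
    for row in rows:
        # Keep every column except the grouping key in the detail record.
        detail = {k: v for k, v in row.items() if k != key_column}
        grouped[row[key_column]].append(detail)
    return {key: json.dumps(details) for key, details in grouped.items()}


# Made-up source rows standing in for the SQL Server table.
source_rows = [
    {"order_id": 1, "item": "widget", "qty": 2},
    {"order_id": 1, "item": "gadget", "qty": 1},
    {"order_id": 2, "item": "widget", "qty": 5},
]

documents = aggregate_rows_to_json(source_rows, "order_id")
for key, document in documents.items():
    print(key, document)
```

Each dictionary value would then be written to one destination row, with the database driver or pipeline responsible for casting the string to JSONB.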

Read on for Rayis’s notes.

Plotting Monaco in Postgres

Published 2022-05-31 by Kevin Feasel

Mark Litwintschik does some plotting:

In this post, I’m going to extract Monaco’s road network data from OpenStreetMap, import it into PostgreSQL and render it out using a minimalist tile server. I’ll also show how the Formula 1 circuit can be highlighted using an open source geospatial desktop application.

Click through for the code and the end result in pg_tileserv and QGIS.

Monitoring Streaming Queries in PySpark

Published 2022-05-31 by Kevin Feasel

Hyukjin Kwon, et al., lay out some monitoring advice:

Streaming is one of the most important data processing techniques for ingestion and analysis. It provides users and developers with low latency and real-time data processing capabilities for analytics and triggering actions. However, monitoring streaming data workloads is challenging because the data is continuously processed as it arrives. Because of this always-on nature of stream processing, it is harder to troubleshoot problems during development and production without real-time metrics, alerting and dashboarding.

Read on to see how you can use the Observable API for alerting in PySpark—previously, it had been a Scala-only API.

Projecting (Selecting) Results with KQL

Published 2022-05-31 by Kevin Feasel

Robert Cain continues a series on the Kusto Query Language:

So far in my Fun With KQL series, we have used the column tool, found on the right side of the output pane and described in my original post Fun With KQL – The Kusto Query Language, to arrange and reduce the number of columns in the output.
We can actually limit the number of columns, as well as set their order, right within our KQL query. To accomplish this we use the project operator.
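
To make the idea of projection concrete outside of KQL, here is a small Python analogy: it keeps only the named columns of each row, in the order requested. It is not KQL and not how Kusto evaluates a query; the `project` function and the column names are hypothetical.

```python
def project(rows, columns):
    """Return rows reduced to the given columns, in the given order.

    Python dictionaries preserve insertion order, so the output
    columns appear in the same order as the `columns` argument.
    """
    return [{col: row[col] for col in columns} for row in rows]


# Made-up rows standing in for a query result.
rows = [
    {"TimeGenerated": "2022-05-31T10:00", "Computer": "web01", "CounterValue": 12.5},
    {"TimeGenerated": "2022-05-31T10:05", "Computer": "web02", "CounterValue": 48.0},
]

result = project(rows, ["Computer", "TimeGenerated"])
for row in result:
    print(row)
```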

Read on for several good uses of the project operator.

Distributed Transactions in T-SQL

Published 2022-05-31 by Kevin Feasel

Kevin Wilkie explains what distributed transactions are and why you probably don’t want to use them:

Today, we’re going to discuss transactions that run on multiple servers!
A Distributed transaction is defined by HazelSet to be “a set of operations on data that is performed across two or more data repositories”. In even simpler terms, it’s a command run against data on more than one server.

Click through for the warnings about what might possibly go wrong.

Fun with Nested Loops

Published 2022-05-31 by Kevin Feasel

Jared Poche explains my favorite type of join:

The nested loops join is the join operator you are likely to see most often. It tends to operate best on smaller data sets, especially when the first of the two tables being joined has a small data set.
In row mode, the first table returns rows one at a time to the join operator. The join operator then performs a seek or scan against the second table for each row passed in from the first table. It searches that table based on the data provided by the first table, and the columns defined in our ON or WHERE clauses.
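
The row-at-a-time behavior described above can be sketched in a few lines of Python. This is an illustration of the general algorithm, not SQL Server's implementation: the inner loop here always scans the whole second table, whereas a real engine may seek into an index instead. The table contents and the predicate are made up for the example.

```python
def nested_loops_join(outer_rows, inner_rows, predicate):
    """Yield combined rows for every outer/inner pair that matches.

    For each row of the outer (first) input, the inner (second)
    input is searched in full, which is why a small outer input
    keeps the total work low.
    """
    for outer in outer_rows:
        for inner in inner_rows:
            if predicate(outer, inner):
                yield {**outer, **inner}


# Hypothetical tables.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 3},
]

joined = list(
    nested_loops_join(
        customers,
        orders,
        lambda c, o: c["customer_id"] == o["customer_id"],
    )
)
for row in joined:
    print(row)
```

The work grows with the number of outer rows multiplied by the cost of each inner search, which is consistent with the observation that the join does best when the first table is small.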

Read on for more information about nested loop joins.

Finding Duplicates in Type 2 SCDs

Published 2022-05-31 by Kevin Feasel

Dinesh Asanka wants to verify some Type 2 slowly changing dimension results:

As we discussed in a previous article, Implementing Slowly Changing Dimensions (SCDs) in Data Warehouses, there are three main types of slowly changing dimensions: Type 1, Type 2, and Type 3. Of these, Type 1 is the simplest: you maintain only the latest version of the attribute. For example, if an employee is promoted from Software Engineer to Senior Software Engineer, you simply overwrite the existing value with the new value, so the historical aspect is lost.
Type 2 Slowly Changing Dimensions are used to track historical data in a data warehouse. This is the most common approach for dimensions. This article uses AdventureworksDW, the sample data warehouse database.
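
One common symptom of a broken Type 2 dimension is a business key with more than one row marked as current. The Python sketch below checks for that condition on in-memory rows. It is an illustration under assumed column names (`employee_id`, `is_current`), not the comparison technique from Dinesh's article.

```python
from collections import Counter


def find_duplicate_current_rows(dimension_rows, business_key, current_flag):
    """Return business keys that have more than one current row.

    In a healthy Type 2 dimension each business key has exactly
    one current version, so any key returned here needs attention.
    """
    counts = Counter(
        row[business_key] for row in dimension_rows if row[current_flag]
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up dimension rows; employee 7 wrongly has two current versions.
dim_employee = [
    {"employee_id": 7, "title": "Software Engineer", "is_current": True},
    {"employee_id": 7, "title": "Senior Software Engineer", "is_current": True},
    {"employee_id": 8, "title": "Analyst", "is_current": False},
    {"employee_id": 8, "title": "Senior Analyst", "is_current": True},
]

duplicates = find_duplicate_current_rows(dim_employee, "employee_id", "is_current")
print(duplicates)
```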

Click through for one way to compare, one which you could build using dynamic SQL.

When to Use a Map Visual

Published 2022-05-30 by Kevin Feasel

Mick Cisneros explains when to use map visuals:

That ubiquity has given all of us an increased familiarity with maps, as well as a deeper affinity for them. (Probably a dependence as well!) It’s natural, then, to want to use a map to visualize data that has a geographic dimension. Why not, right? There is an obvious upside: audiences are drawn to the way they look, as it’s a more memorable image than the same old bar chart or line graph. Not to mention: it’s fun to make maps!
The problem is that maps look interesting, but their very nature limits our options for visualizing data within them. Per a recent paper by Franconeri, Padilla, Shaw, et al., here are a couple of the comparisons that people are very good at making, perceptually:

Read on for a comparison of good map versus bad map. Just because something has a geographical component doesn’t mean you should map it.

Model Deployment Options in Azure

Published 2022-05-30 by Kevin Feasel

Tori Tompkins enumerates ways to deploy machine learning models in Azure:

There are so many options to deploy models in Azure that it can get quite overwhelming. In this blog, we break down all the available options and consider the pros and cons of each tooling option.

Even with those, there are other approaches as well, like hosting Spark-based models in Azure Synapse Analytics, using SQL Server Machine Learning Services on an Azure SQL Managed Instance or VM running SQL Server, etc.
