2020-03-19 – Curated SQL

Evaluating Regression Models in Azure ML

The initial go-to metric for understanding a regression model is the R squared (or R²) value, also known as the coefficient of determination. R squared measures how well the model is fitted to the data – the goodness of fit. It indicates how much of the variation of y (the target) is explained by the variation in x (the features).

The measures are bog standard if you’ve worked with regressions before, and Dan does a good job explaining them.
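
For readers who want to see the arithmetic behind the R² description above, here is a minimal sketch in plain Python. It is not taken from Dan's post or from Azure ML; the function name `r_squared` and the sample numbers are made up for illustration, and it uses the usual definition of one minus the ratio of residual to total sum of squares.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Variation the model fails to explain (residual sum of squares).
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total variation of the target around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observations and predictions, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

score = r_squared(actual, predicted)
print(round(score, 3))
```

A value close to 1 means the predictions account for most of the variation in the target; a perfect fit gives exactly 1.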

Ensuring Ordering of Kafka Data

Published 2020-03-19 by Kevin Feasel

Zeke Dean takes us through an explanation of Kafka partitioning and message delivery behaviors:

Apache Kafka provides developers with a uniquely powerful, open source and versatile distributed streaming platform – but it also has some rather complex nuances to understand when trying to store and retrieve data in your preferred order.
Kafka captures streaming data by publishing records to a category or feed name called a topic. Kafka consumers can then subscribe to topics to retrieve that data. For each topic, the Kafka cluster creates and updates a partitioned log. Kafka sends all messages from a particular producer to the same partition, storing each message in the order it arrives. Partitions therefore function as a structured commit log, containing sequences of records that are both ordered and immutable.

Click through for the explanation.
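
As a rough illustration of why ordering holds within a partition, the sketch below simulates partitions as in-memory lists. It rests on assumptions that are not in the excerpt: records carry a key, the partition is chosen by hashing that key, and `zlib.crc32` stands in for the real hash (Kafka's Java client uses murmur2 for keyed records). The names `choose_partition` and `events_for` are hypothetical, and nothing here talks to an actual Kafka cluster.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: CRC32 is deterministic, so a given key always maps
    # to the same partition. Real clients use a different hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is an append-only list, mimicking an ordered log.
partitions = defaultdict(list)

events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "shipped"),
    ("order-1", "shipped"),
]
for key, value in events:
    partitions[choose_partition(key)].append((key, value))


def events_for(key: str):
    # Read back one key's events from the single partition that holds them.
    return [v for k, v in partitions[choose_partition(key)] if k == key]


print(events_for("order-1"))
```

Because every record for a given key lands in one append-only list, reading that list back returns the key's events in the order they were sent. No such guarantee exists across different partitions.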

Cosmos DB Changes

Published 2020-03-19 by Kevin Feasel

Hasan Savran hits on a few changes to Cosmos DB:

Free Tier of Cosmos DB
A couple of big changes came from the Azure Cosmos DB team this week. The first one is the free tier of Cosmos DB. In the free tier, you get the first 400 Request Units and 5 GB of storage for free for the lifetime of the account. This is a great deal if you would like to use Azure Cosmos DB in your new project. Give it a try, it’s free! All you need to do is click Apply for the Apply Free Tier Discount option.

This is a big one, but there are a few other interesting changes in there as well.

Issue with PolyBase and Cosmos DB

Published 2020-03-19 by Kevin Feasel

I found an issue with connecting to Cosmos DB from PolyBase after installing SQL Server 2019 CU2:

After upgrading to SQL Server 2019 CU2, I noticed some issues when trying to connect to a Cosmos DB collection via PolyBase. Specifically, I started getting the following error message:
Msg 105082, Level 16, State 1, Line 35
105082;Generic ODBC error: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>>.documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15) Additional error <2>: ErrorMsg: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>> .documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15), SqlState: HY000, NativeError: 110 .

Read on for a couple attempts at a solution and some more detail.

Fun with Deadlocks

Published 2020-03-19 by Kevin Feasel

Jana Sattainathan diagnoses a deadlocking issue:

We know what deadlocks are and some of the common reasons they happen. If you need a refresher, I recommend this good article. I am not going to rehash what has already been said, although these high-level points are worth noting to resolve them:
1) Examine known parallelism (where you have parallelized jobs)
2) Examine unknown parallelism (unknown jobs or users interfere with your jobs in parallel)
3) Arrange the order of tables doing DML to be the same across all code, e.g., always Customers first, Orders second, OrderDetails third.
4) Examine the indexes on the affected tables to eliminate full-table scans
5) Reduce the amount of time spent in a transaction
6) Update in chunks, especially if updating/deleting across sessions
7) Avoid RBAR (Row By Agonizing Row) CRUD operations! Do statement-based mass changes.

Read on to understand Jana’s situation and solution.
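
The third point in the list, touching tables in the same order everywhere, can be sketched with in-process locks. This is only an analogy in Python threads, not SQL Server behavior and not Jana's solution: the lock dictionary, `LOCK_ORDER`, and `touch_tables` are hypothetical names standing in for tables and the jobs that modify them.

```python
import threading

# Hypothetical stand-ins for two tables, each guarded by its own lock.
locks = {"Customers": threading.Lock(), "Orders": threading.Lock()}
LOCK_ORDER = ["Customers", "Orders"]  # the one order every job agrees on
log = []


def touch_tables(worker, tables):
    # Sort the requested tables into the agreed order before locking,
    # no matter what order the caller listed them in.
    ordered = sorted(tables, key=LOCK_ORDER.index)
    acquired = []
    try:
        for name in ordered:
            locks[name].acquire()
            acquired.append(name)
        log.append((worker, tuple(ordered)))
    finally:
        # Release in reverse order of acquisition.
        for name in reversed(acquired):
            locks[name].release()


threads = [
    threading.Thread(target=touch_tables, args=("job-a", ["Customers", "Orders"])),
    threading.Thread(target=touch_tables, args=("job-b", ["Orders", "Customers"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(log)
```

If each job instead locked in the order it was given, job-a could hold Customers while waiting on Orders and job-b could hold Orders while waiting on Customers, which is the classic deadlock cycle. Sorting first removes that possibility.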

Interpreting a Key Lookup

Published 2020-03-19 by Kevin Feasel

Erik Darling dives into a weird key lookup:

This lookup doesn’t produce any rows or columns, that’s why there are 0.0 rows per iteration.
It’s purely to filter data out, and it does that. Slowly.
Look, I’m not defending the choice, I’m just using it to teach you something.

Read on for your moment of understanding.

Transforming JSON to CSV with Azure Data Factory

Published 2020-03-19 by Kevin Feasel

Rayis Imayev shows how you can use the Flatten task in Azure Data Factory to convert JSON text to CSV:

What this new task does is help to transform/transpose/flatten your JSON structure into a denormalized, flattened dataset that you can upload into a new or existing flat database table.
I like the analogy of the Transpose function in Excel that helps to rotate your vertical set of data pairs (name : value) into a table with the column names and values for corresponding objects. And when this vertical JSON structural set contains several similar sets (an array), then ADF Mapping Data Flows Flatten does a really good job of transforming it into a table with several rows (records).

Click through for a demonstration.
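
To make the idea of flattening concrete outside of Azure Data Factory, here is a small plain-Python sketch. It does not use ADF or reproduce how the Flatten task is implemented; the sample document, the `flatten` helper, and the column names are all invented for illustration. It unrolls one array field into rows, repeats the parent fields on each row, and writes the result as CSV.

```python
import csv
import io
import json

# Hypothetical input: one parent object holding an array of similar items.
document = json.loads("""
{"customer": "Contoso",
 "orders": [{"id": 1, "total": 25.5}, {"id": 2, "total": 40.0}]}
""")


def flatten(doc, array_field):
    # Copy the parent's scalar fields onto every element of the array,
    # producing one flat record per array element.
    parent = {k: v for k, v in doc.items() if k != array_field}
    return [{**parent, **item} for item in doc[array_field]]


rows = flatten(document, "orders")

# Write the flat records as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["customer", "id", "total"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Each element of the `orders` array becomes one CSV row, which is the denormalized shape a flat database table expects.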

Determining if Queries are Hitting Power BI Aggregations

Published 2020-03-19 by Kevin Feasel

Patrick LeBlanc has a video for us:

Are you not sure if you are hitting your Power BI Aggregations? Patrick shows you some tools you can use to verify if they are being used.

Click through for the video.

Day: March 19, 2020

Evaluating Regression Models in Azure ML