2021-04-15 – Curated SQL

Geospatial Fraud Detection

Published 2021-04-15 by Kevin Feasel

Antoine Amend uses Databricks to identify financial fraud in a geographical area:

As part of this real-world solution, we are releasing a new open source geospatial library, GEOSCAN, to detect geospatial behaviors at massive scale, track customers patterns over time and detect anomalous card transactions. Finally, we demonstrate how organizations can surface anomalies from an analytics environment to an online data store (ODS) with tight SLA requirements following a Lambda-like infrastructure underpinned by Delta Lake, Apache Spark and MLflow.

Click through for the article, as well as three notebooks.

Comments closed

A Modern C++ Kafka API

Published 2021-04-15 by Kevin Feasel

Kenneth Jia and Benedek Thaler announce an open source library:

Morgan Stanley uses Apache Kafka^® to publish market data to internal clients and to persist it for replay purposes. We started out using librdkafka’s C++ API, which maintains C++98 compatibility. C++ is evolving quickly, and we wanted to break away from this compatibility requirement so we could take advantage of new C++ features. This led us to create a new C++ API for Kafka that uses modern C++ features (i.e. C++14 and later). We’ve open sourced this client and hope you enjoy it.

Click through to learn more. What interests me about this is that most of the other languages’ support for Kafka (for example, .NET) is based off of librdkafka. I don’t know if there’s any benefit to moving to this new library.

Comments closed

Backtesting Options Strategies in R

Published 2021-04-15 by Kevin Feasel

Holger von Jouanne-Diedrich is in the money:

Options trading strategies are strategies where you combine, often several, derivatives instruments to create a certain risk-return profile (more on that here: Financial Engineering: Static Replication of any Payoff Function). Often we want to know how those strategies would fare in the real world.
The problem is that real data on derivatives are hard to come by and/or very expensive. But we help ourselves with a very good proxy: implied volatility which is freely available for example for many indices. With that, we can use the good old Black-Scholes model to reasonably price options whose strikes are not too far away from the current price of the underlying.

Read on to see how.

Comments closed

Another Number Series Generator Solution

Published 2021-04-15 by Kevin Feasel

Itzik Ben-Gan reviews Paul White’s solution to the number series generator challenge:

I love Paul White’s work. I keep being shocked by his discoveries, and wonder how the heck he figures out what he does. I also love his efficient and eloquent writing style. Often while reading his articles or posts, I shake my head and tell my wife, Lilach, that when I grow up, I want to be just like Paul.
When I originally posted the challenge, I was secretly hoping that Paul would post a solution. I knew that if he did, it would be a very special one. Well, he did, and it’s fascinating! It has excellent performance, and there’s quite a lot that you can learn from it. This article is dedicated to Paul’s solution.

This is an elite pair-up and well worth your time to review in detail.

Comments closed

Updating or Creating a Measure for Power BI PPU without Full Publication

Published 2021-04-15 by Kevin Feasel

Gilbert Quevauvilliers follows up:

Following on from my previous blog post How to complete granular deployment of Power BI Desktop changes to the Power BI Service (Using PPU), I want to also show how to update or create a measure in my dataset, where I can deploy this via ALM Toolkit.
This now saves me from doing the following tasks previously:
– Time taken to refresh the PBIX file so that the data is up to date.
– Re-uploading my PBIX.
– If configured re-creating the incremental refreshing
– Time and effort to upload and wait for dataset refresh.
– Quick updates to my dataset.
I will not have to worry about saving my PBIX file, file and if configured re-creating the incremental refreshing. This saves me a lot of time and effort.

Read on to see how.

Comments closed

Reading from Oracle without the Oracle Client

Published 2021-04-15 by Kevin Feasel

Emanuele Meazzo was in a bind:

If you read my previous article on how to configure a Linked server to Oracle , you know that I feel like someone is plotting to keep the topic of how to get our precious data outside of the Oracle ecosystem as obscure as possible out of the oracle circle
Fear not! I’m here to get you all the info in order to get data from Oracle Database via Powershell, in a native high-performance way, allowing you to create a multithreaded, reliable and connected ETL flow to feed a data warehouse from Oracle data, like the owner of this blog, or just move quickly some data from one place to another, programmatically, whatever it’s your need.

I support any effort to get data away from Oracle. Click through for the script and an explanation of each step.

Comments closed

More About Default Parameter Values in Powershell

Published 2021-04-15 by Kevin Feasel

Jeffrey Hicks follows up on a prior post:

Last week I shared a little nugget about making PowerShell life easier by using the built-in variable, $PSDefaultParameterValues. This is a special hashtable where you can define default parameter values for any PowerShell command. This means any PowerShell script or function that has defined parameters AND uses [cmdletbinding()]. If you have a simple function that only uses a Param() block, $PSDefaultParameterValues won’t work for that function. I recommend that all PowerShell functions use [cmdletbinding()] so hopefully, this won’t be an issue. Since I received some positive feedback and a few questions on the previous post, I thought a quick follow-up might be in order.

Read on for additional useful information regarding $PSDefaultParameterValues.

Comments closed

Fun with Multiple Indexes

Published 2021-04-15 by Kevin Feasel

Erik Darling makes a fairly rare multi-index sighting:

Notice! Both of our nonclustered indexes get used, and without a lookup are joined together.
Because the predicates are of the inequality variety, the join columns are not ordered after each seek. That makes a hash join the most likely join type.

I’ve always had this belief that there are probably more cases in which multi-index solutions are useful than the SQL Server optimizer gives us. This belief may be irrational.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30

Day: April 15, 2021