Press "Enter" to skip to content

Day: January 22, 2020

Fraud Detection with Flink

Alexander Fedulov gives us a case study of using Apache Flink for fraud detection:

In this blog post, we have discussed the motivation behind supporting dynamic, runtime changes to a Flink application by looking at a sample use case – a Fraud Detection engine. We have described the overall architecture and interactions between its components as well as provided references for building and running a demo Fraud Detection application in a dockerized setup. We then showed the details of implementing a dynamic data partitioning pattern as the first underlying building block to enable flexible runtime configurations.

To remain focused on describing the core mechanics of the pattern, we kept the complexity of the DSL and the underlying rules engine to a minimum. Going forward, it is easy to imagine adding extensions such as allowing more sophisticated rule definitions, including filtering of certain events, logical rules chaining, and other more advanced functionality.

It was an interesting discussion and you can grab the code as well.

Comments closed

Customizing Your Rprofile

Colin Gillespie shows how you can customize R via the .Rprofile file:

Every time R starts, it runs through a couple of R scripts. One of these scripts is the .Rprofile. This allows users to customise their particular set-up. However, some care has to be taken, as if this script is broken, this can cause R to break. If this happens, just delete the script!

Full details of how the .Rprofile works can be found in my book with Robin on Efficient R programming. However, roughly R will look for a file called .Rprofile first in your current working directory, then in your home area. Crucially, it will only load the first file found. This means you can have per project Rprofile.

Click through for a sample R profile which has a lot going on.

Comments closed

Simulating Feller’s Coin-Tossing Puzzle in R

David Robinson has another fun puzzle:

Mathematician William Feller posed the following problem:

If you flip a coin times, what is the probability there are no streaks of heads in a row?

Note that while the number of heads in a sequence is governed by the binomial distribution, the presence of consecutive heads is a bit more complicated, because the presence of a streak at various points in the sequence isn’t independent

Click through for a solution in R.

Comments closed

Against Citizen Data Scientists

Bill Schmarzo doesn’t like the idea of “citizen data scientists” very much:

“Hello,” he says. “My name is Dr. Payne and I am your Citizen Dentist for today.”

Citizen Dentist?! You repeat the question out loud for him to hear, want an answer to this looney statement. “What is a Citizen Dentist?”

Get this. He replies, “I’m a person who performs dental work, but my proficiency and expertise is outside of the field of dentistry.”

Bill’s alternative is “Citizens of Data Science.” Click through to see what that means and how it differs.

Comments closed

Strongly Type Table-Valued Parameters

Jonathan Kehayias shows the benefits of using the MaxLength parameter when calling a table-valued parameter from .NET code:

We can see that the MaxLength for the string columns is set at -1, meaning they are being passed over TDS to SQL Server as LOBs (Large Objects) or essentially as MAX datatyped columns, and this can impact performance in a negative manner. If we change the .NET DataTable definition to be strongly-typed to the schema definition of the user-defined table type as follows and look at the MaxLength of the same column using a debug break:

This can be important, especially if you make a lot of calls or use fairly large TVP sizes.

Comments closed

Calculating Compound Interest in DAX

Marco Russo and Alberto Ferrari want you to watch your money grow:

Coincidentally, both debt instrument examples are what is known as “bullet” loans, where the entire principal amount ($100) is repaid in one lump sum at maturity (at the end of Year 5). In the first example the interest income payments are deferred until maturity, thereby allowing the interest to compound over the holding period.  In the second example, the interest income payments are made at the end of each year, which means that the amount of debt accruing interest each year is always the same ($100).

Now let us consider a slightly more complex investment with compounding interest where the interest rate differs year-to-year. Because the interest rate varies, you can’t use the simple formula above (or its FV function equivalent in Excel). Rather, you must effectively stack each year on top of the preceding year and calculate year-by-year.

And that’s something you can do with DAX.

Comments closed

Contrasting TVPs and Memory-Optimized TVPs

Denis Gobo wants to see what memory-optimized table-valued parameters are good for:

The other day I was thinking about the blog post Faster temp table and table variable by using memory optimization I read a while back. Since you can’t believe anything on the internets (no disrespect to whoever wrote that post) , I decided to take this for a test

In this post I will be creating 2 databases, one is a plain vanilla database and the other, a database that also has a file group that contains memory optimized data

I will also be creating a table type in each database, a plain one and a memory optimized one in the memory optimized database

Read on for Denis’s findings.

Comments closed

Deploy a Big Data Cluster to a Single-Node kubeadm Cluster

Mohammad Darab shows how to build out a single-node Big Data Cluster on-premises:

This blog post will walk you through deploying a SQL Server Big Data Cluster on a single node Kubernetes cluster. You can install a Big Data Cluster on a physical machine or a virtual machine. Whatever option you choose must have the below minimum requirements:

– 8 cpu
– 64 GB RAM
– 100 GB disk space

Read on for instructions, or check out Mohammad’s video on the topic.

Comments closed

New Database-Scoped Configurations in SQL Server 2019

Niko Neugebauer looks at database-scoped configuration settings in SQL Server 2019:

Looking at the picture on the left you can see the Database Scoped Configurations available in Sql Server 2019 that I took at the end of December 2019. Mainly we can see a difference of 2 new items in the configurations since the end of the last year – the VERBOSE_TRUNCATION_WARNINGS & LAST_QUERY_PLAN_STATS.

Niko explains what these are and also takes a look at offerings in Azure SQL Database.

Comments closed