Stream-To-Stream Joins In Spark

Ayush Tiwari shows how to join a pair of streams in Apache Spark 2.3:

In Spark 2.3, it added support for stream-stream joins, i.e, we can join two streaming Datasets/DataFrames and in this blog we are going to see how beautifully spark now give support for joining the two streaming dataframes.

I this example, I am going to use

Apache Spark 2.3.0
Apache Kafka 0.11.0.1
Scala 2.11.8

Click through for the demo.

Probabilities And Poker

Steve Miller has a notebook on 5-card draw probabilities:

The population of 5 card draw hands, consisting of 52 choose 5 or 2598960 elements, is pretty straightforward both mathematically and statistically.

So of course ever the geek, I just had to attempt to show her how probability and statistics converge. In addition to explaining the “combinatorics” of the counts and probabilities, I undertook two computational exercises. The first was to delineate all possible combinations of 5 card draws from a 52 card deck, counting occurrences of relevant combinations such as 2 pair, a straight, or nothing in a cell loop.

Steve has made his notebook available for us.

Picking An Azure SQL Database Tier

Kevin Feasel

2018-05-25

Cloud

Esat Erkec has various methods you can use to figure out your Azure SQL Database tier:

When we are beginning to think of migrating our on-premises databases to Azure SQL, we have to decide on a proper purchase model, a service tier, and a performance level. Before starting the Azure SQL migration process, we have to find logical and provable answers to the following questions:

  • Which purchase model is suitable for my apps and business requirements?
  • How much budget do I need?
  • Which performance level meets my requirements?
  • Can I achieve the acceptable performance of my apps?

The purchase model and service tiers will certainly affect the Azure bills. On the other hand, we have to achieve the maximum performance with the selected service tier.

It’s a good article with helpful tips for people thinking of moving to Azure SQL Database.

Gartner’s Cloud IaaS Magic Quadrant Changes

Kevin Feasel

2018-05-25

Cloud

Bruno Aziza analyzes Gartner’s Magic Quadrant for Cloud Infrastructure as a Service offerings:

The first and most drastic change that occurred over the last year is the number of players that Gartner decided to highlight in its report: the number of vendors went from 14 to just 6 this year.  

Why is that?! Have the big become bigger and the small smaller?! Or has the space shrunk?   The latter is highly improbable.  All the contrary: earlier last year, forecasted that the highest growth in the cloud market would be coming from the sector this MQ covers: Gartner predicted that the cloud system infrastructure services would grow over 36% to reach $34B+ in 2017.

So, what gives?!

Read on to learn what gives.  As far as the rankings themselves go, I think it’s reasonable:  AWS and Azure can generally go head-to-head on features though Amazon does have the advantage.  Google is a distant third and the rest aren’t major players.

Bitwise Logic To Make Values Negative

Kevin Feasel

2018-05-25

T-SQL

David Fowler is working with arcane magic:

The short answer to the question is to simply apply a bitwise NOT to the number that you want to convert and add 1 to the result.  I wrote about bitwise logic in The Arcane Science of Bitwise Logic and SQL Server but didn’t mention the bitwise NOT.

If you want to apply a bitwise NOT to a value, simply use the ~

Give it a go, try converting 10 to a negative,

SELECT ~ 10 + 1

The result, -10.

One of my rules of thumb (given how many I have, I need more thumbs) is that if you’re using bitwise logic in SQL Server, something has likely gone wrong somewhere along the way.  But read David’s explanation, which is quite clear.

What TDE Does To Query Performance

Matthew McGiffen has a few tests on using Transparent Data Encryption:

By the time it had been executed 5 times (with the memory flushed between each execution) each query read about 600,000 pages sized at 8kb each – just under 5GB. If it took 50 seconds on the decryption of those pages, then each page took about 1 twelfth of a milli-second to decrypt – or alternatively, TDE decrypted about 12 pages per millisecond. Or in terms of disk size, 100MB per second. These were tests on a server with magnetic spinning disks (not SSDs) and you can see from the above figures, the straight disk access took about 40 seconds on its own.

When TDE doesn’t read from disk it doesn’t add any overhead, but how do we quantify what the overhead to queries is when it does have to access the disk?

Matthew has some good advice here, and I’d be willing to say that his experience is within the norm for TDE and doesn’t directly contradict general guidelines by enough to shift priors.

Technical Debt, T-SQL Business Rules Edition

Paul Turley tells a story of technical debt:

They’ve been writing reports using some pretty complicated SQL queries embedded in SSRS paginated reports.  Every time a user wants a new report, a request is sent to the IT group.  A developer picks up the request, writes some gnarly T-SQL query with pre-calculated columns and business rules.  Complex reports might take days or weeks of development time.  I needed to update a dimension table in the data model and needed a calculated column to differentiate case types.  Turns out that it wasn’t a simple addition and his response was “I’ll just send you the SQL for that…you can just paste it”.  The dilemma here is that all the complicated business rules had already been resolved using layers of T-SQL common table expressions (CTEs), nested subqueries and CASE statements.  It was very well-written SQL and it would take considerable effort to re-engineer the logic into a dimensional tabular model to support general-use reporting.  After beginning to nod-off while reading through the layers of SQL script, my initial reaction was to just paste the code and be done with it.  After all, someone had already solved this problem, right?

It’s the persistent battle between “don’t fix what isn’t broken” and “the process is broken, even if the code isn’t.”

Deplaying A Container On Azure

Andrew Pruski shows us how to deploy an Azure Container Instance:

The code should be fairly self explanatory. I’m using the username and password created earlier to access the ACR and am then spin up a container from the sqlserverlinuxagent:latest image. The container has 2 CPUs and 4GB of memory available to it and it will be listening on a public IP address on port 1433 (be very careful with this).

At the time of writing, the only option available for ip-address is public, hopefully further options will be available soon. I will update this blog if/when that happens.

Read on for a demo.

Categories

May 2018
MTWTFSS
« Apr Jun »
 123456
78910111213
14151617181920
21222324252627
28293031