The Central Limit Theorem

Vincent Granville explains the central limit theorem:

The theorem discussed here is the central limit theorem. It states that if you average a large number of well-behaved observations or errors, the average, once normalized appropriately, eventually follows a standard normal distribution. Despite the fact that we are dealing here with a more advanced and exciting version of this theorem (discussing the Lyapunov condition), this article is very applied, and can be understood by high school students.

In short, we are dealing here with a not-so-well-behaved framework, and we show that even in that case, the limiting distribution of the “average” can be normal (Gaussian). More precisely, we show when it is and when it is not normal, based on simulations and non-standard (but easy to understand) statistical tests.

Read on for more details.
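
As a rough illustration of the basic idea, here is a minimal simulation sketch. It assumes the simplest, well-behaved case (independent Uniform(0, 1) draws), not the harder framework the article covers. The function names, sample sizes, and seed are all made up for this example.

```python
import random
import statistics


def standardized_mean(n, rng):
    """Average n Uniform(0, 1) draws, then center and scale the average."""
    mean = sum(rng.random() for _ in range(n)) / n
    # Uniform(0, 1) has mean 1/2 and variance 1/12, so the average of n
    # draws has standard deviation sqrt(1/12) / sqrt(n).
    return (mean - 0.5) / ((1 / 12) ** 0.5 / n ** 0.5)


def simulate(trials=5000, n=50, seed=42):
    """Return `trials` standardized averages from a seeded generator."""
    rng = random.Random(seed)
    return [standardized_mean(n, rng) for _ in range(trials)]


samples = simulate()
sample_mean = statistics.mean(samples)
sample_sd = statistics.stdev(samples)
# For a standard normal, roughly 68% of values fall within one
# standard deviation of zero.
within_one_sd = sum(abs(z) <= 1 for z in samples) / len(samples)
print(sample_mean, sample_sd, within_one_sd)
```

If the normal approximation holds, the printed mean should sit near 0, the standard deviation near 1, and the last figure near 0.68.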

Azure Functions

Steph Locke has taken a shine to Azure Functions:

Azure Functions take care of all the hosting, all the retry logic, all the parallelisation, all the authentication gubbins, all the monitoring for you. The only bits of code you really have to write are the important stuff – the code that implements the business process. This makes a coding project go from >500 lines to <50, and it should be better quality too! This is super handy for data integration, and I would recommend it over and above Data Factory, unless you need to do some Hadoop stuff and maybe not even then.

The wag in me says that with F#, you could take it from 50 lines to 10…  Read the whole thing.

Passing Values To Bash

Steph Locke shows how to send input parameters to Bash scripts:

This is a very quick post on how you can make a bash script that allows you to provide it values via the command line. Values passed to a bash script are available inside the script as 1-based positional parameters, which are referenced by prefixing the position with $.

This means that if I provide .\ value1 value2, inside the script I can retrieve these by referencing their positions:

Read on for more information, including how to use named parameters.  Given that Bash is now officially supported in Windows 10 (well, in beta form), it might be worth checking that scripting language out.
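
To see positional parameters in action without writing a script file, here is a small sketch (not the code from the post). It uses Python's `subprocess` module to hand an inline command to the shell. It assumes `bash` (or at least `sh`) is on the PATH, and `demo.sh` is just a placeholder name that fills the `$0` slot.

```python
import shutil
import subprocess

# Assumption: a POSIX shell is installed; prefer bash, fall back to sh.
shell = shutil.which("bash") or shutil.which("sh")

# With `-c`, the first argument after the command string becomes $0
# (the script name); the ones after it become $1, $2, and so on.
# $# holds the number of positional parameters, not counting $0.
script = 'echo "first=$1 second=$2 count=$#"'

result = subprocess.run(
    [shell, "-c", script, "demo.sh", "value1", "value2"],
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()
print(output)  # first=value1 second=value2 count=2
```

The same `$1` and `$2` references work unchanged inside a script file invoked with two arguments.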

MariaDB MaxScale Now Commercial

Simon Phipps reports that MariaDB’s MaxScale proxy is no longer open source:

MariaDB Corp. has announced that release 2.0 of its MaxScale database proxy software is henceforth no longer open source. The organization has made it source-available under a proprietary license that promises each release will eventually become open source once it’s out of date.

MaxScale is at the pinnacle of MariaDB Corp.’s monetization strategy — it’s the key to deploying MariaDB databases at scale. The thinking seems to be that making it mandatory to pay for a license will extract top dollar from deep-pocketed corporations that might otherwise try to use it free of charge. This seems odd for a company built on MariaDB, which was originally created to liberate MySQL from the clutches of Oracle.


Analytic Tool Usage

Alex Woodie notes the increased popularity of Python for data analysis:

According to the results of the 2016 survey, R is the preferred tool for 42% of analytics professionals, followed by SAS at 39% and Python at 20%. While Python’s placing may at first appear to relegate the language to Bronze Medal status, it’s the delta here that really matters.

It’s interesting to see the breakdowns of who uses which language, comparing across industry, education, work experience, and geographic lines.

Getting Pagination Wrong

Lukas Eder discusses common pagination issues:

If your data source is a SQL database, you might have implemented pagination by using LIMIT .. OFFSET, or OFFSET .. FETCH or some ROWNUM / ROW_NUMBER() filtering (see the jOOQ manual for some syntax comparisons across RDBMS). OFFSET is the right tool to jump to page 317, but remember, no one really wants to jump to that page, and besides, OFFSET just skips a fixed number of rows. If there are new rows in the system between the time page number 316 is displayed to a user and when the user skips to page number 317, the rows will shift, because the offsets will shift. No one wants that either, when they click on “next”.

Instead, you should be using what we refer to as “keyset pagination” (as opposed to “offset pagination”).

He also has a good explanation of the seek method.
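
To make the difference concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `posts` table, the page size, and the helper names are invented for this example and do not come from the article. It shows a row repeating under offset pagination when a new row arrives between page loads, while keyset pagination picks up exactly where the previous page ended.

```python
import sqlite3

PAGE_SIZE = 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 11)],
)


def offset_page(page_number):
    """Offset pagination: skip a fixed number of rows, newest first."""
    rows = conn.execute(
        "SELECT id FROM posts ORDER BY id DESC LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_number * PAGE_SIZE),
    ).fetchall()
    return [row[0] for row in rows]


def keyset_page(last_seen_id):
    """Keyset pagination: continue strictly after the last row shown."""
    rows = conn.execute(
        "SELECT id FROM posts WHERE id < ? ORDER BY id DESC LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()
    return [row[0] for row in rows]


first_page = offset_page(0)  # [10, 9, 8]

# A new row arrives while the user is still reading the first page.
conn.execute("INSERT INTO posts (id, title) VALUES (11, 'post 11')")

offset_next = offset_page(1)              # [8, 7, 6]: id 8 shows up again
keyset_next = keyset_page(first_page[-1])  # [7, 6, 5]: no repeats
print(first_page, offset_next, keyset_next)
```

The keyset query also lets the database seek straight to the boundary through the primary key index instead of counting past skipped rows.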

I will throw in one jab at Oracle (because hey, it’s been a while since I’ve lobbed a bomb at Oracle on this blog):  it’d really suck to have a system where I legally wasn’t allowed to distribute relevant performance comparison benchmarks.  Fortunately, I tend to work on better data stacks.

Thinking Functionally With Scala

Kevin Jacobs solves a simple problem using Scala in a few ways and explains functional programming concepts along the way:

Why is this code better than the functional approach? Note that it saves an enormous amount of time since this approach does not need to scan through all the integers! It is simply a few calculations (which a computer is good at). All the code (the naive approach and the better approach) can be found on GitHub.

Having a solid understanding of mathematics and logic can help you come up with superior algorithms, but make sure you comment them in detail so that the next dev (who might not understand the underpinnings of your code) doesn’t replace it with a brute-force method because it’s “easier.”
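
As a hypothetical illustration of that trade-off (the problem below is a stand-in, not necessarily the one in Kevin's post, and it is written in Python, not Scala), consider summing every non-negative integer below a limit that is divisible by 3 or 5. The brute-force version scans every integer. The closed-form version uses the arithmetic series formula plus inclusion-exclusion, with a comment explaining why it works.

```python
def brute_force(limit):
    """Scan every integer below limit and add up the multiples of 3 or 5."""
    return sum(i for i in range(limit) if i % 3 == 0 or i % 5 == 0)


def sum_of_multiples(k, limit):
    """Sum the positive multiples of k strictly below limit.

    There are m = (limit - 1) // k such multiples: k, 2k, ..., m*k.
    Their sum is k * (1 + 2 + ... + m) = k * m * (m + 1) / 2.
    """
    m = (limit - 1) // k
    return k * m * (m + 1) // 2


def closed_form(limit):
    """Same result as brute_force, in a constant number of operations.

    Multiples of 15 are counted by both the 3 and the 5 terms, so they
    are subtracted once (inclusion-exclusion).
    """
    return (
        sum_of_multiples(3, limit)
        + sum_of_multiples(5, limit)
        - sum_of_multiples(15, limit)
    )


print(brute_force(1000), closed_form(1000))  # 233168 233168
```

The docstrings carry the derivation, which is the sort of comment that keeps the next developer from swapping the formula back out for a loop.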

