
Day: November 21, 2016

Scaling Kinesis Streams

Allan MacInnis shows how to scale Amazon Kinesis streams using the UpdateShardCount API call:

You also need to adjust the alarm threshold automatically to accommodate the new shard capacity. For this example, update the alarm threshold to 80% of your new capacity (or 3200 records per second) by setting a CloudWatch alarm with an action to publish to an SNS topic when the alarm is triggered.

You can then create a Lambda function that subscribes to this SNS topic and executes a call to the new UpdateShardCount API operation while adjusting the CloudWatch alarm threshold. To learn how to configure a CloudWatch alarm, see Creating Amazon CloudWatch Alarms. For information about how to invoke a Lambda function from SNS, see Invoking Lambda Functions Using Amazon SNS Notifications.

This is pretty cool.
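
As a rough sketch of the scaling step described above, the Python below computes the 80% alarm threshold and issues the UpdateShardCount request. It assumes the Kinesis write limit of 1,000 records per second per shard and takes the client as a parameter, expecting something that behaves like boto3.client("kinesis"). The function names and the doubling factor are hypothetical, and updating the CloudWatch alarm itself is left out.

```python
RECORDS_PER_SHARD_PER_SECOND = 1000  # Kinesis per-shard write limit (records/second)
ALARM_PERCENT = 80                   # alarm at 80% of capacity, as in the quoted example


def alarm_threshold(shard_count, percent=ALARM_PERCENT):
    """Records per second at which the alarm should fire for this shard count."""
    return shard_count * RECORDS_PER_SHARD_PER_SECOND * percent // 100


def scale_stream(kinesis_client, stream_name, current_shards, factor=2):
    """Request a new shard count and return (target_shards, new_threshold).

    kinesis_client is assumed to expose update_shard_count with the same
    keyword arguments as the boto3 Kinesis client. factor is a hypothetical
    whole-number multiplier, not something taken from the original post.
    """
    target = current_shards * factor
    kinesis_client.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target,
        ScalingType="UNIFORM_SCALING",
    )
    return target, alarm_threshold(target)
```

Starting from 2 shards with the default factor, this requests 4 shards and returns a threshold of 3200 records per second, which matches the figure in the quoted example. In a Lambda handler, the returned threshold would then be written back to the CloudWatch alarm.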

Comments closed

Pie Charts

Peter Ellis defends pie charts under very specific circumstances:

The usual response from statisticians and data professionals to pie charts ranges from lofty disdain to outright snobbery. But sometimes I think they’re the right tool for communication with a particular audience. Like others I was struck by this image from New Zealand news site stuff.co.nz showing that nearly half the earthquake energy of the past six years came in one day (last Sunday night, and the shaking continues by the way). Pie charts work well when the main impression of relative proportions to the whole is obvious, and fine comparisons aren’t needed.

Here’s my own version of the graphic. I polished this up during a break while working at home due to the office being shut for earthquake-related reasons:

Consider me in the lofty disdain camp. That said, this is probably the best-case scenario for a pie chart: showing the relative share of one dominant element versus everything else combined.


In-Memory Columnstore Updates

Niko Neugebauer looks at improvements in SQL Server 2016 SP1 with respect to adding columnstore indexes to memory-optimized tables:

With an incredible Service Pack 1, Microsoft has triumphantly announced that all editions (Standard, Web, Express and even Local) will get the most advanced programming capabilities of Columnstore, In-Memory, Database Snapshot, Compression, Partitioning and many others, and that there are some incredible features for T-SQL (CREATE OR ALTER) and Execution Plan details (Actual Information on processed Rows, used Trace Flags, etc.). At the same time, there are some quite important improvements under the hood that will make you want to use Service Pack 1 for SQL Server 2016 immediately. One of these features is the fast addition of the Columnstore Index to Memory-Optimised tables. Let’s put it to the test by restoring a copy of the free ContosoRetailDW database:

These results look nice.


Two-Way T Tests

Mala Mahadevan shows how to write a two-way T test in R and T-SQL:

I can do the same calculation of the T value using T-SQL. I cannot calculate the p value from T-SQL as that comes from a table, but it is possible to look it up. I imported the set of values into a table called WalkingSteps with two columns, walkerAsteps and walkerBsteps. For doing the math on the T value, the formula stated here may be useful. My T-SQL code is as below:

The R code is a bit shorter, although the T-SQL code isn’t bad either.
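
For readers who want to see the arithmetic behind a T value, here is a minimal Python sketch. It assumes Welch's form of the two-sample statistic (independent samples, unequal variances allowed), which may not be the exact formula the post links to, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """t statistic for two independent samples (Welch's form, an assumption here)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    standard_error = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    return (mean(sample_a) - mean(sample_b)) / standard_error
```

Calling welch_t with the two step-count columns as lists gives the T value. As in the T-SQL version, the p value still has to come from a t-distribution table or a statistics library.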


The Risk Of Custom Indexes

David Klee shows the risk of creating custom indexes on third-party software:

The error file tells you specifically which indexes it does not like. The error file is found at:

C:\Users\(youraccount)\AppData\Local\Temp\vcsUpgrade\vcdb_req.err

You’ll find the error message towards the bottom of the document. My specific item was:

1 [42000](50000) [Microsoft][SQL Server Native Client 11.0][SQL Server]ERROR ! Extra indexes: VPX_EVENT.HFX_VPX_EVENT_Cover01; VPX_STAT_COUNTER.IX_VPX_STAT_COUNTER_STAT; VPX_TASK.HFX_VPX_TASK_Cover01;
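
If the error file lists many indexes, a short script can pull them out. The sketch below assumes the message keeps the shape shown above, with "Extra indexes:" followed by semicolon-separated TABLE.INDEX pairs. The function name is hypothetical.

```python
import re

# Capture everything after the "Extra indexes:" marker on the error line.
EXTRA_INDEXES = re.compile(r"Extra indexes:\s*(.*)")


def extra_indexes(message):
    """Return (table, index) pairs listed after 'Extra indexes:' in an error line."""
    match = EXTRA_INDEXES.search(message)
    if match is None:
        return []
    pairs = []
    for item in match.group(1).split(";"):
        item = item.strip()
        if item:  # skip the empty piece after the trailing semicolon
            table, _, index = item.partition(".")
            pairs.append((table, index))
    return pairs
```

Passing in the line above yields three pairs, one per index the upgrade check objected to, which can then be reviewed before deciding what to script out and drop.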

For well-maintained third-party vendor software which doesn’t require you to add indexes to support the product at any scale beyond what a developer needs for basic testing, this isn’t an issue.  And if you ever find that piece of software, write the company a note of congratulations for being the first…


Solving Math Problems With T-SQL

Erik Darling shows the Collatz Conjecture in T-SQL:

The only problem in there that could really be written in SQL was the Collatz Conjecture. It states that when you take any number, and if it’s even you divide it by 2, and if it’s odd you multiply it by 3 and add 1, you’ll always eventually end up with 1. Fair enough. There’s probably a use for that somewhere out there.

Writing it in SQL was super easy, of course. Just throw a CASE expression at it.
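
The same even/odd rule is easy to sketch outside SQL. This Python version is not Erik's code. It only illustrates the branch that the CASE expression encodes, and the function names are hypothetical.

```python
def collatz_next(n):
    """One Collatz step: halve even numbers, map odd n to 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_steps(n):
    """Count the steps needed to reach 1 from a positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = collatz_next(n)
        steps += 1
    return steps
```

Starting from 6, the sequence runs 6, 3, 10, 5, 16, 8, 4, 2, 1, so collatz_steps(6) returns 8.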

I think Erik is a little harsh on SQL as a mathematical language.  Yeah, there are other languages I’d go to first for many types of math problems (R, F#, and Python being examples), but SQL can solve quite a few mathematical problems.


R Visuals In Power BI

Ginger Grant discusses how to display R visuals in Power BI:

I hope that some day this list becomes much longer, but it is a good start. If your company has lots of R visuals and you wish to migrate them to Power BI, chances are some of the libraries you are using are not here. If you are interested in having your library added to the list of 352, go to the Ideas page of Power BI and request that your library be added, as I know Microsoft looks at this page to determine what to release in the future. Someone has requested that igraph be added, and since it hasn’t received a lot of votes yet (hint) it is probably low on the priority list.

Even so, this list does cover a lot of the most commonly used packages.


Memory-Optimized Table Types

Randolph West has a three-part series in which he looks at using memory-optimized table types and table-valued parameters to increase application performance. Part 1 introduces the concept:

In other words, for these short-lived temp tables, there’s not only an added benefit of no longer worrying about referring to tempdb..#table, but you also get a massive performance improvement as well.

Part 2 specifies the scale of performance improvements:

The test is very simple and makes use of a technique that one of my customers uses extensively: writing some rows to a temp table, so that another process can reuse those values as parameters, and perform an action, after which it destroys the temp table.

Unfortunately, the method my customer uses to populate a temp table does not work on Azure SQL Database, because they explicitly refer to the temp tables with three-part naming (tempdb..#temptable), which is not permitted.

For the sake of the exercise, I will have a process that writes to a data structure and compare the times.

Part 3 repeats the test in Azure SQL Database:

I’m going to use the same WHILE loop again, but instead of a million runs, I’ll do 1000, 10,000 and 100,000, because I’m paying for this instance of Azure SQL Database (I picked a Premium P1, with 125 DTUs) and I’m a cheapskate. I doubt the 125 DTUs is even enough to run a million times for the fourth option.

Even in SQL Server 2014, this was a good use of In-Memory OLTP.  With the improvements in 2016, this becomes a viable option for a lot more workloads.
