December 2017 – Page 16

Protecting RDP With SSH

Published 2017-12-04 by Kevin Feasel

Chrissy LeMaire has a two-parter on enabling SSH tunneling on Windows 10. First, if you are using the Fall Creators Update:

Gotta say I’m super thankful for Chris K’s blog post “Enabling the hidden OpenSSH server in Windows 10 Fall Creators Update (1709) — and why it’s great!“, otherwise this would have taken me far longer to figure out.

So next, Run PowerShell As Administrator, then generate a key.

cd C:\windows\system32\OpenSSH
ssh-keygen -A

Alternatively, if you are not using the Fall Creators Update:

First, bash for Windows must be setup. This requires Windows 10 or Windows Server 2016.

Note: this was written for Windows 10 pre-1709. Apparently, the new update contains a ton of changes. Developer mode is not required and you install your Linux distro from the Windows Store. Seems that it may even include Open SSH right out the box. I’ll test on Tuesday and let you all know. Till then, here is how to do it if you’ve got Windows 10 without Fall Creators Update (FCU).

Doing this limits the ability of an attacker to snoop on your RDP traffic.

Comments closed

Updating Statistics On System Tables

Published 2017-12-04 by Kevin Feasel

Dan Guzman shows that you can update statistics on system tables:

The solution in many cases to simply update statistics on the underlying system tables indexes used by the problem DMV queries. This can be done selectively by identifying the system table indexes referenced in execution plan seek and scan operators of the problem query execution plan and then executing UPDATE STATISTICS on each index. However, the task is somewhat tedious.

Alternatively, one can simply update stats on all the system tables. Below is a script that generates and executes DDL to update stats on all non-empty system tables, making quick work of this.

Click through for the script, as well as an interesting note if you try to use constructs like @sql = @sql + N’some string’ in your code.

Comments closed

Calculating Azure VM Disk Throughput

Published 2017-12-04 by Kevin Feasel

Ed Elliott has built a nice calculator:

What does it do?

You choose how many volume you want, whether you want caching or not and then how many disks choosing the amount and type of disks and then the calculator will tell you what your max throughput is and also when you have exceeded the maximum throughput of the virtual machine.

Good stuff. Check out his calculator and associated FAQ.

Comments closed

Reporting Services Project Gymnastics

Published 2017-12-04 by Kevin Feasel

Nate Johnson had a bad experience with Visual Studio-based SQL Server Reporting Services projects:

So, what have we learned? Well, for one, this is a crappy situation born of poor in-product support. I should be able to configure Solution-level shared Data Sources, use them in as many Projects (within said Solution) as I want, and have VS configuration management support them; bonus points for doing so with saved & encrypted credentials. Ideally, when we check this into source-control, we’d check in the “DEV” environment flavor connection-configs. Then, when the reports get deployed to the “PROD” SSRS server, the same globally shared Data Sources are already present (and they don’t get over-written, thankfully by default!), configured by the DBA with prod credentials, and nobody in the development pipeline needs to know said credentials. Yay?

Not exactly a ringing endorsement.

1 Comment

Don’t Forget NOCOUNT

Published 2017-12-04 by Kevin Feasel

Lonny Niederstadt shows just how expensive printing out result counts can be:

OK. Now Aaron Bertrand has a post from February 2016…
Performance Surprises and Assumptions : SET NOCOUNT ON
https://sqlperformance.com/2016/02/t-sql-queries/nocount

In that blog post the potential performance benefit of NOCOUNT ON was elusive. And, to be honest, this round of my tests was NOT to learn about NOCOUNT but other stuff. Just happened to stumble on NOCOUNT when I was sifting through stuff after some early testing. But with these results in hand, maybe NOCOUNT has a few more performance surprise for all of us 🙂

For a single run, printing out counts isn’t that expensive, but when doing a lot of work, it can add up.

Comments closed

Working With Lists In Python

Published 2017-12-01 by Kevin Feasel

Jean-Francois Puget shows that iterating through lists in a for loop is not always the most efficient for Python code:

The above code is correct, but it is inefficient. List comprehensions are meant for case like this. The idea of comprehension is to move the for loop inside the construction of the result list:
def get_square_list(x):
    return [xi*xi for xi in x]
This is both simpler and faster than the previous code.

Definitely worth reading if you come at Python from a C-like language background.

Comments closed

Caching Strategy

Published 2017-12-01 by Kevin Feasel

Kevin Gessner explains some caching concepts used at Etsy:

A major drawback of modulo hashing is that the size of the cache pool needs to be stable over time. Changing the size of the cache pool will cause most cache keys to hash to a new server. Even though the values are still in the cache, if the key is distributed to a different server, the lookup will be a miss. That makes changing the size of the cache pool—to make it larger or for maintenance—an expensive and inefficient operation, as performance will suffer under tons of spurious cache misses.

For instance, if you have a pool of 4 hosts, a key that hashes to 500 will be stored on pool member 500 % 4 == 0, while a key that hashes to 1299 will be stored on pool member 1299 % 4 == 3. If you grow your cache by adding a fifth host, the cache pool calculated for each key may change. The key that hashed to 500 will still be found on pool member 500 % 5 == 0, but the key that hashed to 1299 be on pool member 1299 % 5 == 4. Until the new pool member is warmed up, your cache hit rate will suffer, as the cache data will suddenly be on the ‘wrong’ host. In some cases, pool changes can cause more than half of your cached data to be assigned to a different host, slashing the efficiency of the cache temporarily. In the case of going from 4 to 5 hosts, only 20% of cache keys will be on the same host as before!

It’s interesting reading.

Comments closed

Network And Sankey Diagrams In Python And R

Published 2017-12-01 by Kevin Feasel

Tony Hirst has a roundup of various R and Python packages which build network charts or Sankey diagrams:

Another way we might be able to look at the data “out of time” to show flow between modules is to use a Sankey diagram that allows for the possibility of feedback loops.

The Python sankeyview package (described in Hybrid Sankey diagrams: Visual analysis of multidimensional data for understanding resource use looks like it could be useful here, if I can work out how to do the set-up correctly!

Sankey diagrams are on my list of dangerous visuals: done right, they are informative, but it’s easy to try to put too much into the diagram and thereby confuse everybody.

Comments closed

Predicting Restaurant Reservations With A Neural Net

Published 2017-12-01 by Kevin Feasel

Kevin Jacobs builds a simple neural net using Pandas and sklearn:

The first thing to notice is that our values are not normalized. The number of visitors is a number and gets larger and larger. To normalize it, we simply divide it by 100, since all numbers are below 1. The same holds for the lag. Most of the lags are lower than 30. Therefore, I will divide the lag size by 30.

Notice that there are many more approaches for normalizing the data! This is just a quick normalization on the data, but feel free to use your own normalization method. My normalization process is closely related to the MinMaxScalar normalization which can be found in sklearn (scikit-learn).

With just a few lines of Python code we can create a Multi-Layer Perceptron (MLP):

Click through for the code.

Comments closed

“Caveman” Graphs In SQL

Published 2017-12-01 by Kevin Feasel

Denis Gobo puts together some basic Management Studio data visualization:

I found this technique on Rich Benner’s SQL Server Blog: Visualising the Marvel Cinematic Universe in T-SQL and decided to play around with it after someone asked me to give him the sizes of all databases on a development instance of SQL Server

The way it works is that you take the size of the database and then divide that number against the total size of all databases. You then use the replicate function with the | (pipe) character to generate the ‘graph’ so 8% will look like this ||||||||

You can use this for tables with most rows, a count per state etc etc. By looking at the output the graph column adds a nice visual effect to it IMHO

It does the job and doesn’t require you to go out to a different product, so it works pretty well for occasional administrative queries.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Month: December 2017

What does it do?