Press "Enter" to skip to content

Author: Kevin Feasel

Azure Data Lake Tools For VS Code

Jenny Jiang announces that Azure Data Lake Tools for Visual Studio Code is now generally available:

ADLA Integration

The ADL Tools for VS Code integrate well with ADLA. Azure Data Lake includes the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. U-SQL on ADLA offers Job as a Service with the Microsoft-invented U-SQL language. Customers do not have to manage deployment of clusters, but can simply submit their jobs to ADLA, an analytics platform managed by Microsoft.
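For a sense of what such a job looks like, here is a minimal U-SQL sketch modeled on Microsoft's introductory samples (the schema and file paths are illustrative, not from the announcement):

    // Extract a rowset from a file in the lake, supplying the schema on read.
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();

    // Transform with SQL-like syntax; expressions use C# semantics (note the ==).
    @result =
        SELECT UserId, Region
        FROM @searchlog
        WHERE Region == "en-gb";

    // Write the result back to the lake; ADLA provisions the compute for you.
    OUTPUT @result
        TO "/output/SearchLog_filtered.tsv"
        USING Outputters.Tsv();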

Click through for the full announcement.

File Management In Containers

Andrew Pruski shows how to copy files into and out of Docker containers:

Last week I was having an issue with a SQL install within a container, and to fix it I needed to copy the setup log files out of the container onto the host so that I could review them.

But how do you copy files out of a container?

Well, thankfully there’s the docker cp command. A really simple command that lets you copy whatever files you need out of a running container into a specified directory on the host.

I’ll run through a quick demo, but I won’t install SQL; I’ll use an existing SQL image and grab its Summary.txt file.
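The copy itself is a one-liner. A hedged sketch of what it might look like (the container name and the in-container path to Summary.txt are illustrative):

    # Copy the setup summary from the container to the current directory on the host.
    docker cp sqlserver:"C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\Summary.txt" .

    # Copying into a container is the same command with source and destination swapped.
    docker cp setup-fix.ini sqlserver:"C:\install\setup-fix.ini"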

Read on for the demo.

Cardinality Estimation On Memory-Optimized Table Variables

Jack Li explains that the cardinality estimator works the same for memory-optimized table variables as it does for regular table variables:

In a previous blog, I talked about how a memory-optimized table variable consumes memory until the end of the batch. In this blog, I want to make you aware of the cardinality estimate for memory-optimized table variables, as we have had customers call in for clarification. By default, a memory-optimized table variable behaves the same way as a disk-based table variable: it will have 1 row as an estimate. With a disk-based table variable, you can control the estimate by using OPTION (RECOMPILE) at the statement level (see this blog) or by using trace flag 2453.

You can control the same behavior using those two approaches on a memory-optimized table variable if you use it in an ad hoc query or inside a regular T-SQL stored procedure; the behavior will be the same. This little repro will show that the estimate is correct with OPTION (RECOMPILE).
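A minimal sketch of that kind of repro (the type name is illustrative; this assumes a database with a MEMORY_OPTIMIZED_DATA filegroup, and the estimates are visible in the actual execution plan):

    -- Memory-optimized table type; table variables of this type live in memory.
    CREATE TYPE dbo.tvpOrders AS TABLE
    (
        OrderID INT NOT NULL PRIMARY KEY NONCLUSTERED
    )
    WITH (MEMORY_OPTIMIZED = ON);
    GO

    DECLARE @Orders dbo.tvpOrders;

    INSERT INTO @Orders (OrderID)
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects;

    -- Estimated rows for @Orders: 1, regardless of the 1,000 rows inside.
    SELECT o.OrderID
    FROM @Orders AS o;

    -- With the hint, the statement recompiles after the insert,
    -- so the estimate reflects the actual row count.
    SELECT o.OrderID
    FROM @Orders AS o
    OPTION (RECOMPILE);
    GO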

Jack also explains how this works for natively compiled stored procedures (spoilers: it doesn’t), so read the whole thing.

Build Your Own Power BI App

Reza Rad shows how to build a Power BI app:

Till now there were about six methods of sharing content in Power BI.

I have written about some of these already (follow links above), and will write about the rest soon. Yesterday, Microsoft announced the preview version of Power BI Apps, which is a new method of sharing. It is an enhanced version of two previous methods put together: workspaces and content packs!

Read on for a step-by-step guide.

Picking An R Package Name

Marcelo Perlin has fun looking at package names in CRAN:

Looking at package names, one strategy that I commonly observe is to use small words, a verb or noun, and add the letter R to it. A good example is dplyr. Letter d stands for dataframe, ply is just a tool, and R is, well, you know. In a conventional sense, the name of this popular tool is informative and easy to remember. As always, the extremes are never good. A few bad examples of package naming are A3, AF, BB, and so on. Googling the package name is definitely not helpful. On the other end, package samplesizelogisticcasecontrol provides a lot of information, but it is plain unattractive!

Another strategy that I also find interesting is developers using names that, on first sight, are completely unrelated to the purpose of the package. But there is a not-so-obvious link. One example is package sandwich. At first sight, I challenge anyone to figure out what it does. This is an econometric package that computes robust standard errors in a regression model. These robust estimates are also called sandwich estimators because the formula looks like a sandwich. But you only know that if you studied a bit of econometric theory. This strategy works because it is easier to remember things that surprise us. Another great example is package janitor. I’m sure you already suspect that it has something to do with data cleaning. And you are right! The message of the name is effortless and it works! The author even got the privilege of using letter R in the name.

Marcelo uses word and character analysis to come up with his conclusions, making this a good way of seeing how to graph and slice data. h/t R-bloggers

Improving Performance For JSON In SQL Server

Bert Wagner shows how to use indexed, non-persisted columns to pre-parse JSON data in SQL Server:

This is basically a cheat code for indexing computed columns.

SQL will only compute the “Make” value on a row’s insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page.

This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast.

Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast.
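To make the pattern concrete, here is a minimal sketch of what Bert describes (the table and column names are hypothetical):

    -- Non-persisted computed column that parses one property out of the JSON.
    ALTER TABLE dbo.Cars
        ADD Make AS JSON_VALUE(CarDetailsJson, '$.Make');

    -- Indexing the computed column materializes the parsed values,
    -- so the JSON is parsed once per write rather than once per read.
    CREATE NONCLUSTERED INDEX IX_Cars_Make
        ON dbo.Cars (Make);

    -- This can now seek on the index instead of scanning and parsing every row.
    SELECT CarId, Make
    FROM dbo.Cars
    WHERE Make = N'Volkswagen';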

It’s great that the database engine is smart enough to do this, but I’m not really a big fan of storing data in JSON and parsing it within SQL Server, as that violates first normal form.  If you know you’re going to use Make as an attribute and query it in SQL, make it a real attribute instead of holding multiple values in a single attribute.

Stored Procedures Are Code

Rob Farley hates hearing that stored procedures don’t need to go into source control:

Hearing this is one of those things that really bugs me.

And it’s not actually about stored procedures, it’s about the mindset that sits there.

I hear this sentiment in environments where there are multiple developers. Where they’re using source control for all their application code. Because, you know, they want to make sure they have a history of changes, and they want to make sure two developers don’t change the same piece of code, maybe they even want to automate builds, all those good things.

But checking out code and needing it to pass all those tests is a pain. So if there’s some logic that can be put in a stored procedure, then that logic can be maintained outside the annoying rigmarole of source control. I guess this is appealing because developers are supposed to be creative types, and should fight against the repression, fight against ‘the man’, fight against control.

When I come across this mindset, I worry a lot.

Read on for Rob’s set of worries, and hie thee to the source control repository.  It really doesn’t matter which source control product you use (ideally, the same one that developers use for their app code), just as long as it’s in source control.

Smoother Deployments

Steve Jones talks about how, at a prior company, he and his coworkers simplified their deployment processes and their lives:

The lead developer and I both had little children at the time. Spending 12+ hours at work on Wednesday wasn’t an ideal situation for us, and we decided to get better.

The first thing we did was ensure that all code was tracked in VSS. We had most web code here, but there were always a few files that weren’t captured, so we cleaned that up. I also added database code to VSS with the well known, time tested and proven File | Save, File | Open method of capturing SQL code. This took a few months, and some deployment issues, to get everyone in the habit of modifying code in this manner. I refused to deploy code that wasn’t in VSS, and since our CTO was a former developer, I had support.

The other change was the lead developer and I started building a release branch of code each week. We’d move over the changes that were going to be released to this branch, which simplified our process. We could now see exactly which code was being deployed. This was before git and more modern branching strategies, but we were able to easily copy code from the mainline of development to the release branch as we made changes for this week.

It’s a good read.

Azure Cosmos DB

Victoria Holt discusses a new Azure product announcement:

Another database announcement today from Microsoft: the first globally distributed, multi-model database service. Microsoft describes Azure Cosmos DB as containing a write-optimized, resource-governed, schema-agnostic database engine that natively supports multiple data models: key-value, documents, graphs, and columnar.

The troll in me really wants this to be SQL Server, but it is instead a very much beefed-up DocumentDB.

Continuous Disintegration

Alex Yates gets on his soapbox about continuous integration:

Everyone does CI wrong!

(OK, perhaps not everyone, but a lot of people.)

Whenever I deliver a conference session about database continuous integration (CI), I like to start by asking a question to the audience. “Who can tell me what continuous integration means?”

I almost always get responses like:

“Automated deployments!”

“Automated builds upon commit!”

Very occasionally someone will impress me with something like:

“Unit tests!” or “Automatically running my unit tests!”

Not bad answers. Have a biscuit. But you are still missing the fundamental point.

Alex makes a number of great points, so check it out.
