Author: Kevin Feasel

Discerning eyes might notice that SQL Server didn’t shade in the area inside of the polygon — it instead shaded in everything in the world EXCEPT for the interior of our polygon.

If this is the first time you’ve encountered this behavior then you’re probably confused by this behavior — I know I was.

Read on to learn more about left-hand and right-hand polygon specifications and how to translate from one to the other.

Comments closed

Defining A Data Lake

Published 2018-01-23 by Kevin Feasel

Derik Hammer gives us a definition of the data lake:

Data lake, a term originally coined by James Dixon, the founder and CTO of Pentaho, is used to describe a data store which can scale to extremely large sizes, in an affordable manner. A data lake is also designed to store the raw data, in its original format, so it can be used immediately, rather than waiting weeks for the IT department to massage it into a format that the data warehouse can accept and/or use effectively.

The data lake concept always includes the capability to scale to an enormous size. However, you do not need petabytes of data to find use in a data lake. It can be used as cheap storage for long-term archival data. It can be used to transform data before attempting to ingest into a data warehouse with the convenience of retaining the original and transformed versions of the data. It also can be used as the centralized staging location for ingestion into the data warehouse, simplifying the loading processes.

I would like to take this opportunity to remind readers that the Aristotelian opposite of the Data Lake is the Data Swamp. Derik uses this term as well and it makes me feel warm and fuzzy inside to see broad adoption of this term.

Comments closed

The SQLUndercover Inspector

Published 2018-01-23 by Kevin Feasel

Adrian Buckman announces the SQLUndercover Inspector:

In a Nutshell:

The SQLUndercover Inspector is a configurable daily report written in SQL that will send you an email (or log the report to a SQL Table) showing you information about specific parts of SQL Server in HTML format including highlighted warnings/advisory conditions, the report has configurable thresholds and settings leaving you in control.

Click through to see what they track.

Comments closed

DRY With Biml

Published 2018-01-23 by Kevin Feasel

Cathrine Wilhelmsen shows how to use tiered Biml files to prevent repetition:

Many Biml solutions start very simple, with just a single Biml file that generates a few SSIS packages. Most developers quickly see the need for a more complex solution for multiple sources. One way to reuse code and apply the Don’t Repeat Yourself software engineering principle in Biml is to use Tiered Biml Files.

In addition to using Tiered Biml Files, there are four other main ways you can avoid repeating your Biml code:

Include Files

CallBimlScript with Parameters

C#/VB Code Files

Transformers (BimlStudio feature only)

In this post we will look at how to use Tiered Biml Files.

Tiering is a helpful mechanism for doing work in one location and using the subsequent results of that work within your Biml solution. Read the whole thing.

Comments closed

CYA Parameters In Powershell

Published 2018-01-23 by Kevin Feasel

Rob Sewell describes a class of Powershell parameters designed to keep you from doing something awful on accident:

If you wanted to see what would happen, you could edit the script and add the WhatIf parameter to every changing command but that’s not really a viable solution. What you can do is

1

$PSDefaultParameterValues[‘*:WhatIf’] = $true

this will set all commands that accept WhatIf to use the WhatIf parameter. This means that if you are using functions that you have written internally you must ensure that you write your functions to use the common parameters

Once you have set the default value for WhatIf as above, you can simply call your script and see the WhatIf output

WhatIf is a great parameter and when developing cmdlets, you should add in support.

Comments closed

Unit Testing Spark Streaming DStreams

Published 2018-01-22 by Kevin Feasel

Anuj Saxena shows how to create unit tests for DStreams in Spark Streaming:

The method ‘ testOperation ‘ takes the output of the operation performed on the ‘inputPair’ and check whether it is equal to the ‘outputPair’ and just like this, we can test our business logic.

This short snippet lets you test your business logic without forcing you to create even a Spark session. You can mock the whole streaming environment and test your business logic easily.

This was a simple example of unary operations on DStreams. Similarly, we can test binary operations and window operations on DStreams.

Click through for an example with code.

Comments closed

Markov Chains In Python

Published 2018-01-22 by Kevin Feasel

Sandipan Dey shows off various uses of Markov chains as well as how to create one in Python:

Perspective. In the 1948 landmark paper A Mathematical Theory of Communication, Claude Shannon founded the field of information theory and revolutionized the telecommunications industry, laying the groundwork for today’s Information Age. In this paper, Shannon proposed using a Markov chain to create a statistical model of the sequences of letters in a piece of English text. Markov chains are now widely used in speech recognition, handwriting recognition, information retrieval, data compression, and spam filtering. They also have many scientific computing applications including the genemark algorithm for gene prediction, the Metropolis algorithm for measuring thermodynamical properties, and Google’s PageRank algorithm for Web search. For this assignment, we consider a whimsical variant: generating stylized pseudo-random text.

Markov chains are a venerable statistical technique and formed the basis of a lot of text processing (especially text generation) due to the algorithm’s relatively low computational requirements.

Comments closed

Custom Alerting With PowerApps

Published 2018-01-22 by Kevin Feasel

Jason Thomas shows how to create custom PowerApps alerts:

So this happened yesterday – one of my customers pinged me and asked whether it is possible to set customized data alerts for her end users? I froze for a second, knowing that such a functionality is not available out of the box but knowing how flexible Power BI is, I decided to explore her use case further. Worst case, I know I have the backing of the world’s best product team, and could submit a request to build this for us. Basically, she wanted her end users to get data alerts if specific products got sold in the last 24 hours (which should have been easy with the regular data alerts functionality in Power BI), but the challenge was that she wanted her users to set (add/delete) their own products. As I said earlier, this functionality is not available out of the box but with the PowerApps custom visual for Power BI and some DAX, we can definitely create a workaround.

Read on to see how it’s done.

Comments closed

Log Shipping With dbatools

Published 2018-01-22 by Kevin Feasel

Sander Stad has started a series on using dbatools to help set up log shipping. Part one walks through the basics and setup:

Technically you don’t need multiple servers to setup log shipping. You can set it up with just one single SQL Server instance. In an HA solution this wouldn’t make sense but technically it’s possible.

Having a separate server acting as the monitoring server ensures that when one of the server goes down, the logging of the actions still takes place.

Having a separate network share for both the backup and copy makes it easier to setup security and decide which accounts can access the backups. The backup share needs to be readable and writable by the primary instance and readable by the secondary instance.
The copy share needs to be accessible and writable for only the secondary instance.

Part two is all about checking the status of a log shipping implementation:

Monitoring your log shipping processes is important. You need the synchronization status of the log shipped databases.

The log ship process consists of three steps; Backup, Copy and Restore. The log shipping tracks the status for these processes.
It registers the last transaction log backup, the last file copied and the last file restored. It also keeps track of the time since the last backup, copy and restore.

But that’s not all. Log shipping also checks if the threshold for the backup and restore has been exceeded.

Log shipping is an underrated piece of the HA/DR puzzle, and Sander shows how easy dbatools makes it to configure.

Comments closed

What’s Happing In Azure Data Factory Right Now?

Published 2018-01-22 by Kevin Feasel

Melissa Coates has a couple Powershell scripts to figure out which pipelines are currently running in Azure Data Factory v1:

This is a quick post to share a few scripts to find what is currently executing in Azure Data Factory. These PowerShell scripts are applicable to ADF version 1 (not version 2 which uses different cmdlets).

Prerequisite: In addition to having installed the Azure Resource Manager modules, you’ll have to register the provider for Azure Data Factory:
#One-time registration of the ADF provider
#Register-AzureRmResourceProvider -ProviderNamespace Microsoft.DataFactory

Click through for the Powershell snippets.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30