Press "Enter" to skip to content

Day: January 22, 2018

Unit Testing Spark Streaming DStreams

Anuj Saxena shows how to create unit tests for DStreams in Spark Streaming:

The method ‘testOperation’ takes the output of the operation performed on the ‘inputPair’ and checks whether it is equal to the ‘outputPair’, and just like this, we can test our business logic.

This short snippet lets you test your business logic without even creating a Spark session. You can mock the whole streaming environment and test your business logic easily.

This was a simple example of unary operations on DStreams. Similarly, we can test binary operations and window operations on DStreams.

Click through for an example with code.
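
If you want the shape of the technique before clicking through, here is a rough Python sketch (mine, not Anuj’s Scala code). The idea is to keep the business logic as a pure function that would later be applied to each batch, for example via DStream.transform; the function can then be tested against plain lists with no Spark session or streaming context at all.

# Hedged sketch: 'word_counts' is a hypothetical stand-in for the business logic.
def word_counts(records):
    # Count occurrences of each word in one batch.
    counts = {}
    for word in records:
        counts[word] = counts.get(word, 0) + 1
    return counts

def test_word_counts():
    # The article's 'inputPair'/'outputPair' idea: an input batch and its expected output.
    input_batch = ["a", "b", "a"]
    expected = {"a": 2, "b": 1}
    assert word_counts(input_batch) == expected

test_word_counts()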


Markov Chains In Python

Sandipan Dey shows off various uses of Markov chains as well as how to create one in Python:

Perspective. In the 1948 landmark paper A Mathematical Theory of Communication, Claude Shannon founded the field of information theory and revolutionized the telecommunications industry, laying the groundwork for today’s Information Age. In this paper, Shannon proposed using a Markov chain to create a statistical model of the sequences of letters in a piece of English text. Markov chains are now widely used in speech recognition, handwriting recognition, information retrieval, data compression, and spam filtering. They also have many scientific computing applications including the genemark algorithm for gene prediction, the Metropolis algorithm for measuring thermodynamical properties, and Google’s PageRank algorithm for Web search. For this assignment, we consider a whimsical variant: generating stylized pseudo-random text.

Markov chains are a venerable statistical technique and have formed the basis of a lot of text processing (especially text generation) thanks to the algorithm’s relatively low computational requirements.
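
To make that concrete, here is a minimal order-1 Markov text generator in Python (my own sketch, not code from Sandipan’s post):

import random
from collections import defaultdict

def build_chain(text):
    # Map each word to the list of words that follow it in the training text.
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=10):
    # Walk the chain, picking each next word at random from the observed successors.
    word, output = start, [start]
    for _ in range(length - 1):
        successors = chain.get(word)
        if not successors:
            break
        word = random.choice(successors)
        output.append(word)
    return " ".join(output)

chain = build_chain("the cat sat on the mat and the cat ran")
print(generate(chain, "the"))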


Log Shipping With dbatools

Sander Stad has started a series on using dbatools to help set up log shipping.  Part one walks through the basics and setup:

Technically you don’t need multiple servers to set up log shipping. You can set it up with just a single SQL Server instance. In an HA solution this wouldn’t make sense, but technically it’s possible.

Having a separate server acting as the monitoring server ensures that when one of the servers goes down, the logging of the actions still takes place.

Having a separate network share for both the backup and copy makes it easier to set up security and decide which accounts can access the backups. The backup share needs to be readable and writable by the primary instance, and readable by the secondary instance. The copy share needs to be accessible and writable by only the secondary instance.

Part two is all about checking the status of a log shipping implementation:

Monitoring your log shipping processes is important. You need to know the synchronization status of the log-shipped databases.

The log shipping process consists of three steps: backup, copy, and restore. Log shipping tracks the status of these processes. It registers the last transaction log backup, the last file copied, and the last file restored. It also keeps track of the time since the last backup, copy, and restore.

But that’s not all. Log shipping also checks whether the thresholds for the backup and restore have been exceeded.
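
The threshold logic itself is easy to reason about; here is a tiny Python sketch of the concept (illustrative only, not how dbatools implements it):

from datetime import datetime, timedelta

def backup_threshold_exceeded(last_backup, threshold_minutes):
    # True if the time since the last log backup exceeds the alert threshold.
    return datetime.now() - last_backup > timedelta(minutes=threshold_minutes)

# Hypothetical values: last backup 90 minutes ago against a 60-minute threshold.
print(backup_threshold_exceeded(datetime.now() - timedelta(minutes=90), 60))  # True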

Log shipping is an underrated piece of the HA/DR puzzle, and Sander shows how easy dbatools makes it to configure.


Custom Alerting With PowerApps

Jason Thomas shows how to create custom PowerApps alerts:

So this happened yesterday – one of my customers pinged me and asked whether it was possible to set customized data alerts for her end users. I froze for a second, knowing that such functionality is not available out of the box, but knowing how flexible Power BI is, I decided to explore her use case further. Worst case, I know I have the backing of the world’s best product team and could submit a request to build this for us. Basically, she wanted her end users to get data alerts if specific products got sold in the last 24 hours (which should have been easy with the regular data alerts functionality in Power BI), but the challenge was that she wanted her users to set (add/delete) their own products. As I said earlier, this functionality is not available out of the box, but with the PowerApps custom visual for Power BI and some DAX, we can definitely create a workaround.

Read on to see how it’s done.


What’s Happening In Azure Data Factory Right Now?

Melissa Coates has a couple of PowerShell scripts to figure out which pipelines are currently running in Azure Data Factory v1:

This is a quick post to share a few scripts to find what is currently executing in Azure Data Factory. These PowerShell scripts are applicable to ADF version 1 (not version 2 which uses different cmdlets).

Prerequisite: In addition to having installed the Azure Resource Manager modules, you’ll have to register the provider for Azure Data Factory:

# One-time registration of the ADF provider
Register-AzureRmResourceProvider -ProviderNamespace Microsoft.DataFactory

Click through for the PowerShell snippets.


More DBA Salary Research

Ginger Grant digs into the DBA salary survey a bit further:

I know that I have heard that if you want to make money, you need to get into management. Being a good manager is not the same skill set as being a good database professional, and there are many people who do not want to be managers. According to the data in the survey, you can be in the top 5% of wage earners and not be a manager.

How about telecommuting? What is the impact of telecommuting on the top 5%? Well, it depends on whether you are looking at the much smaller female population. The majority of females in the top 5% telecommute. Those who telecommute 100% of the time do very well, as do those who spend every day at a job site. Males report working more hours and telecommuting less than females do.

If you look at people who are in the average category, they do not telecommute. The average category also has 25% of people who work less than 40 hours a week. If you look at the number of responses in each category by country, you can see that in many cases, like Uganda, there are not enough survey respondents to draw any conclusions about salary in those locations.

Another area of importance here is trying to normalize salaries for cost of living: it’s a lot easier to get a $100K/year job in Manhattan, NY than in Manhattan, KS, but $100K in the latter goes much further. Based on my limited digging into the data set, it’d be tough to draw any conclusions on that front, but it is an a priori factor that I’d want to consider when dealing with salary survey data.
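
For what it’s worth, the mechanical part of that normalization is trivial; the hard part is sourcing a defensible cost-of-living index per location. A Python sketch with made-up index values:

# Hypothetical cost-of-living indices (100 = national baseline); illustrative only.
col_index = {"Manhattan, NY": 180, "Manhattan, KS": 85}

def adjusted_salary(salary, location):
    # Express a nominal salary in baseline cost-of-living dollars.
    return salary * 100 / col_index[location]

print(adjusted_salary(100000, "Manhattan, NY"))  # about 55,556
print(adjusted_salary(100000, "Manhattan, KS"))  # about 117,647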


More SSMS Tips & Tricks

Wayne Sheffield’s been busy since our last visit.  Here are six more SSMS tips and tricks.

First, Wayne shows how to create keyboard shortcuts for common activities.  Then, he shows how to color-code SQL Server instances, which is very helpful when trying to avoid accidental deployments to prod.

Next up he shows off the template explorer:

To use a template, just double-click it. The template will be opened in a new query window, ready for you to change.

You can also change the templates themselves. Just right-click on the template, and select “Edit”. The template will be opened up, and changes that you make will be saved to the template definition itself.

Finally, you can create your own templates. Right-click the root folder (SQL Server Templates), and you can select to create a new folder or template.

Templates are quite helpful when you commonly run the same bits of code.  Speaking of the same bits of code, Wayne next shows how to use Snippets.

His latest two posts are about the Object Explorer.  The first post shows you how to filter objects in the Object Explorer, and the second shows which columns you can include in it.


Building A Comparer For The Power BI Table.Group Function

Imke Feldmann shows off what you can do with the fifth parameter in Table.Group:

The Table.Group function will pass two parameters to the function in the fifth argument if it is used: for GroupKind.Local, this is the group-columns record from the initial/first row of the table/group and the respective record of the current row.

As long as the comparer function returns 0, the current row will be regarded as belonging to the group: this is a match in the Comparer.OrdinalIgnoreCase function, and also the value of false (which makes the syntax a bit counterintuitive here, in my eyes).
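
To make the control flow concrete without writing M, here is a Python analogue of local grouping with a comparer (a conceptual sketch, not the Power Query implementation): rows join the current group as long as the comparer returns 0, and any nonzero result starts a new group.

def group_local(rows, key, comparer):
    # Group consecutive rows, mirroring the two values Table.Group passes to the
    # function in its fifth argument: the key from the group's first row and the
    # key from the current row.
    groups = []
    for row in rows:
        if groups and comparer(key(groups[-1][0]), key(row)) == 0:
            groups[-1].append(row)
        else:
            groups.append([row])
    return groups

# A case-insensitive comparer, analogous to Comparer.OrdinalIgnoreCase returning 0 on a match.
comparer = lambda a, b: 0 if a.lower() == b.lower() else 1

print(group_local(["Red", "red", "Blue", "blue", "RED"], lambda r: r, comparer))
# [['Red', 'red'], ['Blue', 'blue'], ['RED']]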

Interesting reading.
