Press "Enter" to skip to content

Month: January 2018

Log Shipping With dbatools

Sander Stad has started a series on using dbatools to help set up log shipping.  Part one walks through the basics and setup:

Technically you don’t need multiple servers to set up log shipping. You can set it up with just a single SQL Server instance. In an HA solution this wouldn’t make sense, but technically it’s possible.

Having a separate server acting as the monitoring server ensures that when one of the servers goes down, the logging of the actions still takes place.

Having a separate network share for both the backup and copy makes it easier to set up security and decide which accounts can access the backups. The backup share needs to be readable and writable by the primary instance and readable by the secondary instance.
The copy share needs to be accessible and writable for only the secondary instance.
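
To give a rough sense of what the setup looks like in practice, here is a minimal sketch rather than Sander’s exact code: the instance names, database, and shares are placeholders, and the cmdlet and parameter names (Invoke-DbaLogShipping and friends) have shifted across dbatools versions, so check Get-Help on your installed version before running anything like it.

# Minimal log shipping setup sketch; instance names, database, and shares are placeholders
Invoke-DbaLogShipping -SourceSqlInstance sql1 -DestinationSqlInstance sql2 `
    -Database AdventureWorks `
    -BackupNetworkPath '\\fileshare\logship\backup' `
    -CopyDestinationFolder '\\fileshare\logship\copy' `
    -GenerateFullBackup -CompressBackup -Force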

Part two is all about checking the status of a log shipping implementation:

Monitoring your log shipping processes is important. You need to know the synchronization status of the log-shipped databases.

The log shipping process consists of three steps: Backup, Copy, and Restore. Log shipping tracks the status of these processes.
It registers the last transaction log backup, the last file copied, and the last file restored. It also keeps track of the time since the last backup, copy, and restore.

But that’s not all. Log shipping also checks whether the thresholds for the backup and restore have been exceeded.
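
For a rough idea of what that check looks like from dbatools, here’s a hedged sketch rather than Sander’s exact example; the cmdlet name varies between dbatools versions, so verify it on your install first.

# Report last backup/copy/restore activity and threshold breaches for the secondary instance
Test-DbaLogShippingStatus -SqlInstance sql2 | Format-Table -AutoSize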

Log shipping is an underrated piece of the HA/DR puzzle, and Sander shows how easy dbatools makes it to configure.

Custom Alerting With PowerApps

Jason Thomas shows how to create custom PowerApps alerts:

So this happened yesterday – one of my customers pinged me and asked whether it is possible to set customized data alerts for her end users. I froze for a second, knowing that such functionality is not available out of the box, but knowing how flexible Power BI is, I decided to explore her use case further. Worst case, I know I have the backing of the world’s best product team, and could submit a request to build this for us. Basically, she wanted her end users to get data alerts if specific products got sold in the last 24 hours (which should have been easy with the regular data alerts functionality in Power BI), but the challenge was that she wanted her users to set (add/delete) their own products. As I said earlier, this functionality is not available out of the box, but with the PowerApps custom visual for Power BI and some DAX, we can definitely create a workaround.

Read on to see how it’s done.

What’s Happening In Azure Data Factory Right Now?

Melissa Coates has a couple of PowerShell scripts to figure out which pipelines are currently running in Azure Data Factory v1:

This is a quick post to share a few scripts to find what is currently executing in Azure Data Factory. These PowerShell scripts are applicable to ADF version 1 (not version 2 which uses different cmdlets).

Prerequisite: In addition to having installed the Azure Resource Manager modules, you’ll have to register the provider for Azure Data Factory:

# One-time registration of the ADF provider
Register-AzureRmResourceProvider -ProviderNamespace Microsoft.DataFactory

Click through for the PowerShell snippets.
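
If you just want a flavor of the kind of check involved before clicking through, here’s a hedged sketch rather than Melissa’s code: the resource group, factory, and dataset names are placeholders, and the ADF v1 parameter and slice property names are from memory, so adjust after inspecting the cmdlet output.

# Sketch: list slices currently executing for one dataset in an ADF v1 factory
$df = Get-AzureRmDataFactory -ResourceGroupName 'MyResourceGroup' -Name 'MyDataFactoryV1'
Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName 'MyOutputDataset' `
    -StartDateTime (Get-Date).AddDays(-1) -EndDateTime (Get-Date) |
    Where-Object { $_.State -like 'InProgress*' }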

More DBA Salary Research

Ginger Grant digs into the DBA salary survey a bit further:

I know that I have heard that if you want to make money you need to get into management. Being a good manager is not the same skill set as being a good database professional, and there are many people who do not want to be managers.  According to the data in the survey, you can be in the top 5% of wage earners and not be a manager.

How about telecommuting? What is the impact of telecommuting on the top 5%?  Well, it depends on whether you are looking at the much smaller female population. The majority of females in the top 5% telecommute.  Those who telecommute 100% of the time do very well, as do those who spend every day at a job site.  Males also report working more hours and telecommuting less than females do.  If you look at people who are in the average category, they do not telecommute. The average category also has 25% of people who work fewer than 40 hours a week.  If you look at the number of respondents in the category by country, you can determine that in many cases, like Uganda, there are not enough survey respondents to draw any conclusions about salary in those locations.

Another area of importance here is in trying to normalize salaries for standard of living:  it’s a lot easier to get a $100K/year job in Manhattan, NY than in Manhattan, KS, but $100K in the latter goes much further.  Based on my little digging into the set, it’d be tough to draw any conclusions on that front, but it is an a priori factor that I’d want to consider when dealing with salary survey data.

More SSMS Tips & Tricks

Wayne Sheffield’s been busy since our last visit.  Here are six more SSMS tips and tricks.

First, Wayne shows how to create keyboard shortcuts for common activities.  Then, he shows how to color-code SQL Server instances, which is very helpful when trying to avoid accidental deployments to prod.

Next up he shows off the template explorer:

To use any of the templates, just double-click it. The template will be opened up in a new query window, ready for you to change.

You can also change the templates themselves. Just right-click on the template, and select “Edit”. The template will be opened up, and changes that you make will be saved to the template definition itself.

Finally, you can create your own templates. Right-click the root folder (SQL Server Templates), and you can select to create a new folder or template.

Templates are quite helpful when you commonly run the same bits of code.  Speaking of the same bits of code, Wayne next shows how to use Snippets.

His latest two posts are about the Object Explorer.  The first post shows you how to filter objects in the Object Explorer, and the second shows which columns you can include in it.

Building A Comparer For The Power BI Table.Group Function

Imke Feldmann shows off what you can do with the fifth parameter in Table.Group:

The Table.Group function will pass 2 parameters to the function in the 5th argument if it is used: for GroupKind.Local, these are the group-columns record from the initial/first row of the table/group and the respective record of the current row.

As long as the comparer function returns 0, the current row will be regarded as belonging to the group: this is a match in the Comparer.OrdinalIgnoreCase function and also the value of false (which makes the syntax a bit counterintuitive here, in my eyes).

Interesting reading.

Kafka Topic Reuse

Martin Kleppmann walks through the trade-offs of reusing Apache Kafka topics for different event types:

The common wisdom (according to several conversations I’ve had, and according to a mailing list thread) seems to be: put all events of the same type in the same topic, and use different topics for different event types. That line of thinking is reminiscent of relational databases, where a table is a collection of records with the same type (i.e. the same set of columns), so we have an analogy between a relational table and a Kafka topic.

The Confluent Avro Schema Registry has traditionally reinforced this pattern, because it encourages you to use the same Avro schema for all messages in a topic. That schema can be evolved while maintaining compatibility (e.g. by adding optional fields), but ultimately all messages have been expected to conform to a certain record type. We’ll revisit this later in the post, after we’ve covered some more background.

For some types of streaming data, such as logged activity events, it makes sense to require that all messages in the same topic conform to the same schema. However, some people are using Kafka for more database-like purposes, such as event sourcing, or exchanging data between microservices. In this context, I believe it’s less important to define a topic as a grouping of messages with the same schema. Much more important is the fact that Kafka maintains ordering of messages within a topic-partition.

Read the whole thing.

“Pretty But Useless” Visuals

I continue my dashboard visualization series with a bit of an extended rant:

The best use of a pie chart is to show a simple share of a static total.  Here, we can see that Daredevil has almost half of the critics’ reviews, and that The Punisher and Jessica Jones are split.

This simple pie chart also shows some of the problems of pie charts.  The biggest issue is that people have trouble with angle, making it hard to distinguish relative slices.  For example, is Jessica Jones’s slice larger or is The Punisher’s?  It’s really hard to tell in this case, and if that difference is significant, you’re making life harder for your viewers.

Second, as slice percentages get smaller, it becomes harder to differentiate slices.  In this case, we can see all three pretty clearly, but if we start getting 1% or 2% slices, they end up as slivers on the pie, making it hard to distinguish one slice from another.

Third, pie charts usually require one color per slice.  This can lead to an explosion of color usage.  Aside from potential risks of using colors which in concert are not CVD-friendly, adding all of these colors has yet another unintended consequence.  If you use the same color in two different pie charts to mean different things, you can confuse people, as they will associate color with some category, and so if they see the same color twice, they will implicitly assign both things the same category.  That leads to confusion.  Yes, careful reading of your legend disabuses people of that notion, but by the time they see the legend, they’ve already implicitly mapped out what this color represents.

Fourth, pie charts often require legends, which increases eye scanning.

Click through to read me complain about other types of visuals, too.

Anomaly Detection With Python

Robert Sheldon continues his SQL Server Machine Learning Series:

As important as these concepts are to working with Python and MLS, the purpose in covering them was meant only to provide you with a foundation for doing what’s really important in MLS, that is, using Python (or the R language) to analyze data and present the results in a meaningful way. In this article, we start digging into the analytics side of Python by stepping through a script that identifies anomalies in a data set, which can occur as a result of fraud, demographic irregularities, network or system intrusion, or any number of other reasons.

The article uses a single example to demonstrate how to generate training and test data, create a support vector machine (SVM) data model based on the training data, score the test data using the SVM model, and create a scatter plot that shows the scoring results.

Click through to see the scenario that Robert has laid out as an example.

AWS Glue Now Supports Scala

Mehul Shah, et al., announce that AWS Glue officially supports Scala:

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

Functional languages tend to be very good for ETL tasks, and Scala is a great choice due to its relationship with Spark.
