Press "Enter" to skip to content

Month: September 2017

Transactional Replication And SQL On Linux

Phil Grayson shows a way to get transactional replication working on Linux:

Microsoft have stated that transactional replication isn’t supported on Linux and we’re not sure if they intend to in the future. This means that if you try to add the server to a publisher, you get the following message. So you can’t use the GUI and it also means that you can’t use pull as the necessary files won’t be there.

Despite that warning, there is a way to set up a push subscription; click through for that way.

2 Comments

Polybase And RPC Protection

Casey Karst announces that Polybase supports Hadoop RPC protection:

Supporting this configuration allows PolyBase to connect and query Hadoop clusters that have wire encryption turned on. This enables a secure connection between Hadoop and SQL Server; as well as, among the Hadoop Data Nodes.

To connect to a Hadoop cluster with the hadoop.rpc.protection set to privacy or integrity, you will need to alter the core-site.xml file that is installed with PolyBase. This file is generally found at C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf.

That’s good news for Polybase users.

Comments closed

Regular Expression Cheat Sheets

Mara Averick shows off a collection of regular expression guides:

There are helpful string-related R packages 📦, stringr (which is built on top of the more comprehensive stringi package) comes to mind. But, at some point in your computing life, you’re gonna need to get down with regular expressions.

And so, here’s a collection of some of the Regex-related links I’ve tweeted 🐦:

Click through for links to regular expression resources.

Comments closed

Visualizing A Single Number

Tim Bock shows a dozen methods for visualizing a single number:

There are a number of situations in which it can be advantageous to create a visualization to represent a single number:

  • To communicate with less numerate viewers/readers;

  • Infographics and dashboards commonly use one important number;

  • To attract the attention of distracted or busy viewers/readers;

  • To add some humanity or “color”, to create an emotional connection;

  • Or to increase the redundancy of the presentation (see Improve the Quality of Data Visualizations Using Redundancy).

To a great extent, my favorite is the first.  There are good cases for many of the others—primarily the shock value of the uncountable pictogram—but typically, the best visualization is simple.

Comments closed

Date Correlation Optimization

Monica Rathbun explains another quasi-hidden SQL Server configuration option:

According to MSDN – The DATE_CORRELATION_OPTIMIZATION database SET option improves the performance of queries that perform an equi-join between two tables whose date or datetime columns are correlated, and which specify a date restriction in the query predicate.

How many of you read what MSDN says and thinks “wuuuuuttt, English please”? I do.

Read on for the English translation.

Comments closed

Dealing With Noisy Neighbors

Kevin Kline explains what Resource Governor does:

There are lots of ways to manage noisy neighbors. For example, you could spin up additional instances of SQL Server on a single physical or virtual machine (VM), and then segregate the applications to a distinct instance. You could also follow the old adage of “one application, one SQL Server” by putting the SQL Server onto its own machine, either physical or virtual. But that can get very expensive very quickly, depending on your licensing methodology.

If you’re running SQL Server 2008 or later, you might want to investigate Resource Governor as an alternative. Resource Governor lets you create limits on the amount of system resources a database and application can consume. On versions 2008 to 2012, Resource Governor can explicitly limit CPU and memory and, starting with version 2014, limit I/O consumption as well. This is powerful medicine for multi-tenant instances with noisy neighbors!

My response to noisy neighbors is to turn my music up really loud as a passive-aggressive response.  Oh, wait, wrong kind of noisy neighbor…  H/T SentryOne

Comments closed

Creating Database Snapshots

David Fowler has a script which lets you create database snapshots easily:

The procedure takes two parameters,

@DatabaseList – a comma delimited string of database names, allows wildcards
@ListOnly – 1- a list of affected databases will be displayed but snapshots aren’t created.
0- Snapshots are created automatically DEFAULT

I’m a big fan of database snapshots in development and QA environments—take a snapshot, run a workload, revert the snapshot.

Comments closed

Azure Data Lake Analytics Pipelines

Yan Li notes that Azure Data Lake Analytics now offers the ability to manage pipelines:

To make it easier to manage and understand jobs, ADLA now captures the pipeline and recurrence information for each job. This information can be used to connect and organize jobs belonging to the same pipeline or recurring instances. As shown in Fig 2, now jobs are organized by pipeline and recurring instances which enable you to:

  • Quickly identify jobs in pipelines which may have failed or taken longer than expected.

  • Get the aggregated statistics (e.g. job counts, successful and failed AU hours etc.) for a pipeline or a recurring instance

This is an interesting improvement.

Comments closed

Integrating Azure Data Catalog With Power BI

Gaston Cruz shows how to tie view Azure Data Catalog data in Power BI:

A Self Service culture will allow to address analysts to generate their own reports, lists, and dashboards without dependence on the schedule and availability of IT staff. In these cases reports combine different sources of information are generated, many of which may not have been used historically in the company, and this in turn implies that a large number of cases which source you do not know used to implement certain reports.

Azure Data Catalog comes as an option to break that cycle of discovery that is usually done manually. This means that after the first cycle where the business analyst discovers the sources of optimal data to generate certain reports the can register, and add information (metadata) to make this source easier to discover future analysts requiring such data for the implementation of similar reports. The discovery of these sources, and capability to add metadata are procedures do not have to give at the same time but Data Catalog allows work annotations by analysts as a continuous work in time where more information is added to the repository every time.

Click through for a demo.

Comments closed