So You Want To Wait…

If you need your queries to be slower, Kenneth Fisher has you covered:

And in case you run into a development team that complains that when they time their code the duration is all over the place, this little gem will make sure their query will always take the same amount of time (assuming normal run time is under 90 seconds).

It’s the T-SQL equivalent of speed-up loops.

Adding Public Holidays To A Date Dimension

Reza Rad continues his series on Power BI date dimensions:

To get public holidays live, you first need an API that is giving you up-to-date information. There are some web pages that has the list of public holidays. I have already explained in another blog post how to use a web page and query public holidays from there. That method uses custom functions as well, here you can read about that.

The method of reading data from a web page has an issue already; Web.Page function from Power Query is used to pull data from that page, and this function needs a gateway configuration to work. There is another function Xml.Document that can work even without the gateway. So because of this reason, we’ll use Xml.Document and get data from an API that provides the result set as XML.

WebCal.fi is a great free website with calendars for 36 countries which I do recommend for this example. This website, provides the calendars through XML format. There are other websites that give you the calendar details through a paid subscription. However, this website is a great free one which can be used for this example. WebCal.fi is created by User Point Inc.

This was an interesting approach to the problem, one I did not expect when first reading the article.  I figured it’d be some sort of date calculation script.

Creating A Simple Kafka Streams Application

Bill Bejeck has built a simple Kafka Streams application for us:

This blog post will quickly get you off the ground and show you how Kafka Streams works. We’re going to make a toy application that takes incoming messages and upper-cases the text of those messages, effectively yelling at anyone who reads the message. This application is called the yelling application.

Before diving into the code, let’s take a look at the processing topology you’ll assemble for this “yelling” application. We’ll build a processing graph topology, where each node in the graph has a particular function.

His entire application is 20 lines of code but it does function as a valid Kafka Streams app and works well as a demo.

Introduction To Bayesian Statistics

Kennie Nybo Pontoppidan has just completed a course on Bayesian statistics:

Last month I finished a four-week course on Bayesian statistics. I have always wondered why people deemed it hard, and why I heard that the computations quickly became complicated. The course wasn’t that hard, and it gave a nice introduction to prior/posterior distributions and I many cases also how to interpret the parameters in the prior distribution as extra data points.

An interesting aspect of Bayesian statistics is that it is a mathematically rigorous model, with no magic numbers such as the 5% threshold for p-values. And I like the way it naturally caters sequential hypothesis testing with where the sample size of each iteration is not fixed in advance. Instead data are evaluated and used to update the model as they are collected.

Check out Kennie’s explanation as well as the course.  I also went through Bayes’ Theorem not too long ago, which is a good introduction to the topic if you’re unfamiliar with Bayes’s Law.

How Kafka Is Tested

Colin McCabe walks us through the process of a change in Apache Kafka:

The Kafka community has a culture of deep and extensive code review that tries to proactively find correctness and performance issues. Code review is, of course, a pretty common practice in software engineering but it is often cursory check of style and high-level design. We’ve found a deeper investment of time in code review really pays off.

The failures in distributed systems often have to do with error conditions, often in combinations and states that can be difficult to trigger in a targeted test. There is simply no substitute for a deeply paranoid individual going through new code line-by-line and spending significant time trying to think of everything that could go wrong. This often helps to find the kind of rare problem that can be hard to trigger in a test.

Testing data processing engines is difficult, particularly distributed systems where things like network partitions and transient errors are hard to reproduce in a test environment.

Time-Varying Models

Lingrui Gan explains how to model for parameters whose effects change over time:

We can frame conversion prediction as a binary classification problem, with outcome “1” when the visitor converts, and outcome “0” when they do not. Suppose we build a model to predict conversion using site visitor features. Some examples of relevant features are: time of day, geographical features based on a visitor’s IP address, their device type, such as “iPhone”, and features extracted from paid ads the visitor interacted with online.

A static classification model, such as logistic regression, assumes the influence of all features is stable over time, in other words, the coefficients in the model are constants. For many applications, this assumption is reasonable—we wouldn’t expect huge variations in the effect of a visitor’s device type. In other situations, we may want to allow for coefficients that change over time—as we better optimize our paid ad channel, we expect features extracted from ad interactions to be more influential in our prediction model.

Read on for more.

Testing Backups With dbatools

Constantine Kokkinos shows off a dbatools cmdlet to test the last full backup:

This:

  • Defines a list of two servers (PowerShell lists are as easy as “”,””)

  • Pipes them to the Test-DbaLastBackup command.

  • Which then:

    • Gathers information about the last full backups for all of your databases on that instance.

    • Restores the backups to the Destination with a new name. If no Destination is specified, the originating server will be used.

    • The database is restored as “dbatools-testrestore-$databaseName” by default, which you can change with the -Prefix parameter.

    • The internal file names are also renamed to prevent conflicts with original database.

    • A DBCC CHECKTABLE is then performed.

    • And the test database is finally dropped.

Pretty snazzy.

Names Matter

Chris Webb is concerned about the under-use of the name “Power Query”:

In more recent times I’ve written posts with unwieldy names like “Introduction to Insert Topic Name Here in Power Query/Power BI/Excel 2016 Get & Transform” and in the future I suppose this will have to grow to “Introduction to Insert Topic Name Here in Power Query/Power BI/Excel 2016 Get & Transform/Analysis Services Data Loading/Common Data Service”. Tagging and categorising blog posts can help here, I know, but it’s the title of a blog post that’s the main determining factor as to whether it gets read or not when someone is looking at a list of search results. It’s getting ridiculous, but how else can I ensure that someone searching for the solution to a data loading problem in Excel 2016 Get & Transform will find a post I’ve written that contains the answer but shown in Power BI?

Commenters agree that there’s a lot of ambiguity and multiple naming, and that this hurts end users who don’t necessarily know that all of these technologies are really the same Power Query engine.

Scripting Multiple SQL Agent Jobs With SSMS

Steve Stedman shows how to script multiple SQL Agent jobs from SQL Server Management Studio:

While doing a recent server move, I came across the need to script all agent jobs and save them to a file so that if we needed to reference what was there after the move we could get at the agent jobs without having to restart the old SQL Server.

It made me realize that sometimes what you want to do is available in SQL Server Management Studio, but that perhaps it is not always entirely obvious.

Click through for the demo.  I’d probably use Powershell over this, but if you just need to do a one-time job move, this gets you going quickly.

Using Biml With Oracle

Shannon Lowder shows us that you can write Biml to connect to an Oracle database as well:

Next, I wanted to build a package from Biml. A package that uses a connection to my Oracle instance.  Defining connections to Oracle in Biml is similar to every other connection you’ve ever defined in Biml (lines 3-12). The difference here is the Provider. (line 10)  I took the Provider attribute for my Oracle connection from the manual package I created in my previous test.

Simply right click on your Oracle connection in the package and choose properties.  Then look at the Qualifier value. I take the whole value and copy and paste it into the Provider attribute in my BimlScript.

Once you have that defined, set up a Package with our two connections, and a data flow (lines 14 – 33).  This script builds a simple data flow to move data from a single table on Oracle over to SQL Server.  The only difference is, this time, the package is named Automatic, rather than Manual.

You could get as complex as you’d like in your example.

As far as Integration Services goes, Oracle is just another source or sink and as long as you have an appropriate driver, the flows work the same way as with any other database.

Categories

September 2017
MTWTFSS
« Aug  
 123
45678910
11121314151617
18192021222324
252627282930