Day: August 16, 2018

Last-Click Attribution With Databricks Delta

Caryl Yuhas and Denny Lee give us an example of building a last-click digital marketing attribution model with Databricks Delta:

The first thing we will need to do is to establish the impression and conversion data streams. The impression data stream provides us a real-time view of the attributes associated with those customers who were served the digital ad (impression) while the conversion stream denotes customers who have performed an action (e.g. clicked the ad, purchased an item, etc.) based on that ad.

With Structured Streaming in Databricks, you can quickly plug into the stream as Databricks supports direct connectivity to Kafka (Apache Kafka, Apache Kafka on AWS, Apache Kafka on HDInsight) and Kinesis as noted in the following code snippet (this is for impressions, repeat this step for conversions).

This is definitely an interesting approach to the problem.  Check it out.

Bayesian Neural Networks

Yoel Zeldes thinks about neural networks from a different perspective:

The term logP(w), which represents our prior, acts as a regularization term. Choosing a Gaussian distribution with mean 0 as the prior, you’ll get the mathematical equivalence of L2 regularization.
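The equivalence mentioned above can be checked numerically. This is a minimal sketch rather than anything from the linked post: it assumes independent zero-mean Gaussian priors with a shared standard deviation `sigma`, and the function names and sample weights are hypothetical. Under that assumption the negative log prior equals an L2 penalty with strength `1 / (2 * sigma**2)` plus a constant that does not depend on the weights, so differences between weight vectors should match exactly.

```python
import math


def gaussian_log_prior(weights, sigma):
    """Log density of independent zero-mean Gaussians with std dev sigma."""
    d = len(weights)
    sq_norm = sum(w * w for w in weights)
    normalizer = 0.5 * d * math.log(2 * math.pi * sigma ** 2)
    return -sq_norm / (2 * sigma ** 2) - normalizer


def l2_penalty(weights, lam):
    """Standard L2 regularization term: lam * ||w||^2."""
    return lam * sum(w * w for w in weights)


# Hypothetical values chosen only for illustration.
sigma = 2.0
lam = 1.0 / (2 * sigma ** 2)  # penalty strength implied by the prior
w_a = [0.5, -1.0, 2.0]
w_b = [1.5, 0.0, -0.25]

# The normalizing constant cancels when comparing two weight vectors,
# so the change in negative log prior equals the change in L2 penalty.
diff_neg_log_prior = gaussian_log_prior(w_b, sigma) - gaussian_log_prior(w_a, sigma)
diff_l2 = l2_penalty(w_a, lam) - l2_penalty(w_b, lam)

print(diff_neg_log_prior, diff_l2)
```

A narrower prior (smaller `sigma`) corresponds to a larger `lam`, which is to say stronger regularization.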

Now that we start thinking about neural networks as probabilistic creatures, we can let the fun begin. For a start, who says we have to output one set of weights at the end of the training process? What if instead of learning the model’s weights, we learn a distribution over the weights? This will allow us to estimate uncertainty over the weights. So how do we do that?

It’s an interesting approach to the problem.

Microsoft R Open 3.5.1

David Smith announces Microsoft R Open 3.5.1:

Microsoft R Open 3.5.1 has been released, combining the latest R language engine with multi-processor performance and tools for managing R packages reproducibly. You can download Microsoft R Open 3.5.1 for Windows, Mac and Linux from MRAN now. Microsoft R Open is 100% compatible with all R scripts and packages, and works with all your favorite R interfaces and development environments.

This update brings a number of minor fixes to the R language engine from the R core team. It also makes available a host of new R packages contributed by the community, including packages for downloading financial data, connecting with analytics systems, applying machine learning algorithms and statistical models, and many more. New R packages are released every day, and you can access packages released after the 1 August 2018 CRAN snapshot used by MRO 3.5.1 using the checkpoint package.

Read on for more and check out the updates.

Dealing With CheckDB Error Message 824 Level 24

Steve Stedman has a post on fixing a database which has experienced an incorrect pageid error:

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:2806320; actual 0:0).  It occurred during a read of page (1:xxxxx) in database ID 5 at offset 0x00000xxxxx0000 in file ‘C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\YourDatabaseName.mdf’.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Since this is one of those things that I regularly work with, I thought I would see what other people are saying about this error message, and boy oh boy did I find some crazy and outright damaging suggestions.

Steve puts together a bunch of really bad advice and explains why you shouldn’t follow it.  Read the whole thing and listen to Steve’s advice, not the bad advice.

Getting An Accurate Query Execution Time

Grant Fritchey shares some tips on accurate query time estimation:

Before we get into all the choices and compare them, let’s baseline on methodology and a query to use.

Not sure why, but many people give me blowback when I say “on average, this query runs in X amount of time.” The feedback goes “You can’t say that. What if it was just blocking or resources or…” I get it. Running a query one time, changing something, running that query again, and declaring the problem solved is not what I’m suggesting. Notice the key word and trick phrase “on average.” I don’t run the query once. I run it several times, capture them all, then get the average of the durations.
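The "run it several times and average" idea can be sketched outside of SQL Server. The following is a generic illustration, not one of the measurement techniques from the linked post: `average_duration` and `fake_query` are hypothetical names, and `fake_query` is a stand-in for whatever call actually executes the query from client code.

```python
import statistics
import time


def average_duration(run_query, runs=10):
    """Run a callable several times; return the mean duration and all samples."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), durations


def fake_query():
    # Stand-in workload; replace with the real query call in practice.
    sum(i * i for i in range(10_000))


mean_seconds, samples = average_duration(fake_query, runs=5)
print(f"mean over {len(samples)} runs: {mean_seconds:.6f}s")
```

Keeping the individual samples alongside the mean makes it easier to spot a single outlier run (blocking, a cold cache) that would otherwise skew the average.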

The observer effect is in full force with a couple of the techniques Grant shows, but the rest are generally stable, which is a good thing.

Performing Linear Regression With Power BI

Jason Cantrell shows how to create a simple linear regression in Power BI:

Linear Regression is a very useful statistical tool that helps us understand the relationship between variables and the effects they have on each other. It can be used across many industries in a variety of ways – from spurring value to gaining customer insight – to benefit business.

The Simple Linear Regression model allows us to summarize and examine relationships between two variables. It uses a single independent variable and a single dependent variable and finds a linear function that predicts the dependent variable values as a function of the independent variable.
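For reference, the single-variable case described above has a closed-form least-squares solution. This is a minimal sketch using made-up data, not the Power BI approach from the linked post; the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
```

With noisy real data the fitted line will not pass through every point, and a statistics library would also report fit quality measures that this sketch omits.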

If you want real linear regression, drop in an R or Python script.

When MS_SSISServerCleanupJobLogin Is Orphaned

Sreekanth Bandarla noticed a problem in cleaning up SSIS metadata:

A couple of weeks ago I was analyzing a server for space and noticed the SSISDB database was abnormally huge (this instance was running just a handful of packages). A couple of internal schema tables in SSISDB were huge (with some hundreds of millions of rows). Well, that’s not right. There should be an SSIS Server maintenance job which SQL Server creates to purge older entries based on the retention settings, right? My immediate action was to check the retention period and the status of the job. As I suspected, the job was failing (it looks like it had been failing for ages) with the error below.

The job failed.  The Job was invoked by Schedule 9 (SSISDB Scheduler).  The last step to run was step 1 (SSIS Server Operation Records Maintenance).
Execute as Login failed for the requested login ‘##MS_SSISServerCleanupJobLogin##’

Read on for the root cause and solution.

Rant: NoSQL Isn’t A Thing

Grant Fritchey is in rare form today:

Go and search through it for a NoSQL data management system. I’ll wait.

You found one that was named NoSQL, didn’t you? Oracle NoSQL, because Oracle. Of course. However, under the Database Model what did it say? Document Store. Why?

BECAUSE NOSQL IS NOT A DATA STORAGE ENGINE!

There is not a NoSQL thing to use. You can’t compare NoSQL to anything because NoSQL is just a different screed, a rant, a tantrum. A bunch of influential developers threw themselves on the floor, screaming and kicking, spittle flying everywhere, because they didn’t want to eat their broccoli.

Wait, that was my kids when they were three.

As the voice of reason (and you know you’re in trouble when that’s the case!), non-relational engines take up 6 of the top 15 slots.  There are particular advantages to non-relational data stores—as Grant willingly notes—and as companies grow, specialized data storage can become quite useful.  But relational databases are a great starting point and for good reason:  they solve a subset of problems extremely well and solve most problems well enough.  This is probably also a good place to drop in a reference to Feasel’s Law.

Default Parameter Values In PowerShell

Chrissy LeMaire explains what $PSDefaultParameterValues is and places where she finds it useful:

$PSDefaultParameterValues is a hashtable available in PowerShell that can set defaults for any command that you run. In its simplest form, setting a default parameter value can look like this:

After running the above code, Get-DbaDatabase will show verbose output every time it’s executed, without me having to specify -Verbose. If I need to override that verbose flag for some reason, I can simply add -Verbose:$false to my Get-DbaDatabase command.

Read on for plenty of good use cases and additional resources from sharp people.
