Press "Enter" to skip to content

Month: January 2020

Databricks Automated Deployment and Testing

Li Yu, et al, explain how to use Databricks notebooks and MLflow to automate deployment and testing of Spark solutions:

Today many data science (DS) organizations are accelerating the agile analytics development process using Databricks notebooks.  Fully leveraging the distributed computing power of Apache Spark™, these organizations are able to interact easily with data at multi-terabytes scale, from exploration to fast prototype and all the way to productionize sophisticated machine learning (ML) models.  As fast iteration is achieved at high velocity, what has become increasingly evident is that it is non-trivial to manage the DS life cycle for efficiency, reproducibility, and high-quality. The challenge multiplies in large enterprises where data volume grows exponentially, the expectation of ROI is high on getting business value from data, and cross-functional collaborations are common.

In this blog, we introduce a joint work with Iterable that hardens the DS process with best practices from software development.  This approach automates building, testing, and deployment of DS workflow from inside Databricks notebooks and integrates fully with MLflow and Databricks CLI. It enables proper version control and comprehensive logging of important metrics, including functional and integration tests, model performance metrics, and data lineage. All of these are achieved without the need to maintain a separate build server.

Read on to see how.

Leave a Comment

Querying SQL Server Replicas Under Load

Taryn Pratt points out that replicas of data can contain stale data:

Last week at Stack Overflow we had an internal hack-a-thon, or as we call it, a make-a-thon. I was on the bug-bashing team, which is the team that attempts to fix smallish bugs we haven’t gotten around to fixing, due to other time-constraints. I was tagged to investigate a bug about duplicate badges being awarded because it looked to possibly be an easy fix in SQL. At first glance it looked simple enough, but once I started digging in, I figured out very quickly it wouldn’t be.

It’s an interesting problem, but no solutions in the post. It’s a hard problem.

Leave a Comment

Managing Systems with Azure Arc

Robert Smit takes us through Azure Arc:

This Blog post is about Azure Arc, how to set this up and get you started with Azure Arc. For customers who want to simplify complex and distributed environments across on-premises, edge and multi cloud, Azure Arc enables deployment of Azure services anywhere and extends Azure management to any infrastructure.

So Azure Arc is not a replacement for the old Azure Server manager tools! So no remote RDP or open MMC only log analytics, policy’s, CLI etc. https://robertsmit.wordpress.com/2016/08/25/azure-server-management-tools-manage-your-servers-from-anywhere-servermgmt-azure-smt/

Click through for a demonstration.

Leave a Comment

Finding SSRS Log Files

Rudy Rodarte explains where you can find SQL Server Reporting Services log files:

After many years working with SQL Server Reporting Services, I’ve gotten myself into a few situations where I had to examine the SSRS log to see what was happening with data sources, subscriptions, and other, unique SSRS particulars. For a time, I thought that the regular SQL Server Error Log was enough. But experience has taught me that there is much more information inside of the SSRS log file than the SQL Server Agent and Service logs. But, you must know where to look to find the SSRS Log file.

Those logs often contain good information.

Leave a Comment

Transactions Rolling Back During CHECKDB

Paul Randal explains a long-standing bug in CHECKDB:

Continuing the database snapshot theme from the previous Curious Case post, I had another question from someone who was concerned about transactions rolling back during DBCC CHECKDB. They’d just noticed the messages in the error log saying that when DBCC CHECKDB was executed, it was causing transactions to roll back in the database – and how could that possibly be allowed to happen? They said they panicked and stopped all DBCC CHECKDB executions.

There’s no need to panic. The problem is actually a bug in the database snapshot code that’s been there since SQL Server 2005, where it reports the wrong database name.

Read on for a demo; it’s not a dangerous problem.

Leave a Comment

Memory Optimizer Advisor

Monica Rathbun takes us through the Memory Optimization Advisor in SQL Server Management Studio:

Previously I wrote about In-Memory Optimized Tables, in this blog I am going to discuss how to determine which tables could benefit from being In-Memory by using a tool called Memory Optimization Advisor (MOA). This a is a tool built into SQL Server Management Studio (SSMS) that will inform you of which tables could benefit  using In Memory OLTP capabilities and which may have non supported features. Once identified, MOA will help you to actually migrate that table and data to be optimized. Let’s see how it works by walking through it using a table I use for demonstrations in AdventureWorks2016CTP3. Since this is a smaller table and doesn’t incur a ton writes it is not a good use case, however, for simplicity I am using it for this demo.

This is good for seeing which tables could easily move to memory-optimized and which you shouldn’t even try.

Leave a Comment

Solving the Spelling Bee Honeycomb Puzzle

David Robinson has fun with puzzle-solving:

Solving this puzzle in R is interesting enough, but it’s particularly challenging to do so in a computationally efficient way. As much as I love the tidyverse, this, like the “lost boarding pass” puzzle and Emily Robinson’s evaluation of the best Pokémon team, serves as a great example of using R’s matrix operations to work efficiently with data.

I’ve done a lot of puzzles recently, and I realized that showing the end result isn’t a representation of my thought process. I don’t show all the dead ends and bugs, or explain why I ended up choosing a particular path. So in the same spirit as my Tidy Tuesday screencasts, I recorded myself solving this puzzle (though not the process of turning it into a blog post).

Most of the post is analysis around the problem, but you do get a viable solution as well.

Leave a Comment

Cluster-Based Image Analysis and Reduction

Sebastian Sauer takes an image and reduces it to a group of colors:

This post is a remake of this casestudy: https://fallstudien.netlify.com/fallstudie_bildanalyse/bildanalyse

brought to you by Karsten Lübke.

The main purpose is to replace the base R command that Karsten used with a more tidyverse-friendly style. I think that’s easier (for me).

We will compute a cluster analysis to find the typical RGB color per cluster.

Click through for quite a bit of R code and a couple interesting turns.

Leave a Comment

Problems Installing SQL Server from a Mounted ISO

Randolph West recommends against installing SQL Server from a mounted ISO file:

In the old days, software used to be distributed on physical media like diskette, CD, and finally DVD. Nowadays we download and install software as standalone files. Even enterprise software like SQL Server is available for download in this way, usually distributed as an ISO file. Once you’ve downloaded it, you can mount that ISO file as a virtual drive and access it as though it was on physical media.

Or you can do the right and proper thing, and extract the ISO file to a network drive or local storage first, using a tool like 7-Zip.

I don’t think I’ve ever seen the problems Randolph refers to, though I’ll readily admit that Randolph has quite a few more installations done than I.

Leave a Comment

Understanding Implicit Conversions

Jared Poche takes us through the concept of implicit conversions:

A quick search will tell you that implicit conversions are pretty awful for performance, and in particular drive CPU usage. That’s not news. There is an aspect of this I think a lot of engineers don’t understand; why does it cause performance issues?

An implicit conversion occurs when you try to compare between or assign data across two different data types, without converting one yourself. If you used a CAST or CONVERT, it would be an explicit conversion. When SQL Server does it for you, it’s an implicit conversion, and these can have a real impact on your execution plans. Not all combinations of types can be converted implicitly, for a full list look here.

What’s interesting in this is that there were some cases where Jared expected implicit conversion and the rules indicate that there should be implicit conversion, but the database optimizer saw through his ruse.

Leave a Comment