Author: Kevin Feasel

A Data Governance by any other Name

Matthew Roche wants a re-naming:

To successfully implement managed self-service business intelligence at any non-trivial scale, you need data governance. To build and nurture a successful data culture, data governance is an essential part of the success.

Despite this fact, and despite the obvious value that data governance can provide, data governance has a bad reputation. Many people – likely including the leaders you need to be your ally if you’re working to build a data culture in your organization – have had negative experiences with data governance in the past, and now react negatively when the topic of data governance is raised.

They now treat data governance as a four-letter word.

Read the whole thing, though I do disagree with Matthew. Changing the name does not change the underlying problems; all it does is make the new name just as hated as the old one because the problems are still there. Call it Data Enablement if you’d like, but if the process is the same and the tools are the same, the outcome is the same, regardless of the name.

An Introduction to BugLab

Miltos Allamanis and Marc Brockschmidt take us through a new paper:

Finding and fixing bugs in code is a time-consuming, and often frustrating, part of everyday work for software developers. Can deep learning address this problem and help developers deliver better software, faster? In a new paper, Self-Supervised Bug Detection and Repair, presented at the 2021 Conference on Neural Information Processing Systems (NeurIPS 2021), we show a promising deep learning model, which we call BugLab. BugLab can be taught to detect and fix bugs, without using labelled data, through a “hide and seek” game.

I think there’s a lot more research required before we get to the point where this is useful in practical circumstances, but it’s exciting to see.

Pattern Matching in Scala

Kuldeepak Gupta shows off pattern matching in Scala:

Pattern Matching is a mechanism of checking a value against a Pattern. It gives a way of checking the given sequence of tokens for the presence of a specific pattern. Here, we match expressions against a pattern.

Compared to the ‘switch’ in C++, C, and Java, there’s no fall-through to the next alternative in Scala pattern matching. A MatchError is thrown when no pattern matches.

This is a powerful part of functional programming.
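
To make the two points in the excerpt concrete, here is a minimal sketch (not taken from the linked post). The object name `PatternMatchingSketch` and the helpers `describe` and `describeOrError` are hypothetical, chosen only for illustration. It shows that only the first matching case runs, and that a value matching no case raises `scala.MatchError`.

```scala
object PatternMatchingSketch {
  // Hypothetical helper: checks a value against several patterns in order.
  // Only the first matching case runs; there is no fall-through to the next one.
  def describe(x: Any): String = x match {
    case 0               => "zero"
    case n: Int if n < 0 => "negative int"
    case n: Int          => "positive int"
    case s: String       => s"string of length ${s.length}"
    case head :: _       => s"list starting with $head"
  }

  // No case above covers a Double, so describe(2.5) throws scala.MatchError.
  // This wrapper catches it to make that behavior visible.
  def describeOrError(x: Any): String =
    try describe(x)
    catch { case _: MatchError => "no pattern matched" }
}
```

Adding a final `case _ => ...` wildcard is the usual way to avoid the `MatchError` when a default is acceptable.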

Understanding the Oldest Page Wait

Tom Collins explains a database wait:

SQL Server log truncation deletes inactive Virtual Log Files (VLFs) from the SQL Server database transaction log. The log truncation process frees space in the logical log for reuse by the physical transaction log. If no truncation occurs, the log will eventually fill all the disk space allocated to the physical log files.

SQL Server log truncation can be delayed for a range of different reasons. A good starting point is to query the log_reuse_wait and log_reuse_wait_desc columns of sys.databases. These columns report the reason for a delay.

Read on for more info about the OLDEST_PAGE wait.

Using GREATEST and LEAST in Azure SQL DB

Aaron Bertrand preps us for SQL Server 2022:

In an earlier tip, “Find MAX value from multiple columns in a SQL Server table,” Sergey Gigoyan showed us how to simulate GREATEST() and LEAST() functions, which are available in multiple database platforms but were – at least at the time – missing from Transact-SQL. These functions are now available in Azure SQL Database and Azure SQL Managed Instance, and will be coming in SQL Server 2022, so I thought it was a good time to revisit Sergey’s methods and compare.

Read on to see how the workaround compares.

Azure Synapse Analytics November Updates

James Serra keeps us up to date on Synapse:

Delta Lake support for serverless SQL is generally available: Azure Synapse has had preview-level support for serverless SQL pools querying the Delta Lake format. This enables BI and reporting tools to access data in Delta Lake format through standard T-SQL. With this latest update, the support is now Generally Available and can be used in production. See How to query Delta Lake files using serverless SQL pools

Click through for the full list of what James likes.

Deploying dbt on Databricks

Dave Eyler, et al., have a great announcement:

At Databricks, nothing makes us happier than making our users more productive, which is why we are delighted to announce a native adapter for dbt. It’s now easier than ever to develop robust data pipelines on Databricks using SQL.

dbt is a popular open source tool that lets a new breed of ‘analytics engineer’ build data pipelines using simple SQL. Everything is organized within directories, as plain text, making version control, deployment, and testability simple.

Click through for more information on how this works and how you can get the native adapter.

Lambda Expressions in Scala

Shubham Shrivastava explains how lambda expressions work in Scala:

The syntax for lambda expressions in Scala uses a symbol made of an equals sign and a greater-than sign, which we refer to as a rocket ( => ). A lambda expression is a short literal expression that defines a function, and typically these should not be overly long. For example, I could define a function for squaring values that looks something like this.
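
The code from the original post is not reproduced in this excerpt, so what follows is only a minimal sketch of what such a squaring lambda might look like. The names `LambdaSketch`, `square`, and `squares` are assumptions made for illustration.

```scala
object LambdaSketch {
  // A lambda (function literal): parameter, the "rocket" =>, then the body.
  // Binding it to a val gives it a name so it can be called later.
  val square: Int => Int = x => x * x

  // More often the lambda is passed inline to a higher-order function
  // such as map and never given a name at all.
  val squares: List[Int] = List(1, 2, 3).map(x => x * x)
}
```

Calling `LambdaSketch.square(4)` applies the function like any other, while the inline form in `squares` is the one-off usage that lambdas are best suited for.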

Lambda expressions are great in cases where you need to perform an operation exactly one time. If you create a separate function with its own name, there’s always a wonder in the back of a developer’s mind if this thing will get used again, and so it takes up a little bit of cognitive load. A lambda expression answers that conclusively: no, we won’t use this code again.
