Press "Enter" to skip to content

Author: Kevin Feasel

tapply() and Ragged Arrays in R

Benjamin Smith explains how tapply() works:

While I saw other programmers use this function, I found myself unsure of how it worked or when I would need to use it. In this blog I attempt to change that and explain the cryptic description by showing some applications, with my commentary, and how it compares to using the “tidy” approach with the tidyverse.

My inspiration for writing this blog came from Dr. Norm Matloff’s blog, where he mentions the use of tapply() and his thoughts on the tidyverse. For a more thorough treatment of his critique of the tidyverse and “tidy” methods, check out his formal essay here.

Read on to learn the benefits of using tapply().
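
For a quick sense of what it does, here is a minimal sketch with made-up data: tapply() applies a function to subsets of a vector defined by a grouping factor, and because those groups can have different sizes, the underlying structure is a “ragged” array.

# Hypothetical data: sales figures and the region each sale came from
sales  <- c(10, 20, 30, 5, 15)
region <- c("East", "West", "East", "West", "East")

# Apply mean() to the sales within each region; the groups have
# different sizes, which is what makes the array "ragged"
tapply(sales, region, mean)
#     East     West
# 18.33333 12.50000

# A rough tidyverse equivalent, for comparison:
# data.frame(sales, region) |>
#   dplyr::group_by(region) |>
#   dplyr::summarise(mean_sales = mean(sales))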

Case-Sensitive String Comparisons and Case-Insensitive Tables

Meagan Longoria reminds us that case sensitivity was a huge mistake:

Here’s the scenario: You are using Python, perhaps in Azure Databricks, to manipulate data before inserting it into a SQL Database. Your source data is a flattened data extract and you need to create a unique list of values for an entity found in the data. For example, you have a dataset containing sales for the last month and you want a list of the unique products that have been sold. Then you insert the unique product values into a SQL table with a unique constraint, but you encounter issues on the insert related to unique values.

Click through for an example and how to extricate yourself from this scenario. Python is certainly not the only language where this can happen, so it’s good to know even if you don’t plan on using or supporting Python.
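
To make the mismatch concrete, here is a hedged sketch with hypothetical product names: Python string comparison is case-sensitive, so deduplication keeps values that a case-insensitive SQL Server collation (and its unique constraint) treats as duplicates.

# Python's set() deduplication is case-sensitive: all four values survive
products = ["Widget", "WIDGET", "Gadget", "gadget"]
print(sorted(set(products)))   # ['Gadget', 'WIDGET', 'Widget', 'gadget']

# A unique constraint on a typical ..._CI_AS column sees only two distinct
# values here, so inserting all four rows fails. One workaround is to
# normalize case before deduplicating:
print(sorted({p.upper() for p in products}))   # ['GADGET', 'WIDGET']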

Querystrings and R Shiny

Thomas Williams passes along querystring data:

As background, a query string is part of a web page address. Query strings are used to pass information to web pages, in name/value pairs separated by an equals sign – for instance, user=Andrew or country=au. Name/value pairs are themselves separated by ampersands, so passing multiple values looks like user=Andrew&country=au.

Click through for an example of how it all works.
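
As a minimal sketch of the mechanics (not Thomas’s exact code), Shiny exposes the query string through the session object, and the built-in parseQueryString() turns it into a named list:

library(shiny)

ui <- fluidPage(textOutput("greeting"))

server <- function(input, output, session) {
  output$greeting <- renderText({
    # For a URL ending in ?user=Andrew&country=au, this returns
    # list(user = "Andrew", country = "au")
    query <- parseQueryString(session$clientData$url_search)
    paste0("Hello ", query$user, " from ", query$country)
  })
}

shinyApp(ui, server)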

Incremental Refresh Parameters in Power BI

Chris Webb continues a series on using Power Automate to call the Power BI Enhanced Refresh API:

If you’ve read part 1 or part 2 of this series you’ll know how you can create a Power Automate custom connector to call the Power BI Enhanced Refresh API and get fine-grained control over your refreshes. In this post I will take a look at the two parameters in the Enhanced Refresh API for datasets that use incremental refresh: applyRefreshPolicy and effectiveDate. They are documented here but, as always, some worked examples are helpful to really understand how they work.

Click through to see how these parameters work.
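
For context, both parameters go in the body of the refresh request: roughly speaking, applyRefreshPolicy controls whether the dataset’s incremental refresh policy is honored, while effectiveDate overrides the date used to calculate the rolling window. A rough sketch of the call, with a placeholder dataset ID and date (see the documentation Chris links for the exact semantics):

POST https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}/refreshes

{
  "type": "full",
  "applyRefreshPolicy": false,
  "effectiveDate": "12/31/2022"
}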

OPENJSON Performance and Schemas

Dave Mason has a new blog theme and a post on OPENJSON performance:

Support for JSON data has been around in SQL Server for a while now, starting with SQL 2016. The OPENJSON rowset function is the built-in function that allows you to natively convert JSON text into a set of rows and columns. There are two options for using OPENJSON: with the default schema or with an explicit schema. There are performance implications for each, which I’ll review with some examples.

Dave has some nice tips for people working with JSON data in SQL Server.
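
As a quick illustration of the two options (the sample JSON is made up):

DECLARE @json nvarchar(max) = N'[
    {"ProductID": 1, "Name": "Widget", "Price": 9.99},
    {"ProductID": 2, "Name": "Gadget", "Price": 14.50}
]';

-- Default schema: generic key/value/type columns, no WITH clause
SELECT [key], [value], [type]
FROM OPENJSON(@json);

-- Explicit schema: the WITH clause maps JSON properties to typed columns
SELECT ProductID, Name, Price
FROM OPENJSON(@json)
WITH (
    ProductID int,
    Name nvarchar(50),
    Price decimal(10, 2)
);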

The Mechanics of Transparent Data Encryption

Etienne Lopes takes us through the process of using Transparent Data Encryption:

Securing data has always been important, but as time goes by, more and more data is available all around us. Some of it is considered sensitive data, and it becomes a major concern to protect it somehow; in fact, in certain cases it is legally mandatory to comply with certain regulations (like GDPR). SQL Server offers a few options regarding data protection (either by means of encryption or obfuscation), TDE being one of them.

In this post I’ll explain what TDE is, along with its use cases, and I’ll use a thorough demo to show how to implement it in a database and how it works.

I do tend to give TDE disrespect (disrespect that I think it deserves), but it does allow you to check a compliance box without enormous cost. The problem is, I don’t think it moves the needle in terms of proper security once an attacker has admin status on the machine hosting SQL Server, and other techniques (e.g., encrypting backups, encrypting specific columns) are better at preventing security issues in other common data scenarios. I’m just not sure there’s a case where TDE helps and there isn’t already a better solution.
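
For reference, implementing TDE boils down to a handful of steps. A sketch with placeholder names, passwords, and paths (Etienne’s demo is the thorough version):

USE master;

-- 1. Create a master key in master, protected by a password
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- 2. Create a certificate to protect the database encryption key
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';

-- 3. Create the database encryption key in the user database
USE MyDatabase;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TDECert;

-- 4. Turn encryption on
ALTER DATABASE MyDatabase SET ENCRYPTION ON;

-- 5. Back up the certificate and private key immediately; without them,
--    encrypted backups cannot be restored on another server
USE master;
BACKUP CERTIFICATE TDECert
    TO FILE = 'C:\Backups\TDECert.cer'
    WITH PRIVATE KEY (
        FILE = 'C:\Backups\TDECert.pvk',
        ENCRYPTION BY PASSWORD = '<another strong password>'
    );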

Continuing Arc-Enabled Data Services

Warwick Rudd continues a series on Azure Arc-Enabled Data Services. Part 5 takes us through what you can do with the Azure CLI:

In our previous post, we touched on the deployment of the Data Controller and being able to deploy via the Portal, Azure Data Studio, or CLI commands depending on whether you are implementing a directly or indirectly connected Data Controller.

az arcdata is a suite of CLI commands that allows command-line management of the Data Controller and the Arc-enabled SQL Managed Instance once we have it configured.

Part 6 details the services available today:

Azure data services such as Azure SQL Managed Instance and Azure PostgreSQL are fully managed by Microsoft in the Azure Cloud. They provide you with evergreen environments because they are managed by Microsoft and always have the latest patches and feature offerings, while also providing you the ability to quickly and easily scale on demand based on the workload or requirements.

I do expect this set to grow over time.
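
For a flavor of the CLI side, a hedged sketch (the namespace is a placeholder, and the exact flags depend on whether your Data Controller is directly or indirectly connected):

# Install the arcdata extension into the Azure CLI
az extension add --name arcdata

# Check the status of an indirectly connected data controller
az arcdata dc status show --k8s-namespace arc --use-k8s

# List Arc-enabled SQL Managed Instances in the same namespace
az sql mi-arc list --k8s-namespace arc --use-k8s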

Bulk Insert into Azure SQL DB using Python

Jose Manuel Jurado Diaz shares some customer notes:

Today, I’ve been working on a service request where our customer wants to improve the performance of a bulk insert process. In the following, I would like to share my experience working on that.

Our customer mentioned that inserting data (100,000 rows) takes 14 seconds in a Business Critical database. I was able to reproduce this time using a single thread and a table with 20 columns.

A lot of this advice also applies to on-premises SQL Server: use bulk inserts and pick good batch sizes. It’s similar advice to what we’d follow with SQL Server Integration Services or any other ETL/ELT process, tailored to Python.
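
As a rough illustration of that advice (hypothetical table and connection details, not the code from the post), pyodbc’s fast_executemany combined with sensible batch sizes covers the basics:

import pyodbc

# Placeholder connection string for an Azure SQL Database
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:yourserver.database.windows.net,1433;"
    "Database=yourdb;Uid=youruser;Pwd=<password>;Encrypt=yes;"
)
cursor = conn.cursor()

# fast_executemany sends parameter arrays in bulk rather than row by row
cursor.fast_executemany = True

rows = [(i, f"Product {i}") for i in range(100_000)]

# Commit in batches; the right batch size is workload-dependent
batch_size = 10_000
sql = "INSERT INTO dbo.Products (ProductID, ProductName) VALUES (?, ?)"
for start in range(0, len(rows), batch_size):
    cursor.executemany(sql, rows[start:start + batch_size])
    conn.commit()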

Paved a Repo and Put up a Parking Lot

Robert Harris warns against the desire to start it all over:

We’re programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand…It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time.

Joel Spolsky, in Things You Should Never Do, Part I

There is a fleeting moment in every software project when it is absolutely perfect. It is the time between clicking “New” and “Save” in your code editor. In that brief interval, limitless potential and beauty. In every moment that follows, compromise and doubt (but working software, too!).

There are a few threads to unravel here.

First, Chesterton’s fence: if you don’t know why a thing is there, you are probably not the right person to decide to remove it. If you understand why the code is there and exactly what it is doing, then you become qualified to decide what, if anything, needs to be changed.

Second, ego: I’m a great developer. The best developer I know. Heck, maybe the best developer in the world. Therefore, if I don’t immediately understand code, it must be because that code is bad. Most of us don’t think explicitly in these terms, but we still end up at the conclusion of “if I don’t immediately understand the code, it is bad.” Or even worse, “if the code does not work exactly the way I would have it work, it is bad.”

Third, unstated or misunderstood business requirements. Code often starts to get nasty because the business requirements changed out from under the original designers, or because the business itself evolved. If business requirements are still evolving, what makes you think you’re going to write code that won’t be just as ugly? If business requirements are not still evolving and you really understand the code, you have a chance. But that leads me to the next bit.

Fourth, the value of reformation: refactoring is the common path here. Having lots of tests increases the safety net we have for reformation, as those tests are likely to catch some of the dumb mistakes we make and hopefully suss out some of the worst problems.

Fifth, JavaScript is a hole of pain.
