Press "Enter" to skip to content

Month: March 2024

Avoid the securityadmin Role

Etienne Lopes recommends against a particular SQL Server role:

I usually avoid using the two “radical” words: “never” and “always” but regarding the membership need for the “securityadmin” server role since SQL 2005/2008 onward, I find it hard to come up with a good reason why it should ever be used, especially considering the security risks involved. A few weeks ago, while checking permissions for some logins in a (critical) SQL Server 2019 instance, I came across some really worrying situations, among which I found this one… again! Although it wasn’t the worst I found there (the worst was too bad to even mention here), I felt impelled to write about this one, maybe because the risks may not be so obvious or are somewhat concealed… Well, let’s bring them to light!

Considering that securityadmin can increase its own rights to sysadmin, you might as well just grant that login sysadmin.
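
To make the escalation path concrete, here is a minimal T-SQL sketch (the login name and password are purely illustrative): a securityadmin member can manage logins and grant server-level permissions, so it can create a login and hand it CONTROL SERVER, which is effectively full control of the instance.

-- Run as a login whose only server role is securityadmin
CREATE LOGIN PrivEscDemo WITH PASSWORD = 'Use-A-Str0ng-Passw0rd!';  -- allowed: securityadmin can manage logins
GRANT CONTROL SERVER TO PrivEscDemo;                                -- allowed: securityadmin can grant server-level permissions
-- PrivEscDemo now holds CONTROL SERVER, which is effectively sysadmin-level access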

Comments closed

MySQL: INTO OUTFILE and INTO DUMPFILE

Chad Callihan makes a comparison:

I haven’t had a MySQL post for a while, so it’s time to add some variety to the blog.

There are a couple of different ways to export data with a SELECT query in MySQL: INTO OUTFILE and INTO DUMPFILE. Let’s use the MySQL Sakila sample database and walk through some examples to compare these two options.

Read on to see when you might want to use each of these.
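
As a rough sketch of the difference (the file paths are illustrative and must fall under the server's secure_file_priv directory): INTO OUTFILE writes a delimited, multi-row text file, while INTO DUMPFILE writes exactly one row verbatim, with no delimiters or escaping, which makes it handy for extracting a BLOB as a file.

-- INTO OUTFILE: delimited, multi-row export
SELECT actor_id, first_name, last_name
FROM sakila.actor
INTO OUTFILE '/var/lib/mysql-files/actors.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

-- INTO DUMPFILE: a single row written as-is (useful for binary columns)
SELECT picture
FROM sakila.staff
WHERE staff_id = 1
INTO DUMPFILE '/var/lib/mysql-files/staff_photo.jpg';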

Comments closed

Restoring a Tablespace using Barman on Windows

Semab Tariq restores a database:

I recently had the opportunity to contribute to a customer project, where the objective was to establish a system for PostgreSQL full backups and seamless restoration. Considering Barman’s successful functionality on Linux, we decided to explore its compatibility with Windows. Secondly, no other tool claims to work on Windows to take backups and perform a restore.

From the official documentation, it is mentioned that:
Backup of a PostgreSQL server on Windows is possible, but it is still experimental because it is not yet part of our continuous integration system.

Click through for the walkthrough.

Comments closed

Removing Skew in Data with Python

Vinod Chugani kicks the lop-sided distribution to straighten it out:

Data transformations enable data scientists to refine, normalize, and standardize raw data into a format ripe for analysis. These transformations are not merely procedural steps; they are essential in mitigating biases, handling skewed distributions, and enhancing the robustness of statistical models. This post will primarily focus on how to address skewed data. By focusing on the ‘SalePrice’ and ‘YearBuilt’ attributes from the Ames housing dataset, we will provide examples of positive and negative skewed data and illustrate ways to normalize their distributions using transformations.

Read on to see what kinds of transformations are available.
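
As one common approach (a sketch only, assuming the Ames data is already available locally as a CSV; it may not match the exact transformations in the post), a log transform tames positive skew, while reflecting the values first handles negative skew:

import numpy as np
import pandas as pd
from scipy.stats import skew

df = pd.read_csv("Ames.csv")  # hypothetical local copy of the Ames housing dataset

# SalePrice is positively skewed (long right tail): log-transform it
df["SalePrice_log"] = np.log1p(df["SalePrice"])

# YearBuilt is negatively skewed (long left tail): reflect the values, then log-transform
df["YearBuilt_refl_log"] = np.log1p(df["YearBuilt"].max() + 1 - df["YearBuilt"])

print(skew(df["SalePrice"]), "->", skew(df["SalePrice_log"]))
print(skew(df["YearBuilt"]), "->", skew(df["YearBuilt_refl_log"]))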

Comments closed

Data Management with Open Table Formats

Anandaganesh Balakrishnan covers a few open-source products and formats:

Apache Iceberg is an open-source table format designed for large-scale data lakes, aiming to improve data reliability, performance, and scalability. Its architecture introduces several key components and concepts that address the challenges commonly associated with big data processing and analytics, such as managing large datasets, schema evolution, efficient querying, and ensuring transactional integrity. Here’s a deep dive into the core components and architectural design of Apache Iceberg:

Click through for a review of Iceberg, Hudi, and the Delta Lake format.

Comments closed

Choosing a Data Platform Technology

Jenn Junod builds a flow chart:

From a structural perspective you can think of data as existing on a continuum or gradient: on the one side, there’s highly structured data, like customer profiles. Each customer has a known name, address, phone number, and we can expect those to fall within certain formatting constraints. For example, a phone number will always be a set of numbers and special characters. On the other side, there’s unstructured data: the contents of an email, for example, are unstructured.

The shape of the data should define what kind of software you use to store and manipulate it.

Read on for several questions along these lines and recommendations based on your answers.

Comments closed

Configuring a New PowerShell Installation

Jeff Hicks starts over:

The other day on X, I was asked about what things I would set up or configure on a new PowerShell installation. This is something I actually have thought about and face all the time when I set up a new demo virtual machine. I had been meaning to build new tooling to meet this challenge, and the question provided the spark I needed to get off my butt and get it done.

Read on for some thoughts on the topic, as well as a lengthy script to get you going.
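
As a taste of what such a script automates, here is a minimal sketch of a fresh-install setup; the module choices and profile settings are common examples, not necessarily the ones Jeff uses:

# Install a couple of quality-of-life modules for the current user
Install-Module PSReadLine -Scope CurrentUser -Force
Install-Module Terminal-Icons -Scope CurrentUser -Force

# Make sure a profile exists, then persist preferred settings in it
if (-not (Test-Path $PROFILE)) {
    New-Item -Path $PROFILE -ItemType File -Force | Out-Null
}
Add-Content -Path $PROFILE -Value @'
Set-PSReadLineOption -PredictionSource History
Import-Module Terminal-Icons
'@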

Comments closed

That the Division of Labor is Limited by the Extent of the Market

Eitan Blumin lets me quote one of my favorite chapter headings from Adam Smith:

You see, in this past decade or so, we’ve all noticed a trend where more and more businesses are looking for DBAs who know a bit about everything: RDBMS, NoSQL, Key-Value databases, Cache Databases, Full-Text Search Databases, Vector Databases, and more. In response, more and more data professionals prefer to become multi-disciplinary, doing a bit of everything… Even if it means becoming a part-time Data Engineer / Data Analyst / Data Scientist / DevOps / whatever. Otherwise, they fear that they won’t be able to find enough job opportunities.

But, there’s a cost to that approach. Sometimes a very high cost.

Eitan argues in favor of an approach with part-time experts and full-time jacks-of-many-trades. For smaller companies, I completely agree: a software development company with 5 total employees usually can’t afford a full-time DBA or networking specialist or virtualization specialist, but they can still run into problems that an expert would easily be able to solve. But I think once you get past a couple dozen IT people at a company, there’s enough scope for some critical specialization. In other words, the market within a company grows (hush, people who are going to throw Ronald Coase at me for that statement—it’s metaphorical!) and allows for additional division of labor and specialization.

Anyhow, Eitan has some tips around determining whether you’d rather be a jack-of-all-trades or a master of something.

Comments closed

Renaming Factor Levels in R

Steven Sanderson renames factor levels of a categorical variable:

Before we jump into renaming factor levels, let’s quickly recap what factors are and why they’re useful. Factors are used to represent categorical data in R. They store both the values of the categorical variables and their corresponding levels. Each level represents a unique category within the variable.

Click through for three methods you can use to pull this off.
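
For instance, one base-R approach (a small illustrative example, not necessarily one of the three methods in the post) is to assign a new character vector to levels(), keeping the order of the existing levels:

sizes <- factor(c("sml", "med", "lrg", "med", "sml"))
levels(sizes)
# [1] "lrg" "med" "sml"

# Assign replacement names in the same order as the current levels
levels(sizes) <- c("Large", "Medium", "Small")
sizes
# [1] Small  Medium Large  Medium Small
# Levels: Large Medium Small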

Comments closed

A Primer on Pandas Queries

Vinod Chugani works with Pandas:

In the realm of data analysis, SQL stands as a mighty tool, renowned for its robust capabilities in managing and querying databases. However, Python’s pandas library brings SQL-like functionalities to the fingertips of analysts and data scientists, enabling sophisticated data manipulation and analysis without the need for a traditional SQL database. This exploration delves into applying SQL-like functions within Python to dissect and understand data, using the Ames Housing dataset as your canvas. The Ames Housing dataset, a comprehensive compilation of residential property sales in Ames, Iowa, from 2006 to 2010, serves as an ideal dataset for this exploration, offering a rich variety of features to analyze and derive insights from.

Click through for examples of how to use the query() function in conjunction with other Pandas functionality to answer questions of the data.
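
As a rough sketch of the idea (column names follow the Ames dataset; the file path is illustrative), query() takes a WHERE-style predicate as a string and chains naturally with the rest of pandas:

import pandas as pd

ames = pd.read_csv("Ames.csv")  # hypothetical local copy of the Ames housing dataset

# WHERE-style filter expressed as a string
recent_and_pricey = ames.query("YearBuilt >= 2000 and SalePrice > 300000")

# GROUP BY-style summary on the filtered rows
summary = (recent_and_pricey
           .groupby("Neighborhood")["SalePrice"]
           .agg(["count", "mean"])
           .sort_values("mean", ascending=False))
print(summary.head())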

Comments closed