Author: Kevin Feasel

Running SemPy from Microsoft Fabric Notebooks

Gilbert Quevauvilliers sets up an environment:

Below is where I had an error when trying to run a notebook via a data pipeline and it failed.

Below are the steps to get this working.

This was the error message I got as shown below.

Notebook execution failed at Notebook service with http status code – ‘200’, please check the Run logs on Notebook, additional details – ‘Error name – MagicUsageError, Error value – %pip magic command is disabled.’ :

Read on to see how you can fix this error and get SemPy running.

Input, Output, & Input/Output Parameters in Oracle & Postgres Procedures & Functions

Akhil Reddy Banappagari makes me use too many ampersands:

When migrating Oracle routines to PostgreSQL, handling OUT and INOUT parameters can be tricky. Understanding the distinctions between Oracle and PostgreSQL in how they manage these parameters is essential for a successful migration. This knowledge helps you smoothly adjust your routines, ensuring your code works well without any issues. In this article, we shall explore IN, OUT and INOUT parameters in Oracle and PostgreSQL and understand some of the important differences.

Read on to see how these work in Postgres and Oracle.

A Primer on Vector Similarity Search

Pavan Belagatti talks vectors:

In the realm of generative AI, vectors play a crucial role as a means of representing and manipulating complex data. Within this context, vectors are often high-dimensional arrays of numbers that encode significant amounts of information. For instance, in the case of image generation, each image can be converted into a vector representing its pixel values or more abstract features extracted through deep learning models.

These vectors become the language through which AI algorithms understand and generate new content. By navigating and modifying these vectors in a multidimensional space, generative AI produces new, synthetic instances of data — whether images, sounds or text — that mimic the characteristics of the original dataset. This vector manipulation is at the heart of AI’s ability to learn from data and generate realistic outputs based on that learning.

Read on for a high-level overview of the topic.
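
To make the idea a bit more concrete, here is a minimal sketch of a brute-force similarity search using cosine similarity. It is not taken from Pavan's post: the function names (`cosine_similarity`, `top_k`) and the three-dimensional "embeddings" are made up for illustration, and real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy, hand-written "embeddings" (hypothetical values, not model output).
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.2, 0.9],
}

results = top_k([0.85, 0.15, 0.05], embeddings, k=2)
for name, score in results:
    print(f"{name}: {score:.4f}")
```

Scanning every vector like this scales linearly with the size of the collection, which is why vector databases generally lean on approximate nearest-neighbor indexes instead.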

Subsetting Data Frames in R using Multiple Conditions

Steven Sanderson can’t stop at one filter:

In data analysis with R, subsetting data frames based on multiple conditions is a common task. It allows us to extract specific subsets of data that meet certain criteria. In this blog post, we will explore how to subset a data frame using three different methods: base R’s subset() function, dplyr’s filter() function, and the data.table package.

Click through for examples.

Avoid the securityadmin Role

Etienne Lopes recommends against a particular SQL Server role:

I usually avoid using the two “radical” words: “never” and “always” but regarding the membership need for the “securityadmin” server role since SQL 2005/2008 onward, I find it hard to come up with a good reason why it should ever be used, especially considering the security risks involved. A few weeks ago, while checking permissions for some logins in a (critical) SQL Server 2019 instance, I came across some really worrying situations, among which I found this one… again! Although it wasn’t the worst I found there (the worst was too bad to even mention here) I felt impelled to write about this one, maybe because the risks may not be so obvious or are somewhat concealed… Well, let’s bring them to light!

Considering that a member of securityadmin can escalate its own rights to sysadmin, you might as well just grant that login sysadmin.

Reading Azure SQL Audit Logs from Azure Storage

Matt Changchien covers a strange scenario:

When you read an Azure SQL Database audit log from Azure Storage using sys.fn_get_audit_file, you might encounter a situation where the audit log appears non-empty, but the query still returns an empty result. This discrepancy can be puzzling, especially when the official documentation doesn’t explicitly mention any limitations or requirements for the sys.fn_get_audit_file system function.

In this post, I will shed light on these limitations and demonstrate them to provide clarity.

Read on to see when this might happen and what you can do about it.

MySQL: INTO OUTFILE and INTO DUMPFILE

Chad Callihan makes a comparison:

I haven’t had a MySQL post for a while, so it’s time to add some variety to the blog.

There are a couple of different ways to export data with a SELECT query in MySQL: INTO OUTFILE and INTO DUMPFILE. Let’s use the MySQL Sakila sample database and walk through some examples to compare these two options.

Read on to see when you might want to use each of these.

Restoring a Tablespace using Barman on Windows

Semab Tariq restores a database:

I recently had the opportunity to contribute to a customer project, where the objective was to establish a system for PostgreSQL full backups and seamless restoration. Considering Barman’s successful functionality on Linux, we decided to explore its compatibility with Windows. Secondly, no other tool claims to work on Windows to take backups and perform a restore.

The official documentation mentions that:
Backup of a PostgreSQL server on Windows is possible, but it is still experimental because it is not yet part of our continuous integration system.

Click through for the walkthrough.

Removing Skew in Data with Python

Vinod Chugani kicks the lop-sided distribution to straighten it out:

Data transformations enable data scientists to refine, normalize, and standardize raw data into a format ripe for analysis. These transformations are not merely procedural steps; they are essential in mitigating biases, handling skewed distributions, and enhancing the robustness of statistical models. This post will primarily focus on how to address skewed data. By focusing on the ‘SalePrice’ and ‘YearBuilt’ attributes from the Ames housing dataset, we will provide examples of positive and negative skewed data and illustrate ways to normalize their distributions using transformations.

Read on to see what kinds of transformations are available.
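
As a rough illustration of the idea, here is a minimal sketch showing how a log transformation pulls in a long right tail. It is not Vinod's code: the `skewness` helper is hypothetical, and the prices are a small synthetic sample, not the Ames housing data.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic, positively skewed sale prices (made-up values for illustration).
prices = [90_000, 110_000, 125_000, 140_000, 160_000,
          185_000, 230_000, 410_000, 755_000]

# log1p(x) = log(1 + x), which compresses large values more than small ones.
log_prices = [math.log1p(p) for p in prices]

print(f"skewness before: {skewness(prices):.3f}")
print(f"skewness after:  {skewness(log_prices):.3f}")
```

The transformed values are still right-skewed, just less so. A log transformation only helps with positive skew, so negatively skewed attributes need a different approach.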

Data Management with Open Table Formats

Anandaganesh Balakrishnan covers a few open-source products and formats:

Apache Iceberg is an open-source table format designed for large-scale data lakes, aiming to improve data reliability, performance, and scalability. Its architecture introduces several key components and concepts that address the challenges commonly associated with big data processing and analytics, such as managing large datasets, schema evolution, efficient querying, and ensuring transactional integrity. Here’s a deep dive into the core components and architectural design of Apache Iceberg:

Click through for a review of Iceberg, Hudi, and the Delta Lake format.
