Press "Enter" to skip to content

Day: March 6, 2024

Renaming Factor Levels in R

Steven Sanderson renames factor levels of a categorical variable:

Before we jump into renaming factor levels, let’s quickly recap what factors are and why they’re useful. Factors are used to represent categorical data in R. They store both the values of the categorical variables and their corresponding levels. Each level represents a unique category within the variable.

Click through for three methods you can use to pull this off.

Comments closed

A Primer on Pandas Queries

Vinod Chugani works with Pandas:

In the realm of data analysis, SQL stands as a mighty tool, renowned for its robust capabilities in managing and querying databases. However, Python’s pandas library brings SQL-like functionalities to the fingertips of analysts and data scientists, enabling sophisticated data manipulation and analysis without the need for a traditional SQL database. This exploration delves into applying SQL-like functions within Python to dissect and understand data, using the Ames Housing dataset as your canvas. The Ames Housing dataset, a comprehensive compilation of residential property sales in Ames, Iowa, from 2006 to 2010, serves as an ideal dataset for this exploration, offering a rich variety of features to analyze and derive insights from.

Click through for examples of how to use the query() function in conjunction with other Pandas functionality to answer questions of the data.

Comments closed

Recovering Deleted Oracle Connections in Visual Studio Code

Brendan Tierney goes on a search and rescue mission:

With the current early release, there is no way to organise your Database connections like you can in the full Oracle SQL Developer. We are told this will/might be possible in a future release but it might be later this year (or longer) before that feature will be available.

In a previous post, I showed how to import your connections from the full SQL Developer into SQL Dev VS Code. While this is a bit of a fudge, yet relatively straight forward to do, you may or may not want all those connections in your SQL Dev VS Code environment. Typically, you will use different tools, such as SQLcl, SQL*Plus, SQL Developer, etc to perform different tasks, and will only want those connections set up in one of those tools.

Click through to see what it takes to recover these.

Comments closed

Comparing TOP(1) + ORDER BY vs MAX() Performance in SQL Server

Andy Brownsword breaks out the stopwatch:

The TOP clause limits the number of results which are returned from a query, in this instance we’re focussing on a single result. In contrast, when using MAX we’re applying a function to our data to select the largest value from our data.

Let’s dive into some examples with the StackOverflow data, specifically the Votes table.

Read on for several scenarios and how the two perform. Things get a bit more complicated as you introduce other tables in joins and similar additional factors, but this gives you a good foundation for comparison.

Comments closed

Microsoft Fabric Governance & Administration: Tenant Settings

Nicky van Vroenhoven has a pair of posts on Microsoft Fabric administration, specifically around tenant settings. First up is a post on APIs:

Obviously, to use the Get Tenant Settings API you need to have at least Tenant.Read.All permissions, or have the Fabric Administrator role (or higher) in Azure.

There are a few use cases I see for getting these settings exported with this API:

  • Documentation purposes when you have multiple Fabric Administrators
  • Distributing tenant settings to users, and explaining why we (as a team of Administrators/within the Center of Excellence) made certain choices
  • Get notified of the changes in the tenant settings, without having to use Microsoft Defender or M365 Security & Compliance center like mentioned here

Nicky has a follow-up post on visual cues in the Tenant Settings page:

Today I want to talk about a new little addition Microsoft made to the Fabric Admin portal.

This change has actually been here for quite a while now, but I still think it’s worth mentioning because (1) I really like it, and (2) it’s also an important change that the community, and MVP’s in specific, has been requesting for quite some time.

Radhakrishnan Srinivasan and (members of) his team added visual cues to the Admin portal of Fabric.

Check out both posts for good information.

Comments closed

Query Compilation Timeouts and Query Store

Kendra Little diagnoses a problem:

Last November, a puzzle was really bothering me. Some queries from an application were timing out frequently after running for 30 seconds, but they were halfway invisible in the SQL Server.

I say “halfway invisible” because I could see the queries while they were running in SQL Server’s dynamic management views using free tools (sp_WhoIsActive and sp_BlitzWho).

But the queries had some odd characteristics:

Through the power of communication with other humans (eew, that idea sounds icky), Kendra was able to learn what the problem was and how you can track such issues outside of Query Store.

Comments closed

Improving Data Labels with Format Strings

Kurt Buhler has some new digs:

Format strings greatly improve the usefulness of your model, particularly when you apply custom and dynamic format strings creatively. However, there are considerations to keep in mind.

It is important that the values in a report display for users as they expect. Effective formatting of DAX measures and table columns improves your model, as values are easier to read and interpret. While formatting may seem simple at first glance, the flexibility of DAX and format strings in Power BI can create many opportunities for more creative and efficient tables or visuals.

Click through to understand why proper formatting of measures is so important on dashboards, and also congratulate Kurt on becoming an Italian. At least, I assume citizenship conveys when you do work for Marco & Alberto.

Comments closed