Press "Enter" to skip to content

Month: February 2024

Visualizing Genomics Data with Streamlit in Snowflake

Rebecca O’Connor builds an app:

The blog describes how this data is complemented with the following two additional data sets:

  • An Annotation dataset
  • A Panel Dataset

Simple SQL queries is then used gain answers to a multitude of questions held within the vast amount of data.

I utilised the same datasets in order to create a streamlit app.

Click through for the code. This is the reason why I like Streamlit so much: you can build an interactive data-centric application very easily. Granted, you can abuse Streamlit pretty hard, but it is powerful.

Comments closed

The Value of the keyring Package

Maelle Salmon looks at a good package in R

Does your package need the user to provide secrets, like API tokens, to work? Have you considered telling your package users about the keyring package, or even forcing them to use it?

The keyring package maintained by Gábor Csárdi is a package that accesses the system credential store from R: each operating system has a special place for storing secrets securely, that keyring knows how to interact with. The credential store can hold several keyrings, each keyring can be protected by a specific password and can hold several keys which are the secrets.

Read on for several advantages of using the keyring package.

Comments closed

Find and Replace Text Strings in SQL

Ed Pollack deals with the lack of regular expressions in T-SQL:

This article is focused on how to write, simplify, and get the most out of string data using the least amount of code to do so. It’s a reminder that code complexity is equivalent to technical debt and that in these scenarios, less is always more.

Text-manipulating functions will be introduced and reviewed throughout this article, so feel free to perform searches against it to find what you are looking for.

Read on for mechanisms to split strings into multiple rows, build strings from multiple rows, and search+replace within strings.

Comments closed

An Overview of DAX Visual Calculations

Teo Lachev takes us through visual calculations in DAX:

Suppose you need a measure that calculates the difference between the product categories in the order they were sorted in the visual. Implementing this as a regular DAX measure is a challenge. Yet, if we had a way to work with the cells in the visual, we can easily find a way to get this to work. Ideally, this would work similar in Excel, but DAX doesn’t know a think about cell references. However, visual calculations do.

As always, Teo gives us a view of the good, the bad, and the ugly with this feature.

Comments closed

Finding the First of Many Similar Rows in SQL

Tobias Petry shows off a feature in Postgres:

SQL is a straightforward and expressive language, but it sometimes lacks constructs for writing queries in a simple way. It’s more complicated than it should be to write a query to get, e.g., only the most expensive order for every customer of the current fiscal year. You can either use PostgreSQL’s vendor-specific DISTINCT ON feature or window functions for every other database.

Click through for the normal method (which works for MySQL, Oracle, SQL Server, etc.) and the Postgres way.

Comments closed

Renaming Multiple Columns in a PySpark Notebook

Gilbert Quevauvilliers wants one rename to rule them all:

Following on from my previous blog post this blog post I’m going to demonstrate how to bulk rename column names in a single step instead of having to rename them individually.

The reason this came about is because I had a set of data where the column names had the square brackets which I wanted to remove.

As shown below I have highlighted 2 column names with the square brackets.

Read on to see how you can perform somewhat-generic rename operations in Spark notebooks.

Comments closed

Feature Request: Right-Click to Open Plan in New Tab

Erik Darling wants you to vote:

One problem I run into regularly is when I’m tuning a query, and I want to keep one of the resulting execution plans available, so I can:

  1. Compare it after I make some other changes and run the query again
  2. Do some additional analysis without worrying about accidentally losing the plan

The only way to do that is to save the plan, stick the XML in another tool, or keep opening new query tabs to run things in, where I won’t lose the plan.

Vote for Erik’s idea.

Comments closed

Azure Data Studio 1.48 Now Available

David Levy gives us the latest:

The February release of Azure Data Studio (version 1.48) includes several new features and enhancements, such as:

  • Support for restores from URLs, including S3-compatible storage, via the restore dialog
  • Query Editor now includes SPIDs in tab titles
  • You can now specify a connection when launching ADS from the command line
  • You can now enable Ledger when creating a database in the Create Database dialog

Read on for more about each of these topics.

Comments closed

Melting Datasets in R

Steven Sanderson performs a melt():

The melt() function in the data.table package is an extremely useful tool for reshaping datasets in R. However, for beginners, understanding how to use melt() can be tricky. In this post, I’ll walk through several examples to demonstrate how to use melt() to move from wide to long data formats.

“Melting,” by the way, is the R term for unpivoting data.

Comments closed

Generators in Python

Jack Wallen generates some values:

Within the realm of programming, a generator is a routine that is used to control interaction within a loop. Generators are useful when you want to produce a large sequence of values without storing them in memory at once. The ability to create a large sequence without using up memory is important, especially when dealing with Python programs that generate a large amount of information, such as a long sequence of numbers.

Read on to see how generators work and how to build a custom generator.

Comments closed