Press "Enter" to skip to content

Curated SQL Posts

Loading Data from Pandas into Snowflake

Anil Kumar Moka loads some data:

Loading data into Snowflake is a common need. Using Python and pandas is a common go-to solution for data professionals. Whether you’re pulling data from a relational database, wrangling a CSV file, or prototyping a new pipeline, this combination leverages pandas’ intuitive data manipulation and Snowflake’s cloud-native scalability. But let’s be real—data loading isn’t always a simple task.

Files go missing, connections drop, and type mismatches pop up when you least expect them. That’s why robust error handling isn’t just nice-to-have; it’s essential for anything you’d trust in production. In this guide, we’ll walk through the fundamentals of getting data into Snowflake, explore practical examples with pandas and SQLAlchemy, and equip you with the tools to build a dependable, real-world-ready pipeline. Let’s dive in and make your data loading process as smooth as possible!

Read on for a quick primer around data loading and some of the sanity checking we should be doing along the way.

Leave a Comment

Finding Oracle Blogs

Brendan Tierney theoretically makes my life easier:

A regular question I get asked is, “is there a list of Oracle-related Blogs?” and “what people/blogs should I follow to learn more about Oracle Database?” This typically gets asked by people in the early stages of their careers and even by those who have been around for a while.

You could ask these questions to twenty different (experienced) people, and you’d get largely the same answers with some variations. These variations would be down to their preferences on how certain people cover certain topics. This comes down to experience of following lots of people and learning over time.

Click through for a list of top Oracle blogs. I should probably check out some of them, though I don’t tend to do much with Oracle here. But if you do, there’s a starting point with 100 separate blogs to check out.

Leave a Comment

What’s New with OneLake Shortcuts

Miquella de Boer gives us an update:

Microsoft Fabric shortcuts enable organizations to unify their data across various domains and clouds by creating a single virtual data lake. These shortcuts act as symbolic links to data in different storage locations, simplifying access and reducing the need for multiple copies.

OneLake serves as the central hub for all analytics data. By using OneLake shortcuts, organizations can connect to existing data sources like Azure and AWS through a unified namespace, streamlining workflows and enhancing collaboration.

Click through for several feature improvements for shortcuts.

Leave a Comment

Time Range Generation in Data Diluvium

Adron Hall extends Data Diluvium:

Following up on my previous posts about adding humidity and temperature data generation to Data Diluvium, I’m now adding a Time Range generator. I decided this would be a nice addition to give any graphing of the data a good look. This will complete the trio of generators I needed for my TimeScale DB setup. While humidity and temperature provide the environmental data, the Time Range generator ensures we have properly spaced time points for our time-series analysis.

Click through to see how it works.

Leave a Comment

Checking Who Created a Table in SQL Server

Burt King reviews the logs:

I have noticed new SQL Server database objects have been created and want to know how we can track down who created these objects. What options are available in SQL Server? In this article, learn how in SQL Server to check who created a table or other objects.

Burt shows a couple of techniques for this, though I’d lean heavily on using Extended Events over a server-side trace or Profiler for the task.

Leave a Comment

Converting a CSV to Parquet with DuckDB and Polars in R

Michael Mayer makes a swap:

In this recent post, we have used Polars and DuckDB to convert a large CSV file to Parquet in steaming mode – and Python.

Different people have contacted me and asked: “and in R?”

Simple answer: We have DuckDB, and we have different Polars bindings. Here, we are using {polars} which is currently being overhauled into {neopandas}.

Click through for the comparison.

Leave a Comment

Reactable Tables with Sparklines in Shiny Apps

Osheen MacOscar continues a series:

This is the third blog in a series about the {sparkline} R package for inline data visualisations. You can read the first one about getting started with the package here and the second one about embedding them in HTML tables with the {reactable} package here.

In this blog I am taking it a step further and demonstrating how to use our sparkline reactable table in a Shiny app. Thankfully {reactable} has some helpful functions that make this super easy! I will also look at using a dynamic traffic light image in a reactable table at the end.

Click through to see how it all works.

Leave a Comment

Calculating a Matrix Inversion in SQL Server

Sebastiao Pereira performs matrix math in-database:

There are numerous applications to obtain a Matrix inverse for a given Matrix. Is it possible to do it using only SQL Server? Read on to learn how to build a matrix inverse calculator using a set of SQL Server custom functions.

I expect this to be extremely slow in comparison to GPU-based methods using a language like C, but this approach maximizes style points.

Leave a Comment

Avoid Exposing PostgreSQL Port 5432 to the Internet

Christophe Pettus shares some good advice:

Sometimes, we run into a client who has port 5432 exposed to the public Internet, usually as a convenience measure to allow remote applications to access the database without having to go through an intermediate server appllication.

Do not do this.

This is the equivalent of exposing port 1433 on a SQL Server instance to the broader internet, and is a bad idea for many of the same reasons.

Leave a Comment