Press "Enter" to skip to content

Curated SQL Posts

Skipping through Rows in a Large File with Powershell

Jay Robinson has a one-liner for us:

It came from working with enormous text files, and the gotchas that come with them. In this case, I had a 50gb data file that had something wrong with it, buried about 25% in. The file was being processed just fine, until it hit this unexpected data. And because the ETL app was written for performance first, there wasn’t a lot of data validation being done. So it’d just go boom when it hit that batch.

So what was wrong with the file? Well, in order to determine that, I had to see what was in that batch. But you can’t just open a 50gb file in Notepad. Yes, there are other ways to get around this, but here’s the one I chose:

Click through for Jay’s solution to the problem.

Comments closed

SSIS Catalog Error 27150

Andy Leonard diagnoses an error:

In case you’ve never encountered SSIS Catalog Error 27150, the error message reads:

The version of the project has changed since the instance of the execution has been created. Create a new execution instance and try again.

A customer recently encountered this error.
So… what happened?

Click through for the answer. I’ve had this happen as well for the same reason as Andy.

Comments closed

Packed/Batched Remote Procedure Calls in SQL Server

Bob Dorr digs in:

The SQL Server TDS protocol provides two main paths for query execution (Language and RPC events.)  You can trace these events using the Batch::Starting/Completed (Language) and RPC:Starting/Completed (RPC) XEvents.

Language events are text streams requiring full parsing and may be susceptible to injection attacks.  Language events also require educated guesses.  For example, should select 1 return a smallint, bigint?

Bob goes on to show an example of a simple call being susceptible to SQL injection, explains why Remote Procedure Calls (RPCs) are superior, and what packing (or batching) RPCs does for you.

Comments closed

Functional Programming and R

Anirban Shaw ties functional programming to R:

Functional Programming‘s relevance in the R programming language, a language primarily known for its prowess in data analysis and statistical computing, is particularly noteworthy. By leveraging functional programming, organizations can improve operational efficiency and gain a competitive edge

R’s ecosystem is enriched by functional programming paradigms, which enable developers and data scientists to write concise and expressive code for tasks such as data manipulation, transformation, and visualization.

In this article, we take a deep dive into the fundamental characteristics of R, the advantages of adopting functional programming within it and the essential concepts ingrained in the core of R. 

Read on to see how the two fit together. H/T R-Bloggers.

Comments closed

Working with Oracle Cloud Infrastructure from Python

Brendan Tierney peruses an SDK:

Having set up your Oracle Cloud account and perhaps initiated some services, such as an Autonomous Database or an AI Service, you have some options with how you’ll interact with those services. The first is to use the online interfaces and tools for them. Another is to access these OCI services from your local PC or laptop. This latter method gives you greater flexibility for accessing these services, automating various tasks and building out some applications. This post will step through how to install the necessary Python library, how to configure for connecting to OCI and will demo some of the simpler APIs for accessing the OCI services. This will help you verify your connection is working correctly.

Click through for the details.

Comments closed

Understanding Postgres Explain Plans

Muhammad Ali explains explain plans:

In a previous post titled Exploring Postgres Performance: A Deep Dive into pg_stat_statements, we discussed the utility of pg_stat_statements as a tool for identifying resource-intensive queries in terms of memory, CPU, or I/O usage.

Now, let’s suppose you’ve pinpointed certain queries you’d like to investigate further. The EXPLAIN command is used to generate execution plans. It includes:

Read on to see what it includes, how you can generate an explain plan, and some of the optional settings.

Comments closed

Scree Plots in R

Steven Sanderson builds a scree plot:

A scree plot is a line plot that shows the eigenvalues or variance explained by each principal component (PC) in a Principal Component Analysis (PCA). It is a useful tool for determining the number of PCs to retain in a PCA model.

In this blog post, we will show you how to create a scree plot in base R. We will use the iris dataset as an example.

Read on to learn more about the plot, as well as examples of how to create scree plots.

Comments closed

Embrace the Power of 5th Normal Form

I have a new video up:

In this video, we drill into the other most important normal form, learning what Fifth Normal Form (5NF) is, why Boyce-Codd Normal Form is not enough, and examples of why 5NF can be such a challenge to implement.

Until I read CJ Date’s Database Design and Relational Theory (2nd edition), my level of appreciation for 5th Normal Form was somewhat limited, but that’s mostly because I didn’t understand it well at all. I liked the connection trap example in this article, but Date’s book was the first really good explanation of 5NF and just how powerful it is. My hope is that I was successfully able to convey that power to audiences.

Comments closed