Press "Enter" to skip to content

Curated SQL Posts

Spark Application Dependency Caching

Shu Wang, Biao He, and Minchu Yang talk turkey about dependencies:

In this blog post, we will share our analysis of Spark Dependency Management at LinkedIn, highlight interesting findings, and present our design choice of using a simple user-level cache over more complex alternatives. We will also discuss our rollout experience and lessons learned. Finally, we will demonstrate the impact of accelerating all Spark applications at LinkedIn at the cluster level. Faster Spark jobs translate to increased data freshness, leading to an enhanced member experience by way of more relevant recommendations, timely insights, effective abuse protection, and other similar improvements.

If you work with Spark to any serious extent, you’ll want to read this post.

Comments closed

Type 2 Dimension Loading in Redshift

Vaidy Kalpathy takes us through a bit of dimensional modeling in AWS Redshift:

Populating an SCD dimension table involves merging data from multiple source tables, which are usually normalized. SCD tables contain a pair of date columns (effective and expiry dates) that represent the record’s validity date range. Changes are inserted as new active records effective from the date of data loading, while simultaneously expiring the current active record on a previous day. During each data load, incoming change records are matched against existing active records, comparing each attribute value to determine whether existing records have changed or were deleted or are new records coming in.

Click through for the article.

Comments closed

Architectural Erosion and Technical Debt

Uli Homann and Eric Charran (via Ben Brauer) talk about the concept of architectural erosion:

The way Eric thinks about architectural erosion is when architects and engineers work together, they construct a system or solution. They have launched that solution into production. It’s performing well for a period of time and then change happens. Whether it’s a change in requirements, a change in infrastructure, or a change in customer habits, DevOps signals saying that people are using a certain feature versus they’re not using a certain feature. What ended up happening is there’s a moment in time in which we look at velocity around how do we implement this change and make the applications experience, do what the customer or the end user wants as quickly as possible. Then there’s the strategic picture of managing things like technical debt, which is if I do something tactical, I’m probably going to do it fast and cheap and not necessarily the “right way.” This then accrues to the architectural patterns, longevity and scalability and all those other types of things, and then that goes into my pile of technical debt.

Read on to learn more about the topic and what we, as technical professionals, can do to mitigate this risk.

Comments closed

Try Purview (Almost) for Free

Wolfgang Strasser wants to try Microsoft Purview but doesn’t want to break the bank:

And my reaction was – Nice, very nice.. I can try and create Microsoft Purview instances for free and test new features..

BUT: I wanted to be sure and check, how much metadata (sources, scan results, data assets, classifications) can fit into 1 MB of metadata.

Read on for Wolfgang’s test, as well as the full set of costs around trying out Purview.

Comments closed

Building Your Own TRY_PARSE Function

Aaron Bertrand gives it a go:

A while back, I wrote a couple of tips about simulating TRY_CONVERT – a function added in SQL Server 2012 – in earlier versions (see Part 1 and Part 2). Recently, someone in the community had a similar requirement: they wanted to make TRY_PARSE work for dates in SQL Server 2008. Now, I’m not one to help people stay on versions of SQL Server that are now 15 years old, but I do acknowledge that hands may be tied, and they can’t always control the version they’re stuck with.

Aaron does this community member a solid, though it’s probably something you’d never want to use if you’re on SQL Server 2012 or later.

Comments closed

The Importance of Trace Flag 460

Jonathan Kehayias helps me understand that Trace Flag 460 is one of the best trace flags ever:

For most people that are reading this post, I’d venture to guess that you have no idea what Trace Flag 460 is or when you would use it. Well first off, let me tell you it’s a fully documented and supported trace flag, and it’s totally safe. In fact, on SQL Server 2019 and higher it is the default behavior when you create a new database. What does it do? It makes troubleshooting string or binary truncation issues easier by changing the error message that is returned from message ID 8152 and replaces it instead with message ID 2628. The trace flag is also available in SQL Server 2016 SP2 CU6+ and SQL Server 2017 CU12.

Read on to learn more about it and to remove a potential bit of confusion in the documentation.

Comments closed

Working with IP Addresses in Powershell

Bill Kindle takes us through several Powershell cmdlets:

A common SysAdmin task involves managing a PowerShell IP configuration. Typically, one performs this task using the GUI, which is OK. But PowerShell can do the same job and more, which you can put into scripting.

In this tutorial, you will learn how to use a few network management cmdlets in PowerShell to help manage a Windows host’s IP address, gateway, and DNS settings.

Understanding how to do this becomes even more important if you’re running Windows Server Core, where you don’t have too many choices other than rolling with Powershell.

Comments closed

Tracking Home Heating Oil Prices in R

Steven Sanderson charts some prices:

If you live in New York and rely on heating oil to keep your home warm during the colder months, you know how important it is to keep track of heating oil prices. Fortunately, with a bit of R code, you can easily access the latest heating oil prices in New York.

The code uses the {dplyr} package to clean and manipulate the data, as well as the {timetk} package to plot the time series.

Read on for an overview of what the code does, followed by the code itself and a time series plot at the end.

Comments closed

Tracking Network Errors with WASP

Thoe Roe gives us an introduction to Network Error Logging:

Heads up! We’re about to launch WASP, a Web Application Security Platform. The aim of WASP is to help you manage (well, you guessed it) the security of you application using Content Security Policy and Network Error Logging. We’ll be chatting about it more in a full blog post nearer the time.

Read on to learn about what Network Error Logging is, how you can activate it for a website, and what information you get back as a result.

Comments closed

Enabling Powershell’s Strict Mode

Patrick Gruenauer grabs the ruler:

PowerShell is very forgiving of errors. For example, if you call something that does not exist, then no error message is displayed. In this short article I want to show you how to make PowerShell a bit more strict with the strict mode.

Consider you are calling a variable that doesn’t exist. PowerShell will display no errors.

Granted, that laxity isn’t On Error Resume Next level bad, but Patrick shows us a way to toughen up the interpreter’s responses.

Comments closed