Press "Enter" to skip to content

Curated SQL Posts

The End of the NFT Bubble(?)

Stephanie Glen has music to my ears:

Non-fungible tokens (NFTs), tradable digital certificates that verify ownership of digital assets using blockchain technology, have dominated headlines in the last several months. The media mania hit a high with the $69 million sale of Beeple’s Everydays:The First 5000 Days. A few months after Beeple’s historic sale at Christie’s auction house, the crypo-art bubble has officially burst.

These sorts of things are a bit too volatile for me to cheer just yet. The blockchain bubble is something I look at and say, this is incredibly dumb. The whole premise of it makes zero sense: you’re wasting resources (and don’t get me started on Chia, the grim reaper for residential SSDs) for nothing. The end product has extremely little to no subjective value—how much would you pay for blockchain outputs?—but burns up resources in the form of energy, increased prices for computer components, and time that could have been spent doing something more productive, like repeatedly turning your computer off and on again: at least there, you gain valuable skills in figuring out how to power down and power up a machine.

I can kinda-sorta get the idea of using blockchain for certain types of auditing trails, but there are still two big problems with it. First is the 50% problem: whoever controls 50% of the compute controls the past, present, and future of the blockchain and can make whatever arbitrary changes are desired. Beyond that, the other problem is, how much better is this than a digest hash of activities written to a WORM drive? Considering how many orders of magnitude less expensive the latter is to the former, there has to be an enormous benefit for it to make any sense. And there’s really not.

Comments closed

The Value of a Working Dev Environment

Tim Mitchell wants to talk about dev environments:

Let’s talk about your development environment.

Specifically, I’d like to chat with you about the virtual space where your data architecture team, software developers, and information curators do their development and testing work. A proper development environment is logically separated from the production environment, and is often further partitioned into different realms for initial development, data or functional validation, and user acceptance testing. For mature enterprise-ready environments, there is also usually a build and deployment process that automates the movement of code from one environment to the next, reducing the chance for human error when moving code through its paces and ultimately into the production environment.

I’d like optimistically to say that Tim is using strawmen here, but I’ve worked in (and sometimes created) pretty much each one of these.

Comments closed

Comparing Views to Derived Tables and CTEs

Itzik Ben-Gan takes us through views:

As usual when discussing relational theory, we SQL practitioners are often told that the terminology we’re using is wrong. So, in this spirit, right off the bat, I’ll start by saying that when you use the term tables and views, it’s wrong. I’ve learned this from Chris Date.

Recall that a table is SQL’s counterpart to a relation (oversimplifying the discussion around values and variables a bit). A table could be a base table defined as an object in the database, or it could be a table returned by an expression—more specifically, a table expression. That’s similar to the fact that a relation could be one that is returned from a relational expression. A table expression could be a query.

Now, what is a view? It’s a named table expression, much like a CTE is a named table expression. It’s just that like I said, a view is a reusable named table expression that is created as an object in the database, and is accessible to those who have the right permissions. This is all to say, a view is a table. It’s not a base table, but a table nonetheless. So just like saying “a rectangle and a square” or “a whisky and a Lagavulin” would seem strange (unless you had too much Lagavulin!), using “tables and views” is as improper.

Yeah, if we’re going to push our glasses up the bridges of our noses and get all relational here, we’d have the relvar (which is a name and collection of attributes + data types) and the relation (which you can think of as a relvar at a particular point in time—that’s where we get tuples of data). And as Itzik points out, what the RDBMS world calls a view quite neatly fits the definition of a relvar, as we define a name and collection of attributes + data types.

If all of this could not mean less to you, still read the article for Itzik’s view on views.

Comments closed

Converting from XML in Powershell

Phil Factor has unleashed the full power of XML:

I want to convert reasonably small XML files to hash tables and PowerShell objects. PowerShell never had a ConvertFrom-XML Cmdlet because gulping a large XML file into a PowerShell data object is expensive in resources. It is the sheer time it takes to consume a large XML file. Instead, you have to use the XMLDocument object to navigate to the data you want or use an Xpath query. It is all well and good to handle XML in this way, but it is inconsistent to have no ConvertFrom-XML cmdlet. After all, there is a ConvertFrom cmdlet for CSV, JSON, and a variety of text-based data. It would be good to have one for XML as well. Usually, I just want to consume relatively small XML files and just pick out the data I want. I hoped that one that worked would turn up but somehow it never did. So I wrote my own.

Click through for that script, as well as some considerations about using it.

Comments closed

Wait Stats Not in Query Store

Erik Darling says wait, wait, don’t tell me:

There are some oddities in the documentation for query store wait stats.

One is that RESOURCE_SEMAPHORE_QUERY_COMPILE is listed as a collected wait, but with an asterisk that says it’s not actually collected. I’ve tested workloads that generate lots of that wait, and just like the docs say, it doesn’t end up there.

Of course, since I added wait stats recently to sp_QuickieStore, I wanted to make sure other waits that I care about actually show up in there.

Read on to see which wait stats you can find in Query Store and which you’ll have to get from someplace else.

Comments closed

Developing a Patch Strategy

Brent Ozar shares a patching strategy:

Decide how you’re going to detect problems. Every now and then, an update breaks something. For example, SQL Server 2019 CU7 broke snapshotsSQL Server 2019 CU2 broke Agent, and so many more, but my personal favorite was when SQL Server 2014 SP1 CU6 broke NOLOCK. Sure, sometimes the update installer will just outright fail – but sometimes the installer succeeds, but your SQL Server installation is broken anyway, and it may take hours or days to detect the problem. You need to monitor for new and unusual failures or performance problems.

Click through to see the high-level strategy elements.

Comments closed

Ranger and Jersey Clients

Jon Morisi troubleshoots an irksome issue:

Just a quick blog here about an issue I had with HDP-3.1.4.0.  I recently was setting up a new user with specific rights in Ranger for Hive access.  After creating the new policy and attempting to validate it, I received an error message stating that the hive user does not have use privilege.  This error was produced even though I had just created the policy specifically granting those privilege’s.

Upon further review I noticed that the plugin was downloading the policy, but not applying it.  

Read on to learn what the problem was and how Jon corrected it.

Comments closed

Announcements from Data+AI Summit

Ryan Boyd summarizes Databricks announcements:

The Delta Lake open source project is a key enabler of the lakehouse, as it fixes many of the limitations of data lakes: data quality, performance and governance. The project has come a long way since its initial release, and the Delta Lake 1.0 release was just certified by the community. The release represents a variety of new features, including generated columns and cloud independence with multi-cluster writes and my favorite — Delta Lake standalone, which reads from Delta tables but doesn’t require Apache SparkTM.

We also announced a bunch of new committers to the Delta Lake project, including QP Hou, R.Tyler Croy, Christian Williams, Mykhailo Osypov and Florian Valeye.

Learn more about Delta Lake 1.0 in the keynotes from co-creator and Distinguished Engineer Michael Armbrust.

Read on for a variety of announcements in this vein.

Comments closed

Querying AWS Athena via Powershell

Michael Bourgon needs to get some data out of S3:

I was running into issues with the Linked Server lopping off long JSON that I’m having to pull out from the raw files.  I can’t explain it – doesn’t appear to be SSMS.  See previous post

But I needed to automate this, rather than use SQL Workbench, save to “Excel” (it was XML), then opening it again and saving it so that instead of 250mb, it’s 30mb.  Runs against the previous month, one day at a time (walking the partitions), and then saves to a file.  You got your Athena, your ODBC, your Export-Excel…

Incidentally, that previous post was around trying to use a linked server to pull the data in via SQL Server.

Comments closed