Month: July 2022

Sources of Data Structure Truth

Deb Melkin performs database epistemology:

The “source of truth” is my newly made-up phrase for whatever you are using to say this is my database schema and initial data needed to start up the application. This can be your script directory; this can be a dacpac or bacpac; this can be your data model; this can be a combination of these things. My go-to “source of truth” right now is my source control repository. I’ve got both the schema and the default data needed in the same location. In the past, I would have probably included the data model as a way to help me make sure whatever database table changes I have in my source control are there, especially for that one database which only had tables and views. (A different rant for a different time.) Whatever you use, it absolutely CANNOT be an actual database. There are two main reasons for this:

Read on for those reasons.
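
By way of illustration only, here is a minimal sketch of the kind of re-runnable script that could live in that source control repository, covering both a schema object and its default data (the table, column, and value names are invented for the example):

    -- Schema: create the table only if it does not already exist,
    -- so the script can be re-run safely from the repository.
    IF OBJECT_ID(N'dbo.OrderStatus', N'U') IS NULL
    BEGIN
        CREATE TABLE dbo.OrderStatus
        (
            OrderStatusId tinyint     NOT NULL CONSTRAINT PK_OrderStatus PRIMARY KEY,
            StatusName    varchar(50) NOT NULL
        );
    END;

    -- Default data: MERGE keeps the reference rows in sync with the
    -- values checked into source control.
    MERGE dbo.OrderStatus AS tgt
    USING (VALUES (1, 'Open'), (2, 'Shipped'), (3, 'Cancelled'))
          AS src (OrderStatusId, StatusName)
        ON tgt.OrderStatusId = src.OrderStatusId
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (OrderStatusId, StatusName) VALUES (src.OrderStatusId, src.StatusName)
    WHEN MATCHED AND tgt.StatusName <> src.StatusName THEN
        UPDATE SET StatusName = src.StatusName;

The point is that the repository, not any running database, defines what the table and its seed rows should look like.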

Finding Sample Data Online

Mara Pereira goes searching for data:

Have you ever struggled to find sample data to play with in Power BI?

Did you spend hours (sounds crazy, but it happened to me too!) just looking for a dataset with insurance data? Healthcare data? Housing prices data?

Did you ever wonder “where are people finding the data to create those Netflix and Amazon reports that seem to be everywhere these days? Seriously, tell me your secret!”?

Click through for three good sites. Another one I’ve taken to is the US Bureau of Labor Statistics, the federal agency responsible for tracking things like employment, consumer prices, and compensation. The plus side to these datasets is that you get the whole range of data cleanup, warehousing, querying, and analytics over data which is both real and fairly interesting. For a broader take, data.gov hosts open data from across the US federal government, though I haven’t spent much time working with it.

SQL Login of the Week Club

Mala Mahadevan recounts a horror story:

It’s been a few years since I left consulting, but at the last gig I was at, we encountered something like this. We had a big client who had outsourced all their database development and manual update work (no, not to us, to some third-party contracting company). These were contractors paid by the hour, and the turnover was really high. Our client did not want to issue Windows-authenticated logins to these people for some reason (I do not recall what). So every week, when the week started, the contractor working on a particular server would get a SQL Server authenticated login they could use. This was valid just for that week and would expire the next week. And every weekend, it was our job, as the remote DBA company, to set up those logins.

But wait, it gets worse.

The Risks of “Unused” Databases

Chad Callihan reminds us about things in use:

You may have been updating data someone needed for validating a fix. The “_OLD” table that you assumed could be dropped may still have been useful to the person who created it. There might be a database covered in cobwebs that should have been dropped years ago but it could also be a database that’s used for some type of reporting every few months.

Yeah, that’s a pretty common problem. A few things which help mitigate this issue:

  • Check wherever you can to see if the database (or database object) is in use: cached plans, stored procedure calls, application calling code, SQL Agent jobs, SSIS packages, etc.
  • Take (and test!) backups of databases before you drop tables or get rid of them.
  • Keep those database backups around for quite a while.
  • Take databases offline for a while before dropping them. That way, if somebody really does use it on occasion, it’s easy to bring back online rather than needing to restore from a backup.

At the end of the day, however, you shouldn’t be afraid to drop things. Do the appropriate amount of diligence and make it a controlled demolition.
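
To make the first and last bullets a bit more concrete, here is a hedged T-SQL sketch; the table and database names are made up, and an empty plan-cache result is a hint rather than proof of disuse, since the cache is not a complete history:

    -- Look for cached plans that reference a table you suspect is unused.
    SELECT TOP (20)
        qt.dbid,
        qt.objectid,
        qs.last_execution_time,
        qt.text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
    WHERE qt.text LIKE N'%Customers_OLD%'   -- hypothetical table name
    ORDER BY qs.last_execution_time DESC;

    -- Take the database offline first; bringing it back online is much
    -- faster than restoring from backup if somebody complains.
    ALTER DATABASE SalesArchive SET OFFLINE WITH ROLLBACK IMMEDIATE;

    -- ...weeks later, once you're confident nobody needs it:
    -- ALTER DATABASE SalesArchive SET ONLINE;  -- or
    -- DROP DATABASE SalesArchive;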

Parameterizing Queries with Amazon Athena

Blayze Stefaniak, et al, architect a service to provide data via Amazon Athena:

Customers tell us they are finding new ways to make effective use of their data assets by providing data as a service (DaaS). In this post, we share a sample architecture using parameterized queries applied in the form of a DaaS application. This is helpful for many types of organizations, whether you’re working with an enterprise making data available to other lines of business, a regulator making reports available to your industry, a company monetizing your data assets, an independent software vendor (ISV) enabling your applications’ tenants to query their data when they need it, or trying to share data at scale in other ways.

In DaaS applications, you can provide predefined queries to run against your governed datasets with values your users input. You can expand your DaaS application to break away from monolithic data infrastructure by treating data as a product (DaaP) and providing a distribution of datasets, which have distinct domain-specific data pipelines. You can authorize these datasets to consumers in your DaaS application permissions. You can use Athena parameterized queries as a way to predefine your queries, which you can use to run queries across your datasets, and serve as a layer of protection for your DaaS applications. This post first describes how parameterized queries work, then applies parameterized queries in the form of a DaaS application.

Click through to learn how.
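
For a feel of the building block involved, this is roughly what an Athena parameterized query looks like when defined as a prepared statement; the database, table, and column names here are invented, and the post covers how to wrap this in a full DaaS architecture:

    -- Define the query once, with ? placeholders for user-supplied values.
    PREPARE monthly_sales_by_region FROM
    SELECT order_date, region, SUM(order_total) AS total_sales
    FROM sales_curated.orders
    WHERE region = ?
      AND order_month = ?
    GROUP BY order_date, region;

    -- Run it with the values a DaaS consumer supplies.
    EXECUTE monthly_sales_by_region USING 'us-east', '2022-07';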

Application Names and Database Queries

Tom Zika does not like those missing application names:

Whenever I’m trying to debug a problem using sp_whoisactive or Extended Events (XE) and I see either Core Microsoft SqlClient Data Provider or .Net SqlClient Data Provider, my blood begins to boil.

It means I’ll probably spend hours asking around to try and find the owner. Sometimes knowing the host_name helps, but there can be a multi-purpose host that runs many applications – which one is having the problem?

How do you set the name? Read the post to find out.
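
As a quick way to see how widespread the problem is on your own instance, a DMV check along these lines lists sessions that are still reporting the generic provider string; in .NET the fix generally amounts to adding an Application Name value to the connection string:

    -- Sessions that never set an application name show up with the
    -- default SqlClient provider string Tom mentions.
    SELECT
        s.session_id,
        s.host_name,
        s.login_name,
        s.program_name
    FROM sys.dm_exec_sessions AS s
    WHERE s.program_name LIKE N'%SqlClient Data Provider%'
    ORDER BY s.host_name;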

Test Those Backups

Kevin Chant comes at us with important advice:

In reality, not being able to restore a database can damage your company’s reputation and you can end up losing colleagues for various reasons. For example, if they ask for restores to be tested and it never happens.

To save your colleagues from a database restore nightmare you must test restoring your database backups at some level. Otherwise, it can lead to serious issues.

As the saying goes, you don’t have a backup until you’ve successfully restored it.
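
A minimal sketch of what testing “at some level” can look like, with made-up names and paths: restore the backup to a scratch database on a non-production server, then run a consistency check against the restored copy.

    -- Restore the latest full backup under a scratch name (paths and
    -- logical file names here are hypothetical).
    RESTORE DATABASE SalesDB_RestoreTest
    FROM DISK = N'\\backupshare\SalesDB_full.bak'
    WITH MOVE N'SalesDB'     TO N'D:\Data\SalesDB_RestoreTest.mdf',
         MOVE N'SalesDB_log' TO N'L:\Log\SalesDB_RestoreTest.ldf',
         CHECKSUM, STATS = 10;

    -- A restore that completes is good; one that also passes CHECKDB is better.
    DBCC CHECKDB (SalesDB_RestoreTest) WITH NO_INFOMSGS, ALL_ERRORMSGS;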

Another Knock against OFFSET/FETCH

Erik Darling won’t play fetch:

The two methods we’re going to look at are OFFSET/FETCH and a more traditional ROW_NUMBER query.

As you may have picked up from the title, one will turn out better, and it’s not the OFFSET/FETCH variety. Especially as you get larger, or go deeper into results, it becomes a real boat anchor.

Click through for the details.
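
For reference, these are the two paging patterns being compared, sketched against a hypothetical dbo.Posts table (Erik's post has the actual tests and numbers):

    DECLARE @PageSize int = 100, @PageNumber int = 500;

    -- OFFSET/FETCH paging: concise, but deep pages still read and discard
    -- everything before the requested offset.
    SELECT p.Id, p.Title, p.CreationDate
    FROM dbo.Posts AS p
    ORDER BY p.CreationDate, p.Id
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY;

    -- ROW_NUMBER paging: the more traditional alternative in the comparison.
    WITH numbered AS
    (
        SELECT p.Id, p.Title, p.CreationDate,
               ROW_NUMBER() OVER (ORDER BY p.CreationDate, p.Id) AS rn
        FROM dbo.Posts AS p
    )
    SELECT Id, Title, CreationDate
    FROM numbered
    WHERE rn BETWEEN ((@PageNumber - 1) * @PageSize) + 1
                 AND @PageNumber * @PageSize
    ORDER BY rn;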

Database Project Versioning and Identification

Eitan Blumin answers an important question:

“What is SSDT”, you ask? Oh, you didn’t? Well, let me tell you anyway! SSDT is the go-to solution from Microsoft for versioning SQL Server databases and performing state-based deployments (and it’s free!). It has many useful capabilities for developing and publishing changes from your SQL Database project to your SQL Database in production (or wherever).

One of the things that are not so clear about SSDT specifically, and about database versioning in general, is this: how should one identify which “version” of your database project was last deployed to your server?

Eitan includes several ways of tracking and controlling database versions.
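
Whichever option you land on, one common complement (not necessarily Eitan's approach) is to stamp the deployed version on the database itself, for example from a post-deployment script, so the server can answer the question directly; the property name and version number below are invented:

    -- Post-deployment step: record the project version on the database.
    IF EXISTS (SELECT 1 FROM sys.extended_properties
               WHERE class = 0 AND name = N'DatabaseVersion')
        EXEC sys.sp_updateextendedproperty @name = N'DatabaseVersion', @value = N'1.4.2';
    ELSE
        EXEC sys.sp_addextendedproperty @name = N'DatabaseVersion', @value = N'1.4.2';

    -- Later, anyone can ask the database which version was last deployed.
    SELECT name, value
    FROM sys.extended_properties
    WHERE class = 0 AND name = N'DatabaseVersion';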
