Press "Enter" to skip to content

Month: March 2018

Rotating Out Partitions

Kendra Little explains that there are a couple of models available for partitioned table management:

I recently received a terrific question about table partitioning:

I’m writing up a proposal for my company to start partitioning a 2.5TB table. The idea/need is to control the size of the table while saving the old data. The old data will be moved to an archive database on a different server where the BI guys work with it.

In none of the videos articles I’ve seen is the explanation of how the rolling partition works on a long term daily basis.

  1. Are the partitions reused, like in a ROUND ROBIN fashion?
  2. Or, do you add new partitions each day with new filegroups, drop the oldest partition off – this would be FIFO?

Lots of folks assume the answer here is always #2, simply because there’s a bunch of sample code out there for it.

But option #1 can be simpler to manage when it fits your data retention technique!

Click through to learn more about reusable partitioning.

Comments closed

Second-Order SQL Injection Attacks

Bert Wagner explains what he calls second-order SQL injection attacks:

SQL injection attacks that delay execution until a secondary query are known as “second order”.

This means a malicious user can inject a query fragment into a query (that’s not necessarily vulnerable to injection), and then have that injected SQL execute in a second query that is vulnerable to SQL injection.

Let’s look at an example.

Another way of thinking about this is a persisted SQL injection attack, akin to reflected versus persisted cross-site scripting attacks.  The fix is, don’t trust unsanitized user input.  Just because you put a user’s data into your database doesn’t mean that someone sanitized it, so treat that stuff as unsafe unless you know otherwise.

Comments closed

Avro Schema Compatibility In Kafka

Neha Bhardwaj walks us through an error in Kafka:

You might have come across a similar exception while working with AVRO schemas.

Kafka throws this exception due to a compatibility issue since the current schema is not compatible with the earlier schema registered on this topic.

You can check the current schema(s) on the topic using:
curl -X GET <a href=”http://localhost:8081/subjects//versions/”&gt;http://localhost:8081/subjects//versions/

Read on to understand what this error means and how you can fix it if you see it.

Comments closed

Making Hadoop A Relational Database

Alex Woodie reports on an offering by a company named Splice Machine:

“Everybody is struggling to figure out how to expose what’s been put on the data lake to the business,” Splice Machine founder and CEO Monte Zweben told Datanami at the recent Strata Data Conference. “Our opinion is that you can take infrastructure that people understand, like relational database management systems, and run them directly on the data lake.”

That’s essentially the message that Splice has been pushing since the peak of the Hadoop frenzy in the 2013-2015 timeframe, and it’s the same message that it’s pushing today. The big difference, according to Zweben, is the maturity level. Splice Machine’s open source technology that essentially turns Hadoop into a distributed ACID-compliant relational database is now ready for primetime. Wells Fargo is arguably its biggest paying customer and production use case, but it has dozens more across financial services, healthcare and other industries.

Feasel’s Law strikes again.

Comments closed

Comparing Hadoop On EC2 To EMR

Mark Litwintschik looks at EC2 versus EMR in terms of performance, specifically targeting a solution at less than $3 per hour:

The $3.00 price point was driven by the first method: running a single-node Hadoop installation. I wanted to make sure the dataset used in this benchmark could easily fit into memory.

The price set the limit for the second method: AWS EMR. This is Amazon’s Hadoop Platform offering. It has a huge feature set but the key one is that it lets you setup Hadoop clusters with very little instruction. The $3.00 price limit includes the service fee for EMR.

Note I’ll be running everything in AWS’ Irish region and the prices mentioned are region- and in the case of spot prices, time-specific.

I was a bit surprised at which service won.

Comments closed

Automated Cleanup With Query Store

Grant Fritchey discusses Query Store’s automated cleanup and also looks at an interesting question:

Query Store has mechanisms for automatically cleaning your data. It is possible to cause them to break down. While presenting a session about the Query Store recently, I was asked what happened if you set the size of the Query Store below the amount of data currently in the store. I didn’t know the answer, so we tried it. Things got a little weird.

Click through to see how weird.

Comments closed

Unique Indexes Versus Unique Constraints

Greg Low argues that you should create unique constraints instead of unique indexes whenever possible:

The CREATE INDEX statement is used to do exactly what its name says, it creates an index. But when you say CREATE UNIQUE INDEX, you are doing more than that; you are enforcing a business rule that involves uniqueness.

I have a simple rule on this. Wherever possible business rules like uniqueness, check values, etc. should be part of the design of the table, and not enforced in an external object like an index.

So, rather than a unique index, I’d rather see a unique constraint on the underlying table.

But that’s where real life steps in. I see two scenarios that lead me to occasionally use CREATE UNIQUE INDEX.

Here’s a third:  creating constraints can cause blocking issues.  If you already have a large table and Enterprise Edition, creating a unique index can be an online operation (unless you have a clustered columnstore index on the table), but a unique constraint is always a blocking activity.

Comments closed

Backing Up Azure SQL Databases

Arun Sirpal enumerates the options we have for backups of Azure SQL Databases:

If you have a business requirement which has a need to retain database backups for longer than 35 days, then you have an option to use long-term backup retention. This feature utilises the Azure Recovery Services Vault where you can store up to 10 years’ worth of backups for up to 1000 databases per vault and 25 vaults per subscription.

There are some guidelines that you need to follow to successful set this up:

  • Your vault MUST be in the same region, subscription and resource group as your logical SQL Server, if not then you will not be able to set this up.

  • Register the vault to the server.

  • Create a protection policy.

  • Apply the above policy to the databases that require long-term backup retention.

Arun also looks at restoration options.

Comments closed

The Data Exploration Process

Stacia Varga takes a step back from analyzing NHL data to explore it a little more:

As I mentioned in my last post, I am currently in an exploratory phase with my data analytics project. Although I would love to dive in and do some cool predictive analytics or machine learning projects, I really need to continue learning as much about my data as possible before diving into more advanced techniques.

My data exploration process has the following four steps:

  1. Assess the data that I have at a high level

  2. Determine how this data is relevant to the analytics project I want to undertake

  3. Get a general overview of the data characteristics by calculating simple statistics

  4. Understand the “middles” and the “ends” of your numeric data points

There’s some good stuff in here.  I particularly appreciate Stacia’s consideration of data exploration as an iterative process.

Comments closed

The Blame Game

Kenneth Fisher has a board game for us:

It’s Monday morning and your manager Brent has called his usual emergency all-employee meeting. He looks more than a little bit unhappy, and this time it’s not because someone stole his cruller. Over the weekend he was demonstrating the new anatomy program Mr. Body to some investors and frankly the performance was miserable! Now Brent has only one question.

Who killed Mr. Body’s performance?

We all know Andy Mallon did it.

Comments closed