You might have come across a similar exception while working with Avro schemas.
Kafka throws this exception because the current schema is not compatible with the schema previously registered for this topic.
Read on to understand what this error means and how you can fix it if you see it.
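To make the compatibility idea concrete, here is a minimal sketch in Python. It assumes the subject uses BACKWARD compatibility (the Schema Registry default) and checks only one common cause of this error: a field added without a default. The function name and the example schemas are hypothetical, and the check is far simpler than what the registry actually does, since it ignores type promotion, unions, aliases and nested records.

```python
# Simplified illustration of a BACKWARD compatibility check between two
# Avro record schemas. This is NOT the Schema Registry's real algorithm.

def backward_incompatibilities(old_schema, new_schema):
    """Return reasons the new schema could not read data written with the old one."""
    old_fields = {field["name"]: field for field in old_schema["fields"]}
    problems = []
    for field in new_schema["fields"]:
        old = old_fields.get(field["name"])
        if old is None:
            # A reader needs a default for any field the writer never produced.
            if "default" not in field:
                problems.append(f"new field '{field['name']}' has no default")
        elif old["type"] != field["type"]:
            # Assumption: any type change is treated as incompatible here.
            problems.append(f"field '{field['name']}' changed type")
    return problems


# Hypothetical schemas for illustration only.
OLD = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
]}
BROKEN = {"type": "record", "name": "User", "fields": OLD["fields"] + [
    {"name": "email", "type": "string"},
]}
FIXED = {"type": "record", "name": "User", "fields": OLD["fields"] + [
    {"name": "email", "type": "string", "default": ""},
]}

print(backward_incompatibilities(OLD, BROKEN))
print(backward_incompatibilities(OLD, FIXED))
```

In this sketch, giving the new field a default is what makes the second schema acceptable; the linked post covers the real error and its fix.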
“Everybody is struggling to figure out how to expose what’s been put on the data lake to the business,” Splice Machine founder and CEO Monte Zweben told Datanami at the recent Strata Data Conference. “Our opinion is that you can take infrastructure that people understand, like relational database management systems, and run them directly on the data lake.”
That’s essentially the message that Splice has been pushing since the peak of the Hadoop frenzy in the 2013-2015 timeframe, and it’s the same message that it’s pushing today. The big difference, according to Zweben, is the maturity level. Splice Machine’s open source technology that essentially turns Hadoop into a distributed ACID-compliant relational database is now ready for primetime. Wells Fargo is arguably its biggest paying customer and production use case, but it has dozens more across financial services, healthcare and other industries.
Feasel’s Law strikes again.
The $3.00 price point was driven by the first method: running a single-node Hadoop installation. I wanted to make sure the dataset used in this benchmark could easily fit into memory.
The price set the limit for the second method: AWS EMR. This is Amazon’s Hadoop Platform offering. It has a huge feature set but the key one is that it lets you set up Hadoop clusters with very little instruction. The $3.00 price limit includes the service fee for EMR.
Note that I’ll be running everything in AWS’ Irish region, and the prices mentioned are region-specific and, in the case of spot prices, time-specific.
I was a bit surprised at which service won.
Query Store has mechanisms for automatically cleaning your data. It is possible to cause them to break down. While presenting a session about the Query Store recently, I was asked what happened if you set the size of the Query Store below the amount of data currently in the store. I didn’t know the answer, so we tried it. Things got a little weird.
Click through to see how weird.
The CREATE INDEX statement does exactly what its name says: it creates an index. But when you say CREATE UNIQUE INDEX, you are doing more than that; you are enforcing a business rule that involves uniqueness.
I have a simple rule on this. Wherever possible business rules like uniqueness, check values, etc. should be part of the design of the table, and not enforced in an external object like an index.
So, rather than a unique index, I’d rather see a unique constraint on the underlying table.
But that’s where real life steps in. I see two scenarios that lead me to occasionally use CREATE UNIQUE INDEX.
Here’s a third: creating constraints can cause blocking issues. If you already have a large table and Enterprise Edition, creating a unique index can be an online operation (unless you have a clustered columnstore index on the table), but a unique constraint is always a blocking activity.
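As a rough illustration of the two options, the sketch below declares the same uniqueness rule once as a table constraint and once as a separate unique index, then confirms that each rejects a duplicate row. It uses SQLite through Python's sqlite3 module purely as a stand-in for SQL Server, so the table, constraint and index names are hypothetical and none of the online or blocking behavior discussed above is represented.

```python
import sqlite3

# Option 1: the rule is part of the table design.
CONSTRAINT_DDL = [
    "CREATE TABLE customer ("
    " email TEXT NOT NULL,"
    " CONSTRAINT uq_customer_email UNIQUE (email))",
]

# Option 2: the rule lives in a separate object, a unique index.
INDEX_DDL = [
    "CREATE TABLE customer (email TEXT NOT NULL)",
    "CREATE UNIQUE INDEX ux_customer_email ON customer (email)",
]


def duplicate_rejected(ddl):
    """Build a fresh in-memory database and report whether a duplicate insert fails."""
    conn = sqlite3.connect(":memory:")
    try:
        for statement in ddl:
            conn.execute(statement)
        conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
        try:
            conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
        except sqlite3.IntegrityError:
            return True
        return False
    finally:
        conn.close()


print(duplicate_rejected(CONSTRAINT_DDL), duplicate_rejected(INDEX_DDL))
```

Both forms enforce the rule; the difference the post cares about is where the rule is declared and, on SQL Server, how each object can be created on a large existing table.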
If you have a business requirement to retain database backups for longer than 35 days, then you have an option to use long-term backup retention. This feature utilises the Azure Recovery Services Vault where you can store up to 10 years’ worth of backups for up to 1000 databases per vault and 25 vaults per subscription.
There are some guidelines that you need to follow to successfully set this up:
Your vault MUST be in the same region, subscription and resource group as your logical SQL Server; otherwise you will not be able to set this up.
Register the vault to the server.
Create a protection policy.
Apply the above policy to the databases that require long-term backup retention.
Arun also looks at restoration options.
As I mentioned in my last post, I am currently in an exploratory phase with my data analytics project. Although I would love to dive in and do some cool predictive analytics or machine learning projects, I really need to continue learning as much about my data as possible before diving into more advanced techniques.
My data exploration process has the following four steps:
Assess the data that I have at a high level
Determine how this data is relevant to the analytics project I want to undertake
Get a general overview of the data characteristics by calculating simple statistics
Understand the “middles” and the “ends” of your numeric data points
There’s some good stuff in here. I particularly appreciate Stacia’s consideration of data exploration as an iterative process.
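The last two steps of that process can be sketched with Python's standard statistics module. This is a minimal sketch that reads the “middles” as mean, median and quartiles and the “ends” as minimum and maximum, which is an interpretation rather than the post's own definition. The function name and the sample values are hypothetical.

```python
import statistics


def summarize(values):
    """Return simple statistics describing the middle and the ends of a numeric list."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points; only the outer two are kept.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": statistics.median(ordered),
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
    }


# Hypothetical sample with one large value at the high end.
sample = [2, 4, 4, 5, 7, 9, 40]
for name, value in summarize(sample).items():
    print(f"{name}: {value}")
```

Comparing the mean with the median is a quick way to see whether extreme values at either end are pulling the average away from the typical value.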
It’s Monday morning and your manager Brent has called his usual emergency all-employee meeting. He looks more than a little bit unhappy, and this time it’s not because someone stole his cruller. Over the weekend he was demonstrating the new anatomy program Mr. Body to some investors and frankly the performance was miserable! Now Brent has only one question.
Who killed Mr. Body’s performance?
We all know Andy Mallon did it.
Then, let’s say the requirements are as follows:
1. No values that are either empty or only spaces
2. No leading spaces
3. No trailing spaces
4. Allow NULL if column allows NULL
Let’s look at how we could implement all of these independently, as there certainly are cases where you may wish to allow any or all of the situations in a column.
Click through for the scripts, as well as a time comparison to see how much overhead you’re adding.
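One way to picture the four requirements is as separate CHECK constraints, so that each can be kept or dropped independently. The sketch below uses SQLite through Python's sqlite3 module as a stand-in; the table, column and constraint names are hypothetical, and these expressions may not carry over verbatim to SQL Server, whose string comparison rules differ, so the linked scripts remain the reference.

```python
import sqlite3

# One CHECK per requirement. A CHECK that evaluates to NULL is not a
# violation, so NULL stays allowed because the column itself is nullable.
DDL = """
CREATE TABLE person (
    nickname TEXT NULL,
    CONSTRAINT ck_not_blank   CHECK (TRIM(nickname) <> ''),
    CONSTRAINT ck_no_leading  CHECK (nickname = LTRIM(nickname)),
    CONSTRAINT ck_no_trailing CHECK (nickname = RTRIM(nickname))
)
"""


def accepted(value):
    """Report whether a value passes every constraint, using a fresh in-memory table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(DDL)
        try:
            conn.execute("INSERT INTO person (nickname) VALUES (?)", (value,))
        except sqlite3.IntegrityError:
            return False
        return True
    finally:
        conn.close()


for candidate in ["Bob", None, "", "   ", " Bob", "Bob "]:
    print(repr(candidate), accepted(candidate))
```

Splitting the rules into separate constraints mirrors the post's point that some columns may need only a subset of them.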