Single Message Transforms (SMT) is a feature of Kafka Connect that enables the transformation … of single messages. Clever naming, right?! Anything more complex, such as aggregating or joining streams of data, should be done with Kafka Streams — but simple transformations can be done within Kafka Connect itself, without needing a single line of code.
SMTs are applied to messages as they flow through Kafka Connect: on the inbound (source) side, the message is modified before it hits Kafka; on the outbound (sink) side, the message in Kafka remains untouched, but the data landed downstream is modified.
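As a quick illustration, here is a sketch of what an SMT looks like in a connector configuration. The transform alias and field names are made up for the example, but `InsertField$Value` is one of the transforms that ships with Kafka Connect:

```properties
# Hypothetical connector config fragment: add a static field to each
# record's value before it lands downstream. InsertField ships with
# Kafka Connect; the alias "addSource" and field values are examples.
transforms=addSource
transforms.addSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addSource.static.field=source
transforms.addSource.static.value=kafka-connect
```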
There’s quite a bit you can do with this, so check it out.
But not only can existing objects be viewed, new ones can be created.
In my last post I created a single pod running SQL Server. I want to move on from that, as you'd generally never deploy just one pod; instead, you would create what's called a deployment.
The dashboard makes it really simple to create deployments. Just click Deployments on the right-hand side menu and fill out the details:
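For comparison, here is roughly the manifest that such a deployment boils down to. The names and the SA password are placeholders, and the image tag is just an example of a SQL Server Linux image:

```yaml
# Minimal Deployment manifest (placeholder names and password) that
# mirrors what the dashboard form produces.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      containers:
      - name: mssql
        image: mcr.microsoft.com/mssql/server:2019-latest
        ports:
        - containerPort: 1433
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          value: "ChangeMe_123"   # placeholder; use a Secret in practice
```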
Check it out; this looks like a good way of managing Kubernetes on a small scale, or of getting an idea of what it can do.
Causes of Overfitting
There are two major situations that could cause overfitting in DTrees:
- Overfitting Due to Presence of Noise – Mislabeled instances may contradict the class labels of other similar records.
- Overfitting Due to Lack of Representative Instances – Lack of representative instances in the training data can prevent refinement of the learning algorithm.
A good model must not only fit the training data well but also accurately classify records it has never seen.
How to avoid overfitting?
There are two major approaches to avoiding overfitting in DTrees:
- approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data; and
- approaches that allow the tree to overfit the data and then post-prune the tree.
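To make the contrast concrete, here is a small, self-contained sketch (toy data and thresholds, no libraries) comparing a fully grown tree, which memorizes a mislabeled training instance, against one that stops growing early:

```python
# Toy illustration: early stopping vs. letting a decision tree memorize noise.
# The data set and thresholds are made up for the example.

def misclassified(points):
    """Number of points a majority-vote leaf would get wrong."""
    labels = [y for _, y in points]
    if not labels:
        return 0
    majority = max(set(labels), key=labels.count)
    return sum(1 for y in labels if y != majority)

def build_tree(points, depth, max_depth):
    labels = [y for _, y in points]
    majority = max(set(labels), key=labels.count)
    if depth >= max_depth or len(set(labels)) == 1:
        return ("leaf", majority)
    best = None
    xs = sorted({x for x, _ in points})
    for lo, hi in zip(xs, xs[1:]):                 # try every midpoint split
        t = (lo + hi) / 2
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        err = misclassified(left) + misclassified(right)
        if best is None or err <= best[0]:         # ties: prefer the later split
            best = (err, t, left, right)
    if best is None:
        return ("leaf", majority)
    _, t, left, right = best
    return ("node", t,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))

def predict(tree, x):
    if tree[0] == "leaf":
        return tree[1]
    _, t, left, right = tree
    return predict(left, x) if x <= t else predict(right, x)

def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)

# True rule: label is 1 when x >= 5; one training instance is mislabeled.
train = [(x, int(x >= 5)) for x in range(10)]
train[3] = (3, 1)                              # the noisy record
test = [(x + 0.5, int(x + 0.5 >= 5)) for x in range(10)]

deep = build_tree(train, 0, max_depth=10)      # grows until every leaf is pure
shallow = build_tree(train, 0, max_depth=1)    # early stopping: one split only

print(accuracy(deep, train), accuracy(deep, test))        # → 1.0 0.9
print(accuracy(shallow, train), accuracy(shallow, test))  # → 0.9 1.0
```

The deep tree carves out a leaf just to fit the mislabeled point (perfect training accuracy, worse test accuracy); the early-stopped tree accepts one training error and generalizes better.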
Click through for more details on these two approaches.
There have already been many posts, articles, and books written about how CALCULATE and FILTER work, so I’m not going to repeat all that information here. Noteworthy resources (by “the Italians,” of course):
In this blog post I’d rather discuss a performance issue I had to tackle at a client. There were quite a lot of measures of the following format:
Click through for a couple iterations of this.
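For flavor, here is a hypothetical example (not the client's actual measure; table, column, and measure names are made up) of the kind of CALCULATE/FILTER pattern that often causes this sort of performance issue:

```dax
-- Iterating a whole table with FILTER forces a row-by-row scan
Sales Red :=
CALCULATE (
    [Total Sales],
    FILTER ( Sales, Sales[Color] = "Red" )
)

-- A typically faster variant: a column predicate, which is shorthand for
-- FILTER over ALL of just that column (note: not always semantically
-- identical to the table-level FILTER above)
Sales Red v2 :=
CALCULATE (
    [Total Sales],
    Sales[Color] = "Red"
)
```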
I’ve been using TRUNCATE TABLE to clear out some temporary tables in a database. It’s a very simple statement to run, but I never really knew why it was so much quicker than a delete statement. So let’s look at some facts:
The TRUNCATE TABLE statement is a DDL operation, whilst DELETE is a DML operation.
TRUNCATE TABLE is useful for emptying temporary tables while leaving the structure in place for more data. To remove the table definition in addition to its data, use the DROP TABLE statement.
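A quick sketch of the three statements side by side (the table name is hypothetical):

```sql
-- Hypothetical staging table
CREATE TABLE dbo.StagingOrders (OrderID int, Amount decimal(10,2));

-- DML: logs each deleted row individually, fires delete triggers,
-- and can take a WHERE clause
DELETE FROM dbo.StagingOrders;

-- DDL: deallocates the data pages with minimal logging; the empty
-- table definition, columns, and indexes remain for reuse
TRUNCATE TABLE dbo.StagingOrders;

-- Removes the table definition as well as the data
DROP TABLE dbo.StagingOrders;
```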
Read on for more details and a couple scripts to test out Richie’s statements.
You look at the numbers again, and now you find that disk latency, which had previously been fine, is completely in the tank during the business day, showing that I/O delays are through the roof. What happened?

This demonstrates the concept of a shifting bottleneck: while CPU use was through the roof, the engine was so bogged down that it couldn’t generate that much I/O, but once the CPU issue was resolved, queries started moving through more quickly until they hit the next choke point at the I/O limit. Odds are that once you resolve the I/O situation, you’ll find a new bottleneck. How do you ever defeat a bad guy who constantly moves around and frequently changes form?
Click through for some pointers on disk latency and trying to figure out when it becomes a problem.
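If you want to check file-level latency yourself, the usual starting point is the `sys.dm_io_virtual_file_stats` DMV; a rough sketch:

```sql
-- Average read/write latency per database file since instance startup.
-- These counters are cumulative, so sample twice and diff the samples
-- to see what latency looks like "right now."
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY avg_read_ms DESC;
```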
Here’s a (SQL) sequel to the Power BI Hurricane Tracker that I did last year for Hurricane Matthew. It’s not 100% perfect, but it gets the job done. It should update every couple of hours; enjoy. Stay safe, my fellow Floridians.
Click here to see the live report
Check it out.
1. Create a group/user “User1” which has only CRUD (Create-Read-Update-Delete) permissions on data in a schema called “Schema1”.
2. Create a group/user “User2” which has the same permissions as “User1” and is also able to create Views/Procedures/Functions in a schema called “Schema2”.
3. The group/user “User1” has to have Select/Execute permissions on all newly created objects in “Schema2”.
Solution: Create a special database role for group/user “User2”.
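The core of that solution, sketched in T-SQL (role names are invented; principal and schema names come from the requirements above). Schema-level grants automatically cover objects created later, which is what requirement 3 needs:

```sql
-- CRUD on Schema1 for User1
CREATE ROLE Schema1Crud;
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Schema1 TO Schema1Crud;
ALTER ROLE Schema1Crud ADD MEMBER User1;

-- User2: same CRUD plus the ability to create objects in Schema2
CREATE ROLE Schema2Developer;
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Schema1 TO Schema2Developer;
GRANT CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION TO Schema2Developer;
GRANT ALTER ON SCHEMA::Schema2 TO Schema2Developer;  -- required to create objects in it
ALTER ROLE Schema2Developer ADD MEMBER User2;

-- User1 can read/execute whatever appears in Schema2, now or in the future
GRANT SELECT, EXECUTE ON SCHEMA::Schema2 TO User1;
```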
Read on for sample scripts, including some tests to ensure we don’t over-grant rights.
Some time ago I came across a strange issue where I found a number of duplicated SQL Agent jobs. The odd thing is that SQL Server will not allow you to have more than one Agent job with the same name; job names must be unique.
This got me scratching my head a little at first, so I started out with some basic checks of the msdb tables.
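A first check along those lines, against the standard msdb job table:

```sql
-- Job names are supposed to be unique, so any name appearing
-- more than once is suspect
SELECT name, COUNT(*) AS copies
FROM msdb.dbo.sysjobs
GROUP BY name
HAVING COUNT(*) > 1;
```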
This is example #5008 of just how poor the SQL Agent database design is. Example #1 is the absurd date-time notation.