Kevin Feasel – Page 1478

Grouping Sets Of Tables In Biml ETL Loads

Published 2017-09-11 by Kevin Feasel

Ben Weissman puts together clusters of tables for data loads:

The table meta.containers could technically also be a temporary table. We’ve decided against that so you can see what’s happening behind the scenes.

Let’s focus on the meta.tables table for now. It has three columns:

– TableName – guess what we’ll store in there
– Container – this one holds the container we want this table to be loaded in; it is populated automatically by our stored procedure
– Cost – this column will hold the load cost of this specific table. In our opinion, this should ideally be the average time it took SSIS to load this table in the recent past. If you don’t have that information available, it might as well be something like the size of this table in either gigabytes or rows. The more accurate this column is, the better your results will be.

The only tricky part in Ben’s code is figuring out appropriate values for Cost, but if you’ve got rough timing measures or even good priors, you can get to a reasonable solution quickly. And if time is of the essence, you can model, simulate, and apply results as part of an analytics project.
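To make the role of the Cost column concrete, here is a minimal Python sketch of one common way to balance tables across containers: a greedy "most expensive first" heuristic. This is not Ben's stored procedure, which is not reproduced here; the function name, table names, and cost values are all hypothetical.

```python
import heapq


def assign_containers(table_costs, container_count):
    """Greedy balancing: take tables from most to least expensive and put
    each one into the container with the lowest running total cost."""
    # Heap entries are (running_total_cost, container_number).
    heap = [(0, number) for number in range(1, container_count + 1)]
    heapq.heapify(heap)

    assignment = {}
    # Sort by descending cost; break ties by table name for stable output.
    ordered = sorted(table_costs.items(), key=lambda item: (-item[1], item[0]))
    for table, cost in ordered:
        total, container = heapq.heappop(heap)
        assignment[table] = container
        heapq.heappush(heap, (total + cost, container))
    return assignment


# Hypothetical costs, e.g. average load time in seconds per table.
costs = {"Sales": 120, "Customers": 45, "Orders": 90, "Products": 30, "Returns": 15}
containers = assign_containers(costs, 2)

# Total cost per container, to check how even the split is.
totals = {}
for table, container in containers.items():
    totals[container] = totals.get(container, 0) + costs[table]
```

With these made-up numbers both containers end up with the same total cost. The heuristic is only as good as the Cost values fed into it, which is the point made above.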

Chart Style Controls

Published 2017-09-11 by Kevin Feasel

Wolfgang Strasser shows off a new feature in Power BI:

The theme documentation provides a list of available visual names, cardNames and property names.

At this point some further explanation is needed for the hierarchy within the theme definition:

– visualName corresponds to available PBI visuals like treeMap, card, columnChart, …
– styleName (as of today I am not sure what this corresponds to in PBI Desktop terms; maybe someone can explain this to me)
– cardName corresponds to the formatting card/option within Power BI Desktop. Attention here: the name in the theme JSON file differs from the name in the user interface, and names are case-sensitive (e.g. general => General; categoryAxis => X-Axis; valueAxis => Y-Axis). See the documentation for the rest of the mapping.
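As a rough illustration of that visualName / styleName / cardName hierarchy, the Python sketch below builds a small theme dictionary and serializes it to JSON. The theme name, the use of "*" as a catch-all styleName, and the specific property names and values are assumptions made for illustration; check the theme documentation for what each card actually accepts.

```python
import json

# A hypothetical theme fragment following the hierarchy
# visualStyles -> visualName -> styleName -> cardName -> [properties].
theme = {
    "name": "Example theme",  # hypothetical theme name
    "visualStyles": {
        "columnChart": {  # visualName
            "*": {  # styleName; "*" is assumed here to mean "any style"
                # cardName "general" maps to the General card in the UI.
                "general": [{"responsive": True}],
                # cardName "categoryAxis" maps to the X-Axis card in the UI.
                "categoryAxis": [{"show": True}],
                # cardName "valueAxis" maps to the Y-Axis card in the UI.
                "valueAxis": [{"show": True}],
            }
        }
    },
}

# Serialize to the JSON text that would go into a theme file.
theme_json = json.dumps(theme, indent=2)
```

Note that the card names are written exactly as the theme file expects them (lower camel case), not as they appear in the Power BI Desktop formatting pane.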

This is good news if it makes it easier for developers to write CVD-friendly reports.

Message Transformation Within Kafka

Published 2017-09-08 by Kevin Feasel

Robin Moffatt shows how to use Single Message Transforms inside Kafka Connect to reshape messages as you send them downstream:

Single Message Transforms (SMT) is a functionality within Kafka Connect that enables the transformation … of single messages. Clever naming, right?! Anything that’s more complex, such as aggregating or joining streams of data, should be done with Kafka Streams, but simple transformations can be done within Kafka Connect itself, without needing a single line of code.

SMTs are applied to messages as they flow through Kafka Connect: inbound, they modify the message before it hits Kafka; outbound, the message in Kafka remains untouched but the data landed downstream is modified.
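For a sense of what "without needing a single line of code" means, the sketch below shows the transform-related properties of a connector configuration as a Python dictionary, using the InsertField transform that ships with Kafka Connect. The alias `addSource`, the field name, and the static value are hypothetical, and the rest of the connector configuration is omitted. The small function underneath is not Kafka code; it only mimics, in plain Python, what such a transform does to one message value.

```python
# Transform-related connector properties only; other settings are omitted.
smt_config = {
    "transforms": "addSource",  # hypothetical alias for the transform
    "transforms.addSource.type":
        "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addSource.static.field": "data_source",  # hypothetical field
    "transforms.addSource.static.value": "example_db",   # hypothetical value
}


def apply_insert_field(value, config, alias):
    """Plain-Python imitation of a static-field insert on one message value.

    This is an illustration of the effect, not how Kafka Connect runs it.
    """
    prefix = f"transforms.{alias}."
    field = config[prefix + "static.field"]
    static_value = config[prefix + "static.value"]
    transformed = dict(value)  # copy, so the original message is untouched
    transformed[field] = static_value
    return transformed


original = {"id": 1, "name": "widget"}
reshaped = apply_insert_field(original, smt_config, "addSource")
```

The original message is left as it was and only the copy handed onward carries the extra field, which mirrors the outbound case described above.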

There’s quite a bit you can do with this, so check it out.

Using The Kubernetes Dashboard

Published 2017-09-08 by Kevin Feasel

Andrew Pruski shows how to set up and use the Kubernetes dashboard inside Azure Container Services:

But not only can existing objects be viewed, new ones can be created.

In my last post I created a single pod running SQL Server, I want to move on from that as you’d generally never just deploy one pod. Instead you would create what’s called a deployment.

The dashboard makes it really simple to create deployments. Just click Deployments on the right-hand side menu and fill out the details:

Check it out; this looks like a good way of managing Kubernetes on the small, or getting an idea of what it can do.

Overfitting On Decision Trees

Published 2017-09-08 by Kevin Feasel

Ramandeep Kaur explains overfitting as well as how to prevent overfitting on decision trees:

Causes of Overfitting

There are two major situations that could cause overfitting in DTrees:

Overfitting Due to Presence of Noise – Mislabeled instances may contradict the class labels of other similar records.

Overfitting Due to Lack of Representative Instances – Lack of representative instances in the training data can prevent refinement of the learning algorithm.

A good model must not only fit the training data well but also accurately classify records it has never seen.

How to avoid overfitting?

There are 2 major approaches to avoid overfitting in DTrees.

– Approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.
– Approaches that allow the tree to overfit the data, and then post-prune the tree.
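The first approach (stopping early) can be sketched with a tiny one-feature decision tree in plain Python. This is a toy written for illustration, not code from Ramandeep's article: the data, the depth limit, and all function names are made up, and post-pruning is not shown. The training set follows the rule "x below 5 is class A" except for one deliberately mislabeled point at x = 2.

```python
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(points, depth=0, max_depth=None):
    """points is a list of (x, label). Returns a label (leaf) or a
    (threshold, left_subtree, right_subtree) tuple."""
    labels = [label for _, label in points]
    # Stop when the node is pure or the depth limit (pre-pruning) is hit.
    if len(set(labels)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority(labels)
    xs = sorted({x for x, _ in points})
    best = None
    for low, high in zip(xs, xs[1:]):
        threshold = (low + high) / 2
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        left_gini = gini([label for _, label in left])
        right_gini = gini([label for _, label in right])
        score = (len(left) * left_gini + len(right) * right_gini) / len(points)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:  # every x is identical, so no split is possible
        return majority(labels)
    _, threshold, left, right = best
    return (threshold,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predict(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def accuracy(tree, points):
    return sum(predict(tree, x) == label for x, label in points) / len(points)


# Made-up data: x = 2 is mislabeled "B" (noise).
train = [(0, "A"), (1, "A"), (2, "B"), (3, "A"), (4, "A"),
         (5, "B"), (6, "B"), (7, "B"), (8, "B"), (9, "B")]
unseen = [(1.8, "A"), (2.2, "A"), (4.2, "A"), (5.5, "B"), (7.5, "B")]

full_tree = build_tree(train)                 # grows until every leaf is pure
pruned_tree = build_tree(train, max_depth=1)  # stops after a single split
```

On this toy data the fully grown tree classifies every training point correctly but carves out a region around the mislabeled point, so it gets the two nearby unseen points wrong. The depth-limited tree misses the noisy training point and classifies all five unseen points correctly.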

Click through for more details on these two approaches.

Calculation And Filtering With DAX

Published 2017-09-08 by Kevin Feasel

Koen Verbeeck is looking to optimize code which uses CALCULATE and FILTER together:

There have already been many posts/articles/books written about the subject of how CALCULATE and FILTER work, so I’m not going to repeat all that information here. Noteworthy resources (by “the Italians” of course):

Filter Arguments in CALCULATE

How CALCULATE works in DAX

From SQL to DAX: Filtering Data

In this blog post I’d rather discuss a performance issue I had to tackle at a client. There were quite a lot of measures of the following format:

CALCULATE(measureX,FILTER(tableY,columnZ = "expression"))

Click through for a couple iterations of this.

Truncation Versus Deletion

Published 2017-09-08 by Kevin Feasel

Richie Lee contrasts two methods of getting rid of data:

I’ve been using TRUNCATE TABLE to clear out some temporary tables in a database. It’s a very simple statement to run, but I never really knew why it was so much quicker than a delete statement. So let’s look at some facts:

– The TRUNCATE TABLE statement is a DDL operation, whilst DELETE is a DML operation.
– TRUNCATE TABLE is useful for emptying temporary tables while leaving the structure in place for more data. To remove the table definition in addition to its data, use the DROP TABLE statement.

Read on for more details and a couple scripts to test out Richie’s statements.

I/O Latency And Performance Tuning

Published 2017-09-08 by Kevin Feasel

Andy Galbraith is starting a new toolbox series. His first post is an introduction and a look at drive latency:

You look at the numbers again, and now you find that disk latency, which had previously been fine, is now completely in the tank during the business day, showing that I/O delays are through the roof.

What happened?

This demonstrates the concept of a shifting bottleneck – while CPU use was through the roof, the engine was so bogged down that it couldn’t generate that much I/O, but once the CPU issue was resolved, queries started moving through more quickly until the next choke point was met at the I/O limit. Odds are that once you resolve the I/O situation, you would find a new bottleneck.

How do you ever defeat a bad guy that constantly moves around and frequently changes form?

Click through for some pointers on disk latency and trying to figure out when it becomes a problem.

Hurricane Tracking With Power BI

Published 2017-09-08 by Kevin Feasel

Chris Albrektson has updated his hurricane tracker to watch Irma:

Here’s a (SQL) to the Power BI Hurricane tracker that I did last year for Hurricane Matthew. It’s not 100% perfect but it gets the job done. It should update every couple of hours, enjoy. Stay safe my fellow Floridians.

Click here to see the live report

Check it out.

Rights And Roles In SQL Server

Published 2017-09-08 by Kevin Feasel

Slava Murygin walks us through rights assignment with roles:

Problem description:
1. Need to create a group/user “User1”, which has to have only CRUD (Create-Read-Update-Delete) permissions for data in schema called “Schema1”.
2. Need to create a group/user “User2”, which has to have similar permissions as “User1” and has to be able to create Views/Procedures/Functions in schema called “Schema2”.
3. The group/user “User1” has to have Select/Execute permissions for all newly created objects in “Schema2”.

Solution: Create a special database role for group/user “User2”.

Read on for sample scripts, including some tests to ensure we don’t over-grant rights.

Author: Kevin Feasel