Day: October 12, 2021

Detecting Hard-to-Classify Data

Kaushal Mukherjee takes us through a new Python package:

The article explains the algorithm behind the recently introduced Python package named PyHard, based on the concept of Instance Space Analysis. It helps in assessing the quality of a dataset and identifying what are the instances which are hard/easy to classify. With the help of this algorithm we can separate out noisy instances. It also provides an interactive visualization tool to deep dive into the instance space.

Click through for the details. I’m going to wait for PyHard 2: PyHarder. Or maybe PyHardWithAVengeance. But it’ll all go downhill by the time we get to PyHard 5.
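
To make "hard to classify" a little more concrete, here is a minimal sketch of one well-known instance hardness measure, k-Disagreeing Neighbors: the fraction of an instance's nearest neighbors that carry a different label. This is an illustration only. The excerpt does not say which measures PyHard computes or how, and the function name, toy data, and choice of k below are assumptions made for the example.

```python
import math


def kdn_hardness(points, labels, k=3):
    """For each instance, return the fraction of its k nearest neighbours
    (by Euclidean distance) whose label differs from its own."""
    scores = []
    for i, p in enumerate(points):
        # Distance to every other instance, excluding the instance itself.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours = [j for _, j in others[:k]]
        disagree = sum(labels[j] != labels[i] for j in neighbours)
        scores.append(disagree / len(neighbours))
    return scores


# Hypothetical toy data: two tight clusters, plus one "a" instance
# sitting in the middle of the "b" cluster (a likely noisy label).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.05, 5.05)]
labels = ["a", "a", "a", "a", "b", "b", "b", "a"]

hardness = kdn_hardness(points, labels, k=3)
hardest = max(range(len(points)), key=lambda i: hardness[i])
print(hardest, hardness[hardest])
```

Instances with a score near 1 are surrounded by the other class, which is the kind of instance you would inspect as possible label noise. Scores near 0 mark the easy ones.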

Scaling Out vs Scaling Up

Jordan Braiuka compares two models for scaling:

We often get questions from customers about the best way to add capacity to their cluster. Is it better to add nodes, or simply to increase the capacity in their nodes? Unfortunately, the truth is there is no best way—like all complex issues in distributed systems, there are benefits and drawbacks to each scaling approach. 

While each of our highly distributed systems (Apache Cassandra, Apache Kafka, etc.) have slightly different implementations of scaling, the concepts remain consistent across most distributed systems.

Click through for a comparison between the two approaches. As the article indicates, both are meaningful strategies and your choice might come down to a combination of the technology stack and the problem at hand.

SSMS Tools Pack 5.7

Mladen Prajdic announces SSMS Tools Pack 5.7.0.0:

Run script on multiple targets
Improved: Show databases that are not in normal state and are not accessible.
SQL History
Improved: Enable showing confirmation dialog when restoring the last session.
Search Results in Grid Mode
Improved: Added warning that searching through large result sets will block SSMS.

These are the improvements; click through for the list of what’s new.

Testing sp_ineachdb

Aaron Bertrand takes us to the Island of Misfit Databases:

The only database that requires extra handling is the one that contains a tab, because SQL Server doesn’t know how to generate file names when that character is present. I am sure there are a bunch of other less common but equally exotic characters that may cause the same problem.

This is how I actually tested sp_ineachdb, to make sure it was ready for just about any bad idea anyone used to name a database, and could handle various possible database states (for a lot more background on this procedure, and why it is better than the undocumented, unsupported, and buggy sp_msforeachdb, see this and this). Here you can see that the procedure works against all these poorly-named databases, and skips databases that are inaccessible (rather than raise an exception).

Click through to see the list of databases Aaron uses. Technically, I think Aaron’s blog post also counts as a Halloween post because some of those databases are spooky.
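
One reason awkward database names break scripts is identifier quoting. As a rough sketch of the idea, the Python below mirrors the bracket-escaping rule of T-SQL's QUOTENAME with its default delimiter: wrap the name in square brackets and double any closing bracket inside it. The helper names and the sample database names are made up for illustration. The excerpt does not show how sp_ineachdb quotes names internally, and the sketch does not model QUOTENAME's 128-character input limit.

```python
def quote_name(name: str) -> str:
    """Bracket-quote an identifier, doubling any closing bracket inside it."""
    return "[" + name.replace("]", "]]") + "]"


def use_statement(database: str) -> str:
    """Build a USE statement that survives spaces, brackets, and quotes."""
    return f"USE {quote_name(database)};"


# Hypothetical examples of names that would break naive concatenation.
awkward_names = ["my database", "a]b", "'; DROP TABLE x; --"]

for db in awkward_names:
    print(use_statement(db))
```

A name such as `a]b` becomes `[a]]b]`, so the bracket inside the name can no longer end the identifier early.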

Star Schemas versus Header-Detail Tables in Power BI

Marco Russo and Alberto Ferrari lay out another proof that the star schema is the right schema for Power BI:

We have already shown in a previous article (Power BI – Star schema or single table – SQLBI) how the star schema proves to be the best option when compared with a single table model. Single-table models are the evil: do not be tempted by them, choose a star schema.

In this article, I want to show you an example in the opposite direction. A single table model denormalizes everything in one table, and we already learned that it is bad. But what if we keep a more normalized structure, as it often happens in header/detail models (like orders and order lines)? Is a header/detail model better than a star schema? The quick answer is: “No. Nope. No way. Not at all. Are you kidding me? No.”. Nonetheless, this might be just our personal opinion. The goal of the article is to provide you with some numbers and considerations to prove the previous statement.

Read on and you make the call.

Searching T-SQL Objects

Rob Farley has a quick script to find references in SQL Server:

As a consultant, the kind of work that I do from customer to customer can change a bit. Sometimes I’m reviewing people’s environments; sometimes I’m performance tuning; sometimes I’m developing code or reports or cubes; sometimes I’m writing T-SQL, but it’s often DAX or PowerShell.

Click through for a quick script to search modules for a particular string.
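
The linked script is T-SQL and is not reproduced here. As a rough illustration of the underlying idea only, the Python sketch below does a case-insensitive substring search over module definitions held in a dictionary. In SQL Server the definition text of procedures, views, and functions is exposed through the sys.sql_modules catalog view. The module names, definitions, and the find_modules helper below are hypothetical stand-ins.

```python
def find_modules(definitions, needle):
    """Return the names of modules whose definition contains needle,
    ignoring case, in sorted order."""
    needle = needle.casefold()
    return sorted(
        name for name, body in definitions.items()
        if needle in body.casefold()
    )


# Hypothetical module definitions standing in for catalog view contents.
modules = {
    "dbo.GetOrders": "CREATE PROCEDURE dbo.GetOrders AS SELECT * FROM dbo.Orders;",
    "dbo.vCustomers": "CREATE VIEW dbo.vCustomers AS SELECT * FROM dbo.Customers;",
    "dbo.OrderTotal": "CREATE FUNCTION dbo.OrderTotal() RETURNS int AS "
                      "BEGIN RETURN (SELECT COUNT(*) FROM dbo.orders) END;",
}

matches = find_modules(modules, "dbo.Orders")
print(matches)
```

A plain substring match like this also catches the search term inside comments and string literals, so treat the results as candidates to review, not confirmed references.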