Press "Enter" to skip to content

Day: April 5, 2021

Displaying Metrics on Graphite Dashboards

Nick Campion takes us through working with Graphite:

Graphite is a free and open-source software. It is used as a time-series database monitoring tool, where you can collect, store and display time-series data in real-time. As you can monitor certain metrics of this data using Graphite, it has a very useful and simple dashboard used to visualize these metrics.

This article will show you how to display a metric on your Graphite dashboard.

Click through for more information.

Comments closed

Kafka Sans ZooKeeper

Ben Stopford and Ismael Juma give us a preview:

So we’re very pleased to say that the early access of the KIP-500 code has been committed to trunk and is expected to be included in the upcoming 2.8 release. For the first time, you can run Kafka without ZooKeeper. We call this the Kafka Raft Metadata mode, typically shortened to KRaft (pronounced like craft) mode.

Beware, there are some features that are not available in this early-access release. We do not yet support the use of ACLs and other security features or transactions. Also, both partition reassignment and JBOD are unsupported in KRaft mode (these are anticipated to be available in an Apache Kafka release later in the year). Hence, consider the quorum controller experimental software—we don’t advise subjecting it to production workloads. If you do try out the software, however, you’ll find a host of new advantages: It’s simpler to deploy and operate, you can run Kafka in its entirety as a single process, and it can accommodate significantly more partitions per cluster (see measurements below).

Read on for more information. This is a big deal for Kafka.

Comments closed

Describing the Physical Join Operators

Deepthi Goguri explains the three physical join operators in SQL Server:

Merge join is not a bad thing and it may be efficient already in your execution plan. What you have to observe when you see the merge joins and performance slow on that plan is to focus on the upstream operations that are going into the merge join. Whether the data is presorted as you already have an index or whether the data is presorted in SQL Server own way then in that case, you can simply check if you can just add that missing column in the index and place in the last key column in the index or use a different join algorithm will be better. The other scenario might be you have lots of duplicate values in your data. If that is the case SQL Server will be using the work tables to handle how the duplicate values can be joined on. So, if you see the duplicate values or using tempdb, then finding the better options will be good.

Click through for more detail. Each physical operator has its place and does quite well within it, but the challenge comes when the optimizer thinks a particular route is better than the one you had in mind.

Comments closed

Comparing CSV to Parquet File Loading Performance in Power BI

Chris Webb has a comparison for us:

Earlier in this series on importing data from ADLSgen2 into Power BI I showed how partitioning a table in your dataset can improve refresh performance. In that post I used CSV files in ADLSgen2 as my source and created one partition per CSV file, but after my recent discovery that importing data from multiple Parquet files can be tuned to be a lot faster than importing data from CSV files, I decided to try creating partitions linked to Parquet files instead.

Click through for the experiment and its results.

Comments closed

Query Plans and Window Functions

Erik Darling has a two-fer here. First, window functions and parallelism:

When windowing functions don’t have a Partition By, the parallel zone ends much earlier on than it does with one.

That doesn’t mean it’s always slower, though. My general experience is the opposite, unless you have a good supporting index.

But “good supporting index” is for tomorrow. You’re just going to have to deal with that.

Second, columnstore behavior with respect to window functions:

Not only is the parallel version of the row mode plan a full second slower, but… look at that batch mode plan.

Look at it real close. There’s a sort before the Window Aggregate, despite reading from the same nonclustered index that the row mode plan uses.

But the row mode plan doesn’t have a Sort in it. Why?

Check out both posts.

Comments closed

Automated File Cleanup with Powershell

Jeffrey Hicks does some spring cleaning:

Springtime is approaching in North America. Where I live, the snow has finally melted and we have blue skies with warmer temperatures. Of course, this means Spring Cleaning. Time to clear out the winter debris and spruce up the house. For me, this is also a good time for some computing housecleaning as well. I don’t know about your Windows environment, but I tend to accumulate a lot of junk. Most of the time I don’t see it, but I know it’s there. While the junk normally doesn’t have a negative impact, I think mentally, I like clearing things out and tidying up. So I pulled out some older PowerShell code, freshened it up, and now I have a set of tools for clearing out junk and temporary folders. Let me show you what I came up with.

Click through for the scripts.

Comments closed