Press "Enter" to skip to content

Month: June 2020

Cost-Cutting in Confluent Platform

Nick Bryan shares some techniques for reducing the cost of running on Confluent Platform:

To start, there are several Confluent Platform features that can greatly reduce your Kafka cluster’s infrastructure footprint. For use cases involving high data ingestion rates, lengthy data retention periods, or stringent disaster recovery requirements, Confluent Platform can help to reduce infrastructure costs by up to 50%.

One of the most important features for this cost category is Tiered Storage.
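
If you want to see roughly what that looks like in practice, here is a loose sketch (not Confluent's documented workflow) of creating a topic with Confluent Platform's topic-level tiered storage settings via the Python AdminClient. The broker address, topic name, and retention values are placeholders, and the cluster itself must already have Tiered Storage enabled.

from confluent_kafka.admin import AdminClient, NewTopic

# Assumed broker address and topic name; adjust for your environment.
admin = AdminClient({"bootstrap.servers": "broker:9092"})

topic = NewTopic(
    "clickstream",
    num_partitions=6,
    replication_factor=3,
    config={
        "confluent.tier.enable": "true",              # offload closed log segments to object storage
        "confluent.tier.local.hotset.ms": "3600000",  # keep roughly an hour of data on local disk
        "retention.ms": "2592000000",                 # 30 days of total retention
    },
)

for name, future in admin.create_topics([topic]).items():
    future.result()  # raises if the topic could not be created
    print(f"Created topic {name}")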

Read on for a few tips.

Good Practices for Naming Things in Power BI

Chris Webb shares some thoughts on the power of names:

What’s wrong with this picture? Look at the names:

– The tables and columns have the same names that they had in the data source, in this case a SQL Server database. Note the table name prefixes of “Dim” for dimensions and “Fact” for fact tables.
– The column and measure names either don’t have spaces or use underscores instead of spaces.
– What on earth does the measure name _PxSysF even mean?

Chris mentions that some of the ideas in the post may be controversial, but to be honest, I don’t think any of them are. The important thing here is to keep your audience in mind.

Mounting a Disk Image in PowerShell

Jack Vamvas shows us how we can mount a disk image from ISO in PowerShell:

I want to set up a script to mount a disk in an automated way utilising PowerShell. The image exists as an ISO on a network path and needs to be made available as a drive letter & path. It doesn't have to be a dedicated drive letter – just the next letter after the highest. So, for example, if I already have E:, F:, G:, then I want it to be set as I:.

For no extra charge, Jack also shows us how to dismount a disk image.
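
If you need to drive the same operation from a Python script, one rough option is to shell out to the standard Mount-DiskImage and Dismount-DiskImage cmdlets. The network path below is a placeholder, and Windows picks the next free drive letter automatically.

import subprocess

iso_path = r"\\fileserver\images\example.iso"  # placeholder network path

# Mount the ISO and report the drive letter Windows assigned to it.
result = subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command",
     f"(Mount-DiskImage -ImagePath '{iso_path}' -PassThru | Get-Volume).DriveLetter"],
    capture_output=True, text=True, check=True,
)
print("Mounted as drive:", result.stdout.strip())

# Dismounting later works the same way:
subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command",
     f"Dismount-DiskImage -ImagePath '{iso_path}'"],
    check=True,
)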

Backups on AWS RDS

Grant Fritchey shows how you can back up a database on Amazon’s RDS:

Which results in the following:

Msg 262, Level 14, State 1, Line 1
BACKUP DATABASE permission denied in database ‘HamShackRadio’.
Msg 3013, Level 16, State 1, Line 1
BACKUP DATABASE is terminating abnormally.

Completion time: 2020-06-26T08:34:23.5511314-04:00

In short, by default, you can't back up SQL Server databases on RDS. However, that's by default. We can make some changes.

Read on to see the proper way of backing up a database hosted in RDS.
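
Grant has the details, but for reference, RDS exposes its native backup feature as a stored procedure, so it can be called from client code as well. Here is a rough pyodbc sketch, where the connection string, database name, and S3 ARN are placeholders and the instance needs the SQLSERVER_BACKUP_RESTORE option group attached.

import pyodbc

# Placeholder connection details for the RDS instance.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-instance.abc123.us-east-1.rds.amazonaws.com,1433;"
    "DATABASE=msdb;UID=admin;PWD=placeholder",
    autocommit=True,
)
cursor = conn.cursor()

# Kick off a native backup to S3 via AWS's documented procedure.
cursor.execute(
    """
    EXEC msdb.dbo.rds_backup_database
        @source_db_name = ?,
        @s3_arn_to_backup_to = ?,
        @overwrite_S3_backup_file = 1;
    """,
    "HamShackRadio",
    "arn:aws:s3:::my-backup-bucket/HamShackRadio.bak",
)

# The backup runs asynchronously; the procedure returns a task row that can be
# polled later with msdb.dbo.rds_task_status.
for row in cursor.fetchall():
    print(row)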

Another Way to Calculate Elapsed Business Hours with DAX

Matt Allington follows up on a previous post:

Then, sometimes (like this time) I discover that someone has a better way to solve the same problem that I shared on my blog. This is what happened last week after I shared my first article about how to calculate the total business hours between 2 date/time stamps. I shared the way I solved this problem last week, but one of my readers, Daniil Bogomazov, shared a brilliant alternative solution to the same problem. The solution is so good that I am sharing his solution with you here today.

Read on for a clever solution and a detailed comparison to Matt’s prior answer.

The Basics of Spark Streaming

Muskan Gupta gives us an introduction to Spark Streaming:

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. It was added to Apache Spark in 2013. We can get data from many sources, such as Kafka, Flume, etc., and process it using functions such as map, reduce, etc. After processing, we can push data to filesystems, databases, and even live dashboards.

In Spark Streaming we work on near real-time data. It divides the received input stream into batches. The Spark engine processes the batches and generates the final output in batches.
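
For a feel of that micro-batch model, here is the classic DStream word count in PySpark; the hostname and port are placeholders.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, batchDuration=5)  # divide the stream into 5-second batches

# One possible source; Kafka, Flume, etc. are also supported.
lines = ssc.socketTextStream("localhost", 9999)

counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Each batch's output could also be pushed to a filesystem, database, or dashboard.
counts.pprint()

ssc.start()
ssc.awaitTermination()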

Read on to understand the key mechanisms behind Spark Streaming.

Tips for Optimizing Columnstore Indexes

Ed Pollack continues a series on columnstore indexes:

This is worth a second mention: Avoid updates at all costs! Columnstore indexes do not treat updates efficiently. Sometimes they will perform well, especially against smaller tables, but against a large columnstore index, updates can be extremely expensive.

If data must be updated, structure it as a single delete operation followed by a single insert operation. This will take far less time to execute, cause less contention, and consume far fewer system resources.
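
As a rough illustration of that pattern from Python (the table, staging table, and column names here are hypothetical), a single set-based delete followed by a single set-based insert might look like this:

import pyodbc

# Placeholder connection; assumes changed rows have been staged in dbo.FactSales_Changes.
conn = pyodbc.connect("DSN=MyWarehouse", autocommit=False)
cursor = conn.cursor()
try:
    # One set-based DELETE of the rows that would otherwise be updated...
    cursor.execute("""
        DELETE f
        FROM dbo.FactSales AS f
        WHERE EXISTS (SELECT 1 FROM dbo.FactSales_Changes AS c
                      WHERE c.SaleKey = f.SaleKey);
    """)
    # ...followed by one set-based INSERT of the new versions of those rows.
    cursor.execute("""
        INSERT INTO dbo.FactSales (SaleKey, SaleDate, Amount)
        SELECT c.SaleKey, c.SaleDate, c.Amount
        FROM dbo.FactSales_Changes AS c;
    """)
    conn.commit()
except Exception:
    conn.rollback()
    raise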

Read on for several more tips along these lines.
