Press "Enter" to skip to content

Month: June 2020

Cost-Cutting in Confluent Platform

Nick Bryan shares some techniques for reducing the cost of running on Confluent Platform:

To start, there are several Confluent Platform features that can greatly reduce your Kafka cluster’s infrastructure footprint. For use cases involving high data ingestion rates, lengthy data retention periods, or stringent disaster recovery requirements, Confluent Platform can help to reduce infrastructure costs by up to 50%.

One of the most important features for this cost category is Tiered Storage.
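
If you want to see roughly what that looks like in practice, here is a loose sketch (not Confluent's documented workflow) of creating a topic with Confluent Platform's topic-level tiered storage settings via the Python AdminClient. The broker address, topic name, and retention values are placeholders, and the cluster itself must already have Tiered Storage enabled.

from confluent_kafka.admin import AdminClient, NewTopic

# Assumed broker address and topic name; adjust for your environment.
admin = AdminClient({"bootstrap.servers": "broker:9092"})

topic = NewTopic(
    "clickstream",
    num_partitions=6,
    replication_factor=3,
    config={
        "confluent.tier.enable": "true",              # offload closed log segments to object storage
        "confluent.tier.local.hotset.ms": "3600000",  # keep roughly an hour of data on local disk
        "retention.ms": "2592000000",                 # 30 days of total retention
    },
)

for name, future in admin.create_topics([topic]).items():
    future.result()  # raises if the topic could not be created
    print(f"Created topic {name}")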

Read on for a few tips.

Good Practices for Naming Things in Power BI

Chris Webb shares some thoughts on the power of names:

What’s wrong with this picture? Look at the names:

– The tables and columns have the same names that they had in the data source, in this case a SQL Server database. Note the table name prefixes of “Dim” for dimensions and “Fact” for fact tables.
– The column and measure names either don’t have spaces or use underscores instead of spaces.
– What on earth does the measure name _PxSysF even mean?

Chris mentions that some of the ideas in the post may be controversial, but to be honest, I don’t think any of them are. The important thing here is to keep your audience in mind.

Mounting a Disk Image in PowerShell

Jack Vamvas shows us how we can mount a disk image from ISO in PowerShell:

I want to set up a script to mount a disk in an automated way utilising PowerShell. The image exists as an ISO on a network path and needs to be made available as a drive letter & path. It doesn't have to be a dedicated drive letter – just the next letter after the highest. So, for example, if I already have E:, F:, G:, then I want it to be set as I:.

For no extra charge, Jack also shows us how to dismount a disk image.
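
If you need to drive the same operation from a Python script, one rough option is to shell out to the standard Mount-DiskImage and Dismount-DiskImage cmdlets. The network path below is a placeholder, and Windows picks the next free drive letter automatically.

import subprocess

iso_path = r"\\fileserver\images\example.iso"  # placeholder network path

# Mount the ISO and report the drive letter Windows assigned to it.
result = subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command",
     f"(Mount-DiskImage -ImagePath '{iso_path}' -PassThru | Get-Volume).DriveLetter"],
    capture_output=True, text=True, check=True,
)
print("Mounted as drive:", result.stdout.strip())

# Dismounting later works the same way:
subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command",
     f"Dismount-DiskImage -ImagePath '{iso_path}'"],
    check=True,
)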

Backups on AWS RDS

Grant Fritchey shows how you can back up a database on Amazon’s RDS:

Which results in the following:

Msg 262, Level 14, State 1, Line 1
BACKUP DATABASE permission denied in database ‘HamShackRadio’.
Msg 3013, Level 16, State 1, Line 1
BACKUP DATABASE is terminating abnormally.

Completion time: 2020-06-26T08:34:23.5511314-04:00

In short, by default, you can't back up SQL Server databases on RDS. However, that's by default. We can make some changes.

Read on to see the proper way of backing up a database hosted in RDS.
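
Grant has the details, but for reference, RDS exposes its native backup feature as a stored procedure, so it can be called from client code as well. Here is a rough pyodbc sketch, where the connection string, database name, and S3 ARN are placeholders and the instance needs the SQLSERVER_BACKUP_RESTORE option group attached.

import pyodbc

# Placeholder connection details for the RDS instance.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-instance.abc123.us-east-1.rds.amazonaws.com,1433;"
    "DATABASE=msdb;UID=admin;PWD=placeholder",
    autocommit=True,
)
cursor = conn.cursor()

# Kick off a native backup to S3 via AWS's documented procedure.
cursor.execute(
    """
    EXEC msdb.dbo.rds_backup_database
        @source_db_name = ?,
        @s3_arn_to_backup_to = ?,
        @overwrite_S3_backup_file = 1;
    """,
    "HamShackRadio",
    "arn:aws:s3:::my-backup-bucket/HamShackRadio.bak",
)

# The backup runs asynchronously; the procedure returns a task row that can be
# polled later with msdb.dbo.rds_task_status.
for row in cursor.fetchall():
    print(row)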

Another Way to Calculate Elapsed Business Hours with DAX

Matt Allington follows up on a previous post:

Then, sometimes (like this time) I discover that someone has a better way to solve the same problem that I shared on my blog. This is what happened last week after I shared my first article about how to calculate the total business hours between 2 date/time stamps. I shared the way I solved this problem last week, but one of my readers, Daniil Bogomazov, shared a brilliant alternative solution to the same problem. The solution is so good that I am sharing his solution with you here today.

Read on for a clever solution and a detailed comparison to Matt’s prior answer.

The Basics of Spark Streaming

Muskan Gupta gives us an introduction to Spark Streaming:

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. It was added to Apache Spark in 2013. We can get data from many sources, such as Kafka, Flume, etc., and process it using functions such as map, reduce, etc. After processing, we can push data to filesystems, databases, and even live dashboards.

In Spark Streaming we work on near real-time data. It divides the received input stream into batches. The Spark engine processes the batches and generates the final output in batches.
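
For a feel of that micro-batch model, here is the classic DStream word count in PySpark; the hostname and port are placeholders.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, batchDuration=5)  # divide the stream into 5-second batches

# One possible source; Kafka, Flume, etc. are also supported.
lines = ssc.socketTextStream("localhost", 9999)

counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Each batch's output could also be pushed to a filesystem, database, or dashboard.
counts.pprint()

ssc.start()
ssc.awaitTermination()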

Read on to understand the key mechanisms behind Spark Streaming.

Tips for Optimizing Columnstore Indexes

Ed Pollack continues a series on columnstore indexes:

This is worth a second mention: Avoid updates at all costs! Columnstore indexes do not treat updates efficiently. Sometimes they will perform well, especially against smaller tables, but against a large columnstore index, updates can be extremely expensive.

If data must be updated, structure it as a single delete operation followed by a single insert operation. This will take far less time to execute, cause less contention, and consume far fewer system resources.
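
As a rough illustration of that pattern from Python (the table, staging table, and column names here are hypothetical), a single set-based delete followed by a single set-based insert might look like this:

import pyodbc

# Placeholder connection; assumes changed rows have been staged in dbo.FactSales_Changes.
conn = pyodbc.connect("DSN=MyWarehouse", autocommit=False)
cursor = conn.cursor()
try:
    # One set-based DELETE of the rows that would otherwise be updated...
    cursor.execute("""
        DELETE f
        FROM dbo.FactSales AS f
        WHERE EXISTS (SELECT 1 FROM dbo.FactSales_Changes AS c
                      WHERE c.SaleKey = f.SaleKey);
    """)
    # ...followed by one set-based INSERT of the new versions of those rows.
    cursor.execute("""
        INSERT INTO dbo.FactSales (SaleKey, SaleDate, Amount)
        SELECT c.SaleKey, c.SaleDate, c.Amount
        FROM dbo.FactSales_Changes AS c;
    """)
    conn.commit()
except Exception:
    conn.rollback()
    raise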

Read on for several more tips along these lines.
