April 2021 – Page 13

Granular Deployment of Power BI Changes with ALM Toolkit

Published 2021-04-07 by Kevin Feasel

Gilbert Quevauvilliers shows off an interesting scenario:

In this blog post I am going to demonstrate how to make a granular deployment where I will create a new column in my City table, and only deploy those changes.
What this means is that by deploying only the column change to my PPU dataset, I am only updating the column in the table.
This now saves me from the following tasks that I previously had to do:
– Time taken to refresh the PBIX file so that the data is up to date.
– Re-uploading my PBIX.
– If configured, re-creating the incremental refresh.
– Time and effort to upload and wait for dataset refresh.
– Quick updates to my dataset.
I will not have to worry about saving my PBIX file and, if configured, re-creating the incremental refresh. This saves me a lot of time and effort.

Click through to see those steps in action.

Sending Large Messages in Kafka

Published 2021-04-06 by Kevin Feasel

The Hadoop in Real World team shows how you can send large messages in Apache Kafka:

By default the messages you can send and manage in Kafka should be less than 1 MB. To increase this limit there are a few properties you need to change in both brokers and consumers.
Let’s say your messages can be up to 10 MB. So in this case your Kafka producers are producing messages up to 10 MB. So your Kafka Brokers and consumers should be able to store and receive messages up to 10 MB respectively.
Kafka Producer sends messages up to 10 MB ==> Kafka Broker allows, stores and manages messages up to 10 MB ==> Kafka Consumer receives messages up to 10 MB

Click through to see how, but also recognize that it’s usually a really bad idea to push large messages in a broker system. Even 1 MB is probably going too far; I’d try to stay under 1 KB if possible.
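
For a sense of which knobs are involved, here is a minimal sketch in Python that simply collects the relevant settings as plain dictionaries. The property names are standard Kafka configuration keys as far as I know, and the 10 MB figure comes from the example above. The `undersized` helper and the dictionary layout are my own illustration, not something from the linked post, so check your Kafka version's documentation before relying on any of it.

```python
# Sketch only: the 10 MB limit mirrors the quoted example; the helper and
# the dictionary layout are hypothetical, for illustration.
MAX_MESSAGE_BYTES = 10 * 1024 * 1024  # 10 MB

# Broker side: accept, store, and replicate messages up to the limit.
broker_config = {
    "message.max.bytes": MAX_MESSAGE_BYTES,
    "replica.fetch.max.bytes": MAX_MESSAGE_BYTES,
}

# Producer side: allow requests large enough to carry one such message.
producer_config = {
    "max.request.size": MAX_MESSAGE_BYTES,
}

# Consumer side: allow fetches large enough to receive one such message.
consumer_config = {
    "max.partition.fetch.bytes": MAX_MESSAGE_BYTES,
}


def undersized(configs, required_bytes):
    """Return 'role:property' names whose value is below required_bytes."""
    return [
        f"{role}:{name}"
        for role, props in configs.items()
        for name, value in props.items()
        if value < required_bytes
    ]


configs = {
    "broker": broker_config,
    "producer": producer_config,
    "consumer": consumer_config,
}
problems = undersized(configs, MAX_MESSAGE_BYTES)
```

The point of the check is the chain in the quote: if any one link (producer, broker, or consumer) is left at a smaller limit, the large message does not make it through.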

Visualizing Time Spent on Zoom Calls

Published 2021-04-06 by Kevin Feasel

John Lafleur has a method to ensure depression:

In this article, we will show you how you can understand how much your team leverages Zoom, or spends time in meetings, in a couple of minutes. We will be using Airbyte (an open-source data integration platform) and Tableau (business intelligence and analytics software) for this tutorial.

Click through for the tutorial.

sparklyr 1.6 Released

Published 2021-04-06 by Kevin Feasel

Carly Driggers announces a new release of sparklyr:

Sparklyr, an LF AI & Data Foundation Incubation Project, has released version 1.6! Sparklyr is an R Language package that lets you analyze data in Apache Spark, the well-known engine for big data processing, while using familiar tools in R. The R Language is widely used by data scientists and statisticians around the world and is known for its advanced features in statistical computing and graphics.

Click through to see the changes.

Synapse Studio in 5 Minutes

Published 2021-04-06 by Kevin Feasel

Kevin Chant wants 4 minutes and 58 seconds of your time:

In this post I want to do a five minute crash course about Synapse Studio. Because I have recently been asked to do this by colleagues.
In addition, I want to clear up some confusion about what you need to do before you can access Synapse Studio.
The aim of this post is for you to have a better overview of Synapse Studio within five minutes, which happens to be the estimated reading time of this post.

Click through and be sure to start the stopwatch.

Embracing the XML

Published 2021-04-06 by Kevin Feasel

Grant Fritchey has some advice:

While XML is, without a doubt, a giant pain in the bottom, sometimes, the best way to deal with Extended Events is to simply embrace the XML.
Now, I know, just last week, I suggested ways to avoid the XML. I will freely admit, that is my default position. If I can avoid the XML, I will certainly do it. However, there are times where just embracing the XML works out nicely. Let’s talk about it a little.

Just need to do a little victory dance here. I didn’t explicitly say “embrace the XML” but close enough…

I think the biggest problem DBAs have with XML is that they end up treating it like a dreadful task: I need to shred XML for an extended event. But to do that, I have to learn how to query it using this quasi-language, and so they get stuck trying to fuss with something somebody else did, moving symbols around in the hopes that they get the right incantation. By contrast, a day or two really focusing in on how XQuery and XPath work would clarify a lot and make the process much simpler.

There is a fair counter-point in asking how often you’ll use this, and if the answer is “probably never,” then poke through and just try to get it working. But I’ve got a bit of bad news: “probably never” is probably wrong.
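
To show how little is involved once the path expressions click, here is a small sketch. It uses Python's `xml.etree.ElementTree` and its limited XPath subset as a stand-in for XQuery in SQL Server, so the syntax is not what you would type in T-SQL. The sample payload is made up for illustration (shaped like an Extended Events `event` element with `data` and `action` children), and `shred_event` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up payload for illustration, shaped like an Extended Events event.
SAMPLE_EVENT = """
<event name="sql_statement_completed" timestamp="2021-04-06T12:00:00.000Z">
  <data name="duration"><value>1500</value></data>
  <data name="logical_reads"><value>42</value></data>
  <action name="sql_text"><value>SELECT 1;</value></action>
</event>
"""


def shred_event(xml_text):
    """Flatten one event element into a dictionary of strings."""
    event = ET.fromstring(xml_text.strip())
    row = {"event": event.get("name"), "timestamp": event.get("timestamp")}
    # Each data/action child carries its name as an attribute and its
    # payload in a nested <value> element.
    for node in event.findall("./data") + event.findall("./action"):
        row[node.get("name")] = node.findtext("value")
    return row


row = shred_event(SAMPLE_EVENT)

# A single targeted path expression, closer to what you would write by hand:
# "the value under the data element whose name attribute is 'duration'".
event = ET.fromstring(SAMPLE_EVENT.strip())
duration = event.findtext("./data[@name='duration']/value")
```

The same idea carries over: once you can read "the `data` element whose `name` is `duration`, then its `value`" as a path, the incantations stop being incantations.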

Deploying an Azure Arc Enabled Data Services Controller

Published 2021-04-06 by Kevin Feasel

Chris Adkin continues a series:

If you have been following this series, you should have:
– a basic understanding of Terraform
– a Kubernetes cluster that you can connect to using kubectl
– a basic understanding of Kubernetes services
– a working MetalLB load balancer
– a basic understanding of how storage works in the world of Kubernetes
– a Kubernetes storage solution in the form of PX Store; alternatively, you can use any solution (for the purposes of this series) that supports persistent volumes. However, to use the backup solution in part 9 of the series, you will need something that supports CSI

From here, Chris explains the importance of the data controller and then deploys one.

Columnstore, Strings, and Windowing Functions

Published 2021-04-06 by Kevin Feasel

Erik Darling has a tale to tell:

The only columns that we were really selecting from the Comments table were UserId and CreationDate, which are an integer and a datetime.
Those are relatively easy columns to deal with, both from the perspective of reading and sorting.
In order to show you how column selection can muck things up, we need to create a more appropriate column store index, add columns to the select list, and use a where clause to restrict the number of rows we’re sorting. Otherwise, we’ll get a 16GB memory grant for every query.

Read on to see how one little (or, well, big) string column can foul up the whole works.

Indexing for Physical Join Operators

Published 2021-04-06 by Kevin Feasel

Deepthi Goguri continues a series on physical join operators:

In Part 1 of decoding the physical join operators, we learned about the different types of physical operators: nested loops, merge joins, and hash joins. We have seen when they are useful and how to take advantage of each for the performance of our queries. We have also seen when they need to be avoided.
In this part, we will learn more about these operators and how indexes help them perform better so that queries can execute faster.

Read on to see how to define indexes for each of the three physical operators.
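
If the three operators are a bit abstract, a toy version of each may help. This is a minimal sketch in Python over lists of tuples, not how SQL Server implements them, and the function names and sample rows are my own. It does show why ordering matters: the merge join below only works because both inputs arrive sorted on the join key, which is one thing an index on that key can provide.

```python
from collections import defaultdict


def nested_loops_join(outer, inner, key=0):
    """For every outer row, scan the inner input for matching keys."""
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]


def hash_join(build, probe, key=0):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build:
        table[b[key]].append(b)
    return [(b, p) for p in probe for b in table.get(p[key], [])]


def merge_join(left, right, key=0):
    """Walk two inputs in step; both must already be sorted on the key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


# Hypothetical sample rows, already sorted on the first column.
users = [(1, "a"), (2, "b"), (3, "c")]
comments = [(1, "x"), (1, "y"), (3, "z"), (4, "w")]

nl_rows = nested_loops_join(users, comments)
hash_rows = hash_join(users, comments)
merge_rows = merge_join(users, comments)
```

All three produce the same matches here; what differs is how much work each does to find them and what it needs from its inputs.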

Ways to Insert Data into a Hive Table

Published 2021-04-05 by Kevin Feasel

The Hadoop in Real World team has ways to insert data into Hive tables:

There are several different variations and ways when it comes to inserting or loading data into Hive tables. This post will cover 3 broad ways to insert or load data into Hive tables.

Click through for those methods.
