Kevin Feasel – Page 602

Making Kafka Clients Faster

Published 2022-03-11 by Kevin Feasel

Yeva Byzek has a few whitepaper recommendations:

Over the years, incredible technical content has been written about data plane performance, general principles and tradeoffs, cloud-native architectures, etc. These writings describe how you can get low latency and high throughput without compromising on a mature and reliable platform that provides persistence, no data loss, audit logs, processing logs, and more—all the things that enable you to go from proof of concept to production. This blog post highlights the top five reading recommendations to help you gain a deeper understanding of what makes applications that run on Confluent Cloud so fast. They cover the key concepts and provide concrete examples of how we do it, and how you can do it too, with specific benchmark testing and configuration guidelines.

Click through for links to those resources.

Azure Purview Announcements

Published 2022-03-11 by Kevin Feasel

Wolfgang Strasser looks at some of the new Azure Purview announcements:

For me, data lineage is one of those fascinating techniques for better understanding your data estate: how systems are connected and what data flows exist in your data landscape.
Lineage has been in Azure Purview since the beginning (Azure Data Factory, SSIS lineage, Power BI), but this week another very important part of data lineage was put into public preview: Dynamic Lineage Extraction from Azure SQL Databases.

Read on for more information.

Space Invaders on Kubernetes via Helm

Published 2022-03-11 by Kevin Feasel

Andrew Pruski destroys alien Kubernetes pods at work:

A while ago I blogged about an awesome Chaos Engineering tool built by Eugenio Marzo called KubeInvaders.
Since then Eugenio has updated the repo to make it easier to deploy KubeInvaders using Helm! So here’s how to deploy KubeInvaders to Azure Kubernetes Service using Helm.

This is really fun to watch in action.

Database Code Reviews: a Process

Published 2022-03-11 by Kevin Feasel

Kenneth Fisher reviews some code:

I’ll be honest, ever since I did a SQL Homework about doing code reviews I’ve wanted to do a blog post about them. Recently Emily Krager did a TikTok about code review suggestions, which seemed like a good excuse for me to do this. If you don’t follow her, I recommend it: she does a great job of combining humor and technology and is just a lot of fun to listen to. Here is her list as best I was able to transcribe it.

Click through for Kenneth’s thoughts on the topic.

Queries and Batch Mode

Published 2022-03-11 by Kevin Feasel

Erik Darling takes us on a batch mode joyride:

Prior to SQL Server 2019, you needed to have a columnstore index present somewhere for batch mode to kick in for a query.
Somewhere is, of course, pretty loose. Just having one on a table used in a query is often enough, even if a different index from the table is ultimately used.

Batch mode is pretty great and Erik explains why.

Column References in DAX

Published 2022-03-11 by Kevin Feasel

Teo Lachev makes a reference:

Suppose you use a DAX table variable, such as to group by certain columns and add an extension column as a calculation. Then, you want to count the rows in the table by filtering on one of the columns. At your first attempt, you might try using CALCULATE.

That doesn’t work and Teo explains why, as well as what you do need to use.

Snowflake Purchases Streamlit

Published 2022-03-10 by Kevin Feasel

Alex Woodie reports on a purchase:

Cloud data warehousing giant Snowflake showed it’s serious about Python and data science this week when it announced that it plans to spend $800 million to buy Streamlit, a provider of Python-based tools for rapidly developing interactive data applications on the Web.
Co-founded in San Francisco in 2018 by Adrien Treuille, Amanda Kelly, and Thiago Teixeira, Streamlit develops an open source framework of the same name that allows data scientists and machine learning engineers to create and deploy data applications. The software is compatible with other Python-based frameworks, such as NumPy, Pandas, Matplotlib, and Scikit-learn, and uses React to render screens on the front-end.

Streamlit is nice. $800 million nice? That’s a good question.

Map and FlatMap in Spark

Published 2022-03-10 by Kevin Feasel

The Hadoop in Real World team maps some knowledge:

Let’s look at map() first. map() transforms an RDD with N elements into an RDD with N elements. The important thing to note is that each element is transformed into another element, so the resultant RDD will have the same number of elements as before.

Click through to see how map() and flatMap() differ.
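
The difference is easy to see without a cluster. The sketch below is a plain-Python analogue, not Spark's API: the helper names `map_elements` and `flat_map_elements` are made up for illustration, and ordinary lists stand in for RDDs. It assumes the usual semantics, where map() yields exactly one output per input and flatMap() lets each input produce zero or more outputs that are then flattened into a single collection.

```python
from itertools import chain


def map_elements(items, fn):
    """Apply fn to each item: N inputs always give N outputs (like map())."""
    return [fn(item) for item in items]


def flat_map_elements(items, fn):
    """Apply fn, which returns an iterable per item, then flatten the results
    into one list (like flatMap()). The output length can differ from N."""
    return list(chain.from_iterable(fn(item) for item in items))


# Three input "lines"; the last one is empty on purpose.
lines = ["hello world", "map and flatMap", ""]

# map: one list of words per line, so still three elements.
mapped = map_elements(lines, str.split)

# flatMap: the words themselves, with the empty line contributing nothing.
flattened = flat_map_elements(lines, str.split)

print(mapped)     # [['hello', 'world'], ['map', 'and', 'flatMap'], []]
print(flattened)  # ['hello', 'world', 'map', 'and', 'flatMap']
```

With the same splitting function, the mapped result keeps one entry per input line (including an empty list for the empty line), while the flattened result holds five words and no trace of the empty line.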

Power BI Decomposition Trees

Published 2022-03-10 by Kevin Feasel

Gauri Mahajan shows off decomposition trees in Power BI:

A large volume and variety of data generally needs data profiling to understand the nature of the data. One aspect of data is hierarchy and the inter-relationships between its attributes. Hierarchical data is often nested at multiple levels. To analyze the relationships between attributes in hierarchical data, drill-down and drill-through are two of the most common techniques employed for data exploration as well as use cases like root cause analysis.
While these techniques are standard and have been in the industry for quite a long time, figuring out these relationships and navigating hierarchical data can be a challenging task. Data Analysts or Business Analysts typically perform this analysis on the data before presenting it to the end users. In certain cases, some domain or business users may be required to perform such analysis on the report itself. In that case, the task becomes even more challenging considering the limited data analysis capabilities offered by a reporting tool compared to a database and query languages like SQL.
To help power users perform such analysis in a reporting tool, visualizations like decomposition trees can be used to decompose hierarchical data that is presented in an aggregated manner. The Decomposition Tree can support both drill-down and drill-through use cases when the user is given the flexibility to choose the hierarchy or dimensions on demand.
In the Microsoft technology stack, Power BI is the key reporting tool for authoring reports and supports a wide variety of data sources. Power BI offers a category of visuals known as AI visuals. One such visual in this category is the Decomposition Tree.

Read on to see how you can create a decomposition tree, what kind of information it shows, and how you can interact with it to learn more about correlations and causes.

Azure Data Studio Execution Plans

Published 2022-03-10 by Kevin Feasel

Hugo Kornelis is happy (for now):

But I am not writing this post to moan about past issues. I am writing this post because Microsoft has made huge improvements to execution plan support in ADS. These are officially still in preview, but they are already available. However, you will need to take a few steps to see these improvements in action.

Read on to see what you need to do and to get Hugo’s initial thoughts.

Author: Kevin Feasel