September 2021 – Page 6

A Primer on Prometheus

Published 2021-09-21 by Kevin Feasel

Nikhil Varghese provides an introduction to Prometheus:

Prometheus is an open-source monitoring system for processing time series metric data. It collects, organizes, and stores metrics using unique identifiers and timestamps. DevOps teams and developers query that data using PromQL and then visualize it in a UI such as Grafana.

Read on to learn more about the pieces fit together and some of the key terminology.

Comments closed

QDS Cache Cleanup in QDS Toolbox

Published 2021-09-21 by Kevin Feasel

Jared Poche continues a series on the QDS Toolbox:

When configuring Query Store, we have a few options for deciding how it retains data but little control over how it cleans up that data. We can set the max size of our query store, the max number of plans to keep per query, and how long to keep query statistics.
The QDS Cache Cleanup component of the QDS Toolbox gives us a number of other options for what data to remove:

Read on for that list and a good example of where it was useful.

Comments closed

Distribution Techniques in Azure Synapse Analytics

Published 2021-09-21 by Kevin Feasel

Gauri Mahajan takes us through three distribution techniques when working with Azure Synapse Analytics dedicated SQL pool tables:

Data warehouses host much larger volumes of data compared to transactional databases, the volume of reads is much more compared to writes and queries tend to result in much larger result sets compared to queries that retrieve scalar values or paginated record sets from transactional databases. Due to this nature of data warehouses, there is a higher impetus on the server to perform faster. Modern data warehouses like AWS Redshift, Azure Synapse, Snowflake and others employ approaches like data sharding where data is distributed horizontally on multiple nodes which process data in parallel. This approach is highly scalable as nodes can be easily added to a data cluster as the storage and performance need increases. One key aspect that is different for tables hosted on such data warehouses is that tables are distributed horizontally using different distribution algorithms, so that all the nodes in an Azure Synapse cluster have an equal share of responsibility for hosting, processing, and delivering data for any given query to maximize performance.
In this article, we will learn about the table distribution styles supported in an Azure Synapse and how to use them for creating distributed tables.

Read on to learn more. This is an example of something we don’t think about on the SQL Server side, so when moving to Azure Synapse Analytics dedicated SQL pools, it can be easy to get this wrong and end up with sub-optimal performance.

Comments closed

Power BI Data Models with Multiple Aggregation Tables

Published 2021-09-21 by Kevin Feasel

Phil Seamark continues a series on Power BI aggregations:

Why might you consider having more than one aggregation table? The short answer is speed and overall resource efficiency.
Consider a heavily used Power BI report where page-load time is considered critical. A typical report may have half a dozen visuals on a page showing values computed over various grains. If the model used by the report has no aggregation tables, all calculations use the raw fact tables to produce values for each metric.
Adding an aggregation table to the model allows the same calculations as before to use smaller tables to produce the same result. Calculations using smaller aggregation tables will enable the server hosting the data model to use much less effort per query.

It was interesting to see just how easy the process is.

Comments closed

Why You Should Learn about Containers

Published 2021-09-21 by Kevin Feasel

Grant Fritchey has an editorial:

I find myself doing more and more work with containers. Yet, I also find that a lot of people seem to be resistant to the concept. I’m always surprised when technologists reject technology without fully understanding what it does. Let’s talk about this just a little.

I completely agree with Grant’s assessment and want to pile on a bit about cloud services and containers. If you’re using or thinking of using services like Azure Machine Learning or Azure IoT Hub, that work is all containerized. And there’s a lot which runs in containers that we don’t even think about. You don’t need to live your life in containers or run everything from containers, but they’re useful in several contexts.

Comments closed

Deciding on Forced Parameterization or Optimize for Ad Hoc Workloads

Published 2021-09-21 by Kevin Feasel

Erik Darling hosts a showdown:

I often speak with people who are confused about what these settings do, and which one they should be using to fix certain problems.
The first myth is that Optimize For Ad Hoc Workloads has some special effect on queries run outside of stored procedures: it does not. It’s very poorly named in that regard. There are no special optimizations applied because of that setting.

Read the whole thing.

Comments closed

Rehearsing with PowerPoint Presenter Coach

Published 2021-09-21 by Kevin Feasel

Cathrine Wilhelmsen shares a tip for improving those presentation skills:

Did you know that PowerPoint can help you improve your presentation skills? 💡 If you rehearse with the PowerPoint Presenter Coach, you can get real-time feedback on things like your pace and language. If that’s too distracting (it is for me), you can choose to hide the real-time feedback and only view the detailed report at the end of your rehearsal.
The report shows you details about the total time spent, your pace and pitch over time, whether or not you are simply reading from your slides, as well as actionable feedback on your language. The feedback includes whether you are using too many filler words (like umm, actually, or you know), whether you are using too many repetitive words with alternatives you can use instead, whether you are using any words that can be offensive, and specific sentences that you can refine.

Click through for more details.

Comments closed

Accumulators in Spark

Published 2021-09-20 by Kevin Feasel

The Hadoop in Real World team explains what accumulators are in Spark:

Accumulators are like global variables in Spark application. In the real world, accumulators are used as counters and keep to keep track of something at an application level. Accumulators serve a very similar purpose as counters in MapReduce.

Read on for examples, as well as a warning against using them in a map() operation.

Comments closed

Latches to Know

Published 2021-09-20 by Kevin Feasel

Paul Randal wraps up a series on latches with a few miscellaneous entries:

When either a heap or an index is being accessed, internally there’s an object called a HeapDataSetSession or IndexDataSetSession, respectively. When a parallel scan is being performed, the threads doing the actual work of the scan each have a “child” dataset (another instance of the two objects I just described), and the main dataset, which is really controlling the scan, is called the “parent.”
When one of the scan worker threads has exhausted the set of rows it’s supposed to scan, it needs to get a new range by accessing the parent dataset, which means acquiring the ACCESS_METHODS_DATASET_PARENT latch in exclusive mode. While this can seem like a bottleneck, it’s not really, and there’s nothing you can do to stop the threads performing a parallel scan from occasionally showing a LATCH_EX wait for this latch.

Click through to read the whole thing.

Comments closed

Concatenating in SQL Server

Published 2021-09-20 by Kevin Feasel

Lee Markup takes us through a pair of very useful functions in SQL Server:

SQL Server concatenation methods have been enhanced in modern versions of SQL Server. SQL Server 2012 introduced the CONCAT() function. In SQL Server 2017 we get CONCAT_WS().
A common usage of concatenation, or joining column values together in a string, is combining a FirstName and LastName column into a FullName column. Another common usage might be for creating an address column that pulls together building number, street, city and zip code.

Read on to learn more. CONCAT() and CONCAT_WS() are also extremely helpful for change detection in ETL processes. For example, you might have a queue table to process and only want to update records in which relevant source fields changed, ignoring the ones which don’t exist in your destination. A combination of HASHBYTES() and CONCAT_WS() will do the trick quite nicely.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Month: September 2021