Press "Enter" to skip to content

Day: November 2, 2022

Learning to Count in R

Jerry Tuttle does the math:

You would think base R would have a count function such as count(df$Team) and count(df$Team == “NYY”) but this gives the error “could not find function ‘count’”. Base R does not have a count function. Base R has at last four ways to perform a count:

Click through to learn the different ways available to you, including those built into R itself as well as other packages like dplyr. H/T R-Bloggers.

Comments closed

Installing Third-Party WHL Packages in Synapse with DEP

Sabyasachi Samaddar walks through what I consider a real difficulty:

It is really challenging when you need to install third-party .whl packages into a DEP-enabled Azure Synapse Spark Instance.

According to the documentation, https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-librar… Installing packages from PyPI is not supported within DEP-enabled workspaces. Hence we cannot just upload the .whl packages into the workspace. We need to upload all the dependencies along with the .whl package and it will be an offline installation. Now Synapse spark clusters come with in-built packages and hence we may find some conflicts when we try to install some third-party packages.

Read on to see what you need to do.

Comments closed

Lessons Learned from Index Tuning

Lee Markum has seven lessons for us:

SQL Server indexing basics are critical to query and server performance. Resources, like CPU and disk, are affected by the indexes that you have, or the ones you’re missing.

In the StackOverflow2013 database we’re going to look at Badges and users. Specifically, I want to start by seeing what badges a user has and when that user received them. Some badges, because of the type of badge it is, can be awarded more than once.

Click through for a demonstration.

Comments closed

Using the T-SQL OUTPUT Clause

Chad Callihan doesn’t make two calls:

Are you familiar with the OUTPUT clause in SQL Server? It can be used with commands like INSERT, UPDATE, DELETE, and MERGE. It can also be used with setting variables in stored procedures. Using the tried and true StackOverflow2013, we’ll narrow it down today to focus on how INSERT/DELETE are typically used for logging table changes as well as an example of how to use OUTPUT with stored procedures.

For really busy transactional systems, this provides a nice boost over making an update and then selecting the new values.

Comments closed

Snapshot Fact Tables in a Data Warehouse

Alex Crampton explains how snapshot fact tables work in data warehousing:

The typical fact table measures activities and is known as a transaction fact table. They support a wide variety of analytic possibilities and can be used to capture detailed information about a particular process. Certain facts cannot be studied easily using this kind of design, if at all.

This blog will outline the characteristics of a transaction fact table vs those of a snapshot fact table, and when the need for a snapshot fact table arises.

Snapshot-based fact tables aren’t ideal for data load times (especially as the table gets large) but they are useful in specific circumstances, as Alex points out.

Comments closed