Press "Enter" to skip to content

Day: September 12, 2024

Working with lapply() in R

Steven Sanderson applies a function:

R is a powerful programming language primarily used for statistical computing and data analysis. Among its many features, the lapply() function stands out as a versatile tool for simplifying code and reducing redundancy. Whether you’re working with lists, vectors, or data frames, understanding how to use lapply() effectively can greatly enhance your programming efficiency. For beginners, mastering lapply() is a crucial step in becoming proficient in R.

Read on to see how lapply() works.

Comments closed

The Importance of Versioning Data

John Mount demonstrates an important concept:

Our business goal is to build a model relating attendance to popcorn sales, which we will apply to future data in order to predict future popcorn sales. This allows us to plan staffing and purchasing, and also to predict snack bar revenue.

In the above example data, all dates in August of 2024 are “in the past” (available as training and test/validation data) and all dates in September of 2024 are “in the future” (dates we want to make predictions for). The movie attendance service we are subscribing to supplies

  • past schedules
  • past (recorded) attendance
  • future schedules, and
  • (estimated) future attendance.

John’s example scenario covers the problem of future estimations interfering with model quality. Another important scenario is when the past changes. As one example, digital marketing providers (think Google, Bing, Amazon, etc.) will provide you impression and click data pretty quickly, and each day they close the books on a prior day’s data at some normal time. For some of these providers, that prior day’s data is yesterday’s data—on Tuesday, provider X closes the books on Monday’s data and promises that it won’t change after that. But for other providers, they might change data over the course of the next 10 days. This means that the data you’re using for model training might change from under you, and you might never know if you don’t keep track of the actual data you used for training at the time of training.

Comments closed

Finding Missing Indexes in SQL Server

Jared Westover goes searching for where those missing indexes got off to:

In the past, while using the missing index Dynamic Management Views (DMVs), something always seemed to be missing from the results. It was hard to put my finger on it then, but looking back, it now seems obvious. Why can’t we see the queries prompting SQL Server to give suggestions? Did you know Microsoft added a DMV with the query text? Since discovering this gem, we no longer need to search through Plan Cache or Query Store.

Click through for the article, but do especially read the list of limitations Jared links to in the summary section before going off and creating a bunch of indexes.

Comments closed

A Reminder for Server Consistency

Chad Callihan resolves an issue:

I connected to the latest SQL Server, opened SSMS, and tried to restore from there. Sure enough, I was presented with the error:

Cannot access the specified path or file on the server. Verify that you have the necessary security privileges and that the path or file exists.

If you know that the service account can access a specific file, type in the full path for the file in the File Name control in the Locate dialog box.

Read on for the solution, which was easy enough, but serves as a reminder that having (and occasionally running!) idempotent configuration scripts can be quite useful.

Comments closed

Implicit Conversions in SQL Server

Vlad Drumea explains what it means implicitly to convert:

If you’re here, you most likely know what a data type conversion is, but, in short, it’s the operation of converting a value from one data type to another.

There are two types of conversions in SQL Server:

  • explicit – which are done by explicitly applying the CAST and CONVERT functions on a column, variable, or value.
  • implicit – when CAST and CONVERT are not used explicitly, but SQL Server ends up doing the conversation behind the scenes due to two distinct data types being compared.

Read on to learn more about which types of implicit conversion are relevant for performance and what you can do instead.

Comments closed

Migrating Power BI Dataflows from Gen1 to Gen2

Reza Rad talks migration:

Unfortunately, there isn’t a migration tool to convert your Power BI dataflow (gen1) to Microsoft Fabric dataflow (gen2). If you have Fabric capacity licenses, it just makes sense to do that migration because Dataflow Gen2 gives you data destinations into four destinations, which we don’t have in Dataflow Gen1. However, converting Gen1 to Gen2 isn’t that complicated. The process is explained in this blog and video.

Click through for the blog post and the video.

Comments closed