Press "Enter" to skip to content

Day: August 12, 2024

Counting Character Occurrences in R

Steven Sanderson counts the ways:

Counting the occurrences of a specific character within a string is a common task in data processing and text manipulation. Whether you’re working with base R or leveraging the power of packages like stringr or stringi, R provides efficient ways to accomplish this. In this post, we’ll explore how to do this using three different methods.

Read on for three separate examples.

Comments closed

An Overview of Cross-Validation

Vinod Chugani explains the benefit of cross-validation in a data science project:

Many beginners will initially rely on the train-test method to evaluate their models. This method is straightforward and seems to give a clear indication of how well a model performs on unseen data. However, this approach can often lead to an incomplete understanding of a model’s capabilities. In this blog, we’ll discuss why it’s important to go beyond the basic train-test split and how cross-validation can offer a more thorough evaluation of model performance. Join us as we guide you through the essential steps to achieve a deeper and more accurate assessment of your machine learning models.

Click through for the full article.

Comments closed

Avoid Mixing DATETIME with other Date Types

Paul White shares some advice:

Microsoft encourages us not to use the datetime data type: 

Avoid using datetime for new work. Instead, use the time, date, datetime2, and datetimeoffset data types. These types align with the SQL Standard, and are more portable. time, datetime2 and datetimeoffset provide more seconds precision. datetimeoffset provides time zone support for globally deployed applications.

Well, ok. Sensible and well-informed people might still choose to use datetime for performance reasons. Common date and time functions have optimised implementations in the SQL Server expression service for the datetime and smalldatetime data types.

Paul has posted the full article on X.

Comments closed

Storing Images in Kusto and Visualizing in Power BI or Data Explorer

Hauke Mallow shares what is probably a bad idea:

Kusto is a fast and scalable database designed to ingest, store, and analyze large volumes of structured and semi-structured data. For non-structured data like images, Azure Storage is typically the best choice. Databases can reference image data on storage via a URL, meaning images are not directly stored in Kusto. However, there are scenarios where storing image data in Kusto is beneficial. In this blog post, we will explore when it makes sense to store images in Kusto, how to store them, and how to visualize this data using Azure Data Explorer dashboards or Power BI.

I suppose the main benefit would be displaying images in Azure Data Explorer, as that tool might not support loading in external images from a storage account or other sane location. But this feels more like a neat parlor trick than something I’d actively recommend.

Comments closed

Monitoring DirectQuery Connection Openings in Power BI

Chris Webb digs into the numbers:

In the past I have blogged about how important the number of connections from a Power BI DirectQuery semantic model to a data source is for performance. It’s also true that in some cases opening a connection, or some of the operations associated with opening a connection, can be very slow. As a result it can be useful to see when your semantic model opens connections to a data source, and you can do this with Log Analytics.

Click through to see how you can do this and some of the information it provides.

Comments closed

SQL Server Compilation Time and Storage

Kendra Little explains how storage can affect query compilation time:

Up till now, I’ve thought of compilation time in SQL Server as being dependent only on CPU resources– not something that requires fast storage to be speedy. But that’s not quite right.

Slow storage can result in periodic long compile time in SQL Server. And long compile time not only extends the runtime for the query, it can also result in blocking with waits for compile locks.

Click through for more details, as well as a video by Erik Darling on compile-time locks.

Comments closed

Choosing between Add-Type and New-Object

Patrick Gruenauer contrasts two options in Powershell:

Predefined .NET classes: PowerShell makes certain predefined .NET classes directly available without you having to load them with “Add-Type”.

You can simply use “New-Object, to create instances of these classes. This includes many commonly used classes such as “System.String”, “System.DateTime”, “System.IO.FileInfo”

Read on for a few examples of this, as well as when you would want to use Add-Type instead.

Comments closed