
Day: December 20, 2024

Using complete.cases in R

Steven Sanderson has no time for missing data:

Data analysis in R often involves dealing with missing values, which can significantly impact the quality of your results. The complete.cases function in R is an essential tool for handling missing data effectively. This comprehensive guide will walk you through everything you need to know about using complete.cases in R, from basic concepts to advanced applications.

Using complete.cases to find observations with missing values is great. Using it to eliminate observations with missing values can sometimes be helpful, depending on just how many missing values you have.
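To make the find-versus-eliminate distinction concrete, here is a minimal sketch in plain Python. It is only a rough analogue of what R's complete.cases does (one TRUE/FALSE flag per row), not the R function itself; the `rows` data, the use of `None` as a stand-in for NA, and the `complete_cases` helper are all assumptions made for illustration.

```python
# Rough Python analogue of R's complete.cases(); not the R function itself.
# Missing values are represented as None, a hypothetical stand-in for NA.
rows = [
    {"id": 1, "height": 170.0, "weight": 65.0},
    {"id": 2, "height": None, "weight": 72.5},
    {"id": 3, "height": 182.0, "weight": None},
    {"id": 4, "height": 158.5, "weight": 51.0},
]


def complete_cases(records):
    """Return one boolean per record: True when no field is missing."""
    return [
        all(value is not None for value in record.values())
        for record in records
    ]


mask = complete_cases(rows)

# Finding: which observations have at least one missing value?
incomplete = [row for row, ok in zip(rows, mask) if not ok]

# Eliminating: keep only the complete observations.
complete = [row for row, ok in zip(rows, mask) if ok]

# Check how much data would be lost before dropping anything.
share_dropped = len(incomplete) / len(rows)
```

In this toy data set half of the rows would be dropped, which is the kind of situation where elimination is probably the wrong move and the flag is more useful for investigation.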


Shrinking ONNX Files

Pete Warden breaks out the shrink ray:

I’ve been using the ONNX Runtime a lot recently, and while it has been a lot of fun, there are a few things I’ve missed from the TensorFlow Lite world. The biggest (no pun intended) is the lack of tools to shrink the model file size, something that’s always been essential in the mobile app world. You can quantize using the standard ONNX tools, but in my experience you’ll often run into accuracy problems because all of the calculations are done at lower precision. These are usually fixable, but require some time and effort.

Read on for Pete’s preferred alternative and a new tool to help with this.


Scanning Fabric Workspaces via Semantic Link Labs

Sandeep Pawar takes us through the Scanner API:

It’s finally here! Thanks to Michael Kovalsky, one of the most requested & anticipated APIs is now available in Semantic Link Labs (v0.8.10) – the Scanner API. The Scanner API in Fabric Admin REST APIs allows Fabric administrators to retrieve detailed metadata about their organization’s Fabric items, supporting governance and compliance efforts. It provides information such as item names, descriptions, date created, lineage, connection strings, etc. It’s not new; we have been using it in Power BI for a long time, but in the Fabric world, it’s even more important given the number of items and configurations.

Read on to see what’s available and how this works.


Vector Search in Azure Databases

Paul Hernandez describes the current state of production-ready vector search options in Azure:

Vector databases and vector search are becoming increasingly important in modern applications due to their ability to handle complex and high-dimensional data efficiently. In today’s data-driven world, applications such as recommendation systems, image and video retrieval, natural language processing, and anomaly detection rely heavily on the ability to search and analyze large volumes of data quickly and accurately. Vector databases store data in the form of vectors, which allows for more sophisticated and nuanced searches compared to traditional databases. Vector search techniques enable these applications to find similar items, detect patterns, and make predictions by comparing the distances between vectors. This capability is crucial for delivering personalized user experiences, improving search accuracy, and enhancing overall application performance. As a result, vector databases and vector search are essential components in the toolkit of modern data scientists and engineers.

In this article, we will discuss several Azure services that support vector search, including Azure Database for PostgreSQL Flexible Server, Azure Cosmos DB, and Azure Cognitive Search. Each of these services offers unique features and capabilities that make them suitable for implementing vector search in various applications.

Click through for details, as well as links to more resources. Paul didn’t include Azure SQL Database’s vector capabilities, though that’s in preview right now and I’m not actually sure how well it will perform compared to these other options.
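For anyone new to the idea of "comparing the distances between vectors," here is a minimal sketch of the core operation in plain Python: a brute-force scan ranked by cosine similarity. The item names, the three-dimensional embeddings, and the `top_k` helper are made up for illustration; none of this reflects how any particular Azure service implements vector search, and production systems typically rely on approximate nearest-neighbor indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [
        (name, cosine_similarity(query, vector))
        for name, vector in items.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real ones usually have hundreds of dimensions
# and come from an embedding model.
items = {
    "red shoes": [0.9, 0.1, 0.0],
    "crimson sneakers": [0.8, 0.2, 0.1],
    "garden hose": [0.0, 0.1, 0.9],
}

query = [1.0, 0.0, 0.0]
results = top_k(query, items)

for name, score in results:
    print(f"{name}: {score:.3f}")
```

The two footwear items rank ahead of the garden hose because their vectors point in nearly the same direction as the query, which is the "similar items" behavior the excerpt describes.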
