
Month: December 2024

Using complete.cases in R

Steven Sanderson has no time for missing data:

Data analysis in R often involves dealing with missing values, which can significantly impact the quality of your results. The complete.cases function in R is an essential tool for handling missing data effectively. This comprehensive guide will walk you through everything you need to know about using complete.cases in R, from basic concepts to advanced applications.

Using complete.cases to find observations with missing values is great. Using it to eliminate observations with missing values can sometimes be helpful, depending on just how many missing values you have.
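In R, the usual idiom is `df[complete.cases(df), ]`, which keeps only the rows with no missing values. To show the logic behind it, here is a minimal Python analogue. It assumes rows are plain tuples and treats `None` and float NaN as missing, loosely mirroring R's NA and NaN. The helpers `complete_cases`, `drop_incomplete`, and `incomplete_share` are hypothetical illustrations, not part of any library.

```python
import math
from typing import Any, Sequence

Row = Sequence[Any]


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing, loosely mirroring R's NA and NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(rows: Sequence[Row]) -> list[bool]:
    """One flag per row: True when the row has no missing values."""
    return [not any(is_missing(value) for value in row) for row in rows]


def drop_incomplete(rows: Sequence[Row]) -> list[Row]:
    """Keep only complete rows, analogous to df[complete.cases(df), ] in R."""
    return [row for row, ok in zip(rows, complete_cases(rows)) if ok]


def incomplete_share(rows: Sequence[Row]) -> float:
    """Fraction of rows that dropping incomplete cases would discard."""
    flags = complete_cases(rows)
    return 0.0 if not flags else flags.count(False) / len(flags)


rows = [
    (1, "a", 2.5),
    (2, None, 3.0),
    (3, "c", float("nan")),
    (4, "d", 1.0),
]
print(complete_cases(rows))    # [True, False, False, True]
print(drop_incomplete(rows))   # [(1, 'a', 2.5), (4, 'd', 1.0)]
print(incomplete_share(rows))  # 0.5
```

Checking `incomplete_share` before dropping rows speaks to the point above: if half the rows have a missing value, discarding them may cost more than it saves.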


Shrinking ONNX Files

Pete Warden breaks out the shrink ray:

I’ve been using the ONNX Runtime a lot recently, and while it has been a lot of fun, there are a few things I’ve missed from the TensorFlow Lite world. The biggest (no pun intended) is the lack of tools to shrink the model file size, something that’s always been essential in the mobile app world. You can quantize using the standard ONNX tools, but in my experience you’ll often run into accuracy problems because all of the calculations are done at lower precision. These are usually fixable, but require some time and effort.

Read on for Pete’s preferred alternative and a new tool to help with this.
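To make the precision trade-off in the quote concrete, here is a generic sketch of linear 8-bit quantization applied to a handful of made-up weights. It is neither the ONNX quantization tooling's code nor Pete's alternative. It also covers weights only, whereas the quote notes that standard quantization runs all of the calculations at lower precision. The helpers `quantize_uint8` and `dequantize` are hypothetical.

```python
from array import array


def quantize_uint8(values: list[float]) -> tuple[list[int], float, int]:
    """Asymmetric linear quantization of floats to 8-bit codes in 0..255."""
    lo = min(min(values), 0.0)  # include 0.0 so it maps to an exact code
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for all-zero input
    zero_point = round(-lo / scale)
    codes = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes: list[int], scale: float, zero_point: int) -> list[float]:
    """Map 8-bit codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, -0.3, 0.0, 0.25, 0.9]  # made-up example weights
codes, scale, zero_point = quantize_uint8(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

float32_bytes = len(array("f", weights).tobytes())  # 4 bytes per weight
uint8_bytes = len(bytes(codes))                      # 1 byte per weight
print(codes)
print(f"{float32_bytes} bytes -> {uint8_bytes} bytes, max error {max_error:.4f}")
```

Each stored value shrinks to a quarter of its float32 size, but it snaps to one of 256 levels. That rounding error, compounded across many low-precision operations, is the kind of accuracy problem the quote describes.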


Scanning Fabric Workspaces via Semantic Link Labs

Sandeep Pawar takes us through the Scanner API:

It’s finally here! Thanks to Michael Kovalsky, one of the most requested and anticipated APIs is now available in Semantic Link Labs (v0.8.10): the Scanner API. The Scanner API in the Fabric Admin REST APIs allows Fabric administrators to retrieve detailed metadata about their organization’s Fabric items, supporting governance and compliance efforts. It provides information such as item names, descriptions, creation dates, lineage, connection strings, etc. It’s not new; we have been using it in Power BI for a long time, but in the Fabric world it’s even more important given the number of items and configurations.

Read on to see what’s available and how this works.


Vector Search in Azure Databases

Paul Hernandez describes the current state of production-ready vector search options in Azure:

Vector databases and vector search are becoming increasingly important in modern applications due to their ability to handle complex and high-dimensional data efficiently. In today’s data-driven world, applications such as recommendation systems, image and video retrieval, natural language processing, and anomaly detection rely heavily on the ability to search and analyze large volumes of data quickly and accurately. Vector databases store data in the form of vectors, which allows for more sophisticated and nuanced searches compared to traditional databases. Vector search techniques enable these applications to find similar items, detect patterns, and make predictions by comparing the distances between vectors. This capability is crucial for delivering personalized user experiences, improving search accuracy, and enhancing overall application performance. As a result, vector databases and vector search are essential components in the toolkit of modern data scientists and engineers.

In this article, we will discuss several Azure services that support vector search, including Azure Database for PostgreSQL Flexible Server, Azure Cosmos DB, and Azure Cognitive Search. Each of these services offers unique features and capabilities that make them suitable for implementing vector search in various applications.

Click through for details, as well as links to more resources. Paul didn’t include Azure SQL Database’s vector capabilities, though that’s in preview right now and I’m not actually sure how well it will perform compared to these other options.
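The core operation described in the quote is finding similar items by comparing the distances between vectors. It can be sketched as an exact, brute-force search. This minimal Python illustration uses made-up three-dimensional embeddings and cosine similarity as the measure. `cosine_similarity` and `top_k` are hypothetical helpers, not the API of any Azure service Paul covers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exact nearest-neighbor search: score every item, keep the best k."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


embeddings = {  # made-up example vectors
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
for name, score in top_k([1.0, 0.1, 0.0], embeddings):
    print(name, round(score, 4))
```

Scoring every vector works for a handful of items, but the cost grows linearly with the collection. For that reason, production engines typically add approximate-nearest-neighbor indexes. For example, pgvector on PostgreSQL offers HNSW and IVFFlat indexes.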


An Overview of Azure OpenAI and the Azure AI Hub

Tomaz Kastrun has a pair of posts. First up, an overview of Azure OpenAI:

Let’s first address the elephant in the room. We have explored Azure AI Foundry, and we also have Azure OpenAI. So what is the core difference? Let’s take a look:

The underlying services:

  • Azure AI Services has much broader AI capabilities and simpler integration into real-world applications. It offers mostly pre-built APIs for its services (face recognition, document recognition, speech recognition, computer vision, image recognition, and more), which allow better interoperability and connection to machine learning services (Azure Machine Learning Service).
  • Azure OpenAI focuses primarily on OpenAI LLMs (Azure AI Services supports many other models) and provides great agents for conversations, content tools, RAG, and natural language services.

After that comes an overview of the Azure AI Hub and AI projects:

In the AI Foundry portal, hubs provide the environment for a team to collaborate and organize work, and help you as a team lead or IT admin centrally set up security settings and govern usage and spend. You can create and manage a hub from the Azure portal or from the AI Foundry portal, and then your developers can create projects from the hub.

In essence, hubs are the primary top-level Azure resource for AI Foundry. Their purpose is to govern security, connectivity, and computing resources across playgrounds and projects.


Useful PostgreSQL Administrative Queries

Shane Borden shares some queries:

In the spirit of the holiday season, I thought I would write a quick post regarding some of my favorite queries that I use on a day-to-day basis working on Postgres. Some of these queries I developed myself; others I found on the internet (hat tip to those who have previously posted) and refined further.

Click through for several useful queries, as well as a link to a GitHub repo that Shane maintains, containing plenty more.
