Press "Enter" to skip to content

Author: Kevin Feasel

Mathematical Transformations of Data in R

Steven Sanderson does the math:

Data transformation is a fundamental technique in statistical analysis and data preprocessing. When working with R, understanding how to properly transform data can help meet statistical assumptions, normalize distributions, and improve the accuracy of your analyses. This comprehensive guide will walk you through implementing and visualizing the most common data transformations in R: logarithmic, square root, and cube root transformations, using only base R functions.

Click through for examples.
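If you want a quick feel for what those three transformations do before clicking through, here is a rough sketch in Python with NumPy rather than base R (the array values and variable names are purely illustrative):

```python
import numpy as np

# A small, strictly positive sample so all three transformations are defined
x = np.array([1.0, 4.0, 9.0, 27.0, 100.0])

log_x  = np.log(x)    # logarithmic transformation (natural log; needs x > 0)
sqrt_x = np.sqrt(x)   # square root transformation (needs x >= 0)
cbrt_x = np.cbrt(x)   # cube root transformation (also defined for negatives)

print(log_x, sqrt_x, cbrt_x, sep="\n")
```

Steven's post shows the base R equivalents, along with how to visualize the distributions before and after transformation.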

Comments closed

Azure AI Foundry Notes

Tomaz Kastrun wraps up a series on Azure AI. First up is tracing in Azure AI Foundry:

Tracing is a powerful tool that offers developers an in-depth understanding of the execution process of their generative AI applications. Though still in preview (at the time of writing this post), it provides a detailed view of the execution flow of the application and the essential information for debugging or optimisations.

After that, we can see how to evaluate model results:

With evaluation, you perform iterative, systematic evaluations with the right evaluators, and measure and address potential response quality, safety, or security concerns throughout the AI development lifecycle, from initial model selection through post-production monitoring.

With Evaluation in Azure AI Foundry, you can evaluate the GenAI Ops lifecycle in production. In addition, it also gives you the ability to assess the frequency and severity of content risks or undesirable behavior in AI responses.

Finally, Tomaz wraps up the series with some notes on documentation:

Documentation and material for Azure AI Foundry are plentiful and growing on a daily basis, since the topics of AI and GenAI are ever more popular.

I appreciate the challenge that Tomaz has of putting together 25 blog posts in a month, especially when they’re all tied to a single theme.

Comments closed

Bulk Inserts and High Unused Space in SQL Server Tables

Vitaly Bruk works through an issue:

High allocated unused space is storage assigned to a SQL Server table that isn't actually used. This condition often indicates internal fragmentation: free space is present within allocated pages. Such fragmentation leads to inefficient storage and can degrade database performance.

Read on for an explanation of the issue, followed by a real-world situation whose ultimate cause was bulk insert operations.

Comments closed

Session, DataFrameWriter, and Table Configurations in Spark

Miles Cole makes a configuration change:

With Spark and Delta Lake, just like with Hudi and Iceberg, there are several ways to enable or disable settings that impact how tables are created. These settings may affect data layout or table format features, but it can be confusing to understand why different methods exist, when each should be used, and how property inheritance works.

While platform defaults should account for most use cases, Spark provides flexibility to optimize various workloads, whether adjusting for read or write performance, or for hot or cold path data processing. Inevitably, the need to adjust configurations from the default will arise. So, how do we do this effectively?

Read on to learn how.
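To make the distinction concrete, here is a hedged PySpark sketch of the three places a setting can live, assuming a Delta-enabled Spark session (the schema, table name, and specific properties are illustrative, not recommendations):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1) Session-level configuration: applies to everything run in this session.
spark.conf.set("spark.sql.shuffle.partitions", "64")

# 2) DataFrameWriter-level option: applies only to this one write.
df = spark.range(1_000_000)
(df.write
   .format("delta")
   .option("maxRecordsPerFile", 100_000)   # per-write file sizing hint
   .mode("overwrite")
   .saveAsTable("demo.events"))            # illustrative schema/table name

# 3) Table-level property: stored with the table and inherited by future writes.
spark.sql("""
    ALTER TABLE demo.events
    SET TBLPROPERTIES ('delta.appendOnly' = 'true')
""")
```

Miles's post covers when each of these scopes is the right choice and how inheritance between them works.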

Comments closed

Prompt Flow in Azure AI

Tomaz Kastrun continues a series on Azure AI. First up is an introduction to Prompt Flow:

Prompt flow in Azure AI Foundry is a development tool for designing flows (streamlines) for the complete end-to-end development cycle of an LLM AI application. You can create, iterate, test, orchestrate, debug, and monitor your flows.

After that, we get a demonstration of Prompt Flow in Python:

Prompty gives you the ability to create an end-to-end solution, like RAG, where you can chat with an LLM over an article or document, or ask it to classify the input data (a list of URLs, …)

Prompty is a markdown file, structured in YAML and encapsulates a series of metadata fields pivotal for defining the model’s configuration and the inputs. After this front matter is the prompt template, articulated in the Jinja format.
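For a rough idea of the shape of such a file, here is a minimal, illustrative .prompty sketch; the field names follow common Prompty examples, but the exact schema (and the deployment name) should be checked against the documentation:

```yaml
---
name: article_chat
description: Minimal illustrative Prompty definition
model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: gpt-4o-mini   # placeholder deployment name
inputs:
  question:
    type: string
---
system:
You answer questions about the article the user supplies.

user:
{{question}}
```

Everything above the second --- is the YAML front matter Tomaz describes; everything below it is the Jinja-templated prompt.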

Comments closed

Securing a Kafka Ecosystem

Riya has a breakdown of how to protect your Apache Kafka installation and resources around it:

Apache Kafka is the backbone of many real-time data pipelines, making security an essential aspect of its deployment. Protecting your Kafka ecosystem involves implementing encryption to safeguard data, authentication to verify user identities, and authorization to control access. This guide provides a comprehensive overview of these three pillars of securing Kafka, complete with code examples to help you implement best practices.

Click through for demonstrations of encryption, authentication, and authorization.
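For a sense of what the client side of this looks like, here is a minimal Python sketch using confluent-kafka, with TLS for encryption and SASL/SCRAM for authentication; the broker address, certificate path, credentials, and topic are placeholders, and authorization (ACLs) is configured on the broker rather than in client code:

```python
from confluent_kafka import Producer

conf = {
    "bootstrap.servers": "broker1.example.com:9093",  # placeholder broker
    # Encryption in transit plus client authentication
    "security.protocol": "SASL_SSL",
    "ssl.ca.location": "/etc/kafka/ca.pem",           # placeholder CA cert
    "sasl.mechanisms": "SCRAM-SHA-512",
    "sasl.username": "pipeline-writer",
    "sasl.password": "change-me",
}

producer = Producer(conf)
producer.produce("orders", key=b"order-42", value=b'{"amount": 10}')
producer.flush()
```

Riya's post walks through all three pillars in more depth, including the broker-side pieces.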

Comments closed

The Cost of Everything, Cloud Edition

Kevin Sookocheff noodles on a core concept:

At AWS re:Invent 2023, Amazon CTO Werner Vogels delivered a talk on the laws of frugal architecture. While I initially filed away those insights to review later, a year of cloud architecture experience crystallized a fundamental truth: in cloud computing, cost isn’t just a financial consideration — it is a first-class architectural concern through which we should design and optimize our systems.

Cloud providers charge for every conceivable resource: servers, API calls, data transfer, and computational milliseconds. But cost is more than just a line item on a monthly bill; it is a powerful forcing function that drives better architectural decisions.

Read on for more thoughts on the matter.

Comments closed

Writing Loops in PowerShell

Adam Bertram is spinning in circles:

One of PowerShell’s key features is its ability to loop through data collections, enabling you to execute a code block repeatedly.

In this tutorial, we will explore three fundamental loop structures in PowerShell: foreach, for, and do loops.

By the end of this tutorial, you will have a solid understanding of how to use these loops to handle repetitive tasks and process collections of data more effectively.

Click through for the tutorial.

Comments closed

Filter and Parameter Passing from Power BI to Power Apps

Jon Voege runs into a problem:

A simple comment app embedded in Power BI was supposed to show all the comments in a SharePoint list which matched the filters set in Power BI. But… not all comments were showing up, even though the SharePoint list only held about 10 comments at the time!

Why, you ask? Well, the issue stems from the fact that the table of data being sent from Power BI to Power Apps, the PowerBIIntegration.Data object, has a hard limit of 1000 rows. And those 1000 rows are all that we get to work with.

Read on to learn more, as well as to see a workaround for this issue.

Comments closed