Press "Enter" to skip to content

Month: June 2024

An Introduction to healthyR

Steven Sanderson covers a package:

This article will introduce you to the healthyR package. healthyR is a package that provides functions for analyzing and visualizing health-related data. It is designed to make it easier for health professionals and researchers to work with health data in R. It is an experimental package that is still under active development, so some functions may change in the future along with the package structure and scope.

Unfortunately, the package needs some love and attention, which I am trying to give it. With that in mind, I will be updating the package to include more functions and improve the existing ones. I will also be updating the documentation and adding more examples to help users get started with the package.

So let’s get started!

Read on for that overview, including an explanation of why the package exists and several examples of how to use it.

An Intro to Vetiver in R

Colin Gillespie introduces an R package for MLOps:

Most R users are familiar with the classic workflow popularised by R for Data Science. Data scientists begin by importing and cleaning the data, then iteratively transform, model, and visualise it. Visualisation drives the modelling process, which in turn prompts new visualisations, and periodically, they summarise their work and report results.

Click through for a demonstration of how to create and deploy a model using vetiver.
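As a rough sketch of that flow (the model, board, and names below are placeholders of mine, not from Colin’s post), vetiver wraps a fitted model, versions it on a pins board, and serves it as a plumber API:

library(vetiver)
library(pins)
library(plumber)

# Fit a simple example model
cars_lm <- lm(mpg ~ wt + disp, data = mtcars)

# Wrap the model with vetiver metadata
v <- vetiver_model(cars_lm, "cars_mpg")

# Version and store the model on a pins board (a temporary board for illustration)
board <- board_temp(versioned = TRUE)
vetiver_pin_write(board, v)

# Serve the model as a REST API with plumber
pr() |>
  vetiver_api(v) |>
  pr_run(port = 8080)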

Dynamic Warehouse and Lakehouse Connections in Data Pipelines

Koen Verbeeck doesn’t want to hard-code the connection string:

When you develop data pipelines in Microsoft Fabric (the Azure Data Factory equivalent in Fabric, not to be confused with deployment pipelines), you will most likely have some activities with a connection to a warehouse, a lakehouse or a KQL database (for the remainder of the blog post I’ll talk about a warehouse, but it can be any of those three data stores). For example, in a Script, Lookup, or Copy activity. When you deploy your data pipeline to another workspace – using, you might’ve guessed it, deployment pipelines – the pipeline itself is copied to the other workspace. E.g., we deploy a pipeline from the development workspace to the test workspace.

Read on to see what this means for warehouse connections and how you can work around the existing messiness.

Stop Long-Running SQL Agent Jobs

Lori Brown puts a halt to things:

I have always done this by having a monitoring job that executes on a schedule and runs at a time when you need other jobs to stop. Of course, you need to be aware that stopping jobs can come with unwanted side effects: some data changes may be left unfinished (there may be a rollback), and the stopped job will have to be gracefully re-run at another time. You will also see the stopped job as cancelled in the Job Activity Monitor. And hopefully you are aware that you can tell a job to stop, but if it is doing work using a linked server, it may not stop as expected, or it can take a while if it is rolling back a transaction.

Read on for an example of how to do this.
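To give a sense of the general shape (a minimal sketch, not Lori’s script; the 60-minute threshold and lack of exclusions are placeholder choices), the monitoring job can look for still-executing jobs past a threshold and call sp_stop_job on each one:

-- Stop any Agent job that has been executing for more than 60 minutes.
-- A real version would exclude jobs you never want cancelled.
DECLARE @job_name sysname;

DECLARE job_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT j.name
    FROM msdb.dbo.sysjobactivity AS ja
        INNER JOIN msdb.dbo.sysjobs AS j
            ON j.job_id = ja.job_id
    WHERE ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions) -- current Agent session only
        AND ja.start_execution_date IS NOT NULL
        AND ja.stop_execution_date IS NULL -- still running
        AND ja.start_execution_date < DATEADD(MINUTE, -60, GETDATE());

OPEN job_cursor;
FETCH NEXT FROM job_cursor INTO @job_name;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_stop_job @job_name = @job_name;
    FETCH NEXT FROM job_cursor INTO @job_name;
END;

CLOSE job_cursor;
DEALLOCATE job_cursor;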

Linked Server Security Practices

David Seis locks down linked servers:

SQL Server Linked Servers provide a method to directly execute distributed queries on remote databases. However, they are not an ideal tool due to performance issues. If you decide to use them, it’s crucial to ensure they are secure. This post will outline some best practices for securing SQL Server Linked Servers.

There’s not an enormous amount you can do with security and linked servers. If you are looking for a bit more granularity in that regard, PolyBase could be a viable alternative…says the guy who wrote a book on PolyBase…
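One of the few levers you do have, sketched here with placeholder names, is the login mapping: drop the catch-all default mapping and map only specific local logins to a low-privilege remote account.

-- Remove the default mapping (which otherwise applies to all local logins)
-- and map one specific local login to a low-privilege remote login.
-- Server, login, and password values are placeholders.
EXEC master.dbo.sp_droplinkedsrvlogin
    @rmtsrvname = N'RemoteServer',
    @locallogin = NULL; -- NULL targets the default mapping

EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname  = N'RemoteServer',
    @useself     = N'FALSE',
    @locallogin  = N'ReportingUser',
    @rmtuser     = N'RemoteReportingUser',
    @rmtpassword = N'use-a-strong-password-here';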

Saving Sensitive Parameters in SSIS Configurations

Andy Brownsword doesn’t just leave passwords in plaintext:

Configuring SSIS projects or packages can necessitate parameterising information which may include sensitive values such as authentication details. Parameters are stored as plain text in the database by default. We’ll demonstrate how to protect these values using Sensitive parameters.

Read on to learn how to make an SSIS project parameter sensitive, as well as how to use them afterward.
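Relatedly, the SSIS catalog can also hold sensitive values on the environment side. As an illustration (not from Andy’s post; folder, environment, and variable names are placeholders), an environment variable created as sensitive is encrypted in SSISDB and not returned in plain text:

-- Create a sensitive environment variable in the SSIS catalog.
-- Folder, environment, variable, and value are placeholders.
EXEC SSISDB.catalog.create_environment_variable
    @folder_name      = N'ETL',
    @environment_name = N'Production',
    @variable_name    = N'SourcePassword',
    @data_type        = N'String',
    @sensitive        = 1,
    @value            = N'use-a-strong-password-here',
    @description      = N'Password for the source system connection';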

The CLEAN Block in PowerShell

Mike Robbins takes us through some relatively new functionality:

PowerShell, a powerful scripting language and automation framework, provides features that enhance script development and execution. Among these features is the clean block, a lesser-known yet beneficial component in PowerShell functions. This article explores the clean block, its purpose, and how to use it effectively in PowerShell scripts.

Read on to learn more about the block and how it works.
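As a minimal sketch of the idea (this requires PowerShell 7.3 or later, and the function is my own example rather than Mike’s), clean is the natural place to release anything you opened in begin, because it runs even when the caller stops the pipeline early:

function Get-FirstLine {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [string]$Path
    )

    begin {
        # Open the file once, before any pipeline input is processed
        $reader = [System.IO.StreamReader]::new($Path)
    }

    process {
        $reader.ReadLine()
    }

    clean {
        # Runs after the pipeline completes, errors out, or is stopped early,
        # so the file handle is always released
        if ($reader) { $reader.Dispose() }
    }
}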

Combining Flink SQL, Streamlit, and Kafka

Lucia Cerchie has a pair of posts. First up, Lucia sets the stage:

In part 1 of this series, we’ll make an app, hosted on Streamlit, that allows a user to select a stock, in this case SPY, or the SPDR S&P 500 ETF Trust. Upon selection, a live chart of the stock’s bid prices, calculated every five seconds, will appear.

What are the pieces that go into making this work? The source of the data is the Alpaca Market Data API. We’ll hook up a Kafka producer to the websocket stream and send data to a Kafka topic in Confluent Cloud. Then we’ll use Flink SQL within Confluent Cloud’s Flink SQL workspace to tumble an average bid price every five seconds. Finally, we’ll use a Kafka consumer to receive that data and populate it to a Streamlit component in real time. This frontend component will be deployed on Streamlit as well.

Part 2 then closes the trap:

In part one of this series, we walked through how to use Streamlit, Apache Kafka®, and Apache Flink® to create a live data-driven user interface for a market data application to select a stock (e.g., SPY) and discussed the structure of the app at a high level. First, data with information on stock bid prices is moved via an Alpaca websocket; then it’s produced to a Kafka topic in Confluent Cloud, where it is also processed with Flink SQL.

Now comes the tricky part: running the Kafka consumer and producer in the same application.

Click through for a good demonstration of a practical solution. Lucia also has a GitHub repo with all of the code, a demo of the site in action, and some links to additional resources.
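For reference, the five-second tumbling average described in part one would look something like this in Flink SQL (table and column names are placeholders of mine, not taken from the posts):

-- Average bid price per symbol over five-second tumbling windows.
-- stock_quotes, quote_time, symbol, and bid_price are placeholder names;
-- quote_time must be declared as a time attribute on the source table.
SELECT
    symbol,
    window_start,
    window_end,
    AVG(bid_price) AS avg_bid_price
FROM TABLE(
    TUMBLE(TABLE stock_quotes, DESCRIPTOR(quote_time), INTERVAL '5' SECOND))
GROUP BY symbol, window_start, window_end;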
