Press "Enter" to skip to content

Category: Python

Data Validation with Great Expectations and Azure Functions

Eduard van Valkenburg does a bit of data validation:

Great Expectations is a popular Python-based OSS tool to validate data that comes into your data estate. And for me, validating incoming data is best done file by file, as the files arrive! On Azure there is no better platform for that than Azure Functions. I love the serverless nature of Functions, and with the triggers available for arriving blobs, as well as HTTP, Event Grid events, queues, and others, there are some great patterns that allow you to build event-driven data architectures. We also now have the Python v2 framework for Azure Functions available, which makes the developer experience even better. So, let’s go through how to get it running.

This looks really interesting and tying it in to Azure Functions is a good idea assuming that the checks don’t run for too long.
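To make the idea concrete, here is a minimal sketch (not Eduard's code) of a blob-triggered function in the Python v2 programming model running a Great Expectations check against each arriving CSV. The container path, connection setting, and column names are hypothetical, and the pandas-based Great Expectations API shown here differs between releases:

```python
import io
import logging

import azure.functions as func
import great_expectations as ge
import pandas as pd

app = func.FunctionApp()

@app.blob_trigger(arg_name="blob",
                  path="incoming/{name}",          # hypothetical container/path
                  connection="AzureWebJobsStorage")
def validate_blob(blob: func.InputStream):
    # Load the arriving file into a DataFrame
    df = pd.read_csv(io.BytesIO(blob.read()))

    # Wrap it so expectation methods are available (legacy pandas API)
    dataset = ge.from_pandas(df)
    dataset.expect_column_values_to_not_be_null("customer_id")  # hypothetical column
    dataset.expect_column_values_to_be_between("amount", min_value=0)

    results = dataset.validate()
    if not results["success"]:
        logging.error("Validation failed for %s: %s", blob.name, results)
    else:
        logging.info("Validation passed for %s", blob.name)
```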


Working with Kafka from Python

Dave Shook has a new course for us:

If you’re a Python developer, our free Apache Kafka for Python Developers course will show you how to harness the power of Kafka in your applications. You will learn how to build Kafka producer and consumer applications, how to work with event schemas and take advantage of Confluent Schema Registry, and more. Follow along in each module as Dave Klein, Senior Developer Advocate at Confluent, covers all of these topics in detail. Hands-on exercises occur throughout the course to solidify concepts as they are presented. At its end, you will have the knowledge you need to begin developing Python applications that stream data to and from Kafka clusters.

Read on to learn more about it and give it a try.
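If you want a taste of the producer and consumer side before committing to the course, here is a minimal confluent-kafka sketch; the broker address, topic name, and payload are placeholders:

```python
from confluent_kafka import Consumer, Producer

conf = {"bootstrap.servers": "localhost:9092"}  # placeholder broker

# Producer: fire-and-forget with a delivery callback
producer = Producer(conf)

def on_delivery(err, msg):
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}]")

producer.produce("orders", key="order-1", value='{"total": 42.0}', callback=on_delivery)
producer.flush()

# Consumer: poll the same (placeholder) topic
consumer = Consumer({**conf, "group.id": "demo-group", "auto.offset.reset": "earliest"})
consumer.subscribe(["orders"])
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        print(f"Received {msg.key()}: {msg.value().decode('utf-8')}")
finally:
    consumer.close()
```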


Visualizing PyTorch Models

Adrian Tam describes a model:

PyTorch is a deep learning library. You can build very sophisticated deep learning models with PyTorch. However, there are times you want to have a graphical representation of your model architecture. In this post, you will learn:

  • How to save your PyTorch model in an exchange format
  • How to use Netron to create a graphical representation

Click through for the article, which is mostly about training the PyTorch model. Visualizing it turns out to be pretty easy with the right tool.
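The export-then-inspect workflow boils down to very little code. A rough sketch, using a placeholder model rather than the one trained in the post, and assuming the netron package is installed alongside PyTorch:

```python
import torch
import torch.nn as nn
import netron

# Placeholder model standing in for the one trained in the article
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
model.eval()

# Export to ONNX, an exchange format Netron understands well
dummy_input = torch.randn(1, 8)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Launch Netron's local viewer in the browser
netron.start("model.onnx")
```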


Brief Code File Analysis with Python

Matt Eland reviews the code:

Last year I devised some ways of analyzing the history and structure of code in a visual way, but I didn’t document much of that to share with the community. This article explores the process I used to build a CSV file containing the path, extension, project, and line of code count in all source files in a project.

Click through for the Python code and an explanation of what it’s doing.
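For a flavor of what such a script involves, here is a small sketch (not Matt's code) that walks a repository with pathlib and writes the same four columns to a CSV; the root folder, extension list, and "project = first folder under the root" heuristic are assumptions:

```python
import csv
from pathlib import Path

ROOT = Path("path/to/repo")          # hypothetical repository root
EXTENSIONS = {".py", ".cs", ".sql"}  # hypothetical set of source extensions

rows = []
for path in ROOT.rglob("*"):
    if path.is_file() and path.suffix.lower() in EXTENSIONS:
        # Count lines without loading the whole file into memory
        lines = sum(1 for _ in path.open(encoding="utf-8", errors="ignore"))
        rows.append({
            "path": str(path),
            "extension": path.suffix,
            # Treat the first folder under the root as the "project"
            "project": path.relative_to(ROOT).parts[0],
            "lines": lines,
        })

with open("code_inventory.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["path", "extension", "project", "lines"])
    writer.writeheader()
    writer.writerows(rows)
```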


Estimating Quantiles in Python

Christian Lorentzen digs into quantile calculation:

Applied statistics is dominated by the ubiquitous mean. For a change, this post is dedicated to quantiles. I will give my best to provide a good mix of theory and practical examples.

While the mean describes only the central tendency of a distribution or random sample, quantiles are able to describe the whole distribution. They appear in box plots, in children's weight-for-age curves, in salary survey results, in risk measures like the value-at-risk in the EU-wide Solvency II framework for insurance companies, in quality control, and in many more fields.

There are easy functions to calculate quantiles in R and Python; this post serves as a way of understanding the variety of quantile functions available and how they can affect results with small sample sizes.
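To see why the estimation method matters on a small sample, here is a quick sketch with NumPy; the method argument requires NumPy 1.22 or later (older releases use interpolation, with fewer options):

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(size=10)  # deliberately tiny sample

# The same 90th percentile, estimated several different ways
for method in ["linear", "lower", "higher", "midpoint", "median_unbiased"]:
    q = np.quantile(sample, 0.9, method=method)
    print(f"{method:>16}: {q:.4f}")
```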


Building Custom Lineage in Purview

Alex Crampton writes some Python code:

The aim of this blog is to explain how to create custom Purview processes, enabling you to add lineage from processes that are not tracked out of the box.

As covered in this blog, Azure Purview can help with understanding the lineage of your data, offering visibility of how and where data is moving within your data estate.

Lineage can only be tracked out of the box when using tools such as Data Factory, Power BI, and Azure Data Share. Lineage is lost when using other tools like Azure Functions, Databricks notebooks, or SQL stored procedures.

Read on to see the code, as well as what you can do.
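For orientation, one common way to push custom lineage into Purview from Python is the pyapacheatlas library: define the source and sink entities plus a process entity linking them, then upload the batch. The account, credentials, type names, and qualified names below are placeholders, and the blog itself may take a different approach (such as calling the REST API directly):

```python
from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import AtlasEntity, AtlasProcess, PurviewClient

# Placeholder service principal credentials and account name
auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<secret>")
client = PurviewClient(account_name="my-purview-account", authentication=auth)

# Reference the source and sink assets; "DataSet" is a generic type name and the
# qualified names are made up. Negative GUIDs are placeholders resolved on upload.
source = AtlasEntity(name="raw_sales.csv", typeName="DataSet",
                     qualified_name="https://lake.dfs.core.windows.net/raw/raw_sales.csv",
                     guid=-100)
sink = AtlasEntity(name="sales", typeName="DataSet",
                   qualified_name="mssql://server.database.windows.net/db/dbo/sales",
                   guid=-101)

# The custom process entity is what draws the lineage line between them
process = AtlasProcess(name="load_sales_function", typeName="Process",
                       qualified_name="custom://functions/load_sales",
                       inputs=[source], outputs=[sink], guid=-102)

client.upload_entities(batch=[source, sink, process])
```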


Automated Data Visualization in Python

Brendan Tierney saves some time:

Creating data visualizations in Python can be a challenge. For some it can be easy, but most people (particularly those new to the language) always have to search for the commands in the documentation or via a search engine. Over the past few years we have seen more and more libraries becoming available to assist with many of the routine and tedious steps in most data science and machine learning projects. I’ve written previously about some data profiling libraries in Python. These are good up to a point, but additional work/code is needed to explore the data to suit your needs. One of these Python libraries, designed to make your initial work on a new data set easier, is called AutoViz. It’s good to see there is continued development work on this library, which can really help with creating initial sets of charts for all the variables in your data set, plus it has some additional features which help to make it very useful and cut down on some of the additional code you might need to write.

This looks like it’s worth a try and could serve well as a first-glance approach to exploratory data analysis.
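Getting started with AutoViz takes only a few lines. A hedged sketch, with a hypothetical file name and target column (the exact parameters vary a bit between AutoViz releases):

```python
from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()

# Point AutoViz at a CSV (or pass a DataFrame via dfte=) and let it choose the charts;
# the file name and target column here are placeholders.
report = AV.AutoViz(
    filename="sales.csv",
    sep=",",
    depVar="revenue",            # optional target variable to plot against
    verbose=1,
    chart_format="html",         # write interactive charts to disk
    max_rows_analyzed=150000,
)
```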


Multi-Class Classification in PyTorch

Adrian Tam does some iris categorizing:

Now you need to have a model that can take the input and predict the output, ideally in the form of one-hot vectors. There is no science behind the design of a perfect neural network model. But you know one thing: it has to take in a vector of 4 features and output a vector of 3 values. The 4 features correspond to what you have in the dataset. The 3-value output is because we know the one-hot vector has 3 elements. Anything can be in between, and those are known as the “hidden layers” since they are neither input nor output.

Click through for the full tutorial.
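The shape Adrian describes (4 inputs, a hidden layer, 3 outputs) might look like this in PyTorch; the hidden-layer size is an arbitrary choice here, not the one from the tutorial:

```python
import torch
import torch.nn as nn

class IrisNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)   # 4 input features -> 8 hidden units
        self.act = nn.ReLU()
        self.output = nn.Linear(8, 3)   # 3 output values, one per class

    def forward(self, x):
        return self.output(self.act(self.hidden(x)))

model = IrisNet()
loss_fn = nn.CrossEntropyLoss()     # takes raw logits plus integer class labels
logits = model(torch.rand(5, 4))    # batch of 5 samples with 4 features each
print(logits.shape)                 # torch.Size([5, 3])
```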


Building a Shiny App in R and Python

Nicola Rennie does a language throw-down:

Shiny is an R package that makes it easier to build interactive web apps straight from R. Back in July 2022 at rstudio::conf(2022), Posit (formerly RStudio) announced the release of Shiny for Python. As someone who knows Python but hasn’t written any Python code for quite a long time, I wanted to see how the two compared. So I did the only logical thing and built a Shiny app – twice!

After building (almost) identical Shiny apps, with one built solely in R and the other solely in Python, I’ve written this blog post to take you through some of the things that are the same, and a few things that are slightly different.

Note: at the time of writing Shiny for Python is still in alpha, so if you’re reading this blog quite a while after it was first published, some things may have changed.

The code, as you’d expect, looks quite similar. I also learned about plotnine, something I’ll need to keep in mind. H/T R-Bloggers.
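For reference, a minimal Shiny for Python app has roughly this shape; it is a sketch with placeholder inputs, not Nicola's app, and the API may have shifted since the alpha the post was written against:

```python
from shiny import App, render, ui

app_ui = ui.page_fluid(
    ui.h2("Hello Shiny for Python"),
    ui.input_slider("n", "Number of points", min=10, max=100, value=50),
    ui.output_text("summary"),
)

def server(input, output, session):
    @output
    @render.text
    def summary():
        # Reactive: re-runs whenever the slider changes
        return f"You asked for {input.n()} points."

app = App(app_ui, server)
# Run with: shiny run --reload app.py
```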


Plotly Visualizations in Azure Data Explorer

Adi Eldar improves ADX visualization:

Azure Data Explorer (ADX) supports various types of data visualizations including time, bar, and scatter charts, maps, funnels, and many more. The chosen visualization can be specified as part of the KQL query using the ‘render’ operator, or interactively selected when building ADX dashboards. Today we extend the set of visualizations, supporting advanced interactive visualizations with the Plotly graphics library. Plotly supports ~80 chart types including basic charts, scientific, statistical, financial, maps, 3D, animations, and more. There are two methods for creating Plotly visuals:

Read on to learn more about those two methods.
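As a rough illustration of the Python side, a Plotly figure can be serialized to its JSON representation, which (as I understand it) is what drives ADX's Plotly visuals, for example when generated from inside the python() plugin. The data and chart below are placeholders:

```python
import pandas as pd
import plotly.express as px

# Hypothetical data standing in for a KQL query result
df = pd.DataFrame({
    "hour": list(range(24)),
    "requests": [h * 3 + 5 for h in range(24)],  # made-up numbers
})

fig = px.line(df, x="hour", y="requests", title="Requests per hour")

# Emit the figure as a Plotly JSON document
plotly_json = fig.to_json()
print(plotly_json[:200])
```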
