Python – Page 20 – Curated SQL

Running Python in Excel

Published 2023-08-25 by Kevin Feasel

Excel-maker Microsoft and Anaconda, a key distributor of Python tools, unveiled a collaboration this week that will see Python integrated with Excel.

The new Anaconda Python Distribution in Excel, which is currently in beta, will bring Python data analysis and data science capabilities to the popular spreadsheet program from Microsoft. The integration will enable users to use a variety of Python libraries and tools to prep, manipulate, analyze, and visualize data in Excel.

It’s still in preview, but it is interesting to see.

A Brief Overview of 21 ETL Tools in Python

Published 2023-08-24 by Kevin Feasel

Adron Hall makes a list:

Here are summaries of each of the tools you’ve mentioned along with examples of how to implement the ETL (Extract, Transform, Load) process using each tool within a Python workflow:

Apache Spark: Apache Spark is a powerful open-source cluster-computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It’s commonly used for processing large-scale data and running complex ETL pipelines. Example Implementation:

Read on for summaries and samples for each of the 21 options.

Training a Code-First Model in Azure ML

Published 2023-08-23 by Kevin Feasel

I have a new video:

In this video, we walk through the code in an Azure Machine Learning project and see how the pieces fit together.

There are a few more videos to go in this Azure ML series and I would recommend going through them in order to understand how we got to this video, but this one is what I’ve been building toward.

SeamlessM4T: Multimodal Speech and Text Translation

Published 2023-08-23 by Kevin Feasel

Facebook has announced a new library:

Today, we’re introducing SeamlessM4T, the first all-in-one multimodal and multilingual AI translation model that allows people to communicate effortlessly through speech and text across different languages. SeamlessM4T supports:

Speech recognition for nearly 100 languages

Speech-to-text translation for nearly 100 input and output languages

Speech-to-speech translation, supporting nearly 100 input languages and 36 (including English) output languages

Text-to-text translation for nearly 100 languages

Text-to-speech translation, supporting nearly 100 input languages and 35 (including English) output languages

The open source library is available on GitHub and you can also get the model itself on HuggingFace. The nicest thing about all of this is that, unlike existing translation services, you can run it entirely offline and perform the inference on local compute.

Text-to-Video with Azure OpenAI and Semantic Kernel

Published 2023-08-22 by Kevin Feasel

Sabyasachi Samaddar continues a series on generating video from a series of text prompts:

Welcome back to the second part of our journey into the world of Azure and OpenAI! In the first part, we explored how to transform text into video using Azure’s powerful AI capabilities. This time, we’re taking a step further by orchestrating our application flow with Semantic Kernel.

Semantic Kernel is a powerful tool that allows us to understand and manipulate the meaning of text in a more nuanced way. By using Semantic Kernel, we can create more sophisticated workflows and generate more meaningful results from our text-to-video transformation process.

In this part of the series, we will focus on how Semantic Kernel can enhance our application and provide a smoother, more efficient workflow. We’ll dive deep into its features, explore its benefits, and show you how it can revolutionize your text-to-video transformation process.

Read on for an understanding of how Semantic Kernel fits in and what you can do with it.

Shuffling Columns: R and Python Options

Published 2023-08-11 by Kevin Feasel

Tom Shafer does some testing:

Last year I benchmarked a few ways of shuffling columns in a data.table, but what about pandas? I didn’t know, so let’s revisit those tests and add a few more operations! pandas winds up being much more competitive than I expected.

Click through for those findings and the code Tom used for the task. H/T R-Bloggers.
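
For a feel of what "shuffling a column" means on the pandas side, here is a minimal sketch. It is not Tom's benchmark code: the `shuffle_column` helper, the sample frame, and the seed are all assumptions made for illustration, and it shows only one of several possible approaches.

```python
import numpy as np
import pandas as pd


def shuffle_column(df: pd.DataFrame, column: str, seed: int = 0) -> pd.DataFrame:
    """Return a copy of df with the values of one column randomly permuted.

    Hypothetical helper for illustration; not taken from the linked post.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Permute the raw NumPy values so pandas cannot re-align them by index.
    out[column] = rng.permutation(out[column].to_numpy())
    return out


# Made-up sample data: only "score" is shuffled, "id" keeps its order.
df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "score": [10, 20, 30, 40, 50]})
shuffled = shuffle_column(df, "score", seed=42)
print(shuffled)
```

Working on the underlying array rather than on the Series is deliberate: assigning a shuffled Series back to the frame would line the values up by index again and quietly undo the shuffle.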

Orchestrating Azure Data Explorer Queries via Apache Airflow

Published 2023-07-25 by Kevin Feasel

Michael Spector does some automation:

Apache Airflow is a widely used task orchestration framework that gained its popularity from its Python-based programmatic interface, Python being the language of first choice for data engineers and data ops. The framework allows defining complex pipelines that move data around different parts, potentially implemented using different technologies.

The following article shows how to set up a managed instance of Apache Airflow and define a very simple DAG (directed acyclic graph) of tasks that does the following:

Uses an Azure registered application to authenticate with the ADX cluster.

Schedules daily execution of a simple KQL query that calculates HTTP error statistics based on web log records for the last day.

Click through for the process.

Model Diagnostics in Python

Published 2023-07-18 by Kevin Feasel

Christian Lorentzen has released a new package:

Version 1.0.0 of the new Python package for model-diagnostics was just released on PyPI. If you use (machine learning or statistical or other) models to predict a mean, median, quantile or expectile, this library offers tools to assess the calibration of your models and to compare and decompose predictive model performance scores.

This looks like a really useful package, so check it out.
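
To make "calibration" a little more concrete for a model that predicts a mean, here is a small standard-library sketch. It does not use the model-diagnostics API at all: the function names, the sign convention (prediction minus observation), and the data are assumptions for illustration only.

```python
from statistics import mean


def mean_bias(y_obs, y_pred):
    """Average of (prediction - observation); 0 means unbiased on average."""
    return mean(p - o for o, p in zip(y_obs, y_pred))


def bias_by_group(y_obs, y_pred, groups):
    """Mean bias within each group, to expose errors that cancel out overall."""
    buckets = {}
    for o, p, g in zip(y_obs, y_pred, groups):
        buckets.setdefault(g, []).append(p - o)
    return {g: mean(diffs) for g, diffs in buckets.items()}


# Made-up numbers: the model over-predicts in group "a" and under-predicts in "b".
y_obs = [10.0, 12.0, 20.0, 22.0]
y_pred = [11.0, 13.0, 19.0, 21.0]
groups = ["a", "a", "b", "b"]

overall = mean_bias(y_obs, y_pred)
per_group = bias_by_group(y_obs, y_pred, groups)
print(overall, per_group)
```

In this toy data the overall bias is zero while each group is off by one unit in opposite directions, which is the kind of conditional miscalibration a dedicated diagnostics library is meant to surface.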

Parameterizing Jupyter Notebooks

Published 2023-07-07 by Kevin Feasel

John Mount shows off a feature:

I’d like to share a great new feature in the wvpy package (available at PyPI).

This package is useful in converting Jupyter notebooks to/from Python, and also in rendering many parameterized notebooks. The idea is to make Jupyter notebooks easier to use in production.

The latest feature is an extension of notebook parameterization. In addition to the init_code and output_suffix features, which allow adding arbitrary code to notebooks and saving multiple renders of the same notebook under different (non-colliding!) names, the new sheet_vars feature allows the insertion of arbitrary data into notebook renders (in addition to the earlier code insertion facility).

Click through for an example on how to use this. Several years ago, I would have considered this to be outstanding. Today, I think it’s cool, but I’ve also gravitated toward using notebooks as an intermediary step rather than a final product, so it’s less critical for me these days.

Thoughts on Fabric Data Wrangler

Published 2023-06-30 by Kevin Feasel

Gilbert Quevauvilliers tries out a tool:

I was going through my Twitter feed and I came across this tweet where they spoke about the Data Wrangler: "Calling all #Python users! Have you tried Data Wrangler in #MicrosoftFabric?"

I thought I would give this a try and that was the idea for my blog post. I honestly had no idea, first, that this was possible and, second, that it is so easy for Data Wrangler to do all the hard work for me.

I am going to demonstrate two transformations in this blog post. The first changes d_date from date to datetime. The second uses the "columns from examples" feature to create a new column that concatenates two columns, delimited with a double pipe.

Read on for Gilbert’s thoughts.
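
For reference, the two transformations Gilbert describes look roughly like this when written by hand in pandas. This is a sketch, not the code Data Wrangler generates: only `d_date` comes from the post, while the other column names, the new column name, and the sample rows are assumptions for illustration.

```python
from datetime import date

import pandas as pd

# Hypothetical sample data; only the d_date column name comes from the post.
df = pd.DataFrame({
    "d_date": [date(2023, 6, 1), date(2023, 6, 2)],
    "d_day_name": ["Thursday", "Friday"],
    "d_quarter_name": ["2023Q2", "2023Q2"],
})

clean = df.copy()

# Transformation 1: convert d_date from plain date objects to a datetime dtype.
clean["d_date"] = pd.to_datetime(clean["d_date"])

# Transformation 2: build a new column from two others, joined by a double pipe.
clean["day_quarter"] = (
    clean["d_day_name"].astype(str) + "||" + clean["d_quarter_name"].astype(str)
)

print(clean.dtypes)
print(clean)
```

Working on a copy keeps the original frame untouched, so the before and after states can be compared side by side.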

Category: Python