Press "Enter" to skip to content

Category: Python

SeamlessM4T: Multimodal Speech and Text Translation

Facebook has announced a new library:

Today, we’re introducing SeamlessM4T, the first all-in-one multimodal and multilingual AI translation model that allows people to communicate effortlessly through speech and text across different languages. SeamlessM4T supports:

  • Speech recognition for nearly 100 languages
  • Speech-to-text translation for nearly 100 input and output languages
  • Speech-to-speech translation, supporting nearly 100 input languages and 36 (including English) output languages
  • Text-to-text translation for nearly 100 languages
  • Text-to-speech translation, supporting nearly 100 input languages and 35 (including English) output languages

The open source library is available on GitHub and you can also get the model itself on HuggingFace. The nicest thing about all of this is that, unlike existing translation services, you can run it entirely offline and perform the inference on local compute.

Comments closed

Text-to-Video with Azure Open AI and Semantic Kernel

Sabyasachi Samaddar continues a series on generating video from a series of text prompts:

Welcome back to the second part of our journey into the world of Azure and OpenAI! In the first part, we explored how to transform text into video using Azure’s powerful AI capabilities. This time, we’re taking a step further by orchestrating our application flow with Semantic Kernel.

Semantic Kernel is a powerful tool that allows us to understand and manipulate the meaning of text in a more nuanced way. By using Semantic Kernel, we can create more sophisticated workflows and generate more meaningful results from our text-to-video transformation process.

In this part of the series, we will focus on how Semantic Kernel can enhance our application and provide a smoother, more efficient workflow. We’ll dive deep into its features, explore its benefits, and show you how it can revolutionize your text-to-video transformation process.

Read on for an understanding of how Semantic Kernel fits in and what you can do with it.

Comments closed

Orchestrating Azure Data Explorer Queries via Apache Airflow

Michael Spector does some automation:

Apache Airflow is a widely used task orchestration framework, which gained its popularity due to Python-based programmatic interface – the language of first choice by Data engineers and Data ops. The framework allows defining complex pipelines that move data around different parts, potentially implemented using different technologies.

The following article shows how to setup managed instance of Apache Airflow and define a very simple DAG (direct acyclic graph) of tasks that does the following:

  • Uses Azure registered application to authenticate with the ADX cluster.
  • Schedules daily execution of a simple KQL query that calculates HTTP errors statistics based on Web log records for the last day.

Click through for the process.

Comments closed

Model Diagnostics in Python

Christian Lorentzen has released a new package:

Version 1.0.0 of the new Python package for model-diagnostics was just released on PyPI. If you use (machine learning or statistical or other) models to predict a mean, median, quantile or expectile, this library offers tools to assess the calibration of your models and to compare and decompose predictive model performance scores.

This looks like a really useful package, so check it out.

Comments closed

Parameterizing Jupyter Notebooks

John Mount shows off a feature:

I’d like to share a great new feature in the wvpy package (available at PyPi).

This package is useful in converting Jupiter notebooks to/from python, and also in rendering many parameterized notebooks. The idea is to make Jupyter notebook easier to use in production.

The latest feature is an extension of notebook parameterization. In addition to the init_code and output_suffix features, which allow adding arbitrary code to notebooks and saving multiple renders of the same notebook under different (non-coliding!) names. The new sheet_vars feature allows the insertion of arbitrary data into notebook renders (in addition to the earlier code insertion facility).

Click through for an example on how to use this. Several years ago, I would have considered this to be outstanding. Today, I think it’s cool, but I’ve also gravitated toward using notebooks as an intermediary step rather than a final product, so it’s less critical for me these days.

Comments closed

Thoughts on Fabric Data Wrangler

Gilbert Quevauvilliers tries out a tool:

I was going through my twitter feed and I came across this tweet where they spoke about the Data Wrangler Calling all #Python users! Have you tried Data Wrangler in #MicrosoftFabric?

I thought I would give this a try and that was the idea for my blog post. I honestly had no idea that firstly was this possible, but second that it is so easy for data wrangler to do all the hard work for me

I am going to demonstrate 2 transformations in this blog post, the first will be changing the d_date from date to datetime and then using the columns from examples I am going to create a new column where it concatenates 2 columns delimited with a double pipe command.

Read on for Gilbert’s thoughts.

Comments closed

Read and Write Data with PySpark

Dustin Vannnoy has two of the three R’s down:

Every Spark pipeline involves reading data from a data source or table. For data engineers we usually end the pipelines by writing the transformed data. In this tutorial we walk through some of the most common format and cloud storage locations for reading and writing with Spark. We’ll save some of the advanced Delta Lake capabilities for another tutorial.

Click through to see how to read from and write to CSV, JSON, and Parquet formats. Dustin has examples of working with Azure Blob Storage, S3, and Google Cloud Storage, and even some database examples with JDBC.

Comments closed

Using SHAP to Gauge Geographic Effects in R or Python

Michael Mayer runs an analysis:

This is the next article in our series “Lost in Translation between R and Python”. The aim of this series is to provide high-quality R and Python code to achieve some non-trivial tasks. If you are to learn R, check out the R tab below. Similarly, if you are to learn Python, the Python tab will be your friend.

This post is heavily based on the new {shapviz} vignette.

I appreciate the effort to include both R and Python code in this analysis, and recommend you peruse both sets of code listings.

Comments closed

PyPI and Malicious Code

Steven Vaughan-Nichols gives us the story:

The Python Package Index (PyPI), is the most popular Python programming language software repository. It’s also a mess. Earlier this year, the FortiGuard team discovered zero-day malware in three PyPI packages called “colorslib,” “httpslib,” and “libhttps.”  Before that, 2022 closed with  PyTorch-nightly on Linux being poisoned with a fake dependency. More recently, PyPI had to stop new user registrations and project creations because of a flood of malicious users. PyPI isn’t the only one to notice the user trouble. The Python Software Foundation (PSF) received three subpoenas for PyPI user data. What is going on here!?

Read on to learn more about what’s happening with the most popular Python repository.

Comments closed