Press "Enter" to skip to content

Category: Python

Querying the Power BI REST API from Fabric Spark

Gerhard Brueckl makes the call:

Microsoft Fabric has a lot of different components which usually work very well together. However, even though Power BI is a fundamental part of Fabric, there is not really a tight integration between Data Engineering components and Power BI. In this blog post I will show you an easy and reusable way to query the Power BI REST API via Fabric SQL in a very straight forward way. The extracted data can then be stored in the data lake e.g. to create a history of your dataset refreshes, the state of your workspaces or any other information that is provided by the REST API.

Click through for a list of operations, followed by the code you’ll need to pull this off.

Comments closed

Running Python in Excel

Alex Woodie reports on a new beta:

Excel-maker Microsoft and Anaconda, a key distributor of Python tools, unveiled a collaboration this week that will see Python integrated with Excel.

The new Anaconda Python Distribution in Excel, which is currently in beta, will bring Python data analysis and data science capabilities to the popular spreadsheet program from Microsoft. The integration will enable users to use a variety of Python libraries and tools to prep, manipulate, analyze, and visualize data in Excel.

It’s still in preview, but it is interesting to see.

Comments closed

A Brief Overview of 21 ETL Tools in Python

Adron Hall makes a list:

Here are summaries of each of the tools you’ve mentioned along with examples of how to implement the ETL (Extract, Transform, Load) process using each tool within a Python workflow:

  1. Apache Spark: Apache Spark is a powerful open-source cluster-computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It’s commonly used for processing large-scale data and running complex ETL pipelines. Example Implementation:

Read on for summaries and samples for each of the 21 options.

Comments closed

Training a Code-First Model in Azure ML

I have a new video:

In this video, we walk through the code in an Azure Machine Learning project and see how the pieces fit together.

There are a few more videos to go in this Azure ML series and I would recommend going through them in order to understand how we got to this video, but this one is what I’ve been building toward.

Comments closed

SeamlessM4T: Multimodal Speech and Text Translation

Facebook has announced a new library:

Today, we’re introducing SeamlessM4T, the first all-in-one multimodal and multilingual AI translation model that allows people to communicate effortlessly through speech and text across different languages. SeamlessM4T supports:

  • Speech recognition for nearly 100 languages
  • Speech-to-text translation for nearly 100 input and output languages
  • Speech-to-speech translation, supporting nearly 100 input languages and 36 (including English) output languages
  • Text-to-text translation for nearly 100 languages
  • Text-to-speech translation, supporting nearly 100 input languages and 35 (including English) output languages

The open source library is available on GitHub and you can also get the model itself on HuggingFace. The nicest thing about all of this is that, unlike existing translation services, you can run it entirely offline and perform the inference on local compute.

Comments closed

Text-to-Video with Azure Open AI and Semantic Kernel

Sabyasachi Samaddar continues a series on generating video from a series of text prompts:

Welcome back to the second part of our journey into the world of Azure and OpenAI! In the first part, we explored how to transform text into video using Azure’s powerful AI capabilities. This time, we’re taking a step further by orchestrating our application flow with Semantic Kernel.

Semantic Kernel is a powerful tool that allows us to understand and manipulate the meaning of text in a more nuanced way. By using Semantic Kernel, we can create more sophisticated workflows and generate more meaningful results from our text-to-video transformation process.

In this part of the series, we will focus on how Semantic Kernel can enhance our application and provide a smoother, more efficient workflow. We’ll dive deep into its features, explore its benefits, and show you how it can revolutionize your text-to-video transformation process.

Read on for an understanding of how Semantic Kernel fits in and what you can do with it.

Comments closed

Orchestrating Azure Data Explorer Queries via Apache Airflow

Michael Spector does some automation:

Apache Airflow is a widely used task orchestration framework, which gained its popularity due to Python-based programmatic interface – the language of first choice by Data engineers and Data ops. The framework allows defining complex pipelines that move data around different parts, potentially implemented using different technologies.

The following article shows how to setup managed instance of Apache Airflow and define a very simple DAG (direct acyclic graph) of tasks that does the following:

  • Uses Azure registered application to authenticate with the ADX cluster.
  • Schedules daily execution of a simple KQL query that calculates HTTP errors statistics based on Web log records for the last day.

Click through for the process.

Comments closed

Model Diagnostics in Python

Christian Lorentzen has released a new package:

Version 1.0.0 of the new Python package for model-diagnostics was just released on PyPI. If you use (machine learning or statistical or other) models to predict a mean, median, quantile or expectile, this library offers tools to assess the calibration of your models and to compare and decompose predictive model performance scores.

This looks like a really useful package, so check it out.

Comments closed

Parameterizing Jupyter Notebooks

John Mount shows off a feature:

I’d like to share a great new feature in the wvpy package (available at PyPi).

This package is useful in converting Jupiter notebooks to/from python, and also in rendering many parameterized notebooks. The idea is to make Jupyter notebook easier to use in production.

The latest feature is an extension of notebook parameterization. In addition to the init_code and output_suffix features, which allow adding arbitrary code to notebooks and saving multiple renders of the same notebook under different (non-coliding!) names. The new sheet_vars feature allows the insertion of arbitrary data into notebook renders (in addition to the earlier code insertion facility).

Click through for an example on how to use this. Several years ago, I would have considered this to be outstanding. Today, I think it’s cool, but I’ve also gravitated toward using notebooks as an intermediary step rather than a final product, so it’s less critical for me these days.

Comments closed