Python – Page 34 – Curated SQL

Time Series Estimation with Facebook’s Prophet

Published 2021-07-12 by Kevin Feasel

Dan Lantos looks at the Prophet library:

This article (part of a short series) aims to introduce the Prophet library, discuss it at a high level and run through a basic example of forecasting the FTSE 100 index. Future articles will discuss exactly how Prophet achieves its results, how to interpret the output and how to improve the model.
Please see this article (by my talented colleague Gavita) for an introduction to time-series forecasting algorithms.

Click through for part one in an ongoing series.

Comments closed

Optimizing BERT Models on Google Colab

Published 2021-07-08 by Kevin Feasel

Kevin Jacobs fine-tunes some NLP processes:

BERT is a language model and can thus be used for predicting the next word in a sentence. Furthermore, BERT can be used for automatic summarization, text classification and many more downstream tasks. Google Colab provides you with a cloud-based environment on which you can train your machine learning models on a GPU. The downside is that your data is uploaded to the Google cloud. Google Colab gives you the opportunity to finetune BERT.

Click through to see how.

Comments closed

Joining Lists in Python via Nested Loops

Published 2021-06-25 by Kevin Feasel

Jon Fletcher has a quick one for us:

In this blog post I will show you how to join two 2D Python lists together.
The code is in the screenshot below.

Nested loops is an easy algorithm to understand, and even here, the whole process is 10 lines of code.

Comments closed

What’s New in data_algebra

Published 2021-06-11 by Kevin Feasel

John Mount has an update on a Python package:

The data algebra is a modern realization of elements of Codd’s 1969 relational model for data wrangling (see also Codd’s 12 rules).
The idea is: most data manipulation tasks can usefully be broken down into a small number of fundamental data transforms plus composition. In Codd’s initial writeup, composition was expressed using standard mathematical operator notation. For “modern” realizations one wants to use a composition notation that is natural for the language you are working in. For Python the natural composition notation is method dispatch.

Click through to see how it works and what’s new in the latest version.

Comments closed

Hosting a Python API with Flask

Published 2021-06-04 by Kevin Feasel

Mrinal Walia shows how you can build a Python API, such as one for generating machine learning predictions, using Flask:

Deployment is a crucial move in the ML workflow. It is a mark where we want to implement our ML model into utilization. Later, we can practice the model in practical life.
But how can we design the model as a treatment? We can develop an Application Programming Interface (API). With that, we can reach the model universally, can be a mobile application or web application. In Python, there’s a library that can assist us in building an API. It’s named Flask.
This article will explain how to construct a REST API for our machine learning model utilizing Flask. Without further ado, let’s begun!

Flask is the first step, but then I’d want to reverse proxy it with gunicorn or Nginx afterward.

Comments closed

Running Dask on AKS

Published 2021-05-20 by Kevin Feasel

Tsuyoshi Matsuzaki sets up Dask as a distributed service:

In my last post, I showed you tutorial for running Apache Spark on managed kubernetes, Azure Kubernetes Service (AKS).
In this post, I’ll show you the tutorial for running distributed workloads of Dask on AKS.
By using Dask, you can run Scikit-Learn compliant functions and jobs for data which cannot fit in memory, or run in distributed manners. For simplicity, here I’ll use built-in Dask ML function (dask_ml.linear_model.LinearRegression) in this tutorial. (With the same manners, you can also run regular sklearn functions.)
Cloud managed kubernetes will make you speed up this large ML workloads.

Click through for the process. I’ve had some positive experiences with Dask as a dashboarding tool. It’s definitely one of the better ones if you’re big into Python.

Comments closed

Creating Diagrams from Code

Published 2021-05-11 by Kevin Feasel

Sheldon Hull walks us through the diagrams package in Python:

LucidChart, Draw.io and other tools are great for a quick solution.
Mermaid also provides a nice simple text based diagramming tool that is integrated with many markdown tools.
For me, this just never fit. I like a bit of polish and beauty in a visual presentation and most of these are very utilitarian in their output.
I came across diagrams and found it a perfect fit for intuitive and beautiful diagram rendering of cloud architecture, and figured it would be worth a blog post to share this.

Back when GitPitch was still a viable concern, I had just gotten into using the diagrams package. It takes some getting used to and has very strong preferences on the sorts of relationships diagram elements can have, but it’s good at its job.

Comments closed

When PyODBC fast_executemany Isn’t

Published 2021-05-04 by Kevin Feasel

Jon Morisi troubleshoots a performance issue:

I recently had a project in which I needed to transfer a 60 GB SQLite database to SQL Server. After some research I found the sqlite3 and pyodbc modules, and set about scripting connections and insert statements.
The basic form of my script is to import the modules, setup the database connections, and iterate (via cursor) over the rows of the select statement creating insert statements and executing them.
The issue here is that this method results in single inserts being sent one at a time yielding less than satisfactory performance. Inserting 35m+ rows in this fashion takes ~5hrs on my system.

Jon tries out a few different options. It would appear that there is no easy bulk insertion operation with PyODBC.

Comments closed

Azure ML and Azure SQL DB

Published 2021-03-23 by Kevin Feasel

I remembered that I had another blog and actually wrote something technical:

Not too long ago, I worked through an interesting issue with Azure Machine Learning. The question was, what’s the best way to read from Azure SQL Database, perform model processing, and then write results out to Azure SQL Database? Oh, by the way, I want to use a service principal rather than SQL authentication. Here’s what I’ve got.

This turned out to be a lot more work than I first expected.

Comments closed

Date Cleaning with PySpark

Published 2021-03-10 by Kevin Feasel

Robert J. Blackburn needs to do some cleanup work:

The function will accept the dataframe and a list of columns to process. Because of syntax restrictions the steps are broken out into multiple statements and a sub-function. Luckily, Spark’s lazy evaluation will optimize it during runtime.

Click through for the function in question.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Category: Python