Python – Page 40 – Curated SQL

ML Services: PYTHONHOME and PATH

Published 2021-01-11 by Kevin Feasel

Niels Berglund troubleshoots some issues:

In the last post, which looks at using Python 3.9 in SQL Server Machine Learning Services, I wrote this at the very end:
It looks like all is good, but maybe not? In a future post we’ll look at an issue we have introduced – but for now, let us bask in the glory of having created a new Python language extension.
In the post, we wrote a new language extension to handle Python 3.9, and that just worked fine. However, when I was doing some other things, I noticed some side effects, and in this post, we look at those side effects and how to solve them.

Click through to learn more.

Comments closed

Writing a Python Language Extension for ML Services

Published 2021-01-06 by Kevin Feasel

Niels Berglund shows how you can bring your own Python 3.9 runtime to SQL Server Machine Learning Services:

When I wrote we’d look at it in a future post I thought to myself; “how hard can it be?”. I had read the steps of how to build a Python language extension for Windows here, and it didn’t seem that hard: some Boost, CMake, compile, and Bob’s your uncle! Well, it turned out it was somewhat more complicated than what I anticipated. So, if you are interested – read on!

I was going to say that the steps seem a bit complicated but not overly terrible, though Niels’s conclusion leaves me wondering.

Comments closed

From R data.frame to Pandas DataFrame

Published 2021-01-05 by Kevin Feasel

Tomaz Kastrun builds a function:

Once we have a data.frame we want to generate Python dictionary that will hold schema and data for direct creation (or import) into your favourite Python environment. Function RtoPy does the needed transformation:

Click through for the code.

Comments closed

Using the Open Source R or Python Runtime with Machine Learning Services

Published 2020-12-30 by Kevin Feasel

Niels Berglund walks us through using the open source extensibility framework to install R or Python:

When Java became a supported language in SQL Server 2019, Microsoft mentioned that communication between ExternalHost and the language extension should be based on an API, regardless of the external language. The API is the Extensibility Framework API for SQL Server. Having an API ensures simplicity and ease of use for the extension developer.
From the paragraph above, one can assume that Microsoft would like to see 3rd party development of language extensions. That assumption turned out to be accurate as, mentioned above, Microsoft open-sourced the Java language extension, together with the include files for the extension API, in September 2020! This means that anyone interested can now create a language extension for their own favorite language!
However, open sourcing the Java extension was not the only thing Microsoft did. They also created and open-sourced language extensions for R and Python!

Click through for more detail and a walkthrough on installation of Python.

Comments closed

End-to-End ML with Azure Databricks

Published 2020-12-17 by Kevin Feasel

Tomaz Kastrun takes us through a machine learning scenario using Azure Databricks:

In the past couple of days we looked into configurations and infrastructure and today it is again time to do an analysis, let’s call it end-to-end analysis using R or Python or SQL.

Read on for the process.

Comments closed

Checking Array Monotonicity with Python

Published 2020-10-27 by Kevin Feasel

Pawan Khowal shows us how to check whether an array is constantly increasing:

In this puzzle you have to check whether an array is Monotonic or not

Though it looks like the solution actually shows whether the array is strictly monotonic rather than monotonic (i.e., weakly monotonic), as it allows for ties.

Comments closed

Finding Unique Words between Texts in Python

Published 2020-10-26 by Kevin Feasel

Pawan Khowal walks us through a Python puzzle:

In this puzzle we have to find un-common words between two strings
Sample Input List
a = ‘Hi I am Pawan here only’
b = ‘here only’

Click through for the solution.

Comments closed

Measuring Advertising Effectiveness

Published 2020-10-12 by Kevin Feasel

Layla Yang and Hector Leano walk us through measuring how effective an advertising campaign was:

At a high level we are connecting a time series of regional sales to regional offline and online ad impressions over the trailing thirty days. By using ML to compare the different kinds of measurements (TV impressions or GRPs versus digital banner clicks versus social likes) across all regions, we then correlate the type of engagement to incremental regional sales in order to build attribution and forecasting models. The challenge comes in merging advertising KPIs such as impressions, clicks, and page views from different data sources with different schemas (e.g., one source might use day parts to measure impressions while another uses exact time and date; location might be by zip code in one source and by metropolitan area in another).
As an example, we are using a SafeGraph rich dataset for foot traffic data to restaurants from the same chain. While we are using mocked offline store visits for this example, you can just as easily plug in offline and online sales data provided you have region and date included in your sales data. We will read in different locations’ in-store visit data, explore the data in PySpark and Spark SQL, and make the data clean, reliable and analytics ready for the ML task. For this example, the marketing team wants to find out which of the online media channels is the most effective channel to drive in-store visits.A

Click through for the article as well as notebooks.

Comments closed

Building a CRUD Application with Cloudera Operational DB and Flask

Published 2020-10-08 by Kevin Feasel

Shlomi Tubul puts together a proof of concept app:

In this blog, I will demonstrate how COD can easily be used as a backend system to store data and images for a simple web application. To build this application, we will be using Phoenix, one of the underlying components of COD, along with Flask. For storing images, we will be using an HBase (Apache Phoenix backend storage) capability called MOB (medium objects). MOB allows us to read/write values from 100k-10MB quickly.
*For development ease of use, you can also use the Phoenix query server instead of COD. The query server is a small build of phoenix that is meant for development purposes only, and data is deleted in each build.

Click through for the demo and for a link to the GitHub repo.

Comments closed

SQL Server R and Python Language Extensions Now Open Source

Published 2020-09-25 by Kevin Feasel

The SQL Server team has an announcement:

Previously, we announced a Java extension. Today, we are sharing that we are open sourcing the R and Python language extensions for SQL Server for both Windows and Linux on GitHub.
These extensions are the latest examples using an evolved programming language extensibility architecture which allows integration with a new type of language extension. This new architecture gives customers the freedom to bring their own runtime and execute programs using that runtime in SQL Server, while leveraging the existing security and governance that the SQL Server programming language extensibility architecture provides.

Very interesting.

Comments closed

Category: Python