Python – Page 35 – Curated SQL

Research with R and Production with Python

Published 2021-02-19 by Kevin Feasel

Matt Dancho and Jarrell Chalmers lay out an argument:

The decision can be challenging because they both Python and R have clear strengths.
– R is exceptional for Research – Making visualizations, telling the story, producing reports, and making MVP apps with Shiny. From concept (idea) to execution (code), R users tend to be able to accomplish these tasks 3X to 5X faster than Python users, making them very productive for research.
– Python is exceptional for Production ML – Integrating machine learning models into production systems where your IT infrastructure relies on automation tools like Airflow or Luigi.

They make a pretty solid argument. I’ve launched success R-based projects using SQL Server Machine Learning Services, but outside of ML Services, my team’s much more likely to deploy APIs in Python, and we’re split between Dash and Shiny for visualization. H/T R-Bloggers

Comments closed

ML Services: PYTHONHOME and PATH

Published 2021-01-11 by Kevin Feasel

Niels Berglund troubleshoots some issues:

In the last post, which looks at using Python 3.9 in SQL Server Machine Learning Services, I wrote this at the very end:
It looks like all is good, but maybe not? In a future post we’ll look at an issue we have introduced – but for now, let us bask in the glory of having created a new Python language extension.
In the post, we wrote a new language extension to handle Python 3.9, and that just worked fine. However, when I was doing some other things, I noticed some side effects, and in this post, we look at those side effects and how to solve them.

Click through to learn more.

Comments closed

Writing a Python Language Extension for ML Services

Published 2021-01-06 by Kevin Feasel

Niels Berglund shows how you can bring your own Python 3.9 runtime to SQL Server Machine Learning Services:

When I wrote we’d look at it in a future post I thought to myself; “how hard can it be?”. I had read the steps of how to build a Python language extension for Windows here, and it didn’t seem that hard: some Boost, CMake, compile, and Bob’s your uncle! Well, it turned out it was somewhat more complicated than what I anticipated. So, if you are interested – read on!

I was going to say that the steps seem a bit complicated but not overly terrible, though Niels’s conclusion leaves me wondering.

Comments closed

From R data.frame to Pandas DataFrame

Published 2021-01-05 by Kevin Feasel

Tomaz Kastrun builds a function:

Once we have a data.frame we want to generate Python dictionary that will hold schema and data for direct creation (or import) into your favourite Python environment. Function RtoPy does the needed transformation:

Click through for the code.

Comments closed

Using the Open Source R or Python Runtime with Machine Learning Services

Published 2020-12-30 by Kevin Feasel

Niels Berglund walks us through using the open source extensibility framework to install R or Python:

When Java became a supported language in SQL Server 2019, Microsoft mentioned that communication between ExternalHost and the language extension should be based on an API, regardless of the external language. The API is the Extensibility Framework API for SQL Server. Having an API ensures simplicity and ease of use for the extension developer.
From the paragraph above, one can assume that Microsoft would like to see 3rd party development of language extensions. That assumption turned out to be accurate as, mentioned above, Microsoft open-sourced the Java language extension, together with the include files for the extension API, in September 2020! This means that anyone interested can now create a language extension for their own favorite language!
However, open sourcing the Java extension was not the only thing Microsoft did. They also created and open-sourced language extensions for R and Python!

Click through for more detail and a walkthrough on installation of Python.

Comments closed

End-to-End ML with Azure Databricks

Published 2020-12-17 by Kevin Feasel

Tomaz Kastrun takes us through a machine learning scenario using Azure Databricks:

In the past couple of days we looked into configurations and infrastructure and today it is again time to do an analysis, let’s call it end-to-end analysis using R or Python or SQL.

Read on for the process.

Comments closed

Checking Array Monotonicity with Python

Published 2020-10-27 by Kevin Feasel

Pawan Khowal shows us how to check whether an array is constantly increasing:

In this puzzle you have to check whether an array is Monotonic or not

Though it looks like the solution actually shows whether the array is strictly monotonic rather than monotonic (i.e., weakly monotonic), as it allows for ties.

Comments closed

Finding Unique Words between Texts in Python

Published 2020-10-26 by Kevin Feasel

Pawan Khowal walks us through a Python puzzle:

In this puzzle we have to find un-common words between two strings
Sample Input List
a = ‘Hi I am Pawan here only’
b = ‘here only’

Click through for the solution.

Comments closed

Measuring Advertising Effectiveness

Published 2020-10-12 by Kevin Feasel

Layla Yang and Hector Leano walk us through measuring how effective an advertising campaign was:

At a high level we are connecting a time series of regional sales to regional offline and online ad impressions over the trailing thirty days. By using ML to compare the different kinds of measurements (TV impressions or GRPs versus digital banner clicks versus social likes) across all regions, we then correlate the type of engagement to incremental regional sales in order to build attribution and forecasting models. The challenge comes in merging advertising KPIs such as impressions, clicks, and page views from different data sources with different schemas (e.g., one source might use day parts to measure impressions while another uses exact time and date; location might be by zip code in one source and by metropolitan area in another).
As an example, we are using a SafeGraph rich dataset for foot traffic data to restaurants from the same chain. While we are using mocked offline store visits for this example, you can just as easily plug in offline and online sales data provided you have region and date included in your sales data. We will read in different locations’ in-store visit data, explore the data in PySpark and Spark SQL, and make the data clean, reliable and analytics ready for the ML task. For this example, the marketing team wants to find out which of the online media channels is the most effective channel to drive in-store visits.A

Click through for the article as well as notebooks.

Comments closed

Building a CRUD Application with Cloudera Operational DB and Flask

Published 2020-10-08 by Kevin Feasel

Shlomi Tubul puts together a proof of concept app:

In this blog, I will demonstrate how COD can easily be used as a backend system to store data and images for a simple web application. To build this application, we will be using Phoenix, one of the underlying components of COD, along with Flask. For storing images, we will be using an HBase (Apache Phoenix backend storage) capability called MOB (medium objects). MOB allows us to read/write values from 100k-10MB quickly.
*For development ease of use, you can also use the Phoenix query server instead of COD. The query server is a small build of phoenix that is meant for development purposes only, and data is deleted in each build.

Click through for the demo and for a link to the GitHub repo.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Python