Press "Enter" to skip to content

Category: Python

Exploratory Time Series Analysis

The authors at Knoyd have a post on exploratory data analysis of a time series data set:

From the plot above we can clearly see that time-series has strong seasonal and trend components. To estimate the trend component we can use a function from the pandas library called rolling_mean and plot the results. If we want to make the plot more fancy and reusable for another time-series it is a good idea to make a function. We can call this function plot_moving_average.

The second part of the series promises to use Box-Jenkins to forecast future values.

Comments closed

Implementing K Nearest Neighbors In Python

Atul Harsha gives us a demo on k nearest neighbors in Python:

In order to make any predictions, you have to calculate the distance between the new point and the existing points, as you will be needing k closest points.

In this case for calculating the distance, we will use the Euclidean distance. This is defined as the square root of the sum of the squared differences between the two arrays of numbers

Specifically, we need only first 4 attributes(features) for distance calculation as the last attribute is a class label. So for one of the approach is to limit the Euclidean distance to a fixed length, thereby ignoring the final dimension.

Check it out.

Comments closed

Building TensorFlow Neural Networks On Spark With Keras

Jules Damji has an example of using the PyCharm IDE to use Keras to build TensorFlow neural network models on the Databricks MLflow library:

Our example in the video is a simple Keras network, modified from Keras Model Examples, that creates a simple multi-layer binary classification model with a couple of hidden and dropout layers and respective activation functions. Binary classification is a common machine learning task applied widely to classify images or text into two classes. For example, an image is a cat or dog; or a tweet is positive or negative in sentiment; and whether mail is spam or not spam.

But the point here is not so much to demonstrate a complex neural network model as to show the ease with which you can develop with Keras and TensorFlow, log an MLflow run, and experiment—all within PyCharm on your laptop.

Click through for the video and explanation of the process.

Comments closed

Executing ML Services Scripts From Jupyter Notebooks

Kyle Weller has an inception moment with Python and SQL Server Machine Learning Services:

While this example is trivial with the Iris dataset, imagine the additional scale, performance, and security capabilities that you now unlocked. You can use any of the latest open source R/Python packages to build Deep Learning and AI applications on large amounts of data in SQL Server. We also offer leading edge, high-performance algorithms in Microsoft’s RevoScaleR and RevoScalePy APIs. Using these with the latest innovations in the open source world allows you to bring unparalleled selection, performance, and scale to your applications.

Normally I see examples come straight from SQL Server or maybe C#, but it’s a bit fun to see one originate in Python on order to execute Python in SQL Server.

Comments closed

Interacting With SQL Server From Pandas

Tomaz Kastrun shows how to use pyodbc to interact with a SQL Server database from Pandas:

In the SQL Server Management Studio (SSMS), the ease of using external procedure sp_execute_external_script has been (and still will be) discussed many times. But the reason for this short blog post is the fact that, changing Python environments using Conda package/module management within Microsoft SQL Server (Services), is literally impossible. Scenarios, where you want to build  a larger set of modules (packages) but are impossible to be compatible with your SQL Server or Conda, then you would need to set up a new virtual environment and start using Python from there.

Communicating with database to load the data into different python environment should not be a problem. Python Pandas module is an easy way to store dataset in a table-like format, called dataframe. Pandas is very powerful python package for handling data structures and doing data analysis.

Click through for examples of reading and writing data.

Comments closed

Building Cone Plots In Plotly

The Plotly blog shows how to use Python to build 3D cone plots using Plotly:

This plot uses an explicitly defined vector field. A vector field refers to an assignment of a vector to each point in a subset of space.

In this plot, we visualize a collection of arrows that simply model the wind speed and direction at various levels of the atmosphere.

3-D weather plots can be useful to research scientists to gain a better understanding of the atmospheric profile, such as during the prediction of severe weather events like tornadoes and hurricanes.

Sometimes a 3D plot is the best answer.  When it is, this looks like a good solution.  H/T R-bloggers

Comments closed

Building Recurrent Neural Networks Using TensorFlow

Ahmet Taspinar walks us through creating a recurrent neural network topology using TensorFlow:

As we have also seen in the previous blog posts, our Neural Network consists of a tf.Graph() and a tf.Session(). The tf.Graph() contains all of the computational steps required for the Neural Network, and the tf.Session is used to execute these steps.

The computational steps defined in the tf.Graph can be divided into four main parts;

  1. We initialize placeholders which are filled with batches of training data during the run.

  2. We define the RNN model and to calculate the output values (logits)

  3. The logits are used to calculate a loss value, which then

  4. is used in an Optimizer to optimize the weights of the RNN.

As a lazy casual, I’ll probably stick with letting Keras do most of the heavy lifting.

Comments closed

Grouping And Aggregating In SQL, R, And Python

Dejan Sarka has a few examples of aggregation in different languages, including SQL, R, and Python:

The query calculates the coefficient of variation (defined as the standard deviation divided the mean) for the following groups, in the order as they are listed in the GROUPING SETS clause:

  • Country and education – expression (g.EnglishCountryRegionName, c.EnglishEducation)
  • Country only – expression (g.EnglishCountryRegionName)
  • Education only – expression (c.EnglishEducation)
  • Over all dataset- expression ()

Note also the usage of the GROUPING() function in the query. This function tells you whether the NULL in a cell comes because there were NULLs in the source data and this means a group NULL, or there is a NULL in the cell because this is a hyper aggregate. For example, NULL in the Education column where the value of the GROUPING(Education) equals to 1 indicates that this is aggregated in such a way that education makes no sense in the context, for example aggregated over countries only, or over the whole dataset. I used ordering by NEWID() just to shuffle the results. I executed query multiple times before I got the desired order where all possibilities for the GROUPING() function output were included in the first few rows of the result set. Here is the result.

GROUPING SETS is an underappreciated bit of SQL syntax.

Comments closed

Constrained Optimization In Python: pyomo

Jeff Schecter introduces us to pyomo, a Python package for constrained optimization problems:

Constrained optimization is a tool for minimizing or maximizing some objective, subject to constraints. For example, we may want to build new warehouses that minimize the average cost of shipping to our clients, constrained by our budget for building and operating those warehouses. Or, we might want to purchase an assortment of merchandise that maximizes expected revenue, limited by a minimum number of different items to stock in each department and our manufacturers’ minimum order sizes.

Here’s the catch: all objectives and constraints must be linear or quadratic functions of the model’s fixed inputs (parameters, in the lingo) and free variables.

Constraints are limited to equalities and non-strict inequalities. (Re-writing strict inequalities in these terms can require some algebraic gymnastics.) Conventionally, all terms including free variables live on the lefthand side of the equality or inequality, leaving only constants and fixed parameters on the righthand side.

To build your model, you must first formalize your objective function and constraints. Once you’ve expressed these terms mathematically, it’s easy to turn the math into code and let pyomo find the optimal solution.

I haven’t touched it in a decade, but I did have some success with LINGO for solving the same type of problem.

Comments closed

Executing Python Code With SQL Server

Chris Hyde has started a series on executing external scripts with SQL Server ML Services:

We’ll start off with a simple step-by-step introduction to the sp_execute_external_script stored procedure.  This is the glue that enables us to integrate SQL Server with the Python engine by sending the output of a T-SQL query over to Python and getting a result set back. .  For example, we could develop a stored procedure to be used as a data set in an SSRS report that returns statistical data produced by a Python library such as SciPy.  In this post we’ll introduce the procedure and how to pass a simple data set into and back out of Python, and we’ll get into manipulating that data in a future post.

If you’d like to follow along with me, you’ll need to make sure that you’re running SQL Server 2017 or later, and that Machine Learning Services has been installed with the Python option checked.  Don’t worry if you installed both R and Python, as they play quite nicely together.  The SQL Server Launchpad service should be running, and you should make sure that the ability to execute the sp_execute_external_script procedure is enabled by running the following code:

Chris’s talks on ML Services (either R or Python) were great and I expect this series to be as well.

Comments closed