Press "Enter" to skip to content

Category: Python

Python in Power BI Desktop

David Eldersveld dives into using Python as an external tool in Power BI:

Why use Python as an external “tool”? Even though Python isn’t a “tool” in the same sense as the “Big 3” community tools focused this month, I want to show how versatile the External Tools feature is. I also want to encourage people to use imagination and also explore how Power BI isn’t really as closed as some people think–at least the data model…

Some of these ideas are not exclusive to Python, but there’s enough variety in the Power BI and data science communities for people to possibly figure out if some of this might be useful within the context of their own environments, skills, and organizations.

David also follows up with a series of sample ideas.

Comments closed

Comparing Gradient Descent to the Normal Equation for Small Data Sets

Pushkara Sharma compares two techniques for regression:

In this article, we will see the actual difference between gradient descent and the normal equation in a practical approach. Most of the newbie machine learning enthusiasts learn about gradient descent during the linear regression and move further without even knowing about the most underestimated Normal Equation that is far less complex and provides very good results for small to medium size datasets.

If you are new to machine learning, or not familiar with a normal equation or gradient descent, don’t worry I’ll try my best to explain these in layman’s terms. So, I will start by explaining a little about the regression problem.

I was surprised by the results.

Comments closed

Comparing Koalas to PySpark

Tori Tompkins gives us an understanding of where Koalas fits in the Spark world:

One significant difference between Spark’s implementation of Dataframes and pandas is its immutability.

With Spark dataframes, you are unable to make changes to the existing object but rather create a brand new dataframe based on the old one. Pandas dataframes, however, allow you to edit the object in place. With Koalas, whilst still spark Dataframes under the hood, have kept the mutable syntax of pandas.

It does this by introducing this concept of an ‘Internal Frame’. This holds the spark immutable dataframe and manages the mapping between the Koalas column names and Spark column names. It also manages the Koalas index names to spark column name to replicate the index functionality in pandas (covered below). It acts as a bridge between Spark and Koalas by mimicking the pandas API with Spark. This Internal Frame replicates the mutable functionality of pandas by creating copies of the internal frame but appearing to be mutable.

Read the whole thing.

Comments closed

Delaying Loops in Python

Jon Fletcher gives us three methods to delay a Python loop:

In this blog post, I will run through 3 different ways to execute a delayed Python loop. In these examples, we will aim to run the loop once every minute.
To show the delay, we will print out the current datetime using the datetime module.

I think I’d be a bit concerned about all three potentially failing—option 1 can get out of sync if it takes more than 1 second to process inside the loop, option 2 might miss a minute, and option 3 can also sometimes miss. But it’s interesting to think about.

Comments closed

Stock Price Predictions with LSTM Models

Thenuja Shanthacumaran walks us through training a Long Short-Term Memory neural network model for predicting stock prices:

LSTM could not process a single data point. it needs a sequence of data for processing and able to store historical information. LSTM is an appropriate algorithm to make prediction and process based-on time-series data. It’s better to work on the regression problem.

The stock market has enormously historical data that varies with trade date, which is time-series data, but the LSTM model predicts future price of stock within a short-time period with higher accuracy when the dataset has a huge amount of data.

Click through for the process and a demo.

Comments closed

Installing TensorFlow and Keras for R on SQL Server 2019 ML Services

I have a post on using TensorFlow and Keras in R on SQL Server 2019 Machine Learning Services:

What I’m doing is building a new virtual environment named r-reticulate, which is what the reticulate package in R desires. Inside that virtual environment, I’m installing the latest versions of tensorflow-probabilitytensorflow , and keras. I had DLL loading problems with TensorFlow 2.1 on Windows, so if you run into those, the proper solution is to ensure that you have the appropriate Visual C++ redistributables installed on your server.

Then, I switched back to the base virtual environment and installed the same packages. My thinking here is that I’ll probably need them for other stuff as well (and don’t tell anybody, but I’m not very good with Python environments).

Please continue not to tell anybody that I’m not very good with Python environments. I tend to dump things in the base environment, forget which one I’m in, and all kinds of other bad practices. I think I’m secretly undermining myself in Python, but I don’t have enough proof yet.

Comments closed

Portfolio Optimization with SAS and Python

Sophia Rowland shows off the sastopypackage:

I started by declaring my parameters and sets, including my risk threshold, my stock portfolio, the expected return of my stock portfolio, and covariance matrix estimated using the shrinkage estimator of Ledoit and Wolf(2003). I will use these pieces of information in my objective function and constraints. Now I will need SWAT, sasoptpy, and my optimization model object.

Read on for a demo.

Comments closed

Avoiding Overfitting and Underfitting in Neural Networks

Manas Narkar provides some advice on optimizing neural network models:

Adding Dropout

Dropout is considered as one of the most effective regularization methods. Dropout is basically randomly zero-ing or dropping out features from your layer during the training process, or introducing some noise in the samples. The key thing to note is that this is only applied at training time. At test time, no values are dropped out. Instead, they are scaled. The typical dropout rate is between 0.2 to 0.5.

Click through for a demo on dropout, as well as coverage of several other techniques.

Comments closed

Genetic Algorithms in Python

Abhinav Choudhary walks us through building a genetic algorithm library in Python:

Here are quick steps for how the genetic algorithm works:

1. Initial Population– Initialize the population randomly based on the data.
2. Fitness function– Find the fitness value of the each of the chromosomes(a chromosome is a set of parameters which define a proposed solution to the problem that the genetic algorithm is trying to solve)
3. Selection– Select the best fitted chromosomes as parents to pass the genes for the next generation and create a new population
4. Cross-over– Create new set of chromosome by combining the parents and add them to new population set
5. Mutation– Perfrom mutation which alters one or more gene values in a chromosome in the new population set generated. Mutation helps in getting more diverse oppourtinity.Obtained population will be used in the next generation

I’m a sucker for genetic algorithms (and even more so its cousin, genetic programming). And there are still good use cases for genetic algorithms, especially in creating scoring functions for neural networks.

Comments closed

Pandas UDFs and Python Type Hints in Spark 3.0

Hyukjin Kwon announces some updates forthcoming in Apache Spark 3.0:

The Pandas UDFs work with Pandas APIs inside the function and Apache Arrow for exchanging data. It allows vectorized operations that can increase performance up to 100x, compared to row-at-a-time Python UDFs.

The example below shows a Pandas UDF to simply add one to each value, in which it is defined with the function called pandas_plus_one decorated by pandas_udf with the Pandas UDF type specified as PandasUDFType.SCALAR.

Click through for explanations and demos for each.

Comments closed