
Category: Python

Stock Price Predictions with LSTM Models

Thenuja Shanthacumaran walks us through training a Long Short-Term Memory neural network model for predicting stock prices:

An LSTM cannot process a single data point in isolation. It needs a sequence of data to process, and it is able to store historical information. That makes LSTM an appropriate algorithm for processing time-series data and making predictions from it. It is better suited to regression problems.

The stock market has an enormous amount of historical data that varies with trade date, which makes it time-series data, and the LSTM model predicts the future price of a stock over a short time period with higher accuracy when the dataset is large.
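To make the "sequence of data" point concrete, here is a minimal sketch of how a price series might be cut into fixed-length windows before being fed to a sequence model. The function name `make_windows`, the window length, and the prices are all made up for illustration; the linked post may prepare its data differently.

```python
def make_windows(prices, window):
    """Split a price series into (input sequence, next value) pairs.

    Hypothetical helper: each input is `window` consecutive prices and the
    target is the price that immediately follows them.
    """
    if window < 1 or window >= len(prices):
        raise ValueError("window must be between 1 and len(prices) - 1")
    sequences, targets = [], []
    for start in range(len(prices) - window):
        sequences.append(prices[start:start + window])
        targets.append(prices[start + window])
    return sequences, targets


# Made-up closing prices, purely for illustration.
closing_prices = [10.0, 10.5, 10.2, 10.8, 11.1, 10.9]
X, y = make_windows(closing_prices, window=3)
```

Each row of `X` is one short history and the matching entry of `y` is the value to predict, which is the regression framing the excerpt describes.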

Click through for the process and a demo.


Installing TensorFlow and Keras for R on SQL Server 2019 ML Services

I have a post on using TensorFlow and Keras in R on SQL Server 2019 Machine Learning Services:

What I’m doing is building a new virtual environment named r-reticulate, which is what the reticulate package in R desires. Inside that virtual environment, I’m installing the latest versions of tensorflow-probability, tensorflow, and keras. I had DLL loading problems with TensorFlow 2.1 on Windows, so if you run into those, the proper solution is to ensure that you have the appropriate Visual C++ redistributables installed on your server.

Then, I switched back to the base virtual environment and installed the same packages. My thinking here is that I’ll probably need them for other stuff as well (and don’t tell anybody, but I’m not very good with Python environments).

Please continue not to tell anybody that I’m not very good with Python environments. I tend to dump things in the base environment, forget which one I’m in, and all kinds of other bad practices. I think I’m secretly undermining myself in Python, but I don’t have enough proof yet.


Portfolio Optimization with SAS and Python

Sophia Rowland shows off the sasoptpy package:

I started by declaring my parameters and sets, including my risk threshold, my stock portfolio, the expected return of my stock portfolio, and the covariance matrix estimated using the shrinkage estimator of Ledoit and Wolf (2003). I will use these pieces of information in my objective function and constraints. Now I will need SWAT, sasoptpy, and my optimization model object.

Read on for a demo.


Avoiding Overfitting and Underfitting in Neural Networks

Manas Narkar provides some advice on optimizing neural network models:

Adding Dropout

Dropout is considered one of the most effective regularization methods. Dropout basically means randomly zeroing, or dropping out, features from your layer during the training process, or introducing some noise into the samples. The key thing to note is that this is only applied at training time. At test time, no values are dropped out. Instead, they are scaled. The typical dropout rate is between 0.2 and 0.5.
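As a rough sketch of the behavior described above, the following plain-Python function zeroes values at random during training and scales them at test time. The function name `dropout` and its parameters are hypothetical, and this is not how any particular framework implements it; many libraries use "inverted" dropout, which does the scaling during training instead.

```python
import random


def dropout(values, rate, training, rng=random):
    """Illustrative dropout over a list of activations.

    Training: each value is zeroed with probability `rate`.
    Test: nothing is dropped; values are scaled by the keep probability
    (1 - rate) so their expected magnitude matches training.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if training:
        return [0.0 if rng.random() < rate else v for v in values]
    return [v * (1.0 - rate) for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, rate=0.5, training=True, rng=random.Random(0))
test_out = dropout(activations, rate=0.5, training=False)
```

In `train_out`, every entry is either zero or the original activation. In `test_out`, every entry is the original activation multiplied by 0.5.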

Click through for a demo on dropout, as well as coverage of several other techniques.


Genetic Algorithms in Python

Abhinav Choudhary walks us through building a genetic algorithm library in Python:

Here are quick steps for how the genetic algorithm works:

1. Initial Population – Initialize the population randomly based on the data.
2. Fitness function – Find the fitness value of each of the chromosomes (a chromosome is a set of parameters which define a proposed solution to the problem that the genetic algorithm is trying to solve).
3. Selection – Select the best-fitted chromosomes as parents to pass their genes on to the next generation and create a new population.
4. Cross-over – Create a new set of chromosomes by combining the parents and add them to the new population set.
5. Mutation – Perform mutation, which alters one or more gene values in a chromosome in the newly generated population set. Mutation helps in getting more diverse opportunities. The obtained population will be used in the next generation.
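The five steps above can be sketched in a few lines of standard-library Python. This is not the library from the linked post. The toy problem (maximize the number of 1 bits in a chromosome), the function names, and every parameter value are assumptions chosen only to keep the example small.

```python
import random


def fitness(chromosome):
    # Toy fitness (an assumption): count the 1 bits in the chromosome.
    return sum(chromosome)


def evolve(n_genes=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # 1. Initial population: random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Fitness and 3. selection: keep the fitter half as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            # 4. Cross-over: splice two parents at a random cut point.
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)
            child = mother[:cut] + father[cut:]
            # 5. Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # The parents plus their children form the next generation.
        population = parents + children
    return max(population, key=fitness)


best = evolve()
```

Because the parents survive into the next generation unchanged, the best fitness in the population never decreases from one generation to the next in this sketch.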

I’m a sucker for genetic algorithms (and even more so their cousin, genetic programming). And there are still good use cases for genetic algorithms, especially in creating scoring functions for neural networks.


Pandas UDFs and Python Type Hints in Spark 3.0

Hyukjin Kwon announces some updates forthcoming in Apache Spark 3.0:

The Pandas UDFs work with Pandas APIs inside the function and Apache Arrow for exchanging data. It allows vectorized operations that can increase performance up to 100x, compared to row-at-a-time Python UDFs.

The example below shows a Pandas UDF that simply adds one to each value. It is defined with a function called pandas_plus_one, decorated by pandas_udf with the Pandas UDF type specified as PandasUDFType.SCALAR.

Click through for explanations and demos for each.


Distributed Model Training with Cloudera ML

Zuling Kang and Anand Patil show us how to train models across several nodes using Cloudera Machine Learning:

Deep learning models are generally trained using the stochastic gradient descent (SGD) algorithm. For each iteration of SGD, we will sample a mini-batch from the training set, feed it into the training model, calculate the gradient of the loss function of the observed values and the real values, and update the model parameters (or weights). As is well known, SGD iterations have to be executed sequentially, so it is not possible to speed up the training process by parallelizing iterations. However, as processing one single iteration for a number of commonly used models, such as those trained on CIFAR10 or ImageNet, takes a long time, even using the most sophisticated GPU, we can still try to parallelize the feedforward computation as well as the gradient calculation within each iteration to speed up the model training process.

In practice, we will split the mini-batch of the training data into several parts, like 4, 8, 16, etc. (in this article, we will use the term sub-batch to refer to these split parts), and each training worker takes one sub-batch. Then the training workers do feedforward, gradient computation, and model updating using the sub-batches, respectively, just as in the monolithic training mode. After these steps, a process called model average is invoked, averaging the model parameters of all the workers participating in the training, so as to make the model parameters exactly the same when a new training iteration begins. Then the new round of the training iteration starts again from the data sampling and splitting step.
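The split, train, and average loop described above can be simulated on a single machine. The sketch below is not the TensorFlow code from the linked article: it fits a one-variable linear model, runs the "workers" sequentially in a list comprehension, and uses made-up data and hypothetical function names throughout.

```python
import random


def gradient(params, batch):
    # Gradient of mean squared error for the model y = w * x + b.
    w, b = params
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return grad_w, grad_b


def worker_step(params, sub_batch, lr):
    # One worker: feedforward, gradient computation, local model update.
    grad_w, grad_b = gradient(params, sub_batch)
    return (params[0] - lr * grad_w, params[1] - lr * grad_b)


def split(batch, n_workers):
    # Deal the mini-batch out into n_workers sub-batches.
    return [batch[i::n_workers] for i in range(n_workers)]


def average(models):
    # Model averaging: every worker ends up with identical parameters.
    n = len(models)
    return tuple(sum(m[i] for m in models) / n for i in range(2))


def train(data, n_workers=4, batch_size=8, lr=0.05, iterations=300, seed=0):
    rng = random.Random(seed)
    params = (0.0, 0.0)
    for _ in range(iterations):
        mini_batch = rng.sample(data, batch_size)
        sub_batches = split(mini_batch, n_workers)
        models = [worker_step(params, sb, lr) for sb in sub_batches]
        params = average(models)
    return params


# Made-up, noise-free data drawn from y = 3x + 1.
data = [(i / 10, 3.0 * (i / 10) + 1.0) for i in range(-20, 21)]
w, b = train(data)
```

With equally sized sub-batches and a single local step per worker, averaging the workers' parameters gives the same result as one ordinary step on the whole mini-batch, which is why this scheme can stand in for monolithic training.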

Read on for the high-level explanation, followed by some Python code working in TensorFlow.


Counting Table Tennis Ball Bounces

Evgeni Chasnovski has some fun counting:

On May 7th, 2020, Dan made a successful attempt to beat the world record for the longest duration controlling a table tennis ball with a bat. He surpassed the current record duration of 5h2m37s by 18 minutes and 27 seconds, for a total of 5h21m4s. He also live streamed the event on his “TableTennisDaily” YouTube channel, which was later uploaded (important note for the future: this video is the result of a live stream and not a “shot and uploaded” one). While cheering for Dan in real time, I got curious about the actual number of bounces he made.

And thus the quest begins.

As counting manually is error-prone and extremely boring, I decided to do this programmatically. The idea of the solution is straightforward: somehow extract audio from the world record video, detect bounces (as they have a distinctive sound) and count them.
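One simple way to picture the "detect and count" step is a loudness threshold with a minimum gap between hits. The excerpt does not say how the linked post detects bounces, so the approach below, the function name `count_bounces`, and the synthetic signal are all assumptions. Audio extraction is left out entirely.

```python
def count_bounces(samples, threshold, min_gap):
    """Count loud events in a list of audio amplitudes.

    A sample is "loud" when its absolute value reaches `threshold`. A new
    bounce is counted only when at least `min_gap` samples separate it from
    the previous loud sample, so the ringing after one hit is not counted
    again.
    """
    count = 0
    last_loud = None
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if last_loud is None or i - last_loud >= min_gap:
                count += 1
            last_loud = i
    return count


# Synthetic signal: five short bursts separated by near-silence.
signal = []
for _ in range(5):
    signal += [0.01] * 10 + [0.9, -0.6, 0.3]

bounces = count_bounces(signal, threshold=0.25, min_gap=5)
```

On real audio the threshold and gap would need tuning, and background noise such as commentary would likely call for something more robust than a fixed amplitude cutoff.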

Click through for the process as well as a link to a Git repo with the Python code.


Avoiding Loops in Python with NumPy

Swantika Gupta walks us through vectorization and broadcasting with NumPy:

Vectorization is a powerful ability within NumPy which is used to speed up code execution without using loops. It expresses operations as occurring on entire arrays rather than their individual elements.

Looping over an array or any data structure in Python has a lot of overhead involved. In NumPy, vectorized operations delegate the looping internally to highly optimized C and Fortran functions, making for cleaner and faster Python code. So, vectorization refers to the concept of replacing explicit for-loops with array expressions, which can then be computed internally with a low-level language, like C.
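A small side-by-side sketch may help here. The function names and values are made up for illustration and are not taken from the linked post. The first function uses an explicit loop, the second uses a single array expression, and the last lines show broadcasting between a column and a row.

```python
import numpy as np


def scale_and_shift_loop(values, scale, shift):
    # Explicit Python loop: one multiply-add per element.
    result = []
    for v in values:
        result.append(v * scale + shift)
    return result


def scale_and_shift_vectorized(values, scale, shift):
    # Array expression: NumPy performs the loop internally.
    return np.asarray(values) * scale + shift


data = np.arange(5, dtype=float)
looped = scale_and_shift_loop(data, 2.0, 1.0)
vectorized = scale_and_shift_vectorized(data, 2.0, 1.0)

# Broadcasting: a (3, 1) column plus a (4,) row yields a (3, 4) grid.
column = np.array([[0.0], [10.0], [20.0]])
row = np.array([1.0, 2.0, 3.0, 4.0])
grid = column + row
```

Both functions produce the same values. The vectorized version simply leaves the per-element work to NumPy rather than the Python interpreter.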

Read on for a few examples of this and broadcasting.
