Press "Enter" to skip to content

Category: Python

Implementing Homomorphic Encryption with SEAL

Tsuyoshi Matsuzaki has a tutorial on using Microsoft SEAL:

Microsoft SEAL is a homomorphic encryption (HE) library, developed by Microsoft Research.

With homomorphic encryption (HE), the encrypted item can be used on computation without decryption. For sensitive data (such as, privacy data in healthcare), the customers can operate their own data without submitting private text to cloud service providers. (See below.)

Click through to see how it all works. Homomorphic encryption is a clever solution to an important class of data security problems and I’m happy to see walkthroughs like this be available.

Comments closed

Automating Excel Report Creation with Python

Mira Celine Klein needs to create Excel reports:

In this article you will learn how to get data from Python into an Excel file and add some formatting. Excel reports are a great way to communicate data or results, especially to people who don’t use Python. Another great advantage is that you can create automated reports: You define once what the reports should look like, and then you can create it very quickly for, for example, different subgroups of data, or data that is updated regularly.

The first part of the article describes the most important functions and actions, for example, setting column widths, changing font colors, or adding hyperlinks to other sheets. In the second part, all of these features are combined in one Excel file.

This looks a lot like programming against the Excel COM objects in Powershell but maybe a little easier.

Comments closed

Testability and Functional Code

I describe why the functional approach to writing code makes it testable:

Another important aspect of functional programming relevant to writing testable Python code is that functions should not have side effects.  In other words, functions take inputs and convert them to outputs; they don’t do anything else.  This approach is aspirational rather than entirely realistic—after all, saving to the database is a side effect, and most applications would be fairly boring if they offered absolutely no way to modify the data.  It just happens to be the case that our outlier detection engine can be close to side effect-free because we do not create files, save to a database, or push results to some third-party service.  With most applications, however, we do not tend to be so lucky.

Click through for an excerpt from the draft of an upcoming book as well as a bit of elucidation on key points. The specific language I’m talking about here is Python but the concepts apply to most languages.

Comments closed

Graph Analysis with NetworkX

Tori Tompkins introduces us to a Python package:

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex graphs. It’s a really cool package that contains heaps of graph algorithms for all different uses. In this tutorial, I will cover how to create a graph from an edge list and different ways we can query it.

Unsure what a graph is exactly? Check out my Data Science Moments video which introduces graphs and their uses in 5 minutes:

Click through for that video, as well as a way to load, process, and display graph data.

Comments closed

Profiling Python Code

Adrian Tam shows how you can test the performance of calls in Python:

Profiling is a technique to figure out how time is spent in a program. With this statistics, we can find the “hot spot” of a program and think about ways of improvement. Sometimes, hot spot in unexpected location may hint a bug in the program as well.

In this tutorial, we will see how we can use the profiling facility in Python. Specifically, you will see

– How we can compare small code fragments using timeit module

– How we can profile the entire program using cProfile module

– How we can invoke a profiler inside an existing program

– What the profiler cannot do

Read on for those techniques.

Comments closed

Remapping Database Columns in Python

John Mount performs mapping en masse:

The tricky part is: data science application scale easily has hundreds of string valued variables, each having hundreds of thousands of tracked values. The possibility of a large number of variable values or level renders the CASE/WHEN solution undesirable- as the query size is proportional to the number variables and values. The JOIN solutions build a query size proportional to the number of variables (again undesirable, but tolerable). However, super deeply nested queries are just not what relational databases expect. And a sequence of updates isn’t easy to support as a single query or view.

As an example of remapping, John shows translating “a” in a column to 1, “b” to 2, “d” to 3, etc.—that is, perhaps mapping each unique string to a unique number.

Comments closed

Debugging Code in Python

Adrian Tam takes us through debugging options with Python:

The purpose of a debugger is to provide you a slow motion button to control the flow of a program. It also allow you to freeze the program at certain point of time and examine the state.

The simplest operation under a debugger is to step through the code. That is to run one line of code at a time and wait for your acknowledgment before proceeding into next. The reason we want to run the program in a stop-and-go fashion is to allow us to check the logic and value or verify the algorithm.

For a larger program, we may not want to step through the code from the beginning as it may take a long time before we reached the line that we are interested in. Therefore, debuggers also provide a breakpoint feature that will kick in when a specific line of code is reached. From that point onward, we can step through it line by line.

This is something I definitely need to get better at when doing Python development.

Comments closed

Using Azure DevOps to Deploy Python Functions to Azure Function Apps

Rayis Imayev has a trick question for us:

Can I create a CI/CD pipeline to deploy Python Function to Azure Function App using Windows self-hosted Azure DevOps agent?

My short answer to this question is Yes and NoYes, you can use Windows self-hosted Azure DevOps agent to deploy Python function to the Linux based Azure Function App; and, No, you can’t use Windows self-hosted Azure DevOps agent to build Python code since it will require collection/compilation/build of all Python-depended libraries on a Linux OS platform.

Click through for the full answer.

Comments closed

Anomaly Detection in Two Ways

Muhammad Asad Iqbal Khan shows how you can use isolation forests and kernel density estimation for outlier detection:

Just like the random forests, isolation forests are built using decision trees. They are implemented in an unsupervised fashion as there are no pre-defined labels. Isolation forests were designed with the idea that anomalies are “few and distinct” data points in a dataset.

Recall that decision trees are built using information criteria such as Gini index or entropy. The obviously different groups are separated at the root of the tree and deeper into the branches, the subtler distinctions are identified. Based on randomly picked characteristics, an isolation forest processes the randomly subsampled data in a tree structure. Samples that reach further into the tree and require more cuts to separate them have a very little probability that they are anomalies. Likewise, samples that are found on the shorter branches of the tree are more likely to be anomalies, since the tree found it simpler to distinguish them from the other data.

Click through for descriptions and the code.

Comments closed

Enhancing Color Photographs via Generative Adversarial Networks

Neil Saunders re-colorizes photographs:

When I’m not at the computer writing R code, I can often be found at the computer processing photographs. Or at the computer browsing Twitter, which is how I came across Stuart Humphryes, a digital artist who enhances autochromes. Autochromes are early colour photographs, generated using a process patented by the Lumière brothers in 1903. You can find and download many examples of them online. Stuart uses a variety of software tools to clean, enhance and balance the colours, resulting in bright vivid images that often have a contemporary feel, whilst at the same time retaining the somewhat “dreamy” quality of the original.

Having read that one of his tools uses neural networks, I was keen to discover how easy it is to achieve something similar using freely-available software found online. The answer is “quite easy” – although achieving results as good as Stuart’s is somewhat more difficult. Here’s how I went about it.

Click through for the process and some really nice-looking post-production photographs.

Comments closed