
Category: Python

Sentiment Analysis with Python

Sanil Mhatre performs a bit of sentiment analysis:

Previous articles in this series have focused on platforms like Azure Cognitive Services and Oracle Text features to perform the core tasks of Natural Language Processing (NLP) and Sentiment Analysis. These platforms allow users to quickly analyze their text data with easy-to-use pre-built models. Potential drawbacks of this approach include lack of flexibility to customize models, data locality and security concerns, subscription fees, and service availability of the cloud platform. Programming languages like Python and R are convenient when you need to build your own models for Natural Language Processing and keep your code as well as your data contained within your own data centers. This article explains how to do sentiment analysis using Python.

Python is a versatile and modern general-purpose programming language that is powerful, fast, and easy to learn. Python runs on interpreters, making it compatible with multiple platforms, and is widely used in applications for web platforms, graphical interfaces, data science, and machine learning. Python is increasingly gaining popularity in data analysis and is one of the most widely used languages for data science. You can learn more about Python from the official Python Software Foundation website.

Click through to see what’s available in the NLP world for Python. The short version is “a lot.”


Generating Python Documentation with Sphinx

Evan Seabrook generates some docs:

As you can see above, we have docstrings defined for properties, methods and the classes themselves. Ultimately, these docstrings will be used by Sphinx to generate the documentation. If you’re using a different docstring format, you can use a Sphinx extension called Napoleon to use your existing docstrings. Once your project has a level of docstring usage that you’re happy with, we can move on to the next step of configuring Sphinx.

And that’s the downside to this: you get auto-generated documentation, which means it’s only as good as your developers’ ability to explain the code.
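To make the excerpt concrete, here is a minimal sketch of docstrings on a class, a property, and a method, written in the reStructuredText field style that Sphinx reads by default. The `Account` class and everything in it are hypothetical and are not taken from Evan's article.

```python
class Account:
    """A hypothetical bank account used only to illustrate docstrings.

    :param owner: Name of the account holder.
    :param balance: Opening balance, defaults to 0.0.
    """

    def __init__(self, owner: str, balance: float = 0.0) -> None:
        self._owner = owner
        self._balance = balance

    @property
    def balance(self) -> float:
        """The current balance of the account."""
        return self._balance

    def deposit(self, amount: float) -> float:
        """Add money to the account.

        :param amount: Amount to deposit; must be positive.
        :raises ValueError: If ``amount`` is not positive.
        :return: The balance after the deposit.
        """
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._balance += amount
        return self._balance
```

With the `sphinx.ext.autodoc` extension enabled, a directive along the lines of `.. autoclass:: mypackage.Account` with the `:members:` option (where `mypackage` is a placeholder module name) would pull these docstrings into the generated pages.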


Calculus and Python

Muhammad Asad Iqbal Khan performs derivatives in Python:

Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for user-defined functions.

Read on to see how you can use PyTorch to do this.
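As a quick refresher on what a derivative measures, here is a small sketch that approximates one numerically with a central difference, using only the standard library. This is deliberately not PyTorch's automatic differentiation, which the linked article covers; the function names are made up for illustration.

```python
from typing import Callable


def central_difference(f: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x: float) -> float:
    return x * x


# The exact derivative of x^2 is 2x, so the estimate at x = 3 should be close to 6.
slope = central_difference(square, 3.0)
print(f"approximate derivative of x^2 at x = 3: {slope:.4f}")
```

A numerical estimate like this carries rounding and truncation error and needs extra function evaluations per input, which is part of why frameworks compute derivatives automatically instead.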


Working with Notebooks in Azure ML

I have started a new series:

In the prior series, Low-Code Machine Learning with Azure ML, we saw how to get started with Azure Machine Learning in a fairly pain-free way, especially for developers getting started with machine learning. In this series, I will assume that you already know all of those details and instead, we’re going to go full-code.

There are a few different ways in which we can go full-code with Azure ML. Today, we’re going to look at the easiest of those methods: using Jupyter notebooks within Azure ML Studio.

Read on for the first post in the series.


Implementing Homomorphic Encryption with SEAL

Tsuyoshi Matsuzaki has a tutorial on using Microsoft SEAL:

Microsoft SEAL is a homomorphic encryption (HE) library, developed by Microsoft Research.

With homomorphic encryption (HE), encrypted data can be used in computation without being decrypted. For sensitive data (such as private healthcare data), customers can operate on their own data without submitting private text to cloud service providers. (See below.)

Click through to see how it all works. Homomorphic encryption is a clever solution to an important class of data security problems and I’m happy to see walkthroughs like this available.
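To get a feel for "computing on encrypted data," here is a toy sketch in plain Python. It uses textbook RSA without padding, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product. This is not how SEAL works (SEAL implements different, lattice-based schemes), and the tiny parameters below are insecure and chosen only so the numbers stay readable.

```python
# Toy parameters: far too small to be secure, for illustration only.
P, Q = 61, 53
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent
D = pow(E, -1, PHI)            # private exponent via modular inverse (Python 3.8+)


def encrypt(message: int) -> int:
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    return pow(ciphertext, D, N)


a, b = 7, 3

# The party doing this multiplication sees only ciphertexts, never a or b.
product_ciphertext = (encrypt(a) * encrypt(b)) % N

# Only the key holder can decrypt, and the result is the product of the plaintexts.
print(decrypt(product_ciphertext) == a * b)
```

The same division of labor is the point of the excerpt: the service performs arithmetic on values it cannot read, and only the data owner decrypts the result.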


Automating Excel Report Creation with Python

Mira Celine Klein needs to create Excel reports:

In this article you will learn how to get data from Python into an Excel file and add some formatting. Excel reports are a great way to communicate data or results, especially to people who don’t use Python. Another great advantage is that you can create automated reports: you define once what the reports should look like, and then you can create them very quickly for, say, different subgroups of data or data that is updated regularly.

The first part of the article describes the most important functions and actions, for example, setting column widths, changing font colors, or adding hyperlinks to other sheets. In the second part, all of these features are combined in one Excel file.

This looks a lot like programming against the Excel COM objects in PowerShell but maybe a little easier.


Testability and Functional Code

I describe why the functional approach to writing code makes it testable:

Another important aspect of functional programming relevant to writing testable Python code is that functions should not have side effects.  In other words, functions take inputs and convert them to outputs; they don’t do anything else.  This approach is aspirational rather than entirely realistic—after all, saving to the database is a side effect, and most applications would be fairly boring if they offered absolutely no way to modify the data.  It just happens to be the case that our outlier detection engine can be close to side effect-free because we do not create files, save to a database, or push results to some third-party service.  With most applications, however, we do not tend to be so lucky.

Click through for an excerpt from the draft of an upcoming book as well as a bit of elucidation on key points. The specific language I’m talking about here is Python but the concepts apply to most languages.
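A small sketch of why side effect-free functions are easy to test: the function below depends only on its arguments, so a test needs no database, files, or mocks. The `find_outliers` function and its z-score rule are hypothetical stand-ins, not the detection engine from the book.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_outliers(values: Sequence[float], threshold: float = 2.0) -> List[float]:
    """Pure function: the result depends only on the arguments passed in."""
    if len(values) < 2:
        return []
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []
    # Flag values whose distance from the mean exceeds `threshold` standard deviations.
    return [v for v in values if abs(v - center) / spread > threshold]


def test_find_outliers_flags_extreme_value() -> None:
    # No setup or teardown: inputs in, outputs out.
    data = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0]
    assert find_outliers(data) == [50.0]
```

Anything that does need a side effect, such as saving results, can live in a thin outer layer that calls functions like this one, which keeps the hard-to-test surface small.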


Graph Analysis with NetworkX

Tori Tompkins introduces us to a Python package:

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex graphs. It’s a really cool package that contains heaps of graph algorithms for all different uses. In this tutorial, I will cover how to create a graph from an edge list and different ways we can query it.

Unsure what a graph is exactly? Check out my Data Science Moments video which introduces graphs and their uses in 5 minutes:

Click through for that video, as well as a way to load, process, and display graph data.
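For a sense of what "create a graph from an edge list and query it" means, here is a standard-library sketch that builds an undirected adjacency map. It is not the NetworkX API, and the edge list is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph as a mapping of node -> set of neighbors."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for left, right in edges:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return dict(adjacency)


edge_list = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
graph = build_graph(edge_list)

# Two simple queries: who is connected to C, and how many edges touch each node.
neighbors_of_c = sorted(graph["C"])
degrees = {node: len(adjacent) for node, adjacent in graph.items()}
print(neighbors_of_c, degrees)
```

In NetworkX itself, the analogous steps would be creating an `nx.Graph()`, calling `add_edges_from(edge_list)`, and then querying with `G.neighbors("C")` and `G.degree`, with many graph algorithms available on top.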


Profiling Python Code

Adrian Tam shows how you can test the performance of calls in Python:

Profiling is a technique to figure out how time is spent in a program. With these statistics, we can find the “hot spots” of a program and think about ways to improve it. Sometimes, a hot spot in an unexpected location may hint at a bug in the program as well.

In this tutorial, we will see how we can use the profiling facility in Python. Specifically, you will see:

– How we can compare small code fragments using the timeit module

– How we can profile the entire program using the cProfile module

– How we can invoke a profiler inside an existing program

– What the profiler cannot do

Read on for those techniques.
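The first and third items in that list can be sketched with nothing but the standard library. The two summing functions below are hypothetical examples, not code from Adrian's tutorial, and the timings will vary by machine.

```python
import cProfile
import io
import pstats
import timeit


def sum_with_loop(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total


def sum_with_builtin(n: int) -> int:
    return sum(range(n))


# timeit: compare two small code fragments by running each many times.
loop_seconds = timeit.timeit(lambda: sum_with_loop(10_000), number=200)
builtin_seconds = timeit.timeit(lambda: sum_with_builtin(10_000), number=200)
print(f"loop: {loop_seconds:.4f}s, builtin: {builtin_seconds:.4f}s")

# cProfile: start and stop the profiler around one region of an existing program.
profiler = cProfile.Profile()
profiler.enable()
sum_with_loop(100_000)
profiler.disable()

# Format the collected statistics, sorted by cumulative time, into a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

To profile an entire program instead, the second item in the list, you can run it as `python -m cProfile -s cumulative script.py`, where `script.py` is a placeholder for your own file.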


Remapping Database Columns in Python

John Mount performs mapping en masse:

The tricky part is: a data science application at scale easily has hundreds of string-valued variables, each having hundreds of thousands of tracked values. The possibility of a large number of variable values or levels renders the CASE/WHEN solution undesirable, as the query size is proportional to the number of variables and values. The JOIN solutions build a query size proportional to the number of variables (again undesirable, but tolerable). However, super deeply nested queries are just not what relational databases expect. And a sequence of updates isn’t easy to support as a single query or view.

As an example of remapping, John shows translating “a” in a column to 1, “b” to 2, “d” to 3, etc.—that is, perhaps mapping each unique string to a unique number.
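Here is a minimal sketch of the JOIN style of remapping for a single column, using an in-memory SQLite database from the standard library. The table names `d` and `x_map` and the sample rows are hypothetical, and this shows only the baseline idea from the excerpt, not the solution John arrives at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A small data table with one string-valued column to remap.
conn.execute("CREATE TABLE d (id INTEGER PRIMARY KEY, x TEXT)")
conn.executemany(
    "INSERT INTO d (id, x) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "d"), (4, "a"), (5, "z")],
)

# Store the mapping as rows in its own table, so the query text stays the same
# size no matter how many distinct values the column has.
mapping = {"a": 1, "b": 2, "d": 3}
conn.execute("CREATE TABLE x_map (old_value TEXT PRIMARY KEY, new_value INTEGER)")
conn.executemany("INSERT INTO x_map VALUES (?, ?)", list(mapping.items()))

# LEFT JOIN keeps rows whose value has no mapping; they come back as NULL (None).
rows = conn.execute(
    """
    SELECT d.id, d.x, m.new_value
    FROM d
    LEFT JOIN x_map AS m ON m.old_value = d.x
    ORDER BY d.id
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

With many columns to remap, this approach needs one join per column, which is the "proportional to the number of variables" cost the excerpt describes.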
