Press "Enter" to skip to content

Category: Python

Iteratively Tuning Graph Neural Networks

Luis Bermudez takes us through the process of tuning one flavor of neural network:

We made our own implementations of OGB leaderboard entries for two popular GNN frameworks: GraphSAGE and a Relational Graph Convolutional Network (RGCN). We then designed and executed an iterative experimentation approach for hyperparameter tuning where we seek a quality model that takes minimal time to train. We define quality by running an unconstrained performance tuning loop, and use the results to set thresholds in a constrained tuning loop that optimizes for training efficiency.
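To make the two-phase idea concrete, here is a rough sketch of that kind of loop as I read it; the train/evaluate and parameter-sampling functions below are placeholders, not the authors' code:

```python
import random
import time

def train_and_evaluate(params):
    """Stand-in for training a GNN and returning (accuracy, training_seconds)."""
    start = time.time()
    accuracy = random.random()  # placeholder metric; a real run trains and validates a model
    return accuracy, time.time() - start

def sample_params():
    """Hypothetical search space; the real one covers GNN-specific knobs."""
    return {"lr": random.choice([1e-2, 1e-3]), "hidden": random.choice([64, 128, 256])}

# Phase 1: unconstrained loop to find the best reachable quality.
best_acc = max(train_and_evaluate(sample_params())[0] for _ in range(20))
threshold = 0.98 * best_acc  # accept a small quality tradeoff

# Phase 2: among configurations that clear the threshold, prefer the fastest to train.
candidates = []
for _ in range(20):
    params = sample_params()
    acc, seconds = train_and_evaluate(params)
    if acc >= threshold:
        candidates.append((seconds, params))

print(min(candidates, key=lambda c: c[0], default=None))
```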

Read on to see how they did it.


R and Python Interop via Reticulate

Fabian Scheler combines R and Python:

I am way more experienced with R than with Python and prefer to code in this language when possible. This applies, especially when it is about visualizations. Plotly and ggplot2 are fantastic packages that provide a lot of flexibility. However, every language has its limitations, and the best results stem from their efficient combination.

This week, I created the candlestick below, and I think it’s an excellent case study to illustrate a few things:

Read on to learn more about using reticulate to execute Python code and interact with the results in R.


Logging in Python

Daniel Chung shows off the logging module in Python:

Note that now all five messages were output, so the default level that the root logger logs is now “DEBUG.” The log record attributes (such as %(asctime)s) that can be used to format the output can be found in the logging documentation.

Although there is a default logger, we usually want to make and use other loggers that can be configured separately. This is because we may want a different severity level or format for different loggers. 
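To illustrate that point, here is a minimal sketch (mine, not from the article) of a named logger configured with its own level and format alongside the root logger:

```python
import logging

# Root logger configured one way...
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(levelname)s %(message)s")

# ...and a named logger with its own level, handler, and format.
app_logger = logging.getLogger("myapp")
app_logger.setLevel(logging.WARNING)
app_logger.propagate = False  # keep its records out of the root handler

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(name)s | %(levelname)s | %(message)s"))
app_logger.addHandler(handler)

logging.debug("root sees this")        # emitted by the root logger
app_logger.debug("myapp drops this")   # below the WARNING threshold, not emitted
app_logger.warning("myapp sees this")  # emitted with the custom format
```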

Next on the to-do list is to implement the Reader monad to hide that logging deeper in your code base, so that you a) don't need to see logging code everywhere, and b) don't forget to include logging in some function.
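If you're curious what that might look like, here is a toy sketch of the idea (my own, not from the article): computations declare that they need a logger, compose without ever mentioning it, and receive it once at the edge of the program.

```python
import logging
from dataclasses import dataclass
from typing import Callable, TypeVar

A = TypeVar("A")

@dataclass
class Reader:
    """Wraps a function that needs a logger but defers receiving it."""
    run: Callable[[logging.Logger], A]

    def bind(self, f: "Callable[[A], Reader]") -> "Reader":
        # Thread the same logger through both steps.
        return Reader(lambda log: f(self.run(log)).run(log))

def parse(raw: str) -> Reader:
    def step(log: logging.Logger) -> int:
        log.debug("parsing %r", raw)
        return int(raw)
    return Reader(step)

def double(n: int) -> Reader:
    def step(log: logging.Logger) -> int:
        log.debug("doubling %d", n)
        return n * 2
    return Reader(step)

# Compose without mentioning the logger; supply it once at the edge.
program = parse("21").bind(double)
print(program.run(logging.getLogger("demo")))
```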


Topic Modeling with Python

Sanil Mhatre takes us through topic modeling:

Topic modeling is a powerful Natural Language Processing technique for finding relationships among data in text documents. It falls under the category of unsupervised learning and works by representing a text document as a collection of topics (set of keywords) that best represent the prevalent contents of that document. This article will focus on a probabilistic modeling approach called Latent Dirichlet Allocation (LDA), by walking readers through topic modeling using the team health demo dataset. Demonstrations will use Python and a Jupyter notebook running on Anaconda. Please follow instructions from the “Initial setup” section of the previous article to install Anaconda and set up a Jupyter notebook.

The second article of this series, Text Mining and Sentiment Analysis: Power BI Visualizations, introduced readers to the Word Cloud, a common technique to represent the frequency of keywords in a body of text. Word Cloud is an image composed of keywords found within a body of text, where the size of each word indicates its frequency in that body of text. This technique is limited in its ability to discover underlying topics and themes in the text, because it only relies on the frequency of keywords to determine their popularity. Topic modeling overcomes these limitations and uncovers deeper insights from text data using statistical modeling for discovering the topics (collection of words) that occur in text documents.
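For a sense of the shape of the workflow, here is a minimal sketch using scikit-learn's LatentDirichletAllocation on a tiny made-up corpus; the article itself walks through the team health dataset in a Jupyter notebook, so treat this only as an outline of the technique:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Made-up documents standing in for survey responses.
docs = [
    "the team communicates well and meetings are productive",
    "deployments fail often and the build pipeline is slow",
    "standup meetings help the team share progress",
    "the release pipeline needs better automated tests",
]

# Bag-of-words counts are the usual input for LDA.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top keywords per discovered topic.
terms = vectorizer.get_feature_names_out()
for idx, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {idx}: {', '.join(top)}")
```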

Read on for an informative article with plenty of code.


Data Profiling in Python

Brendan Tierney uses Python to look at some data:

One of the most common, and sometimes boring, tasks when working with datasets is writing some code to profile the data. Most data scientists will have built a set of tools/scripts to help them with this regular and slightly boring task. As with most IT tasks, we should be trying to automate what we can, to allow us to spend more time on more important tasks, such as deriving insights and delivering value to the business, instead of repeatedly writing code to produce various statistics about the data and drawing pretty pictures.

I’ve written previously about automating and using some data profiling libraries to help us with this task. There are lots of packages available on pypi.org and on GitHub. Below I give examples of 5 Python data profiling libraries, with links to their GitHub repositories.
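For context, this is roughly the kind of repetitive code those libraries are meant to replace, sketched here with plain pandas on a made-up DataFrame rather than any of Brendan's examples:

```python
import pandas as pd

# Hypothetical dataset for illustration only.
df = pd.DataFrame({
    "age": [34, 29, None, 41, 38],
    "city": ["Cork", "Dublin", "Dublin", None, "Galway"],
    "spend": [120.5, 80.0, 95.25, 60.0, 240.0],
})

print(df.dtypes)                   # column types
print(df.isna().sum())             # missing values per column
print(df.describe(include="all"))  # summary statistics
print(df["city"].value_counts())   # frequency of categorical values
```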

Brendan includes some good examples of libraries here so check it out.


Data Visualization in Python

Mehreen Saeed uses a few data visualization libraries in Python:

Data visualization is an important aspect of all AI and machine learning applications. You can gain key insights into your data through different graphical representations. In this tutorial, we’ll talk about a few options for data visualization in Python. We’ll use the MNIST dataset and the TensorFlow library for number crunching and data manipulation. To illustrate various methods for creating different types of graphs, we’ll use Python’s graphing libraries, namely matplotlib, Seaborn, and Bokeh.
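As a small taste of the matplotlib side, here is a hedged sketch that plots a grid of fake 28x28 images as stand-ins for the MNIST digits the tutorial uses; the arrays are random, not the real dataset:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
images = rng.random((4, 28, 28))  # placeholder arrays shaped like MNIST digits

fig, axes = plt.subplots(1, 4, figsize=(8, 2))
for ax, img in zip(axes, images):
    ax.imshow(img, cmap="gray")
    ax.axis("off")
plt.tight_layout()
plt.show()
```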

Bokeh results can look really nice, although it does feel like it requires a lot more developer time and effort to get it right. Click through for examples of each of the three libraries.


Dynamic DAGs with Apache Airflow

Bhavya Garg explains how we can create dynamic directed acyclic graphs in Apache Airflow:

Airflow dynamic DAGs can save you a ton of time. As you know, Apache Airflow is written in Python, and DAGs are created via Python scripts. That makes it very flexible and powerful (even complex sometimes). By leveraging Python, you can create DAGs dynamically based on variables, connections, a typical pattern, etc. This very nice way of generating DAGs comes at the price of higher complexity and subtle, tricky things that you must know.
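The core pattern, sketched below with hypothetical dataset names rather than anything from the article, is a single Python file that builds one DAG per configuration entry and registers each in the module's globals so Airflow can discover them:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

DATASETS = ["sales", "inventory", "customers"]  # stand-in config; could come from Variables or Connections

def build_dag(name: str) -> DAG:
    with DAG(
        dag_id=f"load_{name}",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="extract",
            python_callable=lambda: print(f"extracting {name}"),
        )
    return dag

# Airflow discovers DAGs by scanning module-level globals.
for dataset in DATASETS:
    globals()[f"load_{dataset}"] = build_dag(dataset)
```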

Read on for an example.


Snowflake Purchases Streamlit

Alex Woodie reports on a purchase:

Cloud data warehousing giant Snowflake showed it’s serious about Python and data science this week when it announced that it plans to spend $800 million to buy Streamlit, a provider of Python-based tools for rapidly developing interactive data applications on the Web.

Co-founded in San Francisco in 2018 by Adrien Treuille, Amanda Kelly, and Thiago Teixeira, Streamlit develops an open source framework of the same name that allows data scientists and machine learning engineers to create and deploy data applications. The software is compatible with other Python-based frameworks, such as NumPy, Pandas, Matplotlib, and Scikit-learn, and uses React to render screens on the front-end.

Streamlit is nice. $800 million nice? That’s a good question.
