Press "Enter" to skip to content

Category: Python

Data Visualization in Python

Mehreen Saeed uses a few data visualization libraries in Python:

Data visualization is an important aspect of all AI and machine learning applications. You can gain key insights of your data through different graphical representations. In this tutorial, we’ll talk about a few options for data visualization in Python. We’ll use the MNIST dataset and the Tensorflow library for number crunching and data manipulation. To illustrate various methods for creating different types of graphs, we’ll use the Python’s graphing libraries namely matplotlib, Seaborn and Bokeh.

Bokeh results can look really nice, although it does feel like it requires a lot more developer time and effort to get it right. Click through for examples of each of the three libraries.

Comments closed

Dynamic DAGs with Apache Airflow

Bhavya Garg explains how we can create dynamic directed acyclic graphs in Apache Airflow:

Airflow dynamic DAGs can save you a ton of time. As you know, Apache Airflow is written in Python, and DAGs are created via Python scripts. That makes it very flexible and powerful (even complex sometimes). By leveraging Python, you can create DAGs dynamically based on variables, connections, a typical pattern, etc. This very nice way of generating DAGs comes at the price of higher complexity and subtle tricky things that you must know

Read on for an example.

Comments closed

Snowflake Purchases Streamlit

Alex Woodie reports on a purchase:

Cloud data warehousing giant Snowflake showed it’s serious about Python and data science this week when it announced that it plans to spend $800 million to buy Streamlit, a provider of Python-based tools for rapidly developing interactive data applications on the Web.

Co-founded in San Francisco in 2018 by Adrien Treuille, Amanda Kelly, and Thiago Teixeira, Streamlit develops an open source framework of the same name that allows data scientists and machine learning engineers to create and deploy data applications. The software is compatible with other Python-based frameworks, such as NumPy, Pandas, Matplotlib, and Scikit-learn, and uses React to render screens on the front-end.

Streamlit is nice. $800 million nice? That’s a good question.

Comments closed

Building a Recommender in Spark

Avinash Sooriyarachchi makes a recommendation:

There has been an exponential increase in the volume and variety of data at our disposal to build recommenders and notable advances in compute and algorithms to utilize in the process. Particularly, the means to store, process and learn from image data has dramatically increased in the past several years. This allows retailers to go beyond simple collaborative filtering algorithms and utilize more complex methods, such as image classification and deep convolutional neural networks, that can take into account the visual similarity of items as an input for making recommendations. This is especially important given online shopping is a largely visual experience and many consumer goods are judged on aesthetics.

In this article, we’ll change the script and show the end-to-end process for training and deploying an image-based similarity model that can serve as the foundation for a recommender system. Furthermore, we’ll show how the underlying distributed compute available in Databricks can help scale the training process and how foundational components of the Lakehouse, Delta Lake and MLflow, can make this process simple and reproducible.

Click through for the process.

Comments closed

AutoML with pycaret

Brendan Tierney looks at the pycaret library:

In this post we will have a look at using the AutoML feature in the Pycaret Python library. AutoML is a popular topic and allows Data Scientists and Machine Learning people to develop potentially optimized models based on their data. All requiring the minimum of input from the Data Scientist. As with all AutoML solutions, care is needed on the eventual use of these models. With various ML and AI Legal requirements around the World, it might not be possible to use the output from AutoML in production. But instead, gives the Data Scientists guidance on creating an optimized model, which can then be deployed in production. This facilitates requirements around model explainability, transparency, human oversight, fairness, risk mitigation and human in the loop.

Read on for a tutorial as well as additional resources.

Comments closed

Variable Typing in Python

Adrian Tam notes that Python is for the birds:

Python is a duck typing language. It means the data types of variables can change as long as the syntax is compatible. Python is also a dynamic programming language. Meaning we can change the program while it runs, including defining new functions and the scope of name resolution. Not only these give us a new paradigm in writing Python code, but also a new set of tools for debugging. In the following, we will see what we can do in Python that cannot be done in many other languages. After finishing this tutorial you will know

– How Python manages the variables you defined

– How Python code uses a variable and why we don’t need to define its type like C or Java

Read on to learn how.

Comments closed

The Architecture of Project Bansai

Tsuyoshi Matsuzaki takes us through the architecture for Project Bansai:

Project Bonsai is a reinforcement learning framework for machine teaching in Microsoft Azure.

In generic reinforcement learning (RL), data scientists will combine tools and utilities (such like, Gym, RLlib, Ray, etc) which can be easily customized with familiar Python code and ML/AI frameworks, such as, TensorFlow or PyTorch.
But, in engineering tasks with machine teaching for autonomous systems or intelligent controls, data scientists will not always explore and tune attributes for AI. In successful practices, the professionals for operations or engineering (non-AI specialists) will tune attributes for some specific control systems (simulations) to train in machine teaching, and data scientists will assist in cases where the problem requires advanced solutions.

Read on to see how it works.

Comments closed

Azure ML and the Python SDK in VS Code

I continue a series on getting beyond the basics with Azure ML. First up, we get up close and personal in development:

Notebooks are great for ad hoc work or simple data analysis but we will want more robust tools if we wish to perform proper code development, testing, and deployment. This is where Visual Studio Code comes into play, particularly the Azure Machine Learning extension.

Then, I get into the Python SDK:

Over the past two posts, we have started using the Azure Machine Learning SDK for Python but I’ve only touched on the topic. In this post, we are going to dive into the topic.

Read on for more info on each.

Comments closed

Building a Simple Streamlit App

I jump into a new web framework:

In the course of working on my book, I wanted to build an easy-to-use website for outlier detection. The idea here is that I have a REST API to perform the outlier detection work but I’d like something a little easier to read than JSON blobs coming out of Postman. That’s where Streamlit comes into play.

Click through to see how it all works. I was impressed with how easy it was to build a decent interactive website.

Comments closed

Generating Python Documentation with Sphinx

Evan Seabrook generates some docs:

As you can see above, we have docstrings defined for properties, methods and the classes themselves. Ultimately, these docstrings will be used by Sphinx to generate the documentation. If you’re using a different docstring format, you can use a Sphinx extension called Napoleon to use your existing docstrings. Once your project has a level of docstring usage that you’re happy with, we can move on to the next step of configuring Sphinx.

And that’s the downside to this: you get auto-generated documentation, which means it’s only as good as your developers’ ability to explain the code.

Comments closed