Category: Python

Dynamic DAGs with Apache Airflow

Bhavya Garg explains how we can create dynamic directed acyclic graphs in Apache Airflow:

Airflow dynamic DAGs can save you a ton of time. As you know, Apache Airflow is written in Python, and DAGs are created via Python scripts. That makes it very flexible and powerful (sometimes even complex). By leveraging Python, you can create DAGs dynamically based on variables, connections, a typical pattern, etc. This very nice way of generating DAGs comes at the price of higher complexity and some subtle, tricky things that you must know.

Read on for an example.
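
As a minimal sketch of the pattern the excerpt describes, the code below generates one DAG per entry in a configuration list and registers each under its own module-level name, which is the classic way to make dynamically created DAGs discoverable, because Airflow's loader scans a DAG file's module globals for DAG objects. To keep it runnable without Airflow installed, `StubDAG` is a hypothetical stand-in for `airflow.DAG`, and `SOURCES` is an invented configuration list. It also shows one subtle Python pitfall that tends to bite here: closures created directly in a loop bind late.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StubDAG:
    """Hypothetical stand-in for airflow.DAG so this sketch runs anywhere."""
    dag_id: str
    tasks: List[Callable[[], str]] = field(default_factory=list)


# Hypothetical configuration; in practice this might come from a file,
# an Airflow Variable, or a list of connections.
SOURCES = ["orders", "customers", "inventory"]


def make_task(source: str) -> Callable[[], str]:
    # Passing `source` as an argument gives each callable its own binding.
    def load() -> str:
        return f"loading {source}"
    return load


def create_dag(source: str) -> StubDAG:
    # Factory: one DAG (with one task) per configuration entry.
    dag = StubDAG(dag_id=f"load_{source}")
    dag.tasks.append(make_task(source))
    return dag


for source in SOURCES:
    generated = create_dag(source)
    # Register each DAG under its own module-level name so a loader that
    # inspects module globals can find every generated DAG.
    globals()[generated.dag_id] = generated

# The pitfall for contrast: these lambdas look up `s` only when called,
# so every one of them sees the last value from the loop.
late_bound = [lambda: f"loading {s}" for s in SOURCES]
```

In a real DAG file, `create_dag` would build an `airflow.DAG` and its operators instead of a stub; the factory function and the `globals()` registration should carry over as-is.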

Snowflake Purchases Streamlit

Alex Woodie reports on a purchase:

Cloud data warehousing giant Snowflake showed it’s serious about Python and data science this week when it announced that it plans to spend $800 million to buy Streamlit, a provider of Python-based tools for rapidly developing interactive data applications on the Web.

Co-founded in San Francisco in 2018 by Adrien Treuille, Amanda Kelly, and Thiago Teixeira, Streamlit develops an open source framework of the same name that allows data scientists and machine learning engineers to create and deploy data applications. The software is compatible with other Python-based frameworks, such as NumPy, Pandas, Matplotlib, and Scikit-learn, and uses React to render screens on the front-end.

Streamlit is nice. $800 million nice? That’s a good question.

Building a Recommender in Spark

Avinash Sooriyarachchi makes a recommendation:

There has been an exponential increase in the volume and variety of data at our disposal to build recommenders and notable advances in compute and algorithms to utilize in the process. Particularly, the means to store, process and learn from image data has dramatically increased in the past several years. This allows retailers to go beyond simple collaborative filtering algorithms and utilize more complex methods, such as image classification and deep convolutional neural networks, that can take into account the visual similarity of items as an input for making recommendations. This is especially important given online shopping is a largely visual experience and many consumer goods are judged on aesthetics.

In this article, we’ll change the script and show the end-to-end process for training and deploying an image-based similarity model that can serve as the foundation for a recommender system. Furthermore, we’ll show how the underlying distributed compute available in Databricks can help scale the training process and how foundational components of the Lakehouse, Delta Lake and MLflow, can make this process simple and reproducible.

Click through for the process.
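
The core of an image-based similarity recommender is a nearest-neighbor lookup over item embeddings. As a minimal sketch, assuming each item already has an embedding vector (for example, the output of a convolutional network's penultimate layer), ranking catalog items by cosine similarity to a query item could look like this. The item names and three-dimensional vectors are made up for illustration; real image embeddings typically have hundreds or thousands of dimensions, and at scale an approximate nearest-neighbor index would replace this linear scan. This is not necessarily how the linked article implements its lookup.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero embedding as similar to nothing.
    return dot / (norm_a * norm_b)


def most_similar(
    item_id: str, embeddings: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k items whose embeddings point closest to item_id's."""
    query = embeddings[item_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != item_id  # never recommend an item to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical image embeddings for four catalog items.
embeddings = {
    "red_dress": [0.9, 0.1, 0.0],
    "maroon_dress": [0.8, 0.2, 0.1],
    "blue_jeans": [0.1, 0.9, 0.2],
    "denim_jacket": [0.2, 0.8, 0.3],
}

recommendations = most_similar("red_dress", embeddings, k=2)
```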

AutoML with pycaret

Brendan Tierney looks at the pycaret library:

In this post we will have a look at using the AutoML feature in the Pycaret Python library. AutoML is a popular topic and allows Data Scientists and Machine Learning people to develop potentially optimized models based on their data, all while requiring minimal input from the Data Scientist. As with all AutoML solutions, care is needed in the eventual use of these models. With various ML and AI legal requirements around the world, it might not be possible to use the output from AutoML in production. Instead, it gives Data Scientists guidance on creating an optimized model, which can then be deployed in production. This facilitates requirements around model explainability, transparency, human oversight, fairness, risk mitigation and human in the loop.

Read on for a tutorial as well as additional resources.
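
At its core, AutoML automates a loop: fit several candidate models, score each on held-out data, and rank them; pycaret exposes this step as `compare_models()`. The sketch below is a toy, standard-library-only illustration of that loop, not pycaret's implementation. Its three hand-written regressors (a mean baseline, one-feature least-squares regression, and 3-nearest-neighbors), the function names, and the synthetic data are all invented for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]       # (feature x, target y)
Model = Callable[[float], float]  # a fitted model maps x to a prediction


def fit_mean(train: List[Point]) -> Model:
    # Baseline: always predict the training mean.
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_linear(train: List[Point]) -> Model:
    # Ordinary least squares for a single feature.
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn(train: List[Point], k: int = 3) -> Model:
    # k-nearest neighbors: average the targets of the k closest x values.
    def predict(x: float) -> float:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def rmse(model: Model, data: List[Point]) -> float:
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))


def rank_models(
    train: List[Point],
    valid: List[Point],
    candidates: Dict[str, Callable[[List[Point]], Model]],
) -> List[Tuple[str, float]]:
    """Fit every candidate on train; sort by validation RMSE (lower is better)."""
    scores = [(name, rmse(fit(train), valid)) for name, fit in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1])


# Synthetic data following y = 2x + 1, split into even (train) and odd (valid) x.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, valid = data[0::2], data[1::2]

leaderboard = rank_models(
    train, valid, {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}
)
best_name, best_rmse = leaderboard[0]
```

Real AutoML tools add cross-validation, preprocessing, hyperparameter search, and many more model families, which is exactly why the excerpt's caution about reviewing the output before production applies.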

Variable Typing in Python

Adrian Tam notes that Python is for the birds:

Python is a duck typing language. It means the data types of variables can change as long as the syntax is compatible. Python is also a dynamic programming language, meaning we can change the program while it runs, including defining new functions and the scope of name resolution. Not only do these give us a new paradigm for writing Python code, they also give us a new set of tools for debugging. In the following, we will see what we can do in Python that cannot be done in many other languages. After finishing this tutorial, you will know:

– How Python manages the variables you defined

– How Python code uses a variable and why we don’t need to define its type like C or Java

Read on to learn how.
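
A short, standard-library-only illustration of the ideas in that excerpt: a name can be rebound to values of any type, any object with the right method is acceptable (duck typing), names live in namespaces you can inspect, and a class can be modified while the program runs. The class and function names are invented for the example.

```python
class Duck:
    def speak(self) -> str:
        return "Quack"


class Robot:
    # Unrelated to Duck: no shared base class or declared interface.
    def speak(self) -> str:
        return "Beep"


def make_it_speak(thing) -> str:
    # No declared type: anything with a speak() method works.
    return thing.speak()


# Dynamic typing: the name `value` is rebound to objects of different types.
value = 42
first_type = type(value).__name__   # 'int'
value = "forty-two"
second_type = type(value).__name__  # 'str'

# Names live in namespaces (dictionaries) that can be inspected at run time.
defined_globally = "value" in globals()


# Changing the program while it runs: attach a new method to an existing class.
def shout(self) -> str:
    return self.speak().upper() + "!"


Duck.shout = shout

sounds = [make_it_speak(obj) for obj in (Duck(), Robot())]
```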

The Architecture of Project Bonsai

Tsuyoshi Matsuzaki takes us through the architecture for Project Bonsai:

Project Bonsai is a reinforcement learning framework for machine teaching in Microsoft Azure.

In generic reinforcement learning (RL), data scientists will combine tools and utilities (such as Gym, RLlib, Ray, etc.) which can be easily customized with familiar Python code and ML/AI frameworks, such as TensorFlow or PyTorch.

But in engineering tasks with machine teaching for autonomous systems or intelligent controls, data scientists will not always explore and tune attributes for AI. In successful practice, operations or engineering professionals (non-AI specialists) tune attributes for specific control systems (simulations) to train in machine teaching, and data scientists assist in cases where the problem requires advanced solutions.

Read on to see how it works.

Azure ML and the Python SDK in VS Code

I continue a series on getting beyond the basics with Azure ML. First up, we get up close and personal in development:

Notebooks are great for ad hoc work or simple data analysis but we will want more robust tools if we wish to perform proper code development, testing, and deployment. This is where Visual Studio Code comes into play, particularly the Azure Machine Learning extension.

Then, I get into the Python SDK:

Over the past two posts, we have started using the Azure Machine Learning SDK for Python, but I've only touched on the topic. In this post, we are going to dive into it in depth.

Read on for more info on each.

Building a Simple Streamlit App

I jump into a new web framework:

In the course of working on my book, I wanted to build an easy-to-use website for outlier detection. The idea here is that I have a REST API to perform the outlier detection work but I’d like something a little easier to read than JSON blobs coming out of Postman. That’s where Streamlit comes into play.

Click through to see how it all works. I was impressed with how easy it was to build a decent interactive website.

Generating Python Documentation with Sphinx

Evan Seabrook generates some docs:

As you can see above, we have docstrings defined for properties, methods and the classes themselves. Ultimately, these docstrings will be used by Sphinx to generate the documentation. If you’re using a different docstring format, you can use a Sphinx extension called Napoleon to use your existing docstrings. Once your project has a level of docstring usage that you’re happy with, we can move on to the next step of configuring Sphinx.

And that’s the downside to this: you get auto-generated documentation, which means it’s only as good as your developers’ ability to explain the code.
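
For reference, a minimal class documented with Sphinx's default reStructuredText field lists (`:param:`, `:raises:`, `:returns:`, `:rtype:`) on the class, a property, and a method might look like this. The `Circle` class is a hypothetical example, not taken from the linked post.

```python
import math


class Circle:
    """A circle defined by its radius.

    :param radius: Radius of the circle, in arbitrary length units.
    :raises ValueError: If ``radius`` is negative.
    """

    def __init__(self, radius: float) -> None:
        if radius < 0:
            raise ValueError("radius must be non-negative")
        self._radius = radius

    @property
    def radius(self) -> float:
        """The circle's radius, as passed to the constructor."""
        return self._radius

    def area(self) -> float:
        """Compute the area of the circle.

        :returns: The area, equal to pi times the radius squared.
        :rtype: float
        """
        return math.pi * self._radius ** 2
```

With `sphinx.ext.autodoc` listed in `extensions` in `conf.py`, a directive such as `.. autoclass:: yourmodule.Circle` with the `:members:` option pulls these docstrings into the generated pages (`yourmodule` is a placeholder). Adding `sphinx.ext.napoleon` to the same list lets Sphinx read Google- or NumPy-style docstrings instead.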

Sentiment Analysis with Python

Sanil Mhatre performs a bit of sentiment analysis:

Previous articles in this series have focused on platforms like Azure Cognitive Services and Oracle Text features to perform the core tasks of Natural Language Processing (NLP) and Sentiment Analysis. These easy-to-use platforms allow users to quickly analyze their text data with pre-built models. Potential drawbacks of this approach include lack of flexibility to customize models, data locality & security concerns, subscription fees, and service availability of the cloud platform. Programming languages like Python and R are convenient when you need to build your own models for Natural Language Processing and keep your code as well as data contained within your data centers. This article explains how to do sentiment analysis using Python.

Python is a versatile and modern general-purpose programming language that is powerful, fast, and easy to learn. Python runs on interpreters, making it compatible with multiple platforms, and is widely used in applications for web platforms, graphical interfaces, data science, and machine learning. Python is increasingly gaining popularity in data analysis and is one of the most widely used languages for data science. You can learn more about Python from the official Python Software Foundation website.

Click through to see what’s available in the NLP world for Python. The short version is “a lot.”
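
To give a feel for the simplest end of that spectrum, here is a toy lexicon-based sentiment scorer in plain Python: each known word carries a polarity weight, a directly preceding negation flips it, and the sign of the sum decides the label. The lexicon and negation list are invented and tiny; Python NLP libraries ship far larger lexicons or trained models, and this is not necessarily the approach the linked article takes.

```python
import re
from typing import Dict, Set

# Hypothetical, deliberately tiny polarity lexicon.
LEXICON: Dict[str, float] = {
    "good": 1.0, "great": 2.0, "excellent": 3.0, "love": 2.0,
    "bad": -1.0, "poor": -1.5, "terrible": -3.0, "hate": -2.0,
}
NEGATIONS: Set[str] = {"not", "never", "no"}


def sentiment_score(text: str) -> float:
    """Sum word polarities, flipping a word's sign after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight  # "not good" counts as negative
        score += weight
    return score


def label(text: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```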