Press "Enter" to skip to content

Day: February 4, 2022

Sentiment Analysis with Python

Sanil Mhatre performs a bit of sentiment analysis:

Previous articles in this series have focused on platforms like Azure Cognitive Services and Oracle Text features to perform the core tasks of Natural Language Processing (NLP) and Sentiment Analysis. These easy-to-use platforms allow users to quickly analyze their text data with easy-to-use pre-built models. Potential drawbacks of this approach include lack of flexibility to customize models, data locality & security concerns, subscription fees, and service availability of the cloud platform. Programming languages like Python and R are convenient when you need to build your models for Natural Language Processing and keep your code as well as data contained within your data centers. This article explains how to do sentiment analysis using Python.

Python is a versatile and modern general-purpose programming language that is powerful, fast, and easy to learn. Python runs on interpreters, making it compatible with multiple platforms, and is widely used in applications for web platforms, graphical interfaces, data science, and machine learning. Python is increasingly gaining popularity in data analysis and is one of the most widely used languages for data science. You can learn more about Python from the official Python Software Foundation website.

Click through to see what’s available in the NLP world for Python. The short version is “a lot.”

Comments closed

Generating Python Documentation with Sphinx

Evan Seabrook generates some docs:

As you can see above, we have docstrings defined for properties, methods and the classes themselves. Ultimately, these docstrings will be used by Sphinx to generate the documentation. If you’re using a different docstring format, you can use a Sphinx extension called Napoleon to use your existing docstrings. Once your project has a level of docstring usage that you’re happy with, we can move on to the next step of configuring Sphinx.

And that’s the downside to this: you get auto-generated documentation, which means it’s only as good as your developers’ ability to explain the code.

Comments closed

Calculus and Python

Muhammad Asad Iqbal Khan performs derivatives in Python:

Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for user-defined functions.

Read on to see how you can use PyTorch to do this.

Comments closed

Run Spark within Azure ML Compute

James Nguyen makes an announcement:

Following the blog post on Turning AML compute into Ray and Dask , we’ve added a new exciting capability to run Spark within AML compute where Spark shares the same context with your ML code. The Spark version is 3.2.1 with support for Delta Lake and Synapse SQL read/write. This enables users of AML to perform powerful data transformation and even Spark ML within AML interactive notebook or in a job run. 

Traditionally, Azure ML integrates with Spark Synapse or external compute services via a pipeline step or better via magic command like %synapse, but the computing context is separate from your AML logic so you still need to run Spark in a separate step and persist the output to some storage and load it in your AML script.

With this approach, Spark is available right within your AML code whether it’s AML notebook, python script or pipeline step. It shares the common computing context and most of the cases you can just directly convert the Spark Dataframe to Pandas and Dask Dataframe without persisting first to an intermediary storage.

I’ll have to try this out to see if it makes up for their getting rid of the Spark-based curated environments last year.

Comments closed

From Cosmos DB to Dedicated SQL Pools via Synapse Link

Jovan Popovic shows off Azure Synapse Link:

At the time of writing this article, the dedicated SQL pool doesn’t have the ability to read data from CosmosDB/Dataverse using the Synapse link. There are scenarios where you would need to use CosmosDB data in your dedicated SQL pool, so you would need to find a way how to load data. In theory, you could create an ADF pipeline that reads data from CosmosDB or Dataverse and store data in the dedicated SQL pool as a target. This might be a problem if your Pipeline is reading data directly from CosmosDB account because it might impact both operational workload performance and cost. The analytical storage is the recommended location that you should use to fetch all data from CosmosDB/Dataverse.

In this post, I will describe how to use a two-step approach where you export your data using the serverless SQL pool via Synapse link into Azure Data Lake storage, and then load data into the dedicated SQL pool table. This process is shown in the following figure:

A couple of weeks back, I wrote about another method of doing this through the Spark pool. Now you have two options.

Comments closed

Restoring PostgreSQL Backups in Azure

Grant Fritchey tests a restore plan:

I recently wrote an article about PostgreSQL restores (and by extension, backups) over on Simple-Talk. The restore process within PostgreSQL, without 3rd party involvement, can be a little tricky. However, when you are using a Platform as a Service offering, like Azure Database for PostgreSQL, things get quite a bit easier. Let’s explore this just a little.

Read the whole thing if you’re thinking about PostgreSQL or Azure Database for PostgreSQL.

Comments closed

Change Data Capture and Availability Groups

Jeff Iannucci raises job awareness:

If you’ve ever had to implement Change Data Capture (CDC) for a database in an Availability Group, then you know that the CDC jobs don’t really consider the Availability Group. The capture and cleanup jobs created are set up as if the database exists only on a single instance.

And that’s a problem, because I would guess quite lot of databases are in Availability Groups. Maybe even some of yours. If you have this issue, I’ve put together a step-by-step solution in this post.

Click through for the Microsoft way and the Jeff way.

Comments closed