Press "Enter" to skip to content

Month: November 2021

Restrictions for Parallel Insertion

Erik Darling summarizes the main man:

I’d like to start this post off by thanking my co-blogger Joe Obbish for being lazy and not blogging about this when he first ran into it three years ago.

Now that we’re through with pleasantries, let’s talk turkey.

Over in this post, by Arvind Shyamsundar, which I’m sure Microsoft doesn’t consider official documentation since it lacks a GUID in the URL, there’s a list of… things about parallel inserts.

Read on for the summary of limitations.

Comments closed

Secure FTP in Powershell

Chad Baldwin needs to send a file:

Over the years I’ve written dozens of these, one thing that often hangs me up are the “copy to SFTP” and “copy from SFTP” steps. usually what happens is I build two scripts…an “export script”, which has a manual step of “open FileZilla and upload file”, and then an Import script with another manual step to download the file.

After some Google searching to see how to handle SFTP in PowerShell, I ran into this StackOverflow answer (the creator of WinSCP) which introduced me to some cool new alternatives for dealing with FTP, FTPS, SFTP, SCP, etc using PowerShell.

Click through to see one way to do it, as well as some additional recommendations from Viewers Like You.

Comments closed

Voronoi Diagrams with R and x11()

Tomaz Kastrun creates a Voronoi diagram:

Yes. Finally, the Voronoi diagrams with the use of x11() function. This diagram is presentation of a plane that is partitioned every time, a user clicks on the canvas of x11. This plane is partitioned into smaller regions that are close to given set of points.

Partitioning into smaller regions or convex polygons happens in such manner that each polygon contains only one generating point and every point in a given polygon is closer to its generating point than to any other.

I had to take a look out of curiosity, and yes, the x11() function does work on Windows as well.

Comments closed

GPU-Accelerated Analysis on Databricks using PyTorch + Huggingface

Srijith Rajamohan walks us through an example of sentiment analysis using the PyTorch and Huggingface libraries on Databricks:

Sentiment analysis is commonly used to analyze the sentiment present within a body of text, which could range from a review, an email or a tweet. Deep learning-based techniques are one of the most popular ways to perform such an analysis. However, these techniques tend to be very computationally intensive and often require the use of GPUs, depending on the architecture and the embeddings used. Huggingface (https://huggingface.co) has put together a framework with the transformers package that makes accessing these embeddings seamless and reproducible. In this work, I illustrate how to perform scalable sentiment analysis by using the Huggingface package within PyTorch and leveraging the ML runtimes and infrastructure on Databricks.

Click through for a description of the process, as well as a link to a notebook you can walk through yourself.

Comments closed

Document Classification in Python

Brendan Tierney performs a bit of document classification with scikit-learn and nltk:

Text mining is a popular topic for exploring what text you have in documents etc. Text mining and NLP can help you discover different patterns in the text like uncovering certain words or phases which are commonly used, to identifying certain patterns and linkages between different texts/documents. Combining this work on Text mining you can use Word Clouds, time-series analysis, etc to discover other aspects and patterns in the text. Check out my previous blog posts (post 1post 2) on performing Text Mining on documents (manifestos from some of the political parties from the last two national government elections in Ireland). These two posts gives you a simple indication of what is possible.

We can build upon these Text Mining examples to include other machine learning algorithms like those for Classification. With Classification we want to predict or label a record or document to have a particular value. With Classification this could involve labeling a document as being positive or negative (movie or book reviews), or determining if a document is for a particular domain such as Technology, Sports, Entertainment, etc

Click through for a walkthrough of this process.

Comments closed

Showing Count of Selected Items in a Slicer

Prathy Kamasani wants to track slicer counts:

In one of the projects, I was working on, I received feedback saying it is hard to understand how many items they have selected in a slicer, and it is not the first time I came across this. It is a valid point, especially when you have quite a few items in a slicer, you use a search bar to look for items, you select a couple, but you were not sure how many were selected.

Read on for a rather clever solution to the problem.

Comments closed

Data Source Name Not Found with Postgres Driver

Rayis Imayev troubleshoots a problem:

A very short blog post, just a reminder to myself, but if you have ever tried to connect to a PostgreSQL database using ODBC interface (I know, it already sounds like a very interesting challenge :- ), then you might have experienced this error message: “ERROR [IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified.”

Read on to see the cause of and solution to this problem.

Comments closed