Press "Enter" to skip to content

Curated SQL Posts

Secure FTP in Powershell

Chad Baldwin needs to send a file:

Over the years I’ve written dozens of these, one thing that often hangs me up are the “copy to SFTP” and “copy from SFTP” steps. usually what happens is I build two scripts…an “export script”, which has a manual step of “open FileZilla and upload file”, and then an Import script with another manual step to download the file.

After some Google searching to see how to handle SFTP in PowerShell, I ran into this StackOverflow answer (the creator of WinSCP) which introduced me to some cool new alternatives for dealing with FTP, FTPS, SFTP, SCP, etc using PowerShell.

Click through to see one way to do it, as well as some additional recommendations from Viewers Like You.

Comments closed

Voronoi Diagrams with R and x11()

Tomaz Kastrun creates a Voronoi diagram:

Yes. Finally, the Voronoi diagrams with the use of x11() function. This diagram is presentation of a plane that is partitioned every time, a user clicks on the canvas of x11. This plane is partitioned into smaller regions that are close to given set of points.

Partitioning into smaller regions or convex polygons happens in such manner that each polygon contains only one generating point and every point in a given polygon is closer to its generating point than to any other.

I had to take a look out of curiosity, and yes, the x11() function does work on Windows as well.

Comments closed

GPU-Accelerated Analysis on Databricks using PyTorch + Huggingface

Srijith Rajamohan walks us through an example of sentiment analysis using the PyTorch and Huggingface libraries on Databricks:

Sentiment analysis is commonly used to analyze the sentiment present within a body of text, which could range from a review, an email or a tweet. Deep learning-based techniques are one of the most popular ways to perform such an analysis. However, these techniques tend to be very computationally intensive and often require the use of GPUs, depending on the architecture and the embeddings used. Huggingface (https://huggingface.co) has put together a framework with the transformers package that makes accessing these embeddings seamless and reproducible. In this work, I illustrate how to perform scalable sentiment analysis by using the Huggingface package within PyTorch and leveraging the ML runtimes and infrastructure on Databricks.

Click through for a description of the process, as well as a link to a notebook you can walk through yourself.

Comments closed

Document Classification in Python

Brendan Tierney performs a bit of document classification with scikit-learn and nltk:

Text mining is a popular topic for exploring what text you have in documents etc. Text mining and NLP can help you discover different patterns in the text like uncovering certain words or phases which are commonly used, to identifying certain patterns and linkages between different texts/documents. Combining this work on Text mining you can use Word Clouds, time-series analysis, etc to discover other aspects and patterns in the text. Check out my previous blog posts (post 1post 2) on performing Text Mining on documents (manifestos from some of the political parties from the last two national government elections in Ireland). These two posts gives you a simple indication of what is possible.

We can build upon these Text Mining examples to include other machine learning algorithms like those for Classification. With Classification we want to predict or label a record or document to have a particular value. With Classification this could involve labeling a document as being positive or negative (movie or book reviews), or determining if a document is for a particular domain such as Technology, Sports, Entertainment, etc

Click through for a walkthrough of this process.

Comments closed

Showing Count of Selected Items in a Slicer

Prathy Kamasani wants to track slicer counts:

In one of the projects, I was working on, I received feedback saying it is hard to understand how many items they have selected in a slicer, and it is not the first time I came across this. It is a valid point, especially when you have quite a few items in a slicer, you use a search bar to look for items, you select a couple, but you were not sure how many were selected.

Read on for a rather clever solution to the problem.

Comments closed

Data Source Name Not Found with Postgres Driver

Rayis Imayev troubleshoots a problem:

A very short blog post, just a reminder to myself, but if you have ever tried to connect to a PostgreSQL database using ODBC interface (I know, it already sounds like a very interesting challenge :- ), then you might have experienced this error message: “ERROR [IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified.”

Read on to see the cause of and solution to this problem.

Comments closed

A Primer on Kafka Streams

Bill Bejeck has an introduction to Kafka Streams:

Kafka Streams is an abstraction over Apache Kafka® producers and consumers that lets you forget about low-level details and focus on processing your Kafka data. You could of course write your own code to process your data using the vanilla Kafka clients, but the Kafka Streams equivalent will have far fewer lines, because it’s declarative rather than imperative. As a library, Kafka Streams lets you create a standalone application that can be run anywhere that can connect to a Kafka broker, whether that’s a laptop or a hefty cloud server. You just need to provide it with the host and port name of a broker. Combining Kafka Streams with Confluent Cloud grants you even more processing power with very little code investment.

Click through for a description as well as a whole series of embedded videos.

Comments closed