Press "Enter" to skip to content

Curated SQL Posts

Document Classification in Python

Brendan Tierney performs a bit of document classification with scikit-learn and nltk:

Text mining is a popular topic for exploring what text you have in documents etc. Text mining and NLP can help you discover different patterns in the text like uncovering certain words or phases which are commonly used, to identifying certain patterns and linkages between different texts/documents. Combining this work on Text mining you can use Word Clouds, time-series analysis, etc to discover other aspects and patterns in the text. Check out my previous blog posts (post 1post 2) on performing Text Mining on documents (manifestos from some of the political parties from the last two national government elections in Ireland). These two posts gives you a simple indication of what is possible.

We can build upon these Text Mining examples to include other machine learning algorithms like those for Classification. With Classification we want to predict or label a record or document to have a particular value. With Classification this could involve labeling a document as being positive or negative (movie or book reviews), or determining if a document is for a particular domain such as Technology, Sports, Entertainment, etc

Click through for a walkthrough of this process.

Comments closed

Showing Count of Selected Items in a Slicer

Prathy Kamasani wants to track slicer counts:

In one of the projects, I was working on, I received feedback saying it is hard to understand how many items they have selected in a slicer, and it is not the first time I came across this. It is a valid point, especially when you have quite a few items in a slicer, you use a search bar to look for items, you select a couple, but you were not sure how many were selected.

Read on for a rather clever solution to the problem.

Comments closed

Data Source Name Not Found with Postgres Driver

Rayis Imayev troubleshoots a problem:

A very short blog post, just a reminder to myself, but if you have ever tried to connect to a PostgreSQL database using ODBC interface (I know, it already sounds like a very interesting challenge :- ), then you might have experienced this error message: “ERROR [IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified.”

Read on to see the cause of and solution to this problem.

Comments closed

A Primer on Kafka Streams

Bill Bejeck has an introduction to Kafka Streams:

Kafka Streams is an abstraction over Apache Kafka® producers and consumers that lets you forget about low-level details and focus on processing your Kafka data. You could of course write your own code to process your data using the vanilla Kafka clients, but the Kafka Streams equivalent will have far fewer lines, because it’s declarative rather than imperative. As a library, Kafka Streams lets you create a standalone application that can be run anywhere that can connect to a Kafka broker, whether that’s a laptop or a hefty cloud server. You just need to provide it with the host and port name of a broker. Combining Kafka Streams with Confluent Cloud grants you even more processing power with very little code investment.

Click through for a description as well as a whole series of embedded videos.

Comments closed

From Kafka to Azure Data Explorer

Niels Berglund uses Kafka Connect to link an Apache Kafka topic to Azure Data Explroer:

If you follow my blog, you probably know that I am a huge fan of Apache Kafka and event streaming/stream processing. Recently Azure Data Explorer (ADX) has caught my eye. In fact, in the last few weeks, I did two conference sessions about ADX. A month ago, I published a blog post related to Kafka and ADX: Run Self-Managed Kusto Kafka Connector Serverless in Azure Container Instances.

As the title of that post implies, it looked at the ADX Kafka sink connector and how to run it in Azure. What the post did not look at was how to configure the connector and connect it to ADX. That is what we will do in this post (and maybe in a couple of more posts).

This post serves as a complete tutorial, though Niels does promise future posts on other ingestion methods, so stay tuned.

Comments closed

Automating Single Table Refresh with Azure Data Factory and Azure Automation

Marc Lelijveld wants to refresh a single table:

Back in February, I wrote a blog on how you can trigger a single table to refresh in your Power BI data model. This blog described how you can achieve this goal using a PowerShell script and the ASCmd cmdlets for Analysis Services, which also works for Power BI Premium. In the wrap-up of that blog, I promised to follow-up with a blog on how to achieve the same goal with Azure Data Factory. It took a little bit longer than expected to finalize this post, but here it is!

In this blog, co-authored by my colleague Paulien van Eijk, we will describe how you can automate your single table refresh in the Power BI Service, including all dependencies with downstream dataflows using Azure Data Factory and Azure Automation. All this is based on real life scenarios and a solution build in collaboration between Dave Ruijter, Paulien and me.

Read on for Marc and Paulien’s solution.

Comments closed