Press "Enter" to skip to content

Month: May 2020

Text Mining and Sentiment Analysis in R

Sanil Mhatre walks us through a sentiment analysis scenario in R:

Sentiments can be classified as positive, neutral or negative. They can also be represented on a numeric scale, to better express the degree of positive or negative strength of the sentiment contained in a body of text.

This example uses the Syuzhet package for generating sentiment scores, which has four sentiment dictionaries and offers a method for accessing the sentiment extraction tool developed in the NLP group at Stanford. The get_sentiment function accepts two arguments: a character vector (of sentences or words) and a method. The selected method determines which of the four available sentiment extraction methods will be used. The four methods are syuzhet (this is the default), bingafinn and nrc. Each method uses a different scale and hence returns slightly different results. Please note the outcome of nrc method is more than just a numeric score, requires additional interpretations and is out of scope for this article. The descriptions of the get_sentiment function has been sourced from : https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html?

Comments closed

Handling Cross-Database Transactions

Michael J. Swart explains how cross-database transactions work on a single instance:

The transaction is touching two different databases. So it makes sense that the two actions should be atomic and durable together using the one single transaction.

However, databases implement durability and atomicity using their own transaction log. Each transaction log takes care of its own database. So from another point of view, it makes sense that these are two separate transactions.

Which is it? Two transaction or one transaction?

Click through to read the article, and then check the comments for a cautionary tale around database mirroring and cross-database transactions.

Comments closed

Expression-Based Formatting in DAX

Matt Allington points out a new look to a slightly less new feature:

When I say “new”, they are actually not new – this feature has actually been around since the start of 2019. What is “new” is the discoverability of the feature. Prior to the April 2020 release, you had to first hover your mouse button over the section (Title text in the example above). After you hovered your mouse, you would see a vertical … menu (kind of like a vertical ellipsis), then when you hovered over the vertical ellipsis, you would then see the Fx button. Click the mysterious hidden button and only then could you discover the world of expression based formatting.

Click through to learn about expression-based formatting and where it might be useful.

Comments closed

Removing and Refilling All Tables in a Database

Phil Factor has a couple T-SQL scripts for us to remove and reload a test database:

Copying a database using a BCP dataset from a DOS script or from PowerShell is fairly quick and trivial, but what if the data is already in another copy of the database on the same instance? Imagine you have a database build that is a copy of an existing database on the same instance and you want to run a ‘destructive’ test on the data, and do it over and over again.

Deleting all the data in a database should, you’d have thought, be trivial. Truncation doesn’t work because it doesn’t like to truncate tables with foreign keys, even if they are disabled. DELETE is safer but you will need to then reset any identity columns. Deleting large tables in one chunk isn’t scalable, so you have to do it in smaller chunks. I like to disable triggers while I do this as well. The only disadvantage comes if you have triggers that are supposed to be disabled because the simple code just re-enables them all.

This is, I think, one of the biggest selling points for containers where the database is built into the container image. You spin up a container based off of an image, perform your destructive testing, and destroy the container afterward. The next time you need to run these tests, spin up a new container. And if you need to change the data, modify the container. This introduces new challenges like how SQL Server on Linux has limitations which don’t exist on Windows, but for supported functionality, it’s a nice solution.

Comments closed

Solving Last Page Contention Problems

Erik Darling ponders several techniques for solving the last page contention problem in SQL Server:

If one dare go looking, a list of Microsoft Approved™ solutions for dealing with last page contention exists.

I’m going to walk through my experiences with them in different scenarios, so you get a better understanding of which ones work best, and which ones can be tough to implement.

Read the whole thing.

Comments closed

Separating Reports from Datasets in Power BI

Melissa Coates has a video and an article for us:

Including documentation alongside the dataset is helpful for teammates who might need to work on this dataset at some point, or even for yourself in the future. It also means you won’t have an empty report, and you won’t need to delete the empty report once the file is published to the service.

Click through for plenty of useful information on the why and how, as well as what to watch out for.

Comments closed

Speeding Up Pivot Operations in Power Query

Imke Feldmann has a few tricks for making pivot operations in Power Query faster:

Pivot operations in are a very handy feature in  Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.

Read the whole thing and learn why Imke is disappointed in not knowing this four years ago.

Comments closed

Choosing a Recommender

Tori Tompkins walks us through four methods for generating models for recommendation systems:

One of the most popular approaches to recommendation problems is Collaborative Filtering (CF).

This approach is based on the assumption that users who have agreed in the past, also tend to agree in the future. For example, if Alice likes Star Wars, Lord of the Rings and Harry Potter and Bob likes Stars Wars and Lord of the Rings too. Then it is likely that Bob would also enjoy Harry Potter. The collaborative in Collaborative Filtering is in relation to looking at the way that you multiple users interact with the same data and share the same commonalities.

Read on for a comparison of these four systems as well as a handy chart to help you figure out which system might work best for you.

Comments closed