
Author: Kevin Feasel

Getting Started With Azure Cognitive Services

Rolf Tesmer has a demo app showing what Azure Cognitive Services Text Analytics can do:

Each execution of the application on any input file will generate 3 text output files with the results of the assessment.  The application runs at a rate of about 1-2 calls per second (the max send rate cannot exceed 100/min as this is the API limit).

  • File 1 [AzureTextAPI_SentimentText_YYYYMMDDHHMMSS.txt] – the sentiment score between 0 and 1 for each individual line in the Source Text File.  The entire line in the file is graded as a single data point.  0 is negative, 1 is positive.

  • File 2 [AzureTextAPI_SentenceText_YYYYMMDDHHMMSS.txt] – if the “Split Document into Sentences” option was selected then this contains each individual sentence in each individual line with the sentiment score of that sentence between 0 and 1.  0 is negative, 1 is positive.  RegEx is used to split the line into sentences.

  • File 3 [AzureTextAPI_KeyPhrasesText_YYYYMMDDHHMMSS.txt] – the key phrases identified within the text on each individual line in the Source Text File.

Rolf has also put his code on GitHub, so read on and check out his repo.


Pattern Matching With LIKE

Shane O’Neill looks at different ways to match patterns with the LIKE operator:

So let’s test out this bad boy using the WideWorldImporters database and see if we can find everyone with the first name of Leyla.

Simple, right? And because [Sales].[Customers] uses the full name, we have to use LIKE.

Now a developer comes along and says “Wait a second, my sister is Leila”. So we try to cheat and add a wildcard in there.

Leonardo!? Well, I suppose he does count in this situation, but there are two characters between the ‘e’ and the ‘a’ and I only wanted one.

Click through for a couple pattern matching tricks and look for ways to avoid Lejla in your life.
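To make the progression concrete, here is a rough sketch of the wildcard behavior Shane describes, assuming the same WideWorldImporters [Sales].[Customers] table (these are not necessarily his exact queries):

```sql
-- '%' matches any number of characters, so this also picks up names like Leonardo.
SELECT CustomerID, CustomerName
FROM Sales.Customers
WHERE CustomerName LIKE 'Le%a%';

-- '_' matches exactly one character, so this catches Leyla and Leila but not Leonardo.
SELECT CustomerID, CustomerName
FROM Sales.Customers
WHERE CustomerName LIKE 'Le_la%';

-- A character class narrows it to just the two spellings of interest.
SELECT CustomerID, CustomerName
FROM Sales.Customers
WHERE CustomerName LIKE 'Le[iy]la%';
```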


Unpivoting Wide Tables

Dave Mason has to unpivot a particularly wide table:

Today’s less-than-urgent challenge was to un-pivot the output of RESTORE HEADERONLY. I thought for certain someone else, somewhere, at some time must have wanted to do the same thing. So I asked the Twitterverse, but no one responded. I guess I’ll have to make do without the easy button. No worries, though. We can do this!

Metadata tables are good friends at times like these.
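The general shape of the solution looks something like the sketch below (not Dave’s exact code): capture the output into a staging table, then rotate the single wide row into name/value pairs. The cut-down #Header table and hard-coded column list are illustrative only; the real result set has dozens of columns, which is exactly where metadata tables come in, since you can generate the table definition and the unpivot list from them rather than typing everything by hand.

```sql
-- Illustrative stand-in for a staging table holding RESTORE HEADERONLY output
-- (the real result set has far more columns than this).
CREATE TABLE #Header
(
    BackupName      nvarchar(128),
    BackupType      smallint,
    BackupStartDate datetime,
    BackupSize      numeric(20, 0)
);

INSERT INTO #Header
VALUES (N'MyDb-Full Database Backup', 1, '2017-04-01T02:00:00', 1048576);

-- CROSS APPLY (VALUES ...) turns one wide row into tall name/value pairs.
-- For the full column list, build this block dynamically from tempdb.sys.columns.
SELECT ca.ColumnName, ca.ColumnValue
FROM #Header AS h
CROSS APPLY (VALUES
    (N'BackupName',      CONVERT(nvarchar(max), h.BackupName)),
    (N'BackupType',      CONVERT(nvarchar(max), h.BackupType)),
    (N'BackupStartDate', CONVERT(nvarchar(max), h.BackupStartDate, 120)),
    (N'BackupSize',      CONVERT(nvarchar(max), h.BackupSize))
) AS ca(ColumnName, ColumnValue);

DROP TABLE #Header;
```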


Clusterless Availability Groups

Allan Hirt talks about clusterless AGs:

A quick history lesson: through SQL Server 2016, we have three main variants of AGs:

  • “Regular” AGs (i.e. the ones deployed using an underlying Windows Server failover cluster [WSFC] requiring Active Directory [AD]; SQL Server 2012+)
  • AGs that can be deployed without AD, but using a WSFC and certificates (SQL Server 2016+ with Windows Server 2016+)
  • Distributed AGs (SQL Server 2016+)

SQL Server v.Next (download the bits here) adds another variant which is, to a degree, a side effect of how things can be deployed in Linux: AGs with no underlying cluster. In the case of a Windows Server-based install, this means that there could be no WSFC, and for Linux, currently no Pacemaker.

Read on for more details, including limitations and expectations.
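For a rough idea of what this looks like in T-SQL, here is a sketch using the v.Next syntax with made-up server and database names (consult the documentation for the full set of options):

```sql
-- Hypothetical two-replica AG with no underlying cluster.
-- With CLUSTER_TYPE = NONE there is no automatic failover, so FAILOVER_MODE is MANUAL.
CREATE AVAILABILITY GROUP [ag_clusterless]
WITH (CLUSTER_TYPE = NONE)
FOR DATABASE [SalesDB]
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL      = N'tcp://sqlnode1.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL,
        SEEDING_MODE      = AUTOMATIC),
    N'SQLNODE2' WITH (
        ENDPOINT_URL      = N'tcp://sqlnode2.contoso.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL,
        SEEDING_MODE      = AUTOMATIC);
```

Joining from the secondary follows the same pattern, along the lines of ALTER AVAILABILITY GROUP [ag_clusterless] JOIN WITH (CLUSTER_TYPE = NONE).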


Global Maps In R

The folks at Sharp Sight Labs show how to create high-quality map visuals in R:

Maps are great for practicing data visualization. First of all, there’s a lot of data available on places like Wikipedia that you can map.

Moreover, creating maps typically requires several essential skills in combination. Specifically, you commonly need to be able to retrieve the data (e.g., scrape it), mold it into shape, perform a join, and visualize it. Because creating maps requires several skills from data manipulation and data visualization, creating them will be great practice for you.

And if that’s not enough, good maps just look great. They’re visually compelling.

With that in mind, I want to walk you through the logic of building one step by step.

Read on for the step-by-step process.


Using Azure Data Lake Store With Hadoop

Amit Kulkarni shows how to make Azure Data Lake Store the default file system for a Hadoop cluster:

So to give a concrete example, if the default file system was hdfs://123.23.12.4344:9000 then /user/filename.txt would resolve to hdfs://123.23.12.4344:9000/user/filename.txt.

Why does the default file system matter? The first answer to this is purely convenience. It is a heck of a lot easier to simply say /events/sensor1/ than adl://amitadls.azuredatalakestore.net/ in code and configurations. Secondly, many components in Hadoop use relative paths by default. For instance, there is a fixed set of places, specified by relative paths, where various applications generate their log files. Finally, many ISV applications running on Hadoop specify important locations by relative paths.

Read on to see how.


That 53rd Week

Jens Vestergaard notes that you can sometimes have a 53rd week in the year:

There are a lot of great examples out there on how to build your own custom Time Intelligence into Analysis Services (MD). Just have a look at this, this, this, this and this. All good sources for solid Time Intelligence in SSAS.
One thing they have in common, though, is that they all make the assumption that there are, and always will be, 52 weeks in a year. The data set I am currently working with is built on the ISO 8601 standard. In short, this means that a 53rd full week occasionally occurs, as opposed to only 52 in the Gregorian version, which is defined by: 1 Gregorian calendar year = 52 weeks + 1 day (2 days in a leap year).

The 53rd week occurs approximately every five to six years, though this is not always the case. The last several times we saw 53 weeks in a year were in 1995, 2000, 2006, 2012 and 2015. The next time will be in 2020. This gives you enough time to either forget about the hacks and hard-coded fixes in place to mitigate the issue OR bring your code into a good state, ready for the next time.

Dates and currency are hard problems.
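If you want a quick way to see which years carry that extra week, a small T-SQL probe (not from Jens’s post) does the trick: December 28th always falls in the last ISO week of its year, so DATEPART(ISO_WEEK) on that date returns the year’s week count.

```sql
-- December 28th is always in the final ISO week of its year,
-- so this returns 52 or 53 per year.
SELECT y.Yr,
       IsoWeeksInYear = DATEPART(ISO_WEEK, DATEFROMPARTS(y.Yr, 12, 28))
FROM (VALUES (2014), (2015), (2016), (2017), (2018), (2019), (2020)) AS y(Yr)
ORDER BY y.Yr;
-- In this range, 2015 and 2020 return 53; the rest return 52.
```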


Gloom Indexes

David Smith points out an interesting use of R:

Radiohead is known for having some fairly maudlin songs, but of all of their tracks, which is the most depressing? Data scientist and R enthusiast Charlie Thompson ranked all of their tracks according to a “gloom index” and created the following chart of gloominess for each of the band’s nine studio albums. (Click for the interactive version, created with the highcharter package for R, which allows you to explore individual tracks.)

Do click through for Charlie’s explanation, including where the numbers come from.


Natural Earth In SQL Server

Jeff Pries shows how to use the Natural Earth data set in SQL Server:

But what about when you need more flexibility in your geographic display? Some examples of this may be wanting to display something that you can’t find a shape file for (maybe all the states and provinces in North America), or wanting to dynamically draw the geography based on some property of the dataset. Geospatial data queries to the rescue! Using SQL Server’s native geospatial support, you can create a geospatial query that returns something as simple as a point or rectangle, or as complex as the geography of an entire country and all of its rivers.

Getting all of the latitude and longitude coordinates to create a useful geospatial query could potentially be an enormous amount of work.  Fortunately, that work has already been done in a freely available resource, thanks to Natural Earth and Laurent Dupuis.  SQL Server 2012 or greater is recommended for this process.

Click through for a walkthrough as well as some introductory queries to get you started with using the data set.
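As a small taste of the native geospatial support before you load the Natural Earth tables, here is a standalone sketch (nothing from Jeff’s walkthrough is assumed):

```sql
-- A point and a rough bounding box as geography instances (SRID 4326).
-- geography::Point takes latitude first, then longitude.
DECLARE @city geography = geography::Point(47.6062, -122.3321, 4326);

-- The ring runs counterclockwise so the interior is the small box,
-- not the rest of the globe.
DECLARE @box geography = geography::STGeomFromText(
    'POLYGON((-123 47, -122 47, -122 48, -123 48, -123 47))', 4326);

SELECT PointIsInBox    = @box.STIntersects(@city),  -- 1 = the point falls inside the box
       BoxAreaSqMeters = @box.STArea();             -- square meters for SRID 4326
```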
