Press "Enter" to skip to content

Day: August 22, 2017

Using NLP To Find Similar Facebook Posts

The folks at Knoyd put together a word embedding example by scraping a Python Facebook group:

We are going to represent the content of a Facebook post using word embeddings and comparing the transformed posts using word mover’s distance. The combination of both have shown lower k-nearest neighbor-document classification error rates compared to other state of the art techniques.

The advantage of word embeddings is that the words which have similar meanings but don’t have any letters in common will still have similar vectors (be close) in the embedded space (e.g. lion and tiger).

There’s a good high-level discussion of techniques in this post.

Comments closed

R Services Internal Communication Mechanisms

Niels Berglund continues his R Services internals series:

When browsing for the symbols, you can use this command: x /1 *!TCP*. By using the option /1 you’ll only see the names, and no addresses. On my machine that gives me quite a lot, but there are two entries that catch my eye: sqllang!Tcp::AcceptConnection and sqllang!Tcp::Close. So let us set breakpoints at those two symbols, and see what happens when we execute our code.

The result when executing the code is that we initially break at sqllang!Tcp::AcceptConnection. Followed somewhat later by breaking at sqllang!Tcp::Close. Cool, this seems to work – let us set some more breakpoints and try to figure out the flow of events.

The first half recapitulates his previous findings, and then he incorporates new information in the second half.

Comments closed

Palindromes In SQL

Lukas Eder rises to a challenge:

SQL is a really cool language. I can write really complex business logic with this logic programming language. I was again thrilled about SQL recently, at a customer site:

But whenever I tweet something like the above, the inevitable happened. I was nerd snipedOleg Šelajev from ZeroTurnaround challenged me to prove that SQL is so awesome:

Given a string, find all substrings from that string, which are palindromes. Challenge accepted! (For the moment, let’s forget about algorithmic complexity.)

His answer is in Postgres syntax, and a commenter includes Oracle syntax.  T-SQL is left as an exercise for the reader.

Comments closed

Automating Index Maintenance On Azure SQL DB

Arun Sirpal shows how to use Azure Automation to rebuild indexes on an Azure SQL Database database:

The answer is via Azure Automation.

At a high level this is what I did.

  • Create an Automation Account.

  • Create a credential.

  • Create a PowerShell Runbook which has the code for index rebuilds.

  • Create a schedule and link it to the above.

  • Configure parameters within the schedule (if any).

  • Configure logging level (if desired).

Click through for the detailed steps.

Comments closed

CLR Strict Security Done Easier

Solomon Rutzky continues his CLR Strict Security series with an easier way of creating a secure assembly:

This solution is easier than Solution 1:

8 steps instead of 22!

No extra Project

However, a very small amount of risk was added by overriding the default MSBuild workflow for SSDT. This risk can be eliminated if Microsoft provides a pre-defined Target for the appropriate event. Please upvote my suggestion to have this happen: Add MSBuild predefined Targets for “BeforeSqlBuild” and “BeforePublish” to SSDT SQL Server Data Projects.

ALSO: Even though we did not sign the assembly with a Strong Name Key, it is still probably a good idea to do that.

If you use CLR, this is worth the read.

Comments closed

Another Reason To Avoid Shrinking Data Files

Frank Gill gives us a demo of how much log space it takes to shrink a database file:

Yesterday, I was running a health assessment for a client. They are running a weekly maintenance plan that is shrinking all of their data files. After I picked myself up off the floor, I searched the web for “Paul Randal shrink” and hit on Paul’s excellent post Why you should not shrink your data files. In the post, Paul (b|t) demonstrates the effect of DBCC SHRINKDATABASE on index fragmentation. After the demo script, Paul writes, “As well as introducing index fragmentation, data file shrink also generates a lot of I/O, uses a lot of CPU and generates *loads* (emphasis Paul’s) of transaction log.”

This led me to ask the question, “How much is *loads*?”. To find an answer, I made the following modification to Paul’s script:

Read on for the answer.  There are legitimate reasons to shrink data files, but it comes at a very high cost.

Comments closed

New Extended Events In SQL Server 2017

Erik Darling unwraps some of his Christmas presents a bit early:

There are, as of RC2 being released, 194(!) new Events to Extend your mind with. Not all of them are interesting to me, and I haven’t had time to pry into all of the ones that are interesting just yet.

This is a rundown of the new Events with names or descriptions that I found interesting, and will try to spend some time with.

I can’t promise anything

After all, getting some of these to fire is tougher than using a Debugger.

There are some interesting events here.

Comments closed

Building Azure Resource Manager Templates

Ed Elliott gives a brief overview of Azure Resource Manager templates and puts together a sample template:

The ARM API deploys resources to Azure, but doesn’t deploy code onto those resources. For example you can use ARM to deploy a virtual machine with SQL Server already installed but you can’t use ARM to deploy a database from an SSDT DacPac.

To save time when designing solutions, it is important to understand that ARM API is used simply for resources and we need to use some other technology such as DSC or PowerShell to manage the deployments onto the infrastructure once it is deployed.

This is a nice overview of the topic, and because it’s Ed (who is much better about this than most), he goes into how to test before even getting into how to create.

Comments closed