Press "Enter" to skip to content

Author: Kevin Feasel

Concurrency In Scala

Matthew Rathbone shows different concurrency options available in Scala:

Scala is a functional programming language that aims to avoid side effects by encouraging you to use immutable variables (called ‘values’), and data structures.

So by default in Scala when you build a list, array, string, or other object, that object is immutable and cannot be changed or updated.

This might seem unrelated, but think about a thread which has been given a list of strings to process, perhaps each string is a website that needs crawling.

In the Java model, this list might be updated by other threads at the same time (adding / removing websites), so you need to make sure you either have a thread-safe list, or you safeguard access to it with the protected keyword or a Mutex.

By default in Scala this list is immutable, so you can be sure that the list cannot be modified by other threads, because it cannot be modified at all.

While this does force you to program in different ways to work around the immutability, it does have the tremendous effect of simplifying thread-safety concerns. The value of this cannot be understated, it’s a huge burden to worry about thread safety all the time, but in Scala much of that burden goes away.

Read the whole thing if you’re looking at writing Spark applications in Scala.  If you’re thinking about functional programming in .NET languages, F# is  there for you.

Comments closed

Linear Support Vector Machines

Ananda Das explains how linear Support Vector Machines work in classifying spam messages:

Linear SVM assumes that the two classes are linearly separable that is a hyper-plane can separate out the two classes and the data points from the two classes do not get mixed up. Of course this is not an ideal assumption and how we will discuss it later how linear SVM works out the case of non-linear separability. But for a reader with some experience here I pose a question which is like this Linear SVM creates a discriminant function but so does LDA. Yet, both are different classifiers. Why ? (Hint: LDA is based on Bayes Theorem while Linear SVM is based on the concept of margin. In case of LDA, one has to make an assumption on the distribution of the data per class. For a newbie, please ignore the question. We will discuss this point in details in some other post.)

This is a pretty math-heavy post, so get your coffee first. h/t R-Bloggers.

Comments closed

SQL Client Aliases

Andrew Pruski explains how to use a lesser-known feature in SQL Server, client aliases:

One of the problems that we ran into when moving to using containers was how to get the applications to connect. Let me explain the situation.

The applications in our production environment use DNS CNAME aliases that reference the production SQL instance’s IP address. In our old QA environment, the applications and SQL instance lived on the same virtual server so the DNS aliases were overwritten by host file entries that would point to 127.0.0.1.

This caused us a problem when moving to containers as the containers were on a separate server listening on a custom tcp port. Port numbers cannot be specified in DNS aliases or host file entries and we couldn’t update the application string (one of the pre-requisites of the project) so we were pretty stuck until we realised that we could use SQL client aliases.

This is definitely a place that you’d want to document changes thoroughly, as my experience is that relatively few DBAs would even think of looking there.

Comments closed

Database File Sizes In Powershell

Rob Sewell has a nice post on checking database file sizes using dbatools in Powershell:

As always, PowerShell uses the permissions of the account running the sessions to connect to the SQL Server unless you provide a separate credential for SQL Authentication. If you need to connect with a different windows account you will need to hold Shift down and right click on the PowerShell icon and click run as a different user.

Lets get the information for a single database. The command has dynamic parameters which populate the database names to save you time and keystrokes

It’s a great post, save for the donut chart…  Anyhow, this is recommended reading.

Comments closed

Azure SQL Database Premium RS

Arun Sirpal describes a new pricing tier for Azure SQL Database:

What Microsoft classifies as IO intensive I am not so sure, personally I have not seen any sort of IOPS figure(s) for what we could expect from each service tier, it’s not like I can just run DiskSpeed and find out. Maybe the underlying storage for Premium RS databases is more geared to work with complex analytical queries, unfortunately I do not have the funds in my Azure account to start playing around with tests for Premium vs. Premium RS (I would love to).

Also and just as important, Premium RS databases run with fewer redundant copies than Premium or Standard databases, so if you get a service failure you may need to recover your database from a backup with up to a 5-minute lag. If you can tolerate 5 minute data loss and you are happy with a reduced number of redundant copies of your database then this is a serious option for you because the price is very different.

It’s a lot less expensive (just under 1/3 the cost of Premium in Arun’s example), so it could be worth checking out.

Comments closed

Splitting A Small Database

Brent Ozar explains why he recommended a client break out a small database:

Listen, I can explain. Really.

We had a client with a 5GB database, and they wanted it to be highly available. The data powered their web site, and that site needed to be up and running in short order even if they lost the server – or an entire data center – or a region of servers.

The first challenge: they didn’t want to pay a lot for this muffler database. They didn’t have a full time DBA, and they only had licensing for a small SQL Server Standard Edition.

Read on for the full explanation.  Given the constraints and expectations, it makes sense, and this is a good example of figuring out how expected future growth can change the bottom line for a DBA.

Comments closed

JSON Dates In SQL Server

Bert Wagner explains how to handle JSON datetime strings in SQL Server:

In SQL Server, datetime2’s format is defined as follows:

YYYY-MM-DD hh:mm:ss[.fractional seconds]

JSON date time strings are defined like:

YYYY-MM-DDTHH:mm:ss.sssZ

Honestly, they look pretty similar. However, there are few key differences:

  • JSON separates the date and time portion of the string with the letter T

  • The Z is optional and indicates that the datetime is in UTC (if the Z is left off, JavaScript defaults to UTC). You can also specify a different timezone by replacing the Z with a + or  along with HH:mm (ie. -05:00 for Eastern Standard Time)

  • The precision of SQL’s datetime2 goes out to 7 decimal places, in JSON and JavaScript it only goes out to 3 places, so truncation may occur.

Read on for a few scripts handling datetime conversions between these types.

Comments closed

Get RTVS To Stop Opening Notepad

Sarah Dutkiewicz figures out how to get R Tools for Visual Studio to stop having R files open in Notepad:

As I have been going through my courses – which use swirl() – I have been looking at how things work, comparing RStudio to RTVS.  One of the things that was maddening for me was going through one of the courses in RTVS and having R files open in Notepad.  Notepad?!?  RStudio wasn’t doing this, so I was even more frustrated.  I could also open R files with Visual Studio right from the file system, so the file association was already in place.  This didn’t make sense.  However… RTVS is an open source project, as is swirl().  So I spent tonight looking at code in GitHub.

Read on for the answer.

Comments closed

Grafana On Elasticsearch

Daniel Berman shows how to replace Kibana with Grafana:

While very similar in terms of what can be done with the data itself within the two tools. The main differences between Kibana and Grafana lie in configuring how the data is displayed. Grafana has richer display features and more options for playing around with how the data is represented in the graphs.

While it takes some time getting accustomed to building graphs in Grafana — especially if you’re coming from Kibana — the data displayed in Grafana dashboards can be read and analyzed more easily.

I prefer Grafana over Kibana for a few reasons, so I’m happy to see Grafana articles popping up.

Comments closed