Press "Enter" to skip to content

Category: Misc Languages

Finding And Fixing The N+1 Problem With ORMs

Richie Rump explains the N+1 problem with object-relational mappers and shows you how to avoid it with Entity Framework:

The problem is that in our original query we’re not getting data from the LinkedPosts entity, just data from Posts and PostTags. Entity Framework knows that it doesn’t have the data for the LinkPosts entity, so it very kindly gets the data from the database for each row in the query results.

Whoops!

Obviously, making multiple calls to the database instead of one call for the same data is slower. This is a perfect example of RBAR (row by agonizing row) processing.

Read the comments for more answers on top of Richie’s.  My answer (only 70% tongue in cheek)?  Functional programming languages don’t require ORMs.

Comments closed

Analyzing Clickstream Data With Spark

Tony Cruz and Denny Lee analyze advertising data in Spark and predict click counts given certain input features:

Let’s look at a concrete example with the Click-Through Rate Prediction dataset of ad impressions and clicks from the data science website Kaggle.  The goal of this workflow is to create a machine learning model that, given a new ad impression, predicts whether or not there will be a click.

To build our advanced analytics workflow, let’s focus on the three main steps:

  • ETL

  • Data Exploration, for example, using SQL

  • Advanced Analytics / Machine Learning

The Databricks blog has a couple other examples, but this was the most interesting one for me.

Comments closed

The Decorator Pattern

Nancy Jain explains the Decorator pattern:

Decorator design pattern is a structural design pattern.

Structural design patterns focus on Class and Object composition and decorator design pattern is about adding responsibilities to objects dynamically.

Decorator design pattern gives some additional responsibility to our base class.

This pattern is about creating a decorator class that can wrap original class and can provide additional functionality keeping class methods signature intact.

I don’t use the Decorator pattern as often as I probably should, but it can be quite useful.

Comments closed

Why .NET And Java Have StringBuilders

Randolph West walks us through a performance troubleshooting issue with a twist:

So we branch the the code in source control, and start writing a helper class to manage the data for us closer to the application. We throw in a SqlDataAdapter, use the Fill() method to bring back all the rows from the query in one go, and then use a caching layer to keep it in memory in case we need it again. SQL Server’s part in this story has now faded into the background. This narrow table consumes a tiny 8 MB of RAM, and having two or more copies in memory isn’t the end of the world for our testing. So far, so good again.

We run the new code, first stepping through to make sure that it still does what it used to, massaging here and there so that in the end, a grid is populated on the application with the results of the query. Success! We then compile it in Release mode, and run it without any breakpoints to do some performance testing.

And then we find that it runs at exactly the same speed to produce exactly the same report, using our caching and SqlDataAdapter, and we’ve wasted another hour of our time waiting for the grid and report. Where did we go wrong?

As people get better at tuning, we start to make assumptions based on prior experience.  That, on net, is a good thing, but as Randolph shows, those assumptions can still be wrong.

Comments closed

Adding IN Search Functionality To .NET

Jay Robinson shows off a few extension methods he creates to make dealing with C# easier:

Then I could use the extension like this:

if (mySeries.In(Enum.Series.ProMazda, Enum.Series.Usf2000)) myChassis = "Tatuus";

As for the other two methods, well… When is a null not a null? When it’s a System.DBNull.Value, of course! SQL Server pros who have spent any time in the .NET Framework will recognize this awkwardness:

var p = new System.Data.SqlClient.SqlParameter("@myParam", System.Data.SqlDbType.Int);
p.Value = (object)myVar ?? System.DBNull.Value;

With the extension, the second line becomes:

p.Value = mVar.ToDbNull();

I like it that Jay ended up going with a different language than T-SQL.  It’s no F#, but it’ll do.

Comments closed

Extractors In Scala

Jyoti Sachdeva explains what extractors are in Scala and why they’re useful:

An extractor is an object that has an unapply method. It takes an object as an input and gives back arguments. Custom extractors are created using the unapply method. The unapply method is called extractor because it takes an element of the same set and extracts some of its parts,  apply method also called injection acts as a constructor, takes some arguments and yields an element of a given set.

Click through for explanatory examples.

Comments closed

Machine Learning With F#

Diogo Souza gives us an introduction to using Accord.NET in F#:

F# is a scripting as well as a REPL language. REPL comes from Read-Eval-Print Loop, which means that the language processes single steps one at a time like reading the user inputs (usually expressions), evaluating their values and, in the end, returning the result to the same user. All that happens in a loop until the loop ends. Visual Studio provides a great F# Interactive view that runs the scripts in REPL mode and shows the results. Take the following Hello World example:

This code just creates a single variable (let keyword) and assigns a string value to it. When you run this code (select all the code text and press Alt + Enter), you’ll see the following result in the F# Interactive window (Figure 5):

You can also use C# with Accord.NET, but there’s a strong bias toward F# among people in the .NET space who work with ML, for the same reason that there’s a bias toward Scala over Java for Spark developers:  the functional programming paradigm works extremely well with mathematical concepts.  Also, in addition to Accord.NET, you might also want to check out Math.NET.  My experience has been that this package tends to be a bit faster than Accord.

Comments closed

Enabling LDAP Authentication On Cassandra

Kurt Greaves shows off a new LDAP authenticator for Apache Cassandra:

The LDAPAuthenticator is implemented using JNDI, and authentication requests will be made by Cassandra to the LDAP server using the username and password provided by the client. At this time only plain text authentication is supported.

If you configure a service LDAP user in the ldap.properties file, on startup Cassandra will authenticate the service user and create a corresponding role in the system_auth.roles table. This service user will then be used for future authentication requests received from clients. Alternatively (not recommended), if you have anonymous access enabled for your LDAP server, the authenticator allows authentication without a service user configured. The service user will be configured as a superuser role in Cassandra, and you will need to log in as the service user to define permissions for other users once they have authenticated.

The authenticator itself is hosted on GitHub, so you can check out its repo too.

Comments closed

When To Use SQL, DAX, Or M In Power BI Models

Paul Turley offers up some guidance on when to use which language when building Power BI models:

As a general rule of thumb, in formal SSAS projects built on a relational data mart or data warehouse that is managed by the same project team as the BI data model, I typically recommend that every table in the model import data from a corresponding view or UDF stored and managed in the relational database. Keep in mind that is the way we’ve been designing Microsoft BI projects for several years. Performing simple tasks like renaming columns in the SSAS data model designer was slow and cumbersome. Performing this part of the data prep in T-SQL was much easier than in SSDT. With the recent advent of Power Query in SQL Server Data Tools, there is a good argument to be made for managing those transformations but the tool is still new and frankly I’m still testing the water. Again, keep changes in one place for future maintenance.

Do your absolute best to avoid writing complex SQL query logic that cannot be traced back to the sources. Complicated queries can become a black box – and a Pandora’s box if they aren’t documented, annotated and easy to decipher.

But do read Paul’s closing grafs on the importance of not being hidebound.

Comments closed