Weights In Graphs

Angshuman Talukdar shows how to use neo4j to solve minimum weighted distance problems:

A sample dataset is created in Neo4j using the CREATE clause in Cypher as given in Query 1 (create clause in Cypher). This loads the data into Neo4j and generates the graph database as shown in Figure 2.

Neo4j has a lot of graph algorithms shipped with it as a package and those are accessible only from the JAVA API. Implementing some of these algorithms in Cypher is quite complex and time consuming. From Neo4j 3.x, the concept of user defined procedures had been introduced called APOC (Awesome Procedures On Cypher). Those are custom implementations of certain functionality, that can’t be (easily) expressed in Cypher itself. The APOC library consists of many (about 300) procedures to help with many different tasks in areas like data integration, graph algorithms or data conversion.

Graph databases aren’t common, but they can be very useful for certain questions like the one Angshuman solves.

Converting To Local Time In M

Chris Webb shows how to convert a datetime from UTC to your local time zone using M:

Here’s a brief explanation of what the query does:

  • First it reads the times from the Excel table and sets the Time column to be datetime data type

  • It then creates a new column called UTC and then takes the values in the Time column and converts them to datetimezone values, using the DateTime.AddZone() function to add a time zone offset of 0 hours, making them UTC times

  • Finally it creates a column called Local and converts the UTC times to my PC’s local time zone using the DateTimeZone.ToLocal() function

There are some limitations to what it does, so you can’t convert to just any time zone while still retaining Daylight Savings Time awareness.

Concurrency In Scala

Matthew Rathbone shows different concurrency options available in Scala:

Scala is a functional programming language that aims to avoid side effects by encouraging you to use immutable variables (called ‘values’), and data structures.

So by default in Scala when you build a list, array, string, or other object, that object is immutable and cannot be changed or updated.

This might seem unrelated, but think about a thread which has been given a list of strings to process, perhaps each string is a website that needs crawling.

In the Java model, this list might be updated by other threads at the same time (adding / removing websites), so you need to make sure you either have a thread-safe list, or you safeguard access to it with the protected keyword or a Mutex.

By default in Scala this list is immutable, so you can be sure that the list cannot be modified by other threads, because it cannot be modified at all.

While this does force you to program in different ways to work around the immutability, it does have the tremendous effect of simplifying thread-safety concerns. The value of this cannot be understated, it’s a huge burden to worry about thread safety all the time, but in Scala much of that burden goes away.

Read the whole thing if you’re looking at writing Spark applications in Scala.  If you’re thinking about functional programming in .NET languages, F# is  there for you.

Scala + Hadoop + HDInsight

Emmanouil Gkatziouras shows that you can run a Hadoop job written in Scala on Azure’s HDInsight:

Previously, we set up a Scala application in order to execute a simple word count on Hadoop.

What comes next is uploading our application to HDInsight. So, we shall proceed in creating a Hadoop cluster on HDInsight.

Read the whole thing, but the upshot is that Scala apps build jar files just like Java would, so there’s nothing special about running them.

Poor Man’s More

Chris Koester has a quick-and-easy file reader in a few lines of C#:

This post describes one way that you can read the top N rows from large text files with C#. This is very useful when working with giant files that are too big to open, but you need to view a portion of them to determine the schema, data types, etc.

I’ve used PowerShell many times to do this with large csv files, but in this example we’re going to use C# and look at the Wikipedia XML dump of pages and articles. The 3017-03-01 dump is very large and comes in at 59.5 GB.

I’ve had to write something similar before on Windows machines where I didn’t have access to more/less.  It’s really helpful for perusing the first few lines of gigantic log files.

Graph Database Basics

Victoria Holt has some good resources on learning more about graph databases:

There is graph support in the next version of SQL Server. The private preview page states

SQL Graph adds graph processing capabilities to SQL Server, which will help you link different pieces of connected data to help gather powerful insights and increase operational agility. Graphs are well suited for applications where relationships are important, such as fraud detection, risk management, social networks, recommendation engines, predictive analysis, dependence analysis, IoT suites, etc.
Initially, SQL Server will support CRUD graph operations and multi-hop graph navigation, and the following functionality will be available in the private preview:

  • Create graph objects, that is, nodes to represent entities and edges to represent relationships between any 2 given nodes. Both Nodes and Edges can have properties associated to them.
  • SQL language extensions to support join free, pattern matching queries for multi-hop navigation

Kennie Pontoppipidan wrote a great blog post on where to find out more information.

Click through for more links to interesting resources.

Easy Code Refactoring Opportunities

Bert Wagner shows a few easy things you can do to clean up C# code:

1. Clean up formatting

The overall format of your code is what makes it possible to quickly navigate to areas of interest. Consistent indentation, line breaks, and patterns help programmers skim large chunks of code. Take the following sloppily formatted code for example:

Read on for the rest.  This has analogues in every language:  the goal is to create simple, concise, easily scannable, and human-readable code which also correctly solves the relevant business problem.

Garbage Collection In Hadoop

Ranjan Banerjee explains how the Java garbage collector works, using Hadoop as an example:

The reason why we all love Java is due to the fact that we can be careless with memory creations and the work of cleaning the mess is performed by the JVM. On a high level, Java heap memory is classified into two phases:

1) Young (eden) space

2)Old space

The eden space is where newly created objects goto. There are various algorithms for garbage collection, but all of them try to first free memory from the young space and for those long lasting memory objects, they are transferred to the old space.

One common issue that can be noticed in running Map Reduce Applications are GC overhead limit exceeded.

Read on for more, including where you can find GC logs.

Contributing To Open Source

Drew Furgiuele explains the process of contributing to an open source project, specifically dbatools:

Step 2: Check out the Github project page what’s in development.

Next, you should visit the project issues page. Here, you’ll find a list of all the features requested, in development and completed on the project. You can also filter the pages to look at current bugs or requested enhancements. Once you see what’s what, if you think of something you want to work on or help with, make a note of it. You should also look at examples of things in development and things that have been completed so you get an idea of the creative and technical process that goes into the project.

Step 3: Speak up!

Head on back to the Slack channel and let everyone know you want to help out. Someone (probably Chrissy) will add your Github account to to the project as a contributor so you can have things assigned to you. Congrats, you’re now on the hook!

I’m happy that the dbatools community has sprung up and hope it’s a gateway to further open source development in the SQL Server community.

Threading And Learning

Jay Robinson has a two-pronged tale:

This reminds me of an old saying: If you’re the smartest person in the room, then you’re in the wrong room.*

Now, this is not a commentary on my current team. I work with some really smart people, and I’m very grateful for that. But while my teammate may be one of the best PHP or Node.js coders I know, that doesn’t necessarily translate to an expertise with the .NET Framework. The true test is this – no matter how smart they are, if they’re not catching my mistakes, then I’m not being held accountable.

There is some good advice here on threading (yes, definitely use the newer threading libraries), but also good advice on surrounding yourself with intelligent people who can catch your mistakes.

Categories

September 2017
MTWTFSS
« Aug  
 123
45678910
11121314151617
18192021222324
252627282930