Press "Enter" to skip to content

Author: Kevin Feasel

Weights In Graphs

Angshuman Talukdar shows how to use neo4j to solve minimum weighted distance problems:

A sample dataset is created in Neo4j using the CREATE clause in Cypher as given in Query 1 (create clause in Cypher). This loads the data into Neo4j and generates the graph database as shown in Figure 2.

Neo4j has a lot of graph algorithms shipped with it as a package and those are accessible only from the JAVA API. Implementing some of these algorithms in Cypher is quite complex and time consuming. From Neo4j 3.x, the concept of user defined procedures had been introduced called APOC (Awesome Procedures On Cypher). Those are custom implementations of certain functionality, that can’t be (easily) expressed in Cypher itself. The APOC library consists of many (about 300) procedures to help with many different tasks in areas like data integration, graph algorithms or data conversion.

Graph databases aren’t common, but they can be very useful for certain questions like the one Angshuman solves.

Comments closed

R Plots In Power BI

Leila Etaati has a three-part series on displaying R visuals in Power BI.  Part 1 shows how to create a scatter plot:

so in the above picture we can see that we have 3 different fields that has been shown in the chart :highway and city speed in y and x axis. while the car’s cylinder varibale has been shown as different cycle size. However may be you need a bigger cycle to differentiate cylinder with 8 to 4 so we able to do that with add another layer by adding a function name

Part 2 shows how to use facet_grid to show multiple plots in one visual:

now I want to add other layer to this chart. by adding year and car drive option to the chart. To do that first choose year and drv  from data field in power BI. As I have mentioned before, now the dataset variable will  hold data about speed in city, speed in highway, number of cylinder, years of cars and type of drive.

I am going to use another function in the ggplot packages name “facet_grid” that helps me to show the different facet in my scatter chart. in this function, year and drv (driver) will be shown against each other.

Part 3 shows how to place charts on a map in R:

Now I have to merg the data to get the location information from “sPDF” into “ddf”. To do that I am going to use” merge” function. As you can see in below code, first argument is our first dataset “ddf” and the second one is the data on Lat and Lon of location (sPDF). the third and forth columns show the main variables for joining these two dataset as “ddf” (x) is “country” and in the second one “sPDF”  is “Admin”. the result will be stored in “df” dataset

Aside from my strong dislike of bar/pie charts on maps, this is good to know, particularly if there is not a built-in or customer Power BI visual to replicate something you can do in R.

Comments closed

Azure Data Lake Store Best Practices

Ust Oldfield provides recommendations on how to size and lay out files in Azure Data Lake Store:

The format of the file has a huge implication for the storage and parallelisation. Splittable formats – files which are row oriented, such as CSV – are parallelizable as data does not span extents. Non-splittable formats, however, – files what are not row oriented and data is often delivered in blocks, such as XML or JSON – cannot be parallelized as data spans extents and can only be processed by a single vertex.

In addition to the storage of unstructured data, Azure Data Lake Store also stores structured data in the form of row-oriented, distributed clustered index storage, which can also be partitioned. The data itself is held within the “Catalog” folder of the data lake store, but the metadata is contained in the data lake analytics. For many, working with the structured data in the data lake is very similar to working with SQL databases.

This is the type of thing that you can easily forget about, but it makes a huge difference down the line.

Comments closed

Inline Outsourcing

Shane O’Neill coins a term:

There’s never enough hours in the day for everything I need to do!

How many times have we heard a complaint similar to that? Especially now-a-days when DBAs are tasked to look after more and more servers and instances. I cannot remember the last time I heard of a DBA taking care of servers in the single digits.

The work of the DBA keeps increasing but the amount of time that we have remains the same. How do we combat this? How do we make it so we are not sprinting just to keep up?

The only answer I have to this problem is this.

Don’t try to re-invent the wheel…let someone else do it.

It’s an interesting riff on the T-SQL Tuesday theme this month.

Comments closed

Azure SQL Automation

Arun Sirpal thinks about Azure automation in the context of how the job market is changing:

This ultimately maps to Query ID 297 where if you click the bar you can see the actual code.

Now, a debate occurred. This code was pretty awful, implicit conversions, GUIDs as cluster keys etc. I took the above code and analysed the execution plan and made some recommendations. I was quickly shut down; I was told to bump up the DTU of the database! Talk about masking the issue with hardware.

Check it out.

Comments closed

Backups Causing Transaction Log Growth In Simple Mode

Andy Mallon explains why the transaction log will grow during a backup even if you’re in simple recovery mode:

When SQL Server begins backing up data pages, it also starts keeping track of transactions, via the transaction log. After it has backed up the last data page, it then also backs up all of the transactions that occurred during the data backup. Upon restore, it will then roll those transactions forward or backward, as necessary, to ensure a consistent image is restored.

In our librarian metaphor, she would keep an activity log, which would include the changes to books A and D from the first update, then also the changes to D, X, Y, and Z from the second update. She would not “fix” the data within the backup, but simply store those update details along with her mashed-up copy. In the unlikely event she had to recreate the books (ie, a restore), then she would go back and spend the effort to piece it back together. During that restore process, she would look at the first transaction and see that her copy of Book A in her backup was too old, but Book D already had the update, and she would roll forward the update to Book A. Next, she would process the second update and see that Books X, Y, and Z had the updates, but D still needed this second update, and she would roll forward that second update to Book D. At this point, she would have successfully reconstructed an image that is consistent to the time the backup completed.

Great metaphor to describe consistency during backups.

Comments closed

Logging R Scripts

Tomaz Kastrun shows the places where you might be able to track R scripts running on your system:

Extensibility Log will store information about the session but it will not store the R or R environment information or data, just session information and data. Navigate to:

C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\LOG\ExtensibilityLog

to check the content and to see, if there is anything useful for your needs.

It’s not a great answer today.

Comments closed

Linchpins

Bert Wagner on the ongoing “what happens to my tech job?” question:

Seth Godin discusses the concept of a Linchpin in his same-titled book. A Linchpin is someone who is so good at what they do that they become indispensable to their organization. Linchpins are the kind of people who are self-motivated and are able to consistently deliver quality work. They are integral to the operation of a business, even if they don’t get all of the glamour of having VP or Director in their title.

And why are Linchpins always guaranteed jobs? In one scenario, Linchpins will outgrow their role and be promoted or find a better job. They are always learning and growing in addition to delivering, and so this is the natural procession. In the alternate scenario, if the Linchpin has to lose his or her current job (ie. think company buyouts where entire departments close), they will either 1) become promoted to elsewhere in the company because management recognizes their great skills or 2) they will have no problem finding work elsewhere, especially with great recommendations from their former employer.

It’s an interesting read.

Comments closed

Exporting SQL Server User Permissions

Rob Sewell shows how to use the Export-SqlUser cmdlet in dbatools:

There are times when DBA’s are required to export database user permissions to a file. This may be for a number of reasons. Maybe for DR purposes, for auditing, for transfer to another database or instance. Sometimes we need to create a new user with the same permissions as another user or perhaps nearly the same permissions. I was having a conversation with my good friend and MVP Cláudio Silva and we were talking about how Export-SqlUser from dbatools could help in these situations and he suggested that I blogged about it so here it is.

The dbatools module (for those that don’t know) is a PowerShell module written by amazing folks in the community designed to make administrating your SQL Server significantly easier using PowerShell. The instructions for installing it are available here It comprises of 182 separate commands at present

Cláudio wrote Export-SqlUser to solve a problem.

Read on to see what this does and how to use it.

Comments closed

Probabilistic Record Linking In Spark

Tom Lous builds a solution to link similar companies together by address:

Recently a colleague asked me to help her with a data problem, that seemed very straightforward at a glance.
She had purchased a small set of data from the chamber of commerce (Kamer van Koophandel: KvK) that contained roughly 50k small sized companies (5–20FTE), which can be hard to find online.
She noticed that many of those companies share the same address, which makes sense, because a lot of those companies tend to cluster in business complexes.

Read on for the solution.  Like many data problems, it turns out to be a lot more complicated than you’d think at first glance.

Comments closed