Category: Misc Languages

Reviewing the Stack Overflow Developer Survey

Michael Toth looks at the recently-released 2019 Stack Overflow Developer Survey:

Since 2011, Stack Overflow has been surveying their users each year to answer questions about the technologies they use, their work experience, their compensation, and their satisfaction at work. Given Stack Overflow’s place in the broader programming world, they are able to draw quite the audience for their annual surveys.

This year, nearly 90,000 developers participated in the survey! There’s a lot in this survey, and I recommend reviewing it yourself, but I wanted to surface some of the key findings that I thought were particularly relevant to data professionals here.

Stack Overflow says they will be releasing the underlying data for this survey in the coming weeks, so I hope to return to this for a deeper analysis once that’s made available. For now, let’s get into the results!

Michael’s lede involves R versus Python in terms of salaries, but for me, the top line is that functional programmers make more money. Clojure, F#, Scala, Elixir, and Erlang make the top 10 on the global list, including positions 1, 2, 4, and 5. Within the US, Scala, Clojure, Erlang, Kotlin, F#, and Elixir make the top 10, including positions 1, 2, and 4. H/T R-Bloggers

Understanding DNS for Developers

RJ Zaworski explains DNS for web developers:

DNS can use a similar TCP/IP stack, but being parts of a simple system, most DNS operations can also travel the wire on the Internet’s favorite Roulette wheel: the User Datagram Protocol, UDP.

On a good day, UDP is fast, simple, and stripped bare of unnecessary niceties like delivery guarantees and congestion management. But a UDP message may also never be delivered, or it may be delivered twice. It may never get a response, which makes for fun client design–particularly coming from the relatively safe and well-adjusted world of HTTP. With TCP, you get an established connection and all kinds of accommodations when Things Inevitably Go Wrong. UDP? “Best effort” delivery. Which means a packet thrown over the fence with a prayer for a soft landing.
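
To make the "packet over the fence" point concrete, here is a minimal sketch of a DNS lookup over UDP from the JVM. It is not taken from RJ's post: the object and method names, the three-attempt retry, and the two-second timeout are all assumptions for illustration. The client has to bring its own timeout, its own retry, and its own check that a reply actually answers the question it asked.

```scala
import java.net.{DatagramPacket, DatagramSocket, InetAddress, SocketTimeoutException}
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object DnsUdpSketch {
  /** Encode a minimal DNS query: one question, type A, class IN. */
  def encodeQuery(id: Int, hostname: String): Array[Byte] = {
    val buf = ByteBuffer.allocate(512) // big-endian by default, as DNS expects
    buf.putShort(id.toShort)           // transaction ID, echoed back in the reply
    buf.putShort(0x0100.toShort)       // flags: standard query, recursion desired
    buf.putShort(1.toShort)            // QDCOUNT: one question
    buf.putShort(0.toShort)            // ANCOUNT
    buf.putShort(0.toShort)            // NSCOUNT
    buf.putShort(0.toShort)            // ARCOUNT
    hostname.split('.').filter(_.nonEmpty).foreach { label =>
      val bytes = label.getBytes(StandardCharsets.US_ASCII)
      buf.put(bytes.length.toByte)     // each label is length-prefixed
      buf.put(bytes)
    }
    buf.put(0.toByte)                  // root label terminates the name
    buf.putShort(1.toShort)            // QTYPE = A
    buf.putShort(1.toShort)            // QCLASS = IN
    val out = new Array[Byte](buf.position())
    buf.flip()
    buf.get(out)
    out
  }

  /** Send the query over UDP. Returns the raw reply, or None if every attempt fails. */
  def query(server: String, hostname: String,
            attempts: Int = 3, timeoutMs: Int = 2000): Option[Array[Byte]] = {
    val id = scala.util.Random.nextInt(0x10000)
    val payload = encodeQuery(id, hostname)
    val socket = new DatagramSocket()
    try {
      socket.setSoTimeout(timeoutMs)   // UDP will not tell us a packet was lost
      val address = InetAddress.getByName(server)
      (1 to attempts).iterator.map { _ =>
        socket.send(new DatagramPacket(payload, payload.length, address, 53))
        val reply = new DatagramPacket(new Array[Byte](512), 512)
        try {
          socket.receive(reply)
          val data = reply.getData.take(reply.getLength)
          // Accept only a reply carrying our ID; a stray or stale packet uses up the attempt.
          val matches = data.length >= 12 &&
            (((data(0) & 0xff) << 8) | (data(1) & 0xff)) == id
          if (matches) Some(data) else None
        } catch {
          case _: SocketTimeoutException => None // no answer: try again
        }
      }.collectFirst { case Some(data) => data }
    } finally socket.close()
  }
}
```

Calling `DnsUdpSketch.query("8.8.8.8", "example.com")` would return the undecoded reply bytes if the resolver answers. Parsing the answer section is left out to keep the sketch short.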

It’s a good read if you’re new to DNS.

Using the Cosmos DB Change Feed

Hasan Savran (who just became a Microsoft MVP, so congrats to him) takes us through the Cosmos DB Change Feed:

Azure Cosmos DB Change Feed exposes Cosmos DB logs outside of CosmosDB. CosmosDB notifies you immediately when there is any change in your database. It supports all inserts and updates; delete will be available soon. You can always use soft delete to catch delete events if you need to.

By knowing what has changed in your database, you can trigger all kinds of events and make your application work very smart. SQL Server has similar functionality but, like many other features, log shipping is usually blocked by DBAs or company policies. In CosmosDB, you don’t need to do anything to enable the Change Feed feature! It’s already enabled; all you need to do is configure it. The easiest way to catch change feed events is Azure Functions.

When I hear someone describe the change feed, I immediately imagine it as a Kafka topic.

Bring .NET Support to Spark

I have a request that you vote up a Spark issue:

There is a Jira ticket for the Apache Spark project, SPARK-27006. The gist of this ticket is to bring .NET support to Spark, specifically by supporting DataFrames in C# (and hopefully F#). No support for Datasets or RDDs is included in here, but giving .NET developers DataFrame access would make it easy for us to write code which interacts with Spark SQL and a good chunk of the SparkSession object.

You can click through and read everything I have to say, but do go to the Spark ticket and vote for .NET support.

Working with Columns in Spark

Achilleus has a two-parter on working with columns in Spark. Part 1 covers some of the basic syntax and several functions:

Also, we can have typed columns, which are basically columns with an expression encoder specified for the expected input and return type.

scala> val name = $"name".as[String]
name: org.apache.spark.sql.TypedColumn[Any,String] = name
scala> val name = $"name"
name: org.apache.spark.sql.ColumnName = name

There are more than 50 methods (67 the last time I counted) that can be used for transformations on the column object. We will be covering some of the important methods that are generally used.

Part 2 covers other functions including window functions:

17) over
This is one of the most important functions and is used in many window operations. We can talk about window functions in detail when we discuss aggregation in Spark, but for now it is fair to say that the over method provides a way to apply an aggregation over a window specification, which in turn can be used to specify the partition, order, and frame boundaries of the aggregation.
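
As a rough illustration of what `over` does, here is a small sketch. It is mine rather than Achilleus's: the sample rows, the column names, and the `withWindowColumns` helper are hypothetical, and it assumes Spark SQL is on the classpath. One window specification only partitions, the other partitions and orders, and each is handed to `over` on an aggregate or ranking function.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank, sum}

object OverSketch {
  // Adds two windowed columns. Column names are assumptions for this sketch.
  def withWindowColumns(sales: DataFrame): DataFrame = {
    val perType = Window.partitionBy("Type")                // partition only
    val perTypeByPrice = perType.orderBy(col("Price").desc) // partition plus order
    sales
      .withColumn("TypeTotal", sum(col("Price")).over(perType))
      .withColumn("PriceRank", rank().over(perTypeByPrice))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("over-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, not from the linked posts.
    val sales = Seq(
      ("Fruit", "Apple", 3.0),
      ("Fruit", "Pear", 2.0),
      ("Veg", "Leek", 1.5)
    ).toDF("Type", "Item", "Price")

    withWindowColumns(sales).show()
    spark.stop()
  }
}
```

Unlike a `groupBy`, every input row survives: each row simply gains the total for its partition and its rank within that partition.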

Check out both of these posts for useful tidbits.

Selecting a List of Columns from Spark

Unmesha SreeVeni shows us how we can create a list of column names in Scala to pass into a Spark DataFrame’s select function:

Now our example dataframe is ready.
Create a List[String] with column names.
scala> var selectExpr : List[String] = List("Type","Item","Price")
selectExpr: List[String] = List(Type, Item, Price)

Now our list of column names is also created.
Let’s select these columns from our dataframe.
Use .head and .tail to select all of the values mentioned in the List()
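
The reason for `.head` and `.tail` is that the string-based `select` takes one column name followed by varargs. Here is a minimal sketch of that call alongside a `Column`-based alternative. The helper names, the sample rows, and the extra `Qty` column are my own assumptions rather than anything from Unmesha's post, and the sketch assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectListSketch {
  // select(col: String, cols: String*) wants a first name plus varargs,
  // hence head and tail. Note that head throws on an empty list.
  def selectByName(df: DataFrame, names: List[String]): DataFrame =
    df.select(names.head, names.tail: _*)

  // Alternative: turn each name into a Column and splat the whole list.
  def selectByColumn(df: DataFrame, names: List[String]): DataFrame =
    df.select(names.map(col): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-list-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with one column we do not want to keep.
    val df = Seq(
      ("Fruit", "Apple", 3.0, 10),
      ("Veg", "Leek", 1.5, 4)
    ).toDF("Type", "Item", "Price", "Qty")

    val selectExpr = List("Type", "Item", "Price")
    selectByName(df, selectExpr).show()
    selectByColumn(df, selectExpr).show()
    spark.stop()
  }
}
```

The second form is handy when some of the columns later need expressions or aliases rather than plain names.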

Click through for a demo.

Case Classes In Scala

Shubham Dangare explains what case classes are in Scala:

Case classes are Scala’s way to allow pattern matching on an object without requiring a large amount of boilerplate. All you need to do is add a single case keyword modifier to each class that you want to pattern match. Using this modifier makes the Scala compiler add some syntactic conveniences to your class, including a companion object (with the apply method).
The compiler adds a factory method with the name of the class. This means that, for instance, you can write StringValue(“X”) to construct a StringValue object instead of using new StringValue(“X”).
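
A short sketch of those conveniences in plain Scala follows. `StringValue` comes from the excerpt. The `Value` trait, `IntValue`, and `describe` are hypothetical names added so there is something to match on.

```scala
object CaseClassSketch {
  sealed trait Value
  case class StringValue(text: String) extends Value
  case class IntValue(number: Int) extends Value

  // Pattern matching works because the compiler generates an extractor
  // (unapply) for each case class.
  def describe(v: Value): String = v match {
    case StringValue(t)       => s"string: $t"
    case IntValue(n) if n < 0 => s"negative int: $n"
    case IntValue(n)          => s"int: $n"
  }

  def main(args: Array[String]): Unit = {
    val a = StringValue("X")      // companion apply, so no `new` is needed
    val b = StringValue("X")
    println(a == b)               // true: equality compares fields, not references
    println(describe(a))          // string: X
    println(a.copy(text = "Y"))   // StringValue(Y): copy and toString are generated too
  }
}
```

Sealing the trait is a design choice rather than a requirement: it lets the compiler warn when a match forgets one of the cases.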

Given how useful case classes are in Spark, it’s good to know how they operate. For more background on the topic, Alessandro Lacava has a post from a few years back describing the topic well.

No Type Equivalence In M

Imke Feldmann notes an oddity in types in Power Query:

But this function will not return any matches. I also tried out a (potentially) slower version using Table.SelectColumns(Types, each [Value] = x[Types]) – but still no match. 

What I found particularly frustrating here was that, in some cases, these lookups or filters on type columns worked.

That behavior seems odd to me. Imke shares a link from Microsoft which explains that the behavior occurs, but the why behind it eludes me.

Using AWS Lambda To Get Into Nice Restaurants

Stephane Maarek gives us the best use of AWS Lambda I’ve seen yet:

One attentive eye would have noticed that the booking platform is not hosted on the restaurant website at http://www.septime-charonne.fr/en/ but instead on https://module.lafourchette.com.

Upon using the Chrome Web Developer Tools to analyze the network calls being made between my browser and the booking service, I stumbled upon an easy to use and completely unprotected REST API:

I love the bonus hack at the end.

Working With WebHDFS From Node.js

Somanth Veettil shows us how to use Node.js to work with the WebHDFS REST API:

There is an npm module, “node-webhdfs,” with a wrapper that allows you to access Hadoop WebHDFS APIs. You can install the node-webhdfs package using npm:
npm install webhdfs 
After the above step, you can write a Node.js program to access this API. Below are a few steps to help you out.

Click through for examples on how the package works.
