Press "Enter" to skip to content

Category: Misc Languages

Combining Files In C#

Chris Koester shows how to combine a set of CSVs without duplicating their header rows:

The timings in this post came from combining 8 csv files with 13 columns and a combined total of 9.2 million rows.

I first tried combining the files with the PowerShell technique described here. It was painfully slow and took an hour and a half! This is likely because it is deserializing and then serializing every bit of data in the files, which adds a lot of unnecessary overhead.

Next I tried the C# script below using LINQPad. When reading from and writing to a network share, it took 3 minutes and 56 seconds. Much better! Next I tried it on a local SSD drive and it took just 44 seconds.

Read on for the script itself.  The ReadAllLines method works fine as long as the file isn’t larger than your working memory.

Comments closed

Getting Started With Functional Programming

Jose Gonzalez links to a set of blog posts and videos introducing functional programming:

Since I’ve started to play with (and rave about) functional programming (FP), a lot of people have asked me how to get started.

Instead of writing the same email multiple times, I decided to create a blog post I can refer them to. Also, it’s a central place to put all my notes about the topic.

Here’s a small collection of all the resources I’ve accumulated on my adventure on learning functional programming.

I think the functional paradigm fits relational database development extremely well, better than the object-oriented paradigm.

Comments closed

Reading Extended Events

Dave Mason writes a custom app to read extended events data:

It took me a while to make the transition from SQL Profiler to Extended Events. Eventually I got comfortable enough with it to use it 100% of the time. As I read more about the XEvents architecture (as opposed to just “using” XEvents), I gained a deeper appreciation of just how great the feature is. My only gripe is that there isn’t a way to handle the related events from within SQL Server using T-SQL. DDL triggers can’t be created for XEvents. And they can’t be targeted to Service Broker for Event Notifications (not yet, anyway). For now, the one way I know of to handle an XEvent is by accessing the event_stream target via the .NET framework. I’ll demonstrate with C#.

9/10, would have preferred F# but would read again.

Comments closed

Web App Security

Vishwas Parameshwarappa has an article on securing web applications:

The Cross-site request forgery (CSRF) exploit uses cross-site scripting (mentioned above), browser insecurities, and other techniques to cause a user to unwittingly perform an action within their current authenticated context that allows the attacker to access the user’s account. This type of attack usually occurs when a malicious email, blog, or a message causes a user’s Web browser to perform an unwanted action on a trusted site for which the user is currently authenticated.

This is a nice overview of the most common attack vectors for web applications.

Comments closed

Windows 10 IoT Code To Back Up Databases

Drew Furgiuele writes code to back up your databases using a Raspberry Pi 3 and Windows 10 IoT edition:

The trickiest part of wiring a circuit like this is detecting a button press. Most logic boards don’t know if an input circuit should poll at high or low levels. That’s where pull-ups come in. Above, you can see we set one of the pins for the button to be a pull-up (or an input if we were using another board). That means it will pull the current and look for impedance. The other important thing is our debounce. With circuits, one button press can actually turn into lots because as soon as the switch completes (or interrupts) the circuit, it starts sending signals. A debounce is like a referee saying “only look for a signal for this long” and it will filter out extra “presses” based on current that might linger on a press.

Once we detect our button press, we’re calling the function below. All it does is read the current LED pin values, and looks to see which one is currently lit, and then lights the next one.

Go from understanding general purpose input/output pins to calling SMO via a web service all in one post.  If you’ve got an itch for a weekend project, have at it.

Comments closed

Ring Buffers

Juho Snellman explains ring buffers:

This is of course not a new invention. The earliest instance I could find with a bit of searching was from 2004, with Andrew Morton mentioning in it a code review so casually that it seems to have been a well established trick. But the vast majority of implementations I looked at do not do this.

So here’s the question: Why do people use the version that’s inferior and more complicated? I’ve must have written a dozen ring buffers over the years, and before being forced to really think about it, I’d always just used the first definition. I can understand why a textbook wouldn’t take advantage of unsigned integer wraparound. But it seems like it should be exactly the kind of cleverness that hackers would relish using and passing on.

Check out the comments for more information, a bit of code golf, and multiple links on tying shoelaces.

Comments closed

Querying Genomic Data With Athena

Aaron Friedman explains how to use Amazon Athena to query S3 files:

Recently, we launched Amazon Athena as an interactive query service to analyze data on Amazon S3. With Amazon Athena there are no clusters to manage and tune, no infrastructure to setup or manage, and customers pay only for the queries they run. Athena is able to query many file types straight from S3. This flexibility gives you the ability to interact easily with your datasets, whether they are in a raw text format (CSV/JSON) or specialized formats (e.g. Parquet). By being able to flexibly query different types of data sources, researchers can more rapidly progress through the data exploration phase for discovery. Additionally, researchers don’t have to know nuances of managing and running a big data system. This makes Athena an excellent complement to data warehousing on Amazon Redshift and big data analytics on Amazon EMR 

In this post, I discuss how to prepare genomic data for analysis with Amazon Athena as well as demonstrating how Athena is well-adapted to address common genomics query paradigms.  I use the Thousand Genomes dataset hosted on Amazon S3, a seminal genomics study, to demonstrate these approaches. All code that is used as part of this post is available in our GitHub repository.

This feels a lot like a data lake PaaS process where they’re spinning up a Hadoop cluster in the background, but one which you won’t need to manage. Cf. Azure Data Lake Analytics.

Comments closed

Debugging LINQ

Michael Sorens has an article on using OzCode to debug LINQ statements:

OzCode’s new LINQ debugging capability is tremendous, no doubt about it. But it is not a panacea; it is still constrained by Visual Studio’s own modeling capability. As a case in point, Figure 17 shows another example from my earlier article. This code comes from an open-source application I wrote called HostSwitcher. In a nutshell, HostSwitcher lets you re-route entries in your hosts file with a single click on the context menu attached to the icon in the system tray. I discussed the LINQ debugging aspects of this code in the same article I mentioned previously, LINQ Secrets Revealed: Chaining and Debugging, but if you want a full understanding of the entire HostSwitcher application, see my other article that discusses it at length, Creating Tray Applications in .NET: A Practical Guide.

This is quite interesting.  My big problem with LINQ in the past was that Visual Studio’s debugger treated a LINQ statement as a black box, so if you got anything wrong inside a long chain of commands, good luck figuring it out.  This lowers that barrier a bit, and once you get really comfortable with LINQ, it’s time to give F# a try.

Comments closed

The Central Limit Theorem

Vincent Granville explains the central limit theorem:

The theorem discussed here is the central limit theorem. It states that if you average a large number of well behaved observations or errors, eventually, once normalized appropriately, it has a standard normal distribution. Despite the fact that we are dealing here with a more advanced and exciting version of this theorem (discussing the Liapounov condition), this article is very applied, and can be understood by high school students.

In short, we are dealing here with a not-so-well-behaved framework, and we show that even in that case, the limiting distribution of the “average” can be normal (Gaussian.). More precisely, we show when it is and when it is not normal, based on simulations and non-standard (but easy to understand) statistical tests.

Read on for more details.

Comments closed