
Author: Kevin Feasel

Parsing HL7 Messages With Python

Cristian Satnic has HL7-formatted messages in SQL Server and wishes to parse them using Python:

Each line in the HL7 message is called a segment and then each segment is split into individual fields by | (pipe) characters (typically). HL7 fields have well-defined names and meanings … for example in the example above PID-3 (the 3rd field in the PID segment where the identifier ‘PID’ is not counted) is 12001 and that represents the patient identifier.

For this particular project I’m working on we have HL7 messages stored in a SQL Server 2016 database table where each row in the table contains the raw HL7 2.x message in a particular column. I need to be able to intelligently filter over this HL7 data by looking at values in particular HL7 fields (as shown above). Since this HL7 data is stored in a varchar(MAX) column I could certainly attempt to play games using LIKE comparisons in SQL but that would not get me very far. SQL simply does not understand the complex structure of HL7 and I have no native SQL Server functions at my disposal that I could quickly use to parse this data and filter it.

Cristian has a Jupyter Notebook which takes us through the solution. With SQL Server 2017, there’s the possibility of solving this in a stored procedure using Machine Learning Services.
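
For a sense of how little code the basic pipe-splitting takes, here is a minimal Python sketch; it is not Cristian's notebook code, and the two-segment sample message is made up for illustration. In practice you would pull the raw messages out of the varchar(MAX) column (for example with pyodbc) and likely lean on a library such as python-hl7 rather than raw string splitting.

# A minimal sketch (not Cristian's notebook code) of pipe-splitting an HL7 v2 message.
# The sample message below is made up for illustration.
raw_message = (
    "MSH|^~\\&|SendingApp|SendingFac|ReceivingApp|ReceivingFac|20190301120000||ADT^A01|MSG00001|P|2.3\r"
    "PID|1||12001||Doe^John||19800101|M"
)

# Segments are separated by carriage returns; fields within a segment by pipes.
segments = [s.split("|") for s in raw_message.split("\r") if s]

# Index the segments by their identifier (MSH, PID, ...).
by_name = {fields[0]: fields for fields in segments}

# PID-3: the third field after the segment identifier is the patient identifier.
patient_id = by_name["PID"][3]
print(patient_id)  # 12001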

Comments closed

Testing Cosmos DB’s REST API

Hasan Savran shows how we can test Cosmos DB’s REST API using Postman:

You have many options to access Cosmos DB. The REST API is one of these options, and it is the low-level way to access Cosmos DB. You can customize all options of Cosmos DB by using the REST API. To customize the calls, and pass the required authorization information, you need to use HTTP headers. There are many headers you can set depending on the operation you want to run in Cosmos DB. I am going to cover only the required headers here.

In the following example, I am going to try to create a database in the Cosmos DB emulator by using the REST API. First let's look at the required header fields for this request. These requirements apply to all other REST API calls too.

It’s a little more complicated than just posting to a URL and Hasan has you covered.
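
For a sense of what goes into those headers, here is a hedged Python sketch of creating a database against the emulator. The header names and the master-key signature follow the documented scheme, but the API version string and database name here are only illustrative, and you would substitute the emulator's well-known master key (or your account's key).

# A rough sketch (not Hasan's example) of calling the Cosmos DB REST API from Python.
import base64, hashlib, hmac, json, urllib.parse
from datetime import datetime, timezone

import requests

ENDPOINT = "https://localhost:8081"   # the Cosmos DB emulator
MASTER_KEY = "<master key>"           # the emulator's documented well-known key, or your account key

def auth_header(verb, resource_type, resource_link, date_str, key):
    # Signature payload: lowercase verb, resource type, and date; resource link stays as-is.
    payload = f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{date_str.lower()}\n\n"
    signature = base64.b64encode(
        hmac.new(base64.b64decode(key), payload.encode("utf-8"), hashlib.sha256).digest()
    ).decode()
    return urllib.parse.quote(f"type=master&ver=1.0&sig={signature}", safe="")

# x-ms-date must be an RFC 1123 date and must match the date used in the signature.
date_str = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")

headers = {
    "Authorization": auth_header("POST", "dbs", "", date_str, MASTER_KEY),
    "x-ms-date": date_str,
    "x-ms-version": "2018-12-31",
    "Content-Type": "application/json",
}

# Create a database named SampleDB; verify=False because the emulator uses a self-signed certificate.
response = requests.post(f"{ENDPOINT}/dbs", headers=headers,
                         data=json.dumps({"id": "SampleDB"}), verify=False)
print(response.status_code, response.json())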

Comments closed

Shared Database Privacy

Duncan Greaves has some thoughts about safeguarding privacy in shared databases:

The difficulty with privacy (or, more correctly, information confidentiality) in database terms is that databases are supposed to maintain huge amounts of information, and processing and recording data is difficult, if not impossible, without them. Public bodies especially have difficulty in defining and maintaining the boundaries of information disclosure that they should provide, whilst maintaining the utility of the information for the improvement of welfare and services.

Privacy is contingent on first having a correctly secured database. Additional privacy controls are required when sensitive data leaves the protected trust perimeter of the database to be utilised by third parties.

Click through for more detail.

Comments closed

A Central Repository for Query Store

Tracy Boggiano shares work on centralizing Query Store results across a number of databases:

I’ve worked for SaaS companies for the last 6 years or so, so our queries are largely the same across our system, and by default Query Store is per-database. It would be handy to have a central repository to help you determine which queries across your whole server are your worst-performing queries, hence my idea to build one. I believe I put in a Connect item for this before it got moved to the new platform, but I never filed a new ticket. So this is the beginning of building something along those lines; it will be a work in progress, so to speak.

At my current company I care about queries that are taking a long time to run, so I’m going to store the top 50 queries by total duration in a database handily called DBA, because that’s where I store all the DBA stuff. Since there are some non-client-related databases I don’t care about, I create a table to tell which databases to collect the data from, then a table to put the information into, and a job to run every day at midnight and sum up the data. Now, the data is stored in UTC time, so it will be off by whatever time zone difference you are in; but with most SaaS companies being 24×7 shops, that shouldn’t matter, and if it does you can edit the query.

This helps to resolve a real pain point in Query Store: all of that data is per-database, so if you have a federated system with a large number of equivalent databases, getting cross-system stats is painful.
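
Tracy's solution is T-SQL plus an Agent job, so click through for her scripts. Purely to illustrate the shape of the idea, here is a hedged Python sketch that walks a list of databases and copies each one's top 50 queries by total duration into a hypothetical central table in DBA; the table, server, and database names are made up.

# Illustrative sketch only; Tracy's actual solution is T-SQL plus an Agent job.
# Assumes a central DBA.dbo.TopQueriesByDuration table and a list of target databases.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=DBA;Trusted_Connection=yes"
)

TOP_QUERIES = """
SELECT TOP (50)
       DB_NAME()          AS database_name,
       q.query_id,
       qt.query_sql_text,
       agg.total_duration
FROM (
    SELECT p.query_id,
           SUM(rs.avg_duration * rs.count_executions) AS total_duration
    FROM sys.query_store_plan AS p
    JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
    GROUP BY p.query_id
) AS agg
JOIN sys.query_store_query AS q       ON q.query_id = agg.query_id
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
ORDER BY agg.total_duration DESC;
"""

# Databases to collect from (Tracy drives this from a control table instead of a hard-coded list).
databases = ["Client1", "Client2"]

cursor = conn.cursor()
for db in databases:
    cursor.execute(f"USE [{db}];")
    rows = cursor.execute(TOP_QUERIES).fetchall()
    cursor.execute("USE [DBA];")
    cursor.executemany(
        "INSERT INTO dbo.TopQueriesByDuration (database_name, query_id, query_sql_text, total_duration) "
        "VALUES (?, ?, ?, ?);",
        [tuple(r) for r in rows],
    )
conn.commit()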

Comments closed

The Anatomy of a Pester Test

Shane O’Neill takes us through using Pester to test self-contained scripts:

Where things differ…
…could be when you try to accommodate different people and create a .ps1 file that both defines and calls a function. Self-contained scripts, if you would call them that.
Normally the reason that I’ve heard for this is that you’re trying to help a non-technically-minded person who just wants a file they can open, hit “run”, and have everything done for them.
Have you ever tried to Pester test those files though? It’s not recommended, especially if your function removes or modifies objects.

Click through for a solution and read Shane’s update as well for a scenario where it doesn’t quite work as hoped.

Comments closed

Generating Reference Numbers With Sequences

Matthew McGiffen shares one technique to generate reference numbers using a sequence and the FORMAT function:

One thing to note is that, while the sequence will generally produce unique numbers, it is still worth enforcing that in your table definition with a unique constraint, i.e.

ALTER TABLE dbo.Orders ADD CONSTRAINT UQ_Orders_OrderReference UNIQUE(OrderReference);

This prevents someone from issuing an UPDATE command that might create a duplicate reference. 

As long as you can live with the occasional gap in your reference number, sequences are a good solution to the problem.
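
Purely as a hedged sketch of the overall pattern (see Matthew's post for his exact code), the following ties the pieces together from Python: create the sequence and the table with its unique constraint, then build a reference like ORD000001 with FORMAT. The table and constraint names mirror the ones above; the connection string, sequence name, and "ORD" prefix are made up for illustration.

# A hedged sketch of the pattern, not Matthew's exact code: a sequence plus FORMAT
# to build order references, with the unique constraint above as the safety net.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=Sales;Trusted_Connection=yes",
    autocommit=True,
)
cursor = conn.cursor()

# One-time setup: the sequence and a table whose reference column is enforced as unique.
cursor.execute("""
IF OBJECT_ID('dbo.Orders') IS NULL
BEGIN
    CREATE SEQUENCE dbo.OrderReferenceSequence AS int START WITH 1 INCREMENT BY 1;
    CREATE TABLE dbo.Orders
    (
        OrderId        int IDENTITY(1,1) PRIMARY KEY,
        OrderReference varchar(20) NOT NULL,
        CONSTRAINT UQ_Orders_OrderReference UNIQUE (OrderReference)
    );
END
""")

# Generate the next reference, e.g. ORD000001, and insert it.
reference = cursor.execute(
    "SELECT 'ORD' + FORMAT(NEXT VALUE FOR dbo.OrderReferenceSequence, '000000');"
).fetchval()
cursor.execute("INSERT INTO dbo.Orders (OrderReference) VALUES (?);", reference)
print(reference)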

Comments closed

Economic Articles With Data Included

Sebastian Kranz has a Shiny app to help you find economic papers with included data:

One gets some information about the size of the data files and the code files used. I also tried to find and extract a README file from each supplement. Most README files explain whether all results can be replicated with the provided data sets or whether some results require confidential or proprietary data sets. The link allows you to look at the README without the need to download the whole data set.

The main idea is that such a search function could be helpful for teaching economics and data science. For example, my students can use the app to find an interesting topic for a Bachelor or Master thesis in the form of an interactive analysis with RTutor. You could also generate a topic list for a seminar in which students shall replicate some key findings of a research article.

I like this idea, particularly because it promotes the notion that if you’re going to write a paper based on a data set, you ought to provide the data set. There are too many cases of typos or accidental miscodings which take an interesting result and render it mundane (or sometimes even the exact opposite of what the paper claims). H/T R-Bloggers

Comments closed

Giving A Name To The R Pipe

John Mount noodles an idea from Hadley Wickham:

I’d say this fails on at least two counts: first, “%then%” doesn’t seem grammatical (as d is a noun), and second, magrittr pipes can’t be associated with a new name (as they are implemented by looking for themselves by name in captured unevaluated code).

However, the wrapr dot arrow pipe can take on new names.

Let’s try a variation, using a traditional pronunciation: “to”.

I don’t like “then” very much. I definitely prefer the C# lambda pronunciation of “goes to” for this.

Click through for John’s thoughts on right assignment as well, something I almost categorically dislike.

Comments closed

The Zen Of Airflow

Bas Harenslak shows how you can think of The Zen of Python as it applies to Apache Airflow:

Apache Airflow is a Python framework for programmatically creating workflows in DAGs, e.g. ETL processes, generating reports, and retraining models on a daily basis. This allows for concise and flexible scripts but can also be a downside of Airflow; since it’s Python code, there are infinite ways to define your pipelines. The Zen of Python is a list of 19 Python design principles, and in this blog post I point out some of these principles using four Airflow examples. This blog was written with Airflow 1.10.2.

My favorite of the Zen of Python principles is a combination of two: “simple is better than complex; complex is better than complicated.” That’s something I don’t always get right, but it is critical for a stable architecture.
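
To make that concrete, here is a tiny, made-up DAG in the Airflow 1.10-era API (not one of Bas's four examples) that stays on the "simple" end: two invented tasks and one dependency, spelled out explicitly rather than implied.

# A minimal, made-up DAG sketch in the Airflow 1.10-style API (not from Bas's post).
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def extract():
    print("pulling data from the source system")


def load():
    print("loading data into the warehouse")


default_args = {"owner": "airflow", "start_date": datetime(2019, 1, 1)}

with DAG("daily_etl", default_args=default_args, schedule_interval="@daily") as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Explicit is better than implicit: state the dependency rather than relying on ordering.
    extract_task >> load_task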

Comments closed

The Steps Of A Database Deployment

I have a series on near-zero downtime. In this post, I cover some of the key assumptions in the series as well as the steps in a database deployment:

Database Release
The database release phase is the first “primary” phase. It usually starts on a schedule, maybe 2 PM on a Wednesday or maybe “every day at 9 AM, 1 PM, 6 PM, and 10 PM” for more mature shops. Depending upon how much of an effect our release process normally has on end users, we might alert them that we expect to see a degradation in services starting at this point.

This phase of the release has us push out our database changes. This can involve creating or altering database objects but will not involve dropping existing objects.

Our database changes should support the blue-green deployment model. At this point in the process, all of the application code is “blue”—that is, the current production code. Our procedure changes need to be able to support that code without breaking. If we need to drop a column from a stored procedure, for example, we would not want to do it here. If we need to add a column to a stored procedure, we might do it here as long as it doesn’t break the calling code.

This is two topics smashed together into one post, but it gives you an idea of a mental model around database deployments.

Comments closed