Press "Enter" to skip to content

Parsing HL7 Messages With Python

Cristian Satnic has HL7 formatted messages in SQL Server and wishes to parse them using Python:

Each line in the HL7 message is called a segment and then each segment is split into individual fields by | (pipe) characters (typically). HL7 fields have well-defined names and meanings … for example in the example above PID-3 (the 3rd field in the PID segment where the identifier ‘PID’ is not counted) is 12001 and that represents the patient identifier.

For this particular project I’m working on we have HL7 messages stored in a SQL Server 2016 database table where each row in the table contains the raw HL7 2.x message in a particular column. I need to be able to intelligently filter over this HL7 data by looking at values in particular HL7 fields (as shown above). Since this HL7 data is stored in a varchar(MAX) column I could certainly attempt to play games using LIKE comparisons in SQL but that would not get me very far. SQL simply does not understand the complex structure of HL7 and I have no native SQL Server functions at my disposal that I could quickly use to parse this data and filter it.

Cristian has a Jupyter Notebook which takes us through the solution. With SQL Server 2017, there’s the possibility of solving this in a stored procedure using Machine Learning Services.