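To put the graph database to the test, I took a bunch of emails from a particular SQL Server MVP distribution list (the content will not be shown, and all names are anonymized). From my Gmail account, I downloaded about 90 MiB of emails in mbox format. With a short Python script, I extracted only the From and Subject fields:

```python
import csv
import mailbox
from email.header import decode_header, make_header

infile = 'mails.mbox'   # placeholder: path to the Gmail mbox export
outfile = 'mails.csv'   # placeholder: CSV file that is loaded later

with open(outfile, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['from', 'subject'])
    for message in mailbox.mbox(infile):
        # keep only the address: "Name <a@b.com>" -> "a@b.com"
        sender = (message['from'] or '').strip('>').split('<')[-1]
        # decode RFC 2047 encoded words (e.g. =?UTF-8?Q?...?=) into plain text
        subject = str(make_header(decode_header(message['subject'] or '')))
        # trailing "|" kept from the original script (presumably a row marker for the load)
        writer.writerow([sender, subject, '|'])
```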
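To see what the two header transformations in the script actually produce, here is a small standalone check on a hand-written message (the address and subject are made up, not taken from the list). It also shows `email.utils.parseaddr`, a standard-library alternative to the `strip('>').split('<')` trick; the original script does not use it.

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.utils import parseaddr

# Hypothetical message with a display name and a UTF-8, Q-encoded subject
raw = (
    "From: Jane Doe <jane@example.com>\n"
    "Subject: =?UTF-8?Q?R=C3=A9sum=C3=A9_of_the_meeting?=\n"
    "\n"
    "body text\n"
)
msg = message_from_string(raw)

# The script's trick: drop the trailing '>' and keep what follows '<'
trick = msg['from'].strip('>').split('<')[-1]

# Standard-library alternative: parseaddr splits display name and address
name, addr = parseaddr(msg['from'])

# decode_header yields (bytes, charset) pairs; make_header joins them into text
subject = str(make_header(decode_header(msg['subject'])))

print(trick)    # jane@example.com
print(addr)     # jane@example.com
print(subject)  # Résumé of the meeting
```

Without `make_header`, `decode_header` returns a list of tuples, so writing its result straight into the CSV would store something like `[(b'R\xc3\xa9sum...', 'utf-8')]` instead of readable text.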
This post mostly walks you through loading the data, but at the end you can see how easy it is to find who replied to whose emails.
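The post answers the "who replied to whom" question in the graph database. As a rough illustration of the underlying idea, here is a plain-Python sketch over the extracted (from, subject) rows. It rests on an assumed heuristic, not on the post's own method: a message is a reply if its subject starts with one or more `Re:` prefixes, and it is attributed to whoever sent the first non-reply message with the same normalized subject. `REPLY_PREFIX`, `normalize`, and `reply_edges` are hypothetical names.

```python
import re
from collections import Counter

# Assumed heuristic: one or more leading prefixes such as "Re:", "RE: re:", "Re : "
REPLY_PREFIX = re.compile(r'^(?:\s*re\s*:\s*)+', re.IGNORECASE)


def normalize(subject):
    """Strip reply prefixes and surrounding whitespace; lower-case for matching."""
    return REPLY_PREFIX.sub('', subject).strip().lower()


def reply_edges(rows):
    """Return a Counter of (replier, thread_starter) pairs from (sender, subject) rows."""
    rows = list(rows)
    # First pass: the first non-reply message with a given subject starts the thread
    starter = {}
    for sender, subject in rows:
        if not REPLY_PREFIX.match(subject):
            starter.setdefault(normalize(subject), sender)
    # Second pass: link each reply to the thread starter (ignoring self-replies)
    edges = Counter()
    for sender, subject in rows:
        if REPLY_PREFIX.match(subject):
            original = starter.get(normalize(subject))
            if original is not None and original != sender:
                edges[(sender, original)] += 1
    return edges


rows = [
    ('alice@example.com', 'Query store question'),
    ('bob@example.com', 'Re: Query store question'),
    ('carol@example.com', 'RE: Re: Query store question'),
    ('bob@example.com', 'Re: Query store question'),
]
for (replier, original), n in reply_edges(rows).most_common():
    print(f'{replier} -> {original}: {n}')
# bob@example.com -> alice@example.com: 2
# carol@example.com -> alice@example.com: 1
```

Two passes make the result independent of the order of messages in the mbox file. Because only From and Subject were extracted, every reply is credited to the thread starter; telling which specific message someone answered would need the `In-Reply-To` or `References` headers.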