An Experiment in Document Conversion and Generation

This is the README file for the GitHub repository that holds the files used and created in this experiment. I’m including the README in its entirety since it kills two birds with one stone.

1. Introduction

This repo holds a set of files that I created as an experiment in getting old work out of proprietary formats. The idea is to take a Microsoft Word file and convert it into something that is human readable, openly formatted, and convertible.

To do this, I settled on AsciiDoc to mark up the text of the paper. I chose AsciiDoc over Markdown because of its depth of features and the availability of conversion tools.

2. The Process

I decided to use a local install of Etherpad Lite (EL) as my primary text editor for this project. I did this because of a few features, including autosave, versioning, and the potential for real-time collaboration. I hoped that these features would provide me with a useful editing tool.

Once EL was set up and configured, I was faced with the problem of how to get the text of the paper into the editor in the first place. My initial inclination was to retype the document, formatting and editing as I went along. Faced with a 10,000-word doc and no appreciable typing skills, I was not happy with this option. After a bit of poking around in EL, I found its import features. Getting Microsoft Word files to import required a bit more configuring, but it worked. I then imported the Word file into EL.

The import process added the text of the document to the editor. It stripped all of the formatting from the text and inserted the 112 footnotes in-line into the text. All of this was actually a good thing, since it made marking up the doc with AsciiDoc easier. Using the original word processing file as a guide, I worked through the document, adding the necessary AsciiDoc markup to format the paper. The most tedious part was the 112 footnotes, but since AsciiDoc handles footnotes with in-line markup, it moved along as fast as could be expected.
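As a rough illustration of the in-line footnote markup mentioned above, here is a minimal Python sketch that wraps a piece of footnote text in AsciiDoc's footnote:[...] macro. The helper name to_footnote and the sample sentence are hypothetical and not part of this repo, and the escaping of a closing bracket is an assumption about how the macro should be protected; the markup in this project was added by hand.

```python
def to_footnote(text: str) -> str:
    """Wrap footnote text in AsciiDoc's in-line footnote macro."""
    # Collapse runs of whitespace so the footnote stays on one line.
    cleaned = " ".join(text.split())
    # Assumption: a literal "]" would end the macro early, so escape it.
    escaped = cleaned.replace("]", "\\]")
    return f"footnote:[{escaped}]"


# Made-up sample text, only to show where the macro sits in a sentence.
sentence = "The plan was adopted in 1956."
note = "See the  original report, p. 12."
print(sentence + to_footnote(note))
```

The macro is attached directly to the end of the sentence it annotates, which is why footnotes that arrive in-line from the import are convenient to tag.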

In total, I spent about 6 hours working on the AsciiDoc version of the document. Most of that time was spent tagging footnotes and figuring out the format for the bibliography.
[I am still not really pleased with the way the biblio looks. I think I can fix it in a later iteration, though.]
The rest of the formatting, such as section titles, quotes, emphasis, and lists, was straightforward, though I did keep a copy of the AsciiDoc User Guide open in another tab to help out.

I found the Etherpad Lite interface easy to work with and really appreciated the autosave and versioning features. EL doesn’t know about AsciiDoc markup, though, so that presented a challenge. To preview the work, I had to export the file as text, run the basic AsciiDoc-to-HTML conversion, and open the resulting file in another browser tab to see what was going on. As I became more confident in my work, I checked less often, so this was not much of an issue. I marked a saved revision at the end of each section of the document to give me a nice clean revision history.

Once I had a nice clean version that produced good HTML, I exported a final copy to my local computer and set about using the AsciiDoc utility a2x to generate the document in various formats. For this particular experiment I went with XHTML, PDF, and EPUB. The generation/conversion process was marred only by my trouble understanding the format for the bibliography at the end of the document. Once I figured out how to mark up the bibliography, the process was flawless. a2x first converts the AsciiDoc document into a DocBook XML file and then converts the DocBook file into the other formats. The process uses the standard set of XML processing tools as well as CSS to generate the files. By using custom CSS files, the layout and formatting of the various output files can be changed as needed.
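The three conversions could be scripted rather than typed one at a time. The following is a minimal Python sketch, assuming a2x is installed and on the PATH; the function names build_a2x_commands and run_all are hypothetical and not part of this repo. The flags are the same ones listed with the files in this README (-v for verbose output, -f for the output format).

```python
import subprocess

# The three output formats used in this experiment.
FORMATS = ("xhtml", "pdf", "epub")


def build_a2x_commands(source, formats=FORMATS):
    """Return one a2x argument list per requested output format."""
    return [["a2x", "-v", "-f", fmt, source] for fmt in formats]


def run_all(source, formats=FORMATS):
    """Run each conversion in turn and stop at the first failure."""
    for cmd in build_a2x_commands(source, formats):
        subprocess.run(cmd, check=True)


# Print the commands without running them, as a dry run.
for cmd in build_a2x_commands("KelsoPaper.txt"):
    print(" ".join(cmd))
```

Calling run_all("KelsoPaper.txt") would perform the actual conversions; the dry run only prints the commands so they can be checked first.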

3. The Files

The files included in this repo are the ones used and generated as part of the process described above.

KELSOFIN20130111.docx: The Microsoft Word file that was used as the starting point. This document began as a WordPerfect file in 1992 and was moved to Word in the mid-90s.
KelsoPaper.txt: The AsciiDoc version of the file as created and edited in Etherpad Lite. This is the file used to generate the other formats.
KelsoPaper.pdf: PDF file generated from KelsoPaper.txt using the command: a2x -v -f pdf KelsoPaper.txt
KelsoPaper.html: XHTML file generated from KelsoPaper.txt using the command: a2x -v -f xhtml KelsoPaper.txt
docbook-xsl.css: CSS file used to style KelsoPaper.html
KelsoPaper.epub: EPUB file generated from KelsoPaper.txt using the command: a2x -v -f epub KelsoPaper.txt

4. Conclusion

I am happy with the results of this experiment and hope to be able to further explore the use of Etherpad Lite and AsciiDoc as a tool set for creating free and open documents.

Having a Routine May Make Decisions Easier

Vohs’s experiments tested whether everyday choices — which candy bar to eat or what clothes to buy, for instance — wear down our mental energy. The results? Vohs and colleagues consistently found that making repeated choices depleted the mental energy of their subjects, even if those choices were mundane and relatively pleasant.
So, if you want to be able to have more mental resources throughout the day, you should identify the aspects of your life that you consider mundane — and then “routinize” those aspects as much as possible. In short, make fewer decisions.

via Boring Is Productive – Robert C. Pozen – HBS Faculty – Harvard Business Review.

Fascinating stuff. Now I know why it’s hard to decide what to have for dinner at the end of a long day, and why lots of decisions early in the day result in the need for a nap in the afternoon.

MySQL 5.6 Released, NoSQL Features Added

A lot has changed in the database market in the two years since the MySQL 5.5 release. For one, the rise in the popularity of NoSQL databases has escalated in recent years. The NoSQL trend is not one that Oracle is ignoring.

“SQL is a very flexible language that allows you to do a lot of things that are not possible through a direct NoSQL type approach,” Tomas Ulin, vice president of MySQL Engineering at Oracle told InternetNews. “So we’ve tried to join the best of both worlds with the full power of SQL to do complex queries and at the same time we’re introducing a NoSQL access type API.”

Oracle Releases Open Source MySQL 5.6 with NoSQL Features

The NoSQL API will be available alongside traditional SQL access to the same database, giving the admin a powerful option for data access without forgoing existing code. It will be interesting to try this new feature and see how the community reacts.