How to build the tesseract OCR engine on Windows/Cygwin

August 29, 2012 - Markus Hoenicka
Optical Character Recognition (OCR) comes in handy if you need to edit text which is not available in an editable format, e.g. scans of dead-tree documents. I needed an OCR solution which works both on my Unix boxes at home and on my Windows box at work. Tesseract ships a Windows version although it is a command-line tool. It seemed a better idea to build tesseract on Cygwin, and some success stories of previous versions on the web encouraged me to go ahead and try.

Read more »

FreeBSD: how PNG killed my touchpad

August 02, 2012 - Markus Hoenicka
I wanted to install R, a scientific software package for statistical analysis and data visualization, on my FreeBSD box real quick. R can output its graphs to PNG images, among others. libpng was bumped recently to a new revision using a new interface. I didn't expect this to be a great hassle, but I had to find out the hard way that just about every installed port depends on PNG support. As R pulled in the new version of libpng, I had to update all packages that depend on this library. Rebuilding these ports from the sources takes a day or two. This is not unusual. But in my case it had some interesting side effects.

Read more »

Import spreadsheet data into DocBook

March 19, 2011 - Markus Hoenicka
DocBook is a versatile XML vocabulary for technical documents, including grant proposals and theses in life sciences. Typing tabular data into an XML editor like Emacs + nxml is pretty cumbersome as there is a bad signal-to-noise ratio in terms of markup vs. data. In contrast, typing the same data into a spreadsheet like OpenOffice Calc is painless. Also, I'm used to keeping my data in spreadsheets anyway, so I had to find a way to import these into DocBook files. Apparently there is no such tool on the market, so I dusted off my Perl skills and wrote a little script for this purpose.

Being a lazy person, I didn't build this script on Text::CSV but I rather used a lightweight approach to parse the input data. Best results are obtained if you save your spreadsheet data using tabs as column separators. As long as you don't insist on using tabs in your data as well, things will turn out fine.

See here for some more information including the download link. Maybe you'll find this tool as useful as I do while typing my "habil" thesis.
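
The lightweight approach described above (split each line on tabs, wrap each cell in markup) can be illustrated with a short sketch. This is not the Perl script itself but a minimal Python illustration under stated assumptions: the function name tsv_to_docbook_table is made up for this example, the output uses a plain DocBook informaltable without a header row, and cells must not contain tabs or line breaks.

```python
from xml.sax.saxutils import escape


def tsv_to_docbook_table(tsv_text):
    """Convert tab-separated text into a DocBook informaltable.

    Hypothetical helper for illustration only; it assumes that cells
    contain neither tabs nor line breaks.
    """
    # Split every non-empty line into cells at the tab characters.
    rows = [line.split("\t") for line in tsv_text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("no data rows found")

    # The widest row determines the column count of the table.
    ncols = max(len(row) for row in rows)

    out = ["<informaltable>", f'  <tgroup cols="{ncols}">', "    <tbody>"]
    for row in rows:
        # Pad short rows with empty cells so every row has ncols entries.
        row = row + [""] * (ncols - len(row))
        out.append("      <row>")
        for cell in row:
            # Escape &, < and > so the cell text is valid XML content.
            out.append(f"        <entry>{escape(cell.strip())}</entry>")
        out.append("      </row>")
    out += ["    </tbody>", "  </tgroup>", "</informaltable>"]
    return "\n".join(out)
```

Calling tsv_to_docbook_table on the text of a spreadsheet saved with tab separators returns a string that can be pasted into a DocBook document. A real CSV parser would be the safer choice if the data may contain quoted fields.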

How to configure FreeBSD on an Acer TravelMate 8371

November 21, 2010 - Markus Hoenicka
In a previous post I've explained how to install FreeBSD 8.1-RELEASE on an Acer TravelMate 8371 and still be able to boot it. Now I'd like to share the details on the configuration which makes the OS usable in the first place. This is going to be pretty technical and a bit lengthy, and I'll make sure to include and explain all config files that I've created or edited. The reason why I'm elaborating on this stuff here is that The FreeBSD Laptop Compatibility List wouldn't let me create an entry.

Read more »

How to install FreeBSD on an Acer TravelMate 8371 ... and not brick it

November 18, 2010 - Markus Hoenicka
After about 5 years of almost daily use my laptop more or less fell apart. The built-in speakers have been mute for years, the battery plug was worn out, the battery itself barely lasted for 20 min when I unplugged the box, and the backlight no longer allowed any setting other than the lowest, making the box a nice companion for dark nights. This finally convinced me that it was about time to get a new one.

After a couple of weekends spent on surveying the market, I purchased an Acer TravelMate 8371 that seemed to suit my needs. Needless to say, it ships with an OS from Redmond that I simply don't care for. When I tried to install a far better replacement (aka FreeBSD), I almost bricked the box. Read on to see how to avoid this.

Read more »

How to run NTEmacs and Cygwin Emacs on the same box

March 24, 2010 - Markus Hoenicka
I've decided to switch to Cygwin Emacs from NTEmacs lately. As I had previously seen that the Cygwin X server may refuse to work at times, I wasn't ready to uninstall NTEmacs altogether. (The X problems were certainly caused by my lack of understanding, or lack of maintenance, or both; Cygwin X in general is said to run smoothly.) Rather, I figured it should be possible to run both Emacsen in parallel, without duplicating all the effort that goes into maintaining a hand-crafted .emacs file. So I tried to come up with a way to share my existing .emacs with both versions. The major problem is that NTEmacs requires quite a lot of tweaks to make it cooperate with Cygwin bash, which is a far superior shell compared to Windows "cmd", whereas Cygwin Emacs requires at least as many tweaks to make it cooperate with native Windows tools like web browsers or proprietary Windows programs. This is how I solved the problem.

Read more »

Speed comparison of NTEmacs and Cygwin Emacs

March 24, 2010 - Markus Hoenicka
I may have mentioned previously that I'm using a Windows XP box as a Netware client at work. As I prefer the Unix way of doing things otherwise, I've been installing lots of Unix software on this box. The two main components are Cygwin, which is essentially a Unix-like environment including all essential GNU tools, and the native GNU Emacs port. The latter is built with MinGW, the "Minimalist GNU for Windows" tools. The native port has been around for years; I must have been using it for more than a decade now. However, Cygwin has also provided an Emacs port for a while. This one either runs in your terminal (like MinTTY), or as a GUI app if you use Cygwin's X server. A speed comparison made me think twice about my previous choice.

Read more »

Emacs Photo Database now available

May 14, 2008 - Markus Hoenicka
This is to announce that Emacs Photo Database is now available as a SourceForge-hosted project. As outlined previously, this is a database holding film, negative, and print information for photographers who use real darkrooms instead of digital cameras and a "lightroom".

Photo database front-end, 2nd try

April 23, 2008 - Markus Hoenicka
I've reported previously that my attempts to create a simple photo database to manage my negatives and prints failed miserably with OpenOffice Base. I thought that installing a web frontend for SQLite might simplify adding data to an SQL database and retrieving it again, but the results were not entirely to my taste. The tool simplified the database creation somewhat, and it is fairly easy to check the rows in the tables. However, there are two limitations:

Read more »

Installing SQLiteManager 1.2.0 on FreeBSD

April 16, 2008 - Markus Hoenicka
I've been into b/w photography for at least 15 years now, and I've got a serious collection of negatives and prints. A rule of thumb says: the larger the format, the fewer images. As I'm using my 4x5 as much as possible, there's only a handful of negatives to add each year. Still, I'd like to keep track of the negatives and the prints in an easier way than I have so far: I scribbled the data onto legal pads and waded through them if I needed to look up something. This calls for some sort of database.

Read more »

Operating system archaeology

February 25, 2007 - Markus Hoenicka
I received a computer the other day which I was supposed to set up for my kids, mostly for educational software (they're not yet in the first-person-shooter age, fortunately). It was a Celeron 366MHz with a 6.8GB hard drive and 64MB of RAM. This was apparently a decent computer at the beginning of this millennium.

I was told not to use some Linux (which would probably run decently on that old box) but to reinstall Win98, as most of the educational software out there (mostly on CD-ROM) requires a Windows box or a Mac. As I wouldn't have to work on that box anyway, I agreed.

Read more »

Trying to solve the *ONE* XML PITA

February 20, 2007 - Markus Hoenicka
My favourite gripe with XML is that you can't reasonably print it. Laugh at me if you want: the SGML toolchain (Emacs, PSGML, openjade) is still my favourite way of publishing DocBook documents for a simple reason: it delivers word processor output. Not that I like word processors, but most publishers in the biomedical field expect submitted manuscripts in M$ Word format - offering them a FOP-formatted PDF document won't do me any good. openjade generates RTF (among other formats) for direct consumption, or you use OpenOffice to convert it to M$ Word .doc files. There used to be tools that generate RTF from XML, but either they were non-free, or they didn't work. Today I gave a different approach a try: convert DocBook to OpenDocument Format. To make things a little harder, I attempted to do this on my office computer, which much to my regret runs Windows XP. As mentioned previously, it runs the Cygwin tools to maintain a certain level of sanity.

Read more »

How to use subversion revision numbers in autotools-based projects

November 26, 2006 - Markus Hoenicka
When developers try to track down bugs, they'd better know precisely which version runs on the user's box. Back in the days of yore, projects used CVS to record the version numbers of the source files. On the up side, the cvs command-line tool would do string replacements in the source files which were commonly used to record the CVS version of each file. Therefore it was simple to design a function that returns the string that cvs inserted into the file. On the down side, this version number reflects only the version of the particular file that contained the function. But as we all know, most projects are built from several to many source files. Therefore the CVS revision number is of limited value only.

Subversion uses a different approach. On the up side, Subversion uses a repository version number to address a particular combination of file revisions at a certain point in time. This is perfect from a developer's point of view, but it comes at a price: Subversion cannot do string replacements as CVS does, because a single revision of a file can be part of several revisions of the repository. The Subversion FAQ suggests using the tool "svnversion" to generate a definition for C files or to generate a small C file containing the version number. I tried to integrate the Subversion revision into the RefDB binaries lately, but it turned out to be far more complicated than the FAQ suggested.
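
To give an idea of the basic technique the FAQ hints at, here is a minimal sketch of generating a small C header from the output of "svnversion". It is neither the FAQ's recipe nor the solution that eventually went into RefDB; the macro name SVN_REVISION, the file name svn_version.h, and the fallback string "unknown" are assumptions chosen for this illustration.

```python
import subprocess


def render_version_header(revision):
    """Return the text of a C header that defines the revision string.

    SVN_REVISION is a hypothetical macro name used for this sketch.
    """
    rev = revision.strip() or "unknown"
    # Escape backslashes and double quotes so the C string literal stays valid.
    rev = rev.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "/* generated file, do not edit */\n"
        f'#define SVN_REVISION "{rev}"\n'
    )


def query_svnversion(path="."):
    """Run svnversion on a working copy; fall back to 'unknown' on failure."""
    try:
        result = subprocess.run(
            ["svnversion", path],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # svnversion is not installed or could not be run.
        return "unknown"
    return result.stdout


def main():
    # svn_version.h is a hypothetical output file name.
    with open("svn_version.h", "w") as handle:
        handle.write(render_version_header(query_svnversion()))
```

A build would have to call main() before compiling so that the C sources can include the generated header. Making this work reliably in an autotools-based project, for instance when building from a tarball without a working copy, is where the complications start.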

Read more »

How to use RefDB with a single database

November 15, 2006 - Markus Hoenicka
The default RefDB setup asks for at least two databases: a system database called "refdb" by default, and at least one reference database. The former contains bibliography styles and the journal word list. The latter contains the reference data proper. Keeping system data and reference data in two separate databases is preferred because you can scrap and re-create reference databases anytime without affecting the system database.

However, there are situations where you must stick to a single database. This is often the case if you rent some webspace "with MySQL". The fine print usually restricts this to a single MySQL database which is created by the internet provider. Is it still possible to run RefDB on such a database?

Read more »

Working with multiple personal reference lists in RefDB

August 02, 2006 - Markus Hoenicka
RefDB has had an option to build personal reference lists for quite a while. However, there was just one such list per user, and there was popular demand that this limitation be dropped.

RefDB also has an option to create extended notes and link these notes to just about any object in the database - references, keywords, authors, you name it. You could of course create an extended note, link it to a handful of references, and call it a personal reference list. However, this possibility was far from apparent to most users.

Why not combine the best of both worlds, and implement personal reference lists on top of extended notes, while maintaining the simplified interface of the pickref and dumpref commands? Well, this is just what the current svn version (and the soon-to-be-released prerelease 0.9.8-pre1) offers. I'll tell you real quick how to work with this feature.

Read more »