Developing machine vision software with Ruby instead of C/C++

When I started doing a PhD in machine vision in 2004 I didn’t know what I was in for. I thought I would learn about various object recognition algorithms, implement them in C++, and then try to come up with something new. I was motivated to implement 2D object recognition and tracking algorithms and I was hoping to eventually get into 3D object recognition/tracking and/or Visual SLAM (simultaneous localisation and mapping).

The trouble started when I started to realise how many representations of images there are. I am not even talking about colour spaces or compressed images/videos. It is already sufficient to just consider one-channel grey images. Virtually every C/C++ library for handling images comes with its own data structures for representing images. I.e. when trying to use more than one C/C++ library at a time, one ends up writing a lot of code for converting between different representation of images.

It get’s worse. CPUs usually offer arithmetic using 8-bit, 16-bit, 32-bit, and 64-bit integers which can be signed or unsigned. Also there are single-precision and double-precision floating point numbers (i.e. 10 or more different native data types). When implementing a C/C++ library which just wants to support basic binary operations (addition, subtraction, division, multiplication, exponent, comparisons, …) for array-scalar, scalar-array, and array-array combinations, one quickly ends up with literally thousands of possible combinations. This leads to a combinatorial explosion of methods as one can see in the Framewave library for example.

In the end I wrote a thesis about a different way of implementing machine vision systems. The thesis shows how one can implement machine vision software using a popular dynamically typed programming language (i.e. the Ruby programming language).

The listing below shows an IRB (Interactive Ruby) session to illustrate the result. Comment lines (preceded with ‘#’) show the output of the IRB REPL (read-eval-print loop). The session first opens a video display showing the camera image. After closing the window it shows a video display with the thresholded camera image.

require 'rubygems'
require 'hornetseye_v4l2'
require 'hornetseye_xorg'
include Hornetseye
input = V4L2Input.new
# #<Hornetseye::V4L2Input:0x7fbad6151a38>
X11Display.show { input.read }
# Frame(YUY2)(640,480,0x3fdd6b1972a0)
X11Display.show { (input.read.to_ubyte >= 128).conditional 255, 0 }
# MultiArray(UBYTE,2):
# [ [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .... ],
#   ....

See the picture below for an example of a thresholded image.

The example has just 7 lines of code. The REPL furthermore facilitates experimentation with machine vision software in an unprecedented way. The system achieves real-time by generating C-programs for the required operations, compiling them to Ruby extensions, and linking them on-the-fly.

I released the software as software libre under the name Hornetseye. My thesis is available for download now, too: Efficient Implementations of Machine Vision Algorithms using a Dynamically Typed Programming Language (Bibtex).

Here’s the abstract:

Current machine vision systems (or at least their performance critical parts) are predominantly implemented using statically typed programming languages such as C, C++, or Java. Statically typed languages however are unsuitable for development and maintenance of large scale systems.

When choosing a programming language, dynamically typed languages are usually not considered due to their lack of support for high-performance array operations. This thesis presents efficient implementations of machine vision algorithms with the (dynamically typed) Ruby programming language. The Ruby programming language was used, because it has the best support for meta-programming among the currently popular programming languages. Although the Ruby programming language was used, the approach presented in this thesis could be applied to any programming language which has equal or stronger support for meta-programming (e.g. Racket (former PLT Scheme)).

A Ruby library for performing I/O and array operations was developed as part of this thesis. It is demonstrated how the library facilitates concise implementations of machine vision algorithms commonly used in industrial automation. That is, this thesis is about a different way of implementing machine vision systems. The work could be applied to prototype and in some cases implement machine vision systems in industrial automation and robotics.

The development of real-time machine vision software is facilitated as follows

  1. A just-in-time compiler is used to achieve real-time performance. It is demonstrated that the Ruby syntax is sufficient to integrate the just-in-time compiler transparently.
  2. Various I/O devices are integrated for seamless acquisition, display, and storage of video and audio data.

In combination these two developments preserve the expressiveness of the Ruby programming language while providing good run-time performance of the resulting implementation.

To validate this approach, the performance of different operations is compared with the performance of equivalent C/C++ programs.

I hope that my work has shown that the choice of programming language plays a fundamental role in the implementation of machine vision systems and that those choices should be revisited.

See also

  • HornetsEye: Ruby computer vision library (developed as part of this thesis)
  • OpenCV: C/C++ real-time computer vision library
  • NumPy: Python numerical arrays
  • NArray: Ruby numerical arrays
  • Lush: Lisp dialect for large-scale numerical and graphic applications
  • Halide: a language for image processing and computational photography
  • Maru: a symbolic expression evaluator that can compile its own implementation language

Update

The thesis is now also available on Figshare.com.

Android Hello World

Here a quick documentation on how to build a C program using the Android NDK for Linux. The program can be run with the Android Terminal.

Here’s the Makefile.

NDK = $(HOME)/android-ndk-r8b
TOOLCHAIN = /tmp/ndk-hello
SYSROOT = $(TOOLCHAIN)/sysroot
GCC = $(TOOLCHAIN)/bin/arm-linux-androideabi-gcc
STRIP = $(TOOLCHAIN)/bin/arm-linux-androideabi-strip
CFLAGS = -march=armv7-a -mfloat-abi=softfp -I$(SYSROOT)/usr/include
LDFLAGS = -Wl,--fix-cortex-a8 -L$(SYSROOT)/usr/lib

all: $(TOOLCHAIN) hello

hello: hello.o
	$(GCC) $(LDFLAGS) -o $@ hello.o
	$(STRIP) -s $@

.c.o:
	$(GCC) $(CFLAGS) -o $@ -c $<

clean:
	rm -f hello *.o .*.un~

$(TOOLCHAIN):
	$(NDK)/build/tools/make-standalone-toolchain.sh --install-dir=$@

And here’s the C program hello.c.

#include <stdio.h>

int main(int argc, char** argv) {
   printf("Hello world!!!\n");
   return 0;
}

Running make should cross-compile the program.

Racket on Android

A smart smartphone

The recent smartphones have quite impressive specs. E.g. the Samsung Galaxy S III comes with a 1.5 GHz quad-core processor and a GPU. It would be interesting though to run a real programming language on one of those phones. I am quite interested in the Racket programming language (also see Wikipedia) which offers full meta-programming and compile-time macros. I managed to cross-compile Racket to run on an Android phone HTC Desire S. I’ll describe the steps here. Hopefully I’ll find time to improve things and package it in some way.

Build

To cross-compile Racket for ARM you first need to compile Racket for the host system. The GCC cross-compiler and the Racket interpreter are then used during the cross-compilation.

#!/bin/sh
ARCHIVE=$HOME/Documents/programming/racket-5.2.1-src-unix.tgz
rm -Rf /tmp/build
mkdir -p /tmp/build/host
cd /tmp/build/host
tar xzf $ARCHIVE
BUILD_HOST=/tmp/build/host/racket*
cd $BUILD_HOST/src
./configure
make install
mkdir -p /tmp/build/cross
cd /tmp/build/cross
tar xzf $ARCHIVE
BUILD_CROSS=/tmp/build/cross/racket*
cd $BUILD_CROSS/src
./configure --host=arm-linux-gnueabi
make \
  RUN_THIS_RACKET_CGC=$BUILD_HOST/src/racket/racketcgc \
  RUN_THIS_RACKET_MMM=$BUILD_HOST/src/racket/racket3m \
  RUN_THIS_RACKET_MAIN_VARIANT=$BUILD_HOST/src/racket/racket3m \
  HOSTCC=/usr/bin/gcc \
  HOSTCFLAGS="-g -O2 -Wall -pthread -I./include" \
  STRIP_DEBUG="arm-linux-gnueabi-strip -S" 
make \
  RUN_THIS_RACKET_CGC=$BUILD_HOST/src/racket/racketcgc \
  RUN_THIS_RACKET_MMM=$BUILD_HOST/src/racket/racket3m \
  RUN_THIS_RACKET_MAIN_VARIANT=$BUILD_HOST/src/racket/racket3m \
  HOSTCC=/usr/bin/gcc \
  HOSTCFLAGS="-g -O2 -Wall -pthread -I./include" \
  STRIP_DEBUG="arm-linux-gnueabi-strip -S" \
  install
cd $BUILD_CROSS/src/racket
arm-linux-gnueabi-gcc -static -o racket3m  gc2/main.o libracket3m.a  -ldl -lm  -ldl -lm -rdynamic

The last command is used to replace the racket3m binary with a statically linked version.

Installation

First I installed Android Terminal Emulator. I also installed Kevin Boone’s kbox_shell which allows you to run BusyBox without requiring to jailbreak the phone (I don’t think BusyBox is required to run Racket but it is nice to have the various Linux shell utilities). I really recommend to install Hacker’s Keyboard to have a full PC keyboard on your phone.

Android Terminal Emulator

Then one needs to copy the following files to the mobile phone:

  • The racket3m binary is copied to the mobile in a location where it can be accessed from the terminal (e.g. /data/data/jackpal.androidterm/shared_prefs/kbox/bin/racket if you have kbox_shell installed).
  • Several Racket modules located in /tmp/build/cross/racket-5.2.1/collects/… need to be copied to $HOME/.racket/5.2.1/collects on the mobile phone. The libraries are
    • racket
    • syntax
    • setup
    • config
    • compiler
    • planet
    • mzlib
    • scheme
    • unstable
    • mzscheme

Racket on Android Unfortunately the full Racket language takes 90 seconds (!) to load on the HTC Desire S. Once loaded, the REPL is very responsive though.

Racket interpreter running on Android

Hopefully I get ffi and the readline library working to have an improved command line.

Any comments and suggestions are welcome

Update

It is possible to only load the Racket base language. Running the following command

racket -I racket/base

brings up the REPL in 7 seconds. Thanks Robby for the helpful comment!

Update

Racket 5.3.0 furthermore needs the following module to be installed:

  • s-exp

Update

I haven’t managed to build Racket with the Android NDK (see my thread on the Racket mailing list). Building Racket with the Android NDK would be preferable because otherwise it is impossible to dynamically load Android libraries in order to access Android specific functionality.

Clojure for Schemers

Recently I found a video of an interesting presentation by David Nolen (homepage, blog) (comparing the Clojure programming language with Scheme (Racket). Some important points made in the talk are:

  • The Little Schemer, The Seasoned Schemer, and The Reasoned Schemer are great Q&A-style books for getting to grips with the Scheme programming language.
  • Clojure does not support optimisation of tail-recursion leading to stack overflows. However one can use lazy sequences instead.
  • Clojure does not support continuations.
  • The frequent destructuring of data using car and cdr in Scheme is cumbersome. Clojure on the other hand has syntax support for various immutable data structures (lists, vectors, hashes, and sets) which makes for more readable code.
  • Clojure is a great platform for exploring concurrency (atoms, refs, agents, promises, and futures).

David Nolen concludes that it is not a question of either/or.

See also:

Law of attraction

Can we attract good and bad things with our mind? - No!

This is one piece of certified bullshit which I have encountered one time too many.

There seem to be quite a number of people who chose to belief that you somehow can attract good stuff (health, love, money, …) by positive thinking (previously known as wishful thinking).

Initially there was the (comparatively harmless) concept of positive thinking. I.e. some psychology experts observed that successful people seem to generally have an optimistic attitude. So they came up with the notion that you could be successful in life by thinking positive thoughts.

This is simply a confusion of cause and effect. But this didn’t stop some people from taking this idea further and coming up with the law of attraction. It finally culminated with a movie called the The Secret. According to this movie, mankind (except for a select few such as Rhonda Byrne of course) has somehow missed out on the fact that it is possible to attract wealth, health, and happiness by changing one’s thoughts and feelings.

So what’s the harm? Well, this is not about threading a fine line between healthy optimism and wishful thinking. This is serious bullshit. There are people trying to cure dangerous diseases using positive thoughts. For example a recent article suggests that Steve Jobs tried to fight cancer using magical thinking.

Here’s a nice artist’s animation based on a talk by Barbara Ehrenreich. Barbara Ehrenreich points out that positive thinking might even be the root cause for the financial meltdown.

The full talk (35 minutes) is available here: http://youtu.be/PJGMFu74a70