Ruby Video Player

This is the second episode of my new Computer Vision for the Robotic Age podcast. This episode is about video I/O. The podcast demonstrates how a video player with proper audio/video synchronisation can be implemented using the Interactive Ruby Shell. The Sintel short film (Copyright Blender Foundation) was used as a video for testing.

Here’s the source code of the Ruby video player created in the podcast:

require 'rubygems'
# load FFMPEG bindings
require 'hornetseye_ffmpeg'
# load X.Org bindings
require 'hornetseye_xorg'
# load ALSA bindings
require 'hornetseye_alsa'
# include the namespace
include Hornetseye
# open a video file
input = AVInput.new 'sintel.mp4'
# open sound output with sampling rate of video
alsa = AlsaOutput.new 'default:0', input.sample_rate, input.channels
# read first audio frame
audio_frame = input.read_audio
# display images using width of 600 pixels and XVideo hardware acceleration
X11Display.show 600, :output => XVideoOutput do |display|
  # read an image
  img = input.read
  # while there is space in the audio output buffer ...
  while alsa.avail >= audio_frame.shape[1]
    # ... write previous frame to audio buffer
    alsa.write audio_frame
    # read new audio frame
    audio_frame = input.read_audio
  end
  # compute difference of video clock to audio clock
  delay = input.video_pos - input.audio_pos + (alsa.delay + audio_frame.shape[1]).quo(alsa.rate)
  # suspend program in order to synchronise the video with the audio
  display.event_loop [delay, 0].max
  # display image
  img
end
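A note on the delay computation: input.audio_pos refers to the audio frame that was read most recently, but that frame and the alsa.delay samples still queued in the sound card buffer have not been played back yet. The audible position of the audio clock therefore lags behind by their duration, which is exactly what the line above computes:

$$\mathrm{delay} = t_\mathrm{video} - \Bigl(t_\mathrm{audio} - \frac{n_\mathrm{queued} + n_\mathrm{frame}}{\mathrm{rate}}\Bigr)$$

where $t_\mathrm{video}$ is input.video_pos, $t_\mathrm{audio}$ is input.audio_pos, $n_\mathrm{queued}$ is alsa.delay, and $n_\mathrm{frame}$ is audio_frame.shape[1], the number of samples per channel of the last audio frame read.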

You can also download the video here.

Background Replacement

This is the first episode of my new Computer Vision for the Robotic Age podcast. This episode is about replacing the background of a live video with another video. The background replacement algorithm is implemented live using the HornetsEye real-time computer vision library for the Ruby programming language.
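Since the program is developed live in the episode, the code below is only a minimal sketch of the idea, not the episode's actual source: it assumes a static camera, a reference frame grabbed while the scene is empty, a hand-picked threshold, and a hypothetical replacement clip background.mp4; method names not appearing elsewhere on this page (such as the to_ubytergb conversion of the decoded frame) are assumptions.

#!/usr/bin/env ruby
require 'rubygems'
require 'hornetseye_v4l2'
require 'hornetseye_ffmpeg'
require 'hornetseye_xorg'
include Hornetseye
# hand-picked threshold (depends on camera noise)
THRESHOLD = 24
camera = V4L2Input.new
# hypothetical replacement clip (assumed to match the camera resolution)
video = AVInput.new 'background.mp4'
# reference frame of the empty scene
background = camera.read_ubytergb.to_sint
X11Display.show do
  img = camera.read_ubytergb
  # sum of absolute channel differences against the reference frame
  diff = img.to_sint - background
  dist = diff.r.abs + diff.g.abs + diff.b.abs
  # keep camera pixels where the scene changed, insert the clip elsewhere
  (dist > THRESHOLD).conditional img, video.read.to_ubytergb
end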

You can also download the video here.

I am new to podcasting, so feel free to send me any suggestions.

Camera Calibration

I am currently working on camera calibration. Many implementations require the user to manually point out corners. Here is an idea on how to detect and label corners automatically; a rough code sketch of the first steps is shown after the list.

  1. Apply Otsu Thresholding to input image.
  2. Take difference of dilated and eroded image to get edge regions.
  3. Label connected components.
  4. Compute corners of input image (and use non-maxima suppression).
  5. Count corners in each component.
  6. Look for a component which contains exactly 40 corners.
  7. Get largest component of inverse of grid (i.e. the surroundings).
  8. Grow that component and find all corners on it (i.e. corners on the boundary of the grid).
  9. Find centre of gravity of all corners and compute vectors from centre to each boundary corner.
  10. Sort boundary corners by angle of those vectors.
  11. Use non-maxima suppression on list of length of vectors to get the 4 “corner corners” (convexity).
  12. Use the locations of the 4 “corner corners” to compute a planar homography mapping the image coordinates of the 8 times 5 grid to the ranges 0..7 and 0..4 respectively.
  13. Use the homography to transform the 40 corners and round the coordinates.
  14. Order the points using the rounded coordinates.
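The sketch below covers steps 1 to 6, assuming a test image grid.jpg. Otsu thresholding, erosion, dilation, and connected-component labelling are HornetsEye operations; the corner detector harris_stephens, its non-maxima suppression helper nms, and the threshold value are assumptions standing in for whichever detector is actually used.

#!/usr/bin/env ruby
require 'rubygems'
require 'hornetseye_rmagick'
include Hornetseye
# hypothetical picture of the 8 times 5 calibration grid
img = MultiArray.load_ubyte 'grid.jpg'
# 1. apply Otsu thresholding
binary = img.otsu.conditional 255, 0
# 2. difference of dilated and eroded image gives the edge regions
edges = binary.dilate - binary.erode
# 3. label the connected components of the edge regions
components = (edges > 0).components
# 4. detect corners (detector and threshold are assumptions)
corners = img.to_sfloat.harris_stephens.nms 0.05
# 5./6. count the corners falling into each component and look for the
# component which contains exactly 40 of them
grid = (1 .. components.max).find do |i|
  components.eq(i).and(corners).conditional(1, 0).sum == 40
end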

Further work is about taking several images to perform the actual camera calibration.

Thanks to Manuel Boissenin for suggesting convexity for finding the “corner corners”.

Update:

After calibrating the camera, the ratio of focal length to pixel size is known (also see Zhengyou Zhang's work on camera calibration). It then becomes possible to estimate the 3D pose of the calibration grid in every frame.
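For reference, the relation exploited here is the standard one from Zhang's method (not code from this post): once the intrinsic matrix $K$ is known, the homography $H = [h_1\ h_2\ h_3]$ estimated for the grid plane factors into the pose of the grid,

$$[h_1\ h_2\ h_3] \propto K\,[r_1\ r_2\ t], \qquad \lambda = \frac{1}{\lVert K^{-1} h_1 \rVert},$$

$$r_1 = \lambda K^{-1} h_1, \quad r_2 = \lambda K^{-1} h_2, \quad r_3 = r_1 \times r_2, \quad t = \lambda K^{-1} h_3.$$

The resulting rotation matrix is only approximately orthonormal and is usually re-orthonormalised.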

I have created a screencast on how to locate the chequerboard calibration pattern.

Histogram-based classification

Here is a small presentation showing histogram-based classification with Ruby.

Here is the program to capture the reference pictures:

#!/usr/bin/env ruby
require 'rubygems'
# load V4L2, RMagick, and X.Org bindings
require 'hornetseye_v4l2'
require 'hornetseye_rmagick'
require 'hornetseye_xorg'
# include the namespace
include Hornetseye
# region of interest for the reference pictures
BOX = [220 ... 420, 140 ... 340]
# open the camera
input = V4L2Input.new
# boolean mask which is true inside the box
mask = MultiArray.bool(640, 480).fill!
mask[*BOX] = true
labels = ['reference', 'dragon', 'knight', 'camel']
labels.each do |label|
  # show the live image with the surroundings of the box darkened;
  # closing the window returns the last frame ...
  X11Display.show(:title => label.capitalize) do
    img = input.read_ubytergb
    mask.conditional img, img >> 1
  # ... which is cropped to the box and saved as reference picture
  end[*BOX].save_ubytergb "#{label}.jpg"
end

The program for live classification is shown below:

#!/usr/bin/env ruby
require 'rubygems'
# load V4L2, RMagick, and X.Org bindings
require 'hornetseye_v4l2'
require 'hornetseye_rmagick'
require 'hornetseye_xorg'
# include the namespace
include Hornetseye
PI = Math::PI
# full hue circle
RAD = 2.0 * PI
# scale factor for the chroma values
VAL = 117.0
# number of hue and chroma bins
HBINS = 8
CBINS = 8
# number of consecutive frames required for a stable classification
N = 5
# region of interest as in the capture program
BOX = [220 ... 420, 140 ... 340]
class Node
  # compute a two-dimensional hue/chroma histogram
  def hsv_hist(hbins = HBINS, cbins = CBINS)
    # project the RGB values onto the two colour-opponent axes
    alpha = 2 * r.to_sint - g - b
    beta = Math.sqrt(3) / 2 * ( g.to_sint - b )
    # hue is the angle and chroma the magnitude of (alpha, beta)
    h = ( (Math.atan2(beta, alpha) + PI) * (hbins / RAD) ).to_int.clip 0 .. hbins - 1
    c = ( Math.sqrt(alpha ** 2 + beta ** 2) * (cbins / VAL) ).to_int.clip 0 .. cbins - 1
    [h, c].histogram(hbins, cbins).to_int
  end
  # normalised element-wise distance of two histograms
  def cmp(other)
    (self - other).abs / (self + other).major(1.0)
  end
end
input = V4L2Input.new
mask = MultiArray.bool(640, 480).fill!
mask[*BOX] = true
labels = ['reference', 'dragon', 'knight', 'camel']
# compute the histograms of the reference pictures
hists = labels.collect do |object|
  MultiArray.load_ubytergb("#{object}.jpg").hsv_hist
end
history = ['reference'] * N
X11Display.show do
  img = input.read_ubytergb
  img_hist = img[*BOX].hsv_hist
  # compare the current histogram with each reference histogram
  similarities = hists.collect { |hist| img_hist.cmp(hist).sum }
  # pick the label of the most similar reference picture
  label = labels[similarities.index(similarities.min)]
  history = [label] + history[0 ... N - 1]
  # announce the label once the classification is stable for N frames
  if history == [label] * N
    system "echo '#{label}' | festival --tts" if label != 'reference'
  end
  # darken the surroundings of the box while an object is detected
  history != ['reference'] * N ? mask.conditional(img, img >> 1) : img
end
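The distance computed by cmp normalises each bin, so bins with large counts do not dominate the comparison; for two histograms $a$ and $b$ the classification above minimises

$$d(a, b) = \sum_i \frac{|a_i - b_i|}{\max(a_i + b_i,\ 1)}$$

Each bin contributes a value between 0 and 1, and the label of the closest reference histogram wins.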

And here is a demonstration video of the two programs.