Friday, December 19, 2008

Generic Dependency Parser : Boost

I've added a dependency parser project to my github repository to parse dependencies between arbitrary files.

The parser is generic and can work on arbitrary files and dependency rules. The unit tests contain a contrived example.

As a demonstration I've recorded inter-project dependencies of boost 1.37.0 by parsing over 6000 boost header files. boost.rb is the ruby script that generated the dependencies.

The image above shows the dependency graph for the archive library.

The following section describes the parametrization for parsing boost.

How is Boost parsed?

The script records all header files within the boost include directory. Each file path is assigned a vertex name. If the file lies in a nested directory inside the boost include directory, the name of the first nested directory is used as the vertex name in the graph. If the file lies directly in the boost include directory, the vertex name is derived as follows: if a directory with the same name as the file (minus the file extension '.hpp') exists, that directory name is used. Otherwise, the basename of the file including the extension is used (the file is considered a mini-library).
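The mapping rules above can be sketched as a small function. This is only an illustration, not the code from boost.rb: the name vertex_name, its parameters and the sample paths are hypothetical, and the set of top-level directories is passed in instead of being read from disk.

```ruby
require 'set'

# Map a header path (relative to the boost include directory) to a vertex
# name. `top_level_dirs` holds the directory names found directly inside
# the boost include directory. Both names are hypothetical.
def vertex_name(relative_path, top_level_dirs)
  parts = relative_path.split('/')
  # Nested file: the first nested directory names the vertex.
  return parts.first if parts.size > 1

  # Top-level file: prefer a directory of the same name (minus '.hpp').
  stem = File.basename(relative_path, '.hpp')
  return stem if top_level_dirs.include?(stem)

  # Otherwise the file counts as its own mini-library.
  relative_path
end

dirs = Set.new(%w[a b])
vertex_name('a/detail/detail.hpp', dirs) # => "a"
vertex_name('b.hpp', dirs)               # => "b"
vertex_name('c.hpp', dirs)               # => "c.hpp"
```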

Next, each recorded file is parsed for matching #include preprocessor statements. For every such statement the script tries to look up the included file inside the boost directory and determine its vertex name. A dependency is then generated between the vertex name of the file being parsed and the vertex name of the resolved include statement. If the dependency would cause a cycle in the dependency graph, it is not added but reported to the logger as an error.
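A minimal sketch of this step, under stated assumptions: the regular expression, the helper names and the hash-of-arrays graph are all hypothetical and may differ from what boost.rb does. Skipping a dependency of a vertex on itself is also an assumption made here.

```ruby
# Pattern for include statements that name a boost header (an assumption).
# The capture is the path relative to the boost include directory.
INCLUDE_RE = /^\s*#\s*include\s*[<"]boost\/([^>"]+)[>"]/

# Return the boost-relative paths included by the given source text.
def boost_includes(source)
  source.each_line.map { |line| line[INCLUDE_RE, 1] }.compact
end

# True if `to` can be reached from `from` by following edges in `graph`,
# a Hash mapping each vertex name to an Array of successor names.
def reachable?(graph, from, to, seen = {})
  return true if from == to
  return false if seen[from]
  seen[from] = true
  (graph[from] || []).any? { |succ| reachable?(graph, succ, to, seen) }
end

# Add the edge u -> w unless it would close a cycle; returns true if the
# edge is present afterwards, false if it was rejected.
def add_dependency(graph, u, w)
  return false if u == w # include within the same vertex: nothing to add
  if reachable?(graph, w, u)
    warn "cycle: dependency #{u} -> #{w} not added"
    return false
  end
  graph[u] ||= []
  graph[u] << w unless graph[u].include?(w)
  true
end

graph = {}
add_dependency(graph, 'a', 'b') # => true
add_dependency(graph, 'b', 'a') # => false, would close the cycle a -> b -> a
```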

After all files are parsed, the dependency graph is reduced by removing every edge u -> w for which a longer path u -> ... -> w exists. The reduced graph is written to a '.dot' file that can be converted into various formats using http://www.graphviz.org/.
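The reduction can be sketched as follows for an acyclic graph stored as a hash of successor arrays. The names path?, reduce and to_dot are hypothetical and only illustrate the idea; the actual script may be organized differently.

```ruby
# True if `to` can be reached from `from` by following edges in `graph`.
def path?(graph, from, to, seen = {})
  return true if from == to
  return false if seen[from]
  seen[from] = true
  (graph[from] || []).any? { |succ| path?(graph, succ, to, seen) }
end

# Drop every edge u -> w for which w is also reachable through another
# successor of u. Assumes the graph contains no cycles.
def reduce(graph)
  reduced = {}
  graph.each do |u, succs|
    reduced[u] = succs.reject do |w|
      (succs - [w]).any? { |v| path?(graph, v, w) }
    end
  end
  reduced
end

# Render the graph in the Graphviz dot language.
def to_dot(graph)
  lines = graph.flat_map do |u, succs|
    succs.map { |w| "  \"#{u}\" -> \"#{w}\";" }
  end
  "digraph dependencies {\n#{lines.join("\n")}\n}\n"
end

graph = { 'a' => ['b', 'c'], 'b' => ['c'], 'c' => [] }
puts to_dot(reduce(graph)) # the edge a -> c is gone, a -> b -> c remains
```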

Additionally, for each project a more readable dependency graph is generated by masking unrelated dependencies.

Limitations of parsing Boost

The parser is not

  • a complete preprocessor: it does not evaluate conditional include statements and does not detect commented-out ones. Parsing is based on simple pattern matching to keep the code small and simple. You can, however, add a complete parser if you like.
  • a recorder of cyclic dependencies: some include statements cause cycles that result purely from the choice of mapping to vertex names. For example, when file boost/a/detail/detail.hpp includes boost/b/win32/abc.hpp, which in turn includes boost/a/other/other.hpp, and the mapping takes vertex names from the first nested directory inside boost, we generate a dependency a -> b and then another one b -> a, which is not added.

Picasa: Add Filename Captions Automatically

Ever wanted Picasa captions based on image filenames? Well, here comes a solution.

When adding pictures to Picasa, Picasa scans the pictures for meta information based on IPTC-NAA, a standard for attributing text, images and other media with meta information such as captions, copyright, etc. When Picasa encounters a file with attribute IPTC.Caption set it uses the value of the attribute as default caption.

I've come up with a little Ruby script, captionize.rb, that sets the caption attribute on all images found inside a directory. The caption is the filename of the image (including the file extension).

The script requires a valid ruby installation and Exiv2 in your path to modify the meta content of an image.
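As a rough sketch of the idea (not the code of captionize.rb): the helpers below build and run an exiv2 command per image. The helper names are hypothetical, and the use of exiv2's -M modify option with the key Iptc.Application2.Caption is an assumption based on the Exiv2 documentation; the actual script may call Exiv2 differently.

```ruby
# Build the exiv2 command line that sets the IPTC caption of `path` to its
# filename. The -M option and the key name are assumptions (see above).
def caption_command(path)
  caption = File.basename(path) # filename including extension
  ['exiv2', '-M', "set Iptc.Application2.Caption String #{caption}", path]
end

# Captionize every JPEG in a directory (hypothetical helper).
# Passing the command as an argument list avoids shell quoting issues.
def captionize(dir)
  Dir.glob(File.join(dir, '*.{jpg,jpeg}')).each do |path|
    system(*caption_command(path)) or warn "exiv2 failed for #{path}"
  end
end
```

Calling captionize('some/directory') would then process each image in turn, provided exiv2 is found in the path.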

Wednesday, December 10, 2008

The Prodigy - Invaders Must Die

The Prodigy is back with their new album "Invaders Must Die". The new long player will be released on 2009/02/27. Pre-orders are already accepted.

The title track "Invaders Must Die" was available as a free download during the week of November 30th, 2008. If you missed it, The Prodigy was kind enough to provide a high quality video for free.

Enjoy the new/old The Prodigy style.

Saturday, December 6, 2008

UUIDs For The Masses! (in Ruby)

A UUID (Universally Unique Identifier), also known as GUID, is often used in software to uniquely identify information. Originally designed for distributed systems, UUIDs have quickly found their way to wherever non-conflicting identifiers are needed.

UUIDs in Ruby

Lately, some Ruby projects have emerged that generate UUIDs in pure Ruby. This hasn't always been the case: in 2005 Brad Wilson posted a code snippet that allowed generation of UUIDs on Windows platforms. Around the same time the uuid project was kicked off with the goal of creating a pure Ruby implementation and thus cross-platform support for generating UUIDs. This project achieved gold status in 2006.

UUIDs from Web Services

There are some web services out there that allow online generation of UUIDs. So I thought it should be easy to fetch UUIDs from those web services in Ruby. Turns out it is!

For those interested in generating UUIDs from such web services, I've added a uuid.rb along with some unit tests and an adapted win32 implementation to my ruby-snippets repository.
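Whatever the source, a fetched value is just a string, so it can be worth checking that it has the canonical 8-4-4-4-12 hexadecimal layout before using it. The sketch below is not taken from uuid.rb; the names UUID_RE and uuid? are hypothetical, and SecureRandom.uuid from the standard library of newer Ruby versions is used only to produce a sample value.

```ruby
require 'securerandom'

# Canonical textual form of a UUID: 8-4-4-4-12 hexadecimal digits.
UUID_RE = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

# True if the string looks like a UUID in canonical form.
def uuid?(str)
  !!(str =~ UUID_RE)
end

uuid?(SecureRandom.uuid) # => true
uuid?('not-a-uuid')      # => false
```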

Git repository for blog code

To maintain the code I post on my blog, I've joined github, a service that provides, among other things, free social code hosting. From their homepage

Not only is Git the new hotness, it's a fast, efficient, distributed version control system ideal for the collaborative development of software. GitHub is the easiest (and prettiest) way to participate in that collaboration [...]

You can find my repositories at https://github.com/cheind. I've already added some code along with tests to the ruby-snippets repository. You can also find the repository links under the 'Links' section located in the blog's sidebar.

Where appropriate, I will link directly from blog posts to the relevant source in the repositories.

Thursday, December 4, 2008

Finally A Code Layout That Works!

Background

Our team develops software that revolves around robotics/vision. Speaking in software terms, we do not have a single product but rather a set of continuously growing libraries which are integrated in a set of applications. From the very beginning we aimed for modularized software components to factor out common code.

Our team consists of about ten members, of which five contribute to the code on a daily basis. We use a version control system, a build server for continuous integration, and a bug tracker.

For a developer working on a specific project, the code on his/her hard disk lives in a single project directory with a flat hierarchy, where each library/application resides in its own directory

sample_project
  +---app_a
  +---lib_a
  +---lib_b

Within the version control system, however, the code is organized differently. Each library/application is a project on its own. It links to required libraries by including them as externals. Quite often projects consist of tens of externals which makes the projects extremely hard to maintain.

What's the new layout?

In the version control system there is just a single robotics/vision project containing libraries and applications as single sub-directories.

The trunk contains stable libraries of general interest only. Applications and customer specific code will reside in branches of the trunk. Additionally we set up the following three simple rules.

  • No one, except the maintainer, commits to the trunk.
  • The trunk contains libraries that are of general interest.
  • Any code in the trunk exhibits tests and documentation.

Ideally the maintainer is a person who does not actively contribute code. That way, all requests to integrate code into the trunk have to include unit tests and at least a description of the feature/bugfix to be integrated. Otherwise, how should the maintainer verify rules #2 and #3?

From the rules it is obvious that the trunk will show only single commits for a single feature/bugfix. That eases the merging of features back into branches and allows for a readable changelog.

Since everyone actually works on branches of the trunk, project leaders are free to decide if and when to integrate features/bugfixes from the trunk.

Our build server will continuously build the trunk and run all of its tests. Registering a branch for continuous integration is only a matter of cloning the build project of the trunk.

Code => Trunk

One of the questions worth asking is: how do we motivate people to contribute code to the trunk? There probably is no simple answer to this question. It depends, among other things, on

  • communication: high level of communication between team members will keep everyone updated on what others currently work on. So sentences like "oh nice, could we have this in trunk? I need it too" should occur often.
  • the background/education of team members: influences their inner drive to modularize code and actively contribute to the trunk.

Please feel free to criticize.

Tuesday, December 2, 2008

Method-Hooks In Ruby

Method-Hooks provide a way to intercept method calls on objects and can be implemented conveniently using Ruby's meta programming techniques.

Acceptance Test

Starting with an acceptance test helps to focus on the intended usage of the functionality we will implement. Suppose our simple system has the ability to draw content to windows. The appearance of windows can be configured via properties. Every time a property is changed, the window needs to be redrawn to reflect the change. To ease the handling of windows, we'd like to call redraw automatically when certain properties are changed.

Here's a short implementation of class Window

# Window capable of rendering items
class Window
  # Access background color of window.
  attr_accessor :background
  # Access text messages overlayed to window content
  attr_accessor :text
  # Simple counter that increments when redraw is invoked
  attr_reader :redraw_counter
  
  # Initialize window with a redraw counter of zero
  def initialize
    @redraw_counter = 0
  end
  
  # Redraw window in case of changes and increment counter
  def redraw
    #...
    @redraw_counter += 1
  end

  # After setters are invoked update window content
  include FollowingHook
  following :background=, :text= do |wnd, args|
    wnd.redraw
  end
end

The most important part about this acceptance test is the syntax we'd like to implement to automatically call redraw when background color or text messages are changed. Here it is again

include FollowingHook
following :background=, :text= do |wnd, args|
  wnd.redraw
end

following is a method that takes any number of method symbols and a block. Its semantics are to execute the block immediately after any of the given methods has been called. Two block arguments are provided: wnd, the window on which one of the given methods was called, and args, a hash holding the method name, the invocation arguments and the return value of the method.

Advantages are

  • Simple generic method hook interface
  • Concentrate common code in one place
  • Can use provided attribute accessors

Implementation

Here's an implementation of following inside a module FollowingHook

# Contains methods to hook method calls
module FollowingHook
  
  module ClassMethods
    
    private
    
    # Hook the provided instance methods so that the block 
    # is executed directly after the specified methods have 
    # been invoked.
    #
    def following(*syms, &block)
      syms.each do |sym| # For each symbol
        str_id = "__#{sym}__hooked__"
        unless private_method_defined?(str_id)
          alias_method str_id, sym        # Backup original 
                                          # method
          private str_id                  # Make backup private
          define_method sym do |*args|    # Replace method
            ret = __send__ str_id, *args  # Invoke backup
            block.call(self,              # Invoke hook
              :method => sym, 
              :args => args,
              :return => ret
            )
            ret # Forward return value of method
          end
        end
      end
    end
  end
  
  # On inclusion, we extend the receiver by 
  # the defined class methods. This is a Ruby 
  # idiom for defining class methods within a module.
  def FollowingHook.included(base)
    base.extend(ClassMethods)
  end
end

Basically, for each method to hook, its original implementation is backed up under a unique alias name. The original method is then 'replaced' by a new method that invokes the backup and afterwards invokes the given block with the current parameters. Additionally, we prohibit hooking a method twice and make the backup private to prevent direct execution.
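To make the mechanics concrete, here is the same backup-and-replace pattern written out by hand for a single method. The class Greeter, its methods and the recorded calls list are hypothetical and serve only as an illustration; the hook body stands in for the block passed to following.

```ruby
class Greeter
  def greet(name)
    "Hello, #{name}"
  end

  # 1. Back up the original method under a private alias.
  alias_method :__greet__hooked__, :greet
  private :__greet__hooked__

  # 2. Replace the method: invoke the backup, run the hook,
  #    then forward the original return value.
  define_method :greet do |*args|
    ret = __send__(:__greet__hooked__, *args)
    (@calls ||= []) << { :method => :greet, :args => args, :return => ret }
    ret
  end

  # Hook invocations recorded so far (for illustration only).
  def calls
    @calls || []
  end
end

g = Greeter.new
g.greet('Ann') # => "Hello, Ann"
g.calls        # => [{:method=>:greet, :args=>["Ann"], :return=>"Hello, Ann"}]
```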

following is used as a class method. To define class methods from within a module we use an inner module that holds them. When the module is included by a class, Module#included is invoked with that class as its argument. The class is then extended by the inner module using Object#extend, which makes following available as a class method of the including class.
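The idiom in isolation, reduced to a minimal sketch with hypothetical names (Taggable, tag, tags and Robot are not part of the code in this post):

```ruby
module Taggable
  module ClassMethods
    # Class-level macro: remember tags on the including class.
    def tag(*names)
      (@tags ||= []).concat(names)
    end

    # Tags registered so far.
    def tags
      @tags || []
    end
  end

  # Called by Ruby whenever Taggable is included somewhere;
  # `base` is the including class.
  def self.included(base)
    base.extend(ClassMethods)
  end
end

class Robot
  include Taggable
  tag :vision, :motion # available as a class method after the include
end

Robot.tags # => [:vision, :motion]
```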

Usage

With our class Window in place we can write a simple unit test to check the behaviour of following

require 'test/unit'

# Tests for methods in FollowingHook
class TestFollowingHook < Test::Unit::TestCase
  
  # Test following method hook
  def test_following_hook
    wnd = Window.new
    assert_equal(0, wnd.redraw_counter)
    wnd.text = "Show me!"
    assert_equal(1, wnd.redraw_counter)
    wnd.background = [1.0, 1.0, 1.0]
    assert_equal(2, wnd.redraw_counter)
  end
  
end

Further Examples

Here's a simple demonstration of how to log all invocations of Kernel#system

class Object
  include FollowingHook
  following :system do |receiver, args|
    p "#{args[:method]} called with arguments #{args[:args].join(",")}"
    p "return value was #{args[:return]}"
  end
end

system('ruby --version')
# => ruby 1.8.6 (2008-08-11 patchlevel 287) [i386-mswin32]
# => "system called with arguments ruby --version"
# => "return value was true"
# => true

Grab the code!