Friday, December 19, 2008

Generic Dependency Parser : Boost

I've added a dependency parser project to my github repository to parse dependencies between arbitrary files.

The parser is generic and can work on arbitrary files and dependency rules. The unit tests contain a contrived example.

As a demonstration I've recorded inter-project dependencies of boost 1.37.0 by parsing over 6000 boost header files. boost.rb is the ruby script that generated the dependencies.

Above image is the dependency graph for the archive library.

The following section describe the parametrization for parsing boost.

How is Boost parsed?

The script records all header files within the boost include directory. Each file path is assigned a vertex name: if the file path contains a nested directory inside boost include directory, then the first nested directory name is used as a vertex name in the graph. If the file path points directly to the boost include directory the vertex name is derived as follows: if a directory with the same name exists (except for the file extension '.hpp') then the directory name is used. Else, the basename of the file including the extension is used (considered as mini-library).

Next, each recorded file is parsed for matching #include preprocessor statements. On every such statement the script tries to lookup the file inside boost directory and its vertex name. A dependency is then generated between the vertex name of the file parsed and the vertex name from the resolved include statement. If the dependency causes a cycle in the dependency graph, the dependency is not added but error'd to the logger.

After all files are parsed, the dependency graph is reduced by removing edges u -> w, where u -> ... -> w exists and written to a '.dot' file that can be converted into various formats using http://www.graphviz.org/.

Additionally, for each project a more readable dependency graph is generated by masking unrelated dependencies.

Limitations of parsing Boost

The parser is not

  • a complete preprocessor parser: it does not care about conditional include statements or commented ones. Parsing is based on simple pattern matching to keep to code small nice. You might however add a complete parser if you like to
  • recording cyclic dependencies: some include statements cause cyclic dependencies that simply result from the choice of mapping to vertex names. I.e. when file boost/a/detail/detail.hpp includes boost/b/win32/abc.hpp which in turn includes boost/a/other/other.hpp and the mapping resolves vertex names from the first nested directory inside boost, we generate a dependency from a -> b and finally another one from b -> a which will not be added.

0 Kommentare:

Post a Comment