Importing Away #

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
 8790 Thor         0.5%  2:56:01   1    93 36857   594M  13.3M  46.1M  3.41G

If I keep doing projects where I tend not to write destructors for my classes (or dispose of objects in general), I'll have an excuse to get a G5 in no time. And I can say I need not only the beefier CPU(s) but also the larger memory capacity.

I'm importing the aforementioned dataset. It is very slow going (thanks FLOPC++ and COIN), and I don't even know how well searching works yet. Hopefully my future includes many pretty precision/recall curves.

Displacement Magick #

As previously mentioned, I'm going to be evaluating the Thor system by taking a set of simple drawings, distorting them to get N variants per shape, and then measuring precision/recall when all the variants of a shape are considered to be part of the same class. The Mori paper uses the TPS (no, not the reports, thin-plate splines) model to generate these distortions. I'm not completely sure how thin-plate splines work, but I definitely feel like I don't have the time to replicate their approach completely.

Fortunately I've found an approach that works nearly as well and is well within my reach. Back in my KPT-reading, "let's try every filter in Photoshop" days (version 2.0/2.5 era) I came across the "Displace" filter. It takes a second image (a displacement map) as an input and uses it to distort the current image (displacements are based on pixel intensities of the displacement map, with the red and green channels being used as the X and Y components respectively). I whipped up a quick displacement map by applying a lot of noise to a blank image and then blurring it slightly (so that distortion amounts didn't change suddenly from one pixel to the next) and applied it to one of the sketches. The results were very encouraging, and by having more than one of these displacement maps I could generate as many variants as I wanted.
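My reconstruction of what such a filter does per pixel looks roughly like this (Photoshop's exact mapping may differ; the 128-centered scaling and all the names below are assumptions):

    #include <algorithm>
    #include <cmath>

    struct RGB { unsigned char r, g, b; };

    // Displace-style lookup: intensities 0..255 map to offsets -scale..+scale,
    // with the red channel driving X and green driving Y, clamped at the borders.
    RGB displacedPixel(const RGB* src, const RGB* map, int w, int h,
                       int x, int y, float scale) {
        const RGB& m = map[y * w + x];
        int sx = x + (int)std::lround((m.r - 128) / 128.0f * scale);
        int sy = y + (int)std::lround((m.g - 128) / 128.0f * scale);
        sx = std::clamp(sx, 0, w - 1);
        sy = std::clamp(sy, 0, h - 1);
        return src[sy * w + sx];
    }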

The only thing left to do was to apply these displacement maps to all 249 test images. Photoshop supports scripting, but I didn't feel like learning yet another object model and/or dealing with another language. The simplest solution was to use ImageMagick (Mac OS X package) controlled by a Perl script (I know that there's a CPAN module, Image::Magick, for using ImageMagick directly within Perl, but again I felt like setting that up and learning to use it would take too much time). I needed two basic commands, one to add some padding to the image (since the map can displace pixels beyond the boundaries of the original image) and one to actually apply the displacement. The former was done with convert -mattecolor white -frame 50x50+0+0 input-file output-file, while the latter can be accomplished with composite -displace 50x50 displacement-map input-file output-file. After letting the script chug along for 30 minutes, I now have a dataset of 249 x 20 = 4980 images.
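The Perl driver boils down to a loop like the following (sketched here in C++ for consistency with the rest of these posts; the directory names are placeholders, while the 20 maps and the 50-pixel frame come from the numbers above):

    #include <cstdlib>
    #include <filesystem>
    #include <string>

    int main() {
        namespace fs = std::filesystem;
        for (const auto& entry : fs::directory_iterator("sketches")) {
            std::string name = entry.path().filename().string();
            std::string padded = "padded/" + name;
            // Add a 50-pixel white frame so displaced pixels have room to move.
            std::system(("convert -mattecolor white -frame 50x50+0+0 " +
                         entry.path().string() + " " + padded).c_str());
            // One output per displacement map yields the 20 variants per image.
            for (int m = 0; m < 20; m++) {
                std::string out = "variants/map" + std::to_string(m) + "_" + name;
                std::system(("composite -displace 50x50 map" + std::to_string(m) +
                             ".png " + padded + " " + out).c_str());
            }
        }
        return 0;
    }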

Serendipity #

There are two big things left to do for this project: implementing shape contexts as an alternative method to compare against, and finding a dataset to do the comparison with. Luckily enough, the solutions to both problems emerged simultaneously while reading one of the shape context papers, "Shape contexts enable efficient retrieval of similar shapes". In addition to describing a neat clustering/quantization approach that should help with performance (and make it easy to integrate with my system), they also talk about the ways in which they evaluated their system. Unlike other papers, say this one, they did not use a homemade dataset. Rather, they used the MPEG-7 silhouette database, a 3D model database, and a standard set of images used in psychology made by Snodgrass and Vanderwart. When searching for this last one, I was disappointed to discover that it was not freely available (I'm sure our psychology department has a copy, but it's too much of a bother). However, I did find two other interesting-looking datasets, one of objects and one of actions. Both are hand-drawn, and perhaps in some cases a bit more complex than what would be seen on a whiteboard, but not by much. Best of all, parts of both are freely available for downloading, so it looks like I'm all set (I will most likely use the object one, since the images it contains better match expected whiteboard content). Like the shape context paper, I will be applying various transformations to each image to get several derivatives, so that it is easy to measure precision/recall.

I have also been doing some cleaning up and tweaking to make my life easier. When viewing imported strokes I can now zoom in and out, pan, and toggle point display. This makes checking stroke extraction quality much easier (as opposed to having to recompile without point display and/or size normalization). Another thing that had been bothering me was that some thinned images would not get displayed at all. This turned out to be because I always add a one-pixel padding to all stroke images, to ensure that all border pixels are white (which makes life much easier when testing pixels, since I don't have to worry about ones on the edge). When the stroke image was as big as the original image, it would thus end up with an x or y coordinate of -1. I'm using glDrawPixels to actually display the image, and this offscreen starting coordinate would cause it to clip the entire image. Thankfully a bit of Googling turned up a glBitmap-based workaround that seems to do the trick.
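For reference, the trick works because glRasterPos clips the whole glDrawPixels call when the position lands outside the viewport, while glBitmap with a NULL bitmap can move a valid raster position anywhere (the function and variable names here are mine):

    #include <OpenGL/gl.h>   // Mac OS X framework header

    void drawStrokeImage(const GLubyte* pixels, int width, int height,
                         int xOffset, int yOffset) {
        glRasterPos2i(0, 0);   // any position inside the viewport is fine
        // Zero-size bitmap: draws nothing, but moves the raster position by
        // (xOffset, yOffset), which may be negative, without any clipping.
        glBitmap(0, 0, 0.0f, 0.0f, (GLfloat)xOffset, (GLfloat)yOffset, NULL);
        glDrawPixels(width, height, GL_LUMINANCE, GL_UNSIGNED_BYTE, pixels);
    }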

Write-only mode #

[Image: changing the figure padding]

I've nearly doubled the length of the writeup, which now hits a grand total of 6,858 words (still in draft form, with several sections left incomplete). Since I've been back in the wonderful land of LaTeX (as implemented by TeXShop) I've taken this opportunity to fix a few shortcomings that I didn't have a chance to take care of last time, having been in a whirlwind "cut-and-paste to piece something together" mode.

Most importantly, I always wanted to reduce the amount of padding that my figures got, since it looked particularly excessive. I'm using the wrapfigure package to lay them out, and after a bit of digging it turned out that it used the \intextsep and \columnsep document values to determine the vertical and horizontal padding respectively. Unfortunately, changing them by significant amounts, especially the latter, wasn't really an option since I didn't want to touch the amount that my columns were set apart from each other. Fortunately, I found a modified version of wrapfigure.sty that implemented two internal variables, \vsep and \hsep for this padding. By replicating those changes in my copy of wrapfigure.sty I was able to achieve my much desired tighter padding.

The whole process had a very black magic feeling to it, however. For example, the following piece of code sets \hsep to some multiple of \columnsep: \WF@hsep=1.4\columnsep. Yet it's not a simple multiplication by 1.4, since changing the value to 1.0 doesn't return it to its default appearance. Still, I managed to make do by tweaking the values until I got what I liked (incidentally, 1.5). This is something that deserves deeper study at some point.

Build it up #

Here are the changes that need to be made in order to get WNLIB running on Mac OS X:

  • Add ./ to your path (if it's not already there, as is the case with the default OS X setup).
  • Locate all #includes of <malloc.h> and make them refer to <sys/malloc.h> instead.
  • Optionally, to remove a warning, change wn_system_memory_alloc_func_type in wnmem.h so that its argument is size_t instead of unsigned int. This also requires including <stdlib.h> and changing the type of the first parameter of lo_substitute_alloc in selftest_aux.c.
  • A bunch of files #define their own value of INFINITY. This is also #defined in <math.h>, so to remove the warning that results from this, we can just remove the #define (WNLIB defines it as WN_FHUGE (1.0e30), while <math.h> uses HUGE_VALF (1e50f) and so the two values are close enough).
  • translate_errno in wnio.c seems to support two modes to convert error codes into a human-readable message. One is to use the strerror function (if present) and the other is to look up the error code in the sys_errlist system-wide global. Mac OS X seems to support both ways, but because of the way WNLIB tests to see if strerror is available (looking for whether linux or __CYGWIN__ are defined), we default to the second mode. Unfortunately the extern declaration that WNLIB uses doesn't match what is included in <stdio.h> (there's a const missing). The simplest fix is to make it use the strerror path instead, which can be accomplished by testing for the __APPLE__ define in addition to all the others.

Getting the library to build is one thing, but actually using it requires more effort. Since it is compiled with a C compiler and I'm using C++, to get the symbol name mangling to be consistent I also had to wrap the #includes in an extern "C" {} block. There is a wn_assert in wntrnf.c that checks if total_capacity(i_capacities,len_i) and total_capacity(j_capacities,len_j) are equal. Unfortunately we're dealing with floating point numbers here, and precision issues do creep in, so making that assert a bit more lenient (only so many digits have to be equal) is necessary. Other precision issues also appear, mostly when comparing values with zero (especially when decrementing peripheries); replacing those with comparisons against some epsilon value allows the code to run with real-world data. More precisely, it allows wn_trans_problem_feasible, the first phase in solving the transport problem, to run. All this gives me is a feasible solution, but I would like an optimal one (or a close approximation thereof). That is the job of the iterative function wn_trans_problem_simplex_improve, which, despite all of my attempts, refuses to run (various asserts fail). I have for now given up on getting WNLIB to work (although its iterative approach made it appealing, since I could presumably get better running times out of it, at the expense of precision).
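The two fixes look roughly like this (the header names and the epsilon are my placeholders, not WNLIB's):

    // Give WNLIB's symbols C linkage when included from C++:
    extern "C" {
    #include "wnlib.h"   // placeholder for the actual WNLIB headers
    #include "wntrn.h"
    }

    #include <algorithm>
    #include <cmath>

    // Tolerance-based comparison to replace the exact equality/zero tests:
    static bool nearlyEqual(double a, double b, double eps = 1e-9) {
        return std::fabs(a - b) <= eps * std::max({1.0, std::fabs(a), std::fabs(b)});
    }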

To continue with my string of failures for the day, I next moved on to extracting usable data from the IAM dataset that I recently downloaded. The problem is that the images have a top portion that is meant for OCR, and the actual handwritten sample resides in the lower three quarters or so. Ideally I only want this second part, but since the text is of varied length, it doesn't always start in the same place. My thinking was to take advantage of the fact that three horizontal lines mark the upper/lower boundaries of these sections, and by looking for those (rows with very low average values) I could see where to crop. Unfortunately the images have different intensities (in some cases the person used light pencil, in others very thick and dark marker), so it's hard to come up with a sure-fire way of detecting the lines. This isn't as much of a failure as the WNLIB attempt, but it's still rather annoying that in the end nothing worked today.
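What I've tried amounts to a scan like the following; the relative threshold is one attempt at coping with the varying ink darkness, and both the names and the 0.5 factor are mine:

    #include <numeric>
    #include <vector>

    // Flag candidate separator rows by their mean intensity relative to the
    // page's overall mean (rows much darker than average are likely rules).
    std::vector<int> findRuleRows(const std::vector<unsigned char>& gray,
                                  int w, int h) {
        std::vector<double> rowMean(h);
        double pageMean = 0;
        for (int y = 0; y < h; y++) {
            rowMean[y] = std::accumulate(gray.begin() + y * w,
                                         gray.begin() + (y + 1) * w, 0.0) / w;
            pageMean += rowMean[y] / h;
        }
        std::vector<int> rules;
        for (int y = 0; y < h; y++)
            if (rowMean[y] < 0.5 * pageMean)
                rules.push_back(y);
        return rules;
    }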

Odds and Ends #

I've been downloading the IAM handwriting dataset so I can see if it's appropriate for my uses. Unfortunately, downloading straight from the site to my computer at Princeton was giving me the dismal rate of 20K/sec, not very good considering I was looking at hundreds of megs worth of data. On a hunch, I tried wget on dyyme.pair.com (the machine that hosts mscape.com and this site), and there I was seeing 1.1 MB/sec. Downloading from there to my computer was at the rate of 200K/sec, thus still a bottleneck but an order of magnitude better than what I was seeing before.

A bit of traceroute-ing showed that the path from here to the unibe.ch server (where IAM is located) went through the Abilene backbone of the Internet2 to the GÉANT "multi-gigabit" network that connects research institutions in Europe. Despite the fanciness of the networks that my connection traversed (or perhaps because of it) I was seeing very dismal transfer rates. On the other hand, the connection from dyyme.pair.com went over the commercial/private network that Global Crossing has, with much better results. There's also something to be said about routing data manually versus letting the Internet do its thing.

Now that I have a basic version of the SQL query system up*, I'm seeing some rather odd behavior with its performance. Even simple queries, like selecting the rows from an 88-record table that have a distance field under a certain threshold, take a while to process. Part of this may be due to the conversion to/from strings that I have to do, but that doesn't explain it all. My current hypothesis is that this is due to GLUI taking up an inordinate amount of CPU time even when the application idles. I have seen values as high as 80%, and when Thor is at a more reasonable level, WindowServer instead shoots up to 50% plus. After using QuartzDebug to turn on visible screen updates, it became apparent that the control window was being refreshed continuously. Poking through the GLUI source code showed that it always called the finish_drawing function (which does a glFinish), even when the active control didn't have an idle handler. Making that call only happen when necessary seemed to help with the spurious refreshes, but Thor was still taking up lots of CPU time (if anything, more than before). Running Sampler on it while idle showed that it spent most of its time blocked in the main loop, waiting for the next event (GLUT seems to be built on top of Cocoa, so this was in NSApplication::nextEventMatchingMask). This made little sense, but I remembered a trick Szymon had mentioned: adding a usleep(10 * 1000) call in my idle function (which GLUI calls after it's done with its refreshing) brought the CPU utilization down to around 1%. This is somewhat kludge-ish, but I don't have the time (or the interest, really) to delve into GLUT and/or GLUI's event handling, so this will do for now.
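The resulting idle handler is tiny (registration goes through GLUI_Master.set_glutIdleFunc, per GLUI's API; mainWindow is a placeholder for the GLUT window id):

    #include <GLUT/glut.h>   // Mac OS X framework header
    #include <unistd.h>

    extern int mainWindow;   // GLUT window id (placeholder)

    void idle() {
        glutSetWindow(mainWindow);
        glutPostRedisplay();
        usleep(10 * 1000);   // sleep 10 ms; idle CPU use drops to ~1%
    }
    // Registered with: GLUI_Master.set_glutIdleFunc(idle);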

Of course, after all that work it turned out that GLUI and the idle function had nothing to do with the poor performance that I was seeing. In retrospect this makes sense, since GLUT appears to be single threaded and thus while the query function is running it was unlikely that the idle one was getting called. Still, fixing this means that it'll be much less annoying to leave Thor running in the background. Running Sampler while the actual query was going on showed that the program was spending most of its time computing the query stroke to basic shape stroke histogram distance. This is because some histograms (e.g. A3) are very dense (almost every bucket is populated) and thus the distance function takes a while. I thought that maybe I could thin out the connections a bit (if the buckets are too far apart there's no point in creating a connection between them), but that didn't really help for the very dense histograms and caused problems for the sparse ones.

I've also tried using a different COIN/OSI solver (OSL instead of CLP), but the FAQ suggests that this is as good as it's going to get. I'm currently looking into alternative implementations, such as WNLIB. Unfortunately it seems like WNLIB's build process is a bit antiquated and doesn't work too well on Darwin/Mac OS X out of the box. This will need more attention tomorrow.

* I realize I've never actually written out how my query system works. This is worth doing, if only to have something to refer back to (or just copy and paste) in my writeup:

  1. Basic shapes B1, B2, ..., Bn are loaded (currently just a circle and a square)
  2. For each basic shape i we compute histograms Hi1, Hi2, ..., Him (currently inner angles, D2, A3, D3)
  3. When loading stroke data, we compute for each stroke j histograms Hj1, Hj2, ..., Hjm, along with the distances (using the Earth Mover's metric) between Hi1 and Hj1, Hi2 and Hj2, ..., Him and Hjm (each loaded stroke's histogram has n distances, one to each of its counterparts in the basic shapes)
  4. All of the above (strokes, histograms, distances) are stored in a SQL database. Distances are actually stored as ratios, e.g. if histogram Hj1 had distances of 20, 40, and 140 to its basic-shape counterparts Hi1 (for i from 1 to n, here n = 3), then we store 0.1, 0.2, and 0.7 as the distance values.

When a query is made, we go through the following steps:

  1. For the query stroke, we compute its histograms Hq1, Hq2, ..., Hqm
  2. For each of those histograms, we compute the distances to their counterparts in the basic shapes, and determine the ratios as described above.
  3. We do a SQL query, looking for strokes in the dataset whose histograms had similar distance ratios to the basic shapes.
  4. This is done per histogram type, and then if a certain portion of the histograms agree that a stroke was in the same distance range as the query (currently 3/4), then we include it in the result set.

Currently no ranking is done (though this would be easy, since we can look at how close the distance ratios really are), no further refinement is done (since we are now restricted to very few strokes, it should be feasible to compute actual distances from the query stroke to the dataset strokes), and we only work at the individual stroke level, as opposed to looking at entire sketches. But it's a start.
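As a concrete (and entirely hypothetical — the schema names are my invention) sketch of the range query in step 3:

    #include <cstdio>
    #include <mysql.h>

    // Find strokes whose histogram of the given type has a similar distance
    // ratio to the given basic shape.
    void queryByRatio(MYSQL* conn, int histType, int shapeId,
                      double ratio, double tolerance) {
        char sql[256];
        std::snprintf(sql, sizeof(sql),
            "SELECT stroke_id FROM distances WHERE hist_type = %d "
            "AND shape_id = %d AND ratio BETWEEN %f AND %f",
            histType, shapeId, ratio - tolerance, ratio + tolerance);
        mysql_query(conn, sql);   // results read back with mysql_store_result
    }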

Datasets and Databases #

One problem that I will eventually have to deal with is finding a good dataset to evaluate Thor against. Since sketch searching is still a nascent research area, there don't seem to be any good "benchmarks" that I could run my code against. I can rely on images returned by some Google image search, but that involves putting a lot of faith in my sketch extraction as well as a lot of cleaning up and categorization by hand.

One alternative is to use handwriting data, collections of which are much more numerous. The CEDAR database seems to be popular, but getting access to it seems to involve ordering a CD-ROM. There may be something in this list of computer vision test images but it'll require some digging. More helpfully, the IAM Database from Switzerland is available online in the form of high resolution TIFF scans. The fact that the scans are so large may help, since one issue with handwriting data is that it's very high frequency and not very representative of the kind of writing that people will be doing on whiteboards. There's still the problem of verifying results, since most of these databases are recognition oriented, whereas I'm looking for similar shapes.

N.B. Most of the database sites were found through this links page.

Histograms everywhere #

Up until now I've only been computing a simple histogram, the distribution of inner angles for all points in the stroke. Based on the ideas in the Shape Distributions paper, I have been adding more types. Although the paper's focus is on 3D model searching, the approach is generally applicable to my 2D stroke work as well (for once, instead of leaving the extension of a 2D algorithm into 3D as "an exercise for the reader", I get to simplify a 3D algorithm down to 2D).

Several shape functions that can be used to generate signatures are described in the paper. The ones that I chose as being relevant/interesting are A3 (angle between three random points), D2 (distance between two random points) and D3 (square root of the area of the triangle formed by three random points). In order to do this sampling randomly and still have it exhibit as little variance as possible, they pick 1024 * 1024 point samples, bin them into 1024 buckets, and then reduce the resulting histogram to a curve with 64 points. I currently pick only ten thousand samples, which I leave in a histogram with 100 buckets. Generating the histogram repeatedly shows some visible variance, so I may have to bump up these numbers. There is also the issue that I do all this sampling at the stroke points (after the stroke has been resampled and normalized for position/size), whereas the paper samples random points. The net effect is that I get periodic spikes because of the even spacing between points. May or may not be desirable.
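For concreteness, my D2 sampling looks roughly like this (a sketch with my own names; the max-distance normalization assumes strokes have been scaled into a unit square):

    #include <algorithm>
    #include <cmath>
    #include <cstdlib>
    #include <utility>
    #include <vector>

    // D2: histogram of distances between randomly chosen pairs of stroke points.
    std::vector<int> d2Histogram(const std::vector<std::pair<float, float>>& pts,
                                 int samples = 10000, int buckets = 100) {
        const float maxDist = std::sqrt(2.0f);   // unit-square diagonal
        std::vector<int> hist(buckets, 0);
        for (int i = 0; i < samples; i++) {
            const auto& a = pts[std::rand() % pts.size()];
            const auto& b = pts[std::rand() % pts.size()];
            float d = std::hypot(a.first - b.first, a.second - b.second);
            int bin = std::min(buckets - 1, (int)(d / maxDist * buckets));
            hist[bin]++;
        }
        return hist;
    }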

[Image: histogram]

Also on the topic of histograms, I've polished their appearance a bit. The Tufte-inspired look makes them a bit more readable, and the labels help when you've got 4 histograms per stroke and several strokes on the screen.

This was actually part of a general UI cleanup that I did on Sunday, as a way of procrastinating from real work. Initially I had wanted to add tabs to GLUI, so that I could more easily (and intuitively) toggle between the different modes that Thor supports (importing, querying, viewing of basic shapes). Unfortunately GLUI's codebase is pretty decrepit (in addition to having an indentation scheme that I completely disagree with) and so that plan was soon dropped. Instead, I settled for having a mode menu which hides/shows the relevant pane(s) of controls. GLUI doesn't support hiding and showing of controls per se, but adding that was relatively easy, since the functionality is effectively implemented by rollouts.

New and Improved #

Resampling works perfectly now. It turned out I was doing some rather bone-headed things, like doing a linear interpolation between a point and itself. Also, the number of points inserted between two critical points is now a function of the portion of the total stroke length that is contained between the two of them.

I've also tweaked the graph-based hair removal algorithm. Consider the lower left corner of the image shown. Once the two hairs of the Y-like shape sticking out are removed, what we have left is another hair. Previously, hair edge removal was all done in one pass, but to remove newly formed hairs we now have to do multiple passes (as well as keep track of edges per vertex, so that we know when what was formerly an inner vertex has become an endpoint). In this particular case, the unremoved hair wasn't actually having a detrimental effect, since the shape could still be represented by a single stroke.

Hair Loss Considered Useful #

[Image: examples of hairs being removed]

I've been working on removing the aforementioned "hairs" that can result when thinning an image. I chose to use the graph-based algorithm that I described earlier, and its implementation turned out to be very straightforward (the Liu paper did use graphs for something like this, but they seemed to be more concerned with merging points rather than deleting pixels outright).

The upper left image shows the original thinned image, with the pixels corresponding to edges (random colors) and vertices (endpoints in red, inner crossings in green). The first step in extracting these two features is to find the vertices. They can be determined by looking at the number of white-to-black crossings that a circular iteration over a pixel's neighborhood encounters (this method was also previously used to determine the number of times a pixel can be visited in the stroke extraction phase). Once we have the vertices, we can go out from each one, following non-background pixels until we reach another vertex. The pixels that are traversed make up an edge.
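A sketch of that neighborhood test (the buffer layout and ink threshold are assumptions; the one-pixel white padding described above guarantees that the neighbors exist):

    // Walk the 8 neighbors in circular order and count white-to-black
    // transitions: 1 marks an endpoint, 2 a regular skeleton pixel,
    // 3 or more a crossing.
    int crossingNumber(const unsigned char* img, int w, int x, int y) {
        static const int dx[8] = { 1, 1, 0, -1, -1, -1, 0, 1 };
        static const int dy[8] = { 0, 1, 1, 1, 0, -1, -1, -1 };
        int count = 0;
        for (int i = 0; i < 8; i++) {
            bool cur  = img[(y + dy[i]) * w + (x + dx[i])] < 128;
            bool next = img[(y + dy[(i+1)%8]) * w + (x + dx[(i+1)%8])] < 128;
            if (!cur && next) count++;   // white -> black transition
        }
        return count;
    }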

The image in the upper right corner shows the result of stroke extraction when the hairs have not been removed. As can be seen, the rectangle is decomposed into three strokes, which isn't very helpful when trying to categorize the shape by looking at individual ones. The lower left corner shows the thinned image with the hairs removed. Currently, any edge that has at least one endpoint vertex (as opposed to a crossing) and is made up of fewer than half the average number of pixels per edge is declared to be a hair, and all of its constituent pixels are replaced with the background color. Finally, the lower right image shows the strokes extracted from this "hairless" image. As expected, the entire rectangle is made up of one stroke.
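The hair test itself is a one-liner once the graph exists (Edge and Vertex are stand-ins for the actual types):

    #include <utility>
    #include <vector>

    struct Vertex { bool isEndpoint; };                          // vs. inner crossing
    struct Edge   { Vertex* a; Vertex* b;
                    std::vector<std::pair<int, int>> pixels; };  // traversed pixels

    // An edge is a hair if it dead-ends at an endpoint vertex and is much
    // shorter than the average edge.
    bool isHair(const Edge& e, double avgPixelsPerEdge) {
        return (e.a->isEndpoint || e.b->isEndpoint) &&
               e.pixels.size() < avgPixelsPerEdge / 2;
    }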

I've also tweaked the despeckling so that it now does hole filling if a pixel has at least 6 neighbors of a non-background color (as opposed to requiring that all of them be so). Not that this is of earth-shattering importance, but writing it down will hopefully remind me of this change in the future, so that things like forgetting that I turned off despeckling for an entire month won't happen again.

Finally, I've been implementing the cleaner and more maintainable way of doing resampling. Ironically enough, it's also giving me trouble, but it seems like the way to go; I just need to sleep first.

Thinning Trouble #

Now that resampling is working to some extent, and I can load/store things into the SQL backend, I've been able to get a basic query system working. It's sort of kludge-y, in that I do simple binary comparisons. I look at the query stroke, and decide whether it looks more like a circle or a square (using only the inner angle histogram). Then, I return all strokes from the dataset that are also more like the basic shape that I chose. Very plain, but this was meant to get the rest of the infrastructure (query input, results display, (de)serialization) in place. Now real progress can be made.

[Image: difference image (left), thinned image (right)]

Well, except for this one thing: the thinned image shown. The little lines (I'll call them "hairs") sticking out of the basic skeleton are disturbing the stroke extraction process. Consider the bottom portion. Assuming we start with the bottom left corner, we keep advancing along the pixels. As we get to the first intersection, we continue moving to the right, since we favor moving in the same direction. We do the same thing at the second intersection (in the bottom right corner), resulting in us reaching the end of this segment. Therefore, we're forced to begin a stroke anew, with the net result being that the core feature of this shape, the rectangle, is broken up into several strokes. One solution appears to be a pre-processing step that takes these hairs into account. Consider a graph whose nodes are the endpoints and intersections in the thinned image. By looking at edge lengths in the graph, it should be possible to determine which edges are hairs (relatively short lengths, with one node being an endpoint). Then, we can either discount those points entirely, or simply give their direction a lower weight when reaching an intersection in the stroke extraction phase.

Although this graph-based approach was suggested by Szymon, I believe that something like it is described in Liu97 (Robust Stroke Segmentation Method for Handwritten Chinese Character Recognition), which is where I got the idea to use thinning in the first place. As it happened, the dataset that I was using at that stage didn't exhibit these artifacts, and so I felt like I could discard the "hair removal" part of the paper.

Balance act #

Having spent the last two days working on it, it's becoming apparent that resampling is taking more time than expected. Part of this can be attributed to my unwillingness to sit down and figure out the algorithm once and for all. Rather, my modus operandi has been to hack something out, try it out with a few things, fix what's broken, see what the fix broke, and so on until (hopefully) everything worked. Though it is in a working state right now, I feel like another bug discovered would make the whole thing fall apart.

Szymon has suggested a better way of doing the resampling, which in retrospect seems rather obvious. We still do a first pass to determine where all the critical points are. Then we walk through the critical point list, taking the points two at a time, and between each pair we insert (desired number of sample points / number of critical points) points, along with the critical points themselves. Since that's all there seems to be to it, I should implement this sooner rather than later for the sake of maintainability.
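A sketch of that scheme (for brevity this interpolates straight between critical points; the real pass would walk the original polyline between them):

    #include <vector>

    struct Pt { float x, y; };

    // Keep every critical point and distribute an even share of the desired
    // sample count between each consecutive pair, per the allocation above.
    std::vector<Pt> resample(const std::vector<Pt>& critical, int desired) {
        int intervals = (int)critical.size() - 1;
        if (intervals < 1) return critical;
        int per = desired / (int)critical.size();
        std::vector<Pt> out;
        for (int i = 0; i < intervals; i++) {
            out.push_back(critical[i]);
            for (int k = 1; k <= per; k++) {
                float t = (float)k / (per + 1);
                out.push_back({ critical[i].x + t * (critical[i+1].x - critical[i].x),
                                critical[i].y + t * (critical[i+1].y - critical[i].y) });
            }
        }
        out.push_back(critical.back());
        return out;
    }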

I've also been writing more code to save/load objects from the SQL backend. When I expressed my interest in graphics, I didn't expect to have to spend time serializing and de-serializing objects. The MySQL API is very easy to use, but it's all strings-based. This is acceptable in languages like Perl, but gets to be a pain in C++.

One Stroke, One Point, One Sampling #

One of the more important things that has to be done for stroke normalization is uniform sampling (the other being scaling independence). This is especially important when handling user queries, since the user's mouse can linger in certain places that would otherwise not deserve as much detail. Generally speaking, resampling should be easy, since it's just a matter of treating each stroke as a parametric curve. It can then be sampled for regularly spaced intervals of t. Unfortunately, this doesn't always preserve appearance for strokes with sharp angles, especially if the number of samples is low in comparison to the complexity of the curve.

Inspired by this project at Waseda University, I decided to first do a pass over the stroke, looking for critical points ("critical" being defined as control points at which the angle formed by the vectors to the previous/next points is below a certain threshold). Then, when resampling, I perturb the uniformity a bit so that those points get included in the final version. This works well enough for the most part, but it seems to occasionally induce some obtuse angles where there were none before, which throws off the previously mentioned angle distribution histogram similarity measurement to some extent.
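The critical point test amounts to this (a sketch; the 135-degree default is my guess, not a measured value):

    #include <algorithm>
    #include <cmath>

    struct Pt { float x, y; };

    // A control point is "critical" when the angle between the vectors to its
    // previous and next neighbors falls below the threshold.
    bool isCritical(const Pt& prev, const Pt& cur, const Pt& next,
                    float thresholdDeg = 135.0f) {
        float ax = prev.x - cur.x, ay = prev.y - cur.y;
        float bx = next.x - cur.x, by = next.y - cur.y;
        float denom = std::hypot(ax, ay) * std::hypot(bx, by) + 1e-12f;
        float c = std::clamp((ax * bx + ay * by) / denom, -1.0f, 1.0f);
        float angleDeg = std::acos(c) * 180.0f / 3.14159265f;
        return angleDeg < thresholdDeg;   // sharp corner: keep this point
    }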

Loose ends #

After the pain of last night, things are going much smoother, and the program can actually tell apart circles and squares. Some improvements and observations:

  • Accessing the results of the transport-simplex problem isn't necessarily obvious (since FLOPC++ is sorely lacking in documentation). To get things out of the MP_variable result, the level method has to be used. Although it takes multiple arguments (suggesting that one can access the flow from bucket i to bucket j by calling level(i, j)), it seems to work only if one actually does level(i * rowWidth + j) (since I specified things in row-major order, presumably it uses column-major).
  • Another reason to keep track of link indices by hand is that it's much more efficient not to generate links from histogram buckets that have 0 value. This results in an irregular link ordering, but we end up with 10x fewer links (at least for simple shapes like triangles, squares and circles), which has a direct (linear?) impact on solving speed.
  • FLOPC++ likes to output debugging info, and there's no obvious switch to turn it off. Adding a few #ifdef DEBUG lines in MP_model.cpp takes care of this.
  • COIN outputs its own debugging info, but it has a much nicer logging interface. However, its default logging level outputs some things, and FLOPC++ doesn't change this. Changing the default logging level (in Coin/CoinMessageHandler.cpp) from 1 to 0 is the remedy.
  • Changing the default requires COIN to be rebuilt. Although this wasn't apparent last night (due to it being in a temporary location), the process requires that there be no spaces in its path. This can be fixed by tweaking Makefiles (adding quotes), or by renaming folders in its path. You can guess which is the easier thing to do.

No pain, no gain #

The next step, once we have some kind of shape histogram, is to figure out distances between histograms. The hip, up-to-date way of doing this is to use the earth mover's distance, the canonical paper for which is The Earth Mover's Distance as a Metric for Image Retrieval. According to the paper, the computation reduces to a transportation problem. One histogram serves as the source/supply side, with the other being the destination/demand. The bucket values are the supply and demand amounts, while the absolute difference in bucket indices is the distance. The paper suggests solving this with the transportation-simplex method, gives a couple of references, and then moves on to more interesting things. Those of us who actually have to implement things are left grasping at straws.
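For reference, the underlying linear program is the standard transportation formulation (my transcription, using the setup above: bucket values as supply/demand and bucket-index differences as the ground distance):

    \min_{f_{ij} \ge 0} \sum_{i,j} f_{ij}\, d_{ij}
    \quad \text{subject to} \quad
    \sum_j f_{ij} = s_i \;\forall i, \qquad \sum_i f_{ij} = t_j \;\forall j

where s_i and t_j are the supply and demand bucket values and d_{ij} = |i - j|. With both histograms normalized to the same total mass, the optimal cost divided by that mass is the earth mover's distance.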

After briefly trying to find descriptions of the transportation-simplex method so that I could implement it myself, I decided that there must be an easier way. A quick Google search turned up FLOPC++, which seemed to do what I want and featured a very nice syntax (it even presented a code snippet for solving a transportation problem). However, it depended upon COIN, an IBM-provided open source OR library. As usual, I feared an interminable chain of dependencies and/or broken Makefiles, but things were surprisingly easy. COIN actually had a Darwin target, and built out of the box. FLOPC++ took some coaxing, since it was trying to build a shared library, something that the OS X build of gcc doesn't support per se. Changing the linker from gcc to libtool with a -static argument seemed to do the trick (I can't take much credit for figuring this out; the COIN Darwin makefile described all this, and it was very easy to lift their config code).

Once all the libraries were built, it was time to try out some simple test code. After some configuration confusion (why does Xcode have a header include path option both in the project and in the specific target, with both requiring to be set?), the code seemed to build, and I was all set to integrate COIN/FLOPC++ with the main codebase. All went well until I tried to compile, and then all hell broke loose. The 150+ compiler errors that I got were very mystifying, especially considering that the code had built just fine in the test project. After painstakingly checking all compiler options (the main Thor project uses the "Carbon Application" template, while the test one used the "C++ Tool" one, which could account for some differences) to no avail, I finally tracked it down to the precompiled header that was being prepended to all source files. Unfortunately, seeing which of the #include's in the .pch file was to blame was a very tedious task, since the entire project had to be rebuilt after each trial (the fact that ZeroLink seems to make Xcode somewhat more reluctant than it should be about recompiling files didn't help either). In the end, it turned out that any header that (even indirectly) included <Carbon/Carbon.h> caused things to go haywire. Spinning those off into a separate include (unfortunately I couldn't figure out how to make it precompiled as well, thus increasing compile times for now) seemed to fix the problem.

Once this was all done, and I got to actually run the code I had written, results were inconclusive. When using hand-drawn circles and squares, the distances seemed to be indicative of something, but how accurate it is remains to be seen.

P.S. It appears that the Princeton network has been having difficulties with traffic directed towards mscape.com this afternoon (i.e. nothing gets through). This entry is brought to you courtesy of CoDeeN/PlanetLab and the MIT CoDeeN proxy.

Inching forward #

Trying yet again to pick up the ball on Thor. Did a few things today:

  • Added "presets" option so that different import parameters (source/destination paths, calibration image count, etc.) can all be loaded at once.
  • Realized why image differences were broken: when I did the write-up back in January, I turned off despeckling in order to show its effects. Turning it back on, and improving the distance metric (Euclidean distance instead of three absolute differences; see the sketch after this list), gives me much better results now.
  • Created a Histogram class and began to use it to keep track of inner angle distributions. It is now hooked up to the query stroke set (and can be visualized in real-time). Will have to implement earth mover's distance to compare with other shapes.
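That improved per-pixel metric, as a minimal sketch (types assumed):

    #include <cmath>

    // Euclidean distance in RGB space, replacing the old sum of per-channel
    // absolute differences.
    float colorDistance(const unsigned char a[3], const unsigned char b[3]) {
        float dr = float(a[0]) - b[0];
        float dg = float(a[1]) - b[1];
        float db = float(a[2]) - b[2];
        return std::sqrt(dr * dr + dg * dg + db * db);
    }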

Mouse Drawing #

Finally began to work on the query side of Thor. Added a view/query mode switch button (yes, modal interfaces are bad, will fix this later), and a simple drawing tool so that queries can be specified. I then began to work on what I thought would be the simplest attribute possible, average angular separation between points. This would let me differentiate between circles (low separation) and rectangles and other angular shapes (high separation).

Unfortunately, though the separation computing works, my simplification function managed to spontaneously break itself, and the (recorded) difference between circles and rectangles is very small. This isn't helped by the fact that it's very hard to draw straight lines with a mouse, and so my rectangles aren't really any simpler than my circles.

Glooey DBs #

[Image: Thor UI]

Since I've begun to focus on the query part of Thor, the UI is starting to matter more and more. Until now I've gotten by with a GLUT-based display window and status readout, while control was handled by the keyboard and the right-click menu. This was getting rather tedious to code for (especially if I wanted anything fancier) and so I've now moved on to using GLUI. Getting it to build on Mac OS X required changing the CC variable in its makefile to CC=gcc -framework OpenGL -framework GLUT, but beyond that the resulting libglui.a was usable as is. The API uses a very simple callback-based approach, and although the library itself is done in C++, it sticks to very basic features (i.e. inheritance is about as fancy as it gets). The automatic control layout is good enough (though perhaps somewhat lacking in documentation - to get the text input labels to be right-aligned I had to go through the source code and figure out that I was supposed to modify the text_x_offset member). Right now all of the controls are in their own window, but I may want to merge the two in the future (this requires a bit more work to integrate with the usual GLUT event callbacks).
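For flavor, a minimal GLUI setup of the sort described (control names are mine, and the header path varies by install):

    #include "glui.h"

    extern int mainWindow;   // GLUT window id (placeholder)
    int showPoints = 1;      // live variable bound to the checkbox below

    void setupUI() {
        GLUI* glui = GLUI_Master.create_glui("Thor controls");
        glui->add_checkbox("Show points", &showPoints);
        glui->set_main_gfx_window(mainWindow);
    }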

I've also made some more progress with the MySQL storage backend. Thanks to CocoaMySQL, modifying tables and checking results isn't quite as tedious. I've also changed my table structure a bit: there are now frames, subImages and strokes tables, mirroring my CapturedImage, SubImage and Stroke C++ objects. Saving is handled well, and the next thing is to add a simple primitive categorizer (like the angle distribution one) so that I can have some kind of basic query working.

Positive Impulse Needed #

As a small step towards regaining my momentum for working on Thor, I've begun to hook up my current stroke extraction mechanism to a storage backend. Since this is fundamentally a database of strokes, and I want to be able to do fast queries to eliminate strokes that are clearly irrelevant, MySQL seemed like the best choice in terms of performance (versus what I could write), ease of integration and ease of deployment. Plus this gets me a distributed architecture for free, i.e. there's nothing stopping me from having multiple capture clients all dumping data into a centralized server. The C API is pretty simple (and looks very familiar after having worked with the Perl DBI module), and works pretty much as advertised under Mac OS X and with Xcode. As of now, I can save each image's captured strokes into a table, which is a decent enough start.

To generate the INSERT query that would save the aforementioned data, I needed to produce some formatted strings. The STL string class is a bit lackluster in this area, but Boost seems to have a nice collection of classes for this, as well as a bunch of other tools. Getting it to run on Mac OS X was just a matter of downloading it and its build system (I was afraid that it would be a Linux-like stack of dependencies that's several levels deep, but my fears were unwarranted).
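The generated INSERTs look something like this (the table/column names and the point encoding are placeholders):

    #include <boost/format.hpp>
    #include <mysql.h>
    #include <string>

    void saveStroke(MYSQL* conn, int frameId, const std::string& encodedPoints) {
        std::string sql = (boost::format(
            "INSERT INTO strokes (frame_id, points) VALUES (%1%, '%2%')")
            % frameId % encodedPoints).str();
        mysql_query(conn, sql.c_str());
    }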

As I was doing this, I realized that I haven't really given much thought to how strokes should be organized. Right now, I separate each delta frame into (pre-thinning) continuously-connected sub-images, and extract strokes from each one of those. Separating things spatially is a good start, but it's a bit too brittle. There are cases like loops where the pen pressure varies slightly, resulting in breaks; since the separation between strokes is very small (on the level of one pixel), it seems like there should be some joining. Also, grouping only those strokes that are actually touching may be a bit too stringent; the Flatland paper uses bounding boxes, a much better idea.