Do you want to hear the most incredible "it's not a bug - it's a
feature" story ever?
After shooting hundreds of megs of RAWs with my Canon 350D over the last
couple of weeks, I noticed a very strange thing - importing this large
number of files from my camera into F-Spot took ages. F-Spot ate memory
by the tens and hundreds of megabytes and never returned it to the
system. Well, I blamed it on Mono and went searching for a better way.
Then I found out that the command-line C program gphoto also takes the
same horrific amount of memory to import my photos. I saw that to
download 900 MB of photos (~250 photos) gphoto's memory use went up to
~910 MB (2 MB were shared). Luckily Linux managed to swap out part of
gphoto, so I could finish the download with my 512 MB of real RAM and a
1 GB swap file. I googled and found tens of bug reports on this - the
first of them as early as December 2004. Ouch.
Well - let's see what the problem is, shall we? Some bug reports
reference a bug in gphoto's SourceForge bug tracker where a user reports
that downloading a 250 MB video file takes 250 MB of RAM, and the
developers reply that unfortunately this is a limitation of the current
infrastructure and is very hard to fix. Bummer.
But wait! He says that downloading ONE file takes a lot of RAM. This
limit should not exist when downloading multiple files - we should be
able to drop the information about the previous file as soon as we start
downloading the next one, right?
Ok, let's see what really is going on there. Downloading the source of
gphoto. Looking at it. Seeing a lot of mess. After around 10 minutes I
start to understand that there is a table of option names and functions,
and the real job is done by the command-line parser, which calls a
function as soon as it encounters the matching parameter on the command
line. :P After 3 more minutes of jumping around the code I finally get
to the function that gets called to download a single file. Looks pretty
easy:
- take a CameraFile pointer
- pass it to gp_file_new() for initialization
- pass it to gp_get_file() to get the actual data of the file (the
  download happens here)
- pass it to gp_write_file_to_file() to dump the data to a file on disk
- pass it to gp_file_unref() to free the data
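
The steps above can be sketched as a toy model. This is not the gphoto or libgphoto2 code: ToyFile and every toy_* function are made-up stand-ins, the "camera" just fills a buffer, and the "write to disk" step only counts bytes. It is only meant to show what one would expect from such a loop if the last step really releases the data: at most one file's payload alive at any time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CameraFile: a payload plus a reference count. */
typedef struct {
    int ref_count;
    unsigned char *data;
    size_t size;
} ToyFile;

static size_t live_bytes = 0;  /* payload bytes currently allocated */
static size_t peak_bytes = 0;  /* highest value live_bytes ever reached */

/* Step 1: create the file object with a single reference (the caller's). */
ToyFile *toy_file_new(void) {
    ToyFile *f = calloc(1, sizeof *f);
    if (f) f->ref_count = 1;
    return f;
}

/* Step 2: the "download" - allocate and fill the payload. */
int toy_get_file(ToyFile *f, size_t size) {
    f->data = malloc(size);
    if (!f->data) return -1;
    memset(f->data, 0xAB, size);
    f->size = size;
    live_bytes += size;
    if (live_bytes > peak_bytes) peak_bytes = live_bytes;
    return 0;
}

/* Step 3: pretend to write to disk; here we only report the byte count. */
size_t toy_write_file(const ToyFile *f) {
    return f->size;
}

/* Step 4: drop one reference; free everything when the last one is gone. */
void toy_file_unref(ToyFile *f) {
    assert(f->ref_count > 0);
    if (--f->ref_count == 0) {
        live_bytes -= f->size;
        free(f->data);
        free(f);
    }
}

/* Download n files of the given size one after another; returns bytes written. */
size_t toy_download_all(int n, size_t size) {
    size_t written = 0;
    for (int i = 0; i < n; i++) {
        ToyFile *f = toy_file_new();
        if (!f) break;
        if (toy_get_file(f, size) == 0) written += toy_write_file(f);
        toy_file_unref(f);
    }
    return written;
}
```

In this model, downloading five 1024-byte files writes 5120 bytes while peak_bytes never goes above 1024. That is the behaviour the real program was not showing.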
Looks all fine and dandy so far. However, the memory use I see suggests
that this last operation does not happen as it should, so I search for
the gp_file_unref() function. I do not find it in the gphoto source, but
as I soon figure out - it is in libgphoto2. The function is pretty
straightforward - the reference count of the structure is reduced by 1
and, if it has reached 0, the structure is freed from memory via the
gp_file_free() function.
Hmm, I wonder what will happen if I replace gp_file_unref() with
gp_file_free() in gphoto? After a quick compile and installation (I
thank the Gods and all DDs for the wonders of "debuild -us -uc && sudo
dpkg -i ../gphoto*.deb") I ran gphoto again. Wow, it now only consumes
8-16 MB of RAM and not 900. The files downloaded fine, but in the end
glibc made a lot of fuss about a "double free". What does that mean? It
means the same memory got freed twice: someone managed to get a
reference to our CameraFile and didn't give it back until after we had
already freed it. Naughty boy!
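
To make the difference between the polite release and the forced one a bit more concrete, here is a minimal sketch. ToyObj and the toy_* functions are invented for illustration and are not the libgphoto2 implementation. The sketch deliberately never performs the second free (that would be undefined behaviour); it only reports how many other holders would be left with a dangling pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; the payload is left out. */
typedef struct {
    int ref_count;
} ToyObj;

ToyObj *toy_new(void) {
    ToyObj *o = malloc(sizeof *o);
    if (o) o->ref_count = 1;   /* the creator holds the first reference */
    return o;
}

/* Somebody else wants to keep the object around too. */
void toy_ref(ToyObj *o) {
    o->ref_count++;
}

/* Polite release: free only when the last holder lets go.
 * Returns 1 if this call freed the object, 0 otherwise. */
int toy_unref(ToyObj *o) {
    assert(o->ref_count > 0);
    if (--o->ref_count > 0) return 0;
    free(o);
    return 1;
}

/* Forced release: free no matter who else still holds a reference.
 * Returns the number of holders left with a dangling pointer; each of
 * them releasing later would free the same memory again. */
int toy_force_free(ToyObj *o) {
    int dangling = o->ref_count - 1;
    free(o);
    return dangling;
}

/* Scenario A: two holders, the caller uses the polite release.
 * Returns whether the caller's release freed the object. */
int toy_freed_by_caller_unref(void) {
    ToyObj *o = toy_new();
    if (!o) return -1;
    toy_ref(o);                 /* a second, hidden holder */
    int freed = toy_unref(o);   /* caller lets go: object stays alive */
    toy_unref(o);               /* hidden holder lets go: now it is freed */
    return freed;
}

/* Scenario B: two holders, the caller forces the free. */
int toy_dangling_after_force_free(void) {
    ToyObj *o = toy_new();
    if (!o) return -1;
    toy_ref(o);                 /* a second, hidden holder */
    return toy_force_free(o);   /* memory is gone, one holder does not know */
}
```

Scenario A is the "memory never comes back" case: the caller's unref returns 0 because somebody else still holds on. Scenario B is the "double free" case: memory use drops right away, but one holder is left to free the object a second time later.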
We only call three functions using that pointer, so it should not be
hard to trace them through the source to see what they do. The
gp_file_new() function looks good, it always sets the reference count to
1. gp_get_file() is more complex - I get to crawl through a lot of
strange redirects across all levels of the gphoto architecture. At one
point I get a bit alarmed as I see a local variable called ref_count,
but then I see that the code just stores the reference count there for
safekeeping while data is copied from another object, and right after
that copy the reference count is put back safely. After all that I get
to the end of the gp_get_file() function, with just a couple of things
left - cache the result, clean up and return the file. Wait a
minute....
CACHE?!?!?!!
$(&@($^@#$(^@&^$(#&$@#(&$(@#$&!^&$^@*!(&$#(@&
!!!!!
It appears that someone thought that it is a good idea to use a gig or
so of my RAM for a cache, just in case I would like to download the same
photos a second time within the same program call.
IT IS NOT!
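
A small simulation of why such a cache pins the memory, again with made-up names (ToyFile, toy_fetch, toy_download and the fixed-size toy_cache are all assumptions for illustration, not the library's data structures). The use_cache switch is only there for comparison and is not a claim about what the actual fix changed.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CACHE_SLOTS 64

/* Hypothetical reference-counted file with a payload. */
typedef struct {
    int ref_count;
    size_t size;
    unsigned char *data;
} ToyFile;

static size_t toy_live = 0;                   /* payload bytes allocated now */
static ToyFile *toy_cache[TOY_CACHE_SLOTS];   /* pretend in-process cache */
static int toy_cached = 0;

/* "Download" one file: allocate the payload, caller gets one reference. */
static ToyFile *toy_fetch(size_t size) {
    ToyFile *f = malloc(sizeof *f);
    if (!f) return NULL;
    f->data = malloc(size);
    if (!f->data) { free(f); return NULL; }
    f->ref_count = 1;
    f->size = size;
    toy_live += size;
    return f;
}

/* Drop one reference; free the payload when nobody holds the file. */
static void toy_unref(ToyFile *f) {
    assert(f->ref_count > 0);
    if (--f->ref_count == 0) {
        toy_live -= f->size;
        free(f->data);
        free(f);
    }
}

/* Download n files; the caller releases each one right away.
 * Returns the payload bytes still allocated once the loop is done,
 * i.e. what the cache alone keeps alive. */
size_t toy_download(int n, size_t size, int use_cache) {
    for (int i = 0; i < n; i++) {
        ToyFile *f = toy_fetch(size);
        if (!f) break;
        if (use_cache && toy_cached < TOY_CACHE_SLOTS) {
            f->ref_count++;               /* the cache takes its own reference */
            toy_cache[toy_cached++] = f;
        }
        toy_unref(f);                     /* caller is done with this file */
    }
    size_t retained = toy_live;
    /* The cache only lets go at the very end of the run. */
    while (toy_cached > 0) toy_unref(toy_cache[--toy_cached]);
    return retained;
}
```

With the cache on, ten 4096-byte files leave 40960 bytes allocated after the caller has released every one of them. With the cache off, nothing is left. Scale that up to ~250 RAW files and you get the ~900 MB from the beginning of the story.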
Results: a one-line patch, one NMU building for upload, one *very* long
bug in the upstream bug tracker, and one developer quite upset and not
too convinced about the correctness of free software ways any more :P