Photo migration from Flickr to Google Plus
I've been on Flickr since 2005, posting a lot of my photos there so that other people from the events I usually photograph could enjoy them. But lately I've become annoyed with it. It is very slow to upload to and even worse for getting photos back out - there is no large shiny button to download a set of photos, like the one I noticed in G+. So I decided to try and copy my photos over. I am not abandoning or deleting my Flickr account yet, but we'll see.
The process was not as simple as I had hoped. There is this FlickrToGpluss website tool. It would have been perfect... if it worked. In that tool you simply log in to both services, check which albums you want to migrate and at what photo size, and that's it - the service does the migration directly on their servers. It actually feeds Google the URLs of the Flickr photos, so the photos don't even go through the service itself, only the metadata does. Unfortunately I hit a couple of snags. First of all, the migration stopped progressing a few days and ~20 GB into the process (out of ~40 GB). And for the photos that were migrated, the titles were empty and the file names were set to the Flickr descriptions. Among other things, that meant that when you downloaded the album as a zip file with all the photos (which was the feature I was doing this whole thing for), you got the photos in almost random order - namely, sorted by those description-derived file names. Ugh. So I canceled that migration (by revoking the app's privileges on G+; there is no other way to see or modify the progress there) and sat down to make a manual-ish solution.
First, I had to get my photos out of Flickr. For that I took Offlickr and ran it in set mode:
./Offlickr.py -i 98848866@N00 -p -s -c 32
The "98848866@N00" is my Flickr ID which I got from this nice service, then -p to download photos (and not just metadata), -s to download all sets and -c 32 to do the download in 32 parallel threads. An important thing to do first is to take all your photos that are not in any set in Flickr and add them to a new "nonset" set, so that those photos are also picked up here; there is an option under Organize to select all non-set photos. It worked great, but there were a couple of tiny issues:
- There is a bug in Offlickr: it does not honor pages in Flickr sets, so it only downloads the first 500 images in each set. A fix for that is in that bug;
- It also wanted Python 2.6 for some reason, but worked fine with Python 2.7;
- With that number of threads, Flickr sometimes failed to respond with the photo, serving an HTML error page (504 Gateway Time-out) instead. Offlickr does not check the return code and happily saves that HTML page as the photo. To work around that I simply deleted the HTML error pages and then ran the same Offlickr command again, so that it re-downloaded the missing files. I had to repeat that a few times to get all of them:
ack-grep -l -R "504 Gateway Time-out" dst/ | xargs rm
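Grepping for one specific error message only catches that one kind of error page. A slightly more general check, sketched below, is to look at the first bytes of each downloaded file and flag anything that does not start with the JPEG marker. This is only a sketch: the function names are made up for illustration, and it assumes that the downloaded photos are JPEGs saved with a .jpg or .jpeg extension, which may not hold for every Offlickr download.

```python
from pathlib import Path

# Every JPEG file starts with the start-of-image marker FF D8 FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def is_jpeg(path):
    """Return True if the file starts with the JPEG start-of-image marker."""
    with open(path, "rb") as fh:
        return fh.read(3) == JPEG_MAGIC


def find_bogus_photos(root, suffixes=(".jpg", ".jpeg")):
    """List files under root that are named like photos but are not JPEG data.

    The suffix list is an assumption about how the downloader names files.
    """
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes and not is_jpeg(p)
    )


if __name__ == "__main__":
    # Print the suspicious files instead of deleting them, so the list
    # can be reviewed before anything is removed.
    for bogus in find_bogus_photos("dst"):
        print(bogus)
```

Printing rather than deleting keeps the sketch harmless; once the list looks right, the files can be removed and the download re-run as described above.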
After all that I had my photos, all 40 GB of them, on my computer. Should I upload them to G+ now? Not yet! See, the photos had all lost their original file names. It turns out Flickr simply throws that little nugget of information away. It is nowhere to be found: not in the metadata, not in the UI and not in the Exif of the photos. Also, some of my photos had clever descriptions that I did not want to lose or re-enter in G+, as well as geolocation information. Flickr does not embed that info into the Exif of the image; instead it is provided separately - Offlickr saves it as an XML file next to each image.
So I wrote a simple and hacky script to re-embed that info. It did 3 things:
- Embed the title of the photo into the Description EXIF tag, so that G+ automatically picks it up as the title of the photo;
- Embed the geolocation information into the proper EXIF tags, so that G+ picks that up automatically;
- Create a new file name based on the date and time the picture was originally taken and the EXIF Canon FileNumber field (if it exists), so that all photos in an album are sequential.
It uses exiftool for the actual heavy lifting.
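To give an idea of what those three steps could look like, here is a minimal sketch that only builds the exiftool command line and the new file name; it is not the original script. The function names are hypothetical, the title, latitude and longitude are assumed to have already been read from the XML file that Offlickr writes (parsing that file is left out because its layout is not shown here), and -ImageDescription is an assumption about which tag "Description" refers to. The FileNumber format in the example ("100-1234") is likewise an assumption.

```python
from datetime import datetime


def gps_args(lat, lon):
    """Turn signed decimal coordinates into exiftool GPS tag assignments."""
    return [
        f"-GPSLatitude={abs(lat)}",
        f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
        f"-GPSLongitude={abs(lon)}",
        f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}",
    ]


def exiftool_command(path, title=None, lat=None, lon=None):
    """Build (but do not run) an exiftool call that embeds title and location."""
    cmd = ["exiftool", "-overwrite_original"]
    if title:
        # Assumption: the EXIF ImageDescription tag is the one picked up as title.
        cmd.append(f"-ImageDescription={title}")
    if lat is not None and lon is not None:
        cmd.extend(gps_args(lat, lon))
    cmd.append(path)
    return cmd


def new_file_name(taken, file_number=None, ext=".jpg"):
    """Name a photo after its capture time, plus the camera file number if known."""
    stamp = taken.strftime("%Y%m%d_%H%M%S")
    if file_number:
        # Drop the dash from a value such as "100-1234" to keep the name compact.
        return f"{stamp}_{file_number.replace('-', '')}{ext}"
    return f"{stamp}{ext}"


if __name__ == "__main__":
    print(exiftool_command("photo.jpg", title="Finish line", lat=56.95, lon=24.1))
    print(new_file_name(datetime(2010, 5, 1, 12, 30, 5), "100-1234"))
```

The command is returned as a list so that it can be handed to subprocess.run without shell quoting problems, which matters for titles that contain spaces or quotes.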
After all that was finished I tested the result by uploading a few images to G+ and checking that their titles were set correctly, that they had sane file names and that the geo information worked. After that I just uploaded them all. I tried figuring out the G+ API (they actually have one) but I was unable to get through the tutorial, so I abandoned it and simply uploaded the photos of each set in their own tab via a browser. That took a few hours. But that is much faster than with Flickr: more like 4 MB/s versus 0.5 MB/s. And here is the result. So far I kind of like it. We'll see how it goes after a year or so.
Now on to an even more fun problem - I now have ~40 GB of photos from Flickr/G+ and ~100 GB of photos locally. Those sets partially intersect. I know for a fact that there are photos in the Flickr set that are not in my local set, and it is pretty obvious that there are some the other way round. Now I need to find them. Oh, and I can't use simple hashes, because the Exif has changed and so have the file names for most of them. And not to forget that I often take a burst of 3-4 pictures, so there are bound to be some near-duplicate photos in each set too. This shall be fun :)
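One common way to compare photos whose bytes differ but whose content is the same is a perceptual hash. The sketch below shows the simplest variant, an average hash, purely as an illustration of the idea and not as a chosen solution. It assumes each photo has already been shrunk to a tiny grayscale grid (for example 8x8) by some image library; that step is left out, and the function names are made up.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: one bit per pixel, set if above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Tiny 2x2 grids stand in for downscaled photos.
    original = [[0, 255], [255, 0]]
    recompressed = [[10, 250], [240, 5]]   # same picture, slightly altered
    different = [[255, 0], [0, 255]]       # inverted picture

    print(hamming(average_hash(original), average_hash(recompressed)))
    print(hamming(average_hash(original), average_hash(different)))
```

A small Hamming distance suggests the same picture, while a burst of 3-4 shots would likely land somewhere in between, so any threshold would need tuning by hand.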