Tuesday 18 January 2011

Crossroads of new and old

In the progression of things, I'm generally in favor of more technology as opposed to less. Or more like: cleverer, more useful technology as opposed to low-tech solutions. This approach, however, does break down for me in a few cases for me that I couldn't figure out to a satisfactory solution, yet. I wager, these breakdowns occur because I'm not "young enough" anymore so I had extensive experience of the low-tech even if I can clearly see and evaluate the advantage of the high-tech available here and now. Also, I wager, besides experience, emotions cloud my judgement. So I think I want to talk a bit about electronic and paper books, and electronic and paper journals.

Journal Entry
By: joelmontes@flickr

Books: Dead Tree or Bits

As a little kid practically I grew up in a library and I've always been reading a lot. I bought a lot of books: not the most among the people I know, but give it some time and I'll have a pretty decent library at home. Provided I don't move continents and pass on that library to a second-hand bookshop, again.

In Taiwan it is slightly more difficult to find books in a language I can read (in most likelihood it will be in English) than back in good ol' Hungary or England. Not impossible, there are plenty good ones, but two selves in a 2 or 3 level bookstore is not the greatest. I turned to e-books, then, and started to read a lot on my... phone. Using Aldiko on a HTC Desire is pretty good - surprisingly good, I'd say, given the small screen size. And since I bring it with me everywhere, it is very practical - for a few hours before the constant screen usage devours the battery. And it is only really good for text-only novels, for any of my research papers or comics it is... well, comical. Thus I was thinking of buying a Kindle DX. Not the smallest kindle, which I've seen and it's great, but wouldn't solve my research-paper problems.

But then I was really thinking: it's such a very different feeling to have a library of paper books and electronic ones. I can have tens of thousands of books on a hard drive easily and will never be able to really read all of them. Buying the same number of paper books would give me a stop, do I really want to read every single one or just too lazy to be selective? The whole experience is completely different, and somehow the practicality of the ebooks does not compensate me for the loss of control and inner "fuzzy feeling" (yeah, that's how well I can put my finger on it).

Resolution? The best would probably be if every book could be a bundle: paper + ebook for some premium on the price. Not double, but maybe 10%. Maybe even 25% I could justify and would opt for the bundle instead of either of them on their own. Provided the ebook is DRM free, of course. But until then, I think I will just keep my paper copies no matter how large they are....
And as for the Kindle... I guess I will procrastinate a little while longer (since I don't need it, "merely" want it) and will buy it on an impulse later.

Journals: stored in attic or in backup

Daily journals also have a lot of emotions attached. Paper journals are really good to write, and sometimes even to read back. They tell a lot about the person, and give a lot of freedom how to use them. On the other hand, typing being my main way of putting words down, I don't trust my hands (and the cramps in them) to produce something consistently legible. Searching for things is also near impossible unless reading through a lot of material.

On the other hand, I set up a personal wiki not long ago, and it's just a matter of setting up the right pages and there am I, typing away the days events and observations, with linking to the relevant people and things, with search, with backup, with legibility.... and somehow with less involvement. I don't think one could possible write a journal on paper and then transfer it to the computer, at least I don't think there's good enough handwriting recognition to handle that yet.

Resolution? In the meantime I started to write it in the wiki, thinking that things written down in any format are better than not written down at all. And hoping that the sad feeling that by typing instead of writing I lose something, will diminish with time...


It is telling, however, that there are so many things that I don't mind being changed. Don't mind internet radio instead of normal radio or live performance as my main source of music. No matter how close journals and daily planners are, I don't mind using just Google Calendar for all my events, even if that means I've lost a lot of colorful notes that I had before in my planners. Don't mind sending emails most of the time even for close friends, and only the occasional snail-mail.

Maybe I'm just making things too difficult for myself. But I don't want to lose all the personality I have in exchange of convenience. At least there has to be a threshold.

Monday 17 January 2011

Facebook Hacker Cup Round 1A, that wasn't

From earlier blog posts it is clear that I like programming contests. Since I spend so much (too much?) time on Facebook, I was also looking forward to Hacker Cup 2011. Last week I accidentally ended up qualified for the first proper round (accidentally, because I had other engagements for the whole qualifier weekend, and my submission was late, but accepted). So I spent a while during the week preparing. Checked out the proper time zones (as Facebook events don't care about time zones at all, which has a number of awkward results for global events....), did a bit of practice on Coderloop, that sort of thing.

This "Round 1" supposed to have 3 sub-rounds to cater for the global audience, and those worked out to be Saturday night 11pm-2am, Sunday morning 5am-8am and Monday morning 5am-8am for me in Taiwan. Saturday evening went to a wedding a few hundred km away, so didn't expect to be back on time, but arrived home around halfway the first sub-round. Got a coffee (at midnight, that's livin' on the edge:) and set out to check the problem sets. I though maybe I qualify and then don't have to stay up, but even if I don't (since everyone else had a 1.5h lead) it's good for practice. I finished one of the problems and wanted to submit it, but it didn't work. Checked out the Facebook wall, and the organizers were saying "we know there's something wrong, just keep trying and it'll work". I tried a while longer, but then I noticed a newer announcement: they ended the round 20 minutes early because of the high number of people having problems, and I should stick around to see whether there will be 2nd and 3rd sub-round.... 2am, 3am came and passed, of course everyone's complaining on the Hacker Cup wall, and then came the decree: all other subrounds postponed at least until next week. Not very nice, considering people do make plans for things that are announced months in advance. Also, the story of the shoddy organization of the event does not end here, but a bit more about this later.

After sleeping off the slight disappointment and the lot of coffee, today I wanted to finish the other problem sets, since one can always learn from these and sometimes I just cannot rest until something is "done". That was basically the better part of today, and here's what I've learned.

*** Spoiler alert ***

Problem 1: After the dance battle

Can find the problem description on the practice page, clicking on the name on the left. That description annoyed me to no end. Maybe I'm a bit unfair because TopCoder also like to write problem sets that read more like fiction. But they at least make sure that every relevant thing is mentioned and everything is consistent. Here? No sign of that. Eg. two thing that stands out:
  1. the example input already violates the valid input formatting (the N number of test cases should be between 10 and 50, while in the test it is 5. This is not a problem that should break anybody's reasonable solution, but looks careless.
  2. the problem is described as taking place on a square lattice, and one moves between "adjacent" squares. There's never a definition of that adjacency. I could reasonably imagine a setting when squares diagonally to each other (i.e. the NE, NW, SW, and SE directions) are "adjacent". The only reason I know this is not the case here and only shared sides make squares adjacent is because otherwise my results for the test cases are different. Careless again.
Nevertheless, what is the puzzle? In a nutshell, one has to set up a graph of the maze: the squares as nodes, bi-directional and the connections are complicated by the "colors" (i.e. portals). I took some inspiration from the essay Python Patterns - Implementing Graphs. The next and final step is then finding the shortest path between the Start and Exit. It quickly turned out that the function in that essay is waaaay to slow for this. So go back to the basics, do a little implementing, one of the most commonly used (I think) method, Dijkstra's algorithm. Using the pseudo-code from that Wikipedia page, and checking out a couple of other sites to clarify the meaning of "relaxing" in the algorithm, it was easily done with Python lists and dictionaries...

def dijkstra(graph, start, end):
    Dijkstra's algorithm, based almost entirely on the pseudo-code
    in wikipedia.
    dist = {}
    prev = {}
    for node in graph.keys():
        dist[node] = float('inf')
        prev[node] = None
    dist[start] = 0
    queue = graph.keys()
    while len(queue) > 0:
        node = sorted(queue, key=lambda val: dist[val])[0]
        # Since just looking for the path from start to end
        if (node == end):
        if dist[node] == float('inf'):
        for neighbour in graph[node]:
            if (neighbour in queue):
                alt = dist[node] + 1
                if alt < dist[neighbour]:
                    dist[neighbour] = alt
                    prev[neighbour] = node
    # Find the number of steps between start and end
    steps = 0
    node = end
    while prev[node]:
        steps += 1
        node = prev[node]
    return steps

Probably it is very inefficient, but still finishes quicker than I can say "Dijkstra". And it is done, just loop through the test cases. The complete program, with the graph setup and the rest is in my github repo.
(Also, see update at the end of the post.)

Problem 2: Power Overwhelming

This one has a bit more story to it: apparently FB's own programmers messed it up and the output of the test cases was wrong in 3 times out of 5... They took it off from the practice page too, but someone posted it as a question on Stackoverflow, so there you can read the setting.

It is basically an integer programming algorithm. Most people outside of programming but still somewhat mathematically inclined would think that "integers are easier than real numbers". That is not true for optimization, where integers force another constraint on the problem that solvable mostly by brute force search.

Anyway, even if the problem is taken off from the list, I was still curious of the solution. I banged my head in the wall for quite a while because I couldn't find an good mathematical formula to solve the problem (the real number solution would be trivial). "If you first don't succeed, try-try again." But sometimes should just stop and step back. When I found the said Stackoverflow page, that's when it dawned on me: I was right there at the solution but just didn't accept it: if there's no other way to optimize, then do a brute force search. Duh. So there it is, the method:
Find the optimal solution in case of real numbers. Vary the values in that neighbourhood in a small, but big enough range and choose the best result.

I'm still not completely satisfied because I'm not 100% sure I have the global best solution, but it should be. The original test set's numbers make this harder to check. They chose "long" integers on purpose to make the problem harder to tackle. Anyway, my results are also in the github repo, and for the original test input
15 10 658931394179
92 37 304080438521
39 100 972826846685
24 89 306549054135
64 16 254854449271
my results are:

Please let me know if there's a mistake (#1,3,4 are different from the FB original, but I checked that my results are better than the original answers.)

Problem 3: First or Last

This was the problem I originally solved, it is shown on the practice page, click on the name on the left.

Not sure how anybody else solved it, but I used two insights that I gained recently while working on other puzzles:
  1. If the relevant inputs and outputs are integers, try to avoid floating point arithmetic if possible (i.e. no division). This will eliminate the possibility of rounding errors.
  2. Sorting is extremely useful and can answer a lot of questions if one chooses the right sorting criteria.
Yes, this puzzle can be easily solved by sorting: when comparing two segments of the race track, order them such that if one overtakes in the "first" segment and drives normally in the "second", then the chance of survival is higher than the other way around. Say, the chance of crashing written as "p", as 1/p that one crashes, then  (p-1)/p is the chance that one survives. If the first segment has chances "a" and "b" to crash while overtaking and driving normally, while another segment has "x" and "y", then one has to compare (where ??? means smaller, equal or larger):
(a-1)/a * (y-1)/y ??? (b-1)/b * (x-1)/x
which can be reorganized to get rid of the divisions as
(a-1) * b * x * (y-1) ??? a * (b-1) * (x-1) * y

After this sorting there's just a matter of choosing the number of segments where one has to overtake to get to the lead and calculate the overall probability of survival on this (obviously very dangerous track).

There's one more catch, though, this problem was made more complicated as well, but in the good aim of making the solution unique: the results have to be returned as reduced fractions... At this point I getting ready to get down to some greatest common divisor magic and such, but Python comes to the rescue again: the fractions module has everything I needed...

So, this code is also in the github repo.


I really don't want to risk getting myself excited about the re-do of this round, but I know my curiosity will get the best of me and I'll be there no matter what time zone it will be or what new ambiguous problem setting FB will come up with. But I also know, that now I'll look out for the Google Code Jam even more, somehow they feel much more thorough.

But first, some sleep... :P

Update: Actually that Dijkstra's algorithm I put here for Problem 1 is a bastardized version, a stupid cross that is neither breadth-first search (since it's using weights, just all of them are equal to 1), nor Dijkstra (since all weights are fixed)... Should better rewrite it. The Algorithm Design Manual actually suggests other methods if the problem is path-search (or "motion-planning") like here, but without much detail. I guess all of that is an overkill for this situation: a breadth-first search should be just fine and simplify things. I don't regret that I did Dijkstra first, though. This is a learning process...