Dude does it in binary: Facebook Hacker Cup Round 1A, that wasn't

Monday, 17 January 2011

Facebook Hacker Cup Round 1A, that wasn't

From earlier blog posts it is clear that I like programming contests. Since I spend so much (too much?) time on Facebook, I was also looking forward to Hacker Cup 2011. Last week I accidentally ended up qualified for the first proper round (accidentally, because I had other engagements for the whole qualifier weekend, and my submission was late, but accepted). So I spent a while during the week preparing. Checked out the proper time zones (as Facebook events don't care about time zones at all, which has a number of awkward results for global events....), did a bit of practice on Coderloop, that sort of thing.

This "Round 1" supposed to have 3 sub-rounds to cater for the global audience, and those worked out to be Saturday night 11pm-2am, Sunday morning 5am-8am and Monday morning 5am-8am for me in Taiwan. Saturday evening went to a wedding a few hundred km away, so didn't expect to be back on time, but arrived home around halfway the first sub-round. Got a coffee (at midnight, that's livin' on the edge:) and set out to check the problem sets. I though maybe I qualify and then don't have to stay up, but even if I don't (since everyone else had a 1.5h lead) it's good for practice. I finished one of the problems and wanted to submit it, but it didn't work. Checked out the Facebook wall, and the organizers were saying "we know there's something wrong, just keep trying and it'll work". I tried a while longer, but then I noticed a newer announcement: they ended the round 20 minutes early because of the high number of people having problems, and I should stick around to see whether there will be 2nd and 3rd sub-round.... 2am, 3am came and passed, of course everyone's complaining on the Hacker Cup wall, and then came the decree: all other subrounds postponed at least until next week. Not very nice, considering people do make plans for things that are announced months in advance. Also, the story of the shoddy organization of the event does not end here, but a bit more about this later.

After sleeping off the slight disappointment and the lot of coffee, today I wanted to finish the other problem sets, since one can always learn from these and sometimes I just cannot rest until something is "done". That was basically the better part of today, and here's what I've learned.

*** Spoiler alert ***

Problem 1: After the dance battle

Can find the problem description on the practice page, clicking on the name on the left. That description annoyed me to no end. Maybe I'm a bit unfair because TopCoder also like to write problem sets that read more like fiction. But they at least make sure that every relevant thing is mentioned and everything is consistent. Here? No sign of that. Eg. two thing that stands out:

the example input already violates the valid input formatting (the N number of test cases should be between 10 and 50, while in the test it is 5. This is not a problem that should break anybody's reasonable solution, but looks careless.
the problem is described as taking place on a square lattice, and one moves between "adjacent" squares. There's never a definition of that adjacency. I could reasonably imagine a setting when squares diagonally to each other (i.e. the NE, NW, SW, and SE directions) are "adjacent". The only reason I know this is not the case here and only shared sides make squares adjacent is because otherwise my results for the test cases are different. Careless again.

Nevertheless, what is the puzzle? In a nutshell, one has to set up a graph of the maze: the squares as nodes, bi-directional and the connections are complicated by the "colors" (i.e. portals). I took some inspiration from the essay Python Patterns - Implementing Graphs. The next and final step is then finding the shortest path between the Start and Exit. It quickly turned out that the function in that essay is waaaay to slow for this. So go back to the basics, do a little implementing, one of the most commonly used (I think) method, Dijkstra's algorithm. Using the pseudo-code from that Wikipedia page, and checking out a couple of other sites to clarify the meaning of "relaxing" in the algorithm, it was easily done with Python lists and dictionaries...

def dijkstra(graph, start, end):
    """
    Dijkstra's algorithm, based almost entirely on the pseudo-code
    in wikipedia.
    """
    dist = {}
    prev = {}
    for node in graph.keys():
        dist[node] = float('inf')
        prev[node] = None
    dist[start] = 0
    queue = graph.keys()
    while len(queue) > 0:
        node = sorted(queue, key=lambda val: dist[val])[0]
        # Since just looking for the path from start to end
        if (node == end):
            break
        if dist[node] == float('inf'):
            break
        queue.remove(node)
        for neighbour in graph[node]:
            if (neighbour in queue):
                alt = dist[node] + 1
                if alt < dist[neighbour]:
                    dist[neighbour] = alt
                    prev[neighbour] = node
    # Find the number of steps between start and end
    steps = 0
    node = end
    while prev[node]:
        steps += 1
        node = prev[node]
    return steps

Probably it is very inefficient, but still finishes quicker than I can say "Dijkstra". And it is done, just loop through the test cases. The complete program, with the graph setup and the rest is in my github repo.
(Also, see update at the end of the post.)

Problem 2: Power Overwhelming

This one has a bit more story to it: apparently FB's own programmers messed it up and the output of the test cases was wrong in 3 times out of 5... They took it off from the practice page too, but someone posted it as a question on Stackoverflow, so there you can read the setting.

It is basically an integer programming algorithm. Most people outside of programming but still somewhat mathematically inclined would think that "integers are easier than real numbers". That is not true for optimization, where integers force another constraint on the problem that solvable mostly by brute force search.

Anyway, even if the problem is taken off from the list, I was still curious of the solution. I banged my head in the wall for quite a while because I couldn't find an good mathematical formula to solve the problem (the real number solution would be trivial). "If you first don't succeed, try-try again." But sometimes should just stop and step back. When I found the said Stackoverflow page, that's when it dawned on me: I was right there at the solution but just didn't accept it: if there's no other way to optimize, then do a brute force search. Duh. So there it is, the method:
Find the optimal solution in case of real numbers. Vary the values in that neighbourhood in a small, but big enough range and choose the best result.

I'm still not completely satisfied because I'm not 100% sure I have the global best solution, but it should be. The original test set's numbers make this harder to check. They chose "long" integers on purpose to make the problem harder to tackle. Anyway, my results are also in the github repo, and for the original test input

5
15 10 658931394179
92 37 304080438521
39 100 972826846685
24 89 306549054135
64 16 254854449271

my results are:

21964379805
1652611083
12472139015
6386438607
1991050385

Please let me know if there's a mistake (#1,3,4 are different from the FB original, but I checked that my results are better than the original answers.)

Problem 3: First or Last

This was the problem I originally solved, it is shown on the practice page, click on the name on the left.

Not sure how anybody else solved it, but I used two insights that I gained recently while working on other puzzles:

If the relevant inputs and outputs are integers, try to avoid floating point arithmetic if possible (i.e. no division). This will eliminate the possibility of rounding errors.
Sorting is extremely useful and can answer a lot of questions if one chooses the right sorting criteria.

Yes, this puzzle can be easily solved by sorting: when comparing two segments of the race track, order them such that if one overtakes in the "first" segment and drives normally in the "second", then the chance of survival is higher than the other way around. Say, the chance of crashing written as "p", as 1/p that one crashes, then (p-1)/p is the chance that one survives. If the first segment has chances "a" and "b" to crash while overtaking and driving normally, while another segment has "x" and "y", then one has to compare (where ??? means smaller, equal or larger):

(a-1)/a * (y-1)/y ??? (b-1)/b * (x-1)/x

which can be reorganized to get rid of the divisions as

(a-1) * b * x * (y-1) ??? a * (b-1) * (x-1) * y

After this sorting there's just a matter of choosing the number of segments where one has to overtake to get to the lead and calculate the overall probability of survival on this (obviously very dangerous track).

There's one more catch, though, this problem was made more complicated as well, but in the good aim of making the solution unique: the results have to be returned as reduced fractions... At this point I getting ready to get down to some greatest common divisor magic and such, but Python comes to the rescue again: the fractions module has everything I needed...

So, this code is also in the github repo.

Afterword

I really don't want to risk getting myself excited about the re-do of this round, but I know my curiosity will get the best of me and I'll be there no matter what time zone it will be or what new ambiguous problem setting FB will come up with. But I also know, that now I'll look out for the Google Code Jam even more, somehow they feel much more thorough.

But first, some sleep... :P

Update: Actually that Dijkstra's algorithm I put here for Problem 1 is a bastardized version, a stupid cross that is neither breadth-first search (since it's using weights, just all of them are equal to 1), nor Dijkstra (since all weights are fixed)... Should better rewrite it. The Algorithm Design Manual actually suggests other methods if the problem is path-search (or "motion-planning") like here, but without much detail. I guess all of that is an overkill for this situation: a breadth-first search should be just fine and simplify things. I don't regret that I did Dijkstra first, though. This is a learning process...

the example input already violates the valid input formatting (the N number of test cases should be between 10 and 50, while in the test it is 5. This is not a problem that should break anybody's reasonable solution, but looks careless.
the problem is described as taking place on a square lattice, and one moves between "adjacent" squares. There's never a definition of that adjacency. I could reasonably imagine a setting when squares diagonally to each other (i.e. the NE, NW, SW, and SE directions) are "adjacent". The only reason I know this is not the case here and only shared sides make squares adjacent is because otherwise my results for the test cases are different. Careless again.

def dijkstra(graph, start, end):
    """
    Dijkstra's algorithm, based almost entirely on the pseudo-code
    in wikipedia.
    """
    dist = {}
    prev = {}
    for node in graph.keys():
        dist[node] = float('inf')
        prev[node] = None
    dist[start] = 0
    queue = graph.keys()
    while len(queue) > 0:
        node = sorted(queue, key=lambda val: dist[val])[0]
        # Since just looking for the path from start to end
        if (node == end):
            break
        if dist[node] == float('inf'):
            break
        queue.remove(node)
        for neighbour in graph[node]:
            if (neighbour in queue):
                alt = dist[node] + 1
                if alt < dist[neighbour]:
                    dist[neighbour] = alt
                    prev[neighbour] = node
    # Find the number of steps between start and end
    steps = 0
    node = end
    while prev[node]:
        steps += 1
        node = prev[node]
    return steps

5
15 10 658931394179
92 37 304080438521
39 100 972826846685
24 89 306549054135
64 16 254854449271

my results are:

21964379805
1652611083
12472139015
6386438607
1991050385

If the relevant inputs and outputs are integers, try to avoid floating point arithmetic if possible (i.e. no division). This will eliminate the possibility of rounding errors.
Sorting is extremely useful and can answer a lot of questions if one chooses the right sorting criteria.

(a-1)/a * (y-1)/y ??? (b-1)/b * (x-1)/x

which can be reorganized to get rid of the divisions as

(a-1) * b * x * (y-1) ??? a * (b-1) * (x-1) * y

comments powered by Disqus.

blog comments powered by Disqus

Dude does it in binary

Monday, 17 January 2011

Facebook Hacker Cup Round 1A, that wasn't

Other geekheads

My CompuSphere

Labels

Blog Archive