A mathematical overview of UVA 855

Lunch in Grid City, or UVA 855, is a problem created for the Portuguese regional of the ACM-ICPC. The premise of the problem is that you are given a set of coordinates S=\{\mathbf{p}_1,\mathbf{p}_2,\dots \mathbf{p}_n\}, n<=50000 in a grid, and have to find a coordinate \mathbf{c} that minimizes the sum of the Manhattan distances between all points in S and \mathbf{c}. Formally:

\mathbf{c}=\displaystyle{\arg\!\min}_{p} \displaystyle\sum_{i=1}^{n}{|\mathbf{p}_{i_x}-\mathbf{p}_x|+|\mathbf{p}_{i_y}-\mathbf{p}_y|}

A naive approach would be to compute the distances of all points in S to all possible points in the grid, and pick up the point with smaller score. This has complexity \mathcal{O}(san), where s and a are the height and width of the grid, respectively. Since 1 \leq s,a \leq 1000, this approach would obviously not work.

Simplifying the definition

The summation that defines the score for a candidate \mathbf{p} can be broken down into two separate summations. This simplifies the problem and helps us to better analyse it:

\displaystyle\mathbf{c}={\arg\!\min}_{p} \sum_{i=1}^{n}{|\mathbf{p}_{i_x}-\mathbf{p}_x|} + \sum_{i=1}^{n}{|\mathbf{p}_{i_y}-\mathbf{p}_y|}

What this equation tells us is that we can solve the problem by separately optimizing each axis, since they are independent. Let’s focus on the x axis for the sake of clarity:

neighbors

Each component of p belongs to S

The problem specification gives a valuable hint: it gives the impression that \mathbf{p} \in S. Although this is not entirely true, we can prove that \mathbf{p}_x = \mathbf{p}_{i_x} | \mathbf{p}_i \in S and \mathbf{p}_y = \mathbf{p}_{j_y} | \mathbf{p}_j \in S.

Suppose that there is a potential solution \mathbf{p} such that \mathbf{p}_x=\mathbf{p}_{i_x} | \mathbf{p}_i \not\in S. The score on this particular dimension is given by:

\displaystyle\alpha={\!\min} \sum_{i=1}^{n}{|\mathbf{p}_{i_x}-\mathbf{p}_x|} (I)

Let k be the amount of points to the left of \mathbf{p}, and l the amount to its right. If we take the solution to be the closest point on its left, the new score would be:

\displaystyle\alpha'_l=\alpha-k\Delta x_l + l\Delta x_l=\alpha-(k-l)\Delta x_l ,

where \Delta x_l is the difference between the x coordinate from the closest point to the left and \mathbf{p}_x. Here we can notice that, if k \geq l, then \alpha'_l \leq \alpha, which means the point to the left would be preferred over our hypothetical solution. Moreover, if k < l, then picking up a point to the right would give us the new (and better) score

\displaystyle\alpha'_r=\alpha-l\Delta x_r + k\Delta x_r=\alpha-(l-k)\Delta x_r ,

where \Delta x_r is the difference computed by using the closest point to the right.

From this analysis, we conclude that our search space for \mathbf{p}_x is reduced to the x coordinates of all points \mathbf{p}_i \in S. The same argument can be made for \mathbf{p}_y (where its search space is the y component of all points in S).

The solution as a median

Let \mathbf{p}_x be the x value that minimizes the distances on the x axis. Its score \alpha is computed as given by (I). If we were to pick up the left neighbor of \mathbf{p}_x, the new score would be:

\alpha'_l=\alpha-k\Delta x_l + (l+1)\Delta x_l=\alpha-(k-l-1)\Delta x_l.

Since \alpha'_l>\alpha (otherwise, \mathbf{p}_x wouldn’t be a solution), we can say that:

\alpha-(k-l-1)\Delta x_l>\alpha,\\(k-l-1)\Delta x_l<0,\\k<l+1,\\k \leq l. (II)

This tells us that the amount of points by the left of \mathbf{p} should not be larger than the amount of points by its right. Another relation between $k$ and $l$ can be found by analyzing the score \alpha'_r obtained by picking up the point to the right of \mathbf{p}:

\alpha'_r=\alpha-l\Delta x_r + (k+1)\Delta x_r=\alpha-(l-k-1)\Delta x_r.

In this case, \alpha'_r \geq \alpha. By exploring this relation:

\alpha-(l-k-1)\Delta x_r \geq \alpha,\\ (l-k-1)\Delta x_r \leq 0,\\ k \geq l-1. (III)

By combining (II) and (III), we have that l-1 \leq k \leq l. If the total amount of points n is odd, then k=l, otherwise |l-k| would be greater than 1. In the other hand, if n is even, then k can’t be equal to l. Therefore, k=l-1.

In general terms, if you have a sorted collection of x coordinates in X, the coordinate \mathbf{p}_x that gives the best score α is given by

\mathbf{p}_x=X[{\lfloor\frac{n-1}{2}\rfloor}],

considering an indexation that starts at 0. This is essentially the median of the collection X. Once again, the same argument also applies to the y coordinates.

Final thoughts

We conclude that, in order to solve the Lunch in Grid City problem, we have to:

  • Create two different vectors, one with all x coordinates and another with all y coordinates of the given points in S (referred to as the list of friends in the problem description);
  • Sort each vector;
  • Return \langle X[{\lfloor\frac{n-1}{2}\rfloor}], Y[{\lfloor\frac{n-1}{2}\rfloor}] \rangle .

This solution has complexity O(n\log n), where the bottleneck is the sorting algorithm.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s