Thoughts on implementing 'unsort'

The history of Computer Science seems at times to be the history of sorting algorithms. It is one of the fundamental processes and a lot has been written about it. Creating new and obscure sorting algorithms is almost a rite of passage for any serious computer scientist.

Today, however, I was looking for what is essentially its opposite - an 'unsort' algorithm. What? Am I mad? What possible use could that be, and isn't a 'shuffle' just what I want?

Shuffle is my current closest approximation but it's too random and doesn't give me what I want which is that:

  1. Similar items should be spread apart
  2. Disimilar items can be put together

When written like this, it's a bit clearer that this is some sort of anti-sort.

For example take the list [3,2,7,2,9,3,9,2]. We can easily sort this and end up with [2,2,2,3,3,7,9,9] where we can clearly see that similar items get grouped together. But what I want to achieve is something like [2,9,3,7,2,9,3,2] where the similar items have been spread apart.

Now, I fully realise that this is undoubtably an incomplete specification and that there are any number of ways to do this depending on how we measure 'spread'.

Here's one attempt: the spread is the sum of the 'distances' of each element to each other similar element later in the list. Where 'distance' is simply the difference in the indexes of the two elements in question.

So, for [2,2,2,3,3,7,9,9], the spread would be

(1 + 2 + 1) + (1) + (1) = 6

where the first entry in parentheses measures the spread of the 2s, the second that of the 3s and the third that of the 9s.

On the other hand, for [2,9,3,7,2,9,3,2], the spread would be

(4 + 7 + 3) + (4) + (4) = 22

where the first groups are again the 2s, the second is again the 3s and the final group is again the 9s. By this measure, this second ordering is definitely more spread out.

I think the optimal 'unsort' algorithm could seek to maximise this number.

I haven't (yet!) written any code to try and create this algorithm, but that's on my list of things to do. My first task is to find the optimal 'unsorting' of this example and see how good my intuition might be.