Back
Here is a question I was asked on an interview of one half-hardware,
hard-software company. The original question was not as formal as this.
The solution is pretty straightforward if you think/draw a bit.
1) Given this device,
_______
a1 -->| |
| |--> max(a1, a2)
a2 -->|_______|
use several such devices to compose a more complex device:
_________________________
| |
a1 -->| |
| |
a2 -->| |
| |--> max(a1, a2, a3, a4)
a3 -->| |
| |
a4 -->| |
|_________________________|
Think about using the minimal number of sub-devices. Try also to optimize
the execution time of this super-device.
2) Given the device you just created for max(a1, a2, a3, a4) use several
such devices (either receiving 2 inputs or 4 inputs) to build the device
receiving 8 inputs, i.e. max(a1, a2, a3, a4, a5, a6, a7, a8).
3) Generalize the solution for any 2**N. What is the execution time of a
super-device and how many sub-devices are needed to compose a super-device.
To find the max of 2^N inputs, you can use 2^N-1 two-input max operations,
operating in time proportional to N. Just arrange the max operations in a
tree. E.g. the solution for the 2^N = 4 is shown below. To go to 2^N=8,
just use two copies of the 4-solution in parallel, and add another
two-input max to take the maximum of the 2 halves. The extension to
general 2^N is straightforward.
a1 \
> max(a1,a2)
/ \
a2 \
> max(a1,a2,a3,a4)
/
a3 \ /
> max(a3,a4)
/
a4
Hmm, did I write "max" function? I am not sure why I did this. :)
The actual question was about sort(). So, please replace the schemes
with the following, the rest is the same.
_______
a1 -->| |--> s1 = max(a1, a2)
| |
a2 -->|_______|--> s2 = min(a1, a2)
use several such devices to compose a more complex device:
_________________________
| |
a1 -->| |--> s1 = max(a1, a2, a3, a4)
| |
a2 -->| |--> s2
| |
a3 -->| |--> s3
| |
a4 -->| |--> s4 = min(a1, a2, a3, a4)
|_________________________|
The interesting thing is to extend this to the case of 8 inputs when you
have a time pressure, i.e. should think/draw/speak in real time. :)
Sounds like a merge sort to me.
For 4 inputs, you can do this: (i# indicates an intermediate solution)
blackbox(a1,a2) -> i1, i2
blackbox(a3,a4) -> i3, i4
blackbox(i1,i3)-> s1, i5
blackbox(i2,i4)-> i6, s4
blackbox(i5,i6)-> s2,s3
For 8, it's a bit more complicated, but it's still effectively 2 large
boxes (to sort into 2 lists of 4) followed by a series of merges.
If the first '4' box gives i1-i4 and the second i5-i8, then you have these
comparisons:
i1,i5 -> s1 & i9
i2,i6 -> i10 & i11
i3,i7 -> i12, i13
i4,i8 -> i14, s8
i9, i10 -> s2, i15
i11,i12 -> i16, i17
i13,i14 -> i18, s7
i15,i16 -> s3, i19
i17,i18 -> i20, s6
i19,i20 -> s4,s5
And so on.
tomc: yes, merge sort is the way to go for optimal execution.
I initially tried bubble sort just to see how it works, it is not optimal.
However, your solution for 8 inputs/outputs is not the best one. :)
Hint: the solution should be nice and not complicated.
Here is the solution for 4 inputs/outputs in ascii art:
_______ _______
a1 -->| |--------->| |---------------------> s1
| D1 | | D1 | _______
a2 -->|_______|---. ,--->|_______|--------->| |--> s2
_______ X _______ | D1 |
a3 -->| |---' `--->| |--------->|_______|--> s3
| D1 | | D1 |
a4 -->|_______|--------->|_______|---------------------> s4
My implementation for the merge was intentionally done not using the
derived 4-box function. You can do so, sticking with the 'recursive'
model, but I think you'll find that is, to some degree, less optimal.
If you wish to use a big box (derived 4-box) in the solution, you can once
you've reduced to 4 inputs. I would argue that's less efficient however,
not more, since the reduced solution uses some pre-sorted known properties
to skip the first 'step' of the 4 box solution. A simlar type of
optimization may be possible on the front end of my solution, I didn't look
in detail at that piece.
Let's call the device with 2^N inputs/outputs "DN".
Tom: now that I think about it, you are right, we may save both time and
the number of D1 devices if we use your solution. I think we may even start
with D1 and not to use D2 at all, but I don't have a time now to verify
whether we will end up with your solution (20 D1 used) or it is possible to
improve it. Anyway, it is better than 25 D1 used in the original solution.
However the original solution is the best if you count any subdevice as one
and not just the smallest. It is always possible to use only 5 subdevices
to implement DN. Here is the solution for D3, it pretty much mimics the
solution for D2.
_______ _______
a1 -->| |--------->| |---------------------> s1
| | | |
a2 -->| |--------->| |---------------------> s2
| D2 | | D2 | _______
a3 -->| |--. ,-->| |--------->| |--> s3
| | \ / | | | |
a4 -->|_______|--. X ,-->|_______|--------->| |--> s4
_______ X X _______ | D2 |
a5 -->| |--' X '-->| |--------->| |--> s5
| | / \ | | | |
a6 -->| |--' `-->| |--------->|_______|--> s6
| D2 | | D2 |
a7 -->| |--------->| |---------------------> s7
| | | |
a8 -->|_______|--------->|_______|---------------------> s8