Wow, this is a low-activity forum. I did suggest something more friendly, like Numenta's forum.

Anyway, if you have neurons that each take 2 inputs, what is the optimal way to connect them?

It's a basic question in itself. I draw an answer from my experience with the Walsh Hadamard transform.

For an input vector whose length n is an integer power of 2, step through the input elements sequentially, one pair at a time.

Have two neurons act on each pair of elements. Put the output of the first neuron sequentially in the lower half of a new array, and the output of the second neuron sequentially in the upper half. Repeat, using the new array as input. After log_base_2(n) repetitions, a change in a single element of the original input can affect all n outputs. That is the best you can do with 2-input neurons, since each layer can at most double the number of inputs any one output depends on.
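Here is a minimal Python sketch of that wiring, in case it helps. The names pair_layer, pair_network and wht_layers are just made up for illustration. The neurons are plain 2-input functions, and as an assumed stand-in for trained neurons I use fixed sum and difference functions, which turns the whole thing into an unnormalized Walsh Hadamard transform.

```python
def pair_layer(x, neurons):
    """Apply one layer. neurons[i] is a pair (f, g) of 2-input functions
    that both act on the i-th input pair (x[2i], x[2i+1])."""
    n = len(x)
    half = n // 2
    out = [0] * n
    for i in range(half):
        a, b = x[2 * i], x[2 * i + 1]
        f, g = neurons[i]
        out[i] = f(a, b)         # first neuron goes to the lower half
        out[half + i] = g(a, b)  # second neuron goes to the upper half
    return out


def pair_network(x, layers):
    """Feed the output array of each layer into the next one."""
    for neurons in layers:
        x = pair_layer(x, neurons)
    return x


def wht_layers(n, depth=None):
    """Illustrative fixed 'neurons': sum and difference on every pair.
    With depth = log2(n) this computes an unnormalized Walsh Hadamard
    transform. n must be a power of 2."""
    if depth is None:
        depth = n.bit_length() - 1  # log2(n) for a power of 2

    def add(a, b):
        return a + b

    def sub(a, b):
        return a - b

    return [[(add, sub)] * (n // 2) for _ in range(depth)]


if __name__ == "__main__":
    impulse = [1, 0, 0, 0, 0, 0, 0, 0]
    # Full depth: the single nonzero input reaches every output.
    print(pair_network(impulse, wht_layers(8)))
    # One layer short: only half of the outputs are reached.
    print(pair_network(impulse, wht_layers(8, depth=2)))
```

Swapping the sum and difference functions for small trainable 2-input units keeps the same connectivity.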

Also this guy has a way to avoid most of the multiplications for deep neural networks:

That would allow a big reduction in the chip area and power consumption needed for specialized neural network chips.

I already showed multiply-free random projections using recomputable random sign flipping and the fast Walsh Hadamard transform.

Those random projections can easily be converted to a Locality Sensitive Hash (LSH) by binarization.
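A rough Python sketch of both steps, under my own assumptions: the sign pattern is recomputed from a seed with Python's random module instead of being stored, the transform is left unnormalized, and the function names are made up for illustration. Only negation, addition and subtraction touch the data.

```python
import random


def fwht(x):
    """Unnormalized fast Walsh Hadamard transform.
    len(x) must be a power of 2. Uses only additions and subtractions."""
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x


def random_projection(x, seed):
    """Flip signs with a pattern recomputed from the seed, then apply the
    transform. The same seed always gives the same projection."""
    rng = random.Random(seed)
    flipped = [v if rng.random() < 0.5 else -v for v in x]
    return fwht(flipped)


def lsh_bits(x, seed):
    """Binarize the projection: one bit per output, viewed as +1 or -1."""
    return [1 if v >= 0 else -1 for v in random_projection(x, seed)]


if __name__ == "__main__":
    x = [3, -1, 4, 1, -5, 9, 2, -6]
    print(random_projection(x, seed=7))
    print(lsh_bits(x, seed=7))
```

Different seeds give different projections of the same input, so you can get more hash bits by running it again with another seed.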

LSH can be used to make an associative memory: weight the bit outputs of the LSH (viewed as +1 or -1) and sum them to get the output value.

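To train, recall the stored value and find the error (target minus recalled value). Divide the error by the number of bits, then add that term to each weight whose bit is +1 and subtract it from each weight whose bit is -1. That drives the error for that input to zero.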
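The recall and training rule can be sketched in Python as follows. This is only an illustration under assumptions: a scalar output, a single seeded sign flip plus Walsh Hadamard transform as the hash, and a made-up class name AssociativeMemory. The input length must be a power of 2 and match the number of weights.

```python
import random


def fwht(x):
    """Unnormalized fast Walsh Hadamard transform (power-of-2 length)."""
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x


def lsh_bits(x, seed):
    """Seeded random sign flip, transform, then binarize to +1 or -1."""
    rng = random.Random(seed)
    flipped = [v if rng.random() < 0.5 else -v for v in x]
    return [1 if v >= 0 else -1 for v in fwht(flipped)]


class AssociativeMemory:
    """Hypothetical scalar-output memory: one weight per hash bit."""

    def __init__(self, n, seed=0):
        self.seed = seed
        self.weights = [0.0] * n

    def recall(self, x):
        # Weighted sum of the bits: add the weight for +1, subtract for -1.
        bits = lsh_bits(x, self.seed)
        return sum(w if b > 0 else -w for w, b in zip(self.weights, bits))

    def train(self, x, target):
        bits = lsh_bits(x, self.seed)
        error = target - self.recall(x)
        step = error / len(bits)  # share the error equally over all bits
        for i, b in enumerate(bits):
            if b > 0:
                self.weights[i] += step
            else:
                self.weights[i] -= step


if __name__ == "__main__":
    mem = AssociativeMemory(8, seed=42)
    x = [0.5, -1.0, 2.0, 0.0, 3.0, -0.25, 1.5, -2.0]
    mem.train(x, 3.0)
    print(mem.recall(x))
```

One update zeroes the error for that input only. Storing several items means cycling through them a number of times, since each update also shifts the recall for other inputs that share bits.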

You can reduce the amount of gating required to combine that associative memory with neural networks by borrowing an idea from the null device in Linux (/dev/null).

Data you don't want to remember can be written to a junk area of the vector input address space. To recall nothing, you can read from a zero area or from a random area, with the random area very likely returning a nonsense value close to the zero vector.

Going further, you can use LSH to switch different blocks of memory into the associative memory in various ways, making huge associative memory systems that are still relatively fast and still generalize well enough.