0 Replies Latest reply on Jan 24, 2009 8:24 AM by elgersmad

    CPUs ANDing in Assembly and what could be better

    elgersmad
      More Efficient IF Statements Performed by Boolean Gate Logic in Programming

      Hello,

      I know what kind of change could really speed up CPUs and code execution. If you look at any programmer's code, the most frequent Assembly Language macro that pops up is the IF statement.

      I go back to this again and again, and still I have never gotten through to the real Chip Design Engineers. If I did, MMX, 3DNow!, and Hypertransport would not be the sum of the work of my suggestions. After MMX, I got hacked off at Intel, and made sure AMD would have the fastest.

      If you take all of your basic logic gates, AND, OR, NAND, NOR, XOR, and XNOR, and implement them via the CPU, you can make the computer do a lot more, a lot faster. People always tell me it already does that. No, it doesn't. What it does today is take two bytes and AND the ones and zeros bit by bit, like this:

      11110000
      AND
      10101010

      Result
      10100000
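
      For reference, the conventional bit-by-bit AND shown above can be written in C as a minimal sketch. The function name bitwise_and8 is only an illustration and does not come from any real library.

```c
#include <stdint.h>

/* Conventional bitwise AND: each result bit is the AND of the two
   input bits in the same position. The name is hypothetical. */
uint8_t bitwise_and8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a & b);
}
```

      Calling bitwise_and8(0xF0, 0xAA), which is 11110000 AND 10101010, gives 0xA0, which is 10100000.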

      The CPU should take in all 64 bits from one memory location and AND them together. If it does not test whether all 64 bits are equal to 1 and set an accessible flag with the result, it is not doing what I said, or meant. The ALU would have 2 types of AND and 2 types of OR: one is the past convention, and the new one is My Convention. All are simply 64 input, and set a flag in the ALU. Here is an 8 bit example of how the gate should sit in line on the buffer and produce an output.
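
      The behaviour of the proposed gate can be modelled in C as a rough sketch. This only imitates the logic in software; it says nothing about how many clock cycles a hardware version would take. The names pand64 and pand8 are hypothetical, and the boolean return value stands in for the ALU flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the proposed 64 input AND gate (hypothetical name).
   The return value plays the role of the flag: true only when every
   one of the 64 bits in the word is 1. */
bool pand64(uint64_t word)
{
    return word == UINT64_MAX;
}

/* The same idea for the 8 bit example. */
bool pand8(uint8_t word)
{
    return word == 0xFF;
}
```

      With this model, pand8(0xFF) is true, while pand8(0xA0) is false because some of the eight inputs are 0.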

      Gate Implementation on the 8 or 64 bit CPU Internal Buffer

      Exactly where in the CPU this Command should be Added

      Now, I've looked around, and if this flag were active when > (greater than), < (less than), or = (equals) were true, quite often there is already a data latch being used as a zero flag or a carry flag that can be used when these commands come along. The command shouldn't leave any results in that register by any means. If one set of commands doesn't interfere with another, there's no reason to rewire the whole chip.

      It's not the point that you can use Assembly Language to produce a gate of this sort. That uses up clock cycles and is nowhere near the same.

      It would only require 3 clock cycles to set up a 2 input AND gate. It would also only require 3 CPU clock cycles for a 64 input AND gate. It's just a bit-mask.

      Converting C to Assembly Language

      More Decompiling and Comparing

      You'll never really see how bad it is until you have 10 programmers competing for speed on a project built around specific Base N counters, where the first digit is base 2, the second digit is base 10, etc. That is when you'll see all of the problems and all of the errors.

      How will it ever come down to the most efficient code?

      Well, with a parallel input AND gate that's 64 bits wide and has a flag, I could check the states of a bunch of IF statements with only 1 global variable, and bits and pieces of inputs can be stretched throughout my code. If two things can't happen without a conflict or an error, I have exclusive OR. They are not showing that they have the best way to handle long strings and variables over 64 bits. Sometimes, you are not constructing a spell checker.
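
      As a sketch of the single global variable idea, assuming each piece of code owns one bit: the bit names, g_status, and the helper functions below are all hypothetical and only illustrate the bookkeeping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, one per piece of code. */
enum { LOOP_DONE = 0, VARS_READY = 1, STRINGS_COLLECTED = 2 };

static uint64_t g_status = 0;   /* the one global variable */

/* Any piece of code can record that its part is finished. */
void mark_done(unsigned bit) { g_status |= (UINT64_C(1) << bit); }

void reset_status(void) { g_status = 0; }

/* One test replaces a chain of IF statements: true only when every
   bit named in 'required' has been set. */
bool all_done(uint64_t required)
{
    return (g_status & required) == required;
}

/* Exclusive OR check for two events that must not both happen:
   true when exactly one of the two bits is set. */
bool exactly_one(unsigned a, unsigned b)
{
    return (((g_status >> a) ^ (g_status >> b)) & 1u) != 0;
}
```

      After mark_done(LOOP_DONE), mark_done(VARS_READY), and mark_done(STRINGS_COLLECTED), the call all_done(0x7) reports true in a single comparison.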

      You could use a base 10 incremented counter to do almost the same thing. But in reality you need to be able to tell which of the functions you may need to jump back to. Yes, a counter shows whether the full sum of events has taken place, but knowing which specific one is missing takes another loop and a test of variables and possibly strings. There's no simple solution.

      In a lot of places and in a lot of ways, you'll find that it's interchangeable with specific jump instructions in Assembly Language, and you can get a simple 2 or 3 condition something to happen with the same number of CPU clock cycles. But when the instructions become long and the number of arguments lengthy, knowing which of them is not done, and where to go back to, is much faster than clearing all of the data and going back over all of it, or writing in all of the error codes that the machine must execute.

      Yes, the logic is nodal. But try some really long programs, and implementing the logic gates as a function of the ALU my way. You'll see: if you know which piece of code is associated with the first 1 or 0, you check a much smaller set of up to 64 bits, or inputs, that really say whether you've completed all of these loops, have all of those variables, and collected all of those strings. A spell checker can count words and evaluate memory space with a base 10 counter. But when it's all invisible, and you want fast execution, you don't want the loop to start at the beginning if you know it shouldn't have to, and it would be nice if it only took 3 clock cycles to figure that out and know where to go back to.
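
      Knowing where to go back to can be sketched in C as finding the first required bit that is still 0. This is only a software illustration under the assumption that bit N belongs to piece of code N; the name first_missing is hypothetical, and the loop is written for clarity rather than speed.

```c
#include <stdint.h>

/* Returns the index of the lowest required bit that is still 0 in
   'status', or -1 when every required bit is already 1.
   Hypothetical helper: bit N is assumed to belong to code piece N. */
int first_missing(uint64_t status, uint64_t required)
{
    uint64_t missing = required & ~status;   /* 1 = not done yet */
    for (int bit = 0; bit < 64; ++bit) {
        if ((missing >> bit) & 1u)
            return bit;
    }
    return -1;
}
```

      For a status of 1011 in binary with all four low bits required, first_missing(0x0B, 0x0F) returns 2, so only the code tied to bit 2 has to run again.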

      Not only that, you can apply De Morgan's Theorem and Boolean algebra to reduce the number of instructions around the loops and important pieces of code. You can even evaluate which is really critical to the program in a whole new manner: a language of sets, derived from the algebra of sets, to indicate the presence of code in expanded memory with just a stroke. Outputs can be attached to pointers and jump instructions.
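
      De Morgan's Theorem applied to a 64 input gate says that NOT (all inputs are 1) is the same as (at least one inverted input is 1). A small C sketch, with hypothetical function names, shows the two forms side by side:

```c
#include <stdbool.h>
#include <stdint.h>

/* NAND over all 64 inputs: the negation of "every bit is 1". */
bool nand_all(uint64_t word)
{
    return !(word == UINT64_MAX);
}

/* De Morgan equivalent: OR over all 64 inverted inputs,
   meaning "at least one bit of the complement is 1". */
bool or_of_inverted(uint64_t word)
{
    return (~word) != 0;
}
```

      The two functions agree for every input word, so whichever form needs fewer instructions around a loop can be used.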

      But, the way the Boolean Functions presently exist in the CPU doesn't allow you to use a Bit-mask for the 64 bits, and choose to AND all 64 bits as if they were just all of the inputs of one AND Gate. If I had to stack up AND functions of the ALU to have 64 working inputs, or 8, I would have an 8 by 8 grid, or 64 8 input AND Gates. That wouldn't be efficient. I realize that those functions are there, but they will not increase code execution speeds based upon how the ALU Implements them.

      This new style could destroy IF in every sense in nested IF statements and nested loops, except for use in spell checkers and file comparisons.

      For the sake of accepting previous conventions of Assembly Language and C writing styles, I would suggest that these commands place a P in front, representing a Parallel input logic gate: PAND, POR, PNOR, PNAND, PXOR, and PNOT, which is 2's complement. Equals, greater than, and less than only set the flag as true when called. If the very next Assembly Language command doesn't use it, the flag is reset.
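
      The gate commands named above could be modelled in C roughly as follows. This is a sketch under stated assumptions: the bit-mask argument selects which of the 64 bits count as inputs, the return value stands in for the flag, and PXOR is assumed to mean odd parity of the selected inputs, which the post does not spell out. None of these functions exist in any real instruction set or library.

```c
#include <stdbool.h>
#include <stdint.h>

/* PAND: flag is set when every selected input is 1. */
bool pand(uint64_t word, uint64_t mask) { return (word & mask) == mask; }

/* POR: flag is set when at least one selected input is 1. */
bool por(uint64_t word, uint64_t mask) { return (word & mask) != 0; }

/* PNAND and PNOR are the inverted flags. */
bool pnand(uint64_t word, uint64_t mask) { return !pand(word, mask); }
bool pnor(uint64_t word, uint64_t mask)  { return !por(word, mask); }

/* PXOR (assumption: odd parity): flag is set when an odd number of
   selected inputs are 1. The folds XOR the halves together until the
   parity of all 64 bits ends up in bit 0. */
bool pxor(uint64_t word, uint64_t mask)
{
    uint64_t v = word & mask;
    v ^= v >> 32;
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (v & 1u) != 0;
}
```

      A 2 input AND gate is then just pand(word, 0x3), and a 64 input gate is pand(word, UINT64_MAX); only the bit-mask changes.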

      I did find a good simple routine that would be an excellent test to perform once the simulator has the gate implemented on the silicon. Find a square root of a number, and compare how many instruction clock cycles the CPU utilized in finding the right answer. Compare that to C, C#, C++, Fortran and Assembly Language, first without and then with the added commands. In every case, do everything you can to use the least amount of code, then count clock cycles for each one, and compare processors.
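
      One candidate for the square root test is the well known bit-by-bit integer square root, sketched here in C. It is not the routine referred to above, only an example of the kind of small, branch heavy code that could be timed; the name isqrt64 is hypothetical, and the cycle counting itself would have to come from the simulator.

```c
#include <stdint.h>

/* Bit-by-bit integer square root: returns floor(sqrt(n)).
   Uses only shifts, adds, subtracts, and compares. */
uint32_t isqrt64(uint64_t n)
{
    uint64_t result = 0;
    uint64_t bit = UINT64_C(1) << 62;   /* highest power of four */

    /* Start from the largest power of four not above n. */
    while (bit > n)
        bit >>= 2;

    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)result;
}
```

      Each pass through the second loop contains one IF, so the routine gives a simple baseline for comparing cycle counts with and without the added commands.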

      Another test you could perform in code, to really evaluate how much difference these commands can make in place of IF statements, is to use the Boolean functions where the code often reads IF (X=Y) AND (C>B) AND (G=(G+1)). The more times that code sequence shows up in C or the other languages, the more places there are where up to 64 conditions could be checked in 2 CPU clock cycles.
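
      A sketch of that second test in C, assuming each condition is stored as one bit and the whole word is checked once at the end. The function names are hypothetical, and the third condition is written as g_next == g + 1 as a stand-in, since a variable compared with itself plus one would never be true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack the result of each comparison into its own bit.
   Bit 0: x == y, bit 1: c > b, bit 2: g_next == g + 1 (stand-in). */
uint64_t pack_conditions(int x, int y, int c, int b, int g, int g_next)
{
    uint64_t bits = 0;
    bits |= (uint64_t)(x == y)          << 0;
    bits |= (uint64_t)(c > b)           << 1;
    bits |= (uint64_t)(g_next == g + 1) << 2;
    return bits;
}

/* One check for the whole chain: true when the lowest 'count'
   condition bits are all 1. Up to 64 conditions fit in one word. */
bool all_true(uint64_t bits, unsigned count)
{
    uint64_t mask = (count >= 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << count) - 1);
    return (bits & mask) == mask;
}
```

      The chained IF then becomes a single test, all_true(pack_conditions(x, y, c, b, g, g_next), 3), and the packed word also shows which condition failed.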