Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
I will post several snippets of code in C++. Most of the problems here have nothing to do with the actual language being used, and this is not a syntax/be-sure-everything-was-declared set of questions. The floating-point numbers are completely arbitrary and have no meaning in this quiz. The integral numbers can be assumed significant to the solutions. Each of these snippets is wrong. Your objective is to rewrite them so that they are correct. To put your minds at ease, I will tell you not to waste your time looking at subtle details such as using :: scoping on global functions or using UL after numbers to indicate their signs and sizes.

Additionally, you can assume the types of values based on their prefixes: f = float, st = size_t, i = int, d = double, ul = unsigned long.

#1: Solved first by Bisqwit
CUtils::CallFunction( fX / 16.0f, fY / 16.0f );
#2: Solved first by Bisqwit
vVec.x = ::cos( fA );
vVec.y = -::sin( fA );
vVec.z = ::sin( fA );
#3: Solved first by Bisqwit
for ( int I = static_cast<int>(vVector.size() - 1UL); I >= 0; --I ) {
   // Do whatever.
}
#4: Solved first by amaurea
// Our objective here is ONLY to get the maximum value in a vector of unsigned longs.
unsigned long ulMax = 0UL;
for ( size_t I = 0UL; I < vVector.size(); ++I ) {
    ulMax = std::max( ulMax, vVector[I] );
}
#5: Solved first by Bisqwit
++ulSwitch;
ulSwitch %= 2UL;
#6: Solved first by amaurea (additional credit to Bisqwit)
ulX = ulIndex % 16UL;
ulY = ulIndex / 16UL;
#7: Solved first by Dromiceius
for ( int I = 0; I < iTotal; ++I ) {
    // Do something.
}
#8: Solved first by marzojr
struct SOMESTRUCT {
    long lMember0;
    void * pvMember1;
    __int64 i64Member2;
};
More questions may follow. This quiz is for fun, but you definitely do earn bragging rights for solving all of these problems. The first person to solve each problem with its 100% answer will have his or her name posted above the problem. Multiple answers may be correct, or variants of a single answer may be correct. Grading is lenient, so you get credit for an answer that illustrates the correct concept(s), regardless of whether it is syntactically correct or gets other insignificant details right.
L. Spiro
Post subject: Re: Fun Quiz
Editor, Active player (296)
Joined: 3/8/2004
Posts: 7469
Location: Arzareth
#1: Hard to guess what you are going for here. Maybe it's related to the fact that the variables are still going to be passed as doubles, and you think double division would be more efficient than float division? I shrug.
#2: My guess is that this is not the proper formula for whatever vector calculation you are going for here. Maybe you want a sin(fA)-cos(fA) somewhere.
#3: Casting after subtracting 1UL? Well, that's weird. I'd do -1 after the cast. That is highly suspect. In fact, if I want to loop backwards, I prefer this: for(size_t s = vVector.size(); s-- > 0; )
#4: Initializing I to 0UL probably is not a problem, but it's weird. Is vVector a vector of unsigned longs? Also, you haven't considered the case of the vector being empty -- in that case, you see, 0 is not the maximum; NaN is. :)
#5: "Wrong"? Eh. Well, if this is what you want... ulSwitch = (ulSwitch & 1)^1;
#6: "Wrong"? Eh. Do you want ANDing and shifting? I'll pass. The compiler can do that. If you want to imply that you wanted 16-bit fixed-point math, just replace 16 with 65536.
#7: I'd use lowercase for i.
#8: I wouldn't use Hungarian notation. Oh, and you may want to rearrange the elements so that the int64 comes first and the others come last; this saves some space on 32-bit platforms, but not on 64-bit ones.
Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
#1: Afraid this is not the solution.
#2: In this quiz, all of the code produces the desired output. Imagine it this way: you have stepped onto a new project and browsed through the existing code. You spot some of these problems from this quiz and, without knowing why the code is how it is, you can modify it on the spot to provide the same result the original author is getting, but better.
#3: Correct! Your backwards unsigned loop is the best way. Compare against 0 whenever possible (there are so many ways for compilers to optimize a comparison against 0) and you are guaranteed to avoid the case where the limiter is reloaded each iteration (which may involve function calls, pointers, or both). You can also avoid such a situation by storing the limiter in a local, but then you have used more stack space and still will not match the speed you get from just going backwards. A backwards loop in the format you gave should always be preferred whenever possible (see the sketch at the end of this post).
#4: We will assume the vector is of unsigned longs, since we have no reason to believe otherwise. And we will just assume that 0 is the maximum of a 0-length list, simply because that is the result of the existing example.
#5: That is definitely faster than the existing code, but can you do more to it? You are allowed to assume that the number will never be out of the valid range enforced by the example code. That being said, your code IS the best strict replacement for the example code; if we cannot assume that the value will always be either 0 or 1, then your code is the best answer. And you are also the only person so far to give that answer.
#6: The compiler may or may not. We probably can trust the compiler in this case, but I would rather raise awareness. Of course that means raising awareness of the fact that the compiler probably will "fix" this for you. The perfect answer should probably include this notice, but the actual code will be helpful in more situations than just this one.
#7: In the other places where I have posted this, this question proved the most difficult and was the last to be solved.
#8: You are on the right road, but there is one very tiny consideration that, if taken into account, will lead you to one order that is superior over all others. And you are probably going to hate this one because of that consideration.
L. Spiro
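For illustration, a minimal sketch of the forward-versus-backward difference for #3, assuming vVector is the vector from the quiz and that iteration order does not matter:

// Forward: vVector.size() may be re-evaluated every iteration if the
// compiler cannot prove it is loop-invariant (e.g. because the body makes
// opaque function calls).
for ( size_t I = 0; I < vVector.size(); ++I ) {
    // Do whatever.
}

// Backward: the size is read exactly once, and each iteration compares
// against 0, which is the cheapest comparison on most targets.
for ( size_t I = vVector.size(); I-- > 0; ) {
    // Do whatever.
}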
Former player
Joined: 2/19/2007
Posts: 424
Location: UK
#1: No idea what you're after here. I guess you could try to be faster (but unportable) by subtracting 4 from the exponent of the floating-point numbers instead, but that isn't what I would call a more correct solution.
#2: Replace the last one with: vVec.z = -vVec.y;
#3: Already solved.
#4: I guess you want the backwards looping here too, then, since you have no faith in the compiler's ability to optimize: for(size_t I = vVector.size(); I--;) ulMax = std::max(ulMax, vVector[I]); However, I would do it this way: unsigned long ulMax = vVector.empty() ? 0 : *std::max_element(vVector.begin(), vVector.end());
#5: I would have gone for ulSwitch = !(ulSwitch&1). But with the extra assumption that ulSwitch should be 0 or 1 to begin with, I would just do ulSwitch = !ulSwitch.
#6: ulX = ulIndex & 0xf; ulY = ulIndex >> 4;
#7: I don't think we can answer this without knowledge of what "Do something" is. Looks fine to me, though.
#8: Let's see... If we assume that alignment = size for the members (not always true), then if all have the same size (8), the order doesn't matter. If only one has size 4, the order still doesn't matter, as the whole struct will be padded to fit the alignment of its most stringent member. If two of them are size 4, then they should be put together. The initial code already has this property, as does Bisqwit's suggestion, so if those are wrong, I can't see how to improve them.
Edit: Fixed typo.
Editor, Active player (296)
Joined: 3/8/2004
Posts: 7469
Location: Arzareth
L-Spiro wrote:
#2: In this quiz, all of the code produces the desired output.
Maybe then you want to do vVec.z = ::sin(fA); vVec.y = -vVec.z; which is something the compiler should be able to do anyway; CSE is not exactly rocket science. Or maybe you want sincos(fA, &vVec.z, &vVec.x); vVec.y = -vVec.z;, which incidentally is more or less exactly what GCC 4.3.4 produces from the original code (ICC 11.1, too), sans redundant memory reads.
L-Spiro wrote:
#5: That being said, your code IS the best strict replacement for the example code; if we can not assume that the value will always be either 0 or 1, then your code is the best answer. And you are also the only person so far to give that answer.
Not exactly; ulSwitch = (~ulSwitch) & 1 may be even better. It was nowhere indicated that the values of ulSwitch should be assumed to be 0 or 1 beforehand. If they are, one should use bool and the ! operator instead (since you are already going with C++). :) At least on GCC, ! for a bool value becomes a xor by 1 on some platforms. If you insist, I can spell it here: ulSwitch ^= 1;
#7: Oh, you've got to be kidding me. If iTotal is never negative (apologies for missing indents, code and spoiler don't work together):

do {
    unsigned n = iTotal >> 3;
    switch( iTotal & 7 ) {
        case 0: while(n-- > 0) { /* do something */;
        case 7: /* do something */;
        case 6: /* do something */;
        case 5: /* do something */;
        case 4: /* do something */;
        case 3: /* do something */;
        case 2: /* do something */;
        case 1: /* do something */;
        }
    }
} while(0);

If iTotal may be negative:

#ifdef __GNUC__
# define likely(x)   __builtin_expect(!!(x), 1)
# define unlikely(x) __builtin_expect(!!(x), 0)
#else
# define likely(x)   (x)
# define unlikely(x) (x)
#endif
do {
    int i = 0;
    if(likely(i < iTotal)) {
        unsigned n = (iTotal+7) >> 3;
        switch( iTotal & 7 ) {
            case 0: do { /* do something */;
            case 7: /* do something */;
            case 6: /* do something */;
            case 5: /* do something */;
            case 4: /* do something */;
            case 3: /* do something */;
            case 2: /* do something */;
            case 1: /* do something */;
            } while(--n > 0);
        }
    }
} while(0);

You may also be looking for a less duffy solution:

{
    int i = 0;
    for(; i+4 <= iTotal; i += 4) { /* code, code, code, code */ }
    switch(iTotal-i) {
        case 3: /* code */
        case 2: /* code */
        case 1: /* code */
        default: break;
    }
}

But beware that manual loop unrolling, which compilers are often capable of doing themselves, robs the compiler of opportunities for loop vectorization.
Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
amaurea:
#2: Almost.
#4: Correct! The answer is the same as for #3. This is just to raise awareness of the fact that there are so many places where you can loop from the top down, and you should when you can. This is not an optimization the compiler can do for you, because there are times when the direction of the loop matters, and you have to be careful before using this solution. At least we know that the example code will provide the same result in either direction, so it is better to go backwards.
#5: Both of those are indeed much faster than the given code, but can you do something more?
#6: Correct! Bitwise operations are always faster than division and especially modulus. When the right side is a constant that is a power of 2, you should consider using bitwise operations instead. As mentioned before, the compiler may do this for you. You can decide on your own whether to trust the compiler.
#8: You are not wrong, but once you see the reason for the answer you will probably groan, turn green, put on purple shorts, and cause havoc.

Bisqwit:
#2: Correct! Unfortunately, the actual case of this code that I saw at work, which inspired this example, did not compile that way. It was actually in C#, which is why I am trying to avoid relying on the compiler so much: although I use C++ as the basis for these examples, this is actually about concepts that work in any language. I admit that some of them do lean heavily towards C/C++, but this is for fun and I hope that coders of many languages can take away as much as possible. You get extra points for being the first ever to mention sincos, which, if available, should definitely be used whenever the sine and cosine of a single number are needed.
#5: Correct! Of course your previous answer and your rebuff of it are all fine answers too.
#7: These are all not bad at improving speed, but is there something else you can do? Less duffy.
L. Spiro
Editor, Active player (296)
Joined: 3/8/2004
Posts: 7469
Location: Arzareth
L-Spiro wrote:
#7: These are all not bad at improving speed, but is there something else you can do? Less duffy.
It's not the same as #3 and #4? Eh. I'll pass if that's the case.
#8:

struct SOMEstruct {
    int_least64_t i64Member2;
    void* pvMember1;
    long lMember0;
    char padding[ (sizeof(void*)*4) - sizeof(int_least64_t) - sizeof(void*) - sizeof(long) ];
};

I wouldn't use something as unportable as __int64. Rather, int_least64_t, which can be found in stdint.h. The padding is here to make it properly aligned when stored in an array. On 32-bit, this pads to 16 bytes (otherwise 8+4+4 = 16 bytes) if the compiler does not choke on the zero-length array. On 64-bit, this pads to 32 bytes (otherwise 8+8+8 = 24 bytes). I was also thinking of possible limitations on various addressing modes and whether it makes sense to place some particular element at zero offset in order to make the access opcode shorter, but I cannot think of any reason why one of those elements should be prioritized over another.
Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
#7 is indeed not the same as #3 and #4, but don't pass just yet. All of these examples (or at least most) could be further optimized if you are willing to rewrite tons of code (such as with loop unrolling), do better prefetching/caching, etc., but this is all about the simple things you can do to write better code. As such, the answer for #7 is also based on simplicity, and I am at least willing to tell you that it is similar to #3 and #4. If I give any more hints than that, it may become too simple to solve, so I have to stop there.
As for #8, although I tried to think of questions that would work in multiple languages, we can all tell easily that this one is definitely a C/C++ issue. You are coming up with some good ideas for this one, but I know at least my compiler-of-choice chokes on 0-sized arrays. You are thinking quite deeply into this one, so relax a bit and focus just on the order of the items. And apparently a lot of people are going to hate the solution for this one.
L. Spiro
Editor, Active player (296)
Joined: 3/8/2004
Posts: 7469
Location: Arzareth
L-Spiro wrote:
As such, the answer for #7 is also based on simplicity, and I am at least willing to tell you that it is similar to #3 and #4. If I give any more hints than that then it may become too simple to solve, so I have to stop there.
Well, this is pointless, but here goes.
#7:

for ( int I = iTotal; I-- > 0; ) {
    // Do something.
}

#1: Dost thou mean: CallFunction(fX * 0.0625, fY * 0.0625)? Falls into the category "redundant". See, here's what GCC produces for the original code at -O1:

_Z1gv:
    movsd   .LC0(%rip), %xmm1
    movapd  %xmm1, %xmm0
    mulsd   fY(%rip), %xmm1
    mulsd   fX(%rip), %xmm0
    jmp     CallFunction
<...>
.LC0:
    .long   0, 1068498944

At most, this is a pessimization, because it hides the purpose of the code. Remember, the source code is also documentation: first and foremost, it should document the intended purpose of the code, and only then convey the practical means used to accomplish that purpose, in the simplest and easiest-to-maintain manner possible. At least that is the view I hold firmly and dearly. If you have got SSE2, you can also go with:

#include <emmintrin.h>
<...>
__m128d tmp2 = _mm_mul_pd(_mm_set_pd(fX,fY), _mm_set1_pd(1.0 / 16.0));
CUtils::CallFunction(_mm_cvtsd_f64(tmp2), _mm_cvtsd_f64(_mm_unpackhi_pd(tmp2,tmp2)) );

>:-B , producing this assembly code:

_Z1gv:
    movsd    fY(%rip), %xmm0
    movhpd   fX(%rip), %xmm0
    mulpd    .LC0(%rip), %xmm0
    movapd   %xmm0, %xmm1
    unpckhpd %xmm1, %xmm1
    jmp      CallFunction
<...>
.LC0:
    .long   0, 1068498944, 0, 1068498944
Joined: 10/3/2005
Posts: 1332
L-Spiro wrote:
#7 is also based on simplicity,
The answer isn't while(--iTotal >= 0) is it?
Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
#1: Correct! At least in concept. The 100% answer is:

CUtils::CallFunction( (1.0f / 16.0f) * fX, (1.0f / 16.0f) * fY );

Division is up to 25 times slower than multiplication and should be avoided religiously. The subtle points that go into the answer are:
Rule #1: Put the constants to the left of the variable. This guarantees that the compiler will perform the division at compile time even if you forget the parentheses.
Rule #2: Use parentheses anyway. It makes it clear to viewers that that constant will be evaluated by itself, and ensures no complications with other multiplications or divisions (see the small illustration at the end of this post).
Rule #3: Use the long form of 1.0f / 16.0f instead of just using the constant 0.0625f. This allows everyone to easily determine how the constant was derived and allows them to easily reconsider the problem as a division rather than a multiplication by an arbitrary number.
As for compiler optimizations, again they may or may not take place on this exact code, but the point here is the concept, since more complicated expressions are guaranteed not to be optimized away by the compiler. Compiler optimizations are highly subjective, and the level of optimization a compiler is willing to perform varies not only by compiler, but by platform. If you plan for your code to be executed on many platforms and compilers, it is best to play it safe and do the simple optimizations yourself. Additionally, compiler optimizations may be completely disabled for debugging purposes. Debug builds are not as important, but it is nice if they run at a reasonable speed. And my final reason for personally not trusting compilers too much: Metrowerks CodeWarrior. Up to you how you use these answers, though. They are just something to consider.
L. Spiro
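As a small illustration of Rules #1 and #2 (this is just a sketch; the exact code a given compiler emits will of course vary):

// Original: one floating-point division per argument at run time.
CUtils::CallFunction( fX / 16.0f, fY / 16.0f );

// Constant first, even without parentheses: 1.0f / 16.0f groups together
// under left-to-right evaluation, so it can be folded to 0.0625f at
// compile time.
CUtils::CallFunction( 1.0f / 16.0f * fX, 1.0f / 16.0f * fY );

// Variable first without parentheses: this groups as (fX * 1.0f) / 16.0f,
// and the run-time division sneaks back in.
CUtils::CallFunction( fX * 1.0f / 16.0f, fY * 1.0f / 16.0f );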
Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
Dromiceius wrote:
L-Spiro wrote:
#7 is also based on simplicity,
The answer isn't while(--iTotal >= 0) is it?
Correct! Though I am not picky on whether you use a for-loop or a while-loop. The for-loop form looks like:

for ( int I = iTotal; --I >= 0; )

We borrow the idea from #3 to walk backwards down the loop if possible (here we assume it is). But the fact that the value is signed changes the game a bit. Falling back on the old rule that prefix operators are never slower than, and sometimes faster than, postfix operators, we prefer --I over I--. Then we have to change the comparison against 0 from > to >=. In C/C++ it is rare to encounter a list/array that would not be better off unsigned, but in C# and Java this happens constantly, and that is why the signed version of the backwards loop is important to know. In Java this is actually a huge gain over the alternative, as it produces smaller and considerably faster code. This type of loop is an absolute must for embedded development. All that remains is the evil #8. Which everyone is going to hate.
L. Spiro
Active player (353)
Joined: 1/16/2008
Posts: 358
Location: The Netherlands
#8: Maybe the pointer should go last so the referenced item can be placed directly adjacent (in memory)? (Edit: although I do not know whether this even has any true advantage other than being neat :P)
TASes: [URL=http://tasvideos.org/Movies-298up-Obs.html]Mr. Nutz (SNES), Young Merlin 100% (SNES), Animaniacs 100% (SNES)[/URL]
Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
DaTeL237 wrote:
#8 Maybe the pointer should go last so the referenced item can be placed directly adjecent (in memory)?
I am afraid not. Just a note: the possible orders of the items are so limited that simply guessing could land you on the right one. For this question you must also explain why the order is correct.
L. Spiro
Banned User, Former player
Joined: 3/10/2004
Posts: 7698
Location: Finland
This reminds me of the list of advanced C++ questions I once compiled (based on things I have actually had to learn): 1) STL containers have, among others, two special constructors: one which takes an integral value and a reference to the value type (to initialize the container with the specified amount of values), and another which takes two iterators (to initialize the container with the value range specified by those iterators). In other words, the containers will have two constructors of the form:
Container::Container(size_type amount, const value_type& value);

template<typename Iterator>
Container::Container(Iterator startIter, Iterator endIter);
Suppose you do this:
std::vector<int> container(10, 5);
In principle that constructor call matches both of the above constructors. The compiler will choose the one which matches best. In this case the constructor taking two iterators will be the best match (because both parameters are of the exact same type). In other words, the "wrong" constructor will be called (yes, this does actually happen). However, the STL containers still manage to do the right thing (rather than giving a compiler error because ints are not iterators). How?

2) Give example code where using static_cast gives a compiler error, while using dynamic_cast compiles and works ok. The only difference between the two cases is the keyword; otherwise the code must be the same. Also explain why this is so. (And no, this problem does not involve preprocessor macro trickery or anything of the sort. That's not what this problem is asking. It's about the semantics of static_cast vs. dynamic_cast.)

3) Explain the concept of 'placement new' and why/when it can be useful. Give an example of an actual usage in the standard library. Also explain how objects constructed in this way can be destroyed properly.

4) STL containers support user-defined memory allocators (specified as a class template parameter, with instances of these allocators being given as a constructor parameter). By default, instantiating e.g. a list of integers is completely equivalent to this:
std::list<int, std::allocator<int> > theList((std::allocator<int>()));
The std::list is given an allocator for type int (that is, an allocator which is configured to allocate elements of type int; among other things, this makes it have a function for constructing objects of that type). However, std::list will not allocate objects of type int. Instead, it will allocate objects of an internal type (usually some struct which contains the int and two pointers as members). However, it was not given an allocator configured to allocate objects of that internal type, so it cannot use the given allocator directly for allocating those objects. How is this problem solved within STL allocators?

5) Notice how in the previous example a set of extra parentheses was used in the constructor call. This was not a mistake. Explain why those extra parentheses are necessary.
sgrunt
He/Him
Emulator Coder, Former player
Joined: 10/28/2007
Posts: 1360
Location: The dark horror in the back of your mind
Warp wrote:
2) Give example code where using static_cast gives a compiler error, while using dynamic_cast compiles and works ok. The only difference between the two cases is the keyword; otherwise the code must be the same. Also explain why this is so. (And no, this problem does not involve preprocessor macro trickery or anything of the sort. That's not what this problem is asking. It's about the semantics of static_cast vs. dynamic_cast.)
Just had to tackle this one - it reminds me of working with a particularly nasty class hierarchy once where I needed to put this knowledge to use.
#include <iostream>

class typeA {
  public:
  virtual void someMethod() {
    std::cout << "typeA" << std::endl;
  }
};

class typeB {
  public:
  virtual void someMethod() {
    std::cout << "typeB" << std::endl;
  }
};

class typeC : public typeA, public typeB {};

int main(int argc, char** argv) {
  typeA* foo = new typeC();

  typeB* bar = dynamic_cast<typeB*>(foo);

  if (bar) {
    bar->someMethod();
  } else {
    std::cout << "foo is not typeB" << std::endl;
  }

  delete foo; // not strictly necessary, but I want to set a good example

  return 0;
}
If you replace dynamic_cast with static_cast, you get
example.cc:22: error: invalid static_cast from type ‘typeA*’ to type ‘typeB*’
Note how typeA and typeB are, outside of typeC, unrelated. This is why static_cast throws a compile error - the two classes aren't directly hierarchically related to one another, so it's not possible to cast up or down in the fashion that a static_cast would need. (The explanation sucks and doesn't really do the concept justice, but the example's there if someone wants to explain it better. When I don't need to run off to be somewhere in 10 minutes, perhaps I'll come back and flesh this out further.)
Tub
Joined: 6/25/2005
Posts: 1377
Wow, feels like 1990. I absolutely agree with Bisqwit that readability should be preferred (despite his IOCCC entries ;)). If you're going for total speed, do some profiling first. Then comment the code you're "improving" so it remains understandable.
#2: sincos() is a GNU extension and will reduce portability. Bad. Apparently, a good compiler can optimize the original code anyway. If you use a compiler without CSE in 2010, you have bigger worries than bottom-up loops. But yeah, vVec.y = -vVec.z; looks good, too.
#3: Seeing that kind of type-casting makes me puke. If you're going for readability, you shouldn't use the int in the first place, no matter which direction you loop in. Always going top-down isn't the speedy answer either way, because other factors play a more important role than comparisons against 0. Need to iterate a large array twice? Do the first pass top-down, the second one bottom-up. That way you'll start the second loop with data that's currently in your CPU cache. Or just stick to the readable code.
#4: Again, the correct solution is the builtin, not a top-down loop.
#5: Yeah, the optimization you're looking for is bool, not unreadable bit-functions. Mapping to fast asm code is the compiler's job.
#7: The "solution" isn't equal; you're changing iTotal. That sounds like a variable name that may be important. If I need to use iTotal further down the function, I get nasty errors. It may (or may not) save a few cycles, nothing worth wasting time on when I change the function later. Heck, I often define those variables const, just to avoid such nasties, i.e.
const int size = <black magic here>;
T *array = new T[size];
#8: Using a void * is a sin, but that's not what you're after. There's no need for padding either, since sizeof(SOMESTRUCT) will return the size including any end-of-struct padding, so that sizeof(SOMESTRUCT[100]) == 100 * sizeof(SOMESTRUCT) and (char *) &arr[1] - (char *) &arr[0] == sizeof(SOMESTRUCT) remain valid at all times. Maybe you'll want the pointer first? That way dereferencing the pointer can be done by just dereferencing the struct's location instead of dereference + add + dereference again. (Only valid if you maintain pointers to instances of the struct; I don't think it'd save anything for stack variables.)
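A quick self-contained check of the end-of-struct padding guarantee described above (long long is used here only as a stand-in for __int64 on non-Microsoft compilers):

#include <cassert>
#include <cstddef>

struct SOMESTRUCT {
    long lMember0;
    void * pvMember1;
    long long i64Member2;
};

int main() {
    SOMESTRUCT arr[100];
    // Trailing padding is already counted in sizeof, so the array stride
    // equals sizeof(SOMESTRUCT).
    assert( sizeof(arr) == 100 * sizeof(SOMESTRUCT) );
    assert( (char *)&arr[1] - (char *)&arr[0] == (std::ptrdiff_t)sizeof(SOMESTRUCT) );
    return 0;
}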
m00
Banned User, Former player
Joined: 3/10/2004
Posts: 7698
Location: Finland
sgrunt wrote:
Just had to tackle this one - it reminds me of working with a particularly nasty class hierarchy once where I needed to put this knowledge to use.
Yes, that's the (AFAIK) simplest case where static_cast does not even compile but dynamic_cast compiles and works as intended. It's also a good demonstration of why dynamic_cast has at worst linear complexity in relation to the size of the inheritance hierarchy (because at worst it has to traverse the entire hierarchy to check if the cast is valid, and then change the pointer accordingly, as there might not be any way to use compile-time information to make it faster). Not that this is usually any problem, though.
Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
Tub wrote:
wow, feels like 1990. I absolutely agree with bisqwit that readability should be preferred (despite his IOCCC entries ;)). If you’re going for total speed, do some profiling first. Then comment the code you’re "improving" so it remains understandable.
Readability is the motive for one of the rules guiding #1. Rather than simply using a constant, express the 1/X value so that people can easily see that this is really a division by X.
Tub wrote:
#3: seeing that kind of type-casting makes me puke. If you’re going for readability, you shouldn’t use the int in the first place. No matter which direction you loop in.
Absolutely correct. The first point of this solution is not to use integers for unsigned loops.
Tub wrote:
Always going top-down isn’t the speedy answer either way, because other factors play a more important role than comparisons against 0. Need to iterate a large array twice? Do the first one top-down, the second one bottom-up. That way you’ll start the second loop with data that’s currently in your CPU cache. Or just stick to the readable code.
Also very valid points. This should all be taken into consideration before making changes. For the sake of simplicity, we do not want to take too many considerations into consideration[sic], but in the real world you need to use your head at all times and keep an eye out for other factors.
Tub wrote:
#4: again, the correct solution is the builtin, not a top-down-loop.
I am afraid not. Top-down wins here. Especially in Java (barring the changes you need to make to account for the signed type).
Tub wrote:
#5: yeah, the optimization you’re looking for is bool, not unreadable bit-functions. Mapping to fast asm code is the compiler’s job.
I am afraid not. Case in point: coding a Nintendo DS game using Metrowerks CodeWarrior. Being the Nintendo DS, you need to take optimization seriously; it is not a powerhouse. But amazingly, despite this, CodeWarrior does not actually make most of the optimizations you take for granted. Just because your compiler does it does not mean they all do. And if your code is built to run in multi-platform performance-intensive situations, then you can expect a significant change in performance across various platforms unless you code in a way that simply leaves the compiler no choice. Most of my old team were underlings and interns, but even they had no problems figuring out what the bit operations were doing once they saw them. They hadn't thought of using them before, but after seeing them once they understood the idea fine and started using them in their own code. Interns! But you can make your decisions based on what you feel is best for your work style and for those around you. In other words, although I am giving credit for one specific answer in most cases, I actually want to stress that you need to use your own head and decide for yourself whether the answer is really the best for you.
Tub wrote:
#7: the "solution" isn’t equal, you’re changing iTotal. That sounds like a variable name that may be important. If I need to use iTotal further down the function I get nasty errors. It may (or may not) save a few cycles, nothing worth wasting time on when I change the function later. Heck, I often define those variables const, just to avoid such nasties, i.e.
const int size = <black magic here>;
T *array = new T[size];
Absolutely correct! Sorry, this was my mistake. Leaving iTotal changed is against the rules for the reasons you gave here. In the past I did not accept this answer for that reason, but somehow I missed it here. I was in a rush or something. Nonetheless you are right and I should not have accepted that answer. The for-loop I posted is what I wanted. My apologies.
Tub wrote:
#8: using a void * is a sin, but that’s not what you’re after. There’s no need for padding either, since sizeof(SOMESTRUCT) will return the size including any end-of-struct-padding, so that sizeof(SOMESTRUCT[100]) == 100 * sizeof(SOMESTRUCT) and (char *) &arr[1] - (char *) &arr[0] == sizeof(SOMESTRUCT) remains valid at all times. Maybe you’ll want the pointer first? That way dereferencing the pointer can be done by just dereferencing the struct’s location instead of dereference + add + dereference again? (only valid if you maintain pointers to instances of the struct, I don’t think it’d save anything for stack variables)
You are right about the padding. The compiler will do that for you (now don’t I sound like a hypocrite? =P ) and at best, if we pad it ourselves, we are just matching the result of what the compiler would do anyway. You have a good idea for your order, but for this question there is something else to consider. I think I need to buy a riot shield and have it handy when I release (or confirm) the answer for #8. L. Spiro
Tub
Joined: 6/25/2005
Posts: 1377
L-Spiro wrote:
Point in case: Coding a Nintendo DS game using Metrowerks CodeWarrior. Being Nintendo DS, you need to take optimization seriously.
True, but as always: do tests before optimizing. Telling everyone to just use bit-functions because it worked on the DS may be a waste of effort (and code readability) or even counterproductive on different platforms. Even then, I'd leave the code as something like this, to make sure the intent is clear:
state = (~state) & 1; // state = !state, this saves x cycles
or even use a macro
// this saves x cycles per call on the DS because the compiler is stupid
#define NEGATE(a) a = (~a) & 1
...
NEGATE(state);
(Use an inline function if the compiler supports it, for obvious reasons.) I'm tutoring the C++ classes at our university; if I told the students there to use bit functions instead of readable operators, they'd kill me. There's also something to be said about replacing array syntax with manual pointer manipulation (i.e. throwing everything into a 1-dimensional array and working from there). I've done it once because I wanted to win a benchmark for calculating the Levenshtein distance (as in: return all strings in a set where l_dist(query, word) < k). The program became slower. The compiler didn't understand the intent of my code and stopped its own optimizations. Interestingly, using an array of int was slightly faster than using an array of bytes, and the order in which I filled the matrix was important due to caching. But then I cut the time in half by applying a simple heuristic and bailing out if the distance is too large. I shouldn't have focused on cycle-count optimizations in the first place, at least not before exhausting all optimizations on the algorithm level. The equivalent Java code was 3 to 5 times slower in all cases, so please don't talk about "java" and "performance" in the same sentence. ;) The best Java optimization would be to avoid excessive object allocation - so far all bottlenecks I've encountered were somehow related to that.
You are right about the padding. The compiler will do that for you (now don’t I sound like a hypocrite? =P )
To be fair, I think that's guaranteed by the standard ;)

About the first example:
#include <stdio.h>

int main()
{
	double x = 3.1415;
	int i;
	for (i=1000000;i;i--)
	{
#if 0
		x /= 1.234;
		x /= 2.345;
		x /= 0.3455;
#else
		x *= (1.0 / 1.234);
		x *= (1.0 / 2.345);
		x *= (1.0 / 0.3455);
#endif
	}
	printf("%f\n", x);
	return 0;
}
division
27458608301646260078734700844078636923901316520727690355595540752359233165371709431870541266944.000000
real    0m0.117s

division, -O2
27458608301646260078734700844078636923901316520727690355595540752359233165371709431870541266944.000000
real    0m0.083s

multiplication
27458608301819373883828058048884692489328705290484643113470060021220783831766899877207431184384.000000
real    0m0.045s

multiplication, -O2
27458608301819373883828058048884692489328705290484643113470060021220783831766899877207431184384.000000
real    0m0.024s
multiplication is obviously faster, but also obviously leads to different results. Be careful there.
m00
marzojr
He/Him
Experienced player (749)
Joined: 9/29/2008
Posts: 964
Location: 🇫🇷 France
For #8: ignoring the possibilities of (16-bit or lower) and (128-bit or higher) processors, I guess what you are after is the __int64 before the pointer before the long. The reason is that, discarding the processor possibilities mentioned above, __int64 will always be at least as large as any pointer type, which will always be at least as large as a long integer (64-bit Windows being, as always, the exception to the "at least", having 32-bit longs). I would also make sure that you aren't tied down to Windows compilers by using __int64: it would be best to use int64_t and typedef it from __int64 (M$, Borland and Intel compilers) or from long long (other compilers). More so given that the next C++ standard will very likely adopt C99's int64_t and family, and will certainly use long long rather than __int64.
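A rough sketch of the kind of portability typedef being suggested (the type name and compiler checks here are only illustrative, not a definitive list):

// __int64 on compilers that provide it, long long elsewhere.
#if defined(_MSC_VER) || defined(__BORLANDC__)
typedef __int64          int64_compat;
typedef unsigned __int64 uint64_compat;
#else
typedef long long          int64_compat;
typedef unsigned long long uint64_compat;
#endif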
Marzo Junior
Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
Correct! You have Microsoft to thank for the headaches surrounding #8. On any compiler besides Microsoft Visual Studio, the order does not matter on 64-bit builds, but Microsoft decided to keep (unsigned) long at 32 bits for compatibility. Unfortunately, because their compiler is so commonly used (though apparently not around here), their little oddities really should be taken into account. Your 64-bit type-of-choice goes first, because it is always 64 bits. Pointers follow, because they are 32 bits on 32-bit builds and 64 bits on 64-bit builds. The (unsigned) long goes last, because it is usually 32 bits then 64, but sometimes 32 then 32. /Me puts up shields.
L. Spiro
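In other words, the order being described looks like this (a sketch only; the comments reflect the 32-bit/64-bit assumption discussed above, not every possible platform):

struct SOMESTRUCT {
    __int64 i64Member2;   // always 64 bits
    void * pvMember1;     // 32 bits on 32-bit builds, 64 bits on 64-bit builds
    long lMember0;        // usually matches the pointer size, but stays 32 bits on 64-bit Windows
};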
Editor, Active player (296)
Joined: 3/8/2004
Posts: 7469
Location: Arzareth
So the "tiny consideration" was regarding the relative placement of the void* and the long, in that on 64-bit Windows, void* and long are not the same size because "long" is still 32 bits? I see. How about platforms where long is larger than void*? Such as 16-bit Borland C++. Your "best" selection would lead into the following memory layoutexcept that BC aligns at 2, never at 4:
Addr Size Data
0000 8    64-bit object (int64)
0008 2    16-bit object (void*)
000A 2    <padding>
000C 4    32-bit object (long)
0010 <end>
You yourself said that you are programming for various non-PC platforms, such as the Nintendo DS, so you cannot exclude 16-bit platforms, since those are often used for microcontrollers.
marzojr
He/Him
Experienced player (749)
Joined: 9/29/2008
Posts: 964
Location: 🇫🇷 France
Bisqwit wrote:
How about platforms where long is larger than void*? Such as 16-bit Borland C++.
Or the (future) platforms in which void * is larger than __int64 and would have to come first. These were the reasons I explicitly did not consider those cases in the problem's solution. The issue is further complicated because M$ (who else...) had 16-bit longs in 16-bit editions of Visual C++. Oh, the headaches...
Marzo Junior
Joined: 4/18/2007
Posts: 88
Location: Tokyo, Japan
Yes, all of this headache just for that tiny detail. Your note about 16-bit platforms is entirely valid and should be taken into consideration by anyone who codes on such a platform. It is true that I assumed the portability would be only between 32-bit and 64-bit, but the fact is I do not know anyone who codes on 16-bit platforms. The Nintendo DS can run 16-bit Thumb code, but both of its processors are actually 32-bit. Most mobile development is done in Java, which is a different bag anyway, but the iPhone/iPod touch is also a 32-bit machine. I do not know anyone who codes lower than these. But if you or a loved one does code on 16-bit beasts, then these extra factors should be taken into account. I should say that the correct answer is only correct between 32-bit and 64-bit machines.
L. Spiro