Would it help at all if the Arduino program had a notion of time, instead of blindly presenting the next input on the card whenever the NES asks for it? I'm thinking of something along the lines of:
main() {
    wait for first input request
    present first input
    initialize timer to 0
    wait forever
}
InputRequestInterruptRoutine() {
    if (timer value > 15ms) // one NTSC frame is about 16.6 ms (roughly 1/60 s)
    {
        present next input
        reset timer to 0
    }
    else present the same input again
}
Inaccuracies in the Arduino's timing relative to the NES shouldn't matter much, since the timer is reset every frame and any drift therefore can't accumulate. It should also be possible to handle lag frames (i.e. without pre-stripping them from the input on the card) if you're clever about it. For example, say the NES has definitely lagged if the timer reaches 18ms without a poll. In that case, add an interrupt routine for timer = 18ms that resets the timer to 2ms and skips one frame of input.