Ah… the joys of experimental hardware.
As many of you know, node n62 has been causing us grief. I’ve been trying to run some large MPI test jobs, and this one node has been refusing to boot any of the MPI ACE files.
The good news is that I have a big clue. After testing several “known to work” ACE files (all of which failed), I built the simplest BIT file I could think of. It worked! The only difference was that I used the on-board DDR memory rather than DDR2 in the DIMM slot. Next I built a standalone C system that runs out of BRAM and exercises the DDR2. Here’s the result:
-- Entering main() --
Starting MemoryTest for DDR2_SDRAM_32Mx64:
  Running 32-bit test...FAILED!
  Running 16-bit test...
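For anyone curious what a test like this does, here is a minimal sketch in C of a 32-bit write/read-back check. It is not the test program that produced the output above; the function name `mem_test_32`, the pattern constant, and the return convention are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the vendor memory test).
 * Writes an address-dependent pattern to every 32-bit word, reads it back,
 * then repeats with the inverted pattern so each bit is tested as 0 and 1.
 * Returns 0 on success, or the 1-based index of the first mismatching word. */
size_t mem_test_32(volatile uint32_t *base, size_t words)
{
    const uint32_t seed = 0xA5A5A5A5u; /* arbitrary pattern, an assumption */

    assert(base != NULL);

    for (int pass = 0; pass < 2; pass++) {
        /* Pass 0 uses the pattern as-is, pass 1 uses its bitwise inverse. */
        uint32_t flip = (pass == 0) ? 0u : 0xFFFFFFFFu;

        /* Fill the whole region first so a stuck address line shows up
         * as one word overwriting another. */
        for (size_t i = 0; i < words; i++)
            base[i] = ((uint32_t)i ^ seed) ^ flip;

        /* Read back and compare against the expected value. */
        for (size_t i = 0; i < words; i++)
            if (base[i] != (((uint32_t)i ^ seed) ^ flip))
                return i + 1;
    }
    return 0;
}
```

On the actual board, `base` and `words` would presumably be the DDR2 base address and size from the generated hardware headers, with the code itself running out of BRAM so the memory under test is never needed to run the test. On a desktop machine the same function can be tried against an ordinary array.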
More good news: I think the problem is simply that the DDR2 DIMM is not seated properly.
The bad news: it is the 63rd of 64 nodes. We’ll be turning a lot of screws to gain physical access to this node. We knew this was coming…
Ron