
S&M 1.7.6 issues

Discussion in 'Overclocking & Cooling' started by decto, 26 Mar 2006.

  1. decto

    Wise Guy

    Joined: 24 Jan 2006

    Posts: 2,190

    System
    Opteron 165
    MSI Neo4-F
    2x1GB Gskill HZ PC4000 (Just replaced as an RMA as one stick was faulty)
    POV 7800GTX 256MB
    Enermax 465W PSU
    Thermal Take Big Typhoon
    Antec P180

    I posted a few weeks ago about random prime fails at both stock and when OC'd. After some time spent, the memory turned out to be to blame, with errors showing up in Memtest 1.65 that strangely didn't show in the OCZ 1.0 version. Anyhow, this has now been replaced.

    Memory is now 20hrs + stable in memtest @ 250Mhz 3-4-4-8 1T
    Prime is stable @ stock 1.35v 1.8Ghz and at 1.40v, 2.5Ghz. Currently on 14 hours and counting for the latter, so it's looking good. It did fail prime the other day, but that was at 2.6Ghz and I think I was @ 1.44v. Probably my own fault, as I had only upped the voltage by 0.2v from when it fell over during a quick and dirty OC test using clockgen to up the HTT every minute while running prime, to get a rough idea of scaling vs voltage. It actually ran all night and only failed when the heating kicked in the following morning and the room temp rose quite a bit. With a little more voltage I'm sure 2.6 would be fine, and I even ran for an hour at 2.7 with 1.54v, but the CPU was a little too hot so I backed down to 2.6 and was looking for the minimum stable voltage.

    Rather than wait for prime I decided to use S&M to test the next couple of voltage increments. I only have voltages up to 1.4v in 0.025v increments. I also have % increases of 3.3, 6.6, 9.9 etc up to 19.9, so I combine these with the base voltage to make the steps, e.g. 1.40v -> 1.375v +3.3% (1.42v) -> 1.350v +6.6% (1.44v) etc.
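
The step arithmetic above can be sketched in a few lines of Python. This is only an illustration: the names `effective_vcore` and `vcore_steps` are made up for the example, and the percentage list holds just the values actually named in the post, not the unlisted ones between 9.9 and 19.9.

```python
# Base vcore settings named in the post (0.025v apart, topping out at 1.40v).
BASE_VOLTAGES = [1.350, 1.375, 1.400]
# Percentage boosts named in the post; 0.0 means "no boost". The post also
# mentions further steps up to 19.9 that are not listed here.
PERCENT_BOOSTS = [0.0, 3.3, 6.6, 9.9]


def effective_vcore(base_volts, boost_percent):
    """Return the base voltage with the percentage boost applied."""
    return base_volts * (1 + boost_percent / 100)


def vcore_steps(bases, boosts):
    """List every (effective, base, boost) combination, lowest voltage first."""
    return sorted((effective_vcore(b, p), b, p) for b in bases for p in boosts)


if __name__ == "__main__":
    for volts, base, boost in vcore_steps(BASE_VOLTAGES, PERCENT_BOOSTS):
        print(f"{base:.3f}v +{boost:.1f}% -> {volts:.3f}v")
```

Sorting the combinations by effective voltage makes it easy to pick the next smallest step above whatever setting last failed.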

    When I run the full S&M suite it always fails on the core 1 FPU test on the second loop. This is at both 1.8Ghz and 2.6Ghz. Core 0 is fine.
    All further testing at stock 1.8Ghz
    To rule out issues I looped the S&M memory test - Ok after 6 hours
    I looped the cache and integer tests - OK after 3 hours (fed up of waiting)
    I looped the FPU test - ok after 6 hours
    I looped the Power test - ok after 3 hours (bored again)

    If I loop the integer and FPU tests together it fails on core 1, second loop, FPU test.

    If I loop the FPU and Power tests, S&M periodically says it has an error and has to close; the VGA window stops responding but the FPU test is happily looping in the background.

    So this is driving me nuts, days of testing have passed and I am no closer to knowing if it is my CPU or Mainboard or even if it's just S&M that has a problem.

    3dmark2005, Fear, Doom3, Quake IV etc are all fine, I've had no other crashes.

    I'd just like to know if other Dual core AMD users have had similar problems.

    AD
     
  2. kitfit1

    Mobster

    Joined: 24 Feb 2003

    Posts: 3,787

    Location: Stourport-On-Severn

    Were you overclocked when you extracted S+M from its zip file? If you were, there is a chance it could have got corrupted, especially if you extracted it when you had a dodgy stick of ram.
     
  3. decto

    Wise Guy

    Joined: 24 Jan 2006

    Posts: 2,190

    Unfortunately nothing so simple. Downloaded S&M again and replaced the files. Still have the same issue with core 1 pass 2.

    Also checked the PSU voltages: 12.01v +/- 0.02v, 5.09v +/- 0.01v, 3.39v +/- 0.01v according to my Fluke. Voltages compared on and off load.
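
For a quick sanity check of those readings, the deviation of each rail from nominal can be worked out as below. This is a rough sketch: the +/-5% limit is an assumption for illustration (it is the figure usually quoted for the positive ATX rails), and the function names are invented for the example.

```python
# Multimeter readings from the post: nominal rail voltage -> measured voltage.
READINGS = {12.0: 12.01, 5.0: 5.09, 3.3: 3.39}
# Assumed tolerance for illustration only (commonly quoted for ATX +12/+5/+3.3v).
TOLERANCE_PERCENT = 5.0


def deviation_percent(nominal, measured):
    """Signed deviation of a measured rail from its nominal value, in percent."""
    return (measured - nominal) / nominal * 100


def within_tolerance(nominal, measured, tolerance=TOLERANCE_PERCENT):
    """True if the measured rail sits inside +/- tolerance percent of nominal."""
    return abs(deviation_percent(nominal, measured)) <= tolerance


if __name__ == "__main__":
    for nominal, measured in READINGS.items():
        dev = deviation_percent(nominal, measured)
        status = "ok" if within_tolerance(nominal, measured) else "out of range"
        print(f"{nominal:>4}v rail: {measured}v ({dev:+.2f}%) {status}")
```

Under that assumed limit all three rails come out comfortably in range, which fits the view that the PSU is not the culprit.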

    AD
     
  4. kitfit1

    Mobster

    Joined: 24 Feb 2003

    Posts: 3,787

    Location: Stourport-On-Severn

    In my experience with S+M, when it throws up the error close box it can normally be put down to a ram problem. When it freezes and stops responding it's down to not enough vcore.
     
  5. decto

    Wise Guy

    Joined: 24 Jan 2006

    Posts: 2,190

    The reinstall of S&M seems to have fixed the 'Must shut down' errors. Have been looping FPU and VGA @2.5 Ghz for the past couple of hours which is usually more than enough to provoke it.

    Just a shame I still get the core 1 error going from integer to FPU whatever clockspeed I use.

    Figure I'll give it a final run of Prime overnight and if I don't get any errors I'll leave it at 2.5 for a while.

    Temps are around 54C in a 23C room at max load, so I figure I have some headroom for summer, when temps are likely to approach 60C, though I doubt I'll be running such a heavy load.

    This is the first chip I've owned that gets majorly hotter when I increase core speed: approx 10C difference in load temperature from 1.8 to 2.5.
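
The summer headroom estimate can be roughed out from the delta over ambient. A minimal sketch, assuming the load-to-room delta stays constant as the room warms up (only approximately true in practice); the summer room temperatures are hypothetical values picked for the example, not figures from the post.

```python
def delta_over_ambient(load_temp_c, room_temp_c):
    """CPU load temperature rise above room temperature, in degrees C."""
    return load_temp_c - room_temp_c


def projected_load_temp(delta_c, room_temp_c):
    """Estimated load temperature, assuming the delta over ambient is constant."""
    return delta_c + room_temp_c


if __name__ == "__main__":
    # 54C at load in a 23C room, as reported in the post.
    delta = delta_over_ambient(54, 23)
    # Hypothetical summer room temperatures, for illustration only.
    for room in (26, 29, 32):
        print(f"room {room}C -> load approx {projected_load_temp(delta, room)}C")
```

With a fixed delta, a load temperature of about 60C would correspond to a room of roughly 29C.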

    AD
     
  6. kitfit1

    Mobster

    Joined: 24 Feb 2003

    Posts: 3,787

    Location: Stourport-On-Severn

    I think you may have come up with a solution without realising it. You should not have a 10c diff between 1.8 and 2.5. With my opty 170 there's hardly any diff. If i was you i'd be looking at reseating the cpu.
     
  7. decto

    Wise Guy

    Joined: 24 Jan 2006

    Posts: 2,190

    The temperature increase is unusual. I've removed and reseated the HSF a good number of times and each time the thermal paste is a nice thin opaque layer.

    The temperatures are consistently approx 4C better at load than I obtained with the original 4 heatpipe HSF, which I'd refitted a number of times.

    I think the chip may have an issue with the contact between the heatspreader and the cores, but I'm not yet prepared to risk removing the heatspreader and potentially killing the chip.

    The other option is that core is not great silicon and does produce a lot more heat when the frequency goes up.

    I'll have to see how brave I feel at the weekend.

    I wiped the disk and reinstalled XP from scratch last night. At 2.5Ghz I've just completed 18hrs of dual prime followed by 3 hours and counting of looped 3dmark2001. Faultless.

    The error in S&M always shows '2' so it's consistent. There may even be a fault on the core, just that nothing else finds it.

    Thanks for the help

    AD
     
  8. kitfit1

    Mobster

    Joined: 24 Feb 2003

    Posts: 3,787

    Location: Stourport-On-Severn

    If Prime is ok for 18hrs, i wouldn't worry about what S+M says. Myself i only use S+M as a guide to stability, Prime is the best test.
     
  9. decto

    Wise Guy

    Joined: 24 Jan 2006

    Posts: 2,190

    Update.

    Change of heart last night resulted in the decapitation of my core. The IHS is now removed.

    Temps seem to have dropped 2-3C under load so not a dramatic change, and it suggests the IHS was doing a good job.

    Just started priming at 2.6Ghz, 1.49v (set), 1.45v (monitoring). Temp seems stable at 51-52C vs room temp of 22C.

    I still get the same error in S&M so I've decided to take your advice and ignore it.

    Also I've found the cause of my 'Serious Errors'. Seems my USB wifi stick, used to prevent the little monster getting on the internet, causes a crash at very high CPU loads. It would always lose the wifi connection under extreme load, but it also seems to cause a crash when installed under such conditions. This got worse once I set the priority for prime to 1 below the maximum. With prime on max the mouse pointer will only move a fraction at a time!

    Have to see how this pans out

    Regards

    AD