Worst Nightmare: Need more disk performance

Silencing hard drives, optical drives and other storage devices

Schlotkins
Posts: 278
Joined: Thu Mar 27, 2003 5:30 am

Worst Nightmare: Need more disk performance

Post by Schlotkins » Wed Dec 08, 2004 5:05 am

I finished up my silent PC in February with a Samsung 160GB drive and I'm loving its quietness... I can't hear it at idle, and there's just a low rumble on seeks. However, I have a problem: I'm doing a lot of data work on my PC now, and the disk speed is killing me in SAS (a stats program). I just sit here waiting through sorts and summary statistics, and you can hear the drive grinding away.

Now, I checked memory usage, and XP tells me I have about 256MB free while this is going on, so I don't think throwing more memory at it will help. I don't know if SAS is being stupid and not loading more into memory, or if it's Windows. So I'm looking at my options, and of course none of them are THAT good.

- I could upgrade to 4GB of RAM, make a RAMdisk of say 3GB, and copy everything I need over to it and run from there. The downside is I'd have to do this on every reboot, and I'm a little wary of adding another 3GB of DDR RAM with DDR2 coming around now.

- Get a fast drive, throw it in an enclosure, and put it in a closet near the computer. I guess I could go SATA (Raptor) or SCSI, but the closet isn't that far (4 or 5 feet) from me. In addition, I'm worried about heat in there. How much will the closet dampen the sound? Is anyone else doing this?

So, I'm open to suggestions; I'm sure others have tried something like this. I don't know how to do a wireless drive setup, but I'm open to the idea if it's actually possible. I don't have unlimited resources here (poor Ph.D. student), but time is important.

Thanks so much,
Chris

Tobias
Posts: 530
Joined: Sun Aug 24, 2003 9:52 am

Re: Worst Nightmare: Need more disk performance

Post by Tobias » Wed Dec 08, 2004 6:18 am

Schlotkins wrote: Now, I checked memory usage, and XP tells me I have about 256MB free while this is going on, so I don't think throwing more memory at it will help.
The problem with running regressions is that the data needs to be transformed, summarized, etc. so many times that the dataset is effectively duplicated many times over during a session. More RAM will probably only help if it is large enough to contain all the data while you are calculating your variables...

Dhurdahl
Posts: 46
Joined: Thu Oct 16, 2003 11:43 pm

Post by Dhurdahl » Wed Dec 08, 2004 6:35 am

Well, the normal way to do this is to get more drives and RAID them.

So what I would suggest is getting two or four more Samsung drives and running them in a RAID 0 set, either off the motherboard or a PCI card.

That should give you a very nice boost without raising the noise too much.

:D

Schlotkins
Posts: 278
Joined: Thu Mar 27, 2003 5:30 am

Post by Schlotkins » Wed Dec 08, 2004 6:41 am

Tobias wrote: The problem with running regressions is that the data needs to be transformed, summarized, etc. so many times that the dataset is effectively duplicated many times over during a session. More RAM will probably only help if it is large enough to contain all the data while you are calculating your variables...
Hm... I wonder how many times. The data sets (right now) are about 80MB. They will probably get bigger, but that's where I am now. EDIT: Oh, I see the problem. If X has 15,000 observations, X'X is going to be a 15000x15000 matrix of doubles... wow, that's going to be huge... Hmmm, maybe 4GB of RAM is the way to go then.
Dhurdahl wrote: Well, the normal way to do this is to get more drives and RAID them.
If I'm not mistaken, sorts, merges, regressions, etc. would probably be more random-access oriented than sequential... no?

Thanks for the replies!
Chris

Dhurdahl
Posts: 46
Joined: Thu Oct 16, 2003 11:43 pm

Post by Dhurdahl » Wed Dec 08, 2004 7:02 am

Schlotkins wrote: If I'm not mistaken, sorts, merges, regressions, etc. would probably be more random-access oriented than sequential... no?
Anand's test of two drives in RAID 0: http://www.anandtech.com/storage/showdoc.aspx?i=2101

Actually, RAID 0 is exactly what you should use if you have random I/O.

:D

james22
Posts: 16
Joined: Sat Apr 03, 2004 9:50 pm

Post by james22 » Wed Dec 08, 2004 10:42 am

Schlotkins wrote: EDIT: Oh, I see the problem. If X has 15,000 observations, X'X is going to be a 15000x15000 matrix of doubles... wow, that's going to be huge... Hmmm, maybe 4GB of RAM is the way to go then.
If you're referring to X as your regression matrix, it usually has #columns = number of predictors and #rows = number of observations. For things to work, there must always be far fewer predictors than observations.

When you do X'X you end up with a K x K matrix, where K is your number of predictors, not the number of observations.
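
To put illustrative numbers on it (the K = 10 here is just an assumption for the example): with N = 15,000 observations and K = 10 predictors, X is 15000 x 10, so X'X is only 10 x 10 -- that's 100 doubles, well under a kilobyte. It's the N x K data matrix (and the intermediate copies SAS makes of it) that eats space, not X'X.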

Why don't you try doing things step by step to see which routine is taking the longest time - my guess is that it's the matrix inversion.

SAS can be pretty slow/inefficient - search the Web and maybe you can find some tips on how to optimize things.

James

hofffam
Posts: 173
Joined: Wed Oct 22, 2003 6:18 am
Location: Texas

Post by hofffam » Wed Dec 08, 2004 12:09 pm

I don't know how SAS works on the PC, but similar applications sometimes provide options to direct I/O for different activity to different locations. Would it help to add a second drive and tell SAS to use one drive for "work" files and another for everything else? RAID 0 is worth a 20% boost at best....
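
From a quick look at the docs, I believe the mechanism looks something like the lines below; treat the option name and the paths as assumptions (I've only used SAS on mainframes, not a PC). In the SAS config file (sasv9.cfg or similar), you would point the WORK library at the second drive:

-WORK "D:\saswork"

and keep permanent data libraries on another drive via a LIBNAME in your programs:

libname project "E:\sasdata";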

Also - is your processor at 100% during these analysis runs? If so, you are already processor-bound and improving disk performance won't help.

Another alternative is to replace the motherboard with a dual processor board. This will help only if SAS is multi-threaded and can take advantage of multiple processors.

ddrueding1
Posts: 419
Joined: Sun Sep 19, 2004 1:05 pm
Location: Palo Alto, CA

Post by ddrueding1 » Wed Dec 08, 2004 12:34 pm

Upgrading to 4GB of RAM could help a lot, but only if 4GB is enough to hold your dataset. Otherwise all you are doing is delaying the point at which the "grinding" begins.

RAID 0 is not the way to go for random I/O (especially with important data). Either a 15k RPM SCSI disk in the closet (SCSI cables can run that far), or simply another Samsung in your machine configured as a stand-alone drive where portions of your workload take place, possibly even dedicated as a scratch disk.

I'll second checking CPU usage as well; if that's the bottleneck, there are much easier solutions.

Schlotkins
Posts: 278
Joined: Thu Mar 27, 2003 5:30 am

Post by Schlotkins » Wed Dec 08, 2004 12:47 pm

james22 wrote: If you're referring to X as your regression matrix, it usually has #columns = number of predictors and #rows = number of observations. For things to work, there must always be far fewer predictors than observations.
Wow... that's embarrassing. Kids, let this be a lesson to you: don't think about matrices until you've had some coffee. :)

Actually, I'm doing a lot of data sorting/merging/basic stats right now... My guess is the regression work won't be as bad. We shall see, though, as I'm going to have to combine these things.

OK, let me try some code and check the CPU usage. Please hold... <corny elevator music> OK, the process seems to use 44% of my CPU at most. So it's definitely NOT a CPU issue. Even the page file size doesn't go up that much. Of course, when I have 300MB of physical memory available, why doesn't SAS put the file into memory? The page file only goes up by about 35MB...

I can't find any benchmarks on the web, so I don't know what to tell you there. I did find some memory settings I can edit, so maybe I'll play with them. My concern at this point is that I have 300MB of RAM free and SAS isn't using it, since the HDD is pegged the entire time. So maybe more RAM wouldn't do anything for me if I can't get SAS to use it.

Again, I appreciate ALL the input... even the stuff that makes me look stupid. :)

Chris

Tobias
Posts: 530
Joined: Sun Aug 24, 2003 9:52 am

Post by Tobias » Wed Dec 08, 2004 2:06 pm

Schlotkins wrote:
Again, I appreciate ALL the input... even the stuff that makes me look stupid. :)
Haha, when james22 entered I really felt like a n00b again, and I was very happy I was general enough to still be right. :)

Anyway, you say that you are only sorting and merging tables with up to 150k rows right now? And a SAS dataset lives in a single file, right? If so, wouldn't that mean that every time you change the file, the computer needs to read the entire dataset, find the specific place to extract/combine data, and then store it to disk again? That's a rather chunky file to read/write, so it could be at least part of the problem. It feels like it could be consistent with the increase in the pagefile as well...

hofffam
Posts: 173
Joined: Wed Oct 22, 2003 6:18 am
Location: Texas

Post by hofffam » Wed Dec 08, 2004 3:58 pm

I found the following link http://www.intel.com/business/bss/solut ... cklist.pdf that offers SAS tuning tips. Nothing earth-shattering, but it does suggest separating I/O among disks to reduce disk contention.

It seems like the CPU is not a problem in your case, and neither is memory (or SAS can't make use of more). The cheapest (and quietest) option might be another Samsung drive. You should probably run it on a separate IDE/ATA controller.

hofffam
Posts: 173
Joined: Wed Oct 22, 2003 6:18 am
Location: Texas

Post by hofffam » Wed Dec 08, 2004 4:03 pm

Another document http://www.sas.com/partners/directory/i ... _guide.pdf with much more detail. This is server-centric, but the information should have general value. Once again, it suggests separating files across multiple disks.

ddrueding1
Posts: 419
Joined: Sun Sep 19, 2004 1:05 pm
Location: Palo Alto, CA

Post by ddrueding1 » Wed Dec 08, 2004 4:13 pm

Just something to check on: make sure DMA is enabled on your drive and that it is running at its full speed.

Edirol
Posts: 40
Joined: Mon May 03, 2004 11:51 pm

Post by Edirol » Wed Dec 08, 2004 8:31 pm

Get more RAM and use a RAMdisk. There are free utils out there that let you create big RAMdisks if you look hard enough. I don't have a link on me ATM.

Schlotkins
Posts: 278
Joined: Thu Mar 27, 2003 5:30 am

Post by Schlotkins » Thu Dec 09, 2004 8:58 am

Thanks for the links, guys. I'll check them out. It looks like there are several things I can do, both in terms of settings and code, to help things out. I'll play around and see what I can do, but it looks like I don't need a new drive. Maybe some more RAM for a RAMdisk, but otherwise I should be OK.

Thanks!
Chris

Chang
Posts: 129
Joined: Mon Oct 11, 2004 12:26 pm

Post by Chang » Thu Dec 09, 2004 11:42 am

I've found ramdisks with SAS don't help enough for the effort involved in working with them.

Three suggestions:

1) Try using the compress dataset option -- throwing in a (COMPRESS=YES) does wonders for some datasets. Remember, you can use dataset options almost anywhere you reference a dataset name (like ". . . out=myoutdata(compress=yes);"). There's a combined sketch of 1) and 2) after this list.

2) Tweak the sort options. Just upping the memory used with SORTSIZE=128M might fix it all. NOEQUALS can shave off a couple of seconds (especially handy if you're bootstrapping). TAGSORT helps if you're sorting on a single column and have tons of extra columns.

3) SAS is much happier with many rows than many columns. If you have many columns (say, in the thousands), try dropping columns or transposing your data.
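
Here's a rough sketch putting 1) and 2) together; the dataset name and sort key are made up, so adjust them to your data:

proc sort data=work.big out=work.big_sorted (compress=yes)
    sortsize=128M noequals; /* add TAGSORT if sorting one key among many columns */
  by subject_id;
run;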

greeef
Posts: 355
Joined: Thu Apr 15, 2004 8:08 am

Post by greeef » Thu Dec 09, 2004 12:02 pm

Just a thought: run a regular defrag and set your pagefile to a constant size (say, 1GB) to reduce fragmentation.

Is your hard drive on the same IDE bus as any other devices?

Grab a copy of SiSoft Sandra and run some hard disk benchmarks to see if they're any slower than what you should be getting.

griff

Schlotkins
Posts: 278
Joined: Thu Mar 27, 2003 5:30 am

Couple questions

Post by Schlotkins » Thu Dec 09, 2004 5:34 pm

It seems as if a few people here use SAS... :) Here are my questions:

In this Intel document, I can't change some of the items, like MEMLIB or REALMEMSIZE... Why is that? I did a Google search on PAE in Windows XP, and it seems like it's on by default. I guess, generally, I don't know how to change these values.

Also, in case someone does a search, try using:

sasfile analysis.backache load;   /* loads the dataset into memory */
/* ...run your steps against the in-memory copy here... */
sasfile analysis.backache close;  /* releases the memory when done */

(link: http://support.sas.com/sassamples/bitsa ... file1.html)

Thanks again!
Chris

halcyon
Patron of SPCR
Posts: 1115
Joined: Wed Mar 26, 2003 3:52 am
Location: EU

Post by halcyon » Thu Dec 09, 2004 11:26 pm

RAID will do you very little good. See the StorageReview tests for more data on this (also supported by GamePC's tests). Unless you have heavy server-type concurrent I/O, RAID will not speed up your disk times much at all.

I also doubt that even if you upgraded to a Raptor the difference would be really _that_ big, although you could probably notice it (as well as the increased noise).

So, the suggestions about optimisation and a DRAM upgrade (prices are going to sink in Q1-Q2/2005) are probably quite OK. :)

hofffam
Posts: 173
Joined: Wed Oct 22, 2003 6:18 am
Location: Texas

Post by hofffam » Fri Dec 10, 2004 6:44 am

Halcyon - why do you think a DRAM upgrade will help? Schlotkins' system seems to show that the existing RAM is under-utilized. Increasing an under-utilized resource won't help anything. It is even possible that it might slow things slightly, because the operating system has to track more memory pages.

Schlotkins - I used SAS many years ago on a MAINFRAME; I have zero experience on a PC. On mainframes, I/O is typically spread widely across multiple devices, so contention on a single drive is rare. Your CPU is not busy. Your memory is under-utilized. Your original idea that the hard disk is the bottleneck seems correct.

Does your SAS license entitle you to support? Can you send them an email asking for specific suggestions? SAS is surely an expensive piece of software and should have technical support. I'd want to find ways to get SAS to do more in memory instead of on disk: increasing cache sizes, doing sort work in memory, etc. I don't know what file system you are using (FAT32 or NTFS), but increasing the blocksize/cluster size might help (though it might waste more space).

Schlotkins
Posts: 278
Joined: Thu Mar 27, 2003 5:30 am

Post by Schlotkins » Fri Dec 10, 2004 6:50 am

At this point, I think it's the way SAS is configured. By default the application only uses about 64MB of RAM for sorting, etc. Now, here's the real sh*t kicker: if you tell SAS to use MORE RAM than you have, the thing pukes all over the place, crashes your box and makes you reboot. Using that load command, I've really sped things up, so that helped. Now I'm trying to get SAS to use my RAM for the WORK library. I think, at least for my current file sizes, that will really improve performance. If I can't get SAS to use more RAM, then I think a new HD is the way to go. If CPU utilization is only 44%, 300MB of RAM are free, and the HD is clicking away, I think that's the bottleneck.

I'm going to send SAS an email today and see what they say. I'm not even close to a power user at this point, so the stuff I'm doing should click off pretty quickly.

Chris

Chang
Posts: 129
Joined: Mon Oct 11, 2004 12:26 pm

Post by Chang » Fri Dec 10, 2004 7:56 am

MEMLIB and REALMEMSIZE are options you can adjust in the SAS config file. Also, when you're randomly Googling for answers, make sure what you find applies to your configuration (i.e. desktop XP and not NT 4.0).

SAS's disk I/O is probably closer to concurrent server I/O than you might think. It's code-dependent of course, but RAID would help alleviate the bottleneck common to most desktop SAS installations.

As for telling SAS to use more RAM than you have, it depends on how you do it. If you specify a larger SORTSIZE than you have in available physical memory, the Windows memory manager takes over and starts hitting the swapfile.
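
For reference, these are set in the SAS config file (sasv9.cfg or similar). A minimal sketch; the sizes are placeholders rather than recommendations, and per the above you want them comfortably below your physical RAM:

-MEMSIZE 512M
-SORTSIZE 256M
-REALMEMSIZE 512M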

I suggest just opening up the help and reading the "Techniques for Optimizing I/O" page. Also browse through the "Using SAS in your OS > Using SAS in Windows > Running SAS Under Windows > Performance Considerations under Windows" section. That should hit all the bases. It doesn't sound like the sasfile statement is the answer to your performance issues: you said you were seeing slowdowns with basic sorts and summary data, which is a single hit to your dataset, and you'll pay the same slowdown loading it into memory. It might work as a short-term fix, but it's not an answer to your overall issues.

Also, if you haven't already done so, greeef's suggestion of a defrag might help some. If possible, you'll also want to move SAS's work directory onto a different drive from the Windows pagefile.

freak_in_cage
Posts: 92
Joined: Mon Dec 06, 2004 8:05 am

Post by freak_in_cage » Sat Dec 11, 2004 3:01 pm

Erm, all this complex RAM stuff is wayyyyy beyond me, but here's what I would do:

1) Use the current Samsung setup as a slave.
2) Buy a second high-performance drive and go to lengths to silence it. I have heard (although I may well be wrong) that, considering the performance, the WD Raptors aren't THAT loud. I saw a link to a site around here somewhere that completely enclosed a drive in foam, except for a few heatpipes carrying the heat to an aluminium case which completely enclosed the drive. The case was covered in heatsinks and fitted below the PCI slots. I reckon the heatpipes would keep it cool enough, although obviously if you don't need such extreme performance it would be better to go for a 7200 RPM drive.

I guess it depends on how quiet you want it. You may be able to get away with suspending it and adding a bit of insulation plus heatsinks.

vortex222
Posts: 257
Joined: Sun Mar 28, 2004 8:30 pm
Location: nanaimo BC Canada

Post by vortex222 » Sat Dec 11, 2004 7:28 pm

If SAS has options to span I/O across multiple disks, then why can't a ramdisk be specified as just another disk? Specify multiple ramdisks, lol.
