I'm surprised nobody has recommended Cisco Catalyst switches and a pair of 4-port Intel NICs yet; that's what usually happens when networking is the topic of a thread...
About a year ago I was also looking for definite answers on the "how to get many files quickly from A to B over a network" topic. Unfortunately I didn't find anything useful.
What I did find, quite often, was that same 20MB/s figure for a Gbit connection.
OK, here's what I would do based on what I've learned so far:
1) Find out which NICs you have. If they're PCIe Realtek 8168/8111, you're fine. If they're made by Marvell, that's not so fine: definitely try several drivers in that case, and if nothing helps, hopefully there's a free PCIe x1 slot for a cheap Realtek.
Don't waste money on something like the Intel PRO/1000 PT. I did (60 Euros for two NICs) and got ~5% worse throughput with them, whether offloading was enabled or disabled (CPU load didn't change either).
2) Check that there are no unnecessary bindings or filter drivers on your NIC. The NICs should also use default settings; if you're not sure, reinstall them (removing the driver and reinstalling it via "Scan for hardware changes" restores the defaults).
The bindings you need are TCP/IP, Client for Microsoft Networks and File and Printer Sharing. Network Monitor is safe, and QoS shouldn't matter if you have it installed (I always uninstall it; there's no need for it).
3) Take the switch out of the equation: connect the PCs with a plain network cable (Gbit NICs must support MDI-X, so a crossover cable is no longer needed).
You can add the switch back later to find out whether it's crap. If it works properly, you absolutely shouldn't see any difference (that's the case with the ~30 Euro D-Link DGS-1005D I bought in March 2007, just for reference).
4) Offloading was a nice idea back when CPUs were slow, and it might make a minimal difference if both your connection and your CPU are near 100% most of the time. For your application (a single stream) it's completely unnecessary.
That's probably why a simple (and cheap) design like the Realtek's can offer more throughput at the same CPU load than a more complex Intel design. The Intel design might have an advantage with several hundred streams running in parallel, though I personally wouldn't pay 30 Euros per NIC just to find out whether that's really the case.
5) The same goes for jumbo frames: quite helpful back when interrupt moderation and MSI signaling (PCIe) didn't exist yet, but now that both are supported by the cheap Realteks, their advantage is pretty much gone, if you can get them working at all.
They didn't make a noticeable difference with the Intel NICs, I couldn't really get them working under Linux, and I think I actually caused a BSOD trying them once with an old Realtek Windows driver...
Feel free to try, but don't expect much.
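For a rough feel of how little headroom jumbo frames can buy on a single Gbit link, here is a back-of-the-envelope calculation of the best-case TCP payload rate. It's a sketch that assumes IPv4 and TCP headers without options (40 bytes) and 38 bytes of Ethernet overhead per frame (preamble, MAC header, FCS, inter-frame gap); real traffic also carries ACKs and often TCP timestamps, so actual numbers end up a bit lower.

```python
# Best-case TCP payload throughput on Ethernet (sketch; assumptions noted inline).

LINK_BPS = 1_000_000_000          # Gigabit Ethernet line rate in bits/s
ETH_OVERHEAD = 8 + 14 + 4 + 12    # preamble+SFD, MAC header, FCS, inter-frame gap (bytes)
IP_TCP_HEADERS = 20 + 20          # IPv4 + TCP headers without options (assumption)


def payload_mb_per_s(mtu: int, link_bps: int = LINK_BPS) -> float:
    """Maximum TCP payload rate in MB/s (1 MB = 10**6 bytes) for a given MTU."""
    payload = mtu - IP_TCP_HEADERS    # user data carried per frame
    on_wire = mtu + ETH_OVERHEAD      # bytes the frame occupies on the wire
    return link_bps / 8 * payload / on_wire / 1e6


if __name__ == "__main__":
    std = payload_mb_per_s(1500)
    jumbo = payload_mb_per_s(9000)
    print(f"MTU 1500: {std:.1f} MB/s")
    print(f"MTU 9000: {jumbo:.1f} MB/s")
    print(f"gain from jumbo frames: {100 * (jumbo / std - 1):.1f}%")
```

It prints roughly 118.7 MB/s for MTU 1500 and 123.9 MB/s for MTU 9000, so under these assumptions jumbo frames raise the theoretical ceiling by only about 4%. (MB here means 10^6 bytes; tools that report MiB will show smaller figures.)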
6) OK, back to testing. Now that you've made sure there are no filter drivers, firewalls (the Windows Firewall is safe, others may not be), scanners or switches in the way, you're ready to test raw TCP/IP throughput.
I personally prefer netio for this: it's easy to use and available as a Windows/Linux binary and as easy-to-compile source.
(URL in next post)
Run on the server: netio -t -s
and on the client: netio -t <server-IP>
(where <server-IP> is, of course, the IP address of the server)
You should reach >110MB/s pretty much independent of packet size. I'd expect CPU load to be around 20%-30%, most of it kernel time.
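netio is the tool to actually use here. Purely to illustrate what such a test does (push data from memory over one TCP connection and time it, with no disks involved), here's a minimal Python sketch. It's a hypothetical stand-in, not netio's protocol or options; the script name tcptest.py, port 18767, the 64 KB chunk size and the 10-second duration are arbitrary choices made for this sketch.

```python
# Minimal memory-to-memory TCP throughput test: a hypothetical stand-in for netio,
# not netio's protocol. Only the network stack and NICs are exercised, no disks.
from __future__ import annotations

import socket
import sys
import time

PORT = 18767          # arbitrary port picked for this sketch
CHUNK = 64 * 1024     # bytes per send/recv call (assumption)
SECONDS = 10.0        # how long the client keeps sending


def receive(listener: socket.socket) -> tuple[int, float]:
    """Accept one connection, read until the peer closes it; return (bytes, seconds)."""
    conn, _ = listener.accept()
    start = time.perf_counter()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:          # empty read: the client closed the connection
                break
            total += len(data)
    return total, time.perf_counter() - start


def send(host: str, port: int = PORT, seconds: float = SECONDS) -> int:
    """Stream zero-filled buffers to host:port for `seconds`; return bytes sent."""
    buf = bytes(CHUNK)
    sent = 0
    with socket.create_connection((host, port)) as sock:
        deadline = time.perf_counter() + seconds
        while time.perf_counter() < deadline:
            sock.sendall(buf)
            sent += len(buf)
    return sent


def main(argv: list[str]) -> None:
    if len(argv) == 2 and argv[1] == "server":
        with socket.create_server(("", PORT)) as listener:
            total, elapsed = receive(listener)
        rate = total / max(elapsed, 1e-9) / 1e6
        print(f"received {total / 1e6:.0f} MB in {elapsed:.1f} s = {rate:.1f} MB/s")
    elif len(argv) == 3 and argv[1] == "client":
        send(argv[2])
    else:
        sys.exit("usage: tcptest.py server | tcptest.py client <server-IP>")


if __name__ == "__main__":
    main(sys.argv)
```

Run python tcptest.py server on one PC and python tcptest.py client <server-IP> on the other; the server prints the rate once the client finishes. Python adds some overhead of its own, so a result somewhat below netio's is not necessarily a network problem.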
Things look quite different when you want to transfer files. Actually, Linux's Samba/CIFS implementation now tends to be better than Microsoft's own.
After I was done with all my research, the German magazine c't did some benchmarking: while they came near wire speed (>90MB/s) with the SMB client in Vista, they only reached >60MB/s with the client in XP SP2. That's also the performance I got; they did their measurements with a RAM disk, and so did I.
Copying files from/to my Linux 2.6 storage I average around 40MB/s-50MB/s; I think that should be possible with your hardware.
I also once tested Win2003 as a server. I think it was slightly slower, but not by more than 10%.
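To get comparable file-copy numbers, what matters is timing one stream end to end, including the final flush. Below is a minimal sketch, assuming Python 3; the script name copyrate.py and the 1 MiB buffer are arbitrary. It does plain reads and writes rather than using the Explorer copy engine, so its numbers may differ somewhat from Explorer or xcopy. To keep the local disk out of the measurement, read the source from a RAM disk (as in the c't test) or use a file that isn't already in the file cache.

```python
# Time a single-stream file copy (e.g. RAM disk -> network share) and report MB/s.
from __future__ import annotations

import os
import sys
import time

BUFSIZE = 1024 * 1024   # 1 MiB read/write buffer (assumption)


def copy_rate(src: str, dst: str, bufsize: int = BUFSIZE) -> tuple[int, float]:
    """Copy src to dst in one stream; return (bytes copied, seconds taken)."""
    copied = 0
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(bufsize)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
        fout.flush()
        os.fsync(fout.fileno())   # don't stop the clock while data sits in the write cache
    return copied, time.perf_counter() - start


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: copyrate.py <source> <destination>")
    copied, elapsed = copy_rate(sys.argv[1], sys.argv[2])
    rate = copied / max(elapsed, 1e-9) / 1e6
    print(f"{copied / 1e6:.0f} MB in {elapsed:.1f} s = {rate:.1f} MB/s")
```

For example: python copyrate.py R:\test.bin \\server\share\test.bin (hypothetical paths; swap them to measure the other direction).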
Regarding those 4MB/s over 100Mbit:
Back in 1998 at my first employer we had a Linux 2.0 development server and NT4 clients with 3Com NICs; usual throughput with the 3Com NICs was ~6MB/s.
Since those had issues with the expensive Catalyst switch I mentioned (it took over 2 minutes after power-on until the NICs had a link), one day I replaced my 3Com with a cheap Realtek, and I then reached 7MB/s-8MB/s.
Of course that was 10 years ago. My only remaining device with just a 100Mbit port is my 5-year-old notebook with a 1.3GHz Pentium M and a pretty slow 80GB disk (>200 Euros at the time).
I copy files at a constant 99% utilization (according to the network monitor in Task Manager) and a throughput of >10MB/s; both PCs are running XP SP2.
I haven't done any optimisations on the TCP stack. I tried them, but I didn't notice a difference. Those 4MB/s sound like a firewall/netfilter or a really slow disk; I've never experienced performance that slow.
Hope those reference numbers and tips help. Unfortunately there aren't many tweaks I can give you, simply because they did nothing for me. Gbit performance is hard to reach with a single stream (= file copying with one client), and after trying a lot of things (iSCSI, AoE, FTP with vsftpd) I settled on plain Windows networking. Most of the time my WD6400AACS limits the transfer speed anyway.
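To see why 4MB/s over 100Mbit points away from the wire itself, it helps to express the quoted rates as a share of the nominal link rate. A tiny sketch, assuming 1 MB = 10^6 bytes and ignoring protocol overhead (so the practical ceiling sits a bit below 100%):

```python
# Express transfer rates as a share of a nominal link rate (protocol overhead ignored).


def link_share(rate_mb_s: float, link_mbit_s: float = 100.0) -> float:
    """Fraction of the link used: MB/s (10**6 bytes) * 8 bits / link rate in Mbit/s."""
    return rate_mb_s * 8 / link_mbit_s


if __name__ == "__main__":
    for rate in (4.0, 6.0, 7.0, 8.0, 10.0):
        print(f"{rate:4.1f} MB/s = {100 * link_share(rate):3.0f}% of 100 Mbit/s")
    # The Gbit netio target mentioned earlier, for comparison:
    print(f"110 MB/s = {100 * link_share(110, 1000):3.0f}% of 1 Gbit/s")
```

4MB/s comes out at about 32% of a 100Mbit link, the old 3Com/Realtek figures at roughly 48% and 56%-64%, and >10MB/s at over 80%, so a 4MB/s transfer leaves most of the link idle.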
Good luck on your quest anyway!