brad> A few years back i was pretty pleased to write 60MB/s under
brad> real-world conditions on my linux systems. I just wanted to ping
brad> the group about this. Does anyone know if a hard drive or even
brad> memory comes close to writing data at this speed on a linux
brad> system? Or, what might the limit be for data rates on a linux PC
brad> these days? From my googling it appears hard drives top out at
brad> around 100MB/s.
If you're serious about grabbing all the data and writing it to disk,
then you'll need to set up some sort of RAID, where you spread your
writes across a bunch of controllers and disks.
In the past i've used a RAID-style arrangement to capture data at seemingly fast data rates (though not quite at 1GB/s).
In this case, if you get say three high-end PCI-Express SATA cards with
four ports each, you could stripe the data across 12 1TB disks at 1GB/s
(You said GigaBYTE, right?) without too much trouble, assuming you have
a bunch of PCIe slots on the board.
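To put rough numbers on that, here's a back-of-the-envelope sketch; it just restates the figures already quoted in this thread (1GB/s across 12 x 1TB disks), nothing measured:

    # Back-of-the-envelope check of the striping math above, using the
    # figures quoted in this thread (1 GB/s across 12 x 1 TB SATA disks).
    target_rate_mb_s = 1000          # 1 GB/s incoming stream
    disks = 12                       # 3 cards x 4 ports
    per_disk_mb_s = target_rate_mb_s / disks
    print(f"each disk must sustain ~{per_disk_mb_s:.0f} MB/s")   # ~83 MB/s

    # Rough time to fill the array at that rate, assuming 12 x 1 TB:
    hours_to_fill = (12 * 1000 * 1000) / target_rate_mb_s / 3600
    print(f"array fills in ~{hours_to_fill:.1f} hours")          # ~3.3 hours

So each spindle only has to hold ~85MB/s, which is why striping makes the overall number plausible.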
Yes, i did say GigaBYTE. Oddly enough, in my initial calculations i was off by a factor of 2 -- 500MB/s, and i thought, umm, there's a slim chance i can write that fast. Then i re-ran the numbers and got 1GB/s, at which point i thought i was looking at an impossible task.
<snip>
BTW, how will this device move the data to your file server? 10Gbit
Ethernet? Direct PCI-E interface? Infiniband?
The data will be moved via SneakerNet. My machines are not connected to any kind of network.
Do you need to actually keep all the data, or can you pre-process it
and compress it or summarize it, etc? Then write those results to
disk? Loading up the system with lots of RAM will give you more
leeway, but you're still going to be hurting to handle this flood.
Good question. All of the data is necessary; pre-processing isn't really an option, nor do i have a hook to process data on the instrument. (At the very least i need to show an acceptable amount of sensitivity to the instrument's readings to keep the funding dollars coming in, and a job :) ). Currently i use a lot of RAM as a large ring buffer and spool it out to disk when i get a chance, but again, that's only on the order of 50MB/s.
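For illustration only, the ring-buffer-and-spool idea looks roughly like the sketch below -- a stripped-down stand-in, not the actual capture code. The "instrument" is faked with os.urandom(), and the block size, ring depth, and file name are all made-up numbers:

    # Rough sketch of a RAM ring buffer spooled out to disk by a second
    # thread. The instrument read is simulated; all sizes are placeholders.
    import os
    import queue
    import threading

    BLOCK = 4 * 1024 * 1024            # 4 MiB per "instrument" read (assumed)
    DEPTH = 256                        # ring depth: ~1 GiB of RAM buffering
    OUT   = "capture.bin"              # hypothetical output file

    ring = queue.Queue(maxsize=DEPTH)  # bounded queue standing in for the ring
    done = threading.Event()

    def producer(n_blocks):
        """Stand-in for reading from the instrument."""
        for _ in range(n_blocks):
            # put() blocks when the ring is full; with a real instrument
            # that is exactly when data starts getting dropped.
            ring.put(os.urandom(BLOCK))
        done.set()

    def consumer():
        """Drain the ring to disk as fast as the disk allows."""
        with open(OUT, "wb", buffering=0) as f:
            while not (done.is_set() and ring.empty()):
                try:
                    f.write(ring.get(timeout=0.5))
                except queue.Empty:
                    continue

    t1 = threading.Thread(target=producer, args=(32,))
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()

The bounded queue is the whole trick: the producer keeps running as long as the disk thread drains faster, on average, than the instrument fills.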
brad> Note: This piece of instrumentation has onboard memory for
brad> writing, which i'll probably have to use, but if there was a way
brad> to stream the data, it would be preferable.
You're going to have to invest in a big box with lots of bandwidth to
handle this, along with the disks to hold all this data.
This is why i'm thinking i'll just have to use the instrument's onboard memory. i'm not sure how feasible a 'big box' will be in my lab; dollars are tight and we really need to have some degree of portability.
And I'd
really really really suggest you try to pre-process and summarize and
reduce the data before you try to write it.
yeah, i've looked into compressing the data in a lossless manner, but that ends up taking more CPU than i can spare, since the data is all binary.
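For what it's worth, one quick way to put a number on that trade-off is to time something like stdlib zlib at its fastest setting on a chunk of representative data. The sketch below uses random bytes as a placeholder, which will compress worse (and may run differently) than real instrument data:

    # Crude compression-throughput check. os.urandom() stands in for the
    # instrument data; real data may compress better and run differently.
    import os, time, zlib

    data = os.urandom(128 * 1024 * 1024)          # 128 MiB of test data
    t0 = time.perf_counter()
    out = zlib.compress(data, level=1)            # fastest zlib setting
    dt = time.perf_counter() - t0

    print(f"compressed {len(data)/1e6:.0f} MB in {dt:.2f} s "
          f"-> {len(data)/1e6/dt:.0f} MB/s, ratio {len(out)/len(data):.2f}")

If the MB/s figure comes out well below the instrument's data rate, compression on a single core is a non-starter, which sounds like what you're seeing.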
Good luck, and let us know what you end up doing!
Thanks, i'll need it! It helps to hear what other people might suggest; i'm still in the early stages. I'll look into the RAID solutions through various hardware interfaces (SATA, SAS). In my experience, though, RAID only yields about a 2x performance increase, and it looks like i'm going to need a 10-fold boost. I'm probably going to be throttled by RAM capacity rather than disk capacity, b/c i don't think i'm going to be able to write to disk fast enough, at least with the kind of system i can afford.
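Before committing to anything, it may be worth getting a baseline for what a single disk (or a trial array) actually sustains on a big sequential write. Something along the lines of the sketch below would do; the path and sizes are just placeholders, and you'd want to write well past RAM size (or drop caches) to get an honest figure:

    # Quick sequential-write baseline. Path and sizes are placeholders;
    # write more than RAM holds so the page cache doesn't flatter the result.
    import os, time

    PATH  = "/tmp/writetest.bin"       # hypothetical target on the disk under test
    CHUNK = bytes(8 * 1024 * 1024)     # 8 MiB of zeros per write
    TOTAL = 2 * 1024**3                # 2 GiB total

    t0 = time.perf_counter()
    with open(PATH, "wb", buffering=0) as f:
        written = 0
        while written < TOTAL:
            f.write(CHUNK)
            written += len(CHUNK)
        os.fsync(f.fileno())           # make sure the data actually hit the disk
    dt = time.perf_counter() - t0

    print(f"{TOTAL/1e6:.0f} MB in {dt:.1f} s -> {TOTAL/1e6/dt:.0f} MB/s")
    os.remove(PATH)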