WLUG'ers,

At work i might be getting a piece of instrumentation in my lab that is capable of outputting 1GB of data per second. A few years back i was pretty pleased to write 60MB/s under real-world conditions on my linux systems, so i wanted to ping the group about this. Does anyone know if a hard drive, or even memory, comes close to writing data at this speed on a linux system? Or, what might the limit for data rates on a linux PC be these days? From my google'ing it appears hard drives top out at around 100MB/s.

Note: this piece of instrumentation has onboard memory for writing, which i'll probably have to use, but if there were a way to stream the data, it would be preferable.

Thanks,
-- brad
brad> At work i might be getting a piece of instrumentation in my lab that is capable of outputting 1GB of data per second.

Fun! You're going to have an interesting time handling data from this device, for sure.

brad> A few years back i was pretty pleased to write 60MB/s under real-world conditions on my linux systems, so i wanted to ping the group about this. Does anyone know if a hard drive, or even memory, comes close to writing data at this speed on a linux system? Or, what might the limit for data rates on a linux PC be these days? From my google'ing it appears hard drives top out at around 100MB/s.

If you're serious about grabbing all the data and writing it to disk, then you'll need to set up some sort of RAID, where you spread your writes across a bunch of controllers and disks. In this case, if you get, say, three high-end PCI-Express SATA cards with four ports each, you could stripe the data across twelve 1TB disks at 1GB/s (you said gigaBYTE, right?) without too much trouble, assuming you have enough PCIe slots on the board.

I'd probably set up RAID 0 (striping) with the stride chosen so that every 1 to 16 megabytes you move to a new disk, so as to take advantage of the cache on the disks. You'll have to do some testing to determine the best numbers. Also, if this data is important, I'd set up mirrored pairs of disks and then stripe across those pairs. That means more controllers, more disks and more power.

BTW, how will this device move the data to your file server? 10Gbit Ethernet? Direct PCI-E interface? InfiniBand?

Do you need to actually keep all the data, or can you pre-process it, compress it, summarize it, etc., and then write those results to disk? Loading up the system with lots of RAM will give you more leeway, but you're still going to be hurting to handle this flood.

brad> Note: this piece of instrumentation has onboard memory for writing, which i'll probably have to use, but if there were a way to stream the data, it would be preferable.

You're going to have to invest in a big box with lots of bandwidth to handle this, along with the disks to hold all this data. And I'd really really really suggest you try to pre-process and summarize and reduce the data before you try to write it.

Good luck, and let us know what you end up doing!

John
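P.S. As a starting point for that testing, a crude sequential-write benchmark along these lines will tell you what a given array actually sustains. This is just a Python sketch; the mount point, block size and total size are made-up placeholders you'd adjust to your setup:

import os, time

# Crude sequential-write benchmark: write several GB in large blocks,
# fsync at the end, and report sustained MB/s.
PATH  = "/mnt/bigraid/throughput.bin"   # placeholder: a file on the array under test
BLOCK = b"\0" * (16 * 1024 * 1024)      # 16MB writes; try values near your stripe size
TOTAL = 8 * 1024 * 1024 * 1024          # 8GB total, ideally much bigger than your RAM

start = time.time()
written = 0
with open(PATH, "wb") as f:
    while written < TOTAL:
        f.write(BLOCK)
        written += len(BLOCK)
    f.flush()
    os.fsync(f.fileno())                # include the time for data to really hit the disks
elapsed = time.time() - start
print("%.1f MB/s sustained" % (written / (1024.0 * 1024.0) / elapsed))

Keep in mind the page cache will flatter the numbers unless the file is much larger than RAM; opening the file with O_DIRECT is more honest but fiddlier.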
(sorry list moderator, i initially replied with the wrong email address). Thanks for the reply.
> brad> A few years back i was pretty pleased to write 60MB/s under real-world conditions on my linux systems, so i wanted to ping the group about this. Does anyone know if a hard drive, or even memory, comes close to writing data at this speed on a linux system? Or, what might the limit for data rates on a linux PC be these days? From my google'ing it appears hard drives top out at around 100MB/s.
> If you're serious about grabbing all the data and writing it to disk, then you'll need to set up some sort of RAID, where you spread your writes across a bunch of controllers and disks.
In the past i've used a RAID-style arrangement to capture data at seemingly fast data rates (though not quite at 1GB/s).
> In this case, if you get, say, three high-end PCI-Express SATA cards with four ports each, you could stripe the data across twelve 1TB disks at 1GB/s (you said gigaBYTE, right?) without too much trouble, assuming you have enough PCIe slots on the board.
Yes, i did say gigaBYTE. Oddly enough, in my initial calculations i was off by a factor of 2: i got 500MB/s and thought, umm, there's a slim chance i can write that fast. Then i re-ran the numbers and got 1GB/s, at which point i thought i was looking at an impossible task.
<snip>
> BTW, how will this device move the data to your file server? 10Gbit Ethernet? Direct PCI-E interface? InfiniBand?
The data will be moved via SneakerNet. My machines are not connected to any kind of network.
> Do you need to actually keep all the data, or can you pre-process it, compress it, summarize it, etc., and then write those results to disk? Loading up the system with lots of RAM will give you more leeway, but you're still going to be hurting to handle this flood.
Good question. All of the data is necessary; pre-processing isn't really an option, nor do i have a hook to process data on the instrument. (At the very least i need to show an acceptable amount of sensitivity to the instrument's readings to keep the funding dollars coming in, and a job :) ). Currently i use a lot of RAM as a large ring buffer and spool that out to disk when i get a chance, but again, that's only on the order of 50MB/s.
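Roughly, the spooling looks like the sketch below. It's Python with threads; the chunk size, queue depth, output path and read_chunk() call are all made-up placeholders, not the real acquisition API:

import os
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

CHUNK_BYTES = 16 * 1024 * 1024            # placeholder chunk size
ring = queue.Queue(maxsize=256)           # roughly 4GB of buffering in RAM

def acquire(read_chunk):
    # Producer: pull chunks from the instrument as fast as it delivers them.
    # read_chunk() is a stand-in for whatever the real acquisition call is.
    while True:
        data = read_chunk(CHUNK_BYTES)
        if not data:
            ring.put(None)                # signal end of run
            return
        ring.put(data)                    # blocks here if the disk writer falls behind

def spool(path):
    # Consumer: stream chunks out of RAM and onto disk in order.
    with open(path, "wb") as f:
        while True:
            data = ring.get()
            if data is None:
                break
            f.write(data)
        f.flush()
        os.fsync(f.fileno())

# writer = threading.Thread(target=spool, args=("/data/run001.bin",))
# writer.start()
# acquire(instrument.read_chunk)          # hypothetical instrument handle
# writer.join()

The producer blocking is exactly where data gets lost at 1GB/s, which is why either the disk side has to keep up or the onboard memory has to absorb the whole run.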
> brad> Note: this piece of instrumentation has onboard memory for writing, which i'll probably have to use, but if there were a way to stream the data, it would be preferable.
> You're going to have to invest in a big box with lots of bandwidth to handle this, along with the disks to hold all this data.
This is why i'm thinking i'll just have to use the instrument's onboard memory. i'm not sure how feasible a 'big box' will be in my lab; dollars are tight and we really need to have some degree of portability.
> And I'd really really really suggest you try to pre-process and summarize and reduce the data before you try to write it.
Yeah, i've looked into trying to compress the data in a lossless manner, but that ends up taking more CPU than i can spare since the data is all binary.
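To put a number on it, the quick test i'd run is along these lines: time a standard compressor on a representative capture and compare the input MB/s against the 1GB/s target. This uses zlib from the Python standard library; the sample path is made up, and lz4 or zstd bindings would be the faster things to try if i ever revisit this:

import time
import zlib

def compress_rate(path, level=1):
    # Single-core zlib throughput and ratio on a real capture file.
    data = open(path, "rb").read()
    start = time.time()
    out = zlib.compress(data, level)
    elapsed = time.time() - start
    mb_in = len(data) / (1024.0 * 1024.0)
    print("%.0f MB/s in, compressed to %.0f%% of original"
          % (mb_in / elapsed, 100.0 * len(out) / len(data)))

# compress_rate("/data/sample_capture.bin")   # made-up sample file

High-entropy binary data usually stays near 100% of its original size anyway, so the CPU cost buys very little.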
> Good luck, and let us know what you end up doing!
Thanks, i'll need it! It helps to hear what other people might suggest, since i'm still in the early stages. I'll look into RAID solutions over the various hardware interfaces (SATA, SAS), although in my experience RAID has only given me about a 2x performance increase, and it looks like i'm going to need a 10-fold boost. I'm probably going to be throttled by RAM capacity rather than disk capacity, b/c i don't think i'm going to be able to write to disk fast enough, at least with the kind of system i can afford.

Thanks for the response,
- brad
participants (2)
- brad
- John Stoffel