Posts by Mikus

1) Message boards : Number crunching : Wishlist (Message 6805)
Posted 2037 days ago by Mikus
.
Let me express a wish for trying to better predict the length of ABC work units.

I run off-line. Meaning that when I connect, I need to download enough work to keep my system busy until the next time I connect.


Every workunit provides an estimated # of floating point operations to complete that workunit. When that workunit finishes, the BOINC client divides the actual # of floating point operations it measured by this estimate, and keeps the result as a 'correction factor' (to be applied by the client's work scheduler in calculating how long it will take to crunch "to be done next" work from this project).

As far as I can tell, the workunits supplied by ABC all carry the __same__ estimated # of floating point operations. But the actual amount of crunching needed is __not__ the same for all workunits. Once in a while a "joker" workunit comes along, which takes from five to more than ten times *more* crunching than the "average" ABC workunit. And when the "correction factor" as calculated from that "joker" workunit gets applied to any "to be done next" ABC workunits, *their* calculated crunching requirement gets grossly inflated (unless they too happen to be "joker" workunits, which is normally not the case). [Although the value of the composite 'correction factor' for the project is re-adjusted as each workunit finishes, that adjusted value "drops" at a much slower rate than it "rises".]
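To make the "rises fast, drops slowly" behavior concrete, here is a minimal sketch. It assumes the factor jumps straight up to any larger ratio it sees and otherwise moves only a fraction of the way back down. The function names and the decay_weight of 0.1 are illustrative assumptions, not values taken from the BOINC client.

```python
def update_correction_factor(current, estimated_fpops, actual_fpops, decay_weight=0.1):
    """Return the project's correction factor after one workunit finishes.

    decay_weight is a hypothetical smoothing constant chosen for illustration.
    """
    ratio = actual_fpops / estimated_fpops
    if ratio > current:
        # A workunit that ran longer than predicted raises the factor at once.
        return ratio
    # Otherwise the factor moves only part of the way down towards the ratio.
    return current + decay_weight * (ratio - current)


def simulate(workunit_costs, estimated_fpops=1.0):
    """Apply the update for each finished workunit and record the factor."""
    factor = 1.0
    history = []
    for cost in workunit_costs:
        factor = update_correction_factor(factor, estimated_fpops, cost)
        history.append(factor)
    return history


# One "normal" workunit, one "joker" six times longer, then three normal ones.
history = simulate([1.0, 6.0, 1.0, 1.0, 1.0])
for step, factor in enumerate(history, start=1):
    print(f"after workunit {step}: correction factor = {factor:.3f}")
```

Under these assumptions the factor is still above 4 after three normal workunits have followed the "joker".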

--------

Suppose it takes my system one hour to crunch a "normal" ABC workunit. When a "joker" ABC workunit comes along and takes six hours to crunch, the client's work scheduler will expect that "to be done next" ABC workunits will *also* need closer to six hours to finish (since the estimate each provides is the same as the estimate that the "joker" provided).

If I now connect and request work, far *fewer* ABC workunits will be fetched than if the "joker" had not inflated the 'correction factor'. If those ABC workunits take only one hour each, my system can run out of work unless the subsequent connection (to again fetch work) is made sooner than I planned.
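The arithmetic behind this can be sketched as follows. The one-hour and six-hour figures come from the example above; the 24-hour buffer is a hypothetical value for how long the system must stay busy between connections, and workunits_to_fetch is an illustrative helper, not a BOINC function.

```python
import math

BUFFER_HOURS = 24.0   # hypothetical: work wanted until the next connection
NORMAL_HOURS = 1.0    # real crunch time of a "normal" workunit
INFLATED_HOURS = 6.0  # per-workunit estimate after a "joker" inflated the factor


def workunits_to_fetch(buffer_hours, estimated_hours_each):
    """Number of workunits the scheduler thinks it needs to fill the buffer."""
    return math.ceil(buffer_hours / estimated_hours_each)


fetched_normal = workunits_to_fetch(BUFFER_HOURS, NORMAL_HOURS)
fetched_inflated = workunits_to_fetch(BUFFER_HOURS, INFLATED_HOURS)

# If the fetched workunits are all "normal", this is how long they really last.
real_hours_of_work = fetched_inflated * NORMAL_HOURS

print(f"without inflation: {fetched_normal} workunits fetched")
print(f"with inflation:    {fetched_inflated} workunits fetched")
print(f"real work on hand: {real_hours_of_work:.0f} of {BUFFER_HOURS:.0f} hours wanted")
```

With these numbers the system would have about 4 hours of real work for a 24-hour gap.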


What I __wish__ is that the ABC project could *identify* "joker" workunits, and assign them larger estimates than those assigned to "normal" ABC workunits.
.
2) Message boards : Number crunching : little work for offline computers ? (Message 4231)
Posted 2177 days ago by Mikus
I have for __ages__ had this exact same symptom -- the *most* workunits that ABC will assign to my system is five at a time. [Then the client is told to defer asking for 121 seconds; when the client does ask again, at the most five more workunits are assigned and the client is told to defer asking for 121 seconds; this cycle keeps on repeating as long as the client still wants work from ABC.] I do not like these two-minute deferrals between ABC downloads, since I run offline (and need to manually provide the internet connection). There are occasions when I get tired of waiting around, and take down the internet connection *before* as much work as the client wants from ABC has managed to dribble in.
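A rough sketch of how long the connection has to stay up just because of the deferrals. The batch size of five and the 121-second deferral are the values described above; the helper name is hypothetical, and download time itself is ignored.

```python
import math

BATCH_SIZE = 5       # most workunits assigned per request
DEFER_SECONDS = 121  # deferral the client is given after each request


def deferral_wait_seconds(workunits_wanted):
    """Total time spent in deferrals before the last batch is requested."""
    requests = math.ceil(workunits_wanted / BATCH_SIZE)
    # No deferral has to be waited out after the final request.
    return max(requests - 1, 0) * DEFER_SECONDS


for wanted in (5, 20, 60):
    wait = deferral_wait_seconds(wanted)
    print(f"{wanted} workunits: {wait} s of deferrals ({wait / 60:.1f} min)")
```

For 60 workunits that comes to 1331 seconds, about 22 minutes of waiting with the connection up.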

I'm running the 64-bit ABC Linux application under the (most recent from boinc.berkeley.edu, beta or not) 64-bit Linux client, on 64-bit Ubuntu 7.04. The five-at-a-time ABC download behavior persists, even if I detach from the project, then re-attach. There is nothing that I can see in the various boinc .xml files at my system that would explain the five-at-a-time ABC downloads -- I believe the ABC server is responsible for this behavior. None of the other projects to which my system is attached (with the possible exception of Leiden, which seems to have difficulty finding work for my system which meets their homogeneous redundancy criteria) have any difficulty filling however big the work request is that my system sends to them.
.
3) Message boards : Number crunching : estimated workunit length (Message 2714)
Posted 2263 days ago by Mikus
I don't agree with "let it go". Because of the way the workunit length of some ABC@home workunits was specified, my system ran out of work. I run off-line, and dial in from time to time. When I previously dialed in, the large RDCF for ABC@home caused the (left on the ready queue from an earlier download) ABC@home workunits then queued at my system to be estimated to take four hours each. So the BOINC client (properly) figured there was still lots of work to be done, and did not bother then to download new work (even from other projects). But overnight (when the system was *not* connected) all the ABC@home workunits were completed - they only took 20-40 minutes each. This __drastic__ reduction of the previous estimated "amount of work to be done" caused my system to go idle, once it had finished the last ready workunit on my queue (this was before I saw what was happening and dialed in again - the way my communication hardware is set up, I have to manually make the connection).


My reason for posting here is that every workunit has a field <rsc_fpops_est>, whose value is inserted by the project. All the ABC@home workunits currently at my system have the value 5000000000000.000000 there. What that value is __supposed__ to be is the estimated number of floating-point-operations needed to complete that workunit. [This is how BOINC calculates the estimated workunit length -- it takes the <rsc_fpops_est> value from the workunit, multiplies it by the current RDCF, and divides that total by the floating-point-crunching-rate of the system (as determined by running the BOINC benchmark) -- that gives the estimated amount of CPU time it would take to complete crunching of that workunit.]
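The estimate described in the brackets can be written out as a small sketch. The <rsc_fpops_est> value is the one quoted above; the benchmark rate of 2e9 floating point operations per second is a hypothetical figure, and the function name is illustrative rather than anything from the BOINC source.

```python
RSC_FPOPS_EST = 5_000_000_000_000.0  # value found in the workunits
BENCHMARK_FLOPS = 2.0e9              # hypothetical benchmark result, in flops/s


def estimated_cpu_seconds(rsc_fpops_est, rdcf, benchmark_flops):
    """Estimated CPU time: the project's estimate, scaled by the RDCF,
    divided by the system's measured crunching rate."""
    return rsc_fpops_est * rdcf / benchmark_flops


for rdcf in (1.0, 6.0):
    seconds = estimated_cpu_seconds(RSC_FPOPS_EST, rdcf, BENCHMARK_FLOPS)
    print(f"RDCF {rdcf}: {seconds:.0f} s ({seconds / 3600:.2f} hours)")
```

On this assumed system the same workunit is estimated at about 42 minutes with an RDCF of 1.0, and at more than four hours with an RDCF of 6.0.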

I believe that the way the RDCF is calculated is by using the above formula in reverse -- I believe BOINC takes the measured time it took to complete the workunit, multiplies that by the floating-point-crunching-rate of the system, and divides that total by the <rsc_fpops_est> value specified for that workunit. You can see that, when the <rsc_fpops_est> value specified in the workunit is being used as a divisor, an unrealistically low number in <rsc_fpops_est> will cause the calculated RDCF to be correspondingly high. And I blame the high RDCF being calculated for ABC@home for my system going idle (because the actual work ended up being way less than the estimated work).
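Here is a sketch of that reverse calculation as I understand it, not as confirmed from the BOINC source. The measured time and the benchmark rate are hypothetical numbers, and observed_rdcf is an illustrative name.

```python
BENCHMARK_FLOPS = 2.0e9         # hypothetical benchmark result, in flops/s
MEASURED_CPU_SECONDS = 15000.0  # hypothetical: a long workunit, a bit over 4 hours


def observed_rdcf(measured_cpu_seconds, benchmark_flops, rsc_fpops_est):
    """Ratio of the work actually done to the work the project estimated."""
    return measured_cpu_seconds * benchmark_flops / rsc_fpops_est


# Same measured run, two different estimates supplied in the workunit.
low_estimate = observed_rdcf(MEASURED_CPU_SECONDS, BENCHMARK_FLOPS, 5.0e12)
fair_estimate = observed_rdcf(MEASURED_CPU_SECONDS, BENCHMARK_FLOPS, 3.0e13)

print(f"with <rsc_fpops_est> = 5e12: RDCF = {low_estimate:.1f}")
print(f"with <rsc_fpops_est> = 3e13: RDCF = {fair_estimate:.1f}")
```

Because <rsc_fpops_est> is the divisor, the too-low estimate yields a factor of 6, while an estimate matching the real work yields a factor of 1.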

--------

What I am asking the people who create the ABC@home workunits to do is to put in more accurate <rsc_fpops_est> values in their workunits. Specifically, given the variation in the length of those workunits, the value in the <rsc_fpops_est> field of each workunit __needs__ to be proportionate to the crunching required to complete that workunit.

[Suppose a workunit AA has a particular value in <rsc_fpops_est>, but there also exists a workunit BB which has been estimated to require eight times the crunching of workunit AA -- then workunit BB should be assigned a <rsc_fpops_est> value which is eight times the <rsc_fpops_est> value assigned to workunit AA.]
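The AA/BB case can be sketched like this. It assumes, purely for illustration, that AA really needs the 5e12 operations currently written into the workunits and that BB needs eight times as much; the names below are hypothetical.

```python
BASE_FPOPS = 5.0e12  # assumed real cost of workunit AA

# Assumed real crunching needed by each workunit.
actual = {"AA": 1.0 * BASE_FPOPS, "BB": 8.0 * BASE_FPOPS}

# Today: every workunit carries the same estimate.
uniform_est = {name: BASE_FPOPS for name in actual}
# Requested: the estimate is proportionate to the crunching required.
proportional_est = {"AA": 1.0 * BASE_FPOPS, "BB": 8.0 * BASE_FPOPS}


def correction_ratios(actual_fpops, estimated_fpops):
    """Actual-to-estimated ratio that each finished workunit would report."""
    return {name: actual_fpops[name] / estimated_fpops[name] for name in actual_fpops}


uniform_ratios = correction_ratios(actual, uniform_est)
proportional_ratios = correction_ratios(actual, proportional_est)

print("same estimate for all:  ", uniform_ratios)
print("proportionate estimates:", proportional_ratios)
```

With proportionate estimates both workunits report a ratio of 1.0, so finishing BB would give the client no reason to inflate its estimates for the workunits queued behind it.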

Thank you.
.
4) Message boards : Number crunching : estimated workunit length (Message 2710)
Posted 2263 days ago by Mikus
Hum, you're posting this on the ABC@Home forums. ;-)
It's probably better at home on the SIMAP forums.


Please forgive a "senior check" moment. Although I wrote SIMAP, I meant ABC@home.

My message was correctly posted to the ABC@home forum. Please replace the word SIMAP in my message with the word ABC@home. What I described is happening to me with ABC@home workunits. I posted to alert the ABC@home work issuers as to the consequences upon project participants of under-estimated workunit lengths.
.


p.s. Many forums let me go in afterwards and edit something I've written. Just tried it with my above message to ABC@home, but this forum didn't give me the option to edit.
.
5) Message boards : Number crunching : estimated workunit length (Message 2691)
Posted 2263 days ago by Mikus
On my system, the current BOINC 'Duration Correction Factor' for SIMAP has a value of more than 6.0.

I run off-line, and dial in from time to time to fetch work. The effect of this large RDCF is that each SIMAP workunit is being estimated to take several hours longer than these workunits "typically" need on my system. And yesterday all the actual SIMAP workunits took only a relatively short time to complete. The result was that despite having filled my "ready queue" with work estimated to take many hours, my system had actually run out of work (i.e., had gone idle) by the time I dialed in this morning. Not something I want.

What an RDCF of 6.0 tells the BOINC client is that whatever the workunit length estimated by the project, it should multiply that requirement by 6. And how does the RDCF get to be so large? By one or more SIMAP workunits taking 6 times as long to process as the estimate placed on that workunit by the project.

Looking at the recent SIMAP workunits my system has completed, the "typical" seems to be taking 20-40 minutes. There were several which took more than two hours, and one which took more than four hours.

I'm guessing that there were SIMAP workunits (perhaps the "long" ones) which had an estimated workunit length which was 6 times shorter than their actual processing time. Would you please make an effort to avoid issuing SIMAP workunits having an unrealistic estimated workunit length.
.
6) Message boards : Number crunching : AMD64 (Message 2331)
Posted 2281 days ago by Mikus
boinc client crash:

Linux (64-bit Ubuntu 6.10). 32-bit boinc client 5.8.15. Currently 32-bit ABC application 1.02.

When workunit http://abcathome.com/result.php?resultid=1345765 finishes, the boinc client goes into a 100% CPU loop. Must use kill -9 to get rid of it. Was not able to get any kind of trace or dump.

Manually aborted the workunit, to get it out of my system. Several other (shorter) ABC workunits ran correctly.

[However, for my next download I plan to replace the 32-bit ABC application with the 64-bit ABC application - that's why I'm trying ABC - to get experience with 64-bit applications.]
.




Copyright © 2013 University of Leiden