On Sat, 29 Oct 2011, Krisztian Krajczar wrote:

> Hi all,
>
> method "1a)" does work, the Pede job finished correctly (output db file is at
> /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction/mp0899/jobData/jobm).

Hi Krisztian,

I noticed that you did not switch off the misalignment here. Probably you
were misled by my naming scheme 1) and 1a)... This should not have caused
much harm, but I am not sure.

Since these jobs do not need so many resources, you could rerun them,
specifying the smaller requirement:

1) In setup_align.pl
     my $mem = 35000;
   ==>
     my $mem = 9000;
2) In all .py templates
     process.AlignmentProducer.algoConfig.pedeSteerer.pedeCommand = '/afs/cern.ch/user/c/ckleinw/bin/rev81/pede_32GB'
   ==>
     process.AlignmentProducer.algoConfig.pedeSteerer.pedeCommand = '/afs/cern.ch/user/c/ckleinw/bin/rev81/pede_8GB'

(I hope this fits; otherwise use 16000 and pede_16GB.)

Cheers
Gero

> I go on with the submission of the weighted cosmics samples.
>
>> method "1)" did not work, the Pede job failed again with the same symptoms.
>> The output in pede.dump simply stops again without reaching the end:
>>
>> Record 12900000 ... still reading
>> Record
>>
>> The dump is at
>> /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction/mp0898/jobData/jobm
>>
>> I will proceed with your other method, "1a)".
>>
>> Cheers,
>> Krisztian
>>
>>> thanks for the comments!
>>>
>>> I will modify the alignment_x.py config files according to your suggestion
>>> "1)".
>>>
>>> I have moved the diagnostic files to a backup directory for future
>>> reference:
>>> /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction/mp0897/backup_failingJobVer1
>>>
>>> Cheers,
>>> Krisztian
>>>
>>>>> The mps_stat.pl command reports that the Pede job for the alignment of
>>>>> ideal geometry failed. However, there are outputs produced in the
>>>>> directory you indicated in your earlier email.
>>>>>
>>>>> I have checked the Pede dump in search of any errors, but found no
>>>>> errors.
>>>>
>>>> Hi Krisztian,
>>>> (adding Claus as pede expert, asking for advice at the end)
>>>> indeed this is the first file to look into. And it does not look healthy,
>>>> but simply stops at some point - the last line should be something like
>>>>
>>>> < Millepede II-P ending ... Wed Oct 26 22:52:11 2011
>>>>
>>>> as in mp0896/jobData/jobm/pede.dump. MPS looks for that line and reports
>>>> failure since it is not there.
>>>>
>>>>> The memory usage was normal, although it was slightly higher than for
>>>>> the previous alignments:
>>>>>
>>>>> Memory space: total 32.000000 GB
>>>>> used 31.226771 GB = 97.58 %
>>>>>
>>>>> In STDOUT I found a possible source of the "fail" report of the
>>>>> mps_stat.pl. One of the automatic root macros failed to run:
>>>>>
>>>>> ---------
>>>>> Processing readPedeHists.C+("print nodraw")...
>>>>> Info in : creating shared library
>>>>> /pool/lsf/krajczar/182146920/./readPedeHists_C.so
>>>>> Error in : failed reading x-y-dx-dy
>>>>> content
>>>>> ---------
>>>>
>>>> Before that I see
>>>>
>>>> sh: line 1: 27036 CPU time limit exceeded
>>>> /afs/cern.ch/user/c/ckleinw/bin/rev81/pede_32GB pedeSteerMaster.txt > pede.dump
>>>>
>>>> and that tells us the reason why pede did not run through - it is a
>>>> serious problem!
>>>> It is also stated in alignment.log.gz from CMSSW:
>>>>
>>>> %MSG-e Alignment: AfterModEndJob PedeReader() 28-Oct-2011 07:12:10 CEST
>>>> PostEndRun
>>>> Problem opening pede output file millepede.res
>>>> %MSG
>>>> %MSG-i Alignment: AfterModEndJob PedeReader::read() 28-Oct-2011 07:12:10
>>>> CEST PostEndRun
>>>> will read parameters for run range 1 - 4294967295
>>>> %MSG
>>>> %MSG-i Alignment: AfterModEndJob PedeReader::read() 28-Oct-2011 07:12:10
>>>> CEST PostEndRun
>>>> 0 parameters for 0 alignables
>>>>
>>>> What you point to is a consequence of that: pede did not run through, so
>>>> millepede.his, with its histogram-like information on the pede job, is
>>>> not well formed and cannot be correctly converted into ROOT/.ps - and
>>>> that is where the error you see comes from.
>>>>
>>>>> For the previous rounds of alignments this problem did not appear.
>>>>>
>>>>> Reference:
>>>>> /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction/mp0897/jobData/jobm/pede.dump
>>>>> /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction/mp0897/jobData/jobm/STDOUT
>>>>>
>>>>> Is this a serious issue? Can I submit the Pede jobs for the weighted
>>>>> samples regardless of this error?
>>>>
>>>> The question is:
>>>> Why does it need more CPU starting from ideal (but bows)? Internally it
>>>> is using an iterative procedure (MINRES) for solving the big matrix - and
>>>> this is done three (4?) times with your settings. Then after each solving
>>>> there is a line search in 1D. Procedures like that tend to have
>>>> difficulties if we start too close to the final result (needing more
>>>> MINRES iterations - see e.g. the last page of
>>>> mp0896/jobData/jobm/millepede.his.ps.gz for how much this can vary in a
>>>> successful job.)...
>>>>
>>>> So - what to do?
>>>>
>>>> 1) We can introduce a bit of noise in the procedure by adding some random
>>>> misalignment.
>>>> 1a) If that does not help, we could remove the bow-misalignment and the
>>>> bow determination from the alignment job - in the very end we could
>>>> probably use the bows that are the result of the jobs starting from
>>>> current MC scenario
>>>> 2) I'll ask for a larger CPU limit on the special millepede queue.
>>>>
>>>> Claus - do you have another suggestion?
>>>>
>>>> about 1)
>>>> add to configs
>>>> process.AlignmentProducer.doMisalignmentScenario = True
>>>> process.AlignmentProducer.MisalignmentScenario = cms.PSet(
>>>>     setRotations = cms.bool(True),
>>>>     setTranslations = cms.bool(True),
>>>>     seed = cms.int32(1234567),
>>>>     distribution = cms.string('gaussian'), #fixed'),
>>>>     setError = cms.bool(True),
>>>>     TIBBarrels = cms.PSet(DetUnits = cms.PSet(
>>>>         dXlocal = cms.double(0.001))
>>>>     )
>>>>     # same for TIDEndcaps, TECEndcap, TPBBarrels and TPEEndcaps
>>>>     # but leave out TOB for now
>>>> )
>>>> about 1a)
>>>> - set up a new directory
>>>> - remove process.trackerBowedSensors stuff from startgeometry.txt
>>>> - deselect the bow parameters in alignables.txt:
>>>>   * set the last three '1' to '0' for single sensors (SelectorBowed)
>>>>   * remove 'SelectorTwoBowed' and add double sensor modules (TOB, outer
>>>>     TEC) to SelectorBowed with '101111 000' parameterisation.
>>>>
>>>>
>>>> Cheers
>>>>
>>>> Gero
>>>>
>>>> --
>>>> -----------------------------------------------------------------------
>>>> Gero Flucke
>>>> - Analysis Centre, Helmholtz Alliance "Physics at the Terascale"
>>>>   * Statistics Tools
>>>> - CMS: Tracker Alignment Convenor
>>>> DESY/CMS, Notkestr. 85, D-22607 Hamburg, Germany
>>>> Bldg. 1e, Rm. 02.501
>>>> phone: +49 (0)40 8998 3525
>>>> fax: +49 (0)40 8998 3092
>>>>
>>>
>>
>
--
-----------------------------------------------------------------------
Gero Flucke
- Analysis Centre, Helmholtz Alliance "Physics at the Terascale"
  * Statistics Tools
- CMS: Tracker Alignment Convenor
DESY/CMS, Notkestr. 85, D-22607 Hamburg, Germany
Bldg. 1e, Rm. 02.501
phone: +49 (0)40 8998 3525
fax: +49 (0)40 8998 3092
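
For reference, the check described in the thread (a pede.dump is only healthy if its last line is the "Millepede II-P ending" line) can be sketched in a few lines of Python. This is only an illustration and not the actual MPS code: the function names pede_finished and check_dump_file are hypothetical, and the marker text is taken from the single example line quoted above, so it may need adjusting for other pede versions.

```python
import re
from pathlib import Path

# Assumed marker, copied from the example line quoted in this thread;
# the exact wording may differ between Millepede II versions.
ENDING_MARKER = re.compile(r"Millepede II-P ending")


def pede_finished(dump_text: str) -> bool:
    """Return True if the last non-empty line of a pede dump is the ending line."""
    lines = [line for line in dump_text.splitlines() if line.strip()]
    return bool(lines) and ENDING_MARKER.search(lines[-1]) is not None


def check_dump_file(path: str) -> bool:
    """Hypothetical helper: read a pede.dump file and apply the same check."""
    return pede_finished(Path(path).read_text(errors="replace"))


if __name__ == "__main__":
    good = ("Record 12900000 ... still reading\n"
            " < Millepede II-P ending ... Wed Oct 26 22:52:11 2011\n")
    bad = "Record 12900000 ... still reading\nRecord\n"
    print(pede_finished(good), pede_finished(bad))
```

Run on a dump that stops in the middle of "Record ... still reading", as in mp0897 and mp0898, this returns False, which matches the failure reported by mps_stat.pl.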