TABLE OF CONTENTS


ABINIT/dfpt_cgwf [ Functions ]

[ Top ] [ Functions ]

NAME

 dfpt_cgwf

FUNCTION

 Update one single wavefunction (cwavef), non self-consistently.
 Uses a conjugate-gradient algorithm.
 Try to keep close to the formulas in PRB 55, 10337 (1997) [[cite:Gonze1997]], for the
 non-self-consistent case, except that we are computing here
 the second-derivative of the total energy, and not E(2). There
 is a factor of 2 between the two quantities.
 The wavefunction that is generated is always orthogonal to cgq.
 It is orthogonal to the active Hilbert space, and will be complemented
 by contributions from the active space in the calling routine, if needed.

 As concerns the MPI algorithm: cgq and gscq are distributed inside comm_band and
 each proc has nband_me non-overlapping blocks. Each proc in comm_band call dfpt_cgwf
 with a different u^1_{band} state (the band index is therefore LOCAL) but then
 we need to communicate every time we call projbd to orthogonalize wrt the MPI-distributed cgq.
 Other arrays such as rank_band and bands_treated_now are GLOBAL i.e. all procs in comm_band
 are supposed to call dfpt_cgwf with the same values.

INPUTS

  u1_band_=which particular band we are converging (LOCAL)
    A negative value is used when the routine is called in band-mode with MPI-distributed cgq
    to indicate that this proc is not optimizing abs(u1_band). Used when calling dfpt_cgw in EPH.
  band_me=cpu-local index in cgq array of band which we are converging.
  berryopt=option for Berry phase
  cgq(2,mcgq)=wavefunction coefficients for MY bands at k+Q
  cwave0(2,npw*nspinor)=GS wavefunction at k, in reciprocal space
  cwaveprj0(natom,nspinor*usecprj)=GS wave function at k projected with nl projectors
  rank_band(nband)=rank of processor in band_comm which have the other bands for cgq below (GLOBAL)
  bands_treated_now(nband)  (GLOBAL)
  eig0_k=0-order eigenvalues for the present wavefunction at k
  eig0_kq(nband)=GS eigenvalues at k+Q (hartree)
  grad_berry(2,mpw1,dtefield%mband_occ) = the gradient of the Berry phase term
  gscq(2,mgscq)=<g|S|Cnk+q> coefficients for MY bands (PAW) at k+Q
  gs_hamkq <type(gs_hamiltonian_type)>=all data for the Hamiltonian at k+Q
  icgq=shift to be applied on the location of data in the array cgq
  igscq=shift to be applied on the location of data in the array gscq
  idir=direction of the perturbation
  ipert=type of the perturbation
  mcgq=second dimension of the cgq array
  mgscq=second dimension of gscq, with only mband_mem bands
  mpi_enreg=information about MPI parallelization
  mpw1=maximum number of planewave for first-order wavefunctions
  natom=number of atoms in cell.
  nband=number of bands.
  nband_me=number of bands on this cpu
  nbdbuf=number of buffer bands for the minimisation
  nline=number of line minimizations per band.
  npw=number of planewaves in basis sphere at given k.
  npw1=number of planewaves in basis sphere at k+Q
  nspinor=number of spinorial components of the wavefunctions
  opt_gvnlx1=option controlling the use of gvnlx1 array:
            0: used as an output
            1: used as an input:    - used only for ipert=natom+2
                 NCPP: contains the ddk 1-st order WF
                 PAW: contains frozen part of 1st-order hamiltonian
            2: used as input/ouput:    - used only for PAW and ipert=natom+2
                 At input: contains the ddk 1-st order WF (times i)
                 At output: contains frozen part of 1st-order hamiltonian
  prtvol=control print volume and debugging output
  quit= if 1, proceeds to smooth ending of the job.
  dfpt_sciss=scissor shift (Ha)
  tolrde=tolerance on the ratio of differences of energies (for the line minimisation)
  tolwfr=tolerance on largest wf residual
  usedcwavef=flag controlling the use of dcwavef array (PAW only):
             0: not used (not allocated)
             1: used as input
             2: used as output
  wfoptalg=govern the choice of algorithm for wf optimisation (0 or 10, at present)

OUTPUT

  eig1_k(2*nband**2)=matrix of first-order eigenvalues (hartree)
                     eig1(:,ii,jj)=<C0 ii|H1|C0 jj> for norm-conserving psps
                     eig1(:,ii,jj)=<C0 ii|H1-(eig0_k+eig0_k+q)/2.S(1)|C0 jj> for PAW
  ghc(2,npw1*nspinor)=<G|H0-eig0_k.I|C1 band,k> (NCPP) or <G|H0-eig0_k.S0|C1 band,k> (PAW)
  gvnlxc(2,npw1*nspinor)=<G|Vnl+VFockACE|C1 band,k>
  gvnlx1(2,npw1*nspinor)=  part of <G|K1+Vnl1|C0 band,k> not depending on VHxc1           (NCPP)
                       or part of <G|K1+Vnl1-eig0k.S1|C0 band,k> not depending on VHxc1 (PAW)
  resid=wf residual for current band
  gh1c_n= <G|H1|C0 band,k> (NCPP) or <G|H1-eig0k.S1|C0 band,k> (PAW).
          This vector is not projected on the subspace orthogonal to the WF.
  === if gs_hamkq%usepaw==1 ===
  gsc(2,npw1*nspinor*usepaw)=<G|S0|C1 band,k>

SIDE EFFECTS

  Input/Output:
  cwavef(2,npw1*nspinor)=first-order  wavefunction at k,q, in reciprocal space (updated)

  === if gs_hamkq%usepaw==1 ===
  cwaveprj(natom,nspinor)= wave functions at k projected with nl projectors

  === if also usedcwavef>0 ===
  dcwavef(2,npw1*nspinor)=change of wavefunction due to change of overlap:
         dcwavef is delta_Psi(1)=-1/2.Sum_{j}[<C0_k+q_j|S(1)|C0_k_i>.|C0_k+q_j>]
         see PRB 78, 035105 (2008) [[cite:Audouze2008]], Eq. (42)
         input if usedcwavef=1, output if usedcwavef=2

SOURCE

 152 subroutine dfpt_cgwf(u1_band_,band_me,rank_band,bands_treated_now,berryopt,cgq,cwavef,cwave0,cwaveprj,cwaveprj0,&
 153 & rf2,dcwavef,&
 154 & eig0_k,eig0_kq,eig1_k,ghc,gh1c_n,grad_berry,gsc,gscq,&
 155 & gs_hamkq,gvnlxc,gvnlx1,icgq,idir,ipert,igscq,&
 156 & mcgq,mgscq,mpi_enreg,mpw1,natom,nband,nband_me,nbdbuf,nline_in,npw,npw1,nspinor,&
 157 & opt_gvnlx1,prtvol,quit,resid,rf_hamkq,dfpt_sciss,tolrde,tolwfr,&
 158 & usedcwavef,wfoptalg,nlines_done)
 159 
 160 !Arguments ------------------------------------
 161 !scalars
 162  integer,intent(in) :: u1_band_,berryopt
 163  integer,intent(in) :: band_me, nband_me
 164  integer,intent(in) :: icgq,idir,igscq,ipert,mcgq,mgscq,mpw1,natom,nband
 165  integer,intent(in) :: nbdbuf,nline_in,npw,npw1,nspinor,opt_gvnlx1
 166  integer,intent(in) :: prtvol,quit,usedcwavef,wfoptalg
 167  integer,intent(inout) :: nlines_done
 168  real(dp),intent(in) :: dfpt_sciss,tolrde,tolwfr
 169  real(dp),intent(out) :: resid
 170  type(MPI_type),intent(in) :: mpi_enreg
 171  type(rf2_t), intent(in) :: rf2
 172  type(gs_hamiltonian_type),intent(inout) :: gs_hamkq
 173  type(rf_hamiltonian_type),intent(inout) :: rf_hamkq
 174 !arrays
 175  integer,intent(in) :: rank_band(nband)
 176  integer,intent(in) :: bands_treated_now (nband)
 177  real(dp),intent(in) :: cgq(2,mcgq),eig0_kq(nband)
 178  real(dp),intent(in) :: eig0_k(nband)
 179  real(dp),intent(in) :: grad_berry(2,mpw1*nspinor,nband),gscq(2,mgscq)
 180  real(dp),intent(inout) :: cwave0(2,npw*nspinor),cwavef(2,npw1*nspinor)
 181  real(dp),intent(inout) :: dcwavef(2,npw1*nspinor*((usedcwavef+1)/2))
 182  real(dp),intent(inout) :: eig1_k(2*nband**2)
 183  real(dp),intent(out) :: gh1c_n(2,npw1*nspinor)
 184  real(dp),intent(out) :: ghc(2,npw1*nspinor)
 185  real(dp),intent(out) :: gsc(2,npw1*nspinor*gs_hamkq%usepaw)
 186  real(dp),intent(inout) :: gvnlx1(2,npw1*nspinor),gvnlxc(2,npw1*nspinor)
 187  type(pawcprj_type),intent(inout) :: cwaveprj(natom,nspinor)
 188  type(pawcprj_type),intent(inout) :: cwaveprj0(natom,nspinor*gs_hamkq%usecprj)
 189 
 190 !Local variables-------------------------------
 191 !scalars
 192  integer,parameter :: level=15,tim_getgh1c=1,tim_getghc=2,tim_projbd=2
 193  integer,save :: nskip=0
 194  integer :: cpopt,iband,igs,iline,indx_cgq,ipw,me_g0,comm_fft
 195  integer :: iband_me, jband_me, ierr, me_band, np_band, band_off, u1_band !, unit_me
 196  integer :: ipws,ispinor,istwf_k,jband,nline,optlocal,optnl,dc_shift_band,sij_opt
 197  integer :: test_is_ok,useoverlap,usepaw,usevnl,usetolrde
 198  real(dp) :: d2edt2,d2te,d2teold,dedt,deltae,deold,dotgg
 199  real(dp) :: dotgp,doti,dotr,eshift,eshiftkq,gamma,optekin,prod1,prod2
 200  real(dp) :: theta,tol_restart,u1h0me0u1
 201  logical :: gen_eigenpb
 202  integer :: skipme, bands_skipped_now(nband)
 203  character(len=500) :: msg
 204 !arrays
 205  real(dp) :: dummy(0,0),tsec(2)
 206  real(dp) :: eig1_k_loc(2,nband,nband)
 207  real(dp),allocatable :: conjgr(:,:),cwaveq(:,:),cwwork(:,:),direc(:,:)
 208  real(dp),allocatable :: gberry(:,:),gh1c(:,:),gh_direc(:,:),gresid(:,:),gvnlx1_saved(:,:)
 209  real(dp),allocatable :: gs1c(:,:),gvnlx_direc(:,:),pcon(:),sconjgr(:,:)
 210  real(dp),allocatable :: scprod(:,:),work(:,:),work1(:,:),work2(:,:)
 211  real(dp),pointer :: kinpw1(:)
 212  type(pawcprj_type),allocatable :: conjgrprj(:,:)
 213  type(pawcprj_type) :: cprj_dummy(0,0)
 214 
 215 ! *********************************************************************
 216 
 217  DBG_ENTER("COLL")
 218 
 219  call timab(122,1,tsec)
 220 
 221  !======================================================================
 222  !========= LOCAL VARIABLES DEFINITIONS AND ALLOCATIONS ================
 223  !====================================================================
 224 
 225  u1_band = abs(u1_band_)
 226 
 227  nline = nline_in
 228  usetolrde = 1
 229 
 230  ! LB-23/04/17:
 231  ! For ipert=natom+10 or ipert=natom+11, the Sternheimer equation is non-self-consistent, so we have
 232  ! to solve a true linear problem (A.X = B) for each kpoint and band. In this case, the conjugate
 233  ! gradient algorithm can find the exact solution (within the numerical precision) with only ONE call
 234  ! of dfpt_cgwf (per kpoint and band...). The solution is found with at most N iterations, N being the dimension of X.
 235  ! In order to avoid useless scfcv loops (and many calls of rf2_init, which can be time consuming),
 236  ! we want to leave this routine only if 'tolwfr' is reached. Consequently, 'tolrde' is not used and 'nline' is set to 100.
 237  ! One could use nline=npw1*nspinor (>> 100 !) instead, but when the method cannot converge (i.e when tolwfr is lower than the
 238  ! numerical noise) the program could be stuck here for a very long time.
 239  ! NOTE : This is also true for ipert==natom+1, but a lot of references in the test suite have to be changed...
 240  if(ipert==natom+10.or.ipert==natom+11) then
 241    nline = 100 ! The default value is only 4... This should be sufficient to converge with nstep=1 or 2
 242    if (nline_in > 100) nline = nline_in ! Keep the possibility to increase nline
 243    usetolrde = 0  ! see below
 244  end if
 245 
 246  if (prtvol>=10) then
 247    !Tell us what is going on:
 248    write(msg,'(a,i6,2x,a,i3,a)')' --- dfpt_cgwf is called for band',u1_band,'for',nline,' lines'
 249    call wrtout(std_out,msg)
 250  end if
 251 
 252  me_g0 = mpi_enreg%me_g0
 253  comm_fft = mpi_enreg%comm_fft
 254  me_band = mpi_enreg%me_band
 255  np_band = mpi_enreg%nproc_band
 256  !unit_me = 300+u1_band
 257 
 258  skipme = 0
 259 
 260  ! if PAW, one has to solve a generalized eigenproblem
 261  usepaw=gs_hamkq%usepaw
 262  gen_eigenpb=(usepaw==1)
 263  useoverlap=0;if (gen_eigenpb) useoverlap=1
 264 
 265  ! Use scissor shift on 0-order eigenvalue
 266  eshift=eig0_k(u1_band)-dfpt_sciss
 267 
 268  ! Additional initializations
 269  istwf_k=gs_hamkq%istwf_k
 270  optekin=0;if (wfoptalg>=10) optekin=1
 271  tol_restart=tol12;if (gen_eigenpb) tol_restart=tol8
 272  if (ipert == natom+10 .or. ipert == natom+11) tol_restart = tol7
 273 
 274  kinpw1 => gs_hamkq%kinpw_kp
 275 
 276  ! Memory allocations
 277  ABI_MALLOC(gh1c,(2,npw1*nspinor))
 278  ABI_MALLOC(pcon,(npw1))
 279  ABI_MALLOC(scprod,(2,nband_me))
 280 
 281  if (berryopt== 4.or.berryopt== 6.or.berryopt== 7.or. berryopt==14.or.berryopt==16.or.berryopt==17) then
 282    ABI_MALLOC(gberry,(2,npw1*nspinor))
 283    gberry(:,1:npw1*nspinor)=grad_berry(:,1:npw1*nspinor,u1_band)
 284  else
 285    ABI_MALLOC(gberry,(0,0))
 286  end if
 287 
 288 !TODO MJV: this should probably be adjusted as well for natom+10/11 perts: set to band_me instead of u1_band
 289  dc_shift_band=(u1_band-1)*npw1*nspinor
 290 
 291 ! this is used many times - no use de and re allocating
 292  ABI_MALLOC(work,(2,npw1*nspinor))
 293 
 294 !DEBUG!! Several checking statements
 295  if (prtvol==-level.or.prtvol==-19.or.prtvol==-20) then
 296    write(msg,'(a)') " ** cgwf3 : debugging mode, tests will be done"
 297    ! Search CGWF3_WARNING in the log file to find errors (if any)
 298    call wrtout(std_out,msg)
 299    ABI_MALLOC(work1,(2,npw1*nspinor))
 300    !  ===== Check <Psi_k+q^(0)|S(0)|Psi_k+q^(0)>=delta_{ij}
 301    if (.not.gen_eigenpb) work1(:,:)=cgq(:,1+npw1*nspinor*(band_me-1)+icgq:npw1*nspinor*band_me+icgq)
 302    if (     gen_eigenpb) work1(:,:)=gscq(:,1+npw1*nspinor*(band_me-1)+igscq:npw1*nspinor*band_me+igscq)
 303    ! NB: this loop is not band-block diagonal
 304    ! the present logic does a lot of communication: 1 for each band.
 305    ! Could be grouped outside the jband loop into 1 big one, but if the wf are big this is a waste. Tradeoffs...
 306    jband_me = 0
 307    do jband=1,nband
 308      if (bands_treated_now(jband) == 0) cycle
 309      if (rank_band(jband) == me_band) then
 310        jband_me = jband_me + 1
 311        work(:,:)=cgq(:,1+npw1*nspinor*(jband_me-1)+icgq:npw1*nspinor*jband_me+icgq)
 312      end if
 313      ! send to everyone else, who is also working on jband right now
 314      call xmpi_bcast(work,rank_band(jband),mpi_enreg%comm_band,ierr)
 315 
 316      call dotprod_g(dotr,doti,istwf_k,npw1*nspinor,2,work1,work,me_g0,mpi_enreg%comm_spinorfft)
 317      test_is_ok=1
 318      if(jband==u1_band) then
 319        if(abs(dotr-one)>tol12) test_is_ok=0
 320      else
 321        if(abs(dotr)>tol12) test_is_ok=0
 322      end if
 323      if(abs(doti)>tol12) test_is_ok=0
 324      if(test_is_ok/=1) then
 325        write(msg,'(a,i3,a,2es22.15)') "CGWF3_WARNING : <Psi_k+q,i^(0)|S(0)|Psi_k+q,j^(0)> for band j=",jband," is ",dotr,doti
 326        call wrtout(std_out,msg)
 327      end if
 328    end do
 329 
 330    !  ===== Check Pc.Psi_k+q^(0)=0
 331    ! each jband is checked by everybody, against the bands attributed to present cpu
 332    ! NB - this does not depend on the "band" input
 333    jband_me = 0
 334    do jband=1,nband
 335      if (bands_treated_now(jband) == 0) cycle
 336      if (rank_band(jband) == me_band) then
 337        jband_me = jband_me + 1
 338        work(:,:)=cgq(:,1+npw1*nspinor*(jband_me-1)+icgq:npw1*nspinor*jband_me+icgq)
 339      end if
 340      ! send to everyone else, who is also working on jband right now
 341      call xmpi_bcast(work,rank_band(jband),mpi_enreg%comm_band,ierr)
 342      work1 = work
 343 
 344      call projbd(cgq,work,-1,icgq,igscq,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
 345        gscq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 346 
 347      ! if bands are parallelized, I have only projected against bands on my cpu
 348      !   Pc|work>  = |work> - Sum_l <psi_{k+q, l}|work> |psi_{k+q, l}>
 349      !             = Sum_nproc_band (|work> - Sum_{my l} <psi_{k+q, l}|work> |psi_{k+q, l}>) - (nproc_band-1) |work>
 350      !
 351      if (mpi_enreg%nproc_band > 1) then
 352        call xmpi_sum(work,mpi_enreg%comm_band,ierr)
 353        !TODO: make this a blas call? zaxpy
 354        work = work - (mpi_enreg%nproc_band-1)*work1
 355      end if
 356 
 357      call sqnorm_g(dotr,istwf_k,npw1*nspinor,work,me_g0,comm_fft)
 358      if(sqrt(dotr)>tol12) then
 359        write(msg,'(a,i3,a,es22.15)') "CGWF3_WARNING : Norm of Pc.Psi_k+q_j^(0) for band j=",jband," is ",sqrt(dotr)
 360        call wrtout(std_out,msg)
 361      end if
 362    end do
 363 
 364    ! ===== Check Pc.Psi_k^(0)=0
 365    ! NB: this _does_ depend on the input band "band" stored in cwave0
 366    do iband = 1, nband
 367      if (bands_treated_now(iband) == 0) cycle
 368      if (rank_band(iband) == me_band) work(:,:)=cwave0(:,:)
 369 
 370      ! send to everyone else, who is also working on jband right now
 371      call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
 372      work1 = work
 373      call projbd(cgq,work,-1,icgq,igscq,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
 374        gscq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 375 
 376      if (mpi_enreg%nproc_band > 1) then
 377        call xmpi_sum(work,mpi_enreg%comm_band,ierr)
 378        !TODO: make this a blas call? zaxpy
 379        work = work - (mpi_enreg%nproc_band-1)*work1
 380      end if
 381 
 382      call sqnorm_g(dotr,istwf_k,npw1*nspinor,work,me_g0,comm_fft)
 383      if(sqrt(dotr)>tol12) then
 384        write(msg,'(a,i3,a,es22.15)') "CGWF3_WARNING : Norm of Pc.Psi_k^(0) for band ",u1_band," is ",sqrt(dotr)
 385        call wrtout(std_out,msg)
 386      end if
 387    end do
 388 
 389 
 390    ! ===== Check Pc.dcwavef=0 (for 2nd order only)
 391    if(ipert==natom+10.or.ipert==natom+11) then
 392      do iband = 1, nband
 393        if (bands_treated_now(iband) == 0) cycle
 394        if (rank_band(iband) == me_band) then
 395          work(:,:)=rf2%dcwavef(:,1+dc_shift_band:npw1*nspinor+dc_shift_band)
 396        end if
 397        ! send to everyone else, who is also working on jband right now
 398        call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
 399        work1 = work
 400        call projbd(cgq,work,-1,icgq,igscq,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
 401          gscq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 402 
 403        if (mpi_enreg%nproc_band > 1) then
 404          call xmpi_sum(work,mpi_enreg%comm_band,ierr)
 405          !TODO: make this a blas call? zaxpy
 406          work = work - (mpi_enreg%nproc_band-1)*work1
 407        end if
 408 
 409        call sqnorm_g(dotr,istwf_k,npw1*nspinor,work,me_g0,comm_fft)
 410        if(sqrt(dotr)>tol10) then
 411          write(msg,'(a,i3,a,es22.15)') "CGWF3_WARNING : Norm of Pc.dcwavef for band ",u1_band," is ",sqrt(dotr)
 412          call wrtout(std_out,msg)
 413        end if
 414      end do
 415    end if
 416 
 417    ! ===== Check Pc^*.S(0).Psi_k+q^(0)=0
 418    ! NB: here again does not depend on input "u1_band"
 419    if (gen_eigenpb) then
 420      jband_me = 0
 421      do jband=1,nband
 422        if (bands_treated_now(jband) == 0) cycle
 423        if (rank_band(jband) == me_band) then
 424          jband_me = jband_me + 1
 425          work(:,:)=gscq(:,1+npw1*nspinor*(jband_me-1)+igscq:npw1*nspinor*jband_me+igscq)
 426        end if
 427        ! send to everyone else, who is also working on jband right now
 428        call xmpi_bcast(work,rank_band(jband),mpi_enreg%comm_band,ierr)
 429        work1 = work
 430 
 431        call projbd(gscq,work,-1,igscq,icgq,istwf_k,mgscq,mcgq,nband_me,npw1,nspinor,&
 432          cgq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 433 
 434        if (mpi_enreg%nproc_band > 1) then
 435          call xmpi_sum(work,mpi_enreg%comm_band,ierr)
 436          !TODO: make this a blas call? zaxpy
 437          work = work - (mpi_enreg%nproc_band-1)*work1
 438        end if
 439 
 440        call sqnorm_g(dotr,istwf_k,npw1*nspinor,work,me_g0,comm_fft)
 441        if(sqrt(dotr)>tol12) then
 442          write(msg,'(a,i3,a,es22.15)') "CGWF3_WARNING : Norm of Pc^*.S(0).Psi_k+q_j^(0) for band j=",jband," is ",sqrt(dotr)
 443          call wrtout(std_out,msg)
 444        end if
 445      end do
 446    end if
 447    ABI_FREE(work1)
 448  end if
 449 !ENDDEBUG!! Several checking statements
 450 
 451 
 452  !======================================================================
 453  !========== INITIALISATION OF MINIMIZATION ITERATIONS =================
 454  !======================================================================
 455 
 456  if(ipert/=natom+10.and.ipert/=natom+11) then
 457    !  The following is needed for first order perturbations only
 458    !  Otherwise, the work is already done in rf2_init (called in dfpt_vtowfk.F90)
 459 
 460    ! Compute H(1) applied to GS wavefunction Psi(0)
 461    if (gen_eigenpb) then
 462      sij_opt=1
 463      ABI_MALLOC(gs1c,(2,npw1*nspinor))
 464    else
 465      ABI_MALLOC(gs1c,(0,0))
 466      sij_opt=0
 467    end if
 468    usevnl=1; optlocal=1; optnl=2
 469    if (prtvol==-level.or.prtvol==-19) then
 470      ABI_MALLOC(gvnlx1_saved,(2,npw1*nspinor))
 471      gvnlx1_saved(:,:) = gvnlx1(:,:)
 472    end if
 473    call getgh1c(berryopt,cwave0,cwaveprj0,gh1c,gberry,gs1c,gs_hamkq,gvnlx1,idir,ipert,eshift,&
 474      mpi_enreg,optlocal,optnl,opt_gvnlx1,rf_hamkq,sij_opt,tim_getgh1c,usevnl)
 475 
 476    if (gen_eigenpb) then
 477      if (ipert/=natom+2) then  ! S^(1) is zero for ipert=natom+2
 478 !$OMP PARALLEL
 479 !$OMP DO
 480        do ipw=1,npw1*nspinor
 481          gh1c (1:2,ipw)=gh1c (1:2,ipw)-eshift*gs1c(1:2,ipw)
 482        end do
 483 !$OMP END DO NOWAIT
 484        if (opt_gvnlx1/=1) then
 485 !$OMP DO
 486          do ipw=1,npw1*nspinor
 487            gvnlx1(1:2,ipw)=gvnlx1(1:2,ipw)-eshift*gs1c(1:2,ipw)
 488          end do
 489 !$OMP END DO NOWAIT
 490        end if
 491 !$OMP END PARALLEL
 492      end if
 493 
 494      ! If generalized eigenPb and dcwavef requested, compute it:
 495      ! dcwavef is delta_Psi(1)=-1/2.Sum_{j}[<C0_k+q_j|S(1)|C0_k_i>.|C0_k+q_j>]
 496      ! see PRB 78, 035105 (2008) [[cite:Audouze2008]], Eq. (42)
 497      if (usedcwavef==2) then
 498        call getdc1(u1_band,rank_band,bands_treated_now,cgq,cprj_dummy,dcwavef,cprj_dummy,&
 499 &           0,icgq,istwf_k,mcgq,0,&
 500 &           mpi_enreg,natom,nband,nband_me,npw1,nspinor,0,gs1c)
 501      end if
 502    end if ! gen_eigenpb
 503 
 504  else
 505    ! 2nd order case (wrt k perturbation)
 506    ! Copy RHS_Stern(:,:) of the given band in gh1c
 507    gh1c(:,:)=rf2%RHS_Stern(:,1+dc_shift_band:npw1*nspinor+dc_shift_band)
 508  end if
 509 
 510  if (prtvol==-level.and.usedcwavef==2) then
 511    !Check that Pc^*.(H^(0)-E.S^(0)).delta_Psi^(1) is zero ! This is a consequence of P_c delta_Psi^(1) = 0
 512    ABI_MALLOC(cwwork,(2,npw1*nspinor))
 513    do iband = 1, nband
 514      if (bands_treated_now(iband) == 0) cycle
 515      if (rank_band(iband) == me_band) then
 516        cwwork=dcwavef
 517        !  - Apply H^(0)-E.S^(0) to delta_Psi^(1)
 518        sij_opt=0;if (gen_eigenpb) sij_opt=-1
 519        cpopt=-1
 520        ABI_MALLOC(work1,(2,npw1*nspinor*((sij_opt+1)/2)))
 521        ABI_MALLOC(work2,(2,npw1*nspinor))
 522        call getghc(cpopt,cwwork,conjgrprj,work,work1,gs_hamkq,work2,eshift,mpi_enreg,&
 523          1,prtvol,sij_opt,tim_getghc,0,select_k=KPRIME_H_KPRIME)
 524        ABI_FREE(work1)
 525        ABI_FREE(work2)
 526      end if
 527      call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
 528      cwwork=work
 529 
 530      ! -Apply Pc^*
 531      call projbd(gscq,cwwork,-1,igscq,icgq,istwf_k,mgscq,mcgq,nband_me,npw1,nspinor,&
 532        cgq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 533 
 534      call xmpi_sum(cwwork,mpi_enreg%comm_band,ierr)
 535 
 536      if (mpi_enreg%nproc_band > 1) then
 537        !TODO: make this a blas call? zaxpy
 538        cwwork = cwwork - (mpi_enreg%nproc_band-1)*work
 539      end if
 540 
 541      call sqnorm_g(dotr,istwf_k,npw1*nspinor,cwwork,me_g0,comm_fft)
 542      if(sqrt(dotr)>tol12) then
 543        write(msg,'(a,i3,a,es22.15)') 'CGWF3_WARNING : |Pc^*.(H^(0)-E.S^(0)).delta_Psi^(1)| (band ',u1_band,')=',sqrt(dotr)
 544        call wrtout(std_out,msg)
 545      end if
 546    end do
 547    ABI_FREE(cwwork)
 548  end if ! prtvol==-level.and.usedcwavef==2
 549 
 550  call cg_zcopy(npw1*nspinor,gh1c,gh1c_n)
 551 
 552  ! Projecting out all bands
 553  ! While we could avoid calculating all the eig1_k to obtain the perturbed density,
 554  ! we do need all of the matrix elements when outputting the full 1st-order wfn.
 555  ! Note the subtlety:
 556  ! -For the generalized eigenPb, S|cgq> is used in place of |cgq>,
 557  ! in order to apply P_c+ projector (see PRB 73, 235101 (2006) [[cite:Audouze2006]], Eq. (71), (72))
 558  eig1_k_loc = zero
 559  do iband = 1, nband
 560    if (bands_treated_now(iband) == 0) cycle
 561    if (rank_band(iband) == me_band) work = gh1c
 562    call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
 563 
 564    if(gen_eigenpb)then
 565      call projbd(gscq,work,-1,igscq,icgq,istwf_k,mgscq,mcgq,nband_me,npw1,nspinor,&
 566        cgq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 567    else
 568      call projbd(cgq,work,-1,icgq,0,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
 569        dummy,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 570    end if
 571 
 572    ! sum projections against all bands k+q
 573    call xmpi_sum_master(work, rank_band(iband), mpi_enreg%comm_band, ierr)
 574 
 575    ! scprod now contains scalar products of band iband (runs over all bands in current queue) with local bands j
 576    jband_me = 0
 577    do jband=1,nband
 578      if (rank_band(jband) /= me_band) cycle
 579      jband_me = jband_me + 1
 580      eig1_k_loc(:,jband,iband)=scprod(:,jband_me)
 581    end do
 582 
 583    ! save this for me only
 584    !TODO: make this a blas call? zaxpy
 585    if (rank_band(iband) == me_band) then
 586      gh1c = work - (mpi_enreg%nproc_band-1)*gh1c
 587    end if
 588 
 589  end do !iband
 590 
 591 
 592  if(ipert/=natom+10.and.ipert/=natom+11) then
 593    ! For ipert=natom+10 or natom+11, this is done in rf2_init
 594 
 595    ! The array eig1_k contains:
 596    ! <u_(jband,k+q)^(0)|H_(k+q,k)^(1)|u_(iband,k)^(0)>                              (NC psps)
 597    ! or <u_(jband,k+q)^(0)|H_(k+q,k)^(1)-(eig0_k+eig0_k+q)/2.S^(1)|u_(iband,k)^(0)> (PAW)
 598    ! so in case of PAW need to add the overlap term below
 599    !
 600    ! NB: 2019 11 15: MJV: I swapped the names of jband and iband to be more consistent with other loops above
 601    if (gen_eigenpb) then
 602      do iband=1,nband
 603        if (bands_treated_now(iband) == 0) cycle
 604        if (rank_band(iband) == me_band) work = gs1c
 605        ! for iband on this proc, bcast to all others to get full line of iband,jband pairs
 606        call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
 607 
 608        ! add PAW overlap correction term to present iband (all procs) and local jband elements
 609        indx_cgq=icgq
 610        do jband=1,nband
 611          if (rank_band(jband) /= me_band) cycle
 612 
 613          eshiftkq=half*(eig0_kq(jband)-eig0_k(iband))
 614          call dotprod_g(dotr,doti,istwf_k,npw1*nspinor,2,cgq(:,indx_cgq+1:indx_cgq+npw1*nspinor),work,&
 615            me_g0,mpi_enreg%comm_spinorfft)
 616          eig1_k_loc(1,jband,iband)=eig1_k_loc(1,jband,iband)-eshiftkq*dotr
 617          eig1_k_loc(2,jband,iband)=eig1_k_loc(2,jband,iband)-eshiftkq*doti
 618          indx_cgq=indx_cgq+npw1*nspinor
 619        end do
 620      end do ! iband
 621    end if ! PAW and generalized eigenproblem
 622 
 623    ! No more need of gs1c
 624    ABI_FREE(gs1c)
 625 
 626    ! reduce over band procs to fill in the matrix for all jband (distributed over procs)
 627    ! must only do this once for eig1_k_loc: now have all jband for current ibands on all procs
 628    call xmpi_sum(eig1_k_loc,mpi_enreg%comm_band,ierr)
 629 
 630    ! TODO: I think this is just a reshape
 631    do iband=1,nband
 632      if (bands_treated_now(iband) == 0) cycle
 633      band_off=(iband-1)*2*nband
 634      do jband=1,nband
 635        eig1_k(2*jband-1+band_off)=eig1_k_loc(1,jband,iband)
 636        eig1_k(2*jband  +band_off)=eig1_k_loc(2,jband,iband)
 637      end do
 638    end do
 639  end if ! ipert/=natom+10.and.ipert/=natom+11
 640 
 641  ! Filter the wavefunctions for large modified kinetic energy (see routine mkkin.f)
 642  ! TODO: should this also be applied to cwaveq for the preconditioning with kinpw1 below?
 643  do ispinor=1,nspinor
 644    ipws=(ispinor-1)*npw1
 645 !$OMP PARALLEL DO PRIVATE(ipw) SHARED(cwavef,kinpw1,ipws,npw1)
 646    do ipw=1+ipws,npw1+ipws
 647      if(kinpw1(ipw-ipws)>huge(zero)*1.d-11)then
 648        cwavef(1:2,ipw)=zero
 649      end if
 650    end do
 651  end do
 652 
 653  ! Apply the orthogonality condition: <C1 k,q|C0 k+q>=0 (NCPP) or <C1 k,q|S0|C0 k+q>=0 (PAW)
 654  ! Project out all bands from cwavef, i.e. apply P_c projector on cwavef
 655  ! (this is needed when there are some partially or unoccupied states)
 656  do iband = 1, nband
 657    if (bands_treated_now(iband) == 0) cycle
 658 
 659    if (rank_band(iband) == me_band) work = cwavef
 660    call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
 661 
 662    call projbd(cgq,work,-1,icgq,igscq,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
 663      gscq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 664 
 665    call xmpi_sum_master(work, rank_band(iband), mpi_enreg%comm_band, ierr)
 666 
 667    ! save this for me_band only
 668    !TODO: make this a blas call? zaxpy
 669    if (rank_band(iband) == me_band) then
 670      cwavef = work - (mpi_enreg%nproc_band-1)*cwavef
 671    end if
 672  end do
 673 
 674 
 675  if(ipert/=natom+10.and.ipert/=natom+11) then
 676    ! If PAW, the orthogonality condition is <C1 k,q|S0|C0 k+q>+1/2<C0 k|S1|C0 k+q>=0
 677    if (usepaw==1.and.usedcwavef>0) then
 678 !$OMP PARALLEL DO
 679      do ipw=1,npw1*nspinor
 680        cwavef(1:2,ipw)=cwavef(1:2,ipw)+dcwavef(1:2,ipw)
 681      end do
 682    end if
 683  else
 684    ! In 2nd order case, dcwavef/=0 even in NC, and it is already computed in rf2_init (called in dfpt_vtowfk.F90)
 685    do ipw=1,npw1*nspinor
 686      cwavef(:,ipw)=cwavef(:,ipw)+rf2%dcwavef(:,ipw+dc_shift_band)
 687    end do
 688  end if
 689 
 690  if (u1_band>max(1,nband-nbdbuf))then
 691    ! Treat the case of buffer bands
 692    cwavef=zero
 693    ghc   =zero
 694    gvnlxc =zero
 695    if (gen_eigenpb) gsc=zero
 696    if (usedcwavef==2) dcwavef=zero
 697    if (usepaw==1) then
 698      call pawcprj_set_zero(cwaveprj)
 699    end if
 700    if (usedcwavef==2) dcwavef=zero
 701    ! Number of one-way 3D ffts skipped
 702    nskip=nskip+nline
 703 
 704    ! At the end of the treatment of a set of bands, write the number of one-way 3D ffts skipped
 705    if (xmpi_paral==1 .and. u1_band==nband .and. prtvol>=10) then
 706      write(msg,'(a,i0)')' dfpt_cgwf: number of one-way 3D ffts skipped in cgwf3 until now =',nskip
 707      call wrtout(std_out,msg)
 708    end if
 709 
 710    skipme = 1
 711  end if
 712 
 713  ! If not a buffer band, perform the optimisation
 714 
 715  ABI_MALLOC(conjgr,(2,npw1*nspinor))
 716  ABI_MALLOC(direc,(2,npw1*nspinor))
 717  ABI_MALLOC(gresid,(2,npw1*nspinor))
 718  ABI_MALLOC(cwaveq,(2,npw1*nspinor))
 719  if (usepaw==1) then
 720    ABI_MALLOC(conjgrprj,(natom,nspinor))
 721    call pawcprj_alloc(conjgrprj,0,gs_hamkq%dimcprj)
 722  else
 723    ABI_MALLOC(conjgrprj,(0,0))
 724  end if
 725 
 726  cwaveq(:,:)=cgq(:,1+npw1*nspinor*(band_me-1)+icgq:npw1*nspinor*band_me+icgq)
 727  dotgp=one
 728 
 729  ! Here apply H(0) at k+q to input orthogonalized 1st-order wfs
 730  sij_opt=0;if (gen_eigenpb) sij_opt=1
 731  cpopt=-1+usepaw
 732  call getghc(cpopt,cwavef,cwaveprj,ghc,gsc,gs_hamkq,gvnlxc,eshift,mpi_enreg,1,&
 733    prtvol,sij_opt,tim_getghc,0,select_k=KPRIME_H_KPRIME)
 734 
 735  ! ghc also includes the eigenvalue shift
 736  if (gen_eigenpb) then
 737    call cg_zaxpy(npw1*nspinor, [-eshift, zero], gsc,ghc)
 738  else
 739    call cg_zaxpy(npw1*nspinor, [-eshift, zero], cwavef,ghc)
 740  end if
 741 
 742  ! Initialize resid, in case of nline==0
 743  resid=zero
 744 
 745  bands_skipped_now = 0
 746  bands_skipped_now(u1_band) = skipme
 747  call xmpi_sum(bands_skipped_now,mpi_enreg%comm_band,ierr)
 748 
 749  ! ======================================================================
 750  ! ====== BEGIN LOOP FOR A GIVEN BAND: MINIMIZATION ITERATIONS ==========
 751  ! ======================================================================
 752  do iline=1,nline
 753 
 754 
 755    ! ======================================================================
 756    ! ================= COMPUTE THE RESIDUAL ===============================
 757    ! ======================================================================
 758    ! Note that gresid (=steepest-descent vector, Eq.(26) of PRB 55, 10337 (1996) [[cite:Gonze1997]])
 759    ! is precomputed to guarantee cancellation of errors
 760    ! and allow residuals to reach values as small as 1.0d-24 or better.
 761    if (berryopt== 4.or.berryopt== 6.or.berryopt== 7.or. berryopt==14.or.berryopt==16.or.berryopt==17) then
 762      if (ipert==natom+2) then
 763        if (opt_gvnlx1/=1) gvnlx1=zero
 764 !$OMP PARALLEL DO
 765        do ipw=1,npw1*nspinor
 766          gresid(1:2,ipw)=-ghc(1:2,ipw)-gh1c(1:2,ipw)
 767        end do
 768      else
 769 !$OMP PARALLEL DO
 770        do ipw=1,npw1*nspinor
 771          gresid(1,ipw)=-ghc(1,ipw)-gh1c(1,ipw)+gberry(2,ipw)
 772          gresid(2,ipw)=-ghc(2,ipw)-gh1c(2,ipw)-gberry(1,ipw)
 773        end do
 774      end if
 775    else
 776 !$OMP PARALLEL DO
 777      do ipw=1,npw1*nspinor
 778        gresid(1:2,ipw)=-ghc(1:2,ipw)-gh1c(1:2,ipw)
 779      end do
 780    end if
 781 
 782    ! ======================================================================
 783    ! =========== PROJECT THE STEEPEST DESCENT DIRECTION ===================
 784    ! ========= OVER THE SUBSPACE ORTHOGONAL TO OTHER BANDS ================
 785    ! ======================================================================
 786    ! Project all bands from gresid into direc:
 787    ! The following projection over the subspace orthogonal to occupied bands
 788    ! is not optional in the RF case, unlike the GS case.
 789    ! However, the order of operations could be changed, so that
 790    ! as to make it only applied at the beginning, to H(1) psi(0),
 791    ! so, THIS IS TO BE REEXAMINED
 792    ! Note the subtlety:
 793    ! -For the generalized eigenPb, S|cgq> is used in place of |cgq>,
 794    ! in order to apply P_c+ projector (see PRB 73, 235101 (2006) [[cite:Audouze2006]], Eq. (71), (72)
 795    do iband = 1, nband
 796      if (bands_treated_now(iband) == 0) cycle
 797      if (rank_band(iband) == me_band) work = gresid
 798      call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
 799 
 800      if(gen_eigenpb)then
 801        call projbd(gscq,work,-1,igscq,icgq,istwf_k,mgscq,mcgq,nband_me,npw1,nspinor,&
 802          cgq,  scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 803      else
 804        call projbd( cgq,work,-1, icgq,   0,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
 805          dummy,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 806      end if
 807 
 808      call xmpi_sum_master(work, rank_band(iband), mpi_enreg%comm_band, ierr)
 809 
 810      ! save this for me_band only
 811      !TODO: make this a blas call? zaxpy
 812      if (rank_band(iband) == me_band) then
 813        gresid = work - (mpi_enreg%nproc_band-1)*gresid
 814      end if
 815    end do
 816 
 817    call cg_zcopy(npw1*nspinor,gresid,direc)
 818 
 819    ! ======================================================================
 820    ! ============== CHECK FOR CONVERGENCE CRITERIA ========================
 821    ! ======================================================================
 822 
 823    ! Compute second-order derivative of the energy using a variational expression
 824    call dotprod_g(prod1,doti,istwf_k,npw1*nspinor,1,cwavef,gresid,me_g0,mpi_enreg%comm_spinorfft)
 825    call dotprod_g(prod2,doti,istwf_k,npw1*nspinor,1,cwavef,gh1c,me_g0,mpi_enreg%comm_spinorfft)
 826    d2te=two*(-prod1+prod2)
 827    ! write(std_out,'(a,f14.6,a,f14.6)') 'prod1 = ',prod1,' prod2 = ',prod2 ! Keep this debugging feature!
 828 
 829    ! Compute <u_m(1)|H(0)-e_m(0)|u_m(1)>
 830    ! (<u_m(1)|H(0)-e_m(0).S|u_m(1)> if gen. eigenPb),
 831    ! that should be positive,
 832    ! except when the eigenvalue eig_mk(0) is higher than
 833    ! the lowest non-treated eig_mk+q(0). For insulators, this
 834    ! has no influence, but for metallic occupations,
 835    ! the conjugate gradient algorithm breaks down. The solution adopted here
 836    ! is very crude, and rely upon the fact that occupancies of such
 837    ! levels should be smaller and smaller with increasing nband, so that
 838    ! a convergence study will give the right result.
 839    ! The same trick is also used later.
 840    u1h0me0u1=-prod1-prod2
 841 
 842    ! Some tolerance is allowed, to account for very small numerical inaccuracies and cancellations.
 843    if(u1h0me0u1<-tol_restart) then ! .and. skipme == 0)then
 844      if (prtvol==-level.or.prtvol==-19) then
 845        write(msg,'(a,es22.13e3)') '  cgwf3: u1h0me0u1 = ',u1h0me0u1
 846        call wrtout(std_out,msg)
 847      end if
 848      cwavef =zero
 849      ghc    =zero
 850      gvnlxc  =zero
 851      if (gen_eigenpb) gsc(:,:)=zero
 852      if (usepaw==1) then
 853        call pawcprj_set_zero(cwaveprj)
 854      end if
 855      ! A negative residual will be the signal of this problem ...
 856      resid=-one
 857      if (prtvol > 0) call wrtout(std_out,' dfpt_cgwf: problem of minimisation (likely metallic), set resid to -1')
 858      ! Number of one-way 3D ffts skipped
 859      nskip=nskip+(nline-iline+1)
 860 
 861      skipme = 1
 862    end if
 863 
 864    if (skipme == 0) then
 865      ! Compute residual (squared) norm
 866      call sqnorm_g(resid,istwf_k,npw1*nspinor,gresid,me_g0,comm_fft)
 867      if (prtvol==-level.or.prtvol==-19)then
 868        write(msg,'(a,a,i3,f14.6,a,a,4es12.4)') ch10,&
 869         ' dfpt_cgwf : iline,eshift     =',iline,eshift,ch10,&
 870         '         resid,prod1,prod2,d2te=',resid,prod1,prod2,d2te
 871        call wrtout(std_out,msg)
 872      end if
 873    end if
 874 
 875    ! If residual sufficiently small stop line minimizations
 876    if (resid<tolwfr .and. skipme == 0) then
 877      if(prtvol>=10)then
 878        write(msg,'(a,i4,a,i2,a,es12.4)')&
 879            ' dfpt_cgwf: band',u1_band,' converged after ',iline,' line minimizations: resid = ',resid
 880        call wrtout(std_out,msg)
 881      end if
 882      nskip=nskip+(nline-iline+1)  ! Number of two-way 3D ffts skipped
 883      skipme = 1
 884    end if
 885 
 886    ! If user require exiting the job, stop line minimisations
 887    if (quit==1) then
 888      write(msg,'(a,i0)')' dfpt_cgwf: user require exiting => skip update of band ',u1_band
 889      call wrtout(std_out,msg)
 890      nskip=nskip+(nline-iline+1)  ! Number of two-way 3D ffts skipped
 891      exit                         ! Exit from the loop on iline
 892    end if
 893 
 894    ! Check that d2te is decreasing on succeeding lines:
 895    if (iline/=1) then
 896      if (d2te>d2teold+tol6 .and. u1_band_ > 0) then
 897        write(msg,'(a,i0,a,e14.6,a,e14.6)')'New trial energy at line ',iline,' = ',d2te,'is higher than former: ',d2teold
 898        ABI_WARNING(msg)
 899      end if
 900    end if
 901    d2teold=d2te
 902 
 903    !write(msg, *)"iline, resid, skipme", iline, resid, skipme; call wrtout(std_out, msg)
 904 
 905    !DEBUG Keep this debugging feature !
 906    !call sqnorm_g(dotr,istwf_k,npw1*nspinor,direc,me_g0,comm_fft)
 907    !write(std_out,*)' dfpt_cgwf : before precon, direc**2=',dotr
 908    !if (gen_eigenpb) then
 909    !call dotprod_g(dotr,doti,istwf_k,npw1*nspinor,1,cwaveq,&
 910    !&                 gscq(:,1+npw1*nspinor*(u1_band-1)+igscq:npw1*nspinor*u1_band+igscq),me_g0,mpi_enreg%comm_spinorfft)
 911    !else
 912    !call sqnorm_g(dotr,istwf_k,npw1*nspinor,cwaveq,me_g0,comm_fft)
 913    !end if
 914    !write(std_out,*)' dfpt_cgwf : before precon, cwaveq**2=',dotr
 915    !ENDDEBUG
 916 
 917    ! ======================================================================
 918    ! ======== PRECONDITION THE STEEPEST DESCENT DIRECTION =================
 919    ! ======================================================================
 920 
 921    ! If wfoptalg>=10, the precondition matrix is kept constant
 922    ! during iteration; otherwise it is recomputed
 923    if (wfoptalg<10.or.iline==1) then
 924      !print *, 'cwaveq cg_precon ', cwaveq; print *, 'direc cg_precon ', direc
 925      call cg_precon(cwaveq,zero,istwf_k,kinpw1,npw1,nspinor,me_g0,0,pcon,direc,mpi_enreg%comm_fft)
 926    else
 927      do ispinor=1,nspinor
 928        igs=(ispinor-1)*npw1
 929 !$OMP PARALLEL DO PRIVATE(ipw) SHARED(igs,npw,direc,pcon)
 930        do ipw=1+igs,npw1+igs
 931          direc(1:2,ipw)=direc(1:2,ipw)*pcon(ipw-igs)
 932        end do
 933      end do
 934    end if
 935 
 936    !DEBUG Keep this debugging feature !
 937    !call sqnorm_g(dotr,istwf_k,npw1*nspinor,direc,me_g0,comm_fft)
 938    !write(std_out,*)' dfpt_cgwf : after precon, direc**2=',dotr
 939    !ENDDEBUG
 940 
 941    ! ======================================================================
 942    ! ======= PROJECT THE PRECOND. STEEPEST DESCENT DIRECTION ==============
 943    ! ========= OVER THE SUBSPACE ORTHOGONAL TO OTHER BANDS ================
 944    ! ======================================================================
 945 
 946    ! Projecting again out all bands:
 947    ! -For the simple eigenPb, gscq is used as dummy argument
 948    do iband = 1, nband
 949      if (bands_treated_now(iband) == 0) cycle
 950 
 951      if (rank_band(iband) == me_band) work = direc
 952      call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
 953 
 954      call projbd(cgq,work,-1,icgq,igscq,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
 955        gscq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
 956 
 957      call xmpi_sum_master(work, rank_band(iband), mpi_enreg%comm_band, ierr)
 958 
 959      ! save this for me_band only
 960      !TODO: make this a blas call? zaxpy
 961      if (rank_band(iband) == me_band) then
 962        direc = work - (mpi_enreg%nproc_band-1)*direc
 963      end if
 964    end do
 965 
 966 
 967    !DEBUG Keep this debugging feature !
 968    !call sqnorm_g(dotr,istwf_k,npw1*nspinor,direc,me_g0,comm_fft)
 969    !write(std_out,*)' dfpt_cgwf : after projbd, direc**2=',dotr
 970    !ENDDEBUG
 971 
 972    ! ======================================================================
 973    ! ================= COMPUTE THE CONJUGATE-GRADIENT =====================
 974    ! ======================================================================
 975 
 976    ! get dot of direction vector with residual vector
 977    call dotprod_g(dotgg,doti,istwf_k,npw1*nspinor,1,direc,gresid,me_g0,mpi_enreg%comm_spinorfft)
 978 
 979    if (iline==1) then
 980      ! At first iteration, gamma is set to zero
 981      gamma=zero
 982      dotgp=dotgg
 983      call cg_zcopy(npw1*nspinor,direc,conjgr)
 984    else
 985      ! At next iterations, h = g + gamma * h
 986      gamma=dotgg/dotgp
 987      dotgp=dotgg
 988      if (prtvol==-level.or.prtvol==-19)then
 989        write(msg,'(a,2es16.6)') 'dfpt_cgwf: dotgg,gamma = ',dotgg,gamma
 990        call wrtout(std_out,msg)
 991      end if
 992 !$OMP PARALLEL DO
 993      do ipw=1,npw1*nspinor
 994        conjgr(1:2,ipw)=direc(1:2,ipw)+gamma*conjgr(1:2,ipw)
 995      end do
 996      if (prtvol==-level.or.prtvol==-19) call wrtout(std_out,'dfpt_cgwf: conjugate direction has been found')
 997    end if
 998 
 999    ! ======================================================================
1000    ! ===== COMPUTE CONTRIBUTIONS TO 1ST AND 2ND DERIVATIVES OF ENERGY =====
1001    ! ======================================================================
1002    ! ...along the search direction
1003 
1004    ! Compute dedt, Eq.(29) of of PRB55, 10337 (1997) [[cite:Gonze1997]],
1005    ! with an additional factor of 2 for the difference between E(2) and the 2DTE
1006    dedt = zero
1007    call dotprod_g(dedt,doti,istwf_k,npw1*nspinor,1,conjgr,gresid,me_g0,mpi_enreg%comm_spinorfft)
1008    dedt=-two*two*dedt
1009 
1010    if((prtvol==-level.or.prtvol==-19.or.prtvol==-20).and.dedt-tol14>0) then
1011      call wrtout(std_out,' DFPT_CG_WARNING : dedt>0')
1012    end if
1013    ABI_MALLOC(gvnlx_direc,(2,npw1*nspinor))
1014    ABI_MALLOC(gh_direc,(2,npw1*nspinor))
1015    if (gen_eigenpb)  then
1016      ABI_MALLOC(sconjgr,(2,npw1*nspinor))
1017    else
1018      ABI_MALLOC(sconjgr,(0,0))
1019    end if
1020    sij_opt=0;if (gen_eigenpb) sij_opt=1
1021    cpopt=-1+usepaw
1022    call getghc(cpopt,conjgr,conjgrprj,gh_direc,sconjgr,gs_hamkq,gvnlx_direc,&
1023      eshift,mpi_enreg,1,prtvol,sij_opt,tim_getghc,0,select_k=KPRIME_H_KPRIME)
1024 
1025    ! ghc also includes the eigenvalue shift
1026    if (gen_eigenpb) then
1027 !$OMP PARALLEL DO
1028      do ipw=1,npw1*nspinor
1029        gh_direc(1:2,ipw)=gh_direc(1:2,ipw)-eshift*sconjgr(1:2,ipw)
1030      end do
1031    else
1032 !$OMP PARALLEL DO
1033      do ipw=1,npw1*nspinor
1034        gh_direc(1:2,ipw)=gh_direc(1:2,ipw)-eshift*conjgr(1:2,ipw)
1035      end do
1036    end if
1037 
1038    ! compute d2edt2, Eq.(30) of of PRB55, 10337 (1997) [[cite:Gonze1997]],
1039    ! with an additional factor of 2 for the difference
1040    ! between E(2) and the 2DTE, and neglect of local fields (SC terms)
1041    d2edt2 = zero
1042    call dotprod_g(d2edt2,doti,istwf_k,npw1*nspinor,1,conjgr,gh_direc,me_g0,mpi_enreg%comm_spinorfft)
1043    d2edt2=two*two*d2edt2
1044 
1045    if(prtvol==-level.or.prtvol==-19)then
1046      write(msg,'(a,2es14.6)') 'dfpt_cgwf: dedt,d2edt2=',dedt,d2edt2
1047      call wrtout(std_out,msg)
1048    end if
1049 
1050    ! ======================================================================
1051    ! ======= COMPUTE MIXING FACTOR - CHECK FOR CONVERGENCE ===============
1052    ! ======================================================================
1053 
1054 
1055    ! see Eq.(31) of PRB55, 10337 (1997) [[cite:Gonze1997]]
1056    !
1057    if(d2edt2<-tol_restart .and. skipme == 0)then
1058      ! This may happen when the eigenvalue eig_mk(0) is higher than
1059      ! the lowest non-treated eig_mk+q(0). The solution adopted here
1060      ! is very crude, and rely upon the fact that occupancies of such
1061      ! levels should be smaller and smaller with increasing nband, so that
1062      ! a convergence study will give the right result.
1063      theta=zero
1064 
1065      cwavef=zero
1066      ghc   =zero
1067      gvnlxc =zero
1068      if (gen_eigenpb) gsc=zero
1069      if (usepaw==1) then
1070        call pawcprj_set_zero(cwaveprj)
1071      end if
1072      ! A negative residual will be the signal of this problem ...
1073      resid=-two
1074      if (prtvol > 0 .and. u1_band_ > 0) then
1075        call wrtout(std_out,' dfpt_cgwf: problem of minimisation (likely metallic), set resid to -2')
1076      end if
1077    else if (d2edt2 > 1.d-40) then
1078      ! Here, the value of theta that gives the minimum
1079      theta=-dedt/d2edt2
1080      !write(std_out,*)' dfpt_cgwf: dedt,d2edt2=',dedt,d2edt2
1081    else
1082      if (u1_band_ > 0) call wrtout(std_out, "DFPT_CGWF WARNING: d2edt2 is zero, skipping update")
1083      theta=zero
1084    end if
1085 
1086    ! Check that result is above machine precision
1087    if (one+theta==one) then
1088      if (prtvol > 0 .and. u1_band_ > 0) then
1089        write(msg, '(a,es16.4)' )' dfpt_cgwf: converged with theta= ',theta
1090        call wrtout(std_out,msg)
1091      end if
1092      nskip=nskip+2*(nline-iline) ! Number of one-way 3D ffts skipped
1093      skipme = 1
1094    end if
1095 
1096    ! ======================================================================
1097    ! ================ GENERATE NEW |wf>, H|wf>, Vnl|Wf ... ================
1098    ! ======================================================================
1099 
1100    if (skipme == 0) then
1101      call cg_zaxpy(npw1*nspinor, [theta, zero], conjgr,cwavef)
1102      ! Filter the wavefunctions for large modified kinetic energy (see routine mkkin.f)
1103      do ispinor=1,nspinor
1104        ipws=(ispinor-1)*npw1
1105 !$OMP PARALLEL DO PRIVATE(ipw) SHARED(cwavef,kinpw1,ipws,npw1)
1106        do ipw=1+ipws,npw1+ipws
1107          if(kinpw1(ipw-ipws)>huge(zero)*1.d-11)then
1108            cwavef(1:2,ipw)=zero
1109          end if
1110        end do
1111      end do
1112 
1113      call cg_zaxpy(npw1*nspinor, [theta, zero], gh_direc,ghc)
1114      call cg_zaxpy(npw1*nspinor, [theta, zero], gvnlx_direc,gvnlxc)
1115 
1116      if (gen_eigenpb) then
1117        call cg_zaxpy(npw1*nspinor, [theta, zero], sconjgr, gsc)
1118      end if
1119      if (usepaw==1) then
1120        call pawcprj_axpby(theta,one,conjgrprj,cwaveprj)
1121      end if
1122    end if
1123 
1124    ABI_FREE(gh_direc)
1125    ABI_FREE(gvnlx_direc)
1126    ABI_FREE(sconjgr)
1127 
1128    ! ======================================================================
1129    ! =========== CHECK CONVERGENCE AGAINST TRIAL ENERGY ===================
1130    ! ======================================================================
1131 
1132    if(usetolrde/=0) then
1133      ! Check reduction in trial energy deltae, Eq.(28) of PRB55, 10337 (1997) [[cite:Gonze1997]]
1134      deltae=half*d2edt2*theta**2+theta*dedt
1135 
1136      if (iline==1) then
1137        deold=deltae
1138        ! The extra factor of two should be removed !
1139      else if (abs(deltae)<tolrde*two*abs(deold) .and. iline/=nline) then
1140        if(prtvol>=10.or.prtvol==-level.or.prtvol==-19)then
1141          write(msg, '(a,i4,1x,a,1p,e12.4,a,e12.4,a)' ) &
1142           ' dfpt_cgwf: line',iline,' deltae=',deltae,' < tolrde*',deold,' =>skip lines'
1143          call wrtout(std_out,msg)
1144        end if
1145        nskip=nskip+2*(nline-iline) ! Number of one-way 3D ffts skipped
1146        skipme = 1
1147      end if
1148    end if
1149 
1150    ! if all bands are skippable, we can exit the iline loop for good.
1151    ! otherwise, all procs are needed for the projbd and other operations,
1152    ! even if the present band will not be updated
1153    bands_skipped_now = 0
1154    bands_skipped_now(u1_band) = skipme
1155    if (u1_band_ < 0) bands_skipped_now(u1_band) = 0  ! Handle negative index
1156    call xmpi_sum(bands_skipped_now,mpi_enreg%comm_band,ierr)
1157 
1158    ! bands_skipped_now = bands_skipped_now - bands_treated_now
1159    if (sum(abs(bands_skipped_now - bands_treated_now)) == 0) then
1160      exit
1161    end if
1162 
1163    nlines_done = nlines_done + 1
1164  end do ! iline
1165 
1166 !--------------------------------------------------------------------------
1167 !             DEBUG
1168 !--------------------------------------------------------------------------
1169  ! Check that final cwavef (Psi^(1)) satisfies the orthogonality condition
1170  if (prtvol==-level.or.prtvol==-19) then
1171    sij_opt=0 ; usevnl=1 ; optlocal=1 ; optnl=2 ; if (gen_eigenpb)  sij_opt=1
1172    ABI_MALLOC(work1,(2,npw1*nspinor*((sij_opt+1)/2)))
1173    ABI_MALLOC(work2,(2,npw1*nspinor*sij_opt))
1174    iband_me = 0
1175    do iband=1,nband
1176      if (bands_treated_now(iband) == 0) cycle
1177      if (rank_band(iband) == me_band) then
1178        iband_me = iband_me+1
1179        if (gen_eigenpb) then
1180          work(:,:)=gscq(:,1+npw1*nspinor*(iband-1)+igscq:npw1*nspinor*iband+igscq)
1181        else
1182          work(:,:)=cgq(:,1+npw1*nspinor*(iband-1)+icgq:npw1*nspinor*iband+icgq)
1183        end if
1184      end if
1185      call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
1186 
1187      ! Compute: <Psi^(0)_i,k+q|Psi^(1)_j,k,q>
1188      call dotprod_g(prod1,prod2,istwf_k,npw1*nspinor,2,work,cwavef,me_g0,mpi_enreg%comm_spinorfft)
1189 
1190      if (ipert/=natom+10.and.ipert/=natom+11) then
1191        if (gen_eigenpb) then
1192          call getgh1c(berryopt,cwave0,cwaveprj0,work1,gberry,work2,gs_hamkq,gvnlx1_saved,idir,ipert,eshift,&
1193            mpi_enreg,optlocal,optnl,opt_gvnlx1,rf_hamkq,sij_opt,tim_getgh1c,usevnl)
1194 
1195          if (rank_band(iband) == me_band) then
1196            work(:,:)=cgq(:,1+npw1*nspinor*(iband_me-1)+icgq:npw1*nspinor*iband_me+icgq)
1197          end if
1198          call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
1199          call dotprod_g(dotr,doti,istwf_k,npw1*nspinor,2,work,work2,me_g0,mpi_enreg%comm_spinorfft)
1200        else
1201          dotr=zero; doti=zero
1202        end if
1203        dotr=prod1+half*dotr
1204        doti=prod2+half*doti
1205      else if(prtvol==-19) then ! 2nd order case
1206        dotr=prod1+half*rf2%amn(1,iband+(u1_band-1)*nband)
1207        doti=prod2+half*rf2%amn(2,iband+(u1_band-1)*nband)
1208      else
1209        write(msg,'(a)') 'CGWF3_WARNING : Use prtvol=-19 to test orthogonality for ipert=natom+10 or +11'
1210        call wrtout(std_out,msg,'COLL')
1211      end if
1212      dotr=sqrt(dotr**2+doti**2)
1213      if(dotr>tol10) then
1214        !if (gen_eigenpb) then
1215        !  write(msg,'(2a,i3,a,2es22.15)') 'CGWF3_WARNING : <Psi^(1)_i,k,q|S^(0)|Psi^(0)_j,k+q>',&
1216        !    '+ 1/2<Psi^(0)_i,k|S^(1)|Psi^(0)_j,k+q>, for j= ',iband,' is ',dotr,doti
1217        !else
1218        write(msg,'(a,i3,a,es22.15)') 'CGWF3_WARNING : |<Psi^(0)_i,k+q|Psi^(1)_j,k,q>+amn(i,j)/2|, for j= ',iband,' is ',dotr
1219        !end if
1220        call wrtout(std_out,msg)
1221      end if
1222    end do
1223    ABI_FREE(work1)
1224    ABI_FREE(work2)
1225    if (ipert/=natom+10.and.ipert/=natom+11) then
1226      ABI_FREE(gvnlx1_saved)
1227    end if
1228  end if ! prtvol==-level.or.prtvol==-19
1229 
1230  if (prtvol==-level.or.prtvol==-19)then
1231    !  Check that final cwavef Psi^(1) is Pc.Psi^(1)+delta_Psi^(1)
1232    ABI_MALLOC(cwwork,(2,npw1*nspinor))
1233    ! -Apply Pc to Psi^(1)
1234    do iband = 1, nband
1235      if (bands_treated_now(iband) == 0) cycle
1236      if (rank_band(iband) == me_band) cwwork=cwavef
1237 
1238      call xmpi_bcast(cwwork,rank_band(iband),mpi_enreg%comm_band,ierr)
1239 
1240      if(gen_eigenpb)then
1241        call projbd(cgq,cwwork,-1,igscq,icgq,istwf_k,mgscq,mcgq,nband_me,npw1,nspinor,&
1242          cgq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
1243      else
1244        call projbd(cgq,cwwork,-1,icgq,0,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
1245          dummy,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
1246      end if
1247 
1248      call xmpi_sum(cwwork,mpi_enreg%comm_band,ierr)
1249 
1250      ! save this for me_band only
1251      !TODO: make this a blas call? zaxpy
1252      if (rank_band(iband) == me_band) then
1253        cwwork = cwwork - (mpi_enreg%nproc_band-1)*cwavef
1254      end if
1255    end do
1256 
1257    ! -Add delta_Psi^(1)
1258    if (usedcwavef>0) cwwork=cwwork+dcwavef
1259    if(ipert==natom+10.or.ipert==natom+11) cwwork=cwwork+rf2%dcwavef(:,1+dc_shift_band:npw1*nspinor+dc_shift_band)
1260    ! -Compare to Psi^(1)
1261    cwwork=cwwork-cwavef
1262    call sqnorm_g(dotr,istwf_k,npw1*nspinor,cwwork,me_g0,comm_fft)
1263    ABI_FREE(cwwork)
1264    if(sqrt(dotr)>tol10) then
1265      !if (gen_eigenpb) then
1266      !  write(msg,'(a,i3,a,es22.15)') &
1267      !  'CGWF3_WARNING : |(Pc.Psi^(1)_i,k,q + delta_Psi^(1)_i,k) - Psi^(1)_i,k,q|^2 (band ',u1_band,')=',dotr
1268      !else
1269      write(msg,'(a,es22.15)') 'CGWF3_WARNING : |(Pc.Psi^(1)_i,k,q + delta_Psi^(1)_i,k) - Psi^(1)_i,k,q| = ',sqrt(dotr)
1270      !end if
1271      call wrtout(std_out,msg)
1272    end if
1273  end if  ! prtvol==-level.or.prtvol==-19
1274 
1275  if(prtvol==-level.or.prtvol==-19.or.prtvol==-20)then
1276    ! Check that final cwavef (Psi^(1)) solves the Sternheimer equation
1277    ABI_MALLOC(cwwork,(2,npw1*nspinor))
1278    ! -Apply Pc to Psi^(1)
1279    do iband = 1, nband
1280      if (bands_treated_now(iband) == 0) cycle
1281 
1282      if (rank_band(iband) == me_band) cwwork=cwavef
1283      call xmpi_bcast(cwwork,rank_band(iband),mpi_enreg%comm_band,ierr)
1284 
1285      if(gen_eigenpb)then
1286        call projbd(cgq,cwwork,-1,igscq,icgq,istwf_k,mgscq,mcgq,nband_me,npw1,nspinor,&
1287          gscq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
1288      else
1289        call projbd(cgq,cwwork,-1,icgq,0,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
1290          dummy,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
1291      end if
1292 
1293      call xmpi_sum_master(cwwork, rank_band(iband), mpi_enreg%comm_band, ierr)
1294 
1295      ! save this for me_band only
1296      !TODO: make this a blas call? zaxpy
1297      if (rank_band(iband) == me_band) then
1298       cwwork = cwwork - (mpi_enreg%nproc_band-1)*cwavef
1299      end if
1300    end do
1301 
1302    ! - Apply H^(0)-E.S^(0)
1303    sij_opt=0;if (gen_eigenpb) sij_opt=1
1304    cpopt=-1
1305    ABI_MALLOC(work1,(2,npw1*nspinor*((sij_opt+1)/2)))
1306    ABI_MALLOC(work2,(2,npw1*nspinor))
1307    call getghc(cpopt,cwwork,conjgrprj,work,work1,gs_hamkq,work2,eshift,&
1308      mpi_enreg,1,prtvol,sij_opt,tim_getghc,0,select_k=KPRIME_H_KPRIME)
1309    if (gen_eigenpb) then
1310      cwwork=work-eshift*work1
1311    else
1312      cwwork=work-eshift*cwwork
1313    end if
1314    ABI_FREE(work1)
1315    ABI_FREE(work2)
1316 
1317    ! The following is not mandatory, as Pc has been already applied to Psi^(1)
1318    ! and Pc^* H^(0) Pc = Pc^* H^(0) = H^(0) Pc (same for S^(0)).
1319    ! However, in PAW, to apply Pc^* here seems to reduce the numerical error
1320    ! -Apply Pc^*
1321    do iband = 1, nband
1322      if (bands_treated_now(iband) == 0) cycle
1323      if (rank_band(iband) == me_band) work=cwwork
1324      call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
1325 
1326      if(gen_eigenpb)then
1327        call projbd(gscq,  work,-1,igscq,icgq,istwf_k,mgscq,mcgq,nband_me,npw1,nspinor,&
1328          cgq,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
1329      else
1330        call projbd(cgq,work,-1,icgq,0,istwf_k,mcgq,mgscq,nband_me,npw1,nspinor,&
1331          dummy,scprod,0,tim_projbd,useoverlap,me_g0,comm_fft)
1332      end if
1333 
1334      call xmpi_sum(work,mpi_enreg%comm_band,ierr)
1335 
1336      ! save this for me_band only
1337      !TODO: make this a blas call? zaxpy
1338      if (rank_band(iband) == me_band) then
1339        cwwork = cwwork - (mpi_enreg%nproc_band-1)*work
1340      end if
1341    end do
1342 
1343    ! - Add Pc^*(H^(1)-E.S^(1)).Psi^(0)
1344    cwwork=cwwork+gh1c
1345    call sqnorm_g(dotr,istwf_k,npw1*nspinor,cwwork,me_g0,comm_fft)
1346    ABI_FREE(cwwork)
1347    write(msg,'(a,i3,a,es22.15,2a,i4)') &
1348      '*** CGWF3 Sternheimer equation test for band ',u1_band,'=',sqrt(dotr),ch10,&
1349      'It should go to zero for large nline : nlines_done = ',nlines_done
1350    call wrtout(std_out,msg)
1351  end if ! prtvol==-level.or.prtvol==-19.or.prtvol==-20
1352 
1353  if(prtvol==-level.or.prtvol==-19.or.prtvol==-20)then
1354    ! Check that < Psi^(0) | ( H^(0)-eps^(0) S^(0) ) | Psi^(1) > is in agreement with eig^(1)
1355    ABI_MALLOC(cwwork,(2,npw1*nspinor))
1356    cwwork=cwavef
1357    ! - Apply H^(0)-E.S^(0)
1358    sij_opt=0;if (gen_eigenpb) sij_opt=1
1359    cpopt=-1
1360    ABI_MALLOC(work1,(2,npw1*nspinor*((sij_opt+1)/2)))
1361    ABI_MALLOC(work2,(2,npw1*nspinor))
1362    call getghc(cpopt,cwwork,conjgrprj,work,work1,gs_hamkq,work2,eshift,&
1363      mpi_enreg,1,prtvol,sij_opt,tim_getghc,0,select_k=KPRIME_H_KPRIME)
1364    if (gen_eigenpb) then
1365      cwwork=work-eshift*work1
1366    else
1367      cwwork=work-eshift*cwwork
1368    end if
1369    ABI_FREE(work1)
1370    ABI_FREE(work2)
1371    cwwork=cwwork+gh1c_n
1372    jband=(u1_band-1)*2*nband
1373    iband_me = 0
1374    do iband=1,nband
1375      if (bands_treated_now(iband) == 0) cycle
1376      if (rank_band(iband) == me_band) then
1377        iband_me = iband_me+1
1378        work(:,:)=cgq(:,1+npw1*nspinor*(iband_me-1)+icgq:npw1*nspinor*iband_me+icgq)
1379      end if
1380      call xmpi_bcast(work,rank_band(iband),mpi_enreg%comm_band,ierr)
1381 
1382      call dotprod_g(dotr,doti,istwf_k,npw1*nspinor,2,work,cwwork,me_g0,mpi_enreg%comm_spinorfft)
1383      dotr = dotr - eig1_k(2*iband-1+jband)
1384      doti = doti - eig1_k(2*iband  +jband)
1385      dotr = sqrt(dotr**2+doti**2)
1386      if (dotr > tol8) then
1387        write(msg,'(2(a,i3),a,es22.15)') &
1388          'CGWF3_WARNING < Psi^(0) | ( H^(0)-eps^(0) S^(0) ) | Psi^(1) > for i=',iband,' j=',u1_band,&
1389        ' : ',sqrt(dotr**2+doti**2)
1390        call wrtout(std_out,msg)
1391      end if
1392    end do
1393    !write(std_out,'(a)') '< Psi^(0) | ( H^(0)-eps^(0) S^(0) ) | Psi^(1) > is done.'
1394    ABI_FREE(cwwork)
1395  end if ! prtvol==-level.or.prtvol==-19.or.prtvol==-20
1396 !--------------------------------------------------------------------------
1397 !            END DEBUG
1398 !--------------------------------------------------------------------------
1399 
1400  if (allocated(gh_direc))  then
1401    ABI_FREE(gh_direc)
1402  end if
1403  if (allocated(gvnlx_direc))  then
1404    ABI_FREE(gvnlx_direc)
1405  end if
1406  ABI_FREE(conjgr)
1407  ABI_FREE(cwaveq)
1408  ABI_FREE(direc)
1409  ABI_FREE(gresid)
1410  if (usepaw==1) then
1411    call pawcprj_free(conjgrprj)
1412  end if
1413  ABI_FREE(conjgrprj)
1414 
1415  if(u1_band>max(1,nband-nbdbuf))then
1416    ! A small negative residual will be associated with these
1417    ! in the present algorithm all bands need to be in the loops over cgq etc... for the parallelization
1418    resid=-0.1_dp
1419  end if
1420 
1421  ! At the end of the treatment of a set of bands, write the number of one-way 3D ffts skipped
1422  if (xmpi_paral==1 .and. u1_band==nband .and. prtvol>=10) then
1423    write(msg,'(a,i0)')' dfpt_cgwf: number of one-way 3D ffts skipped in cgwf3 until now =',nskip
1424    call wrtout(std_out,msg)
1425  end if
1426 
1427  ABI_FREE(work)
1428  ABI_FREE(gh1c)
1429  ABI_FREE(pcon)
1430  ABI_FREE(scprod)
1431  ABI_FREE(gberry)
1432 
1433  call timab(122,2,tsec)
1434 
1435  DBG_EXIT("COLL")
1436 
1437 end subroutine dfpt_cgwf

ABINIT/m_dfpt_cgwf [ Modules ]

[ Top ] [ Modules ]

NAME

  m_dfpt_cgwf

FUNCTION

 Update one single wavefunction (cwavef), non self-consistently.
 Uses a conjugate-gradient algorithm.

COPYRIGHT

  Copyright (C) 1999-2024 ABINIT group (XG,DRH,XW,FJ,MT,LB)
  This file is distributed under the terms of the
  GNU General Public License, see ~abinit/COPYING
  or http://www.gnu.org/copyleft/gpl.txt .

SOURCE

17 #if defined HAVE_CONFIG_H
18 #include "config.h"
19 #endif
20 
21 #include "abi_common.h"
22 
23 module m_dfpt_cgwf
24 
25  use defs_basis
26  use m_abicore
27  use m_errors
28  use m_xmpi
29  use m_cgtools
30  use m_rf2
31 
32  use defs_abitypes, only : MPI_type
33  use m_time,        only : timab
34  use m_pawcprj,     only : pawcprj_type, pawcprj_alloc, pawcprj_free, pawcprj_set_zero, pawcprj_axpby
35  use m_hamiltonian, only : gs_hamiltonian_type, rf_hamiltonian_type, KPRIME_H_KPRIME
36  use m_getghc,      only : getghc
37  use m_getgh1c,     only : getgh1c, getdc1
38 
39  implicit none
40 
41  private