EMsolver

The program primarily solves the Jefimenko equations for different charge densities and current densities.; The objective is to obtain the electromagnetic field parameters for varying charge densities and current densities.

Jefimenko

To understand the background and principles of the application of the program you can refer to the following article JefiGPU: Jefimenko’s Equations on GPU

Advantages of the program

  • The Jefimenko equations are efficiently processed using CUDA for fast parallel computation.

Structure of the program

  • Class EBsolver()
    • init()
    • update_rho_J()
    • Jefimenko_solver()
  • Class EMsolver()
    • init()
    • terminate()
    • Jefimenko_solver()
    • acquire_EB_field()
  • areal_position()
  • generate_rho_J()
  • real_position()
  • copy_new_rho_J_to_GPU()
  • retarded_time_index()
  • r_distance() The overall logic of the program can be described as follows:

flowchart

EMsolver()

Main class for solving E and B from Jefimenko’s equations. The calculated rho and J are always stored as the self.data.

The Params:

​ len_time_snapshots: int, the maximum time sequence that can be stored in GPU memory

​ x_grid_size_o, y_grid_size_o, z_grid_size_o: number of grid sizes in the obervation region

​ x_grid_size_s, y_grid_size_s, z_grid_size_s: number of grid sizes in the source region

​ dx_o, dy_o, dz_o: infinitesimal difference of the spatial coordinates in the observation region

​ dx_s, dy_s, dz_s: infinitesimal difference of the spatial coordinates in the source region

​ x_left_boundary_o, y_left_boundary_o, z_left_boundary_o: the left boundary of the observation region

​ x_left_boundary_s, y_left_boundary_s, z_left_boundary_s: the left boundary of the source region

​ dt: time step, epsilon0: the numerical value of epsilon0 used in FU (Flexible Unit),

​ c: the numerical value of c used in FU (Flexible Unit)

class EMsolver():
    def __init__(self, \
                 len_time_snapshots, \
                 x_grid_size_o, y_grid_size_o, z_grid_size_o, \
                 x_grid_size_s, y_grid_size_s, z_grid_size_s, \
                 dx_o, dy_o, dz_o, x_left_boundary_o, y_left_boundary_o, z_left_boundary_o, \
                 dx_s, dy_s, dz_s, x_left_boundary_s, y_left_boundary_s, z_left_boundary_s, \
                 dt, epsilon0, c):
        
        # save the variables in the class
        self.len_time_snapshots = len_time_snapshots
        self.dx_o, self.dy_o, self.dz_o = dx_o, dy_o, dz_o
        self.dx_s, self.dy_s, self.dz_s = dx_s, dy_s, dz_s
        self.x_left_boundary_o, self.y_left_boundary_o, self.z_left_boundary_o = x_left_boundary_o, y_left_boundary_o, z_left_boundary_o
        self.x_left_boundary_s, self.y_left_boundary_s, self.z_left_boundary_s = x_left_boundary_s, y_left_boundary_s, z_left_boundary_s
        self.x_grid_size_o, self.y_grid_size_o, self.z_grid_size_o = x_grid_size_o, y_grid_size_o, z_grid_size_o
        self.x_grid_size_s, self.y_grid_size_s, self.z_grid_size_s = x_grid_size_s, y_grid_size_s, z_grid_size_s
        self.dt = dt
        self.epsilon0 = epsilon0
        self.c = c
        
        # total grid numbers and total time ticks in GPU
        self.total_grid_o = x_grid_size_o*y_grid_size_o*z_grid_size_o     
        self.total_grid_s = x_grid_size_s*y_grid_size_s*z_grid_size_s        
        self.all_grids = self.total_grid_o*self.total_grid_s
   
        # Configure the blocks
        self.threadsperblock = 32
        # configure the grids
        self.blockspergrid_o = (self.total_grid_o + (self.threadsperblock - 1)) // self.threadsperblock
        self.blockspergrid_s = (self.total_grid_s + (self.threadsperblock - 1)) // self.threadsperblock
        self.blockspergrid_all = (self.all_grids + (self.threadsperblock - 1)) // self.threadsperblock
        
        # define device array of rho and J with zero initial values of the source region
        self.rho_GPU, self.Jx_GPU, self.Jy_GPU, self.Jz_GPU = \
        (cupy.zeros([len_time_snapshots, self.total_grid_s], dtype=np.float64) for _ in range(4))
        
        # coefficient for E and B when doing integration in the source region
        self.coeff_E = 1/(4*math.pi*epsilon0)*dx_s*dy_s*dz_s
        self.coeff_B = -1/(4*math.pi*epsilon0*c**2)*dx_s*dy_s*dz_s
        
        # store the sequences of the time steps chosen
        self.dt_series = cupy.zeros([len_time_snapshots])
        self.dt_list = []
        
        # time tick
        self.time_tick = 0

        # accumulated time steps from the current dt till the previous
        self.accumu_dt_series = cupy.zeros([len_time_snapshots])
        self.accumu_dt_index = cupy.zeros([len_time_snapshots], dtype=np.int64)

update_rho_J()

The function primarily calculates the time index and saves the current charge density during the time iteration process. It consists of two sub-functions.

def update_rho_J(self, rho, Jx, Jy, Jz):
        # i_time: int, index at which to copy the data
        # i_round: i_step = i_rounds*len_time_snapshots + i_time
        self.i_round, self.i_time = time_index_in_GPU(self.time_tick, self.len_time_snapshots)
        self.t = self.time_tick*self.dt
        self.time_tick += 1
        
        # send these data to GPU
        rho, Jx, Jy, Jz = (cupy.asarray(i) for i in [rho, Jx, Jy, Jz])

        # copy newly calculated rho and J to GPU
        copy_new_rho_J_to_GPU[self.blockspergrid_s, self.threadsperblock](self.i_time, rho, Jx, Jy, Jz, \
                                                                          self.rho_GPU, self.Jx_GPU, self.Jy_GPU, self.Jz_GPU, \
                                                                          self.total_grid_s)
  • time_index_in_GPU()——The time index calculation is performed to prevent the iteration process from exceeding the maximum time step;
    def time_index_in_GPU(i_time, len_time_snapshots):
    return (i_time//len_time_snapshots, i_time%len_time_snapshots)
    
  • copy_new_rho_J_to_GPU()—— All current densities and charge densities are recorded according to the time index;
    @cuda.jit
    def copy_new_rho_J_to_GPU(i_time, rho, Jx, Jy, Jz, \
                          rho_GPU, Jx_GPU, Jy_GPU, Jz_GPU, \
                          total_grid_s):
    i = cuda.grid(1)
    # loop through each spatial grid
    if i < total_grid_s:
        rho_GPU[i_time, i] = rho[i]
        Jx_GPU[i_time, i] = Jx[i]
        Jy_GPU[i_time, i] = Jy[i]
        Jz_GPU[i_time, i] = Jz[i]
    

signal_indicator()

The function primarily determines whether there is an overlap between the user-defined source region and the target region positions and provides indices for different scenarios.

The Params:

​ o indicates observational and s indicates source

​ dx_o, dy_o, dz_o, x_left_boundary_o, y_left_boundary_o, z_left_boundary_o

​ dx_s, dy_s, dz_s, x_left_boundary_s, y_left_boundary_s, z_left_boundary_s

​ x_grid_size_o, y_grid_size_o, z_grid_size_o

​ x_grid_size_s, y_grid_size_s, z_grid_size_s

def signal_indicator(dx_o, dy_o, dz_o, x_left_boundary_o, y_left_boundary_o, z_left_boundary_o, \
                     dx_s, dy_s, dz_s, x_left_boundary_s, y_left_boundary_s, z_left_boundary_s, \
                     x_grid_size_o, y_grid_size_o, z_grid_size_o, \
                     x_grid_size_s, y_grid_size_s, z_grid_size_s):
    
    # find the left and right boundaries
    x_right_boundary_o, y_right_boundary_o, z_right_boundary_o = x_left_boundary_o + dx_o*x_grid_size_o, \
                                                                 y_left_boundary_o + dy_o*y_grid_size_o, \
                                                                 z_left_boundary_o + dz_o*z_grid_size_o

    x_right_boundary_s, y_right_boundary_s, z_right_boundary_s = x_left_boundary_s + dx_s*x_grid_size_s, \
                                                                 y_left_boundary_s + dy_o*y_grid_size_s, \
                                                                 z_left_boundary_s + dz_o*z_grid_size_s
    
    xro,yro,zro,xlo,ylo,zlo,xrs,yrs,zrs,xls,yls,zls = x_right_boundary_o, y_right_boundary_o, z_right_boundary_o,\
                                                      x_left_boundary_o, y_left_boundary_o, z_left_boundary_o, \
                                                      x_right_boundary_s, y_right_boundary_s, z_right_boundary_s, \
                                                      x_left_boundary_s, y_left_boundary_s, z_left_boundary_s
    
    # there are four cases for the relavent spatial positions of the two regions
    # we use the boundaries points in observational region to subtract the boundaries in source region
    # first we compare the points in observational region with source region
    ob = []
    for i in (xlo,xro):
        ii=step_func(i,xls,xrs)
        for j in (ylo,yro):
            jj=step_func(j,yls,yrs)
            for k in (zlo,zro):
                kk=step_func(k,zls,zrs)
                ob.append([ii,jj,kk])
    
    # count how many 8 or -8 and where are they
    sob = np.sum(ob,0)
    count, index, sign = conut_eights_or_minus_eights(sob)
    
    # the two regions overlap
    if count == 0:
        return 0.
    # one direction exceeds
    elif count == 1:
         return sign_distance(sign[0], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[0])
    # two directions exceed
    elif count == 2:
        r1 = sign_distance(sign[0], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[0])
        r2 = sign_distance(sign[1], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[1])
        return math.sqrt(r1**2+r2**2)
    elif count ==3:
        r1 = sign_distance(sign[0], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[0])
        r2 = sign_distance(sign[1], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[1])
        r3 = sign_distance(sign[2], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[2])
        return math.sqrt(r1**2+r2**2+r3**2)   
  • step_func()

The “sob” divides the configuration of the observation region (x_r, x_l; y_r, y_l; z_r, z_l) into eight groups and compares them with the source region. If the left boundary of the observation region is less than the source region, it assigns -1. If it is between them, it assigns 0. If it is greater, it assigns 1. It then sums up all the results.

ob = []
for i in (xlo,xro):
    ii=step_func(i,xls,xrs)
    for j in (ylo,yro):
        jj=step_func(j,yls,yrs)
        for k in (zlo,zro):
            kk=step_func(k,zls,zrs)
            ob.append([ii,jj,kk])

def step_func(x, l, r):
    '''compare x with l and r. return -1 or 0 or 1.'''
    if x<l:
        return -1
    elif x>r:
        return 1
    else:
        return 0
  • conut_eights_or_minus_eights()
    • Perform an evaluation on the function ‘sob’.
      • The variable “count” indicates whether there is an overlap between the source area and the observation area.:

        count = 1 indicates that there is no overlap in one direction;
        count = 2 indicates that there is no overlap in two directions;
        count = 3 indicates that there is no overlap between the source area and the observation area.
        
      • The variable “indx” indicates which axis the source area and the observation area overlap:

        indx = 0 indicates that there is no overlap in the x-axis;
        indx = 1 indicates that there is no overlap in the y-axis;
        indx = 2 indicates that there is no overlap in the z-axis.
        
      • The variable “sign” indicates the direction of overlap between the source area and the observation area:

        sign = -1 indicates that the observation area is to the left of the source area;
        sign = 1 indicates that the observation area is to the right of the source area.
        
def conut_eights_or_minus_eights(x):
    count = 0
    index = []
    sign = []
    for i in range(3):
        if abs(x[i]) == 8:
            count+=1 
            index.append(i)
            sign.append(x[i]/8)
    return count,index,sign

Jefimenko_solver()

The function primarily calculates the electric field and magnetic field using the Jefimenko’s equations. At the same time, the program has incorporated the functionality of variable dt during the updating process, which means that it can automatically adjust the appropriate dt based on the simulation process during runtime. This feature provides the program with a wider range of application environments.

The Params:

​ rho, Jx, Jy, Jz: physical quantities of shape [total_grid_s]

​ current_dt: the time step used in updation

​ quasi_neutral: if quasi_neutral == 1, the system only considers the contribution of electric current

​ back_to_previous: ​ whether to use the previous distribution for updation

​ time_stride_back: ​ If distribution f(i) at time i is ilegal change f(i) to f(i-time_stride_back)

 def Jefimenko_solver(self, rho, Jx, Jy, Jz, current_dt, quasi_neutral, back_to_previous, time_stride_back):
        self.dt_list.append(current_dt)
        if back_to_previous == 0:
            
            # i_time: int, index at which to copy the data
            self.i_time = self.time_tick%self.len_time_snapshots # time_index_in_GPU(self.time_tick, self.len_time_snapshots)

            # copy newly calculated rho and J to GPU
            self.rho_GPU[self.i_time], self.Jx_GPU[self.i_time], self.Jy_GPU[self.i_time], self.Jz_GPU[self.i_time] = rho, Jx, Jy, Jz

            self.dt_series[self.i_time] = current_dt
            self.t = sum(self.dt_list)
            
        else:
            # i_time: int, index at which to copy the data
            self.i_time = (self.time_tick- time_stride_back)%self.len_time_snapshots  # time_index_in_GPU(self.time_tick, self.len_time_snapshots)
            self.time_tick = self.time_tick - time_stride_back
            # copy newly calculated rho and J to GPU
            try:
                self.rho_GPU[self.i_time+1 :self.i_time + time_stride_back], self.Jx_GPU[self.i_time+1 :self.i_time + time_stride_back], \
                self.Jy_GPU[self.i_time+1 :self.i_time + time_stride_back], self.Jz_GPU[self.i_time+1 :self.i_time + time_stride_back] = 0, 0, 0, 0
            except: # if self.i_time + time_stride_back > time_snap_shots '''Jun-Jie Zhang 2023/07/20'''
                self.rho_GPU[self.i_time+1 :], self.Jx_GPU[self.i_time+1 :], \
                self.Jy_GPU[self.i_time+1 :], self.Jz_GPU[self.i_time+1 :] = 0, 0, 0, 0
                self.rho_GPU[:time_stride_back-(self.len_time_snapshots-(self.i_time+1))], self.Jx_GPU[:time_stride_back-(self.len_time_snapshots-(self.i_time+1))], \
                self.Jy_GPU[:time_stride_back-(self.len_time_snapshots-(self.i_time+1))], self.Jz_GPU[:time_stride_back-(self.len_time_snapshots-(self.i_time+1))] = 0, 0, 0, 0

            self.dt_series[self.i_time:] = 0
            self.dt_series[self.i_time] = current_dt
            self.dt_list[-time_stride_back:] = [0] * time_stride_back 
            self.t = sum(self.dt_list)

        # save the sequences of dt from the current time step till the previous len_time_snapshots-th and the index
        accumu_i_time = self.i_time
        accumu_time_steps = current_dt
        for i_reorder_time in range(self.len_time_snapshots):
            self.accumu_dt_series[i_reorder_time] = accumu_time_steps
            self.accumu_dt_index[i_reorder_time] = accumu_i_time

            accumu_i_time -= 1
            if accumu_i_time<0:
                accumu_i_time += self.len_time_snapshots
            accumu_time_steps += self.dt_series[accumu_i_time]

        # define device array of E and B with zero initial values
        GEx, GEy, GEz, GBx, GBy, GBz = (cupy.zeros(self.total_grid_o, dtype=np.float64) for _ in range(6))        
        if quasi_neutral == 1:
            # calculate E and B
            Jefimenko_kernel[self.blockspergrid_o, self.threadsperblock](self.rho_GPU, self.Jx_GPU, self.Jy_GPU, self.Jz_GPU, \
                                                                         self.len_time_snapshots, current_dt, self.t, \
                                                                         self.accumu_dt_index, self.accumu_dt_series, \
                                                                         self.total_grid_o, self.total_grid_s, \
                                                                         self.x_grid_size_o, self.y_grid_size_o, self.z_grid_size_o, \
                                                                         self.x_grid_size_s, self.y_grid_size_s, self.z_grid_size_s, \
                                                                         self.dx_o, self.dy_o, self.dz_o, \
                                                                         self.dx_s, self.dy_s, self.dz_s, \
                                                                         self.x_left_boundary_o, self.y_left_boundary_o, self.z_left_boundary_o, \
                                                                         self.x_left_boundary_s, self.y_left_boundary_s, self.z_left_boundary_s, \
                                                                         GEx, GEy, GEz, GBx, GBy, GBz, self.coeff_E, self.coeff_B, self.c)


        self.GEx, self.GEy, self.GEz, self.GBx, self.GBy, self.GBz = GEx*self.coeff_E, GEy*self.coeff_E, GEz*self.coeff_E, GBx*self.coeff_B, GBy*self.coeff_B, GBz*self.coeff_B

        # very small values are forced to be zero
        self.GEx, self.GEy, self.GEz, self.GBx, self.GBy, self.GBz = \
        cupy.where(cupy.abs(self.GEx)<10**(-18),0.,self.GEx),\
        cupy.where(cupy.abs(self.GEy)<10**(-18),0.,self.GEy),\
        cupy.where(cupy.abs(self.GEz)<10**(-18),0.,self.GEz),\
        cupy.where(cupy.abs(self.GBx)<10**(-18),0.,self.GBx),\
        cupy.where(cupy.abs(self.GBy)<10**(-18),0.,self.GBy),\
        cupy.where(cupy.abs(self.GBz)<10**(-18),0.,self.GBz)
        
        # add one executed time step on time_tick
        self.time_tick += 1        
        # save the passed time in accumulate time_seg
  • Jefimenko_kernel() - performs parallel computation;

​ Firstly, it associates the block index in parallel computation with the corresponding position;

i_o = cuda.grid(1)

# loop through each spatial grid in the observation region
if i_o < total_grid_o:
    # convert 1D grid index into 3D
    iz_o = i_o%z_grid_size_o
    iz_rest_o = i_o//z_grid_size_o
    iy_o = iz_rest_o%y_grid_size_o
    iy_rest_o = iz_rest_o//y_grid_size_o
    ix_o = iy_rest_o%x_grid_size_o
    
    # express the position corresponding to the index
    x_o, y_o, z_o = real_position(ix_o, dx_o, x_left_boundary_o), \
                    real_position(iy_o, dy_o, y_left_boundary_o), \
                    real_position(iz_o, dz_o, z_left_boundary_o)
  
    # loop through all spatial grid in the source region for integration
    for i_s in range(total_grid_s):
        # 1D to 3D
        iz_s = i_s%z_grid_size_s
        iz_rest_s = i_s//z_grid_size_s
        iy_s = iz_rest_s%y_grid_size_s
        iy_rest_s = iz_rest_s//y_grid_size_s
        ix_s = iy_rest_s%x_grid_size_s
        # positions
        x_s, y_s, z_s = real_position(ix_s, dx_s, x_left_boundary_s), \
                        real_position(iy_s, dy_s, y_left_boundary_s), \
                        real_position(iz_s, dz_s, z_left_boundary_s)

        # dr = r - r' = r_o - r_s
        dx = x_o - x_s
        dy = y_o - y_s
        dz = z_o - z_s
        dr = r_distance(x_o, y_o, z_o, x_s, y_s, z_s)

Next, it calculates the time delay to prevent the time iteration process from exceeding the maximum time step;

tr

if dr > 10**-10:                            
    # find the retarded time for positions r_o
    # rtime_i gives the index of time in time_snapshots, it can be negtive
    rtime_i = retarded_time_index(t, dr, dt, i_round, len_time_snapshots)

    # rtime_i_minus: index for retarded_time - dt
    if rtime_i < 0:
        # no updation since signal has not arrived yet
        pass
    else:
        if rtime_i == 0:
            rtime_i_minus = len_time_snapshots-1
        elif rtime_i > 0:
            rtime_i_minus = rtime_i - 1

@cuda.jit(device=True)
def retarded_time_index(t, dr, dt, i_round, len_time_snapshots):
    tr = t - dr
    
    # if the signal has not been transmitted
    if tr < 0.:
        return -1
    
    # the current time tr corresponds to the ti-th row
    ti = int(math.floor(tr/dt)%len_time_snapshots)

    # if ti corresponds to the last row
    if ti == len_time_snapshots - 1:
        return 0
    else:
        return ti + 1

Calculate the time derivatives of current density and charge density:

pandj

rho_s, rho_minus_s = rho_GPU_s[rtime_i, i_s], rho_GPU_s[rtime_i_minus, i_s]
Jx_s, Jx_minus_s = Jx_GPU_s[rtime_i, i_s], Jx_GPU_s[rtime_i_minus, i_s]
Jy_s, Jy_minus_s = Jy_GPU_s[rtime_i, i_s], Jy_GPU_s[rtime_i_minus, i_s]
Jz_s, Jz_minus_s = Jz_GPU_s[rtime_i, i_s], Jz_GPU_s[rtime_i_minus, i_s]
prho_pt_s = (rho_s - rho_minus_s)/dt
pJx_pt_s = (Jx_s - Jx_minus_s)/dt
pJy_pt_s = (Jy_s - Jy_minus_s)/dt
pJz_pt_s = (Jz_s - Jz_minus_s)/dt

The current and charge densities are divided into three dimensions and summed using CUDA atomic operations;

cuda_atomic

cuda.atomic.add(GEx, i_o, (x_o-x_s)/dr**3*rho_s+(x_o-x_s)/dr**2*prho_pt_s-1/dr*pJx_pt_s)
cuda.atomic.add(GEy, i_o, (y_o-y_s)/dr**3*rho_s+(y_o-y_s)/dr**2*prho_pt_s-1/dr*pJy_pt_s)
cuda.atomic.add(GEz, i_o, (z_o-z_s)/dr**3*rho_s+(z_o-z_s)/dr**2*prho_pt_s-1/dr*pJz_pt_s)
cuda.atomic.add(GBx, i_o, -(((y_o-y_s)*Jz_s-(z_o-z_s)*Jy_s)/dr**3+((y_o-y_s)*pJz_pt_s-(z_o-z_s)*pJy_pt_s)/dr**2))
cuda.atomic.add(GBy, i_o, -(((z_o-z_s)*Jx_s-(x_o-x_s)*Jz_s)/dr**3+((z_o-z_s)*pJx_pt_s-(x_o-x_s)*pJz_pt_s)/dr**2))
cuda.atomic.add(GBz, i_o, -(((x_o-x_s)*Jy_s-(y_o-y_s)*Jx_s)/dr**3+((x_o-x_s)*pJy_pt_s-(y_o-y_s)*pJx_pt_s)/dr**2)) 

results matching ""

    No results matching ""