// If --emit-relocs is given, we'll copy relocation sections from input
// files to an output file.
if (ctx.arg.emit_relocs)
  create_reloc_sections(ctx);
for (i64 i = 0, end = ctx.chunks.size(); i < end; i++)
  if (OutputSection<E> *osec = ctx.chunks[i]->to_osec())
    if (RelocSection<E> *x = osec->reloc_sec.get())
      ctx.chunks.push_back(x);
}
SHT_RELA: The section holds relocation entries with explicit addends, such as type Elf32_Rela for the 32-bit class of object files. An object file may have multiple relocation sections. See "Relocation" below for details.
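For reference, a relocation entry with an explicit addend looks like this on a 64-bit target (a sketch following the ELF spec; the standard <elf.h> header provides the same layout as Elf64_Rela):

// One RELA entry (64-bit ELF). r_info packs the symbol table index in
// its high 32 bits and the relocation type in its low 32 bits, which is
// what the standard ELF64_R_SYM/ELF64_R_TYPE macros extract.
struct Elf64_Rela {
  uint64_t r_offset; // where to apply the relocation
  uint64_t r_info;   // symbol index + relocation type
  int64_t  r_addend; // constant addend used in the computation
};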
In addition, the RelocSection's shdr has the SHF_INFO_LINK flag set, which means the following:
The sh_info field of this section header holds a section header table index.
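Concretely, for a relocation section such as .rela.text, sh_link holds the index of the associated symbol table and sh_info holds the index of the section the relocations apply to. A minimal sketch of the wiring (the helper and member names here are illustrative, not necessarily mold's exact ones):

// Illustrative sketch: a .rela.<sec> header refers to two other
// sections by index, so it can only be filled in once every output
// section has been assigned its shndx.
template <typename E>
void wire_reloc_shdr(Context<E> &ctx, RelocSection<E> &rsec,
                     OutputSection<E> &target) {
  rsec.shdr.sh_link = ctx.symtab->shndx; // the associated .symtab
  rsec.shdr.sh_info = target.shndx;      // the section the relocs apply to
}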
The sh_info field itself is set later, in compute_section_headers:
compute_section_headers
// Compute the section header values for all sections.
compute_section_headers(ctx);
// Set section indices.
i64 shndx = 1;
for (i64 i = 0; i < ctx.chunks.size(); i++)
  if (ctx.chunks[i]->kind() != HEADER)
    ctx.chunks[i]->shndx = shndx++;
if (ctx.shdr)
  ctx.shdr->shdr.sh_size = shndx * sizeof(ElfShdr<E>);
// Some types of section header refer to other sections by index.
// Recompute the section header to fill such fields with correct values.
for (Chunk<E> *chunk : ctx.chunks)
  chunk->update_shdr(ctx);
e_shnum: This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero.

If the number of entries in the section header table is larger than or equal to SHN_LORESERVE (0xff00), e_shnum holds the value zero and the real number of entries in the section header table is held in the sh_size member of the initial entry in the section header table. Otherwise, the sh_size member of the initial entry in the section header table holds the value zero.
e_shstrndx This member holds the section header table index of the entry associated with the section name string table. If the file has no section name string table, this member holds the value SHN_UNDEF.
If the index of the section name string table section is larger than or equal to SHN_LORESERVE (0xff00), this member holds SHN_XINDEX (0xffff) and the real index of the section name string table section is held in the sh_link member of the initial entry in the section header table. Otherwise, the sh_link member of the initial entry in the section header table contains the value zero.
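Putting the two escape rules together, a reader recovers the real values from section header 0. A minimal sketch using the <elf.h> types (assuming ehdr and shdrs already point into a mapped file):

#include <elf.h>
#include <cstdint>

// Real section count: escaped into shdr[0].sh_size when it doesn't fit
// into the 16-bit e_shnum field.
uint64_t get_shnum(const Elf64_Ehdr &ehdr, const Elf64_Shdr *shdrs) {
  if (ehdr.e_shnum == 0 && ehdr.e_shoff != 0)
    return shdrs[0].sh_size;
  return ehdr.e_shnum;
}

// Real .shstrtab index: escaped into shdr[0].sh_link when it is
// >= SHN_LORESERVE, in which case e_shstrndx holds SHN_XINDEX.
uint32_t get_shstrndx(const Elf64_Ehdr &ehdr, const Elf64_Shdr *shdrs) {
  if (ehdr.e_shstrndx == SHN_XINDEX)
    return shdrs[0].sh_link;
  return ehdr.e_shstrndx;
}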
Here a SymtabShndxSection is created, i.e. the ".symtab_shndx" section. It holds the special symbol table section index array, whose entries are the indexes of the section headers associated with the symbol table's entries.
.symtab_shndx
This section holds the special symbol table section index array, as described above. The section’s attributes will include the SHF_ALLOC bit if the associated symbol table section does; otherwise that bit will be off.
This section's sh_type is SHT_SYMTAB_SHNDX:
SHT_SYMTAB_SHNDX
This section is associated with a section of type SHT_SYMTAB and is required if any of the section header indexes referenced by that symbol table contain the escape value SHN_XINDEX. The section is an array of Elf32_Word values. Each value corresponds one to one with a symbol table entry and appears in the same order as those entries. The values represent the section header indexes against which the symbol table entries are defined. Only if the corresponding symbol table entry's st_shndx field contains the escape value SHN_XINDEX will the matching Elf32_Word hold the actual section header index; otherwise, the entry must be SHN_UNDEF (0).
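So when a consumer looks up a symbol's defining section, it has to consult the parallel array. A minimal sketch (shndx_table is the SHT_SYMTAB_SHNDX contents, or null if the section is absent):

// Resolve the section header index of symbol `i`, honoring SHN_XINDEX.
uint32_t get_sym_shndx(const Elf64_Sym *symtab, const Elf32_Word *shndx_table,
                       size_t i) {
  if (symtab[i].st_shndx == SHN_XINDEX)
    return shndx_table[i]; // one-to-one parallel array holds the real index
  return symtab[i].st_shndx;
}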
// This function assigns virtual addresses to output sections. Assigning
// addresses is a bit tricky because we want to pack sections as tightly
// as possible while not violating the constraints imposed by the hardware
// and the OS kernel. Specifically, we need to satisfy the following
// constraints:
//
// - Memory protection (readable, writable and executable) works at page
//   granularity. Therefore, if we want to set different memory attributes
//   to two sections, we need to place them into separate pages.
//
// - The ELF spec requires that a section's file offset is congruent to
//   its virtual address modulo the page size. For example, a section at
//   virtual address 0x401234 on x86-64 (4 KiB, or 0x1000 byte page
//   system) can be at file offset 0x3234 or 0x50234 but not at 0x1000.
//
// We need to insert paddings between sections if we can't satisfy the
// above constraints without them.
//
// We don't want to waste too much memory and disk space for paddings.
// There are a few tricks we can use to minimize paddings as below:
//
// - We want to place sections with the same memory attributes as
//   contiguous as possible.
//
// - We can map the same file region to memory more than once. For
//   example, we can write code (with R and X bits) and read-only data
//   (with R bit) adjacent on file and map it twice as the last page of
//   the executable segment and the first page of the read-only data
//   segment. This doesn't save memory but saves disk space.
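The congruence constraint can be checked with simple modular arithmetic: a section can be mapped straight from the file only if vaddr ≡ offset (mod page size). A small illustrative sketch of how an offset satisfying the rule is chosen (not mold's exact code):

// Smallest file offset >= fileoff that is congruent to vaddr modulo the
// page size, so the kernel can mmap the section without copying.
uint64_t align_file_offset(uint64_t fileoff, uint64_t vaddr, uint64_t page_size) {
  uint64_t want = vaddr % page_size;   // required residue
  uint64_t have = fileoff % page_size; // current residue
  if (want < have)
    want += page_size;
  return fileoff + (want - have);
}
// With 0x1000-byte pages, vaddr 0x401234 requires offset % 0x1000 == 0x234,
// so 0x3234 and 0x50234 are valid placements but 0x1000 is not.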
// TLS chunks alignments are special: in addition to having their virtual
// addresses aligned, they also have to be aligned when the region of
// tls_begin is copied to a new thread's storage area. In other words, their
// offset against tls_begin also has to be aligned.
//
// A good way to achieve this is to take the largest alignment requirement
// of all TLS sections and make tls_begin also aligned to that.
Chunk<E> *first_tls_chunk = nullptr;
u64 tls_alignment = 1;
for (Chunk<E> *chunk : chunks) {
  if (chunk->shdr.sh_flags & SHF_TLS) {
    if (!first_tls_chunk)
      first_tls_chunk = chunk;
    tls_alignment = std::max(tls_alignment, (u64)chunk->shdr.sh_addralign);
  }
}
for (i64 i = 0; i < chunks.size(); i++) {
  if (!(chunks[i]->shdr.sh_flags & SHF_ALLOC))
    continue;
  // .relro_padding is a padding section to extend a PT_GNU_RELRO
  // segment to cover an entire page. Technically, we don't need a
  // .relro_padding section because we can leave a trailing part of a
  // segment as unused space. However, the `strip` command would delete
  // such an unused trailing part and make an executable invalid.
  // So we add a dummy section.
  if (chunks[i] == ctx.relro_padding) {
    chunks[i]->shdr.sh_addr = addr;
    chunks[i]->shdr.sh_size = align_to(addr, ctx.page_size) - addr;
    addr += ctx.page_size;
    continue;
  }
  // Handle --section-start first
  if (auto it = ctx.arg.section_start.find(chunks[i]->name);
      it != ctx.arg.section_start.end()) {
    addr = it->second;
    chunks[i]->shdr.sh_addr = addr;
    addr += chunks[i]->shdr.sh_size;
    continue;
  }
  // Memory protection works at page size granularity. We need to
  // put sections with different memory attributes into different
  // pages. We do it by inserting paddings here.
  if (i > 0 && chunks[i - 1] != ctx.relro_padding) {
    i64 flags1 = get_flags(chunks[i - 1]);
    i64 flags2 = get_flags(chunks[i]);
    if (flags1 != flags2) {
      switch (ctx.arg.z_separate_code) {
      case SEPARATE_LOADABLE_SEGMENTS:
        addr = align_to(addr, ctx.page_size);
        break;
      case SEPARATE_CODE:
        if ((flags1 & PF_X) != (flags2 & PF_X)) {
          addr = align_to(addr, ctx.page_size);
          break;
        }
        [[fallthrough]];
      case NOSEPARATE_CODE:
        if (addr % ctx.page_size != 0)
          addr += ctx.page_size;
        break;
      default:
        unreachable();
      }
    }
  }
  // TLS BSS sections are laid out so that they overlap with the
  // subsequent non-tbss sections. Overlapping is fine because a PT_TLS
  // segment contains an initialization image for newly-created threads,
  // and no one except the runtime reads its contents. Even the runtime
  // doesn't need a BSS part of a TLS initialization image; it just
  // leaves zero-initialized bytes as-is instead of copying zeros.
  // So no one really reads tbss at runtime.
  //
  // We can instead allocate a dedicated virtual address space to tbss,
  // but that would be just a waste of the address and disk space.
  if (is_tbss(chunks[i])) {
    u64 addr2 = addr;
    for (;;) {
      addr2 = align_to(addr2, alignment(chunks[i]));
      chunks[i]->shdr.sh_addr = addr2;
      addr2 += chunks[i]->shdr.sh_size;
      if (i + 2 == chunks.size() || !is_tbss(chunks[i + 1]))
        break;
      i++;
    }
    continue;
  }
template <typename E>
i64 to_phdr_flags(Context<E> &ctx, Chunk<E> *chunk) {
  // All sections are put into a single RWX segment if --omagic
  if (ctx.arg.omagic)
    return PF_R | PF_W | PF_X;

  bool write = (chunk->shdr.sh_flags & SHF_WRITE);
  bool exec = (chunk->shdr.sh_flags & SHF_EXECINSTR);
  // .text is not readable if --execute-only
  if (exec && ctx.arg.execute_only) {
    if (write)
      Error(ctx) << "--execute-only is not compatible with writable section: "
                 << chunk->name;
    return PF_X;
  }
  // .rodata is merged with .text if --no-rosegment
  if (!write && !ctx.arg.rosegment)
    exec = true;

  return PF_R | (write ? PF_W : 0) | (exec ? PF_X : 0);
}
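Under the default options this yields, per section kind (illustrative summary):

// .text   (SHF_ALLOC | SHF_EXECINSTR) -> PF_R | PF_X
// .rodata (SHF_ALLOC)                 -> PF_R        (PF_R | PF_X with --no-rosegment)
// .data   (SHF_ALLOC | SHF_WRITE)     -> PF_R | PF_W
// .bss    (SHF_ALLOC | SHF_WRITE)     -> PF_R | PF_W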
sh_offset: The byte offset from the beginning of the file to the first byte in the section. Section type SHT_NOBITS occupies no space in the file. Its sh_offset member locates the conceptual placement in the file.
sh_size: The section’s size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS can have a nonzero size, but it occupies no space in the file.
SHT_NOBITS: Identifies a section that occupies no space in the file but otherwise resembles SHT_PROGBITS. Although this section contains no bytes, the sh_offset member contains the conceptual file offset.
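This is why file-offset assignment can skip NOBITS sections entirely: they advance the virtual address but consume no bytes on disk. A reduced sketch of the idea, using the hypothetical align_file_offset helper from the earlier sketch (mold's actual implementation follows below):

// Only non-NOBITS ALLOC sections advance the file offset; .bss/.tbss get
// a conceptual sh_offset but occupy no file bytes.
for (Chunk<E> *chunk : chunks) {
  if (chunk->shdr.sh_type == SHT_NOBITS) {
    chunk->shdr.sh_offset = fileoff; // conceptual placement only
  } else {
    fileoff = align_file_offset(fileoff, chunk->shdr.sh_addr, ctx.page_size);
    chunk->shdr.sh_offset = fileoff;
    fileoff += chunk->shdr.sh_size;
  }
}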
while (i < c.size() && !(c[i]->shdr.sh_flags & SHF_ALLOC))
  i++;
auto assign_addr = [&] {
  if (i != 0) {
    i64 flags1 = to_phdr_flags(ctx, c[i - 1]);
    i64 flags2 = to_phdr_flags(ctx, c[i]);
    // Memory protection works at page size granularity. We need to
    // put sections with different memory attributes into different
    // pages. We do it by inserting paddings here.
    if (flags1 != flags2) {
      switch (ctx.arg.z_separate_code) {
      case SEPARATE_LOADABLE_SEGMENTS:
        addr = align_to(addr, ctx.page_size);
        break;
      case SEPARATE_CODE:
        if ((flags1 & PF_X) != (flags2 & PF_X))
          addr = align_to(addr, ctx.page_size);
        break;
      default:
        break;
      }
    }
  }
// Assign ALLOC sections contiguous file offsets as long as they
// are contiguous in memory.
for (;;) {
  chunks[i]->shdr.sh_offset =
    fileoff + chunks[i]->shdr.sh_addr - first.shdr.sh_addr;
  i++;
  if (i >= chunks.size() ||
      !(chunks[i]->shdr.sh_flags & SHF_ALLOC) ||
      chunks[i]->shdr.sh_type == SHT_NOBITS)
    break;
  // If --section-start is given, addresses may not increase
  // monotonically.
  if (chunks[i]->shdr.sh_addr < first.shdr.sh_addr)
    break;
  // If --section-start is given, there may be a large gap between
  // sections. We don't want to allocate disk space for a gap if one
  // exists.
  i64 gap_size = chunks[i]->shdr.sh_addr - chunks[i - 1]->shdr.sh_addr -
                 chunks[i - 1]->shdr.sh_size;
  if (gap_size >= ctx.page_size)
    break;
}
while (i < chunks.size() &&
       (chunks[i]->shdr.sh_flags & SHF_ALLOC) &&
       chunks[i]->shdr.sh_type == SHT_NOBITS)
  i++;
}
return fileoff;
}
riscv_resize_sections
// On RISC-V, branches are encoded using multiple instructions so
// that they can jump to anywhere in ±2 GiB by default. They may
// be replaced with shorter instruction sequences if destinations
// are close enough. Do this optimization.
if constexpr (is_riscv<E>)
  filesize = riscv_resize_sections(ctx);
// Shrink sections by interpreting relocations.
//
// This operation seems to be optional, because by default the longest
// instructions are used. However, calling this function is actually
// mandatory because of R_RISCV_ALIGN. R_RISCV_ALIGN is a directive to the
// linker to align the location referred to by the relocation to a
// specified byte boundary. We at least have to interpret them to satisfy
// the alignment constraints.
template <typename E>
i64 riscv_resize_sections(Context<E> &ctx) {
  Timer t(ctx, "riscv_resize_sections");
  // True if we can use the 2-byte instructions. This is usually true on
  // Unix because RV64GC is generally considered the baseline hardware.
  bool use_rvc = get_eflags(ctx) & EF_RISCV_RVC;
  // Find all the relocations that can be relaxed.
  // This step should only shrink sections.
  tbb::parallel_for_each(ctx.objs, [&](ObjectFile<E> *file) {
    for (std::unique_ptr<InputSection<E>> &isec : file->sections)
      if (is_resizable(ctx, isec.get()))
        shrink_section(ctx, *isec, use_rvc);
  });
  // Fix symbol values.
  tbb::parallel_for_each(ctx.objs, [&](ObjectFile<E> *file) {
    for (Symbol<E> *sym : file->symbols) {
      if (sym->file != file)
        continue;
      InputSection<E> *isec = sym->get_input_section();
      if (!isec || isec->extra.r_deltas.empty())
        continue;
For reference, get_eflags merges the input files' e_flags; the RVC bit survives if any input object file was compiled with the C extension:

u32 ret = objs[0]->get_ehdr().e_flags;
for (i64 i = 1; i < objs.size(); i++)
  if (objs[i]->get_ehdr().e_flags & EF_RISCV_RVC)
    ret |= EF_RISCV_RVC;
return ret;
}
for (i64 i = 0; i < rels.size(); i++) {
  const ElfRel<E> &r = rels[i];
  Symbol<E> &sym = *isec.file.symbols[r.r_sym];
  isec.extra.r_deltas[i] = delta;
  // Handling R_RISCV_ALIGN is mandatory.
  //
  // R_RISCV_ALIGN refers to NOP instructions. We need to eliminate some
  // or all of the instructions so that the instruction that immediately
  // follows the NOPs is aligned to a specified alignment boundary.
  if (r.r_type == R_RISCV_ALIGN) {
    // The total bytes of NOPs is stored in r_addend, so the next
    // instruction is r_addend away.
    u64 loc = isec.get_addr() + r.r_offset - delta;
    u64 next_loc = loc + r.r_addend;
    u64 alignment = bit_ceil(r.r_addend + 1);
    assert(alignment <= (1 << isec.p2align));
    delta += next_loc - align_to(loc, alignment);
    continue;
  }
  // Handling other relocations is optional.
  if (!ctx.arg.relax || i == rels.size() - 1 ||
      rels[i + 1].r_type != R_RISCV_RELAX)
    continue;
  // Linker-synthesized symbols haven't been assigned their final
  // values when we are shrinking sections because actual values can
  // be computed only after we fix the file layout. Therefore, we
  // assume that relocations against such symbols are always
  // non-relaxable.
  if (sym.file == ctx.internal_obj)
    continue;
  switch (r.r_type) {
  case R_RISCV_CALL:
  case R_RISCV_CALL_PLT: {
    // These relocations refer to an AUIPC + JALR instruction pair that
    // allows jumping anywhere in PC ± 2 GiB. If the jump target is
    // close enough to PC, we can use C.J, C.JAL or JAL instead.
    i64 dist = compute_distance(ctx, sym, isec, r);
    if (dist & 1)
      break;
    if (rd == 0 && sign_extend(dist, 11) == dist && use_rvc) {
      // If rd is x0 and the jump target is within ±2 KiB, we can use
      // C.J, saving 6 bytes.
      delta += 6;
    } else if (rd == 1 && sign_extend(dist, 11) == dist && use_rvc &&
               !E::is_64) {
      // If rd is x1 and the jump target is within ±2 KiB, we can use
      // C.JAL. This is RV32 only because C.JAL is an RV32-only instruction.
      delta += 6;
    } else if (sign_extend(dist, 20) == dist) {
      // If the jump target is within ±1 MiB, we can use JAL.
      delta += 4;
    }
    break;
  }
  case R_RISCV_HI20: {
    // If the upper 20 bits are all zero, we can remove LUI.
    // The corresponding instructions referred by LO12_I/LO12_S
    // relocations will use the zero register instead.
    i64 val = sym.get_addr(ctx);
    if (sign_extend(val, 11) == val)
      delta += 4;
    break;
  }
  case R_RISCV_TPREL_HI20:
  case R_RISCV_TPREL_ADD: {
    // These relocations are used to materialize the upper 20 bits of
    // an address relative to the thread pointer as follows:
    //
    //   lui  a5,%tprel_hi(foo)         # R_RISCV_TPREL_HI20 (symbol)
    //   add  a5,a5,tp,%tprel_add(foo)  # R_RISCV_TPREL_ADD (symbol)
    //
    // Then thread-local variable `foo` is accessed with a 12-bit offset
    // like this:
    //
    //   sw   t0,%tprel_lo(foo)(a5)     # R_RISCV_TPREL_LO12_S (symbol)
    //
    // However, if the offset is within ±2 KiB, we don't need to
    // materialize the upper 20 bits in a register. We can instead access
    // the thread-local variable directly with TP like this:
    //
    //   sw   t0,%tprel_lo(foo)(tp)
    //
    // Here, we remove `lui` and `add` if the offset is within ±2 KiB.
    i64 val = sym.get_addr(ctx) + r.r_addend - ctx.tp_addr;
    if (sign_extend(val, 11) == val)
      delta += 4;
    break;
  }
  }
}
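(The rd in the R_RISCV_CALL case is the destination register of the JALR, extracted from the instruction bytes earlier in the function; that part is omitted from the excerpt.) To make the R_RISCV_ALIGN arithmetic concrete, a worked example with made-up numbers:

// An R_RISCV_ALIGN with r_addend = 6 means 6 bytes of NOPs, and
// bit_ceil(6 + 1) = 8 means the following instruction must be 8-byte
// aligned. If prior shrinkage left the NOPs at address 0x1004:
u64 loc = 0x1004;       // first NOP byte
u64 next_loc = loc + 6; // 0x100a: where the next insn currently sits
u64 alignment = 8;      // bit_ceil(6 + 1)
// align_to(0x1004, 8) == 0x1008, so the next insn must land at 0x1008:
// delta += 0x100a - 0x1008, i.e. 2 of the 6 NOP bytes are removed.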