Stub & lazy bind
When I first learned the Mach-O format, I knew only a little about stubs. Now I have enough free time to study them, so I am recording what I found. I try to explain in detail how stubs work, and I hope it is helpful.
Tools
- Xcode
- MachOView
- Hopper Disassembler
- dyld-551.3
Workflow
prepare
- Use Xcode to create a Command Line Tool project for macOS; here I named it stubDebug for demonstration. Then replace the default NSLog(@"Hello world"); with printf("Hello, World!\n"); like below:
#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        printf("Hello, World!\n");
    }
    return 0;
}
- Compile the project to generate the stubDebug executable (Mach-O format), then drag it into both MachOView and Hopper Disassembler.
analysis
In Hopper Disassembler, let's start at the _main label. There we find the instruction below:
0000000100000f48 call imp___stubs__printf
This is the point where our source line printf("Hello, World!\n"); executes.
Click the imp___stubs__printf label to jump to:
0000000100000f6e jmp qword [_printf_ptr]
Then click the _printf_ptr label to jump to:
0000000100001020 dq _printf
The _printf shown here is just a hint, not an actual address. So we look up address 0x100001020 in MachOView; it is located in __DATA,__la_symbol_ptr:
100001020 0000000100000F98 Indirect Pointer [0x100001020 -> _printf]
Then go to address 0x100000F98, which is located in __TEXT,__stub_helper. For convenience, we continue the analysis in Hopper Disassembler.
; Section __stub_helper
When we arrive at address 0x100000f98, we push 0x3f, push 0x100001008, and then jump to dyld_stub_binder.
What do 0x3f and 0x100001008 mean? We will explain them later; for now, just treat them as two numbers.
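The chain traced so far (stub, lazy pointer, stub helper, binder) can be modeled in plain C with a function pointer that starts out pointing at a helper. This is only a toy model under my own assumptions: greet, greet_ptr, stub_helper, real_greet and bind_count are hypothetical names, and nothing here is dyld code.

```c
#include <stdio.h>

/* Toy model of one lazy symbol pointer. Every name is hypothetical. */
typedef int (*greet_fn)(const char *);

int bind_count = 0;                        /* how many times the slow path ran */

/* Stands in for the real target (_printf in the article). */
static int real_greet(const char *name) {
    return printf("Hello, %s!\n", name);
}

static int stub_helper(const char *name);  /* forward declaration */

/* Plays the role of the __la_symbol_ptr entry: initially points at the helper. */
static greet_fn greet_ptr = stub_helper;

/* Plays the role of __stub_helper plus dyld_stub_binder: resolve, patch, jump. */
static int stub_helper(const char *name) {
    bind_count++;
    greet_ptr = real_greet;   /* overwrite the placeholder with the real address */
    return greet_ptr(name);   /* finish the call that triggered the binding */
}

/* Plays the role of the __stubs entry: always an indirect jump through the pointer. */
int greet(const char *name) {
    return greet_ptr(name);
}
```

Calling greet twice runs stub_helper only on the first call; the second call goes straight through the patched pointer.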
Then we search for dyld_stub_binder in the dyld-551.3 source code. It is written in assembly. We look at the __x86_64__ architecture.
Bad luck, it is too long! Don't lose heart. We will only analyze the instructions below:
movq MH_PARAM_RBP(%rbp),%rdi # call fastBindLazySymbol(loadercache, lazyinfo)
Then we search for fastBindLazySymbol, which is C++ code. The values 0x3f and 0x100001008 are actually its two parameters (0x3f -> lazyBindingInfoOffset, 0x100001008 -> imageLoaderCache).
Now imageLoaderCache is 0x100001008, and *imageLoaderCache is the data stored at address 0x100001008. In MachOView we can see that the data at address 0x100001008 is 0x0000000.
Because that value is zero, we arrive at dyld::findMappedRange, which is a fast address->image lookup. Simply put, it finds the ImageLoader* of the image that contains address 0x100001008. That is certainly our stubDebug main executable.
Then doBindFastLazySymbol is executed. Let's look at ImageLoaderMachOCompressed::doBindFastLazySymbol. First we focus on the code below:
getLazyBindingInfo(lazyBindingInfoOffset, start, end, &segIndex, &segOffset, &libraryOrdinal, &symbolName, &doneAfterBind)
This code parses the Lazy Binding Info at offset 0x3f. Open MachOView again and find Dynamic Loader Info -> Lazy Binding Info. The Lazy Binding Info starts at 0x100002020; adding offset 0x3f gives address 0x10000205f:
10000205E 00 BIND_OPCODE_DONE
MachOView already explains the meaning of every field for us. So we go back to ImageLoaderMachOCompressed::doBindFastLazySymbol, knowing that:
segIndex = 2; // 0 is `__PAGEZERO`,1 is `__TEXT`,2 is `__DATA`
The general idea of this Lazy Binding Info entry is: go to dylib(3), find the address of symbol _printf, then write that address into segment(2) at offset(32).
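To make the decoding step concrete, here is a minimal sketch of reading one lazy binding entry. The opcode values are the ones defined in <mach-o/loader.h>, repeated here so the code builds on any platform. The struct, the function names and the sample_entry bytes are my own assumptions: the bytes are a hand-built entry that matches the values quoted above (segment 2, offset 32, ordinal 3, _printf), not a copy of the MachOView dump, and the sketch handles only the opcodes that entry needs.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Values from <mach-o/loader.h>; only the subset this sketch handles. */
#define BIND_OPCODE_MASK                          0xF0
#define BIND_IMMEDIATE_MASK                       0x0F
#define BIND_OPCODE_SET_DYLIB_ORDINAL_IMM         0x10
#define BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM 0x40
#define BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB   0x70
#define BIND_OPCODE_DO_BIND                       0x90

/* Hypothetical container for the decoded fields. */
typedef struct {
    uint8_t     seg_index;
    uint64_t    seg_offset;
    int         library_ordinal;
    const char *symbol_name;
} LazyBindEntry;

/* Hand-built sample entry (an assumption, see the text above). */
const uint8_t sample_entry[] = {
    0x72, 0x20,                                /* segment 2, offset ULEB 32 */
    0x13,                                      /* dylib ordinal 3           */
    0x40, '_', 'p', 'r', 'i', 'n', 't', 'f', 0,/* symbol name, flags 0      */
    0x90,                                      /* BIND_OPCODE_DO_BIND       */
    0x00                                       /* BIND_OPCODE_DONE          */
};

/* Decode one unsigned LEB128 value and advance *p. */
static uint64_t read_uleb128(const uint8_t **p, const uint8_t *end) {
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 64) result |= (uint64_t)(byte & 0x7F) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) break;
    }
    return result;
}

/* Returns 1 when DO_BIND is reached, 0 on DONE or an unhandled opcode. */
int decode_lazy_bind(const uint8_t *start, const uint8_t *end, LazyBindEntry *out) {
    const uint8_t *p = start;
    memset(out, 0, sizeof *out);
    while (p < end) {
        uint8_t byte = *p++;
        uint8_t imm = byte & BIND_IMMEDIATE_MASK;
        switch (byte & BIND_OPCODE_MASK) {
        case BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            out->seg_index = imm;
            out->seg_offset = read_uleb128(&p, end);
            break;
        case BIND_OPCODE_SET_DYLIB_ORDINAL_IMM:
            out->library_ordinal = imm;
            break;
        case BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM:
            out->symbol_name = (const char *)p;
            while (p < end && *p != 0) p++;    /* skip the name itself    */
            if (p < end) p++;                  /* skip the NUL terminator */
            break;
        case BIND_OPCODE_DO_BIND:
            return 1;
        default:
            return 0;
        }
    }
    return 0;
}
```

Decoding sample_entry yields seg_index 2, seg_offset 32, library_ordinal 3 and the name _printf, the same four values the article uses from here on.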
Then focus on the code below:
uintptr_t address = segActualLoadAddress(segIndex) + segOffset;
segIndex = 2, which is the __DATA segment (0 is __PAGEZERO, 1 is __TEXT, 2 is __DATA).
Following the segActualLoadAddress function, look at LC_SEGMENT_64(__DATA) in MachOView. The VM Address is 0x100001000; adding offset 0x20 (segOffset = 32) gives address = 0x100001020. This is the _printf placeholder. We actually saw it before: until now, 0x100001020 led to __stub_helper, but now we will fill in the actual address of _printf.
100001020 0000000100000F98 Indirect Pointer [0x100001020 -> _printf]
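The address arithmetic above can be written out as a small sketch. The Segment struct, the table and the function name are hypothetical. The __DATA address comes from the article, while the __PAGEZERO and __TEXT addresses and a slide of zero are my assumptions (as far as I can tell, dyld adds the image's slide to the preferred address at run time, whereas the article works with the addresses stored in the file).

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-image segment table. */
typedef struct {
    const char *name;
    uint64_t    vmaddr;   /* preferred address from LC_SEGMENT_64 */
} Segment;

/* __DATA is taken from the article; the other two rows are assumed values. */
const Segment stub_debug_segments[] = {
    { "__PAGEZERO", 0x0ULL         },
    { "__TEXT",     0x100000000ULL },
    { "__DATA",     0x100001000ULL },
};

/* Preferred address, plus slide, plus the offset from the lazy binding info. */
uint64_t lazy_pointer_address(const Segment *segs, unsigned seg_index,
                              uint64_t slide, uint64_t seg_offset) {
    return segs[seg_index].vmaddr + slide + seg_offset;
}
```

With seg_index 2, slide 0 and seg_offset 0x20, the result is 0x100001020, the __la_symbol_ptr slot for _printf.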
Then look at the bindAt function:
// resolve symbol
The resolve call stack is roughly as follows:
-resolve()
Let's focus on libImage((unsigned int)libraryOrdinal-1) first. As shown above, libraryOrdinal is 3. In this stubDebug project, libImage(3-1) means libSystem.B.dylib. Why? Look at the Load Commands part of stubDebug in MachOView. We can see:
LC_LOAD_DYLIB(Foundation)
Then focus on findExportedSymbol. Because libSystem.B.dylib is a collection of libsystem_c.dylib, libsystem_kernel.dylib, and so on, it looks in each of them recursively.
We know _printf is in libsystem_c.dylib, so assume we are now in libsystem_c.dylib. Then findShallowExportedSymbol is executed; this function looks up Dynamic Loader Info -> Export Info of libsystem_c.dylib.
Open libsystem_c.dylib in MachOView. The Export Info is a trie, and we can find _printf by following the nodes below:
92972 5F00 Node Label '_'
Now we get the symbol offset 0x40EC4. This is actually the address of the symbol _printf inside libsystem_c.dylib, relative to where the library is loaded. We can confirm it in Hopper Disassembler:
_printf:
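The trie walk can be sketched as follows. This is my own simplified reading of the export trie layout (terminal size, then flags and offset, then a child count and NUL-terminated edge labels with ULEB128 child offsets); it ignores re-exports and is not dyld's implementation. The function names are hypothetical, and sample_trie is a hand-built trie: the _printf offset 0x40EC4 is the value from the article, while the _puts entry and its offset 0x1000 are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hand-built trie with two exports, "_printf" and "_puts" (sample data only). */
const uint8_t sample_trie[] = {
    0x00, 0x01, '_', 0x00, 5,                         /* root: one edge "_"       */
    0x00, 0x02, 'p', 'r', 'i', 'n', 't', 'f', 0x00, 21,
                'p', 'u', 't', 's', 0x00, 27,         /* node 5: two edges        */
    0x04, 0x00, 0xC4, 0x9D, 0x10, 0x00,               /* node 21: offset 0x40EC4  */
    0x03, 0x00, 0x80, 0x20, 0x00                      /* node 27: offset 0x1000   */
};

/* Decode one unsigned LEB128 value and advance *p. */
static uint64_t read_uleb128(const uint8_t **p, const uint8_t *end) {
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 64) result |= (uint64_t)(byte & 0x7F) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) break;
    }
    return result;
}

/* Returns 1 and stores the symbol offset if found, otherwise returns 0. */
int trie_lookup(const uint8_t *trie, size_t size, const char *symbol,
                uint64_t *offset_out) {
    const uint8_t *end = trie + size;
    const uint8_t *p = trie;
    while (p < end) {
        uint64_t terminal_size = read_uleb128(&p, end);
        if (*symbol == '\0') {                 /* whole name consumed */
            if (terminal_size == 0) return 0;  /* node is not an export */
            (void)read_uleb128(&p, end);       /* flags, ignored in this sketch */
            *offset_out = read_uleb128(&p, end);
            return 1;
        }
        if (terminal_size > (uint64_t)(end - p)) return 0;
        p += terminal_size;                    /* skip terminal info */
        if (p >= end) return 0;
        uint8_t child_count = *p++;
        uint64_t next = 0;
        for (uint8_t i = 0; i < child_count && p < end; i++) {
            const uint8_t *nul = memchr(p, 0, (size_t)(end - p));
            if (nul == NULL) return 0;
            size_t edge_len = (size_t)(nul - p);
            int match = edge_len > 0 &&
                        strncmp(symbol, (const char *)p, edge_len) == 0;
            p = nul + 1;
            uint64_t child_offset = read_uleb128(&p, end);
            if (match) {
                symbol += edge_len;            /* consume the matched edge */
                next = child_offset;
                break;
            }
        }
        if (next == 0 || next >= size) return 0;
        p = trie + next;                       /* child offsets are from trie start */
    }
    return 0;
}
```

Looking up _printf in sample_trie follows the edges "_" and "printf" and returns 0x40EC4; looking up _print fails because no edge matches the remaining characters.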
Now we have found the address of symbol _printf, and we also know we should bind it at stubDebug's address 0x100001020 in __DATA,__la_symbol_ptr, replacing the pointer into __TEXT,__stub_helper with the address of _printf. This finishes the lazy bind.
Finally, we go back to dyld_stub_binder:
Lbind:
We jump to _printf to finish the statement printf("Hello, World!\n"). The next time _printf is called, it will be called directly, rather than through __stub_helper.
Summary
- A symbol is generally referred to by its name, a const char * string.
- Exported symbol info is stored as a trie, which can reduce memory use.
- The lazy bind process works much like a cache mechanism.