
Exploring Mach-O binaries. Tools - pagestuff

— Continued in Exploring Mach-O binaries. Tools - nm

Introduction

Most programmers start out in software development by learning a specific language: working through documentation that describes key features and details, investigating available tools such as the standard library, and sometimes even reading the source code of the language implementation or standard library to fully understand how things work. I think this is the right path for self-development, and all of these activities are valuable, but there is a problem. Documentation rarely covers every aspect, and source code isn’t always available for all of the critical parts. What to do in this case? I believe that reverse engineering and exploring binaries can help complete the picture in such situations. That’s why today I’m starting a new series of posts on analyzing binaries on macOS (OS X) and iOS.

Mach-O binaries

Mach-O is a file format used by most Mach-based operating systems. The history of the Mach kernel started at Carnegie Mellon University and later continued in OS X / iOS as part of the hybrid XNU kernel. There is a whole story behind the transition from “classic” Mac OS to the Mach-based operating system called OS X (now called macOS). For the curious, I’ve put links below to the original CMU project and related Wikipedia pages. Right now, however, I’m mostly interested in the technical side: the structure of Mach-O.

Notice: At the time of writing, I couldn’t find an official Mach-O reference on the Apple Developer site. (The only document there was Mach-O Programming Topics, which is helpful but cannot fully replace a specification.) That seems strange, because every file format has tons of details. So this post is based on information I found in unofficial sources. I hope these documents are still relevant and that the differences are not too significant.

A specification is not a very exciting read in itself, because such documents have to provide very detailed information. You usually don’t read one from start to finish like your favorite Lord of the Rings; most likely you refer to it when you have a question. So I think it would be wise to use it the same way here: I’ll pick one of the tools for binary file analysis and try to clarify the output it produces.

There is a whole bunch of tools for working with binaries. I prefer to start with the most standard set (described in the “Mach-O Programming Topics” documentation):

pagestuff

I’ve chosen probably the simplest tool to start with: pagestuff, which has only a few input parameters. The name of the tool isn’t very obvious, so a short description won’t hurt. Since I like man pages for their clarity, let’s check the description from man pagestuff:

pagestuff displays information about the specified logical pages of a file conforming to the Mach-O executable format. For each specified page of code, symbols (function and static data structure names) are displayed. If no pages are specified, symbols for all pages in the __TEXT, __text section are displayed.

OK. The man page states that a Mach-O file can be viewed as a sequence of logical pages, which pagestuff can display. Well, let’s try it and see. Obviously, for binary analysis we need a binary. For our goals, a simple console application is enough, so I’ve created a blank OS X console project in Xcode and built it to produce the output binary.

Source code of the sample project:

#import <Foundation/Foundation.h>

@interface SampleClass : NSObject

@property (nonatomic, copy) NSString *property;

@end

@implementation SampleClass

@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        NSLog(@"Hello, World!");
        
        SampleClass *class = [[SampleClass alloc] init];
        class.property = @"property223";
    }
    return 0;
}

Once the repo is downloaded, we are ready to try it out. Run the following in Terminal:

pagestuff executable_filepath -a, where executable_filepath, in my case, is exploring_mach-o_binaries-tools_pagestuff/output/sampler.

The tool produces the following output:

File Page 0 contains Mach-O headers
File Page 0 contains contents of section (__TEXT,__text)
File Page 0 contains contents of section (__TEXT,__stubs)
File Page 0 contains contents of section (__TEXT,__stub_helper)
File Page 0 contains contents of section (__TEXT,__objc_classname)
File Page 0 contains contents of section (__TEXT,__objc_methname)
File Page 0 contains contents of section (__TEXT,__objc_methtype)
File Page 0 contains contents of section (__TEXT,__cstring)
File Page 0 contains contents of section (__TEXT,__unwind_info)
Symbols on file page 0 virtual address 0x100000d30 to 0x100000ff8
  0x0000000100000d30 -[SampleClass property]
  0x0000000100000d50 -[SampleClass setProperty:]
  0x0000000100000d90 -[SampleClass .cxx_destruct]
  0x0000000100000dd0 _main
File Page 1 contains contents of section (__DATA,__nl_symbol_ptr)
File Page 1 contains contents of section (__DATA,__la_symbol_ptr)
File Page 1 contains contents of section (__DATA,__cfstring)
File Page 1 contains contents of section (__DATA,__objc_classlist)
File Page 1 contains contents of section (__DATA,__objc_imageinfo)
File Page 1 contains contents of section (__DATA,__objc_const)
File Page 1 contains contents of section (__DATA,__objc_selrefs)
File Page 1 contains contents of section (__DATA,__objc_classrefs)
File Page 1 contains contents of section (__DATA,__objc_ivar)
File Page 1 contains contents of section (__DATA,__objc_data)
Symbols on file page 1 virtual address 0x100001000 to 0x100001230
  0x00000001000011d8 _OBJC_IVAR_$_SampleClass._property
  0x00000001000011e0 _OBJC_METACLASS_$_SampleClass
  0x0000000100001208 _OBJC_CLASS_$_SampleClass
File Page 2 contains dyld info for sliding an image
File Page 2 contains dyld info for binding symbols
File Page 2 contains dyld info for lazy bound symbols
File Page 2 contains dyld info for symbols exported by a dylib
File Page 2 contains data of function starts
File Page 2 contains symbol table for non-global symbols
File Page 2 contains symbol table for defined global symbols
File Page 2 contains symbol table for undefined symbols
File Page 2 contains indirect symbols table
File Page 2 contains string table
File Page 2 contains data of code signature
File Page 3 contains data of code signature
File Page 4 contains data of code signature

Analysis

We’ve got a list of “File page N …” items with a description for each item. I think it’s a good idea to check them one by one (in this post only pages 0 - 1 will be considered).
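Before going through the items, it helps to see how a symbol address ends up on a particular file page. The following is only a sketch of that arithmetic, under assumptions that are consistent with the output above but are not read from the binary: a page size of 4096 bytes, a __TEXT segment at virtual address 0x100000000 with file offset 0, and a __DATA segment at 0x100001000 with file offset 0x1000. The names segment_map and file_page_for_address are hypothetical; in a real file the segment values come from the load commands.

```c
#include <stdint.h>

/* Assumption: the pages in the output above are 4096 bytes (0x1000). */
#define PAGE_SIZE_ASSUMED 0x1000ULL

/* Hypothetical description of one segment: where it starts in memory
   and where its bytes start in the file. */
struct segment_map {
    uint64_t vmaddr;   /* virtual address of the segment start */
    uint64_t fileoff;  /* file offset of the segment start */
};

/* Translate a virtual address inside a segment to a file page index.
   The caller must pass an address that lies within the segment. */
static uint64_t file_page_for_address(struct segment_map seg, uint64_t addr)
{
    uint64_t file_offset = addr - seg.vmaddr + seg.fileoff;
    return file_offset / PAGE_SIZE_ASSUMED;
}
```

With the assumed segment values, _main at 0x100000dd0 maps to file offset 0xdd0, which is on page 0, and _OBJC_CLASS_$_SampleClass at 0x100001208 maps to file offset 0x1208, which is on page 1. Both match the pagestuff listing.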

struct mach_header {
  unsigned long  magic;      /* Mach magic number identifier */
  cpu_type_t     cputype;    /* cpu specifier */
  cpu_subtype_t  cpusubtype; /* machine specifier */
  unsigned long  filetype;   /* type of file */
  unsigned long  ncmds;      /* number of load commands */
  unsigned long  sizeofcmds; /* size of all load commands */
  unsigned long  flags;      /* flags */
};
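The struct above declares most fields as unsigned long, which is 8 bytes on 64-bit macOS, while each field is 4 bytes wide in the file. The current <mach-o/loader.h> uses uint32_t for these fields, and its 64-bit variant adds a trailing reserved field. As a minimal sketch, the header can be decoded from a raw buffer with fixed-width types. The names macho_header_fields, read_le32, and parse_macho_header are hypothetical, and the sketch handles only little-endian, non-fat files.

```c
#include <stddef.h>
#include <stdint.h>

/* Magic numbers with the values defined in <mach-o/loader.h>. */
#define MACHO_MAGIC_32 0xfeedfaceu  /* MH_MAGIC */
#define MACHO_MAGIC_64 0xfeedfacfu  /* MH_MAGIC_64 */

/* Fixed-width mirror of the header fields (hypothetical name). */
struct macho_header_fields {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
};

/* Decode a 32-bit little-endian value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is shorter than the seven
   32-bit fields or does not start with a little-endian Mach-O magic. */
static int parse_macho_header(const uint8_t *buf, size_t len,
                              struct macho_header_fields *out)
{
    if (buf == NULL || out == NULL || len < 28)
        return -1;

    uint32_t magic = read_le32(buf);
    if (magic != MACHO_MAGIC_32 && magic != MACHO_MAGIC_64)
        return -1;

    out->magic      = magic;
    out->cputype    = (int32_t)read_le32(buf + 4);
    out->cpusubtype = (int32_t)read_le32(buf + 8);
    out->filetype   = read_le32(buf + 12);
    out->ncmds      = read_le32(buf + 16);
    out->sizeofcmds = read_le32(buf + 20);
    out->flags      = read_le32(buf + 24);
    return 0;
}
```

To try it on the sample binary, read the first 28 bytes of the file into a buffer and pass them to parse_macho_header. Reading byte by byte instead of casting the buffer to a struct avoids alignment and host byte order issues.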

The compact unwind information for the executable’s code. Generated for exception handling on OS X.

– Mike Ash

Specifically, OR the __DATA,__objc_imageinfo section with “00 00 00 00 02 00 00 00”; normally this section is all zeros. The __objc_imageinfo section corresponds to struct objc_image_info in: http://www.opensource.apple.com/source/objc4/objc4-551.1/runtime/objc-private.h

– “Fossies” - the Fresh Open Source Software Archive
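The byte-level operation in the quote can be sketched as follows. This assumes the section is 8 bytes laid out as two little-endian 32-bit fields, version followed by flags, which is my reading of struct objc_image_info and should be checked against the linked header. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define IMAGE_INFO_SIZE 8  /* assumed: 32-bit version + 32-bit flags */

/* OR an 8-byte mask into the raw bytes of an __objc_imageinfo section. */
static void or_image_info(uint8_t section[IMAGE_INFO_SIZE],
                          const uint8_t mask[IMAGE_INFO_SIZE])
{
    for (size_t i = 0; i < IMAGE_INFO_SIZE; i++)
        section[i] |= mask[i];
}

/* Read the flags field (bytes 4..7), assuming little-endian layout. */
static uint32_t image_info_flags(const uint8_t section[IMAGE_INFO_SIZE])
{
    return (uint32_t)section[4] | ((uint32_t)section[5] << 8) |
           ((uint32_t)section[6] << 16) | ((uint32_t)section[7] << 24);
}
```

Under that layout, applying the quoted mask “00 00 00 00 02 00 00 00” to an all-zero section leaves version at 0 and sets bit 1 of flags, so image_info_flags returns 2.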

It seems that’s it. The __TEXT and __DATA segments describe most of the executable’s structure. The rest (file pages 2 - 4) is another story, which we skip for today.

Summary

pagestuff provides a simple way to get an overview of the structure of a specific Mach-O file. We used pagestuff file_path -a to get the verbose form. The stricter pagestuff file_path -p prints concrete offsets and lengths without descriptions (internal section names are used instead). However, if the goal is a detailed picture of each section or segment, consider otool, the object file displaying tool (which I will try to describe in the next post).

Original project of the Carnegie Mellon University

  1. Carnegie Mellon University - Mach project - Unfortunately, a lot of the links are broken and some of the documentation is lost.

Wikipedia pages

  1. Mach kernel
  2. Classic Mac OS

References to Mach-O specification:

  1. NextStep 3.3 - Mach-O specification
  2. Unofficial Mach-O specification

Other references

  1. objc.io - Mach-O Executables
  2. Mike Ash - Let’s build Mach-O executable
  3. StackOverflow question about stubs in Mach-O
  4. image_info at Fossies

Apple Open Source

  1. Apple Open Source - unwind info

Apple Developer documentation:

  1. Mach-O Programming Topics
  2. Managing Memory