
Stopping the NSZombie Invasion (Code Included)

We’re sharing the zombie object detection mechanism we built for the iFunny app, along with tips for anyone who also wants to get rid of this kind of crash. Picture this: you turn on your laptop, open Crashlytics, and voilà: EXC_BAD_ACCESS in objc_release… What now?


Let’s discuss the nature of such crashes

All NSObject subclasses and instances of pure Swift classes are reference types. When we assign an object to a new variable or pass it somewhere, only its address on the heap is copied, not its contents.
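To make this concrete, here is a minimal Swift sketch (the Post class is a hypothetical example): assigning an instance to a second constant copies only the reference, so a change made through one name is visible through the other.

```swift
// Any Swift class (or NSObject subclass) behaves the same way.
final class Post {
    var title: String
    init(title: String) { self.title = title }
}

let original = Post(title: "foo")
let copy = original          // copies the reference (the heap address), not the object

copy.title = "bar"           // mutates the single shared instance

print(original.title)                                        // bar
print(original === copy)                                     // true: same identity
print(ObjectIdentifier(original) == ObjectIdentifier(copy))  // true: same object
```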

If reference semantics are handled carelessly, we can cause a memory leak without noticing. The most common case is a trivial retain cycle: two objects hold strong references to each other, so neither one’s reference count ever drops to zero.
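The classic retain cycle fits in a few lines of Swift. In this sketch, Owner, Pet and DeinitCounter are hypothetical types; the counter exists only to show that neither deinitializer ever runs.

```swift
// Shared counter so we can observe whether deinit ever runs.
final class DeinitCounter {
    var count = 0
}

final class Owner {
    let counter: DeinitCounter
    var pet: Pet?                      // strong reference to the pet
    init(counter: DeinitCounter) { self.counter = counter }
    deinit { counter.count += 1 }
}

final class Pet {
    let counter: DeinitCounter
    var owner: Owner?                  // strong back-reference: this closes the cycle
    init(counter: DeinitCounter) { self.counter = counter }
    deinit { counter.count += 1 }
}

let counter = DeinitCounter()

func makeCycle() {
    let owner = Owner(counter: counter)
    let pet = Pet(counter: counter)
    owner.pet = pet
    pet.owner = owner
    // The locals die here, but each object still retains the other,
    // so neither reference count reaches zero and neither deinit runs.
}

makeCycle()
print(counter.count) // 0: both objects leaked
```

Declaring either back-reference weak or unowned breaks the cycle, and both deinitializers then run when makeCycle returns.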

To avoid leaks, Swift offers two kinds of non-owning references: weak and unowned (their Objective-C counterparts are weak and unsafe_unretained). While weak is one of the good guys, you should be extremely careful with unowned!
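The sketch below (all types are hypothetical) contrasts the two: a weak reference is zeroed automatically when its target dies, while an unowned one is never nil and simply assumes its target is still alive.

```swift
final class Listener {
    let name: String
    init(name: String) { self.name = name }
}

final class Broadcaster {
    // weak: doesn't retain, and becomes nil automatically when the listener dies.
    weak var listener: Listener?
}

final class Account {
    var card: Card?
}

final class Card {
    // unowned: doesn't retain and is never nil, so the Account MUST outlive the Card.
    unowned let account: Account
    init(account: Account) { self.account = account }
}

let broadcaster = Broadcaster()
var listener: Listener? = Listener(name: "feed")
broadcaster.listener = listener
let before = broadcaster.listener?.name   // "feed": a strong reference still exists
listener = nil                            // drop the only strong reference
let after = broadcaster.listener?.name    // nil: the weak reference was zeroed

let account = Account()
account.card = Card(account: account)
print(before ?? "nil", after ?? "nil")    // feed nil
print(account.card?.account === account)  // true while `account` is alive
```

Had the Account been released first, reading card.account would trap with a plain (safe) unowned reference and would be undefined behavior with unowned(unsafe).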

First, it is easy to access an unowned reference after its object is gone, even within a single thread. Second, the heap is shared by all of the app’s threads, so unsynchronized (non-atomic) access can be unsafe. Third, in release builds the compiler may convert our unowned(safe) references to unowned(unsafe) ones during optimization. Those are faster, but far more dangerous.

Let’s see what can happen when an unowned(unsafe) reference points to an object that has already been destroyed, fully or partially.

If the object has been fully destroyed, its address may point to raw, uninitialized memory, and you’ll get EXC_BAD_ACCESS with KERN_INVALID_ADDRESS.

If, on the other hand, a new object (perhaps even of the same type) already occupies that address, you may not crash at all. The bug is still there, though, and its effects quietly accumulate in the app. The following snippet demonstrates this:

With some luck, we’ll see the coveted “bar” in the console; otherwise, we’ll get an EXC_BAD_ACCESS crash.

Errors like these are called “use after free” (UAF), and they can even be exploited to attack your app.

What Apple says

Deep inside the CoreFoundation framework lives NSZombie, an excellent debugging mechanism. The idea behind it is simple: when a process starts, the runtime checks whether the “NSZombieEnabled” environment variable is set. If it is, the magic begins.

For NSObject subclasses, the dealloc method is replaced via the runtime library: instead of being destroyed, each object has its class swapped for a zombie class (its isa pointer is rewritten with object_setClass). The memory is never freed after the swap, so it deliberately leaks. Any later message sent to the object triggers an assertion that reports the object’s original type and the name of the invoked method.
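The idea can be illustrated with a small Swift toy model. It is not CoreFoundation’s actual code: the fake heap, ToyRuntime and the strict "YES" check are simplifications, and the _NSZombie_ prefix and report text only imitate NSZombie’s naming and output. It shows the three key points: an environment check at startup, a class swap instead of a free, and a report naming the original class and selector.

```swift
// A toy model of the NSZombie idea, NOT CoreFoundation's code: "objects" live in a
// fake heap keyed by address, and each carries its class name in an `isa` field.
struct ToyObject {
    var isa: String
}

final class ToyRuntime {
    private(set) var heap: [Int: ToyObject] = [:]   // address -> object
    private var nextAddress = 0x1000
    let zombiesEnabled: Bool

    // In a real process the dictionary would be ProcessInfo.processInfo.environment.
    init(environment: [String: String]) {
        zombiesEnabled = environment["NSZombieEnabled"] == "YES"
    }

    func alloc(_ className: String) -> Int {
        let address = nextAddress
        nextAddress += 0x10
        heap[address] = ToyObject(isa: className)
        return address
    }

    func dealloc(_ address: Int) {
        guard let object = heap[address] else { return }
        if zombiesEnabled {
            // Swap the class instead of freeing; the slot is deliberately leaked.
            heap[address] = ToyObject(isa: "_NSZombie_" + object.isa)
        } else {
            heap[address] = nil   // freed; a real allocator could now reuse the address
        }
    }

    /// Returns the zombie report if `address` holds a zombie, otherwise nil.
    func send(_ selector: String, to address: Int) -> String? {
        guard let object = heap[address], object.isa.hasPrefix("_NSZombie_") else {
            return nil
        }
        let originalClass = object.isa.dropFirst("_NSZombie_".count)
        return "-[\(originalClass) \(selector)]: message sent to deallocated instance"
    }
}

let runtime = ToyRuntime(environment: ["NSZombieEnabled": "YES"])
let address = runtime.alloc("FeedCell")
runtime.dealloc(address)
print(runtime.send("layoutSubviews", to: address) ?? "no zombie")
print(runtime.heap.count) // 1: the zombie's memory is never released
```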

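To hunt zombies this way, you run the app under the debugger with the environment variable set (in Xcode, the Zombie Objects checkbox in the scheme’s Diagnostics settings sets it for you) and click through various scenarios until something touches a zombie.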

Naturally, like many others, we began looking for zombie objects in this way.

iFunny makes money through advertising, and we integrate a dozen partner SDKs. In products like this, dealing with memory leaks and similar issues is part of an iOS engineer’s daily routine.

However, the official approach has several shortcomings:

  • Manually hunting for zombie objects is no fun at all.
  • There is no way to enable the mechanism for users in production.
  • And even if we could enable it in production, what would we do about the endless memory leaks?

This means we need another option.

What Mike Ash says

While searching for information online, we came across an article by Mike Ash describing a custom zombie implementation built on the public API of the Objective-C runtime.

We scoured the open-source solutions and found several ready-made implementations. Here they are:

  • https://github.com/lilidan/NSZombie
  • https://chromium.googlesource.com/chromium/src/+/179d013b254fce69feb811badfad8cc0cd4952f2/components/crash/core/common/objc_zombie.mm
  • https://github.com/Dokay/DJZombieCheck
  • https://github.com/AlexTing0/DDZombieMonitor

They inspired us to create an in-house zombie mechanism.

FunCorp’s way

Just like we do with other modules of our app, we implemented this solution as an SPM package. Let’s look at its structure:

The package consists of two targets. The first, “swift_shims”, gives us Swift’s fatalError function, which Objective-C sorely lacks. Under the hood, fatalError writes the message to all the relevant system outputs (stderr, the system log, etc.) and then terminates the process.

The main FNZombie target is where the real magic happens. It contains Objective-C code that does low-level work with dealloc and the runtime library, so it has to be compiled with the -fno-objc-arc flag, that is, without ARC. We tried moving just the functions that can’t be used under ARC into a separate target and pulling it in as a dependency; unfortunately, that didn’t work.
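A Package.swift for this two-target layout could look roughly like the hypothetical sketch below. The target names come from the text; the tools version and the library product are assumptions. SwiftPM’s .unsafeFlags setting is one way to pass -fno-objc-arc to the Objective-C target. SwiftPM restricts packages that use unsafe flags from being consumed as regular versioned dependencies, which matters little for an in-house module like this one.

```swift
// swift-tools-version:5.5
import PackageDescription

// Hypothetical manifest: a Swift "swift_shims" target and an
// Objective-C "FNZombie" target compiled without ARC.
let package = Package(
    name: "FNZombie",
    products: [
        .library(name: "FNZombie", targets: ["FNZombie"]),
    ],
    targets: [
        // Swift target that exposes fatalError to the Objective-C code.
        .target(name: "swift_shims"),
        // Objective-C target doing manual memory management in its dealloc tricks.
        .target(
            name: "FNZombie",
            dependencies: ["swift_shims"],
            cSettings: [.unsafeFlags(["-fno-objc-arc"])]
        ),
    ]
)
```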

Only one header, with a single interface, is exposed publicly from the target:

We use it to turn the zombie object search mechanism on and off in the main app.
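The header itself is not reproduced here, but the behavior of such a switch can be sketched. The following Swift toy model is hypothetical (ToyClass, ZombieSwitch and the closure-based method table are not FunCorp’s API): enabling saves the original dealloc implementation and installs a zombie one, and disabling restores it. In Objective-C, the swap itself would typically go through runtime functions such as class_getInstanceMethod and method_setImplementation.

```swift
// A toy stand-in for a class's method table: selector -> implementation.
// Implementations take an object "address" and describe what they did.
final class ToyClass {
    var methods: [String: (Int) -> String]
    init(methods: [String: (Int) -> String]) { self.methods = methods }
}

// Hypothetical on/off switch. Enabling stores the original dealloc and installs the
// zombie one; disabling restores it. Both calls are idempotent.
final class ZombieSwitch {
    private let target: ToyClass
    private var originalDealloc: ((Int) -> String)?

    init(target: ToyClass) { self.target = target }

    var isEnabled: Bool { originalDealloc != nil }

    func enable() {
        guard !isEnabled, let original = target.methods["dealloc"] else { return }
        originalDealloc = original
        target.methods["dealloc"] = { address in "zombified \(address)" }
    }

    func disable() {
        guard let original = originalDealloc else { return }
        target.methods["dealloc"] = original
        originalDealloc = nil
    }
}

let nsObject = ToyClass(methods: ["dealloc": { address in "freed \(address)" }])
let zombies = ZombieSwitch(target: nsObject)

zombies.enable()
zombies.enable()                          // no-op: already enabled
print(nsObject.methods["dealloc"]!(42))   // zombified 42
zombies.disable()
print(nsObject.methods["dealloc"]!(42))   // freed 42
```

The guard in enable() is the important design choice: without it, a second call would save the zombie implementation as the “original”, and disable() could never restore the real dealloc.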

To implement the zombie mechanism, we first need to create a root class, that is, a class with no superclass (not even NSObject). It lives in the FNZombie.h file:

At the very least, a root class must implement the +initialize class method, which the runtime requires.

Next, we define the -forwardingTargetForSelector: instance method. When the object has no implementation for the method being invoked, the message ends up here, which makes it the perfect place to call fatalError.

We also implement a few other methods the runtime may call while dispatching a message, and slip a fatalError into those as well.
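A real root class can’t be written in Swift, but the dispatch logic these paragraphs rely on can be modeled. In this hypothetical sketch, a message is looked up in the class’s method table and, on a miss, handed to a forwarding hook; the zombie class implements only initialize, so every other message lands in the hook. Throwing ZombieAccess stands in for fatalError so the sketch stays testable, and capturing the original class name is an assumption about where that information comes from.

```swift
// What a zombie access reports: the original class and the selector that was sent.
struct ZombieAccess: Error, Equatable {
    let originalClass: String
    let selector: String
}

// A class reduced to its method table and a forwarding fallback.
struct ToyDispatchClass {
    let name: String
    let methods: [String: () -> String]
    // Invoked on a lookup miss, playing the role of -forwardingTargetForSelector:.
    let forwarding: (_ selector: String) throws -> String
}

// objc_msgSend in miniature: use the method table, otherwise forward.
func send(_ selector: String, to cls: ToyDispatchClass) throws -> String {
    if let implementation = cls.methods[selector] {
        return implementation()
    }
    return try cls.forwarding(selector)
}

// Zombie root class for one dead object. Only "initialize" is implemented, so every
// other message falls through to forwarding. The real class calls fatalError there;
// throwing keeps this sketch testable. Assumption: the original class name was
// recorded at dealloc time and is captured here.
func makeZombieClass(for originalClass: String) -> ToyDispatchClass {
    ToyDispatchClass(
        name: "FNZombie",
        methods: ["initialize": { "initialized" }],
        forwarding: { selector in
            throw ZombieAccess(originalClass: originalClass, selector: selector)
        }
    )
}

let zombie = makeZombieClass(for: "AdBannerView")
let result = Result { try send("setDelegate:", to: zombie) }
if case .failure(let error) = result {
    print("zombie hit: \(error)")
}
```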

When the mechanism is enabled, we allocate a buffer to store our zombie objects and override the dealloc method for objects:

The replacement, zombie_dealloc, is an ordinary C function.

Inside it, we need to:

  1. Retrieve the object’s class and size
  2. Release all of the instance’s ivars and associated objects
  3. “Zero” the object’s memory
  4. Replace the object’s class with the zombie class via the isa pointer

And finally, put our zombie in the buffer!
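The four steps and the buffer can be modeled in Swift as follows. Everything here is a hypothetical toy (the object’s memory is a byte array, isa a string), and one behavior is an explicit assumption: when the buffer is full, the oldest zombie is evicted and would be really freed, similar to the fixed-size zombie list in Chromium’s objc_zombie.mm linked above. The article doesn’t say how FunCorp bounds its buffer.

```swift
// Toy model of the four zombie_dealloc steps plus a bounded zombie buffer.
// Assumption: a full buffer evicts (and would really free) its oldest zombie.
final class ToyInstance {
    var isa: String
    var bytes: [UInt8]                      // stands in for the instance's memory
    var associatedObjects: [String: AnyObject] = [:]
    init(isa: String, bytes: [UInt8]) {
        self.isa = isa
        self.bytes = bytes
    }
}

struct ZombieRecord {
    let object: ToyInstance
    let originalClass: String
}

final class ZombieBuffer {
    private var slots: [ZombieRecord?]
    private var next = 0
    private(set) var evictedCount = 0       // zombies pushed out of the buffer

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        slots = Array(repeating: nil, count: capacity)
    }

    func zombieDealloc(_ object: ToyInstance) {
        // 1. Retrieve the object's class and size.
        let originalClass = object.isa
        let size = object.bytes.count
        // 2. Release ivars and associated objects.
        object.associatedObjects.removeAll()
        // 3. "Zero" the object's memory.
        object.bytes = Array(repeating: 0, count: size)
        // 4. Replace the class with the zombie class via isa.
        object.isa = "FNZombie"
        // Finally, put the zombie in the ring buffer, evicting the oldest when full.
        if slots[next] != nil {
            evictedCount += 1               // a real implementation would free it here
        }
        slots[next] = ZombieRecord(object: object, originalClass: originalClass)
        next = (next + 1) % slots.count
    }

    /// Original class of `object` if it is still held as a zombie, otherwise nil.
    func originalClass(of object: ToyInstance) -> String? {
        for case let record? in slots where record.object === object {
            return record.originalClass
        }
        return nil
    }
}

let buffer = ZombieBuffer(capacity: 2)
let cells = (0..<3).map { ToyInstance(isa: "Cell\($0)", bytes: [1, 2, 3]) }
cells.forEach(buffer.zombieDealloc)

print(cells[2].isa, cells[2].bytes)                     // FNZombie [0, 0, 0]
print(buffer.originalClass(of: cells[2]) ?? "evicted")  // Cell2
print(buffer.originalClass(of: cells[0]) ?? "evicted")  // evicted
```

A bounded buffer addresses the “endless memory leaks” concern raised earlier: zombies only catch accesses that happen while they are still in the buffer, but memory usage stays capped.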

The implementation is available here

Obviously, this mechanism is dangerous and unstable, so we decided to detect zombies only for beta users on TestFlight. We strongly discourage using this solution in production.

A couple of releases later, we got the coveted message in Crashlytics:

This made us happy campers: we had finally pinpointed the issue. After fixing it, we kept monitoring for use-after-free problems.

Bonus

We deliberately left out some details about the nature of these crashes. If you want to dig deeper, here are some great articles:

  • https://www.guru99.com/stack-vs-heap.html
  • https://www.avanderlee.com/swift/exc-bad-access-crash/
  • https://www.blackhat.com/docs/eu-16/materials/eu-16-Wen-Use-After-Use-After-Free-Exploit-UAF-By-Generating-Your-Own-wp.pdf