Closed Bug 969146 Opened 10 years ago Closed 10 years ago

Crashes from child processes don't get processed in b2g mochitests

Categories

(Testing :: Mochitest, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla34

People

(Reporter: jld, Assigned: ahal)

References

Details

(Keywords: ateam-b2g-task)

Attachments

(1 file, 3 obsolete files)

Not sure if this is the right bugzilla component.  The problem is that, when a test causes a process crash (during `mach mochitest-remote` on B2G), the test framework waits for a relatively long timeout — 10 minutes, apparently — before failing.  It would be nice if this condition could be detected sooner, especially for people running tests locally.  (It would also save some resources for TBPL if test hosts could be freed sooner in that case, but the added cost is probably relatively small.)
Is this crashing the main b2g process or a content process?
Just to be clear.. the crash is being detected properly, right? It's just that the crash stack comes after the long timeout?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #1)
> Is this crashing the main b2g process or a content process?

It's a content process crash, and that might be the key thing here.  Would it suffice for the main process to exit when it detects the content process crash, maybe?  My vague understanding of the mochitest setup is that we're replacing the b2g system app with something test-specific.

(In reply to Andrew Halberstadt [:ahal] from comment #2)
> Just to be clear.. the crash is being detected properly, right? It's just
> that the crash stack comes after the long timeout?

Yes; it's being detected (and the minidump is stackwalked if the environment is set up for that), but only after the 10-minute timeout.
We had a similar problem with the old Fennec for Android, where tests were running in a content process. We added a MOZ_CRASHREPORTER_SHUTDOWN variable that caused us to exit when we saw a content process crash. We probably need to do something similar for b2g testing.
I think with the landing of bug 976120, all we need to do to fix this is pass MOZ_CRASHREPORTER_SHUTDOWN in via B2GRunner.
I verified this works by doing a kill -6 on the b2g plugin-container processes. Note this will abort test runs even if a subprocess unrelated to the mochitest one crashes.. though I would think this is desired behaviour.
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Attachment #8384770 - Flags: review?
Attachment #8384770 - Flags: review? → review?(ted)
Comment on attachment 8384770 [details] [diff] [review]
Patch 1.0 - pass in MOZ_CRASHREPORTER_SHUTDOWN via B2GRunner

Oh wait, this patch also stops crashreporting from working. It seems like we exit this way, B2GRunner assumes the run ended normally and doesn't check_for_crashes. I'll update the patch shortly.
Attachment #8384770 - Attachment is obsolete: true
Attachment #8384770 - Flags: review?(ted)
Let's give ted's review queue a break ;)
Attachment #8384789 - Flags: review?(mdas)
Comment on attachment 8384789 [details] [diff] [review]
Patch 1.0 - pass MOZ_CRASHREPORTER_SHUTDOWN into B2GRunner; always check for crash

Review of attachment 8384789 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm!
Attachment #8384789 - Flags: review?(mdas) → review+
Looks like the crashes all happened after the test run has finished. This patch modified B2GRunner to always check for crashes after a test run (instead of just on timeouts). My guess is that something in the way we shutdown b2g after the test run causes a crash.
Yep, I confirmed my theory locally. This crash happens anytime nsIAppStartup.quit() is called, regardless of whether it is at the end of a run or not. I'll file a separate bug.
Depends on: 980579
Depends on: 983489
Whiteboard: No meta-viewport tag found
Whiteboard: No meta-viewport tag found
Depends on: 987251
Priority: -- → P2
Unbitrot patch; carry forward r+
Attachment #8384789 - Attachment is obsolete: true
Attachment #8429452 - Flags: review+
Actually there still seem to be a fair number of shutdown crashes that are blocking me from landing:
https://tbpl.mozilla.org/?tree=Try&rev=6eb1527753f5
Depends on: 1016538
Depends on: 1016540
Morphing the title to reflect the cause instead of the symptom.
Summary: Crashes during B2G mochitest-remote time out instead of being immediately caught → Crashes from child processes don't get processed in b2g mochitests
The previous patch is bit-rotted, here's an updated one.
Attachment #8429452 - Attachment is obsolete: true
I am not sure if it is related to bug 1034527. Bug 1034527 happens when content process hits assertion and trigger signal handler. Then parent process tries to attach content process and hangs. After 450 seconds, the testing framework timeout and exists.
See Also: → 1034527
I just noticed that b2g mochitests are looking quite green on Pine (where this patch is applied):
https://tbpl.mozilla.org/?tree=Pine

Reftests are another story, but I don't think they are using mozrunner yet, so it might be unrelated. Here is a try run to see what happens:
https://tbpl.mozilla.org/?tree=Try&rev=faaa79a44038

If this breaks reftest/Mnw but mochitest retriggers remain green, I'll update the patch to only affect mochitests for the time being.
Comment on attachment 8471635 [details] [diff] [review]
always check for crash

Those look pretty green! I did a bunch of retriggers to be safe. In the meantime I'd like to get another review from Dave as he is working on crash detection with DeviceRunner.
Attachment #8471635 - Flags: review?(dave.hunt)
Attachment #8471635 - Flags: review?(dave.hunt) → review+
https://hg.mozilla.org/mozilla-central/rev/025955aed95e
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: