Introduction to the Android WatchDog
Contents
- Android WatchDog
- WatchDog initialization
- Introduction to HandlerChecker
- WatchDog detection logic
- References
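In early embedded systems, the WatchDog was designed as a last-resort rescue for software that had run out of control: it reboots the device. That is somewhat brutal, but a reboot usually works around many intermittent bugs, at least temporarily.
A WatchDog design generally needs to cover the following three functions:
- A feeding mechanism
- Dumping the exception logs
- Recovering from the fault
The feeding mechanism comes in two forms:
- Passive - the WatchDog waits for the system to feed it "food"
- Active - the WatchDog itself checks whether there is "food"
Either way, when no "food" reaches the WatchDog, a fault is raised, the exception logs are dumped, and a recovery is attempted.
In early embedded systems the WatchDog was usually a hardware device, so it was fed by the software system.
A WatchDog implemented for a software system is more flexible, so its feeding mechanism can be implemented as needed.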
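To make the passive feeding idea concrete, here is a minimal sketch in plain Java. The class name `FeedWatchdog`, its methods, and the injected clock are all hypothetical and only illustrate the idea; they are not part of Android.

```java
import java.util.function.LongSupplier;

/** Hypothetical passive software watchdog: the monitored code must call feed() regularly. */
final class FeedWatchdog {
    private final long timeoutMillis;
    private final LongSupplier clock; // injected so the sketch can be exercised without sleeping
    private long lastFedAt;

    FeedWatchdog(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastFedAt = clock.getAsLong();
    }

    /** Called by the monitored system to signal that it is still alive. */
    synchronized void feed() {
        lastFedAt = clock.getAsLong();
    }

    /** Returns true when no feed() arrived within the timeout window. */
    synchronized boolean isStarved() {
        return clock.getAsLong() - lastFedAt >= timeoutMillis;
    }
}
```

A real watchdog would poll `isStarved()` from its own thread or timer, then dump logs and attempt a recovery when it returns true.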
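Android WatchDog
Android also has a WatchDog. It is mainly used to monitor the service threads running inside system_server. When SystemServer starts its services during initialization, it also configures and starts the WatchDog: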
private void startOtherServices() {
    ...
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.init(context, mActivityManagerService);
    ...
    mActivityManagerService.systemReady(new Runnable() {
        ...
        Watchdog.getInstance().start();
    });
}
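init is called first, and the WatchDog is started once AMS.systemReady has completed. So how are monitored threads or callbacks registered with the WatchDog? Take the registration code in AMS as an example: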
public ActivityManagerService(Context systemContext) {
    ...
    Watchdog.getInstance().addMonitor(this);
    Watchdog.getInstance().addThread(mHandler);
}
Before the constructor returns, it registers a monitor callback and the handler bound to the thread to be monitored.
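One plausible way to write such a monitor callback is to have it simply acquire the service's own lock: if the lock is stuck, the callback never returns, and the watchdog notices. The sketch below runs in plain Java; `MonitorDemo`, its nested `Monitor` interface, `DemoService`, and `completesWithin` are hypothetical names, and the lock-based callback is an assumption rather than something shown in the code above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class MonitorDemo {
    /** Hypothetical stand-in for the watchdog's monitor callback interface. */
    interface Monitor {
        void monitor();
    }

    /** A service whose monitor() returns only once the service lock can be taken. */
    static final class DemoService implements Monitor {
        final Object lock = new Object();

        @Override
        public void monitor() {
            synchronized (lock) {
                // Nothing to do: acquiring the lock is the whole check.
            }
        }
    }

    /** Runs monitor() on its own daemon thread and reports whether it returned in time. */
    static boolean completesWithin(Monitor m, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            m.monitor();
            done.countDown();
        });
        t.setDaemon(true);
        t.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

While another thread holds `lock`, `completesWithin` returns false, which is the situation the watchdog is meant to detect.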
WatchDog initialization
Next, let's go through the code, starting with the WatchDog constructor:
public class Watchdog extends Thread {
    private Watchdog() {
        super("watchdog");
        // Initialize handler checkers for each common thread we want to check. Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.

        // The shared foreground thread is the main checker. It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread. We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));
    }
    ...
}
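WatchDog derives from Thread. Its constructor mainly initializes:
- mMonitorChecker - the HandlerChecker bound to the thread on which the monitor callbacks run
- mHandlerCheckers - the list of preset HandlerCheckers
A HandlerChecker monitors whether the thread bound to a Handler takes too long to respond; the timeout can be set at construction. This is the default behavior, and it is built on Android's Handler and Looper mechanism.
Beyond the default behavior, custom monitoring can be added by registering monitor callbacks on a HandlerChecker.
All of the WatchDog's monitor callbacks are stored in mMonitorChecker.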
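The idea of probing a thread through its own task queue can be sketched outside Android with a single-threaded executor standing in for the Looper thread. Everything here (`ThreadProbe` and its helpers) is hypothetical. Unlike the real HandlerChecker, which posts to the front of the queue, an executor appends the probe at the tail, so this is only a rough analogue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical plain-Java analogue of a HandlerChecker, using an executor instead of a Handler. */
final class ThreadProbe {
    private final ExecutorService target;
    private volatile boolean completed = true;

    ThreadProbe(ExecutorService target) {
        this.target = target;
    }

    /** Enqueues a tiny task; the flag flips back only if the target thread gets to run it. */
    void scheduleCheck() {
        if (!completed) {
            return; // a check is already in flight
        }
        completed = false;
        target.execute(() -> completed = true);
    }

    /** Polls until the probe completes or the deadline passes, then returns the flag. */
    boolean awaitCompleted(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!completed && System.nanoTime() < deadline) {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return completed;
    }

    /** Single-threaded daemon executor standing in for a Looper thread. */
    static ExecutorService newTargetThread() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "probe-target");
            t.setDaemon(true);
            return t;
        });
    }

    /** A task that occupies the target thread until the latch is released. */
    static Runnable blockUntil(CountDownLatch release) {
        return () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }
}
```

If the target thread is stuck in `blockUntil(...)`, a scheduled probe stays incomplete until the latch is released, which mirrors how a blocked Handler thread never runs the checker's runnable.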
Introduction to HandlerChecker
The core of the HandlerChecker implementation:
- Post a message to the head of the message queue
public void scheduleCheckLocked() {
    // No monitor callbacks and the looper is idle: mark as completed and return
    if (mMonitors.size() == 0 && mHandler.getLooper().isIdling()) {
        // If the target looper is or just recently was idling, then
        // there is no reason to enqueue our checker on it since that
        // is as good as it not being deadlocked. This avoid having
        // to do a context switch to check the thread. Note that we
        // only do this if mCheckReboot is false and we have no
        // monitors, since those would need to be executed at this point.
        mCompleted = true;
        return;
    }
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }
    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    // Insert the message at the head of the queue
    mHandler.postAtFrontOfQueue(this);
}
- The runnable attached to the message gets executed
public void run() {
    final int size = mMonitors.size();
    // Run the monitor callbacks
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }
    // Mark the check as completed
    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
- Query the completion state
public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTime;
        if (latency < mWaitMax/2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}
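From the code above, once scheduleCheckLocked() has been called, only two things can keep the HandlerChecker from reaching the COMPLETED state:
- The thread bound to the HandlerChecker is blocked, so the runnable attached to the posted message is not executed within the timeout
- The runnable did run, but monitor callbacks were registered and one of them took longer than the timeout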
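The threshold logic in getCompletionStateLocked() can be worked through as a small standalone function. This is a sketch: the class name `CheckerState`, the numeric values of the constants, and the 60-second `waitMaxMillis` used in the usage note are assumptions for illustration.

```java
/** Hypothetical standalone version of the completion-state thresholds. */
final class CheckerState {
    // Numeric values are assumed; only their meaning comes from the code above.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /**
     * Maps the time elapsed since the check was scheduled to a state:
     * under half the limit is WAITING, under the full limit is WAITED_HALF,
     * anything longer is OVERDUE.
     */
    static int completionState(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }
}
```

With an assumed limit of 60000 ms, an unfinished check reports WAITING up to 29999 ms, WAITED_HALF from 30000 ms to 59999 ms, and OVERDUE from 60000 ms on.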
WatchDog detection logic
As mentioned above, WatchDog is itself a thread, and detection begins once the thread starts. Let's go straight to the code:
@Override
public void run() {
    boolean waitedHalf = false;
    while (true) {
        final ArrayList<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart;
        int debuggerWasConnected = 0;
        synchronized (this) {
            // Check interval, half a minute by default
            long timeout = CHECK_INTERVAL;
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            // Walk the HandlerCheckers and trigger a check on each one
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }

            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }

            // NOTE: We use uptimeMillis() here because we do not want to increment the time we
            // wait while asleep. If the device is asleep then the thing that we are waiting
            // to timeout on is asleep as well and won't have a chance to run, causing a false
            // positive on when to kill things.
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    // Put the thread to sleep
                    wait(timeout);
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }

            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                // The monitors have returned; reset
                waitedHalf = false;
                continue;
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    // We've waited half the deadlock-detection interval. Pull a stack
                    // trace and wait another half.
                    ArrayList<Integer> pids = new ArrayList<Integer>();
                    pids.add(Process.myPid());
                    ActivityManagerService.dumpStackTraces(true, pids, null, null,
                            NATIVE_STACKS_OF_INTEREST);
                    waitedHalf = true;
                }
                continue;
            }

            // Timed out
            blockedCheckers = getBlockedCheckersLocked();
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }

        // If we got here, that means that the system is most likely hung.
        // First collect stack traces from all threads of the system process.
        // Then kill this process so that the system will restart.
        EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

        ArrayList<Integer> pids = new ArrayList<Integer>();
        pids.add(Process.myPid());
        if (mPhonePid > 0) pids.add(mPhonePid);
        // Pass !waitedHalf so that just in case we somehow wind up here without having
        // dumped the halfway stacks, we properly re-initialize the trace file.
        final File stack = ActivityManagerService.dumpStackTraces(
                !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);

        // Give some extra time to make sure the stack traces get written.
        // The system's been hanging for a minute, another second or two won't hurt much.
        SystemClock.sleep(2000);

        // Pull our own kernel thread stacks as well if we're configured for that
        if (RECORD_KERNEL_THREADS) {
            dumpKernelStackTraces();
        }

        // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
        doSysRq('w');
        doSysRq('l');

        // Try to add the error to the dropbox, but assuming that the ActivityManager
        // itself may be deadlocked. (which has happened, causing this statement to
        // deadlock and the watchdog as a whole to be ineffective)
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
            public void run() {
                mActivity.addErrorToDropBox(
                        "watchdog", null, "system_server", null, null,
                        subject, null, stack, null);
            }
        };
        dropboxThread.start();
        try {
            dropboxThread.join(2000); // wait up to 2 seconds for it to return.
        } catch (InterruptedException ignored) {}

        IActivityController controller;
        synchronized (this) {
            controller = mController;
        }
        if (controller != null) {
            Slog.i(TAG, "Reporting stuck state to activity controller");
            try {
                Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                // 1 = keep waiting, -1 = kill system
                int res = controller.systemNotResponding(subject);
                if (res >= 0) {
                    Slog.i(TAG, "Activity controller requested to coninue to wait");
                    waitedHalf = false;
                    continue;
                }
            } catch (RemoteException e) {
            }
        }

        // Only kill the process if the debugger is not attached.
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        if (debuggerWasConnected >= 2) {
            Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
        } else if (debuggerWasConnected > 0) {
            Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
        } else if (!allowRestart) {
            Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
        } else {
            Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
            for (int i=0; i<blockedCheckers.size(); i++) {
                ...
            }
            ...
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
        ...
    }
}
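The overall logic is clear from the code:
- An infinite loop repeats the check
- Before each check, it walks all HandlerCheckers and calls scheduleCheckLocked on each
- It suspends the thread for a while by calling wait with a timeout
- When the wait ends, the thread resumes and calls evaluateCheckerCompletionLocked to get the final state across all HandlerCheckers; a result of OVERDUE means that some check has not completed
- It saves the stack traces by calling ActivityManagerService.dumpStackTraces
- It saves the error log to the dropbox through mActivity.addErrorToDropBox
- It kills the system server process with Process.killProcess(Process.myPid()) and System.exit(10), which triggers a soft reboot of the Android device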
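The decision taken in each round of the loop can be condensed into a small sketch. It assumes that the overall state is the worst state reported by any individual checker, which is a guess about evaluateCheckerCompletionLocked since its body is not shown here. `WatchdogRound`, `State`, `Action`, `worstOf`, and `decide` are hypothetical names.

```java
import java.util.List;

/** Hypothetical condensed view of one round of the watchdog loop. */
final class WatchdogRound {
    /** Declared from best to worst so that compareTo() ranks severity. */
    enum State { COMPLETED, WAITING, WAITED_HALF, OVERDUE }

    enum Action { RESET, KEEP_WAITING, DUMP_HALFWAY_STACKS, DUMP_AND_RESTART }

    /** Assumed aggregation: the overall state is the worst individual state. */
    static State worstOf(List<State> states) {
        State worst = State.COMPLETED;
        for (State s : states) {
            if (s.compareTo(worst) > 0) {
                worst = s;
            }
        }
        return worst;
    }

    /** Mirrors the branches of run(): reset, keep waiting, halfway dump, or restart. */
    static Action decide(State overall, boolean waitedHalf) {
        switch (overall) {
            case COMPLETED:
                return Action.RESET;
            case WAITING:
                return Action.KEEP_WAITING;
            case WAITED_HALF:
                // The halfway stack dump happens only once per hang.
                return waitedHalf ? Action.KEEP_WAITING : Action.DUMP_HALFWAY_STACKS;
            default:
                return Action.DUMP_AND_RESTART;
        }
    }
}
```

For example, if one checker reports WAITED_HALF while the others are COMPLETED, the round as a whole is WAITED_HALF and the first such round triggers the halfway stack dump.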
References
Android 7.0 Watchdog mechanism