Contents

    • Android WatchDog
      • WatchDog initialization
      • HandlerChecker overview
      • WatchDog detection logic
      • References

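In early embedded systems, the WatchDog was designed as the last line of defense when the software ran away (hung or crashed): it restarted the device. That is a bit brutal, but a restart usually works around many intermittent bugs, at least temporarily.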

A WatchDog design generally needs the following three functions:

  • A feeding mechanism
  • Dumping exception logs
  • Recovering from the exception

The feeding mechanism comes in two forms:

  • Passive - waits for the system to feed it "food"
  • Active - checks on its own whether there is "food"

Either way, when no "food" reaches the WatchDog, an exception is triggered: the WatchDog dumps the exception logs and then tries to recover.

In early embedded systems the WatchDog was usually a hardware device, so the software system fed it (the passive approach).

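When the WatchDog is itself implemented in software for a software system, the implementation is far more flexible, so the feeding mechanism can be designed to fit the need.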
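To make the feed / dump / recover cycle concrete, here is a minimal software watchdog with a passive feeding mechanism, written in plain Java so it runs without Android. The class and method names (SoftWatchdog, feed, check) are hypothetical, and the current time is passed in explicitly instead of being read from a clock so the behavior is deterministic. It is a sketch of the general idea, not how Android does it.

```java
// Hypothetical names throughout; this is not an Android API.
final class SoftWatchdog {
    private final long timeoutMs;
    private final Runnable dumpLogs; // the "dump exception logs" step
    private final Runnable recover;  // the "recover from the exception" step
    private long lastFedMs;

    SoftWatchdog(long timeoutMs, long nowMs, Runnable dumpLogs, Runnable recover) {
        this.timeoutMs = timeoutMs;
        this.lastFedMs = nowMs;
        this.dumpLogs = dumpLogs;
        this.recover = recover;
    }

    // Passive feeding: the monitored system calls this while it is healthy.
    synchronized void feed(long nowMs) {
        lastFedMs = nowMs;
    }

    // Called periodically by the watchdog itself; returns true if it fired.
    synchronized boolean check(long nowMs) {
        if (nowMs - lastFedMs <= timeoutMs) {
            return false; // fed recently enough
        }
        dumpLogs.run();
        recover.run();
        lastFedMs = nowMs; // start a fresh window after recovering
        return true;
    }
}
```

An active variant would drop feed() and have the watchdog probe the system by itself; Android's HandlerChecker, described below, works that way.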

Android WatchDog

Android also has a WatchDog. It is mainly used to monitor the service threads inside system_server. While system_server starts its services, it also initializes and starts the WatchDog:

```java
private void startOtherServices() {
    ...
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.init(context, mActivityManagerService);
    ...
    mActivityManagerService.systemReady(new Runnable() {
        @Override
        public void run() {
            ...
            Watchdog.getInstance().start();
        }
    });
}
```

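init is called first, and the WatchDog is started once AMS.systemReady has finished. So how do you register a thread or a monitor callback with the WatchDog? Take the registration code in AMS as an example: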

```java
public ActivityManagerService(Context systemContext) {
    ...
    Watchdog.getInstance().addMonitor(this);
    Watchdog.getInstance().addThread(mHandler);
}
```

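Just before the constructor returns, AMS registers itself as a monitor callback and adds the handler bound to the thread it wants monitored.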
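The monitor callback that AMS registers is essentially a lock probe: in the AOSP sources, AMS's monitor() just enters and leaves synchronized (this), so it only returns when nobody is holding the AMS lock. The plain-Java sketch below imitates that pattern. Monitor, LockedService and MonitorProbe are hypothetical names, and the helper thread with join(timeout) is a simplified stand-in for the WatchDog's own timeout handling.

```java
import java.util.concurrent.CountDownLatch;

// Same shape as Watchdog.Monitor: one method that returns once the check passes.
interface Monitor {
    void monitor();
}

// Hypothetical service; like AMS, its monitor() only tries to take its main lock.
final class LockedService implements Monitor {
    final Object lock = new Object();

    @Override
    public void monitor() {
        synchronized (lock) {
            // Intentionally empty: acquiring the lock is the whole check.
        }
    }
}

final class MonitorProbe {
    // Runs m.monitor() on a helper thread and reports whether it returned in time.
    static boolean completesWithin(Monitor m, long timeoutMs) {
        Thread t = new Thread(m::monitor, "monitor-probe");
        t.setDaemon(true); // a stuck probe must not keep the JVM alive
        t.start();
        try {
            t.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    // Demo helper: probes svc while another thread holds its lock.
    static boolean completesWhileLockHeld(LockedService svc, long timeoutMs) {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (svc.lock) {
                held.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                }
            }
        });
        holder.start();
        try {
            held.await();
            return completesWithin(svc, timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the holder exit either way
        }
    }
}
```

An empty synchronized block is enough here: if some thread deadlocks while holding the lock, monitor() never returns, and that is exactly what the WatchDog wants to notice.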

WatchDog initialization

Now let's analyze the code, starting with the WatchDog constructor:

```java
public class Watchdog extends Thread {
    private Watchdog() {
        super("watchdog");
        // Initialize handler checkers for each common thread we want to check.  Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.

        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));
    }
    ...
}
```

WatchDog extends Thread. Its constructor mainly initializes:

  • mMonitorChecker - the HandlerChecker bound to the thread on which the monitor callbacks run (the foreground thread)
  • mHandlerCheckers - the list of preset HandlerCheckers

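A HandlerChecker watches whether the thread bound to its Handler stays stuck longer than a timeout, which is configured in the constructor. This is its default behavior, and it is built on Android's Handler/Looper mechanism.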

Beyond the default behavior, custom checks can be added by registering monitor callbacks on a HandlerChecker.

All monitor callbacks registered with the WatchDog (via addMonitor) are stored in mMonitorChecker.
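The bookkeeping described above can be pictured as a list of checkers in which the first entry doubles as the monitor checker. The sketch below is a plain-Java stand-in that only models the data structure: CheckerRegistry, Checker and the String thread name are hypothetical simplifications of Watchdog, HandlerChecker and Handler, and nothing is actually scheduled or run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Watchdog's bookkeeping; no Android types involved.
final class CheckerRegistry {
    interface Monitor {
        void monitor();
    }

    static final class Checker {
        final String name;
        final long timeoutMs;
        final List<Monitor> monitors = new ArrayList<>();

        Checker(String name, long timeoutMs) {
            this.name = name;
            this.timeoutMs = timeoutMs;
        }
    }

    final List<Checker> checkers = new ArrayList<>();
    final Checker monitorChecker;

    CheckerRegistry(long defaultTimeoutMs) {
        // Like mMonitorChecker: the foreground-thread checker also runs all monitors.
        monitorChecker = new Checker("foreground thread", defaultTimeoutMs);
        checkers.add(monitorChecker);
    }

    // addMonitor(): every monitor lands on the single monitor checker.
    synchronized void addMonitor(Monitor m) {
        monitorChecker.monitors.add(m);
    }

    // addThread(): each monitored thread gets a checker of its own.
    synchronized void addThread(String threadName, long timeoutMs) {
        checkers.add(new Checker(threadName, timeoutMs));
    }
}
```

The asymmetry matters: all monitors share one thread, so a single slow monitor() delays every monitor after it, while each added thread is checked independently.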

HandlerChecker overview

The core of the HandlerChecker implementation:

  1. Post a message to the front of the message queue

```java
public void scheduleCheckLocked() {
    // No monitor callbacks and the looper is idle: mark the check completed and return
    if (mMonitors.size() == 0 && mHandler.getLooper().isIdling()) {
        // If the target looper is or just recently was idling, then
        // there is no reason to enqueue our checker on it since that
        // is as good as it not being deadlocked.  This avoid having
        // to do a context switch to check the thread.  Note that we
        // only do this if mCheckReboot is false and we have no
        // monitors, since those would need to be executed at this point.
        mCompleted = true;
        return;
    }

    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }

    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    // Insert the message at the front of the queue
    mHandler.postAtFrontOfQueue(this);
}
```
  2. The runnable attached to the message is executed

```java
public void run() {
    final int size = mMonitors.size();
    // Run the monitor callbacks one after another
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }

    // Mark the check as completed
    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
```
  3. Get the completion state

```java
public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTime;
        if (latency < mWaitMax/2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}
```
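The state depends only on whether the check has finished and how long it has been outstanding, so the rule can be restated as a pure function. In the sketch below the class name CompletionState is hypothetical, the constant values (0 to 3) are assumed to match AOSP's, and the 60-second timeout in the test is only illustrative.

```java
// Restates the thresholds of getCompletionStateLocked() as a pure function.
final class CompletionState {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    static int of(boolean completed, long startMs, long nowMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        long latency = nowMs - startMs;
        if (latency < waitMaxMs / 2) {
            return WAITING;      // less than half the timeout has passed
        }
        if (latency < waitMaxMs) {
            return WAITED_HALF;  // past the halfway mark, not yet overdue
        }
        return OVERDUE;          // the full timeout has elapsed
    }
}
```

With a 60 s timeout, a check posted 45 s ago and still pending reports WAITED_HALF, which is the point where the WatchDog loop dumps stacks for the first time.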

From the code above, once scheduleCheckLocked() has been called, only two situations can keep the HandlerChecker from reaching the COMPLETED state:

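  • The thread bound to the HandlerChecker is blocked, so the runnable of the posted message is not executed within the timeout
  • The runnable does run, but a monitor callback is registered and that callback does not return within the timeout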
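Both failure modes can be reproduced without Android by emulating a Looper with a deque drained by a single worker thread, where offerFirst plays the role of postAtFrontOfQueue. MiniLooper and MiniChecker are hypothetical names, and the sketch leaves out the idle-looper shortcut and the timing fields of the real HandlerChecker.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// A toy single-threaded "looper": one worker thread draining a deque.
final class MiniLooper {
    private final LinkedBlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();

    MiniLooper(String name) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    queue.takeFirst().run();
                }
            } catch (InterruptedException e) {
                // interrupted: stop looping
            }
        }, name);
        worker.setDaemon(true);
        worker.start();
    }

    void post(Runnable r) { queue.offerLast(r); }

    // Stand-in for Handler.postAtFrontOfQueue(): jump ahead of pending messages.
    void postAtFront(Runnable r) { queue.offerFirst(r); }
}

final class MiniChecker implements Runnable {
    private final MiniLooper looper;
    private final Runnable monitor; // optional monitor callback, may be null
    private volatile boolean completed = true;

    MiniChecker(MiniLooper looper, Runnable monitor) {
        this.looper = looper;
        this.monitor = monitor;
    }

    void schedule() {
        if (!completed) {
            return; // a check is already in flight
        }
        completed = false;
        looper.postAtFront(this);
    }

    @Override
    public void run() {
        if (monitor != null) {
            monitor.run(); // a stuck monitor keeps completed == false
        }
        completed = true;
    }

    // Polls the flag; returns true if the check completed within timeoutMs.
    boolean awaitCompleted(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!completed) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return completed;
            }
        }
        return true;
    }
}
```

Posting to the front only helps when the thread is healthy but has a long queue. If the thread is stuck inside one message, or a monitor is stuck on a lock, nothing runs the rest of the checker and its completed flag stays false until the timeout expires.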

WatchDog detection logic

As mentioned above, WatchDog is itself a thread, and checking starts once that thread is started. Let's go straight to the code:

```java
    @Override
    public void run() {
        boolean waitedHalf = false;
        while (true) {
            final ArrayList<HandlerChecker> blockedCheckers;
            final String subject;
            final boolean allowRestart;
            int debuggerWasConnected = 0;
            synchronized (this) {
                // Check interval; half a minute by default
                long timeout = CHECK_INTERVAL;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                // Walk the HandlerCheckers and schedule a check on each one
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        // Suspend the thread for the rest of the interval
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    // The monitors have returned; reset
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    // still waiting but within their configured intervals; back off and recheck
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        // We've waited half the deadlock-detection interval.  Pull a stack
                        // trace and wait another half.
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                NATIVE_STACKS_OF_INTEREST);
                        waitedHalf = true;
                    }
                    continue;
                }

                // Overdue
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
            }

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.
            EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

            ArrayList<Integer> pids = new ArrayList<Integer>();
            pids.add(Process.myPid());
            if (mPhonePid > 0) pids.add(mPhonePid);
            // Pass !waitedHalf so that just in case we somehow wind up here without having
            // dumped the halfway stacks, we properly re-initialize the trace file.
            final File stack = ActivityManagerService.dumpStackTraces(
                    !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);

            // Give some extra time to make sure the stack traces get written.
            // The system's been hanging for a minute, another second or two won't hurt much.
            SystemClock.sleep(2000);

            // Pull our own kernel thread stacks as well if we're configured for that
            if (RECORD_KERNEL_THREADS) {
                dumpKernelStackTraces();
            }

            // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
            doSysRq('w');
            doSysRq('l');

            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                subject, null, stack, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}

            IActivityController controller;
            synchronized (this) {
                controller = mController;
            }
            if (controller != null) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                // ... (the rest of this branch is truncated in the source listing;
                // it ends by killing system_server:)
                Process.killProcess(Process.myPid());
                System.exit(10);
            }
        }
    }
```

The overall logic is easy to see from the code:

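  1. An infinite loop repeats the check over and over
  2. Before each check, it iterates over all HandlerCheckers and calls scheduleCheckLocked on each one
  3. It calls wait with a timeout to suspend the thread for one check interval
  4. When the interval is up, the thread resumes and calls evaluateCheckerCompletionLocked to get the overall state of the HandlerCheckers. OVERDUE means at least one check has not completed in time (on WAITED_HALF, the stacks are dumped once and the loop keeps waiting)
  5. ActivityManagerService.dumpStackTraces saves the stack traces
  6. mActivity.addErrorToDropBox saves the error log to DropBox
  7. Process.killProcess(Process.myPid()) and System.exit(10) kill the system_server process, which triggers a soft reboot of the Android device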
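The steps above boil down to one small decision per loop iteration. The plain-Java sketch below isolates it: the constants assume AOSP's ordering (COMPLETED=0 through OVERDUE=3), evaluate() assumes, like evaluateCheckerCompletionLocked, that the worst state across all checkers wins, and the Action enum plus the two boolean guards are hypothetical simplifications of the debugger and mAllowRestart checks in the listing. The activity-controller path is left out.

```java
import java.util.List;

// Sketch of one watchdog loop iteration's decision; all names are hypothetical.
final class WatchdogDecision {
    static final int COMPLETED = 0, WAITING = 1, WAITED_HALF = 2, OVERDUE = 3;

    enum Action { CONTINUE, DUMP_HALF_STACKS, DUMP_ONLY, DUMP_AND_KILL }

    // The worst (largest) state among all checkers decides the outcome.
    static int evaluate(List<Integer> states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }

    // waitedHalf mirrors the loop flag: halfway stacks are dumped only once per hang.
    static Action decide(int waitState, boolean waitedHalf,
                         boolean debuggerConnected, boolean allowRestart) {
        switch (waitState) {
            case COMPLETED:
            case WAITING:
                return Action.CONTINUE;
            case WAITED_HALF:
                return waitedHalf ? Action.CONTINUE : Action.DUMP_HALF_STACKS;
            default:
                // Overdue: logs are always collected, but the kill can be vetoed.
                return (debuggerConnected || !allowRestart)
                        ? Action.DUMP_ONLY : Action.DUMP_AND_KILL;
        }
    }
}
```

Splitting the halfway dump from the final kill gives two chances to capture stacks: one while the hang may still clear by itself, and one right before system_server is killed.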

References

Android7.0 Watchdog机制 (Android 7.0 Watchdog mechanism)
