Introduction to the Android WatchDog

Table of contents

- Android WatchDog
  - WatchDog initialization
  - Introduction to HandlerChecker
  - WatchDog detection logic
  - References

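The watchdog originated in early embedded systems as a last-resort safeguard for software that has run away: it reboots the device. That is heavy-handed, but a reboot usually clears many intermittent bugs, at least temporarily.

A watchdog design generally needs the following three capabilities:

- A feeding mechanism
- Dumping diagnostic logs
- Recovery from the fault

The feeding mechanism comes in two forms:

- Passive - the watchdog waits for the system to feed it "food"
- Active - the watchdog itself checks whether "food" is available

Either way, when no "food" reaches the watchdog, it raises a fault, dumps diagnostic logs, and then attempts recovery.

In early embedded systems the watchdog was usually a hardware device, so it relied on the software system to feed it.

A watchdog implemented for a software system is more flexible, so its feeding mechanism can be implemented as needed.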
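The passive feeding model described above can be sketched in a few lines of plain Java. This is only an illustration: the class `SimpleWatchdog` and its methods are hypothetical names, not part of Android, and the current time is passed in explicitly so that the behavior is deterministic.

```java
/** Hypothetical passive watchdog: the monitored code must call feed() regularly. */
class SimpleWatchdog {
    private final long timeoutMillis;
    private long lastFedAt;

    SimpleWatchdog(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastFedAt = nowMillis;
    }

    /** Called by the monitored system to signal that it is still alive. */
    synchronized void feed(long nowMillis) {
        lastFedAt = nowMillis;
    }

    /** Returns true when no feed arrived within the timeout, i.e. the watchdog should fire. */
    synchronized boolean isExpired(long nowMillis) {
        return nowMillis - lastFedAt >= timeoutMillis;
    }
}
```

A real watchdog would react to `isExpired` returning true by dumping logs and attempting recovery; the sketch stops at detection.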

Android WatchDog

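Android has a WatchDog as well. Its main job is to monitor the threads of the services running inside system_server. While system_server starts its services during initialization, it also initializes, configures, and starts the WatchDog: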

    private void startOtherServices() {
        ...
        final Watchdog watchdog = Watchdog.getInstance();
        watchdog.init(context, mActivityManagerService);
        ...
        mActivityManagerService.systemReady(new Runnable() {
            @Override
            public void run() {
                ...
                // Start the watchdog thread once the system is ready
                Watchdog.getInstance().start();
            }
        });
    }

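init is called first, and the WatchDog is started once AMS.systemReady completes. How are threads or callbacks registered with the WatchDog for monitoring? Take the registration code in AMS (ActivityManagerService) as an example:

    public ActivityManagerService(Context systemContext) {
        ...
        Watchdog.getInstance().addMonitor(this);
        Watchdog.getInstance().addThread(mHandler);
    }

Before the constructor returns, it registers a monitor callback and the handler bound to the thread that should be monitored.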

WatchDog initialization

Let's go through the code, starting with the WatchDog constructor:

    public class Watchdog extends Thread {
        private Watchdog() {
            super("watchdog");
            // Initialize handler checkers for each common thread we want to check.  Note
            // that we are not currently checking the background thread, since it can
            // potentially hold longer running operations with no guarantees about the timeliness
            // of operations there.

            // The shared foreground thread is the main checker.  It is where we
            // will also dispatch monitor checks and do other work.
            mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                    "foreground thread", DEFAULT_TIMEOUT);
            mHandlerCheckers.add(mMonitorChecker);
            // Add checker for main thread.  We only do a quick check since there
            // can be UI running on the thread.
            mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                    "main thread", DEFAULT_TIMEOUT));
            // Add checker for shared UI thread.
            mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                    "ui thread", DEFAULT_TIMEOUT));
            // And also check IO thread.
            mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                    "i/o thread", DEFAULT_TIMEOUT));
            // And the display thread.
            mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                    "display thread", DEFAULT_TIMEOUT));
        }
        ...
    }

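Watchdog extends Thread. Its constructor mainly initializes:

- mMonitorChecker - the HandlerChecker bound to the thread on which monitor callbacks run
- mHandlerCheckers - the list of preset HandlerCheckers

By default, a HandlerChecker detects when the thread bound to a Handler takes too long to run a posted task; the timeout is configurable at construction time. This is built on Android's Handler/Looper mechanism.

Beyond this default behavior, monitor callbacks can be set on a HandlerChecker to add custom checks.

Monitor callbacks registered with the WatchDog are all stored in mMonitorChecker.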
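The idea behind the default check (post a task to the monitored thread and see whether it gets to run) can be illustrated without Android. The following is a minimal sketch, assuming a single-threaded `ExecutorService` as a stand-in for a Handler thread; `ThreadChecker` and `ThreadCheckerDemo` are hypothetical names, and unlike the real HandlerChecker the probe is appended to the end of the queue rather than the front.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical plain-Java analogue of a HandlerChecker; not the Android class. */
class ThreadChecker implements Runnable {
    private final ExecutorService target;   // stands in for the monitored Handler thread
    private volatile boolean completed = true;

    ThreadChecker(ExecutorService target) {
        this.target = target;
    }

    /** Enqueue a probe on the monitored thread unless one is already in flight. */
    void scheduleCheck() {
        if (!completed) {
            return;
        }
        completed = false;
        target.execute(this);
    }

    /** Runs on the monitored thread; reaching this point proves the thread is responsive. */
    @Override
    public void run() {
        completed = true;
    }

    boolean isCompleted() {
        return completed;
    }
}

class ThreadCheckerDemo {
    /** Returns {state while the thread is blocked, state after it is released}. */
    static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        ThreadChecker checker = new ThreadChecker(worker);
        try {
            // Occupy the worker thread until the latch is released.
            worker.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            checker.scheduleCheck();
            boolean whileBlocked = checker.isCompleted();   // false: the probe cannot run yet
            release.countDown();
            worker.shutdown();
            worker.awaitTermination(5, TimeUnit.SECONDS);
            boolean afterRelease = checker.isCompleted();   // true: the probe has run
            return new boolean[] {whileBlocked, afterRelease};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A watchdog built on this would poll `isCompleted()` after a timeout; a probe that is still pending means the monitored thread is stuck behind earlier work.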

Introduction to HandlerChecker

The core of the HandlerChecker implementation:

Posting a message to the front of the message queue:

        public void scheduleCheckLocked() {
            // No monitor callbacks and the looper is idle: mark as completed and return
            if (mMonitors.size() == 0 && mHandler.getLooper().isIdling()) {
                // If the target looper is or just recently was idling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread.  Note that we
                // only do this if mCheckReboot is false and we have no
                // monitors, since those would need to be executed at this point.
                mCompleted = true;
                return;
            }

            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            // Post the message at the front of the queue
            mHandler.postAtFrontOfQueue(this);
        }

Running the runnable attached to the message:

        public void run() {
            final int size = mMonitors.size();
            // Run the monitor callbacks
            for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
            }

            // Mark the check as completed
            synchronized (Watchdog.this) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }

Querying the completion state:

        public int getCompletionStateLocked() {
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }

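As the code above shows, once scheduleCheckLocked() has been called, only two things can keep a HandlerChecker from reaching COMPLETED:

- The thread associated with the HandlerChecker is blocked, so the runnable attached to the posted message is not executed within the timeout
- The runnable does run, but monitor callbacks are registered and a monitor callback takes longer than the timeout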
WatchDog detection logic

As noted above, the WatchDog is itself a thread, and detection begins once that thread is started. Here is the code:

    @Override
    public void run() {
        boolean waitedHalf = false;
        while (true) {
            final ArrayList<HandlerChecker> blockedCheckers;
            final String subject;
            final boolean allowRestart;
            int debuggerWasConnected = 0;
            synchronized (this) {
                // Check interval, half a minute by default
                long timeout = CHECK_INTERVAL;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                // Iterate over the HandlerCheckers and schedule a check on each
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        // Block the watchdog thread
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    // The monitors have returned; reset
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    // still waiting but within their configured intervals; back off and recheck
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        // We've waited half the deadlock-detection interval.  Pull a stack
                        // trace and wait another half.
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                NATIVE_STACKS_OF_INTEREST);
                        waitedHalf = true;
                    }
                    continue;
                }

                // Timed out
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
            }

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.
            EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

            ArrayList<Integer> pids = new ArrayList<Integer>();
            pids.add(Process.myPid());
            if (mPhonePid > 0) pids.add(mPhonePid);
            // Pass !waitedHalf so that just in case we somehow wind up here without having
            // dumped the halfway stacks, we properly re-initialize the trace file.
            final File stack = ActivityManagerService.dumpStackTraces(
                    !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);

            // Give some extra time to make sure the stack traces get written.
            // The system's been hanging for a minute, another second or two won't hurt much.
            SystemClock.sleep(2000);

            // Pull our own kernel thread stacks as well if we're configured for that
            if (RECORD_KERNEL_THREADS) {
                dumpKernelStackTraces();
            }

            // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
            doSysRq('w');
            doSysRq('l');

            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                subject, null, stack, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}

            IActivityController controller;
            synchronized (this) {
                controller = mController;
            }
            if (controller != null) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                // The listing is cut off inside this loop; its body is elided here.
                for (int i=0; i<blockedCheckers.size(); i++) {
                    ...
                }
                ...
                // Kill system_server, as described in the summary that follows
                Process.killProcess(Process.myPid());
                System.exit(10);
            }
            ...
        }
    }

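The overall logic is easy to read off the code:

- An infinite loop makes the check repeat
- Before each check, it iterates over all HandlerCheckers and calls scheduleCheckLocked on each
- It calls wait with a timeout to suspend the watchdog thread for one check interval
- When the wait is over, the thread resumes and calls evaluateCheckerCompletionLocked to obtain the final combined state of the HandlerCheckers; a result of OVERDUE means at least one check did not complete in time
- It saves stack traces by calling ActivityManagerService.dumpStackTraces
- It saves the error log to the dropbox through mActivity.addErrorToDropBox
- It kills the system_server process with Process.killProcess(Process.myPid()) and System.exit(10), which triggers a soft reboot of the Android device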
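The decision made on each pass of the loop can be condensed into a small, testable sketch. It is written in plain Java under stated assumptions: the body of evaluateCheckerCompletionLocked is not shown in the article, so combining the per-checker states by taking the worst (largest) one is an assumption, as are the numeric state values. `WatchdogLoopSketch` and its `Action` enum are hypothetical names.

```java
import java.util.List;

class WatchdogLoopSketch {
    // Assumed numeric values, ordered so that a larger value means a worse state.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    enum Action { RESET, KEEP_WAITING, DUMP_HALFWAY_STACKS, DUMP_AND_KILL }

    /** Assumed aggregation: the overall state is the worst state of any checker. */
    static int evaluate(List<Integer> checkerStates) {
        int state = COMPLETED;
        for (int s : checkerStates) {
            state = Math.max(state, s);
        }
        return state;
    }

    /** Maps the overall state to what the loop does next, mirroring the branches in run(). */
    static Action decide(List<Integer> checkerStates, boolean waitedHalf) {
        switch (evaluate(checkerStates)) {
            case COMPLETED:
                return Action.RESET;
            case WAITING:
                return Action.KEEP_WAITING;
            case WAITED_HALF:
                // Stack traces are dumped only the first time the halfway mark is reached.
                return waitedHalf ? Action.KEEP_WAITING : Action.DUMP_HALFWAY_STACKS;
            default:
                return Action.DUMP_AND_KILL;
        }
    }
}
```

For example, if the main thread checker is COMPLETED but the foreground thread checker is OVERDUE, `decide` returns `DUMP_AND_KILL`: one blocked thread is enough to trigger the dump and the restart.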
   
References

Android 7.0 Watchdog mechanism (Android7.0 Watchdog机制)



