Android Watchdog Framework: Analysis, a Deadlock Case, and Modification (Part 2)
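Following the introduction to the Watchdog (WTD) in the previous article, this part looks at how WTD behaves in a real deadlock and how it can be modified.
Recently I ran into an Android device that stayed on the boot animation indefinitely. The trace file showed a deadlock. The relevant excerpt is: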
"main" prio=5 tid=1 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x4c20f360 self=0x71e1ade0
  | sysTid=519 nice=-2 sched=0/0 cgrp=apps handle=1878216768
  | state=S schedstat=( 736667963 56924727 1529 ) utm=62 stm=11 core=0
  at com.android.server.am.ActivityManagerService.registerReceiver(ActivityManagerService.java:~13326)
  - waiting to lock <0x4c6b2630> (a com.android.server.am.ActivityManagerService) held by tid=27 (InputDispatcher)
  at android.app.ContextImpl.registerReceiverInternal(ContextImpl.java:1473)
  at android.app.ContextImpl.registerReceiver(ContextImpl.java:1441)
  at com.android.server.power.PowerManagerService.systemReady(PowerManagerService.java:494)
  at com.android.server.ServerThread.initAndLoop(SystemServer.java:1050)
  at com.android.server.SystemServer.main(SystemServer.java:1371)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:515)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:794)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:610)
  at dalvik.system.NativeStart.main(Native Method)

"InputDispatcher" prio=10 tid=27 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x4c9c7d60 self=0x72010e50
  | sysTid=554 nice=-8 sched=0/0 cgrp=apps handle=1912287104
  | state=S schedstat=( 1007065539 96683590 71214 ) utm=22 stm=78 core=0
  at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManagerInternal(PowerManagerService.java:~2206)
  - waiting to lock <0x4c6a8af0> (a java.lang.Object) held by tid=1 (main)
  at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManager(PowerManagerService.java:2199)
  at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedInner(WindowManagerService.java:9818)
  at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedLoop(WindowManagerService.java:8566)
  at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLocked(WindowManagerService.java:8508)
  at com.android.server.wm.WindowManagerService.setNewConfiguration(WindowManagerService.java:3847)
  at com.android.server.am.ActivityManagerService.updateConfigurationLocked(ActivityManagerService.java:14490)
  at com.android.server.am.ActivityManagerService.updateConfiguration(ActivityManagerService.java:14375)
  at com.android.server.wm.WindowManagerService.sendNewConfiguration(WindowManagerService.java:6725)
  at com.android.server.wm.InputMonitor.notifyConfigurationChanged(InputMonitor.java:325)
  at com.android.server.input.InputManagerService.notifyConfigurationChanged(InputManagerService.java:1275)
  at dalvik.system.NativeStart.run(Native Method)
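The trace clearly shows that the main and InputDispatcher threads are deadlocked on each other: main waits for the ActivityManagerService lock held by InputDispatcher, while InputDispatcher waits for a PowerManagerService lock held by main. The call stacks show that both threads are inside the AMS and PMS services. According to the analysis in the previous article, AMS and PMS are both registered with WTD for monitoring, so why did WTD not detect the deadlock?
Go back to the previous article and look at the startup flow of AMS and PMS, and at the point where WTD is started: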
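The lock-order inversion in the trace, and the kind of check that would expose it, can be reproduced in plain Java. The following is only a sketch under stated assumptions: `LockOrderDeadlockSketch`, `amsLock`, `pmsLock`, `createDeadlock` and `isLockStuck` are hypothetical names, and the two `Object` locks merely stand in for the AMS monitor and the PMS lock object. The check mimics the idea behind a service's `monitor()` callback (briefly acquire the service lock from another thread and see whether that completes in time). It is not the actual Watchdog implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LockOrderDeadlockSketch {
    // Hypothetical stand-ins for the two locks seen in the trace.
    final Object amsLock = new Object(); // plays the ActivityManagerService monitor
    final Object pmsLock = new Object(); // plays the PowerManagerService lock object

    /** Starts two daemon threads that take the locks in opposite order and deadlock. */
    void createDeadlock() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        // "main" in the trace: holds the PMS lock, then wants the AMS lock.
        startDaemon("toy-main", () -> lockInOrder(pmsLock, amsLock, bothHoldFirst));
        // "InputDispatcher" in the trace: holds the AMS lock, then wants the PMS lock.
        startDaemon("toy-InputDispatcher", () -> lockInOrder(amsLock, pmsLock, bothHoldFirst));
        await(bothHoldFirst); // return once each thread holds its first lock
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch bothHoldFirst) {
        synchronized (first) {
            bothHoldFirst.countDown();
            if (!await(bothHoldFirst)) return;
            synchronized (second) { } // never entered: the other thread holds it
        }
    }

    /** Monitor-style check: true if the lock could not be acquired within timeoutMs. */
    static boolean isLockStuck(Object lock, long timeoutMs) {
        CountDownLatch acquired = new CountDownLatch(1);
        startDaemon("toy-checker", () -> {
            synchronized (lock) { } // same idea as synchronized (this) {} in a monitor()
            acquired.countDown();
        });
        try {
            return !acquired.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return true;
        }
    }

    private static boolean await(CountDownLatch latch) {
        try {
            latch.await();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void startDaemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        t.start();
    }
}
```

After `createDeadlock()` returns, `isLockStuck(amsLock, 200)` and `isLockStuck(pmsLock, 200)` both report a stuck lock. The check only helps if something is actually running it, which is exactly the point of the question above.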
public void initAndLoop() {
    try {
        // Wait for installd to finished starting up so that it has a chance to
        // create critical directories such as /data/user with the appropriate
        // permissions. We need this to complete before we initialize other services.
        Slog.i(TAG, "Waiting for installd to be ready.");
        installer = new Installer();
        installer.ping();

        Slog.i(TAG, "Power Manager");
        power = new PowerManagerService();
        ServiceManager.addService(Context.POWER_SERVICE, power);

        Slog.i(TAG, "Activity Manager");
        context = ActivityManagerService.main(factoryTest);
    } catch (RuntimeException e) {
        Slog.e("System", "******************************************");
        Slog.e("System", "************ Failure starting bootstrap service", e);
    }

    // only initialize the power service after we have started the
    // lights service, content providers and the battery service.
    power.init(context, lights, ActivityManagerService.self(), battery,
            BatteryStatsService.getService(),
            ActivityManagerService.self().getAppOpsService(), display);

    Slog.i(TAG, "Init Watchdog");
    Watchdog.getInstance().init(context, battery, power, alarm,
            ActivityManagerService.self());
    Watchdog.getInstance().addThread(wmHandler, "WindowManager thread");

    try {
        power.systemReady(twilight, dreamy);  // PMS.systemReady() runs before WTD is started
    } catch (Throwable e) {
        reportWtf("making Power Manager Service ready", e);
    }

    ActivityManagerService.self().systemReady(new Runnable() {
        public void run() {
            Watchdog.getInstance().start();  // the WTD thread is only started here

SystemServer.java shows that the WTD thread is started only after many services have been registered. If a deadlock occurs during service registration, WTD is never started and cannot detect it. That explains why the deadlock in the trace above went undetected. The next step is to find a fix. I can think of roughly three options:
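1. Start WTD earlier, that is, start it right after it is instantiated. When the deadlock above occurs, WTD can then detect it and kill the deadlocked system_server process.
2. Use a ReentrantLock mutex in AMS and PMS. Based on where the trace shows the deadlock, guard the functions involved with this lock: while PMS systemReady holds the lock, setScreenBrightnessOverrideFromWindowManager does not block waiting for it, which avoids the deadlock.
3. Block InputManagerService.notifyConfigurationChanged during service registration. I consider this less suitable than option 2. The deadlock occurs because a USB input device is attached to the system, and since USB is hot-pluggable, the time at which the device registers cannot be controlled, which is what led to the deadlock above.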
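Option 2 can be sketched with `ReentrantLock.tryLock`. This is a minimal illustration under assumptions, not the real PowerManagerService code: the class `PowerStateSketch`, the method `trySetScreenBrightnessOverride` and the `waitMs` parameter are hypothetical names. The idea is that the brightness override path gives up after a bounded wait instead of blocking forever while the `systemReady` path holds the lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class PowerStateSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private int brightnessOverride = -1; // -1 means "no override", an assumption of this sketch

    /** Stand-in for systemReady(): runs the given work while holding the lock. */
    void systemReady(Runnable work) {
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Stand-in for setScreenBrightnessOverrideFromWindowManager().
     * Returns false instead of blocking if the lock is still busy after waitMs.
     */
    boolean trySetScreenBrightnessOverride(int brightness, long waitMs) {
        boolean locked = false;
        try {
            locked = lock.tryLock(waitMs, TimeUnit.MILLISECONDS);
            if (!locked) {
                return false; // caller has to retry later, the update is not applied
            }
            brightnessOverride = brightness;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            if (locked) {
                lock.unlock();
            }
        }
    }

    int getBrightnessOverride() {
        lock.lock();
        try {
            return brightnessOverride;
        } finally {
            lock.unlock();
        }
    }
}
```

The trade-off is that a rejected update has to be retried or deferred by the caller, otherwise the brightness override is silently lost. That extra bookkeeping is part of the cost of this option.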
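This article focuses on option 1: starting WTD earlier. The patch below implements that idea. Judging from the WTD source code, the first thing to consider is system stability: whether starting WTD earlier affects how services registered later use WTD, and whether WTD accesses resources that are not yet available at that point.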
--- a/frameworks/base/services/java/com/android/server/SystemServer.java
+++ b/frameworks/base/services/java/com/android/server/SystemServer.java
@@ -351,7 +351,9 @@ class ServerThread {
             Watchdog.getInstance().init(context, battery, power, alarm,
                     ActivityManagerService.self());
             Watchdog.getInstance().addThread(wmHandler, "WindowManager thread");
-
+            Watchdog.getInstance().start();
+
             Slog.i(TAG, "Input Manager");
@@ -1165,8 +1167,8 @@ class ServerThread {
                 } catch (Throwable e) {
                     reportWtf("making Recognition Service ready", e);
                 }
-                Watchdog.getInstance().start();
-
+                //Watchdog.getInstance().start();
                 // It is now okay to let the various system services start their
                 // third party code...
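Taking these concerns together, I believe the problems can be avoided. On top of the patch above, Watchdog.java needs some extra handling. Only a brief description is given here, since the implementation is fairly simple.
1. Remove the thread-state check in the addMonitor and addThread interfaces. Otherwise monitors cannot be added to WTD once it has started.
2. Once WTD has started, the run function contends with addMonitor and addThread for the same lock, and the run function has a long cycle, so the cycle needs to be adjusted while the system is booting.
After reworking the WTD startup sequence according to the points above, the system runs normally and WTD works as expected. I ran 1,000 reboot tests and have seen no adverse effects so far.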
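The two points above can be sketched in plain Java. This is a toy model under assumptions and not the modified Watchdog.java: `EarlyStartWatchdogSketch`, `setCheckInterval` and `hangDetected` are hypothetical names, and a copy-on-write list is swapped in for the real class's synchronization so that registration never waits on the polling loop. Monitors may be added after `start()`, and the check interval can be kept short during boot and lengthened afterwards.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class EarlyStartWatchdogSketch extends Thread {
    /** Same shape as the monitor() callback that services provide to WTD. */
    interface Monitor { void monitor(); }

    // Copy-on-write list: addMonitor() never waits on a lock held by run().
    private final List<Monitor> monitors = new CopyOnWriteArrayList<>();
    // Monitors are polled on a separate daemon thread so a stuck one cannot block run().
    private final ExecutorService checker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "toy-checker");
        t.setDaemon(true);
        return t;
    });
    private volatile long checkIntervalMs;
    private volatile boolean hangDetected;

    EarlyStartWatchdogSketch(long bootIntervalMs) {
        super("toy-watchdog");
        setDaemon(true);
        this.checkIntervalMs = bootIntervalMs;
    }

    /** Point 1: no "already running" check, so late registration is allowed. */
    void addMonitor(Monitor m) { monitors.add(m); }

    /** Point 2: the cycle can be short during boot and raised once boot completes. */
    void setCheckInterval(long intervalMs) { checkIntervalMs = intervalMs; }

    boolean hangDetected() { return hangDetected; }

    @Override
    public void run() {
        while (!hangDetected) {
            // Schedule one pass over all monitors registered so far.
            Future<?> pass = checker.submit(() -> {
                for (Monitor m : monitors) m.monitor();
            });
            try {
                Thread.sleep(checkIntervalMs);
            } catch (InterruptedException e) {
                return;
            }
            // A pass that has not finished within one interval counts as a hang.
            if (!pass.isDone()) hangDetected = true;
        }
    }
}
```

In use, the watchdog would be created with a short boot interval and started immediately. Services call `addMonitor` as they come up, and `setCheckInterval` is called with the normal, longer period once boot has finished. The real Watchdog reacts to a hang by dumping stacks and killing the process, which this sketch reduces to a flag.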