Help tracking down kernel panic... - RAZR HD Q&A, Help & Troubleshooting

Hi Everyone,
I was hoping I could ask a few people to help me out here. Since around Feb 28th, I have been getting random kernel panics on CM11 and on any ROM based on the unified moto8960 tree.
To help me find out whether it's just me or a wider problem, all you need to do is install this app:
https://play.google.com/store/apps/details?id=se.mumu21.bootlog
It will keep track of your uptime. It needs root privileges in order to copy last_kmsg. Trying to view the logs from within the app will make it force close (FC), but that is fine: it still copies the logs, which you can then view from your file manager (by default they are saved to the internal SD card).
Once the app is installed, you need not do anything other than grant it root privileges (head into the app's preferences and enable all 4 check marks). If your phone reboots, there will be a record, and the app can capture the last_kmsg file, which can hopefully be pushed up to the CM team if there's an issue.
The app is smart enough to distinguish between user reboots and unexpected reboots; the latter are listed as "crash" in the logs.
Thanks to anyone who decides to help out!
EDIT:
For reference, I'm including links to my last_kmsg from the two times I caught it, on two separate builds (same error):
http://forum.xda-developers.com/showpost.php?p=51000973&postcount=1567
http://forum.xda-developers.com/showpost.php?p=50707021&postcount=1477
At the end of the day, for me it appears to be here:
Code:
[11746.588339,0] PM: Preparing system for mem sleep
[11746.588522,0] pm_debug: suspend uah=154646
[11746.591849,0] Freezing user space processes ...
[11746.596732,1] msm_server_control: wait_event error -512 for command10
[11746.596855,1] msm_open send open server failed
[11746.624903,0] ov8820_power_down
[11746.670164,0] msm_open: destroy ion client(elapsed 0.08 seconds) done.
[11746.924399,0] Freezing remaining freezable tasks ...
[11746.925681,0] active wake lock alarm_rtc, time left 73
[11746.926291,0]
[11746.926536,0] Freezing of tasks aborted
[11746.930625,0]
[11746.930869,0] Restarting tasks ...
[11746.949212,0] Unable to handle kernel NULL pointer dereference at virtual address 00000000
The log shows the system going into mem sleep successfully plenty of times before; this time it fails. The panic also happens on AOSP builds (moto_msm8960), but there the error is not with msm_server_control but rather:
Code:
[ 5630.051304,1] PM: Preparing system for mem sleep
[ 5630.051548,1] pm_debug: suspend uah=106686
[ 5678.901571,0] [0:986:E :DAT] Set DXE Power state 0
[ 5680.681793,0] [0:986:E :DAT] Set DXE Power state 2
[ 5680.689881,0] [WLAN][987:E :TL ] WLAN TL:No station registered with TL at this point
[ 5690.081214,1] **** Suspend timeout
[ 5690.081366,1] kworker/u:16 D c08131e0 0 22912 2 0x00000000
[ 5690.081671,1] [<c08131e0>] (__schedule+0x780/0x90c) from [<c081457c>] (__mutex_lock_slowpath+0x178/0x1e4)
[ 5690.081824,1] [<c081457c>] (__mutex_lock_slowpath+0x178/0x1e4) from [<c081463c>] (mutex_lock+0x54/0x6c)
[ 5690.082007,1] [<c081463c>] (mutex_lock+0x54/0x6c) from [<c0172480>] (cpu_hotplug_disable_before_freeze+0x8/0x20)
[ 5690.082190,1] [<c0172480>] (cpu_hotplug_disable_before_freeze+0x8/0x20) from [<c01724e0>] (cpu_hotplug_pm_callback+0x28/0x40)
[ 5690.082373,1] [<c01724e0>] (cpu_hotplug_pm_callback+0x28/0x40) from [<c08181c8>] (notifier_call_chain+0x38/0x68)
[ 5690.082526,1] [<c08181c8>] (notifier_call_chain+0x38/0x68) from [<c0193a1c>] (__blocking_notifier_call_chain+0x40/0x54)
[ 5690.082709,1] [<c0193a1c>] (__blocking_notifier_call_chain+0x40/0x54) from [<c0193a44>] (blocking_notifier_call_chain+0x14/0x18)
[ 5690.082892,1] [<c0193a44>] (blocking_notifier_call_chain+0x14/0x18) from [<c01a70d8>] (pm_notifier_call_chain+0x14/0x2c)
[ 5690.083075,1] [<c01a70d8>] (pm_notifier_call_chain+0x14/0x2c) from [<c01a8010>] (enter_state+0x68/0x13c)
[ 5690.083167,1] [<c01a8010>] (enter_state+0x68/0x13c) from [<c01a92f0>] (suspend+0x68/0x180)
[ 5690.083350,1] [<c01a92f0>] (suspend+0x68/0x180) from [<c0189aa8>] (process_one_work+0x228/0x434)
[ 5690.083533,1] [<c0189aa8>] (process_one_work+0x228/0x434) from [<c0189e88>] (worker_thread+0x1a8/0x2c8)
[ 5690.083686,1] [<c0189e88>] (worker_thread+0x1a8/0x2c8) from [<c018e2a8>] (kthread+0x80/0x8c)
[ 5690.083869,1] [<c018e2a8>] (kthread+0x80/0x8c) from [<c0106714>] (kernel_thread_exit+0x0/0x8)
[ 5690.083991,1] Unable to handle kernel NULL pointer dereference at virtual address 00000000
Curious to see if others are hitting the same thing.
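
For anyone who wants to compare their own capture against the two logs above, here is a rough Python sketch that pulls out the lines leading up to the panic. It only looks for the one marker string seen in these logs and assumes the "[seconds.micros,cpu]" prefix shown above; the function names and the SAMPLE list are mine, made up for illustration.

```python
import re

# Marker copied from the logs in this post; other panic signatures are not covered.
PANIC_MARKER = "Unable to handle kernel NULL pointer dereference"

# Matches the "[seconds.micros,cpu]" prefix seen in these last_kmsg dumps.
STAMP = re.compile(r"^\[\s*(\d+\.\d+),(\d+)\]\s?(.*)$")

# A few lines taken from the first log above, used as a small example input.
SAMPLE = [
    "[11746.924399,0] Freezing remaining freezable tasks ...",
    "[11746.925681,0] active wake lock alarm_rtc, time left 73",
    "[11746.926536,0] Freezing of tasks aborted",
    "[11746.930869,0] Restarting tasks ...",
    "[11746.949212,0] Unable to handle kernel NULL pointer dereference at virtual address 00000000",
]


def parse_line(line):
    """Split one log line into (timestamp, cpu, message), or None if it has no prefix."""
    match = STAMP.match(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), match.group(3)


def context_before_panic(lines, window=5):
    """Return up to `window` lines before the first panic marker, plus the marker line."""
    for index, line in enumerate(lines):
        if PANIC_MARKER in line:
            return lines[max(0, index - window): index + 1]
    return []


if __name__ == "__main__":
    for line in context_before_panic(SAMPLE, window=2):
        print(line)
```

To use it on a real capture, read the saved last_kmsg file into a list of lines and pass that in instead of SAMPLE.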

Had a random reboot a few hours ago on cm11 (March 11th). Just installed the app, let's see what we can get now.

CWGSM3VO said:
Hi Everyone,
...
The issue captured in the second log is long known to us.
It's caused by msm-dcvs CPU governor and despite a few already applied kernel patches, it remains problematic.
So the current recommendation is: do not use msm-dcvs CPU governor.
(for reference, see https://github.com/razrqcom-dev-team/android_kernel_motorola_msm8960-common/issues/12 )
The issue captured by the first log is something new and I already experienced it myself today (for the first time).
It's obviously a race condition during suspend/resume of camera kernel drivers.
It needs to be investigated. At this point I have no idea which commit brought this issue to surface.
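
If you are not sure which governor your device is actually running, a small Python sketch like the following can check it. It assumes the usual Linux cpufreq sysfs layout (cpuN/cpufreq/scaling_governor) and that the governor shows up under the name "msm-dcvs"; both should be verified on the device, and the function names are just for illustration.

```python
from pathlib import Path

# Standard cpufreq sysfs location on Linux; assumed to apply to this kernel as well.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_governors(root=CPUFREQ_ROOT):
    """Map each core name (e.g. 'cpu0') to its current scaling governor."""
    governors = {}
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        try:
            governors[gov_file.parent.parent.name] = gov_file.read_text().strip()
        except OSError:
            continue  # the file may be unreadable or vanish for an offline core
    return governors


def cores_using(governors, name="msm-dcvs"):
    """Return the sorted list of cores whose governor equals `name` (assumed spelling)."""
    return sorted(cpu for cpu, gov in governors.items() if gov == name)


if __name__ == "__main__":
    flagged = cores_using(read_governors())
    print("cores on msm-dcvs:", flagged if flagged else "none")
```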

kabaldan said:
...
With regard to the second capture, that makes sense; I knew Epinter's commits had been merged, but I will stay away from that governor again. The explanation of the first is much appreciated. I do know, however, that I've experienced the panic using interactive as well, in an effort to stay as "stock" as possible.
I'll switch back to interactive again (so far so good on 03-12) and go from there.
Thanks again for responding.

Related

[Q] Time resets to 1970 on restart

It's the Nexus 5's turn to be asked about this, hehe.
I've just now experienced this in the course of fixing another issue.
Normally, no matter how often I change ROMs, whenever I set the time manually and uncheck network-derived time, the date/time is maintained across restarts. Now it isn't.
It looks like a 4.4.4 issue? Dunno.
Any help / ideas welcome.
What I did so far:
clean ROM not dirty flashed
switching between stock OMNI and Minux OMNI
all combinations of option in date/ time settings while
Each and every time, the date/time after a reset is 13 Aug 1970. Even if it is set to automatic, that date briefly appears before a connection to the network is made.

Did you find the solution?

Could be related to the RTC not storing the time. The qpnp-rtc-write device tree option controls whether the time can be set with hwclock. If set to 0 or omitted, you will not be able to set the RTC time through the standard RTC interfaces (hwclock -w). Now, apparently these Qualcomm devices have a proprietary time_daemon program which is supposed to store these values permanently. It is invoked through the TimeService.apk program which is some thin glue that calls into libTimeService.so which invokes the time_genoff_operation function. I guess that this function should ultimately update the RTC time, but have not investigated further.
TL;DR check whether your TimeService.apk is working. It should contain a classes.dex file (deodexed). For Android 5.1, note that the files from the factory images are ART-optimized. You can undo that using Riddle's oat2dex.jar tool.
Edit: apparently these Qualcomm RTCs cannot be written from software (at least, not directly). A Qualcomm consultant mentioned this during patch review:
Anirudh Ghayal (Qualcomm consultant) said:
In some of our MSM/QSD designs, we have a single RTC shared by multiple
processors (other than the ones running Linux). Thus, the need to have a
non-writable RTC using this pdata.
Someone already tried to patch the property in the kernel, but writing via hwclock was still not possible. I can confirm this for a Nexus 5 (rev_11, tested with this recovery image patching tool):
Code:
~ # /system/xbin/hwclock -w
hwclock: RTC_SET_TIME: Operation not permitted
<3>[ 25.821247] spmi_pmic_arb fc4cf000.qcom,spmi: pmic_arb_wait_for_done: transaction denied (0x5)
<3>[ 25.821338] qcom,qpnp-rtc qpnp-rtc-ee162000: SPMI write failed
<3>[ 25.821408] qcom,qpnp-rtc qpnp-rtc-ee162000: Write to RTC reg failed
In the end, I decided to accept this problem and be stuck with an RTC set to 1970. Note that this device has not been network-connected since I started experimenting.
As for the TimeService.apk/time_genoff/time_daemon functionality, all these components do is maintain an offset in the /data/system/time/ats_X files. These are read on startup and applied as an offset. It is a nasty hack, and there are open-source alternatives. Really, the only functional thing that this daemon does is read the RTC time and then apply the offsets on top of it. It has no code to write the actual time back to the RTC. It is no magic. (Found by reverse engineering the binaries and libraries.)
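
To make the offset idea concrete, here is a minimal Python sketch of the arithmetic only. It does not read the ats_X files: their on-disk layout is not described here, and the millisecond unit for the offset is purely an assumption for the example, as are the function names.

```python
from datetime import datetime, timezone


def wall_clock(rtc_seconds, offset_ms):
    """Combine a raw RTC reading with a stored offset (assumed to be in ms); result is UTC."""
    return datetime.fromtimestamp(rtc_seconds + offset_ms / 1000, tz=timezone.utc)


def offset_for(rtc_seconds, true_time):
    """Offset (in ms, assumed unit) that would need to be stored so the RTC maps to `true_time`."""
    return round((true_time.timestamp() - rtc_seconds) * 1000)


if __name__ == "__main__":
    # An RTC that was never set and has simply counted up from zero for 224 days.
    rtc = 224 * 86400
    print(wall_clock(rtc, 0).date())  # the uncorrected date, somewhere in 1970

    # Once the real time is known, only the offset is saved; the RTC itself is untouched.
    now = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
    saved = offset_for(rtc, now)
    print(wall_clock(rtc, saved))
```

With no stored offset (or an unreadable one), the uncorrected RTC value is all that is left, which would fit the 1970 dates reported in this thread.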

Lekensteyn said:
...
Hi, great explanation. I didn't have this issue on stock Marshmallow, but now I have it on a custom ROM. Maybe it's related to some permissions?

bart.found said:
...
There is an SELinux policy that must allow access; you can check your kernel logs (dmesg or maybe even logcat) for SELinux denials. The /data/system/time/ location must also be accessible to the time daemon.
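
As a rough way to do that check, the Python sketch below filters saved log text for SELinux denial lines. It relies on the "avc: denied" wording that the kernel's audit messages use; the keyword filter and the sample lines are made up for illustration and are not taken from a real device.

```python
import re

# SELinux denials are logged with "avc:" followed by "denied"; spacing varies between logs.
AVC_DENIED = re.compile(r"avc:\s+denied")


def selinux_denials(log_lines, keyword=None):
    """Return the lines that record an SELinux denial, optionally only those containing `keyword`."""
    hits = [line for line in log_lines if AVC_DENIED.search(line)]
    if keyword is not None:
        hits = [line for line in hits if keyword in line]
    return hits


if __name__ == "__main__":
    # Hypothetical example lines, written only to exercise the filter.
    example = [
        'avc:  denied  { read } for comm="time_daemon" name="ats_1" tclass=file',
        'avc:  denied  { write } for comm="other_proc" name="foo" tclass=file',
        "some unrelated kernel message",
    ]
    for line in selinux_denials(example, keyword="time_daemon"):
        print(line)
```

Feed it the output of dmesg or logcat saved to a file and split into lines.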

Battery and touch issues with Android compiled from Nvidia source

I compiled Android from the source Nvidia made available on their website and, while the process went smoothly, after flashing the images to the device I experienced a large number of issues:
1. The tablet does not go to sleep (shown as Awake in the Battery settings) while not plugged in, leading to fast battery drain. If plugged in, the device is shown as not awake.
2. While not connected to Wi-Fi, the screen becomes completely unresponsive to touch input after the device is turned back on from standby.
3. A lot of error messages of missing libraries and general files are shown in Logcat. A few examples:
E/phs:governor(14217): failed to load "libgov_gpucompute.so": No such file or directory
E/SoundPool(913): error loading /system/media/audio/ui/Lock.ogg
E/ConnectivityService(834): Unexpected mtu value: [email protected]
E/TLK_Daemon(176): IOCTL_FILE_NEW_REQ returned 61
E/wpa_supplicant(1435): wpa_driver_nl80211_driver_cmd: failed to issue private commands
I did scavenge libgov_gpucompute.so from the factory image, but that did not fix any of the symptoms; only the corresponding error message disappeared.
I built every version Nvidia made available and had the same results.
The recovery image does not suffer from these issues.
Any suggestion on how to get a handle on this would be greatly appreciated.

[Troubleshooting]No more random Freezes & Reboot

Many of you (and I) experienced random freezes & reboots but found no solutions, even after flashing so many different ROMs.
Hopefully I've found the cause & the SOLUTION.
I've been flashing custom ROMs ever since I first owned a gadget, but recently, after flashing Gnabo 8, I experienced something strange: random freezes & reboots.
In my case, random reboots occur after 31 hours of usage.
After I analyzed the problem, I concluded that the cause is rogue apps running in the background & hogging the CPU to such an extent that the device can't be used anymore.
Then the freezes & reboots occur.
After 1 month of analyzing further, I found 3 processes in my N8000 that become the Prime Suspects:
System Server
dhd_dcp
kswapd0
I consulted Andi (Lord Boeffla) & Mikyno about this before narrowing the prime suspects down to just 1 rogue process.
My Linux knowledge + Google came in handy here, & I concluded that kswapd0 is the cause(?)
kswapd0 is the kernel thread that reclaims memory when free RAM runs low, writing pages out to swap (a.k.a. virtual memory) when the real RAM is not enough.
On Linux, there is usually a separate partition dedicated to virtual memory, a.k.a. a swap partition, but on Android, especially on our N80XX, I can't find any swap partition.
So how come kswapd0 is active & consumes so much CPU time, causing freezes + reboots, when there is no swap partition?
Further research brought me this information:
In my current ROM (ARHD 21.0) my sysctl.conf is empty, BUT...
using Android Tuner, I find that vm.swappiness is 130!!
vm.swappiness is a setting that controls how readily the kernel swaps RAM contents out to disk/internal memory when it reclaims memory.
vm.swappiness=0 tells the kernel to avoid swapping to disk/internal memory as much as possible.
vm.swappiness=100 tells it to swap to disk/internal memory much more readily.
I don't know how vm.swappiness=130 gets set while sysctl.conf is empty, but I changed it to vm.swappiness=0.
Then I rebooted my N8000.
Now my N8000 has been running without incidents for 3 days & 20 hours straight.
I hope this is the final solution I wanted & I also hope that this information can help others with the same problem too.
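
On the empty sysctl.conf puzzle: sysctl.conf only holds overrides, while the live value sits in /proc/sys/vm/swappiness and can come from the kernel default or from boot scripts. A small Python sketch to compare the two (the helper names are made up, and the location of sysctl.conf on a given ROM is an assumption you would need to check):

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def current_swappiness(proc_file=Path("/proc/sys/vm/swappiness")):
    """Return the live vm.swappiness value, or None if it cannot be read."""
    try:
        return int(proc_file.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # An empty sysctl.conf yields no overrides, yet the kernel still reports a value.
    print("overrides:", parse_sysctl_conf(""))
    print("live value:", current_swappiness())
```

An empty override dictionary next to a non-default live value means something other than sysctl.conf set it.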

Hi Darkknight,
after some amount of time now, can you tell me something about your long-term experience? Has this solved your rebooting issues, or did they come back after a few days?
Thanks in advance!

How to determine the cause for unexpected phone shutdown?

Hi all. I've had a Nokia 8.1 for a few months now. I updated it to Android 10 immediately and it has been rock stable. Until today.
At about 11am I heard an incoming message notification but was busy and decided to look at it later.
At 11:55am I picked up the phone but could not wake it up: tapping didn't work, fingerprint didn't work. Black screen. So I long-pressed the power button and it booted up.
It has been working fine since, but I'm somewhat concerned about the issue.
I collected a zip log file through the "adb bugreport" command.
As I understand, it's tricky to find the cause after the reboot because all log files are generated fresh after boot. I looked in \FS\data\tombstones folders and found three records there, all complaining about the following issue:
Code:
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
Abort message: 'FORTIFY: %n not allowed on Android'
But they are almost a month old and I don't remember anything weird happening at that time; except maybe some Android system update and manual reboot.
Is there any place in that log zip or somewhere else on the device (without root) where I could find some info about what happened right before the unexpected shutdown? Kernel panics? Crashes?

Hmm, nobody? Does it mean that everybody else is just ignoring random shutdowns and have never attempted to debug the problem?

AFAIK logcat output isn't persisted across reboots, and maybe that's one reason nobody replied to this thread. But you can always write the log to a file every time your phone boots (via shell commands etc.) and just check that file once the phone has rebooted.
Regarding kernel log, /proc/last_kmsg or /sys/fs/pstore/console-ramoops might be helpful.
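
A small Python sketch of that lookup: it tries the two paths mentioned above in order and returns the first one that exists. Which of them a device exposes depends on the kernel, and reading them may need elevated privileges; the function name is made up.

```python
import os

# The two locations mentioned above; a given kernel may provide either, both, or neither.
CANDIDATES = ("/proc/last_kmsg", "/sys/fs/pstore/console-ramoops")


def find_previous_kernel_log(candidates=CANDIDATES):
    """Return the first candidate path that exists as a file, or None if none do."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    found = find_previous_kernel_log()
    if found is None:
        print("no previous-boot kernel log found at the known locations")
    else:
        print("previous-boot kernel log:", found)
```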

Android, badblocks and fixing a defective raw storage

Hi, I need some technical advice before throwing my LG G6 H870 in the garbage ...
A few weeks ago, I started getting some random app crashes and a few boot loops while running an updated stock ROM.
A factory reset leads to the same result, and even a reset to a refurbished stock ROM version raises the same errors, so the issue comes from the hardware.
After installing TWRP and formatting the system partition, I tried to identify whether it was an internal storage issue (and spoiler alert: it was).
I knew that I should be able to identify bad blocks and tell the filesystem not to use them, but the problem is the use of a Flash Translation Layer (FTL) by the hardware, which obfuscates the real NAND memory addresses by design.
If I run a one-pass badblocks:
Code:
/dev/block/platform/soc/624000.ufshc/by-name # badblocks -wsvt 0xFF system
Checking for bad blocks in read-write mode
From block 0 to 5726207
Testing with pattern 0xff: done
Reading and comparing: 103464% done, 0:47 elapsed
103466
103468
103470
288888
288890
288892
288894
3197580 done, 0:54 elapsed
3197582
3197584
3197586
3358788
3358790
3358792
3358794
done
Pass completed, 16 bad blocks found.
And if I run a 4-pass badblocks:
Code:
/dev/block/platform/soc/624000.ufshc/by-name # badblocks -wsv system
Checking for bad blocks in read-write mode
From block 0 to 5726207
Testing with pattern 0xaa: done
Reading and comparing: 110968% done, 0:47 elapsed
110970
110972
110974
303396
303398
303400
303402
3205312 done, 0:54 elapsed
3205314
3205316
3205318
3351108
3351110
3351112
3351114
done
Testing with pattern 0x55: done
Reading and comparing: 109356% done, 1:48 elapsed
109358
109360
109362
306392
306394
306396
306398
3206852 done, 1:55 elapsed
3206854
3206856
3206858
3353208
3353210
3353212
3353214
done
Testing with pattern 0xff: done
Reading and comparing: 110848% done, 2:48 elapsed
110850
110852
110854
304640
304642
304644
304646
3206292 done, 2:55 elapsed
3206294
3206296
3206298
3353228
3353230
3353232
3353234
done
Testing with pattern 0x00: done
Reading and comparing: 111992% done, 3:48 elapsed
111994
111996
111998
304668
304670
304672
304674
3215112 done, 3:55 elapsed
3215114
3215116
3215118
3353828
3353830
3353832
3353834
done
Pass completed, 64 bad blocks found.
Surprise (not really): the bad blocks are never the same because of the FTL, and that's probably why the badblocks binary is not part of the standard list of recovery binaries.
But since the hardware, via the FTL, should take care of bad sectors, why is that not the case for my LG G6?
Is there a way to enforce this check at a low level?
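
The point that the reported blocks never repeat can be checked mechanically. The Python sketch below uses the block numbers from the two runs above: each pass reported four runs of four blocks spaced 2 apart, so only the first block of each run is typed in and the rest are expanded. The helper names are made up.

```python
from itertools import combinations

# First block of each reported run, copied from the badblocks output above.
RUN_STARTS = {
    "single 0xff": [103464, 288888, 3197580, 3358788],
    "4-pass 0xaa": [110968, 303396, 3205312, 3351108],
    "4-pass 0x55": [109356, 306392, 3206852, 3353208],
    "4-pass 0xff": [110848, 304640, 3206292, 3353228],
    "4-pass 0x00": [111992, 304668, 3215112, 3353828],
}


def expand(starts, count=4, step=2):
    """Rebuild the full set of reported blocks from the first block of each run."""
    return {start + i * step for start in starts for i in range(count)}


def shared_blocks(passes):
    """Return {(pass_a, pass_b): common blocks} for every pair that shares at least one block."""
    shared = {}
    for a, b in combinations(passes, 2):
        common = passes[a] & passes[b]
        if common:
            shared[(a, b)] = common
    return shared


PASSES = {name: expand(starts) for name, starts in RUN_STARTS.items()}

if __name__ == "__main__":
    for name, blocks in PASSES.items():
        print(name, len(blocks), "blocks")
    print("pairs sharing a block:", shared_blocks(PASSES))
```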
My readings before posting:
https://android.stackexchange.com/q...d-sector-on-my-android-with-adb-recovery-mode
https://android.stackexchange.com/q...-bad-ram-addresses-and-fix-bad-storage-blocks
