The Project
I'm writing a Java command line interface to a C library of internal networking and network testing tools using the Java Native Interface. The C code (which I didn't write) is complex and low level, often manipulates memory at the bit level, and uses raw sockets exclusively. The application is multi-threaded from the C side (pthreads running in the background) as well as the Java side (ScheduledThreadPoolExecutors running threads that call native code). That said, the C library should be mostly stable. The Java and JNI interface code, as it turns out, is causing problems.
The Problem(s)
The application crashes with a segmentation fault upon entry into a native C function. This only happens when the program is in a specific state (i.e. successfully running one specific native function causes the next call to another specific native function to segfault). Additionally, the application crashes with a similar-looking segfault when the quit command is issued, but again, only after successfully running that same specific native function.
I'm an inexperienced C developer and an experienced Java developer -- I'm used to crashes giving me a specific reason and a specific line number. All I have to work from in this case is the hs_err_pid*.log output and the core dump. I've included what I could at the end of this question.
My Work So Far
- Naturally, I wanted to find the specific line of code where the crash happened. I placed a System.out.println() right before the native call on the Java side and a printf() as the first line of the native function where the program crashes, being sure to call fflush(stdout) directly after. The System.out call ran and the printf call didn't. This tells me that the segfault happened upon entry into the function -- something I've never seen before.
- I triple-checked the parameters to the function, to ensure that they wouldn't act up. However, I only pass one parameter (of type jint). The other two (JNIEnv *env, jobject j_object) are JNI constructs and out of my control.
- I commented out every single line in the function, leaving only a return 0; at the end. The segfault still happened. This leads me to believe that the problem is not in this function.
- I ran the commands in different orders (effectively running the native functions in different orders). The segfaults only happen when one specific native function is run before the crashing function call. This specific function appears to behave properly when it is run.
- I printed the value of the env pointer and the value of &j_object near the end of this other function, to ensure that I didn't somehow corrupt them. I don't know if I corrupted them, but both have non-zero values upon exiting the function.
- Edit 1: Typically, the same function is run in many threads (not usually concurrently, but it should be thread safe). I ran the function from the main thread without any other threads active to ensure that multithreading on the Java side wasn't causing the issue. It wasn't, and I got the same segfault.
All of this perplexes me. Why does it still segfault if I comment out the whole function, except for the return statement? If the problem is in this other function, why doesn't it fail there? If it's a problem where the first function messes up the memory and the second function illegally accesses the corrupt memory, why doesn't it fail on the line with the illegal access, rather than on entry to the function?
If you see an internet article where someone explains a problem similar to mine, please comment it. There are so many segfault articles, and none seem to contain this specific problem. Ditto for SO questions. The problem may also be that I'm not experienced enough to apply an abstract solution to this problem.
My Question
What can cause a Java native function (in C) to segfault upon entry like this? What specific things can I look for that will help me squash this bug? How can I write code in the future that will help me avoid this problem?
Helpful Info
For the record, I can't actually post the code. If you think a description of the code would be helpful, comment and I'll edit it in.
Error Message
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002aaaaaf6d9c3, pid=2185, tid=1086892352
#
# JRE version: 6.0_21-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode linux-amd64 )
# Problematic frame:
# j path.to.my.Object.native_function_name(I)I+0
#
# An error report file with more information is saved as:
# /path/to/hs_err_pid2185.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
The Important Bits of the hs_err_pid*.log File
--------------- T H R E A D ---------------
Current thread (0x000000004fd13800): JavaThread "pool-1-thread-1" [_thread_in_native, id=2198, stack(0x0000000040b8a000,0x0000000040c8b000)]
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), si_addr=0x0000000000000000
Registers:
RAX=0x34372e302e3095e1, RBX=0x00002aaaae39dcd0, RCX=0x0000000000000000, RDX=0x0000000000000000
RSP=0x0000000040c89870, RBP=0x0000000040c898c0, RSI=0x0000000040c898e8, RDI=0x000000004fd139c8
R8 =0x000000004fb631f0, R9 =0x000000004faf5d30, R10=0x00002aaaaaf6d999, R11=0x00002b1243b39580
R12=0x00002aaaae3706d0, R13=0x00002aaaae39dcd0, R14=0x0000000040c898e8, R15=0x000000004fd13800
RIP=0x00002aaaaaf6d9c3, EFL=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000000
TRAPNO=0x000000000000000d
Stack: [0x0000000040b8a000,0x0000000040c8b000], sp=0x0000000040c89870, free space=3fe0000000000000018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
j path.to.my.Object.native_function_name(I)I+0
j path.to.my.Object$CustomThread.fire()V+18
j path.to.my.CustomThreadSuperClass.run()V+1
j java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
j java.util.concurrent.FutureTask$Sync.innerRun()V+30
j java.util.concurrent.FutureTask.run()V+4
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+15
j java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V+59
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+28
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x3e756d]
V [libjvm.so+0x5f6f59]
V [libjvm.so+0x3e6e39]
V [libjvm.so+0x3e6eeb]
V [libjvm.so+0x476387]
V [libjvm.so+0x6ee452]
V [libjvm.so+0x5f80df]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j path.to.my.Object.native_function_name(I)I+0
j path.to.my.Object$CustomThread.fire()V+18
j path.to.my.CustomThreadSuperClass.run()V+1
j java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
j java.util.concurrent.FutureTask$Sync.innerRun()V+30
j java.util.concurrent.FutureTask.run()V+4
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+15
j java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V+59
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+28
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
--------------- P R O C E S S ---------------
Java Threads: ( => current thread )
0x000000004fabc800 JavaThread "pool-1-thread-6" [_thread_new, id=2203, stack(0x0000000000000000,0x0000000000000000)]
0x000000004fbcb000 JavaThread "pool-1-thread-5" [_thread_blocked, id=2202, stack(0x0000000042c13000,0x0000000042d14000)]
0x000000004fbc9800 JavaThread "pool-1-thread-4" [_thread_blocked, id=2201, stack(0x0000000042b12000,0x0000000042c13000)]
0x000000004fbc7800 JavaThread "pool-1-thread-3" [_thread_blocked, id=2200, stack(0x0000000042a11000,0x0000000042b12000)]
0x000000004fc54800 JavaThread "pool-1-thread-2" [_thread_blocked, id=2199, stack(0x0000000042910000,0x0000000042a11000)]
=>0x000000004fd13800 JavaThread "pool-1-thread-1" [_thread_in_native, id=2198, stack(0x0000000040b8a000,0x0000000040c8b000)]
0x000000004fb04800 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=2194, stack(0x0000000041d0d000,0x0000000041e0e000)]
0x000000004fb02000 JavaThread "CompilerThread1" daemon [_thread_blocked, id=2193, stack(0x0000000041c0c000,0x0000000041d0d000)]
0x000000004fafc800 JavaThread "CompilerThread0" daemon [_thread_blocked, id=2192, stack(0x0000000040572000,0x0000000040673000)]
0x000000004fafa800 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=2191, stack(0x0000000040471000,0x0000000040572000)]
0x000000004fad6000 JavaThread "Finalizer" daemon [_thread_blocked, id=2190, stack(0x0000000041119000,0x000000004121a000)]
0x000000004fad4000 JavaThread "Reference Handler" daemon [_thread_blocked, id=2189, stack(0x0000000041018000,0x0000000041119000)]
0x000000004fa51000 JavaThread "main" [_thread_in_vm, id=2186, stack(0x00000000418cc000,0x00000000419cd000)]
Other Threads:
0x000000004facf800 VMThread [stack: 0x0000000040f17000,0x0000000041018000] [id=2188]
0x000000004fb0f000 WatcherThread [stack: 0x0000000041e0e000,0x0000000041f0f000] [id=2195]
VM state:not at safepoint (normal execution)
VM Mutex/Monitor currently owned by a thread: None
Heap
PSYoungGen total 305856K, used 31465K [0x00002aaadded0000, 0x00002aaaf3420000, 0x00002aaaf3420000)
eden space 262208K, 12% used [0x00002aaadded0000,0x00002aaadfd8a6a8,0x00002aaaedee0000)
from space 43648K, 0% used [0x00002aaaf0980000,0x00002aaaf0980000,0x00002aaaf3420000)
to space 43648K, 0% used [0x00002aaaedee0000,0x00002aaaedee0000,0x00002aaaf0980000)
PSOldGen total 699072K, used 0K [0x00002aaab3420000, 0x00002aaadded0000, 0x00002aaadded0000)
object space 699072K, 0% used [0x00002aaab3420000,0x00002aaab3420000,0x00002aaadded0000)
PSPermGen total 21248K, used 3741K [0x00002aaaae020000, 0x00002aaaaf4e0000, 0x00002aaab3420000)
object space 21248K, 17% used [0x00002aaaae020000,0x00002aaaae3c77c0,0x00002aaaaf4e0000)
VM Arguments:
jvm_args: -Xms1024m -Xmx1024m -XX:+UseParallelGC
--------------- S Y S T E M ---------------
OS:Red Hat Enterprise Linux Client release 5.5 (Tikanga)
uname:Linux 2.6.18-194.8.1.el5 #1 SMP Wed Jun 23 10:52:51 EDT 2010 x86_64
libc:glibc 2.5 NPTL 2.5
rlimit: STACK 10240k, CORE 102400k, NPROC 10000, NOFILE 1024, AS infinity
load average:0.21 0.08 0.05
CPU:total 1 (1 cores per cpu, 1 threads per core) family 6 model 26 stepping 4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt
Memory: 4k page, physical 3913532k(1537020k free), swap 1494004k(1494004k free)
vm_info: Java HotSpot(TM) 64-Bit Server VM (17.0-b16) for linux-amd64 JRE (1.6.0_21-b06), built on Jun 22 2010 01:10:00 by "java_re" with gcc 3.2.2 (SuSE Linux)
time: Tue Oct 15 15:08:13 2013
elapsed time: 13 seconds
Valgrind Output
I don't really know how to use Valgrind properly. This is what came up when running valgrind app arg1:
==2184==
==2184== HEAP SUMMARY:
==2184== in use at exit: 16,914 bytes in 444 blocks
==2184== total heap usage: 673 allocs, 229 frees, 32,931 bytes allocated
==2184==
==2184== LEAK SUMMARY:
==2184== definitely lost: 0 bytes in 0 blocks
==2184== indirectly lost: 0 bytes in 0 blocks
==2184== possibly lost: 0 bytes in 0 blocks
==2184== still reachable: 16,914 bytes in 444 blocks
==2184== suppressed: 0 bytes in 0 blocks
==2184== Rerun with --leak-check=full to see details of leaked memory
==2184==
==2184== For counts of detected and suppressed errors, rerun with: -v
==2184== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)
Edit 2:
GDB Output and Backtrace
I ran it through GDB. I made sure that the C library was compiled with the -g flag.
$ gdb `which java`
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/java...(no debugging symbols found)...done.
(gdb) run -jar /opt/scts/scts.jar test.config
Starting program: /usr/bin/java -jar /opt/scts/scts.jar test.config
[Thread debugging using libthread_db enabled]
Executing new program: /usr/lib/jvm/java-1.6.0-sun-1.6.0.21.x86_64/jre/bin/java
[Thread debugging using libthread_db enabled]
[New Thread 0x4022c940 (LWP 3241)]
[New Thread 0x4032d940 (LWP 3242)]
[New Thread 0x4042e940 (LWP 3243)]
[New Thread 0x4052f940 (LWP 3244)]
[New Thread 0x40630940 (LWP 3245)]
[New Thread 0x40731940 (LWP 3246)]
[New Thread 0x40832940 (LWP 3247)]
[New Thread 0x40933940 (LWP 3248)]
[New Thread 0x40a34940 (LWP 3249)]
... my program does some work, and starts a background thread ...
[New Thread 0x41435940 (LWP 3250)]
... I type the command that seems to cause the segfault on the next command; the new threads are expected ...
[New Thread 0x41536940 (LWP 3252)]
[New Thread 0x41637940 (LWP 3253)]
[New Thread 0x41738940 (LWP 3254)]
[New Thread 0x41839940 (LWP 3255)]
[New Thread 0x4193a940 (LWP 3256)]
... I type the command that actually triggers the segfault. The new thread is expected, since the function is run in its own thread. If it did not segfault, it would have created the same number of threads as the previous command ...
[New Thread 0x41a3b940 (LWP 3257)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x41839940 (LWP 3255)]
0x00002aaaabcaec45 in ?? ()
... I furiously read through the gdb help, then run the backtrace ...
(gdb) bt
#0 0x00002aaaabcaec45 in ?? ()
#1 0x00002aaaf3ad7800 in ?? ()
#2 0x00002aaaf3ad81e8 in ?? ()
#3 0x0000000041838600 in ?? ()
#4 0x00002aaaeacddcd0 in ?? ()
#5 0x0000000041838668 in ?? ()
#6 0x00002aaaeace23f0 in ?? ()
#7 0x0000000000000000 in ?? ()
... Shouldn't that have symbols if I compiled with -g? I did, according to these lines from the output of make:
gcc -g -Wall -fPIC -c -I ...
gcc -g -shared -W1,soname, ...
Looks like I've solved the issue, which I'll outline here for the benefit of others.
What Happened
The cause of the segmentation fault was that I used sprintf() to write through a char * pointer that had never been assigned a value. Here is the bad code:
char* ip_to_string(uint32_t ip)
{
    unsigned char bytes[4];
    bytes[0] = ip & 0xFF;
    bytes[1] = (ip >> 8) & 0xFF;
    bytes[2] = (ip >> 16) & 0xFF;
    bytes[3] = (ip >> 24) & 0xFF;
    char *ip_string; /* BUG: never initialized, so it points at an arbitrary address */
    sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
    return ip_string;
}
The pointer ip_string does not have a value here, which means it points to nothing. Except, that's not entirely true. What it points to is undefined: it could point anywhere. So in writing to it with sprintf(), I inadvertently overwrote a random bit of memory. I believe that the reason for the odd behaviour (though I never confirmed this) was that the uninitialized pointer happened to point somewhere on the stack, so the write corrupted data that was only used later, when certain other functions were called.
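One way to check this theory against the crash log, offered as an interpretation and not something confirmed in the original post: the register dump above shows RAX=0x34372e302e3095e1, which is an unusual value for an address. The sketch below uses a hypothetical helper, decode_register_ascii, to lay the eight bytes out in x86-64 (little-endian) memory order and show the printable ones as characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the original code.
 * Writes the 8 bytes of `value` into `out` in little-endian memory order,
 * replacing non-printable bytes with '?'. `out` must hold at least 9 chars. */
void decode_register_ascii(uint64_t value, char out[9])
{
    assert(out != NULL);

    for (size_t i = 0; i < 8; i++) {
        /* Byte i in memory is bits 8*i .. 8*i+7 of the register value. */
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xFF);
        out[i] = isprint(byte) ? (char)byte : '?';
    }
    out[8] = '\0';
}
```

For the RAX value from the log, the six upper bytes decode to the text 0.0.74, which looks like the tail of a dotted-decimal address. That would be consistent with sprintf() having written IP text over a pointer-sized value, though it does not prove it.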
One way to fix this is to allocate memory and then point the pointer to that memory, which can be accomplished with malloc(). That solution would look similar to this:
char* ip_to_string(uint32_t ip)
{
    unsigned char bytes[4];
    bytes[0] = ip & 0xFF;
    bytes[1] = (ip >> 8) & 0xFF;
    bytes[2] = (ip >> 16) & 0xFF;
    bytes[3] = (ip >> 24) & 0xFF;
    char *ip_string = malloc(16); /* "255.255.255.255" plus NUL; needs <stdlib.h> */
    if (ip_string == NULL)
        return NULL; /* allocation failed */
    sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
    return ip_string; /* the caller must free() this */
}
The problem with this is that every malloc() needs to be matched by a call to free(), or you have a memory leak. If I call free(ip_string) inside this function the returned pointer will be useless, and if I don't then I have to rely on the code that's calling this function to release the memory, which is pretty dangerous.
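If the allocating version is kept anyway, the ownership rule at least has to be written down where callers will see it. The following is a minimal sketch under that assumption; the name ip_to_string_alloc is hypothetical, and it uses snprintf() into a local buffer so that only the bytes actually needed are allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name. Returns a heap-allocated string that the CALLER must
 * free(), or NULL if the allocation fails. */
char *ip_to_string_alloc(uint32_t ip)
{
    char tmp[16]; /* "255.255.255.255" plus the terminating NUL */
    int n = snprintf(tmp, sizeof tmp, "%u.%u.%u.%u",
                     (unsigned)(ip & 0xFF),
                     (unsigned)((ip >> 8) & 0xFF),
                     (unsigned)((ip >> 16) & 0xFF),
                     (unsigned)((ip >> 24) & 0xFF));
    assert(n > 0 && (size_t)n < sizeof tmp); /* four octets always fit */

    /* Allocate exactly the formatted length plus the NUL and copy it over. */
    char *result = malloc((size_t)n + 1);
    if (result != NULL)
        memcpy(result, tmp, (size_t)n + 1);
    return result;
}
```

Every call site then has to pair the call with a free() on the returned pointer once it is done with the string, and has to handle a NULL result. That burden on the caller is exactly the danger described above.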
As far as I can tell, the "right" solution to this is to pass an already allocated pointer to the function, such that the function's only responsibility is to fill the pointed-to memory. That way, the calls to malloc() and free() can be made in the same block of code. Much safer. Here's the new function:
/* ip_string must point to at least 16 bytes: "255.255.255.255" plus NUL */
char* ip_to_string(uint32_t ip, char *ip_string)
{
    unsigned char bytes[4];
    bytes[0] = ip & 0xFF;
    bytes[1] = (ip >> 8) & 0xFF;
    bytes[2] = (ip >> 16) & 0xFF;
    bytes[3] = (ip >> 24) & 0xFF;
    sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
    return ip_string;
}
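The function above still trusts the caller to supply a large enough buffer. A possible hardening, sketched here under the assumption that the signature may change, is to pass the buffer size as well and use snprintf() so that a short buffer is reported instead of overrun. The names ip_to_string_n and IP_STRING_LEN are hypothetical and do not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define IP_STRING_LEN 16 /* "255.255.255.255" plus the terminating NUL */

/* Hypothetical bounded variant of ip_to_string(). Formats into the caller's
 * buffer and returns it, or returns NULL if `len` is too small. */
char *ip_to_string_n(uint32_t ip, char *buf, size_t len)
{
    assert(buf != NULL); /* passing no buffer at all is a programming error */

    int n = snprintf(buf, len, "%u.%u.%u.%u",
                     (unsigned)(ip & 0xFF),
                     (unsigned)((ip >> 8) & 0xFF),
                     (unsigned)((ip >> 16) & 0xFF),
                     (unsigned)((ip >> 24) & 0xFF));
    if (n < 0 || (size_t)n >= len)
        return NULL; /* formatting failed or the output was truncated */
    return buf;
}
```

A caller can then declare char buf[IP_STRING_LEN]; on its own stack and pass buf and sizeof buf, so no malloc() or free() is needed at all for short-lived strings.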
Answers to the Questions
What can cause a Java native function (in C) to segfault upon entry like this?
If you write through a pointer that has never been pointed at allocated memory, you may accidentally overwrite memory on the stack. This may not cause an immediate failure, but will probably cause problems when you call other functions later.
What specific things can I look for that will help me squash this bug?
Debug it like any other segmentation fault: look for things like writing to unallocated memory or dereferencing a null pointer. I'm not an expert on this, but I'm willing to bet that there are many web resources for this.
How can I write code in the future that will help me avoid this problem?
Be careful with pointers, especially when you are responsible for creating them. If you see a line of code that looks like this:
type *variable;
... then look for a line that looks like ...
variable = ...;
... and make sure that this line comes before any write to the pointed-to memory.