The Project
I'm writing a Java command line interface to a C library of internal networking and network testing tools using the Java Native Interface. The C code (which I didn't write) is complex and low level, often manipulates memory at the bit level, and uses raw sockets exclusively. The application is multi-threaded from the C side (pthreads running in the background) as well as the Java side (ScheduledThreadPoolExecutors running threads that call native code). That said, the C library should be mostly stable. The Java and JNI interface code, as it turns out, is causing problems.
The Problem(s)
The application crashes with a segmentation fault upon entry into a native C function. This only happens when the program is in a specific state (i.e. successfully running a specific native function causes the next call to another specific native function to segfault). Additionally, the application crashes with a similar-looking segfault when the quit command is issued, but again, only after successfully running that same specific native function.
I'm an inexperienced C developer and an experienced Java developer -- I'm used to crashes giving me a specific reason and a specific line number. All I have to work from in this case is the hs_err_pid*.log output and the core dump. I've included what I could at the end of this question.
My Work So Far
- Naturally, I wanted to find the specific line of code where the crash happened. I placed a System.out.println() right before the native call on the Java side and a printf() as the first line of the native function where the program crashes, being sure to use fflush(stdout) directly after. The System.out call ran and the printf call didn't. This tells me that the segfault happened upon entry into the function -- something I've never seen before.
- I triple-checked the parameters to the function, to ensure that they wouldn't act up. However, I only pass one parameter (of type jint). The other two (JNIEnv *env, jobject j_object) are JNI constructs and out of my control.
- I commented out every single line in the function, leaving only a return 0; at the end. The segfault still happened. This leads me to believe that the problem is not in this function.
- I ran the commands in different orders (effectively running the native functions in different orders). The segfaults only happen when one specific native function is run before the crashing function call. This specific function appears to behave properly when it is run.
- I printed the value of the env pointer and the value of &j_object near the end of this other function, to ensure that I didn't somehow corrupt them. I don't know if I corrupted them, but both have non-zero values upon exiting the function.
- Edit 1: Typically, the same function is run in many threads (not usually concurrently, but it should be thread safe). I ran the function from the main thread without any other threads active to ensure that multithreading on the Java side wasn't causing the issue. It wasn't, and I got the same segfault.
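The entry-tracing step from the first bullet could be sketched roughly as follows. This is a minimal illustration and not the original code: trace_enter is a hypothetical helper, and it writes to stderr instead of stdout on the assumption that stderr is not fully buffered, with an explicit fflush kept for good measure.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original code base): report that a
 * native function was entered. stderr is used because it is not fully
 * buffered, so the message is less likely to be lost if the process dies
 * right afterwards; fflush is called anyway to be safe.
 * Returns the number of characters written, or a negative value on error. */
static int trace_enter(const char *function_name)
{
    int written = fprintf(stderr, "[native] entering %s\n", function_name);
    fflush(stderr);
    return written;
}
```

A call such as trace_enter(__func__); as the first statement of the native function would then show whether the function body is reached at all.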
All of this perplexes me. Why does it still segfault if I comment out the whole function, except for the return statement? If the problem is in this other function, why doesn't it fail there? If it's a problem where the first function messes up the memory and the second function illegally accesses the corrupt memory, why doesn't it fail on the line with the illegal access, rather than on entry to the function?
If you see an internet article where someone explains a problem similar to mine, please comment it. There are so many segfault articles, and none seem to contain this specific problem. Ditto for SO questions. The problem may also be that I'm not experienced enough to apply an abstract solution to this problem.
My Question
What can cause a Java native function (in C) to segfault upon entry like this? What specific things can I look for that will help me squash this bug? How can I write code in the future that will help me avoid this problem?
Helpful Info
For the record, I can't actually post the code. If you think a description of the code would be helpful, comment and I'll edit it in.
Error Message
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002aaaaaf6d9c3, pid=2185, tid=1086892352
#
# JRE version: 6.0_21-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode linux-amd64 )
# Problematic frame:
# j path.to.my.Object.native_function_name(I)I+0
#
# An error report file with more information is saved as:
# /path/to/hs_err_pid2185.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
The Important Bits of the hs_err_pid*.log File
--------------- T H R E A D ---------------
Current thread (0x000000004fd13800): JavaThread "pool-1-thread-1" [_thread_in_native, id=2198, stack(0x0000000040b8a000,0x0000000040c8b000)]
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), si_addr=0x0000000000000000
Registers:
RAX=0x34372e302e3095e1, RBX=0x00002aaaae39dcd0, RCX=0x0000000000000000, RDX=0x0000000000000000
RSP=0x0000000040c89870, RBP=0x0000000040c898c0, RSI=0x0000000040c898e8, RDI=0x000000004fd139c8
R8 =0x000000004fb631f0, R9 =0x000000004faf5d30, R10=0x00002aaaaaf6d999, R11=0x00002b1243b39580
R12=0x00002aaaae3706d0, R13=0x00002aaaae39dcd0, R14=0x0000000040c898e8, R15=0x000000004fd13800
RIP=0x00002aaaaaf6d9c3, EFL=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000000
TRAPNO=0x000000000000000d
Stack: [0x0000000040b8a000,0x0000000040c8b000], sp=0x0000000040c89870, free space=3fe0000000000000018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
j path.to.my.Object.native_function_name(I)I+0
j path.to.my.Object$CustomThread.fire()V+18
j path.to.my.CustomThreadSuperClass.run()V+1
j java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
j java.util.concurrent.FutureTask$Sync.innerRun()V+30
j java.util.concurrent.FutureTask.run()V+4
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+15
j java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V+59
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+28
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x3e756d]
V [libjvm.so+0x5f6f59]
V [libjvm.so+0x3e6e39]
V [libjvm.so+0x3e6eeb]
V [libjvm.so+0x476387]
V [libjvm.so+0x6ee452]
V [libjvm.so+0x5f80df]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j path.to.my.Object.native_function_name(I)I+0
j path.to.my.Object$CustomThread.fire()V+18
j path.to.my.CustomThreadSuperClass.run()V+1
j java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
j java.util.concurrent.FutureTask$Sync.innerRun()V+30
j java.util.concurrent.FutureTask.run()V+4
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+15
j java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V+59
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+28
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
--------------- P R O C E S S ---------------
Java Threads: ( => current thread )
0x000000004fabc800 JavaThread "pool-1-thread-6" [_thread_new, id=2203, stack(0x0000000000000000,0x0000000000000000)]
0x000000004fbcb000 JavaThread "pool-1-thread-5" [_thread_blocked, id=2202, stack(0x0000000042c13000,0x0000000042d14000)]
0x000000004fbc9800 JavaThread "pool-1-thread-4" [_thread_blocked, id=2201, stack(0x0000000042b12000,0x0000000042c13000)]
0x000000004fbc7800 JavaThread "pool-1-thread-3" [_thread_blocked, id=2200, stack(0x0000000042a11000,0x0000000042b12000)]
0x000000004fc54800 JavaThread "pool-1-thread-2" [_thread_blocked, id=2199, stack(0x0000000042910000,0x0000000042a11000)]
=>0x000000004fd13800 JavaThread "pool-1-thread-1" [_thread_in_native, id=2198, stack(0x0000000040b8a000,0x0000000040c8b000)]
0x000000004fb04800 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=2194, stack(0x0000000041d0d000,0x0000000041e0e000)]
0x000000004fb02000 JavaThread "CompilerThread1" daemon [_thread_blocked, id=2193, stack(0x0000000041c0c000,0x0000000041d0d000)]
0x000000004fafc800 JavaThread "CompilerThread0" daemon [_thread_blocked, id=2192, stack(0x0000000040572000,0x0000000040673000)]
0x000000004fafa800 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=2191, stack(0x0000000040471000,0x0000000040572000)]
0x000000004fad6000 JavaThread "Finalizer" daemon [_thread_blocked, id=2190, stack(0x0000000041119000,0x000000004121a000)]
0x000000004fad4000 JavaThread "Reference Handler" daemon [_thread_blocked, id=2189, stack(0x0000000041018000,0x0000000041119000)]
0x000000004fa51000 JavaThread "main" [_thread_in_vm, id=2186, stack(0x00000000418cc000,0x00000000419cd000)]
Other Threads:
0x000000004facf800 VMThread [stack: 0x0000000040f17000,0x0000000041018000] [id=2188]
0x000000004fb0f000 WatcherThread [stack: 0x0000000041e0e000,0x0000000041f0f000] [id=2195]
VM state:not at safepoint (normal execution)
VM Mutex/Monitor currently owned by a thread: None
Heap
PSYoungGen total 305856K, used 31465K [0x00002aaadded0000, 0x00002aaaf3420000, 0x00002aaaf3420000)
eden space 262208K, 12% used [0x00002aaadded0000,0x00002aaadfd8a6a8,0x00002aaaedee0000)
from space 43648K, 0% used [0x00002aaaf0980000,0x00002aaaf0980000,0x00002aaaf3420000)
to space 43648K, 0% used [0x00002aaaedee0000,0x00002aaaedee0000,0x00002aaaf0980000)
PSOldGen total 699072K, used 0K [0x00002aaab3420000, 0x00002aaadded0000, 0x00002aaadded0000)
object space 699072K, 0% used [0x00002aaab3420000,0x00002aaab3420000,0x00002aaadded0000)
PSPermGen total 21248K, used 3741K [0x00002aaaae020000, 0x00002aaaaf4e0000, 0x00002aaab3420000)
object space 21248K, 17% used [0x00002aaaae020000,0x00002aaaae3c77c0,0x00002aaaaf4e0000)
VM Arguments:
jvm_args: -Xms1024m -Xmx1024m -XX:+UseParallelGC
--------------- S Y S T E M ---------------
OS:Red Hat Enterprise Linux Client release 5.5 (Tikanga)
uname:Linux 2.6.18-194.8.1.el5 #1 SMP Wed Jun 23 10:52:51 EDT 2010 x86_64
libc:glibc 2.5 NPTL 2.5
rlimit: STACK 10240k, CORE 102400k, NPROC 10000, NOFILE 1024, AS infinity
load average:0.21 0.08 0.05
CPU:total 1 (1 cores per cpu, 1 threads per core) family 6 model 26 stepping 4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt
Memory: 4k page, physical 3913532k(1537020k free), swap 1494004k(1494004k free)
vm_info: Java HotSpot(TM) 64-Bit Server VM (17.0-b16) for linux-amd64 JRE (1.6.0_21-b06), built on Jun 22 2010 01:10:00 by "java_re" with gcc 3.2.2 (SuSE Linux)
time: Tue Oct 15 15:08:13 2013
elapsed time: 13 seconds
Valgrind Output
I don't really know how to use Valgrind properly. This is what came up when running valgrind app arg1
==2184==
==2184== HEAP SUMMARY:
==2184== in use at exit: 16,914 bytes in 444 blocks
==2184== total heap usage: 673 allocs, 229 frees, 32,931 bytes allocated
==2184==
==2184== LEAK SUMMARY:
==2184== definitely lost: 0 bytes in 0 blocks
==2184== indirectly lost: 0 bytes in 0 blocks
==2184== possibly lost: 0 bytes in 0 blocks
==2184== still reachable: 16,914 bytes in 444 blocks
==2184== suppressed: 0 bytes in 0 blocks
==2184== Rerun with --leak-check=full to see details of leaked memory
==2184==
==2184== For counts of detected and suppressed errors, rerun with: -v
==2184== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)
Edit 2:
GDB Output and Backtrace
I ran it through with GDB. I made sure that the C library was compiled with the -g flag.
$ gdb `which java`
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/java...(no debugging symbols found)...done.
(gdb) run -jar /opt/scts/scts.jar test.config
Starting program: /usr/bin/java -jar /opt/scts/scts.jar test.config
[Thread debugging using libthread_db enabled]
Executing new program: /usr/lib/jvm/java-1.6.0-sun-1.6.0.21.x86_64/jre/bin/java
[Thread debugging using libthread_db enabled]
[New Thread 0x4022c940 (LWP 3241)]
[New Thread 0x4032d940 (LWP 3242)]
[New Thread 0x4042e940 (LWP 3243)]
[New Thread 0x4052f940 (LWP 3244)]
[New Thread 0x40630940 (LWP 3245)]
[New Thread 0x40731940 (LWP 3246)]
[New Thread 0x40832940 (LWP 3247)]
[New Thread 0x40933940 (LWP 3248)]
[New Thread 0x40a34940 (LWP 3249)]
... my program does some work, and starts a background thread ...
[New Thread 0x41435940 (LWP 3250)]
... I type the command that seems to cause the segfault on the next command; the new threads are expected ...
[New Thread 0x41536940 (LWP 3252)]
[New Thread 0x41637940 (LWP 3253)]
[New Thread 0x41738940 (LWP 3254)]
[New Thread 0x41839940 (LWP 3255)]
[New Thread 0x4193a940 (LWP 3256)]
... I type the command that actually triggers the segfault. The new thread is expected, since the function is run in its own thread. If it did not segfault, it would have created the same number of threads as the previous command ...
[New Thread 0x41a3b940 (LWP 3257)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x41839940 (LWP 3255)]
0x00002aaaabcaec45 in ?? ()
... I furiously read through the gdb help, then run the backtrace ...
(gdb) bt
#0 0x00002aaaabcaec45 in ?? ()
#1 0x00002aaaf3ad7800 in ?? ()
#2 0x00002aaaf3ad81e8 in ?? ()
#3 0x0000000041838600 in ?? ()
#4 0x00002aaaeacddcd0 in ?? ()
#5 0x0000000041838668 in ?? ()
#6 0x00002aaaeace23f0 in ?? ()
#7 0x0000000000000000 in ?? ()
... Shouldn't that have symbols if I compiled with -g? I did, according to the lines from the output of make:
gcc -g -Wall -fPIC -c -I ...
gcc -g -shared -W1,soname, ...
Looks like I've solved the issue, which I'll outline here for the benefit of others.
What Happened
The cause of the segmentation fault was that I used sprintf() to write through a char * pointer that had never been assigned a value. Here is the bad code:
char* ip_to_string(uint32_t ip)
{
    unsigned char bytes[4];
    bytes[0] = ip & 0xFF;
    bytes[1] = (ip >> 8) & 0xFF;
    bytes[2] = (ip >> 16) & 0xFF;
    bytes[3] = (ip >> 24) & 0xFF;
    char *ip_string;  /* BUG: never initialized, so it holds an arbitrary address */
    sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);  /* writes to that address */
    return ip_string;
}
The pointer ip_string is never assigned a value here, so it is tempting to say that it points to nothing. That's not quite true: its value is indeterminate, so it could point anywhere. By writing through it with sprintf(), I inadvertently overwrote an arbitrary piece of memory. I believe (though I never confirmed this) that the odd behaviour arose because the uninitialized pointer happened to point somewhere on the stack, so the write corrupted data that later function calls relied on. That would explain why the crash showed up on entry to a different function rather than at the sprintf() call itself.
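One detail in the crash log appears consistent with this explanation, although it is an editorial observation and not something confirmed in the original post: the RAX value in the hs_err register dump, 0x34372e302e3095e1, is mostly printable ASCII when read as little-endian bytes. The sketch below prints the bytes of such a value in memory order; decode_register is a hypothetical helper written for this illustration.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the 8 bytes of `value` into `out` in
 * little-endian (x86-64 memory) order, substituting '?' for bytes that
 * are not printable ASCII. `out` must have room for 9 chars. */
static void decode_register(uint64_t value, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xFF);
        out[i] = isprint(byte) ? (char)byte : '?';
    }
    out[8] = '\0';
}
```

Called with 0x34372e302e3095e1, this produces "??0.0.74", which looks like a fragment of a dotted-quad string. That would fit the theory that text written by ip_to_string() ended up where the JVM expected a pointer, but it remains a guess.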
One way to fix this is to allocate memory and then point the pointer to that memory, which can be accomplished with malloc(). That solution would look similar to this:
char* ip_to_string(uint32_t ip)
{
    unsigned char bytes[4];
    bytes[0] = ip & 0xFF;
    bytes[1] = (ip >> 8) & 0xFF;
    bytes[2] = (ip >> 16) & 0xFF;
    bytes[3] = (ip >> 24) & 0xFF;
    char *ip_string = malloc(16);  /* "255.255.255.255" is 15 chars, plus the terminating NUL */
    sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
    return ip_string;              /* the caller now owns this memory */
}
The problem with this is that every malloc() needs to be matched by a call to free(), or you have a memory leak. If I call free(ip_string) inside this function, the returned pointer will be useless, and if I don't, then I have to rely on the code that's calling this function to release the memory, which is pretty dangerous.
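To make the ownership concern concrete, here is a minimal sketch of what the calling side has to do under this approach. The names ip_to_string_alloc and print_ip are hypothetical, and the sketch also checks the malloc() result and uses snprintf(), neither of which appears in the original code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical malloc-based variant. Returns a heap-allocated string that
 * the CALLER must free(), or NULL if the allocation fails. */
static char *ip_to_string_alloc(uint32_t ip)
{
    /* "255.255.255.255" is 15 characters, plus the terminating NUL. */
    const size_t size = 16;
    char *ip_string = malloc(size);
    if (ip_string == NULL) {
        return NULL;
    }
    snprintf(ip_string, size, "%u.%u.%u.%u",
             (unsigned)(ip & 0xFF), (unsigned)((ip >> 8) & 0xFF),
             (unsigned)((ip >> 16) & 0xFF), (unsigned)((ip >> 24) & 0xFF));
    return ip_string;
}

/* Example caller: every successful call is paired with exactly one free().
 * Returns the number of characters printed, or -1 if allocation failed. */
static int print_ip(uint32_t ip)
{
    char *text = ip_to_string_alloc(ip);
    if (text == NULL) {
        return -1;
    }
    int written = printf("%s\n", text);
    free(text);
    return written;
}
```

Forgetting the free(text) line in any one caller is enough to leak, which is the risk described above.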
As far as I can tell, the "right" solution to this is to pass an already allocated pointer to the function, such that the function's only responsibility is to fill the pointed-to memory. That way, the calls to malloc() and free() can be made in the same block of code. Much safer. Here's the new function:
char* ip_to_string(uint32_t ip, char *ip_string)
{
    /* ip_string must point to at least 16 bytes owned by the caller */
    unsigned char bytes[4];
    bytes[0] = ip & 0xFF;
    bytes[1] = (ip >> 8) & 0xFF;
    bytes[2] = (ip >> 16) & 0xFF;
    bytes[3] = (ip >> 24) & 0xFF;
    sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
    return ip_string;
}
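A slightly more defensive take on the same idea is to pass the buffer size along with the buffer, so the function can refuse to write past the end. The sketch below is an assumption-laden variant, not the author's code: ip_to_string_n and IP_STRING_SIZE are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Enough for "255.255.255.255" plus the terminating NUL. */
#define IP_STRING_SIZE 16

/* Hypothetical bounds-checked variant: the caller passes the buffer AND its
 * size. Returns ip_string on success, or NULL if the buffer is missing or
 * too small to hold the result. */
static char *ip_to_string_n(uint32_t ip, char *ip_string, size_t size)
{
    if (ip_string == NULL || size == 0) {
        return NULL;
    }
    int needed = snprintf(ip_string, size, "%u.%u.%u.%u",
                          (unsigned)(ip & 0xFF), (unsigned)((ip >> 8) & 0xFF),
                          (unsigned)((ip >> 16) & 0xFF), (unsigned)((ip >> 24) & 0xFF));
    if (needed < 0 || (size_t)needed >= size) {
        return NULL;  /* an error occurred or the output was truncated */
    }
    return ip_string;
}
```

With this shape the caller can often avoid malloc() and free() entirely by using a local array, for example char buf[IP_STRING_SIZE]; followed by ip_to_string_n(ip, buf, sizeof buf).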
Answers to the Questions
What can cause a Java native function (in C) to segfault upon entry like this?
If you write through a pointer that was never set to point at valid memory, you may accidentally overwrite memory on the stack. This may not cause an immediate failure, but will probably cause problems when you call other functions later.
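Why the failure tends to show up later, and not at the bad write, can be illustrated with a toy model. This is only an analogy under stated assumptions: a plain byte array stands in for a region of stack memory, and frame_model, frame_init, stray_write and saved_slot_intact are hypothetical names. Real stack layouts depend on the compiler and platform.

```c
#include <string.h>

/* Toy stand-in for a region of stack memory: scratch space plus a slot
 * holding data that is only consulted later (think of a saved return
 * address or a saved register). */
struct frame_model {
    unsigned char memory[32];
};

enum { SAVED_SLOT_OFFSET = 16, SAVED_SLOT_SIZE = 8 };

static const unsigned char SAVED_VALUE[SAVED_SLOT_SIZE] =
    { 0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80 };

static void frame_init(struct frame_model *f)
{
    memset(f->memory, 0, sizeof f->memory);
    memcpy(f->memory + SAVED_SLOT_OFFSET, SAVED_VALUE, SAVED_SLOT_SIZE);
}

/* Models a write through a pointer that happens to point at `offset`.
 * The write itself "succeeds": nothing fails at this line. */
static void stray_write(struct frame_model *f, size_t offset, const char *text)
{
    size_t len = strlen(text) + 1;  /* include the terminating NUL */
    if (offset < sizeof f->memory && len <= sizeof f->memory - offset) {
        memcpy(f->memory + offset, text, len);
    }
}

/* Models the later moment when the saved data is actually used. */
static int saved_slot_intact(const struct frame_model *f)
{
    return memcmp(f->memory + SAVED_SLOT_OFFSET, SAVED_VALUE, SAVED_SLOT_SIZE) == 0;
}
```

In this model, stray_write(&f, 12, "10.0.74.1") returns normally, and the damage is only noticed when saved_slot_intact(&f) is checked afterwards, much as the real crash only appeared on the next native call.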
What specific things can I look for that will help me squash this bug?
Debug it as you would any other segmentation fault: look for things like writing to unallocated memory, writing through an uninitialized pointer, or dereferencing a null pointer. I'm not an expert on this, but I'm willing to bet that there are many web resources for this.
How can I write code in the future that will help me avoid this problem?
Be careful with pointers, especially when you are responsible for creating them. If you see a line of code that looks like this:
type *variable;
... then look for a line that looks like ...
variable = ...;
... and make sure that this line comes before any write to the pointed-to memory.
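One habit that supports this check is to give every pointer a value at its declaration, NULL if nothing better is available, and to have functions refuse to write through NULL. The following is a minimal sketch of that habit; format_port and demo are hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats a port number into `dest`. It refuses to
 * write through a NULL pointer instead of corrupting memory.
 * Returns 0 on success, -1 on failure. */
static int format_port(char *dest, size_t size, unsigned port)
{
    if (dest == NULL || size == 0) {
        return -1;
    }
    int needed = snprintf(dest, size, "%u", port);
    return (needed >= 0 && (size_t)needed < size) ? 0 : -1;
}

/* Returns 0 if everything behaved as intended. */
static int demo(void)
{
    char *text = NULL;                      /* initialized at declaration */

    if (format_port(text, 8, 8080) == 0) {
        return 1;                           /* should be rejected: still NULL */
    }

    text = malloc(8);                       /* now it points at valid memory */
    if (text == NULL) {
        return 2;
    }
    int rc = format_port(text, 8, 8080);
    free(text);
    text = NULL;                            /* do not keep a dangling pointer */
    return rc;
}
```

A pointer that is NULL and gets dereferenced by mistake will, on most mainstream platforms, crash at the faulty line itself, which is far easier to track down than a stray write that only surfaces later.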