Problem description
I'm trying to parse a text file that has multiple lines of text. My job is to go through all the words and print them out to a file.
I read all the lines, loop through them, and split every line on whitespace, like this:
line.split("\\s+");
Now, the problem is that in some cases Java does not see the space between two words...
I also tried looping through a string that contains a space Java doesn't see, and Character.isSpaceChar(char) returned true for that character...
Now I'm completely confused...
Here is the code:
public void createMap(String inputPath, String outputPath)
        throws IOException {
    File f = new File(inputPath);
    FileWriter fw = new FileWriter(outputPath);
    List<String> lines = Files.readAllLines(f.toPath(),
            StandardCharsets.UTF_8);
    for (String l : lines) {
        for (String w : l.split("\\s+")) {
            if (isNotRubbish(w.trim())) {
                fw.write(w.trim() + "\n");
            }
        }
    }
    fw.close();
}

private boolean isNotRubbish(String w) {
    Pattern p = Pattern.compile("@?\\p{L}+",
            Pattern.UNICODE_CHARACTER_CLASS);
    Matcher m = p.matcher(w);
    return m.matches();
}
Recommended answer
I suspect that your text contains a character similar to a non-breaking space, which is not treated as whitespace and so can't be matched by \s.
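One way to check this suspicion is to list the characters that Character.isSpaceChar accepts but \s does not match. The following is only a minimal sketch: the class name SpaceDiagnosis, its helper methods, and the sample string are made up for illustration and are not part of the original question.

```java
import java.util.regex.Pattern;

public class SpaceDiagnosis {
    // Returns true if the default (non-Unicode) \s class matches this single character.
    public static boolean matchedByBackslashS(char c) {
        return Pattern.matches("\\s", String.valueOf(c));
    }

    // Lists the code points that Java reports as space characters but \s does not match.
    public static String describeHiddenSpaces(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isSpaceChar(c) && !matchedByBackslashS(c)) {
                sb.append(String.format("U+%04X ", (int) c));
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical sample: a no-break space (U+00A0) sits between "hello" and "world".
        String sample = "hello\u00A0world and more";
        System.out.println(describeHiddenSpaces(sample));   // prints U+00A0
        System.out.println(sample.split("\\s+").length);    // prints 3, not 4
    }
}
```

If this prints any code points for a line of your file, those are the separators that split("\\s+") is skipping.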
In that case, try using \p{Zs} instead of \s.
As mentioned at http://www.regular-expressions.info/unicode.html, \p{Zs} will match any kind of space character.
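To get a feel for what \p{Zs} covers, here is a small sketch that tests a few individual characters against it. The class name ZsCheck and the choice of sample characters are assumptions made for this example only.

```java
import java.util.regex.Pattern;

public class ZsCheck {
    // \p{Zs} is the Unicode general category "Space_Separator".
    private static final Pattern SPACE_SEPARATOR = Pattern.compile("\\p{Zs}");

    // Returns true if the whole string is exactly one space-separator character.
    public static boolean isSpaceSeparator(String s) {
        return SPACE_SEPARATOR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSpaceSeparator(" "));       // true: ordinary space
        System.out.println(isSpaceSeparator("\u00A0"));  // true: no-break space
        System.out.println(isSpaceSeparator("\u3000"));  // true: ideographic space
        System.out.println(isSpaceSeparator("\t"));      // false: tab is a control character
    }
}
```

Note that tabs and line breaks are control characters rather than space separators, so \p{Zs} on its own does not match them.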
BTW, if you would also like to include separators other than spaces, such as tabs or line breaks, you can combine \p{Zs} with \s, like [\p{Zs}\s].
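Applied to the split call from the question, the combined class could look roughly like this. It is a sketch rather than a drop-in replacement: the class name UnicodeSplit, the words helper, and the sample line are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

public class UnicodeSplit {
    // Splits on runs of Unicode space separators and/or ordinary whitespace.
    public static List<String> words(String line) {
        return Arrays.asList(line.split("[\\p{Zs}\\s]+"));
    }

    public static void main(String[] args) {
        // Hypothetical line mixing a no-break space, a tab and two plain spaces.
        String line = "foo\u00A0bar\tbaz  qux";
        System.out.println(words(line));  // prints [foo, bar, baz, qux]
    }
}
```

If a line starts with a separator, split returns an empty first element; in the question's code the isNotRubbish check already rejects empty strings, so that case should be harmless there.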