Problem description
I have HTML-encoded strings in a database, but many of the character entities go beyond the standard &amp; and &lt;: there are entities like &ldquo; and &mdash;. Unfortunately, we need to feed this data into a Flash-based RSS reader, and Flash doesn't read these entities, but it does read the numeric Unicode equivalent (e.g. &#8220;).
Using .NET 4.0, is there a utility method that will convert the HTML-encoded string to use Unicode-encoded (numeric) character entities?
Here is a better example of what I need. The db has HTML strings like: <p>John &amp; Sarah went to see &ldquo;Scream 4&rdquo;.</p>
and what I need to output in the RSS/XML document within the <description> tag is: <p>John &#38; Sarah went to see &#8220;Scream 4&#8221;.</p>
I'm using an XmlTextWriter to create the XML document from the database records, similar to this example code: http://www.dotnettutorials.com/tutorials/advanced/rss-feed-asp-net-csharp.aspx
So I need to replace all of the character entities within the HTML string from the db with their Unicode equivalent, because the Flash-based RSS reader doesn't recognize any entities beyond the most common ones, like &amp;.
Recommended answer
My first thought is: can your RSS reader accept the actual characters? If so, you can use HtmlDecode and feed the result directly in.
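As a rough sketch of that first option, assuming the reader accepts literal Unicode characters: decode the stored HTML, then let the XML writer escape only what XML itself requires. The class and method names here (DecodeAndWriteSketch, BuildDescription) are made up for illustration, and WebUtility.HtmlDecode is used in place of HttpUtility.HtmlDecode only so the sample has no System.Web dependency.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

static class DecodeAndWriteSketch
{
    // Hypothetical helper: returns a <description> element for one db record.
    public static string BuildDescription(string htmlFromDb)
    {
        // Turn every entity (&ldquo;, &mdash;, &amp;, ...) into the actual character.
        string decoded = WebUtility.HtmlDecode(htmlFromDb);

        using (var text = new StringWriter())
        using (var writer = new XmlTextWriter(text))
        {
            // WriteElementString re-escapes XML-significant characters such as & and <,
            // and leaves other characters (curly quotes, dashes) as literal text.
            writer.WriteElementString("description", decoded);
            writer.Flush();
            return text.ToString();
        }
    }
}
```

With this approach the curly quotes reach the feed as real characters rather than as entities, so whether it works depends entirely on the reader and on the feed being served in an encoding such as UTF-8.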
If you do need to convert it to the numeric representation, you could parse out each entity, HtmlDecode it, and then cast the resulting character to an int to get the base-10 Unicode value. Then re-insert it into the string as a numeric reference.
Here's some code to demonstrate what I mean (it is untested, but gets the idea across):
using System.Text;
using System.Web; // HttpUtility lives in System.Web.dll

string input = "Something with &mdash; or other character entities.";
StringBuilder output = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
    // Only look for a closing ';' when we are at the start of a possible entity
    int endOfEntity = input[i] == '&' ? input.IndexOf(';', i) : -1;
    if (endOfEntity > i)
    {
        // Include the trailing ';' so HtmlDecode recognizes the entity
        string entity = input.Substring(i, endOfEntity - i + 1);
        string decoded = HttpUtility.HtmlDecode(entity);
        if (decoded.Length == 1) // the span decoded to exactly one character
        {
            output.Append("&#" + (int)decoded[0] + ";");
            i = endOfEntity; // the loop's i++ continues after the ';'
            continue;
        }
    }
    output.Append(input[i]); // ordinary character, or not a recognized entity
}
I may have an off-by-one error somewhere in there, but it should be close.
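For comparison, the same parse, decode, and re-insert idea can be sketched with a regular expression instead of a manual loop. This is one possible variant and not part of the original answer: the EntityConverter class, the ToNumericEntities method, and the pattern itself are illustrative assumptions, and WebUtility.HtmlDecode stands in for HttpUtility.HtmlDecode so the sample needs no System.Web reference.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class EntityConverter
{
    // Matches named (&mdash;), decimal (&#8212;) and hex (&#x2014;) references.
    private static readonly Regex EntityPattern =
        new Regex(@"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    // Hypothetical helper: rewrites every recognized entity as a decimal reference.
    public static string ToNumericEntities(string html)
    {
        return EntityPattern.Replace(html, match =>
        {
            string decoded = WebUtility.HtmlDecode(match.Value);

            // One BMP character, or one surrogate pair for characters above U+FFFF.
            bool single = decoded.Length == 1 && !char.IsSurrogate(decoded[0]);
            bool pair = decoded.Length == 2 && char.IsSurrogatePair(decoded, 0);

            // Anything else (for example an unknown name, which HtmlDecode
            // returns unchanged) is left exactly as it was in the input.
            if (!single && !pair) return match.Value;

            return "&#" + char.ConvertToUtf32(decoded, 0) + ";";
        });
    }
}
```

Called on the string from the question, `EntityConverter.ToNumericEntities("<p>John &amp; Sarah went to see &ldquo;Scream 4&rdquo;.</p>")` returns `<p>John &#38; Sarah went to see &#8220;Scream 4&#8221;.</p>`. Note that a bare, unescaped & in the input is not an entity and is passed through untouched.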