Problem description
I have an application that deals with clients from all over the world, and, naturally, I want everything going into my databases to be UTF-8 encoded.
The main problem for me is that I don't know what encoding the source of any string is going to be. It could come from a text box (using <form accept-charset="utf-8"> is only useful if the user actually submits the form), or it could come from an uploaded text file, so I really have no control over the input.
What I need is a function or class that makes sure the stuff going into my database is, as far as possible, UTF-8 encoded. I've tried iconv(mb_detect_encoding($text), "UTF-8", $text); but that has problems (if the input is 'fiancée' it returns 'fianc'). I've tried a lot of things =/
For file uploads, I like the idea of asking the end user to specify the encoding they use, and show them previews of what the output will look like, but this doesn't help against nasty hackers (in fact, it could make their life a little easier).
I've read the other SO questions on the subject, but they seem to all have subtle differences like "I need to parse RSS feeds" or "I scrape data from websites" (or, indeed, "You can't").
But there must be something that at least has a good try!
Recommended answer
What you're asking for is extremely hard. If possible, getting the user to specify the encoding is the best option. Preventing an attack shouldn't be much easier or harder that way.
However, you could try doing this:
iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);
Passing true as the third argument puts mb_detect_encoding() in strict mode, which might help you get a better result.
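As a rough sketch of how the strict detection call could be wrapped into a reusable helper: the function name to_utf8_best_effort and the candidate list ['UTF-8', 'ISO-8859-1'] are assumptions made for illustration, not part of the original answer. It leaves input that is already valid UTF-8 untouched and otherwise converts from whichever candidate strict detection accepts.

```php
<?php
// Hypothetical helper: the name and the default candidate list are
// illustrative assumptions, not taken from the original answer.
function to_utf8_best_effort(string $text, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Input that is already valid UTF-8 is returned as-is,
    // which avoids double-encoding it.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Strict mode (third argument true) rejects candidates for which
    // the bytes are not a valid sequence, and returns false if none fit.
    $from = mb_detect_encoding($text, $candidates, true);
    if ($from === false) {
        throw new InvalidArgumentException('Could not detect the input encoding.');
    }

    return mb_convert_encoding($text, 'UTF-8', $from);
}

// "fiancée" encoded as ISO-8859-1 (é is the single byte 0xE9).
$latin1 = "fianc\xE9e";
echo to_utf8_best_effort($latin1), PHP_EOL;
```

Because every byte sequence is valid ISO-8859-1, that candidate acts as a catch-all here, so the result is only a guess: text that was really in another single-byte encoding will convert without error but may show the wrong characters. Asking the user for the encoding, where possible, remains the more reliable route.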