Utf8 How Unicode can be easy...
Quite some time since I last wrote something. I recently started attending a University, so that took all of my time past weeks :)
Anyway, during a very easy lecture about programming I decided to actually program something myself, which turned out to be Unicode support for x64dbg!
Theory
At first I thought I would have to rewrite the command parser and whatnot, but that turned out to be not needed at all…
At first I wanted to convert every char
pointer and constant to the Windows-supported wchar_t
type, but this would take far too long to execute + it would break plugin compatibility and the complete internal API, which just sucks. Various discussions and a blogpost later I decided to use UTF-8 internally and call WinAPI with conversion functions.
Basically it required three things:
1) A C++ class like QString
that allows string operations on UTF-8 string;
2) Conversion functions from UTF-8 to UTF-16 and the other way around;
3) ‘Converting’ all external ASCII calls to their UNICODE variant (WinAPI, TitanEngine, dbghelp, etc).
Practice
The first step was quite easy, I grabbed this GPLv3 UTF::String class and changed it to my needs. This became the UString class.
The second step was also very easy, the blogpost I mentioned earlier had two ready-to-use functions called ConvertFromUtf16ToUtf8
and ConvertFromUtf8ToUtf16
. Those worked great, except that they would crash when fed with null
as argument. Wrapping them in UString
solved that issue without having to think :)
The third step seemed easy at first, I could debug a random application with a Chinese path within minutes. After that however, came a small moment of confusion, because Qt appears to be interpreting const char*
strings as Latin1 per default. The following code solved this and after that the log etc. were working correctly:
// Set QString codec to UTF-8
QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8"));
QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));
QTextCodec::setCodecForTr(QTextCodec::codecForName("UTF-8"));
Now all that is left is the tedious task of snooping through the code looking for incompatible GetModuleFileNameExA
functions calls and convert them.
Concerns
The main concern will be that plugins will need to support UTF-8 and that new developers for x64dbg will have to adapt their coding a little. For plugin coders there will be conversion functions in the Bridge, but the conversion functions from the blogpost are really easy to copy-paste.
Conclusion
In overall adding UTF-8 support turned out to be quite easy and the work involved is just tedious, not really hard or very annoying. It can be done in little free time by almost anyone, so feel free to submit pull-requests :)
blog comments powered by Disqus