[ircd-ratbox] ircd-ratbox-3.0.8-unicode1

Daniel Corbe corbe at corbe.net
Mon May 6 05:37:47 UTC 2013


Jeremy Chadwick <ratbox at jdc.parodius.com> writes:

> On Sat, May 04, 2013 at 07:25:36PM +0300, Daniel Corbe wrote:
>> 
>> Jilles Tjoelker <jilles at stack.nl> writes:
>> 
>> > On Sat, May 04, 2013 at 05:43:28PM +0300, Daniel Corbe wrote:
>> >> As part of an ongoing effort to create a stand-alone Jabber MUC which
>> >> uses an IRC server as a back end, I've created a very simple patch to
>> >> ircd-ratbox which enables unicode nick support.
>> >
>> >> As it turns out, it was quite easy because the server is pretty much
>> >> agnostic to encoding.
>> >
>> >> The patch is available at
>> >> http://www.corbe.net/static/ircd-ratbox-3.0.8-unicode.patch
>> >
>> >> A production server is up at irc.corbe.net.
>> >
>> >> The working repo can be tracked at
>> >> git://apollo.corbe.net/ircd-ratbox.git
>> >
>> > Allowing characters like '#', '$', '&', '*' and ':' breaks the protocol
>> > (this list is not exhaustive). For example, PRIVMSG interprets things
>> > starting with '#' or '&' as channels and things starting with '$' as
>> > globals. The asterisk and question mark are wildcard characters, which
>> > is "fun" for miscreants who will make it likely that everyone will be
>> > banned when an attempt is made to ban them. Parameters starting with a
>> > colon are special to the framing mechanism; clients and servers alike
>> > will get confused when a nickname starts with a colon.
>> 
>> The proto-breaking characters are disabled in the updated version of the
>> patch.
>
> The patch just says "Unicode".  Unicode means a lot of different things;
> are we talking about UTF-8, UTF-16, or UTF-32?  I have to assume UTF-8
> (from briefly examining the patch).

It's a misnomer because the server doesn't (and shouldn't) care about
the encoding being used.  It's called -unicode because the intent is to
do the minimum amount of work possible to support UTF-8 on the client.
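
As a rough illustration of what "agnostic to encoding" means in practice, here is a minimal sketch of a byte-level nickname check. It is not taken from the patch or from ircd-ratbox; the names `FORBIDDEN` and `nick_bytes_ok`, the exact forbidden set, the leading-byte rule, and the `max_len` default are all assumptions made for the example. The point is only that ASCII bytes with protocol meaning are rejected while every byte at or above 0x80 passes through without being decoded.

```python
# Hypothetical sketch; nothing here is copied from ircd-ratbox or the patch.

# ASCII bytes with protocol meaning (channel prefixes, wildcards, framing
# and hostmask separators), plus control bytes, space and DEL. The thread
# notes that the list of problem characters is not exhaustive.
FORBIDDEN = set(b"#$&*?:@!,.") | set(range(0x00, 0x21)) | {0x7F}


def nick_bytes_ok(nick: bytes, max_len: int = 30) -> bool:
    """Return True if the nick is acceptable at the byte level.

    Bytes >= 0x80 are never inspected further, so the server does not
    decode or validate any encoding; UTF-8 is purely a client convention.
    The length limit is counted in bytes, not characters (assumption).
    """
    if not nick or len(nick) > max_len:
        return False
    if nick[0] in b"0123456789-":  # assumed rule for the first byte
        return False
    return all(b not in FORBIDDEN for b in nick)
```

Note that a check like this accepts a full-width hash sign (U+FF03) just as readily as Cyrillic letters, because all it ever sees are high bytes.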

Most clients can either auto-detect UTF-8 or explicitly require the user
to set the encoding anyway.

irssi and xchat work out of the box, and most XMPP clients (recall that
the original intent of this project is to marry Jabber MUCs to an IRC
server) support UTF-8 by default because the XMPP specification requires
it.
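
For readers unfamiliar with client-side auto-detection, the usual heuristic can be sketched as follows. This is not how irssi or xchat actually implement it; `decode_irc_line` and its `fallback` parameter are hypothetical names, and Latin-1 is just an assumed legacy fallback.

```python
def decode_irc_line(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, else use a fallback.

    Text in a legacy 8-bit encoding rarely happens to be valid UTF-8,
    so a failed strict decode is a reasonable signal to fall back.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)
```

Since the server just relays bytes, this decision is made entirely on the receiving client.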

>
> So here's a question for you: what's to stop someone from using a
> nickname that contains characters that visually look nearly identical to
> delimiters or non-permitted (protocol-violating) characters?  While
> these won't break parsers, they will cause mass confusion for end-users.
> Examples include, but are not limited to:
>
> - U+02F8 -- Raised colon (:)
> - U+FE30 -- Vertical two-dot (looks like colon) (:)
> - U+FE55 -- Small colon (:)
> - U+FE5F -- Small hash symbol (#)
> - U+FE60 -- Small ampersand (&)
> - U+FE61 -- Small asterisk (*)
> - U+FE69 -- Small dollar symbol ($)
> - U+FE6B -- Small at symbol (@)
> - U+FF03 -- (Japanese) Full-width hash symbol (#)
> - U+FF04 -- (Japanese) Full-width dollar symbol ($)
> - U+FF06 -- (Japanese) Full-width ampersand (&)
> - U+FF0A -- (Japanese) Full-width asterisk (*)
> - U+FF1A -- (Japanese) Full-width colon (:)
> - U+FF20 -- (Japanese) Full-width at symbol (@)
>
> Another category I do not care to look up in depth (because there are
> tons of code points for these) is all the forms of space characters.
>
> While I support UTF-8 given its ASCII backwards-compatibility, when it
> comes to existing chat protocols one must be very careful.
>
> I recommend you go through all UTF-8 blocks/pages (0x01 to 0xff) and
> examine just how many similarities there are across the board.  I sure
> as hell wouldn't want to be the one to have to write a "UTF-8 parser"
> just to filter out all of this.

None of this is really relevant.  Confusion for the sake of lulz may be
annoying but it won't break the server.  In corner cases where this
becomes an issue, it can be dealt with on an individual basis.
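
If such a corner case does come up, one possible way to check a given nickname is Unicode compatibility normalization (NFKC), which folds the full-width and small-form variants onto their ASCII counterparts. The sketch below is only an illustration, not part of the patch; `SPECIAL` and `lookalikes` are hypothetical names, and the set of special characters is an assumption drawn from this thread.

```python
import unicodedata

# ASCII characters with protocol meaning, plus space (assumed set).
SPECIAL = set("#$&*?:@ ")


def lookalikes(nick: str) -> list:
    """Return the non-ASCII characters in nick whose NFKC form is
    exactly one of the SPECIAL characters."""
    found = []
    for ch in nick:
        if ord(ch) < 0x80:
            continue  # plain ASCII is handled by the normal nick rules
        if unicodedata.normalize("NFKC", ch) in SPECIAL:
            found.append(ch)
    return found
```

This is not a complete answer: characters with no compatibility mapping, such as the raised colon U+02F8, are not flagged, so it complements case-by-case handling and does not replace it.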

-Daniel

