Tuesday, February 23, 2010

Experimental Dada Mail w/unicode ¡Support! Released

(this is a repost from here, 'cause I'm pretty stoked on it)


This is the first step in the localization project, since we can't very well translate Dada Mail if Dada Mail can't use the translations available.

I have to let this project rest for a little bit (and collect my wits - it was a very difficult step!) but any and all feedback is welcome, if you'd like to give this a spin - bug reports/problems of any kind are very much appreciated.

This version of Dada Mail should basically be able to support any language that can be represented in the Unicode character set and encoded as UTF-8. Which should be, well, a lot of them. In practice it (Dada Mail) won't handle all of them yet, but where does it fail? I don't know - so it's a good time to test and see where it goes wrong.

For simple Euro-centric stuff, like this:

Je peux manger du verre, ça ne me fait pas mal.

It should be fine. For something a little more wild:

أنا قادر على أكل الزجاج و هذا لا يؤلمني.

(which should be Arabic)

Well, I can only go by whether something visually looks correct :) Even this email is sort of a test - I don't know if it's going to work or not - so, fingers crossed! If it does, we're on a good track: Dada Bridge taking a random email, having it go through a system that's mostly been tested with one very specific way of creating emails, and having it come out readable on the other side is a great big step - not even talking about the online archive, rss/atom feeds, twitter thingie, etc, etc, etc.

Here's the download to the version I'm now running at the Dada Mail support site:

http://github.com/downloads/justingit/dada-mail/dada-4_0_2-unicode.zip

http://github.com/downloads/justingit/dada-mail/dada-4_0_2-unicode.tar.gz

If you want to check it out via github, the branch is at:

http://github.com/justingit/dada-mail/tree/charset_work

To grab it with git, you have to do this:


git clone git://github.com/justingit/dada-mail.git
cd dada-mail
git fetch
git checkout --track -b your_local_branch_name origin/charset_work


Here's the explanation of all that:

http://groups.google.com/group/github/browse_thread/thread/71f944b925467ab6

There's a guide of what to expect with Dada Mail and unicode/UTF-8 you can read here:

http://dadamailproject.com/support/documentation-4_0_2-unicode/features-UTF-8.pod.html

Which I'll paste the contents of at the end of this message - but you may also want to compare it to the version of this doc for 4.0.2 STABLE:

http://dadamailproject.com/support/documentation-4_0_2/features-UTF-8.pod.html

(Long story short: 4.0.2 UTF-8/Unicode Support: "uhh...")

And, that's about it. This was a hard part of the project, since this is a 10+ y/o codebase - it very much pre-dates even unicode/UTF-8 support in Perl itself, so there's a reason, I guess, why the program was in such bad shape when it came to supporting it. Many, many, many bugs showed themselves once this feature was asked for. I think a great majority of them have been solved.

Give it a spin if this interests you and if I can help out with anything, let me know,


--
Introduction

Dada Mail can speak UTF-8 and almost expects that everything else around it does, too.

That means:

• It treats everything it handles as UTF-8
• Everything it returns is in UTF-8

How To Have a Pleasant Experience

If you're installing Dada Mail for the first time, there's nothing you'll need to do, but below are some great guidelines on how to keep your lists configured, so you continue to have a good experience.

If you're upgrading, make sure your configuration reflects the advice below.

It's heavily advised to keep everything in Dada Mail speaking UTF-8 without any real exceptions.

Config Variable: $HTML_CHARSET

By default, the config variable $HTML_CHARSET is set to UTF-8.

Keep it that way, same case (UTF-8) - same everything.

Dada Mail is only tested with the charset set this way.

Advanced Sending Preferences

Default Character Set

Set this to UTF-8.

Default Plain Text/HTML Message Encoding

There are really only a few choices recommended for Dada Mail.

• 8bit
Should work.

• quoted-printable

If you have any trouble with 8bit, try quoted-printable. Because of the number of times that Dada Mail creates, tweaks, formats and templates out email messages, the encoding can potentially get mucked up.

This potential mucking-up is mitigated when Dada Mail uses quoted-printable encoding internally. This should be the default for email messages.
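To make the difference between the two choices concrete, here's a minimal sketch. It's in Python rather than Dada Mail's own Perl, and it only uses the standard quopri module - it is not how Dada Mail does this internally. The helper is_7bit is made up for the illustration.

```python
import quopri

# Sample sentence borrowed from the message above.
text = "Je peux manger du verre, ça ne me fait pas mal."

# With "8bit", the body carries the raw UTF-8 bytes, some of them above 127.
raw = text.encode("utf-8")

# quoted-printable rewrites each non-ASCII byte as =XX, so the body is plain ASCII.
qp = quopri.encodestring(raw)


def is_7bit(data: bytes) -> bool:
    """Hypothetical helper: True if every byte fits in 7 bits."""
    return all(b < 128 for b in data)


# Decoding gives back exactly the bytes we started with.
round_trip = quopri.decodestring(qp).decode("utf-8")

print(is_7bit(raw), is_7bit(qp))
print(qp.decode("ascii"))
```

Since the quoted-printable form is plain ASCII, it has a better chance of surviving a step that isn't 8-bit clean. The price is a slightly larger message.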

Encode Message Headers

Have this option checked.
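Why bother? Mail headers are supposed to be plain ASCII, so something like a Subject with accents in it has to be wrapped up as an "encoded word". A rough sketch of what that looks like, again in Python with the standard email package, and not Dada Mail's actual code (the subject line is invented for the example):

```python
from email.header import Header, decode_header, make_header

# Hypothetical subject line with a non-ASCII character in it.
subject = "Dada Mail: ça marche"

# Encoding turns the header into ASCII-only "encoded words" (=?utf-8?...?=).
encoded = Header(subject, charset="utf-8").encode()

# A mail reader reverses the process to display the original text.
decoded = str(make_header(decode_header(encoded)))

print(encoded)
print(decoded)
```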

SQL Backends

Database

PostgreSQL

Encoding for PostgreSQL databases is done when the database is created - make sure to create your database with a UTF-8 encoding, like so:

CREATE DATABASE dadamail WITH ENCODING 'UTF-8';

MySQL

Nothing you'll have to do.

SQLite

Nothing you'll have to do.

DBM Files

DBM Files have no encoding support, but Dada Mail knows this and compensates.

Schema

MySQL

The MySQL schemas are set to create tables with an encoding of UTF-8.

PostgreSQL

Nothing has changed.

SQLite

Nothing has changed.

Drivers

The currently supported SQL backends, mysql (MySQL), Pg (PostgreSQL) and SQLite, all have different ways to somewhat "enable" their UTF-8 support.

• MySQL

mysql_enable_utf8 => 1,

has been added to the $DBI_PARAMS hashref.

• PostgreSQL

pg_enable_utf8 => 1,

has been added to the $DBI_PARAMS hashref.

• SQLite

sqlite_unicode => 1,

has been added to the $DBI_PARAMS hashref.

No explicit encoding/decoding is done in Dada Mail when saving/retrieving data. Hopefully, the drivers are UTF-8-aware enough.

Plugins/Extensions

The Plugins and Extensions that come with Dada Mail have not been as thoroughly tested as the main program. There are still warts.

Dada Bridge

Dada Bridge is in a unique position: it needs to handle a lot of different stuff thrown at it and deal with it gracefully. Dada Mail does, in fact, handle any realistic character set/encoding you throw at it, but Dada Mail will convert messages it receives to its internal format before it resends them out to your list.

This means messages go out in the encoding of your choice (8bit or quoted-printable) and the charset of your choice (as long as your charset is UTF-8).
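Here's the general idea of that kind of conversion as a small sketch. It's Python with the standard email package, not Dada Bridge's Perl, and the incoming message is invented for the example: a Latin-1, 8bit message comes in, and a UTF-8, quoted-printable one goes out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Invented incoming message: Latin-1 charset, 8bit encoding (0xE7 is "ç").
raw = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"\xe7a ne me fait pas mal\r\n"
)

incoming = message_from_bytes(raw, policy=policy.default)

# Decode the body to text using the charset the sender declared.
body = incoming.get_content()

# Rebuild the message in the one format used from here on.
outgoing = EmailMessage()
outgoing["Subject"] = str(incoming["Subject"])
outgoing.set_content(body, charset="utf-8", cte="quoted-printable")

print(outgoing.as_string())
```

The text is the same on both sides; only the charset label and the byte-level encoding change.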

Upgrading

You are potentially going to have problems.

It's possible that, since List Settings were never decoded/encoded correctly in past versions, they'll show up incorrectly in the program once you've upgraded. The easiest thing to do is to edit the mistakes and resave the information. For most of the program, you're going to have to manually export the information and re-import it with the correct encoding, sadly. Dada Mail will probably fail gracefully with old information, but it's possible that you'll see squiggly characters instead of what you want to see. There's nothing in Dada Mail that will stop this from happening. If you experience it (from old information), we're not going to count it as a bug, but rather a known issue.
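For the curious, this is one common way those squiggly characters come about. The sketch (Python, not Dada Mail code) assumes the old data was UTF-8 bytes that got read back as Latin-1; what actually happened to the data in a given install may well be different.

```python
original = "ça ne me fait pas mal"

# The text was saved as UTF-8 bytes...
stored = original.encode("utf-8")

# ...but read back as if it were Latin-1, so each UTF-8 byte became its own character.
garbled = stored.decode("latin-1")

# If that is really what happened, reversing the wrong step recovers the text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)   # Ã§a ne me fait pas mal
print(repaired)  # ça ne me fait pas mal
```

This only works when the damage is exactly one wrong decode. Data that has been saved and re-saved through several versions may be garbled more than once, which is why exporting and re-importing with the correct encoding is the safer route.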

Problems?

Please let us know via the Support Boards:

http://dadamailproject.com/support/boards/

Or the developer mailing list:

http://dadamailproject.com/cgi-bin/dada/mail.cgi/list/dadadev/

Thanks!

See Also:

• The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

http://www.joelonsoftware.com/articles/Unicode.html

• perlunitut - Perl Unicode Tutorial

http://perldoc.perl.org/perlunitut.html

• perlunifaq - Perl Unicode FAQ

http://perldoc.perl.org/perlunifaq.html


