Starting an MDMP Unicode conversion

This post offers some advice to people embarking upon their Unicode process. It is definitely not an exhaustive list of things to do.
Your company or client has decided to do either an Upgrade and Unicode conversion or just a straight Unicode conversion of your MDMP system. The first piece of advice I will give you, my technical colleagues, is to be prepared for a wild ride and to be saddled with many responsibilities.
An MDMP (Multi-Display, Multi-Processing) system is a system which serves many different countries whose languages cannot all be displayed using the default SAP codepage 1100. As a result, different codepages were introduced to expand the number of characters applications could support. Unicode is capable of representing the characters of virtually every language in a single character set, so it simplifies many of the system operations.
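To make the codepage problem concrete, here is a minimal Python sketch. It assumes that Python's latin-1, iso8859_5 and cp1251 codecs are reasonable stand-ins for SAP codepage 1100, SAP codepage 1500 and the Microsoft Russian codepage respectively. That mapping is my assumption for illustration, not something taken from the SAP tools.

```python
# The same stored byte, read under three different codepages.
# The codec names are Python's; pairing them with SAP codepages is an assumption.
raw = bytes([0xC0])

as_western = raw.decode("latin-1")         # ISO 8859-1, assumed stand-in for SAP 1100
as_iso_cyrillic = raw.decode("iso8859_5")  # ISO 8859-5, assumed stand-in for SAP 1500
as_ms_cyrillic = raw.decode("cp1251")      # Windows-1251, "Microsoft Russian"

# Print only the Unicode code points so the output is safe on any console.
for label, char in [
    ("latin-1", as_western),
    ("iso8859_5", as_iso_cyrillic),
    ("cp1251", as_ms_cyrillic),
]:
    print(f"{label:10} byte 0xC0 -> U+{ord(char):04X}")
```

One byte gives three different characters depending on the codepage, whereas in Unicode each of those characters has its own code point. That is why an MDMP system has to know the codepage of every piece of text before it can be converted.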
Due to the data-intensive nature of the process, it is necessary to have a cross-discipline team responsible for this part of the process.
Having been through this process, I would define my project dream team as having the following members:
1. Basis consultant - responsible for the running of the Unicode conversion process
2. Data migration consultant - responsible for ensuring that the language scans and processes are done properly and that vocabularies are properly maintained (this could also be an internationalisation (I18N) expert, but there are not many around)
3. A client representative - responsible for talking to the business to determine the data flow of processes, as well as how the tables and data are being used
4. Language assignment team - responsible for assigning the unknown words in the vocabulary to a language and a codepage.
5. Good team lead with a strong technical background.
This is a wish list, and it is based on only a single project, but I probably got much more heavily involved than many Basis consultants. Effectively I took on roles 1, 2 and 5 from the list above.
I am going to break down the timeline for our Unicode conversion process by each system.
First we ran the process against DEV. This was challenging because 4.6C does not support the full pre-Unicode tools: there is no SPUMG or UCCHECK. Instead there is a limited tool called SPUM4, which is used to scan every word in every record in every table to ensure that it has a language assigned to it. If a word in a record is not present in the vocabulary, it is flagged as needing assignment.
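The scan described above can be pictured with a small Python sketch. This is not how SPUM4 is implemented. The function name, the record layout and the decision to skip pure 7-bit ASCII words (which read the same in every codepage) are all assumptions made for illustration.

```python
def scan_records(records, vocabulary):
    """Return the set of words that still need a language assignment.

    records    -- iterable of raw byte strings (hypothetical table fields)
    vocabulary -- dict mapping a raw word to its assigned language key
    """
    unassigned = set()
    for record in records:
        for word in record.split():
            # Assumption: pure ASCII words are identical in every codepage,
            # so they do not need a language to be converted safely.
            if word.isascii():
                continue
            if word not in vocabulary:
                unassigned.add(word)
    return unassigned


records = [b"Invoice \xcf\xf0\xe8\xe2\xe5\xf2", b"M\xfcller GmbH"]
vocabulary = {b"M\xfcller": "DE"}
todo = scan_records(records, vocabulary)
print(len(todo), "word(s) need a language assignment")
```

Every word left in the result is one the language assignment team has to look at, which is why the count of unknown unique words became our measure of progress.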

We engaged SAP and received assistance from one of their I18N experts, Nils. It would be fair to say he wrote the book on the SAP Unicode process. With Nils we ran the scans throughout the system and started the data analysis of the results. We found a very nasty surprise within the vocabulary: users from Russia had been entering data in a non-standard way, using the I18N settings. This meant that the data in the system was effectively stored using Microsoft Russian codepage byte values, not SAP Russian codepage byte values. As a result, if a word was not assigned to the correct codepage, the byte values of its letters would be wrongly converted and the word would be corrupted.
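The corruption mechanism can be reproduced in a few lines of Python. As a hedged sketch, it assumes cp1251 stands in for the Microsoft Russian codepage and iso8859_5 for the SAP Russian codepage. The sample word is my own, not taken from our system.

```python
# A Russian word as a user's Microsoft Russian front end would have stored it.
word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # the Russian word for "hello"
stored = word.encode("cp1251")                  # assumed Microsoft Russian bytes

# Converting with the codepage the system *expects* for Russian.
wrong = stored.decode("iso8859_5")              # assumed SAP Russian codepage
# Converting with the codepage the data was *actually* entered in.
right = stored.decode("cp1251")

print("correct assignment round-trips:", right == word)
print("wrong assignment round-trips:  ", wrong == word)
```

Note that the wrong conversion does not fail. It produces the same number of characters, all of them valid, just not the ones the user typed, so nothing flags the problem until somebody reads the data.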

After much deliberation, we established that the data within the DEV system was not great and that we needed to know the scale of the problem. We therefore completed the language assignment to within 10000 unknown unique words and ran the CUUC (Combined Upgrade and Unicode Conversion) process. Once the conversion was completed we found massive levels of corruption in the database, far too much to fix using SUMG.
We learnt a valuable lesson: the Russian data entry was going to be one of the major challenges throughout the process. We also decided to use copies of Production to improve our data quality and provide a more iterative approach to the vocabulary conversion.
The project team obtained a copy of Production and copied it back to create a new QAS system. We began the process anew, but this time we did three new things:
1. Introduced a new codepage designed to accommodate the Microsoft Russian words (English/Russian).
2. Repurposed a language using the same codepage as SAP Russian (codepage 1500), in this case BG - this became our SAP Russian (Russian/Russian).
3. Added both SAP Russian (RU) and English (EN) to the ambiguous language list. This means any word the system recognises as being in either language is placed in the vocabulary to be checked.
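One way to picture the effect of the ambiguous language list is the Python sketch below. It is only an illustration of the idea as I have described it, not the logic of the SAP tools. The row layout, the function name and the skipping of pure ASCII words are assumptions.

```python
AMBIGUOUS_LANGUAGES = {"RU", "EN"}  # the two languages we marked as ambiguous


def triage(rows, ambiguous=AMBIGUOUS_LANGUAGES):
    """Split rows into those whose language key is trusted and a vocabulary
    of words that must be checked by hand.

    rows -- iterable of (language_key, raw_bytes) pairs (hypothetical layout)
    """
    trusted = []
    vocabulary = set()
    for language, text in rows:
        if language in ambiguous:
            # The language key cannot be trusted, so every non-ASCII word
            # goes to the vocabulary for manual assignment.
            vocabulary.update(w for w in text.split() if not w.isascii())
        else:
            trusted.append((language, text))
    return trusted, vocabulary


rows = [
    ("DE", b"M\xfcller GmbH"),
    ("EN", b"Invoice \xcf\xf0\xe8\xe2\xe5\xf2"),
    ("RU", b"\xbf\xe0\xd8\xd2\xd5\xe2"),
]
trusted, vocabulary = triage(rows)
print(len(trusted), "trusted row(s),", len(vocabulary), "word(s) to check")
```

The trade-off is visible even in this toy version: marking a language as ambiguous protects against wrongly trusted language keys, but every word in those rows becomes manual work.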

Once we got within our comfort zone of 10000 unique words, we executed the CUUC, with worse results than the previous run. This was because the table UMGCCTL (the Unicode control table) became corrupted during the process, which meant that all the tables were converted using codepage 1100, as the R3load process could not determine the correct codepage for each record. This was a horrible turn of events, as the technical team gave up much of their Christmas to complete the CUUC process, but there was a silver lining.
We now had additional items to check to ensure a correctly running export and conversion. It also prompted the project to grant another two attempts at conversion, as well as a repeat of the Unicode process on QAS. This time it was completed successfully, but we had chosen the wrong road when assigning English and Russian/Russian as ambiguous.
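The UMGCCTL failure described above is easy to mimic in Python. This sketch does not reflect how R3load or UMGCCTL work internally. The dictionary standing in for the control table, the function name and the use of latin-1 as a stand-in for codepage 1100 are assumptions.

```python
FALLBACK_CODEC = "latin-1"  # assumed stand-in for SAP codepage 1100


def convert(raw, language, control_table):
    """Decode raw bytes using the codepage the control table gives for the
    language, falling back to the default when no entry can be found."""
    codec = control_table.get(language, FALLBACK_CODEC)
    return raw.decode(codec)


intact_table = {"RU": "iso8859_5", "EN": "latin-1"}
corrupted_table = {}  # the lookup no longer returns anything useful

raw = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("iso8859_5")
good = convert(raw, "RU", intact_table)
bad = convert(raw, "RU", corrupted_table)

print("intact table preserves the word:   ", good.encode("iso8859_5") == raw)
print("corrupted table preserves the word:", bad == good)
```

Because latin-1 assigns a character to every possible byte, the fallback never raises an error. The export and import both finish cleanly and the damage only shows up in the converted data, which is why checking the control table before the export went onto our list.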

At this point the project team were a little bruised and battered, but we had learnt a great deal about the process. These lessons would give us a great deal of confidence in the later phases, because you can learn more from your mistakes than you can from your successes.
