In my last blog post, I described how my previous project began an MDMP Unicode conversion. At this point in the story, we had completed two Unicode conversion attempts, but without much success in converting the data correctly.
With the help of Nils from SAP, the project team sat down and took a long, hard look at the problem and at where our pain points lay. We had enough data to establish that our biggest problems were the Financial Accounting tables, because of the Russian codepage issues, along with some of the client's custom tables.
From this analysis, the project team took a number of decisions:
1. Someone else needed to take responsibility for the Unicode conversion language assignment, because I was becoming a Single Point of Failure: I understood the process better than anyone else on the team. It also relieved the pressure of concentrating two heavily technical roles in a single person. (We actually ended up with two people, which I have to say is better.)
2. We would not attempt any archiving or data migration steps to try to correct the data, because we knew where our problems were. Any data migration ran the risk of turning our known problems into unknown ones.
3. We would set only English as ambiguous. (The sketch after this list shows why ambiguity exists at all.)
4. We would execute two more dry runs.
5. We would use SUMG to repair data rather than the reprocessing logs from SPUM4, because the reprocessing logs were not as controllable as SUMG.
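To make the ambiguity problem concrete: in an MDMP system, text stored without a language key is just bytes, and the same bytes can decode to a valid word under more than one codepage. A minimal Python illustration (not project code; the byte values are made up):

```python
# The same two bytes, stored in a language-independent table, are a valid
# word in both the Polish (Latin-2) and Russian (Cyrillic) codepages, so
# the converter cannot tell which language was meant.
raw = bytes([0xB1, 0xB3])  # hypothetical bytes with no language key

as_polish = raw.decode("iso8859_2")   # Latin-2 -> 'ął'
as_russian = raw.decode("iso8859_5")  # Cyrillic -> 'бг'

print(as_polish, as_russian)  # both decodes succeed; at most one is correct
```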
We had set the stage for another attempt, and this time we were confident we could get it right. So we ran the CUUC process and ended up with a system that was actually quite usable :-)
There was still a lot to fix in SUMG, but we had accomplished a Unicode conversion, and most of the data was intact.
This success came down to three main reasons:
1. I stopped trying to do too much at once; during the CUUC process I focused on executing the technical steps. I also stopped trying to optimise the Unicode process for speed. We were far more concerned with data quality, so we decided to take the hit on performance.
2. We put a lot of work into the vocabulary assignment for unknown words, and we delved deeply into the reprocessing scans. Using the reports um4_analyse_replog and um4_replog_stats, the language team and I worked hard to resolve collisions and unknown words.
It is very important to note that this process quite often does not yield a correct assignment for a word, especially where languages such as Polish and Russian are both plausible candidates for the same bytes. The best you can hope for is the least wrong answer: the one that does not create too many reprocessing logs.
3. We made better use of table comparison tools to automate the comparison of pre- and post-conversion data extracts. (A sketch of this idea follows this list.)
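As an illustration of that automation, here is a minimal Python sketch of the kind of pre/post comparison we mean. It assumes the extracts are UTF-8 CSV files sharing a primary-key column; the file names and key field in the usage comment are hypothetical:

```python
import csv

def compare_extracts(pre_path: str, post_path: str, key_field: str):
    """Flag rows that went missing or changed between two extracts.

    A sketch only: it assumes both extracts are CSV files keyed by the
    same single column and already transcoded to UTF-8. Real extracts
    would need per-table composite-key handling.
    """
    def load(path):
        with open(path, newline="", encoding="utf-8") as f:
            return {row[key_field]: row for row in csv.DictReader(f)}

    pre, post = load(pre_path), load(post_path)
    for key, old_row in pre.items():
        new_row = post.get(key)
        if new_row is None:
            print(f"missing after conversion: {key}")
        elif new_row != old_row:
            diffs = [c for c in old_row if old_row[c] != new_row.get(c)]
            print(f"changed: {key}, fields {diffs}")

# Hypothetical usage against extracts of a financial table:
# compare_extracts("bkpf_pre.csv", "bkpf_post.csv", key_field="BELNR")
```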
Once we delivered the third system, the response from the business and the project team was one of greater confidence: we now had a process that worked, and we knew the data well enough to go for our final dry run before Production.
We set out very clear criteria for our final dry run: it had to meet strict conversion completeness standards and run to time without major incident.
The process started with a copy of Production, on which we executed a final round of language assignment, again bringing the total of unique words below 10,000. We then tackled the reprocessing logs; here we did not want any tables with over 100,000 entries in them, as that would produce too many reprocessing logs for SUMG to handle.
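Those readiness criteria boil down to a simple check. A hypothetical Python sketch, assuming the remaining word count and per-table entry counts have been exported from the scan results (the function and its inputs are illustrative, not project tooling):

```python
# Thresholds taken from the criteria described above.
MAX_UNIQUE_WORDS = 10_000
MAX_TABLE_ENTRIES = 100_000

def ready_for_dry_run(unique_words: int, table_counts: dict[str, int]) -> bool:
    """Return True only if both readiness criteria are met."""
    oversized = {t: n for t, n in table_counts.items() if n > MAX_TABLE_ENTRIES}
    for table, count in oversized.items():
        print(f"{table}: {count} entries would flood SUMG with reprocessing logs")
    return unique_words < MAX_UNIQUE_WORDS and not oversized
```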
Next we ran the CUUC process just as we would on Production, and it completed successfully. Extensive testing found that a few tables had failed their conversion threshold, but the team was confident these could be fixed in Production.
At this point we were flying high, and the Steering Committee had given permission to start preparations for PRD, which I will cover in my next blog post.