From the Bangkok Post:
The world’s largest literacy project has been launched in Thailand, doubling the size of the Thai-language Internet almost overnight. It aims to expand to other countries in Asia, using an advanced form of machine translation to translate freely available content into local languages.
AsiaOnline has launched a portal, asiaonline.net, which translates much of the knowledge and information on the Internet first into Thai, and later into other South-East Asian languages.
At the heart of the project is a particular approach to machine translation, statistical machine translation (SMT), championed by the project’s founder:
At launch, with one million pages, the engine is expected to be 70 percent understandable, with 30 percent good-quality translation. With five million pages, its translations should be 100 percent understandable and 50 percent considered high quality. The system will learn from its mistakes and will not make the same mistake twice.
This is where the nation-building exercise comes in. Wiggins explains that other SMT systems on the market get their learning material from the open Internet, on the premise that the good data will float above the bad. In practice, however, these systems often ingest bad data, frequently the output of poor machine translation, and the SMT then learns from it.
AsiaOnline takes a different approach, with strict control over the learning material. New material will be proofread three times by volunteers, with the system handing out the same material three times on different days to prevent collusion. The best translation (the one chosen by two or all three of the proofreaders) is then selected and forwarded to a final proofreader for approval.
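To make the review workflow concrete, here is a minimal sketch of a two-out-of-three selection step followed by final approval. It is not AsiaOnline's actual system; every name here (`Review`, `select_for_final_review`, `process_segment`, the `final_proofreader` callback) is hypothetical, and it assumes each volunteer submits a complete proofread version of the segment, with whitespace differences ignored when comparing versions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    # Hypothetical record: one volunteer's proofread version of a segment.
    volunteer: str
    day: int          # day the segment was handed to this volunteer
    translation: str


def select_for_final_review(reviews: list[Review]) -> Optional[str]:
    """Return the version agreed on by at least two of three volunteers.

    Assumes exactly three reviews from distinct volunteers on distinct days
    (the anti-collusion rule). Returns None when all three versions differ.
    """
    if len(reviews) != 3:
        raise ValueError("expected exactly three reviews")
    if len({r.volunteer for r in reviews}) != 3:
        raise ValueError("each review must come from a different volunteer")
    if len({r.day for r in reviews}) != 3:
        raise ValueError("reviews must be handed out on different days")

    # Assumption: collapse whitespace so spacing alone doesn't split the vote.
    counts = Counter(" ".join(r.translation.split()) for r in reviews)
    best, votes = counts.most_common(1)[0]
    return best if votes >= 2 else None


def process_segment(
    reviews: list[Review], final_proofreader: Callable[[str], bool]
) -> Optional[str]:
    """Pick the majority version, then ask the final proofreader to approve it."""
    candidate = select_for_final_review(reviews)
    if candidate is None:
        return None  # no 2-of-3 majority; a real system might re-queue it
    return candidate if final_proofreader(candidate) else None
```

For example, if two volunteers return "Hello world" (one with an extra space) and the third returns "Hi world", `select_for_final_review` returns "Hello world", which is then passed to the final proofreader. Returning `None` when there is no majority keeps disputed material out of the training data rather than guessing.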
AsiaOnline will offer the material for free, but it does appear to be a for-profit entity funded by ad revenue. I’ll try to find out more about this venture in the coming days and post the information here.