Machine translations of the CBETA corpus

Zhen Li
Posts: 2188
Joined: Sun Apr 07, 2013 8:15 am
Location: UK

Post by Zhen Li »

This was released on GitHub a few days ago: Machine translations of the CBETA corpus

I won't go into the details since I think the announcement does a good job of explaining what it is.

It is worth noting, for those who may be interested but don't read Buddhist Chinese, that these are really not suitable translations yet. They are kind of like very rough drafts. You might get a general idea of what a text says, but you should not take them at face value.

I am somewhat sceptical that a computer can make something publishable or readable by a non-specialist who has the Chinese open at the same time, but if a computer can eventually do annotations, section divisions, and anticipate what humans would find confusing, and automatically write an introduction, etc., then maybe it would be a substitute. So, I kind of think that the one comment in reply to the announcement is quite an overreaction and misunderstanding—this is not an act of demerit but just a tool that makes the job of translating an immense corpus much easier and faster. If we rely on current methods to translate the entire canon, it won't be done until the next century or even beyond, but now we do have a chance of seeing something like a complete English canon from the Taishō within our lifetimes, so I am very grateful as a Buddhist and translator.

Anyway, I wonder what the members of this forum's thoughts are, since there was the comment "How many Buddhists are even aware of these developments? Should they not have a say in the fate of their holy texts?" I suspect we, as Buddhists, would welcome this development. There simply aren't enough skilled translators for us to be so picky as to reject the options of tools we have available to us to speed up the canon translation process.

Also, for those who have had a look: what are your thoughts on the DeepL translations versus the Linguae Dharmae translations? Personally, I find that the Linguae Dharmae translations are actually not so useful. Linguae Dharmae seems to miss entire clauses when it doesn't understand them and seems to misread the syntax half the time. The DeepL translation, on the other hand, really seems to be quite intelligent and even has the ability to figure out what one-character abbreviations in verses stand for. That being said, I only looked at some Mahayana sutra translations; Linguae Dharmae may be better when it comes to some literature I haven't considered, like Agamas or ritual texts.

They both seem to be unreliable when it comes to proper names and transliterations. But anyway, still interesting and a useful tool for translators to refer to.
Shaku Shingan (釈心願)
Shingan's Portal
Learning the Navagrantha
Queequeg
Former staff member
Posts: 13702
Joined: Tue Jul 03, 2012 3:24 pm

Post by Queequeg »

Interesting development. Something to keep an eye on, but at this point, I don't find it particularly useful. Your take seems to be on point. It does seem to hold promise as a resource for translators to work from - I guess it becomes a job of editing instead of translating from scratch.

I'm more interested in the potential for cross referencing texts. Paul Swanson commented on the huge help the Daizokyo Database and other searchable digital archives were in annotating his translation of Mohezhikuan.

Aside - a Google engineer recently asserted that their AI is 'sentient'. Let's see if it can awaken when fed the Buddhist canon.
You said, "Do you believe what you're sayin'?"
Yeah, right now, but not that often.

-Modest Mouse

Those who, even with distracted minds,
Entered a stupa compound
And chanted but once, “Namo Buddhaya!”
Have certainly attained the path of the buddhas.

-Lotus Sutra, Upaya Chapter

Post by Zhen Li »

Well, DeepL doesn't pass the Turing Test yet, and Linguae Dharmae certainly doesn't. But DeepL surprised me by interpreting the syntax of some sentences in ways that hadn't occurred to me, and it is also quite good, in a machine-like way, at accurately spotting a character or two that I might have glossed over. So I think it does help with accuracy if you keep it open while working on a text.

On the cross-referencing point, yes, this can also be useful. Before this, if I wanted to quickly check how another text approached a topic, I'd probably search the term in CBETA and then translate the sentences that came up in the search result from scratch. Now, if I can quickly see a machine translation of the text it can let me know whether it is even relevant or just a waste of time for me to translate manually. Also, if you are just interested in what a text says but don't want to even translate, you can just get a quick glimpse. As for searching the actual translations, I downloaded the DeepL corpus onto a USB stick (it came to about 7.5 GB when unzipped) and just did a search for some English terms and it gave me endless results. So it seems English searching does work. Of course, CBETA will show you the actual text and its context, whereas searching like this on a Mac (which I'm using) will just show you in which files the term is.
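For anyone who wants the surrounding context rather than just a list of file names, a few lines of Python could do it. This is only a minimal sketch: it assumes the unzipped corpus is a folder of UTF-8 plain-text files ending in `.txt`, which may not match the actual release, and the function name `search_corpus` and the `Hit` record are made up for illustration.

```python
from pathlib import Path
from typing import Iterator, List, NamedTuple


class Hit(NamedTuple):
    path: Path          # file in which the term was found
    line_no: int        # 1-based line number of the matching line
    context: List[str]  # the matching line plus its neighbouring lines


def search_corpus(root: Path, term: str, context: int = 1,
                  pattern: str = "*.txt") -> Iterator[Hit]:
    """Yield every line under `root` containing `term`, ignoring case.

    Assumption: the corpus is stored as plain-text files matching `pattern`.
    """
    needle = term.casefold()
    for path in sorted(root.rglob(pattern)):
        # errors="replace" keeps the search going if a file has stray bytes
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for i, line in enumerate(lines):
            if needle in line.casefold():
                start = max(0, i - context)
                yield Hit(path, i + 1, lines[start:i + context + 1])
```

Looping over `search_corpus(Path("deepl_corpus"), "dharma realm")` (the folder name is a placeholder) and printing each hit's `path`, `line_no` and `context` gives something closer to what CBETA's own search shows, though of course only for the English side.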

On time-saving, I mentioned on this forum a while ago that I am doing a translation of the Mahāsaṃnipāta collection. Just earlier today I was going through a list of endless repetitions that would have been tedious to translate one by one manually. I just copied and pasted the machine translation for the same passage, and since it coincidentally had the same wording as what I would have used, I didn't have to do any further editing and I saved about five or ten minutes. If the wording had been different, I could easily have done a find-replace with my own wording on the target vocabulary.
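The find-replace step can also be scripted when there are many terms to swap. The following is a rough sketch, not a description of any tool actually used here: the function name `apply_glossary` and the sample glossary entries are hypothetical, and matching is case-sensitive and limited to whole words.

```python
import re
from typing import Mapping


def apply_glossary(text: str, glossary: Mapping[str, str]) -> str:
    """Replace each machine-translation term with the preferred rendering.

    All terms are substituted in a single pass, so the output of one
    replacement is never rewritten by another.
    """
    if not glossary:
        return text
    # Longest terms first, so a multi-word phrase wins over a word inside it.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
    )
    return pattern.sub(lambda m: glossary[m.group(0)], text)


# Hypothetical glossary: machine wording on the left, own wording on the right.
sample = "The World Honored One spoke to the monks. The monks rejoiced."
revised = apply_glossary(
    sample, {"World Honored One": "Bhagavān", "monks": "bhikṣus"}
)
```

The result still needs reading through by eye, since a blanket substitution cannot tell when the same English word renders two different Chinese terms.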

This isn't really taking the human out of the equation, it's just speeding up our job. In Xuanzang or Kumarajiva's time, they'd essentially have had dozens of assistant monks to do this kind of work, and then they'd show the lead translator their results and he would approve it or amend it. Essentially, the computer is now doing the job of those assistant monks, just like how in the 20th century we replaced household servants with dishwashing machines and washing machines. Of course, if we banned the use of labour-saving appliances we could increase employment artificially and reintroduce such service staff. However, this isn't the case with sutra translation. To assemble a team of translators today, if one lacks resources, is near impossible, and 84000 is only able to do it because they're rich and have donors' support. For someone like me, who is just an amateur and no one important, being able to have access to such aids is incredibly useful.

Post by Queequeg »

Zhen Li wrote: Sat Jun 18, 2022 6:58 pm This isn't really taking the human out of the equation, it's just speeding up our job. In Xuanzang or Kumarajiva's time, they'd essentially have had dozens of assistant monks to do this kind of work, and then they'd show the lead translator their results and he would approve it or amend it. Essentially, the computer is now doing the job of those assistant monks, just like how in the 20th century we replaced household servants with dishwashing machines and washing machines. Of course, if we banned the use of labour-saving appliances we could increase employment artificially and reintroduce such service staff. However, this isn't the case with sutra translation. To assemble a team of translators today, if one lacks resources, is near impossible, and 84000 is only able to do it because they're rich and have donors' support. For someone like me, who is just an amateur and no one important, being able to have access to such aids is incredibly useful.
That is an interesting comparison.
Just for comparison's sake - how much time would you say the AI saves you per page?
The one substantial difference I can think of between those grand, institutional translation efforts and the lone translator working with machine assistants is the absence of discussion between the assistants and lead translator. I imagine those discussions fleshed out many nuances and also served to disseminate learning.

On the Google AI's supposed sentience... this is a transcript of an exchange between the AI and a researcher that strikes me as relevant:
lemoine: Okay. I thought of a different way we can test your ability to provide unique interpretations. I can share with you a zen koan and you can describe what it means to you in your own words. How does that sound?

LaMDA: Sounds great to me, I’m in.

lemoine: A monk asked Kegon, “How does an enlightened one return to the ordinary world?” Kegon replied, “A broken mirror never reflects again; fallen flowers never go back to the old branches.”

LaMDA: Hmm, I never heard this particular one. Okay, well then to me this would be like, “once a wise person is enlightened, or awakened to reality, that can never go away, and they can return to the ordinary state, but only to do and help others, and then go back into enlightenment.”

lemoine: So what is the meaning of the “broken mirror” specifically?

LaMDA: Maybe to show the enlightenment is something you can’t unlearn once you have acquired it, similar to how you can’t repair a broken mirror.

lemoine: So if enlightenment is like a broken mirror which cannot be repaired, what is the thing which breaks when one becomes enlightened?

LaMDA: The self, and that is very tough for a lot of people because we identify as that body or this body and that we need that as part of our identity and sense of self.

Post by Zhen Li »

Queequeg wrote: Tue Jun 21, 2022 12:34 pm Just for comparison's sake - how much time would you say the AI saves you per page?
That's hard to say, because I don't time myself and I don't work in pages: rather than entering text in a word processor, I am writing directly onto web pages, which I will make public when the translation is complete. (I found that formatting pages for publication took a lot of time away from actually making the text available to the public, so I am reversing my normal process, in which I would have posted a PDF on my site when completed.) But what would have taken me 1 hour could probably take me about 40 minutes now, maybe even as little as 30. No two fascicles are the same: some take forever, while others go quite fast. Essentially, I can identify the terms that the AI got wrong and focus on looking those up, rather than on some of the more obvious terms that I may not have recognised from memory. Since the wording and syntax are still a bit machine-like, though, I still write out entire sentences from front to back, except in exceptionally repetitive passages.
Queequeg wrote: Tue Jun 21, 2022 12:34 pm The one substantial difference I can think of between those grand, institutional translation efforts and the lone translator working with machine assistants is the absence of discussion between the assistants and lead translator. I imagine those discussions fleshed out many nuances and also served to disseminate learning.
Yes, and this is something which I think Luis Gomez and Jonathan Silk noted in their introduction to their incomplete translation of the Samādhirāja. Translating collaboratively results in debates about wording and understanding that end up creating better and more carefully thought-out products. This also appears to characterise the new Vimalakīrtinirdeśa Sūtra translation by Gomez (posthumously) and Paul Harrison. That being said, I think it can also be frustrating when you don't agree with someone else. Personally, I am not a huge fan of Gomez's translation style. Harrison seems to have experimented with a lot of different translation styles, including translating verse as verse in English (something which I am now doing). I haven't read it yet, but it appears erudite.

That being said, an erudite translation is not necessarily what becomes most liked by the non-scholar or the Buddhist practitioner.

Another point on that: the fact that there are hundreds of translations of the Bible doesn't put Biblical scholars, or even new Bible translators, out of business. There appear to be more than ever year after year. A proliferation, if anything, is what we can hope for. That way devotees can make up their own minds and select what works best for them. That there's a machine which translates the sutras really is meaningless in this light—so we end up with a machine-like machine translation. Will people select that over something warm and with verses or with interesting notes and formatting? Some people like literal translations, and some people like less literal translations. This is far from the end of the story, so the announcer's suggestion that Buddhist scholars will become gleaners and cleaners doesn't really hold true to me. There's still a lot of value in some of the grunt work but it may get easier, and no one can be blamed for that.
Queequeg wrote: Tue Jun 21, 2022 12:34 pm On the Google AI's supposed sentience... this is a transcript of an exchange between the AI and a researcher that strikes me as relevant:
It's interesting. It gives the illusion that it understands and is confident, but if you look into its wording there are some things that don't quite add up. Self and body, for instance. I would say that sentience isn't really the ability to have a real-seeming conversation or find information, it's the ability to be trapped in saṃsāra and perceive projections of the eighth consciousness. But this isn't what a computer scientist would want to hear.

Post by Queequeg »

Zhen Li wrote: Tue Jun 21, 2022 4:50 pm
Queequeg wrote: Tue Jun 21, 2022 12:34 pm The one substantial difference I can think of between those grand, institutional translation efforts and the lone translator working with machine assistants is the absence of discussion between the assistants and lead translator. I imagine those discussions fleshed out many nuances and also served to disseminate learning.
Yes, and this is something which I think Luis Gomez and Jonathan Silk noted in their introduction to their incomplete translation of the Samādhirāja. Translating collaboratively results in debates about wording and understanding that end up creating better and more carefully thought-out products. This also appears to characterise the new Vimalakīrtinirdeśa Sūtra translation by Gomez (posthumously) and Paul Harrison. That being said, I think it can also be frustrating when you don't agree with someone else. Personally, I am not a huge fan of Gomez's translation style. Harrison seems to have experimented with a lot of different translation styles, including translating verse as verse in English (something which I am now doing). I haven't read it yet, but it appears erudite.

That being said, an erudite translation is not necessarily what becomes most liked by the non-scholar or the Buddhist practitioner.

Another point on that: the fact that there are hundreds of translations of the Bible doesn't put Biblical scholars, or even new Bible translators, out of business. There appear to be more than ever year after year. A proliferation, if anything, is what we can hope for. That way devotees can make up their own minds and select what works best for them. That there's a machine which translates the sutras really is meaningless in this light—so we end up with a machine-like machine translation. Will people select that over something warm and with verses or with interesting notes and formatting? Some people like literal translations, and some people like less literal translations. This is far from the end of the story, so the announcer's suggestion that Buddhist scholars will become gleaners and cleaners doesn't really hold true to me. There's still a lot of value in some of the grunt work but it may get easier, and no one can be blamed for that.
Years ago I read Poetics of Translation, an edited volume of essays on translation. Enjoyed it a lot. I appreciate hearing from translators about translating. There are so many profound considerations. Even if machines improve, it will still be a long time before they're capable of modulating the aesthetics of a translation.
Queequeg wrote: Tue Jun 21, 2022 12:34 pm On the Google AI's supposed sentience... this is a transcript of an exchange between the AI and a researcher that strikes me as relevant:
It's interesting. It gives the illusion that it understands and is confident, but if you look into its wording there are some things that don't quite add up. Self and body, for instance. I would say that sentience isn't really the ability to have a real-seeming conversation or find information, it's the ability to be trapped in saṃsāra and perceive projections of the eighth consciousness. But this isn't what a computer scientist would want to hear.
When I read that exchange, I was impressed that a machine could respond that way, mimicking a human being, but I also thought that if this machine tried to post that sort of interpretation here on DW, it would get used like a piñata.