Chinese-Forums

Translating a novel in PDF form into Chinese with ChatGPT / AI



Posted

There are a lot of books that I have wanted to read in my own language (English) over the last several years, but since I like to use my free time on Chinese, I've put them to the side. There is a particular novel that I really couldn't wait any longer to read, but I couldn't find a Chinese translation of it anywhere. So it dawned on me last week that I could feed it into ChatGPT or DeepSeek and ask it to translate it for me. I decided to give it a try with ChatGPT (because it's on my laptop) and instructed it to grade the translation to roughly HSK 5 level. It's working swimmingly. I can hover over new vocabulary for pinyin and English, ask it questions about grammar or words, have it ask me comprehension questions, discuss the book, etc.

 

I'm a novice with AI. I was afraid to stick the whole 470-page PDF in at once because: 1) I don't know if it will tell me there are copyright issues with translating a book. 2) I don't know if a big document like that will make ChatGPT say I've used up my subscription for the month. So what I've been doing, to err on the side of caution, is to take screenshots and feed in 5 pages at a time. This is not too hard, but it is a bit of extra work I'd rather not do.

 

So my question is: would feeding in the whole PDF novel and telling it to translate pages 1-5, reading those, and then telling it to translate pages 6-10, etc., work well?

Posted

I think you're quite correct on multiple fronts here: it's not going to handle a 470-page PDF well, for the reasons you list. Also, if you have really long chats with ChatGPT, the window becomes sluggish to the point of being unusable, and you'll have to re-upload the 470-page PDF again and again.

 

There'll be some terminal command you can copy/paste to automatically split the 470-page PDF into smaller PDFs of 5 pages each. (Ask ChatGPT; it's good at this stuff.) If you're able to do this, it saves the legwork, and you can upload whichever part of the PDF you're up to in a fresh chat with ChatGPT. (I use "Projects" so they're all together.)
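
If you'd rather not hunt for a terminal command, the split can also be sketched in Python. This is only a rough sketch, assuming the third-party pypdf package is installed (pip install pypdf); the function names page_ranges and split_pdf and the output file naming are made up for illustration.

```python
from pathlib import Path


def page_ranges(total_pages, chunk_size=5):
    """Return 1-indexed, inclusive (first, last) page pairs covering the book."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


def split_pdf(src, out_dir, chunk_size=5):
    """Write one small PDF per page range; assumes pypdf is installed."""
    # Imported here so page_ranges() still works without pypdf.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for first, last in page_ranges(len(reader.pages), chunk_size):
        writer = PdfWriter()
        for index in range(first - 1, last):  # reader.pages is 0-indexed
            writer.add_page(reader.pages[index])
        target = out / f"pages_{first:03d}-{last:03d}.pdf"
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written


if __name__ == "__main__":
    # A 470-page book in 5-page chunks gives 94 files.
    print(len(page_ranges(470)))
```

Something like split_pdf("novel.pdf", "chunks") would then leave files such as pages_001-005.pdf in a chunks folder, ready to upload one per fresh chat. The file name novel.pdf is a placeholder.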

 

I tried Doubao, and it's far less restricted than ChatGPT in terms of volume (although the writing style will be more "native speaker who grew up in China" Chinese, rather than "native speaker who grew up in the USA" Chinese, if that makes sense). With the 帮我写作 ("help me write") mode, it writes in a kind of document that resembles Word, and the AI sees your edits in the chat. So you can read, highlight in bold the bits you want help with (without interrupting the flow of reading), and then ask Doubao to explain them when you're ready.

 

Posted
On 10/20/2025 at 6:46 AM, suMMit said:

1) I don't know if it will tell me there are copyright issues with translating a book

Speaking as a translator: there are no copyright issues with translating a book for your own use, as long as you don't publish it. But there are serious copyright issues with feeding a book into ChatGPT (or any LLM) without the permission of the author and other rights holders. As I understand LLMs, they will use anything you feed into them as training data, and it's simply not allowed, and morally wrong, to give away a text that is not yours to an LLM. This also applies if you feed it into the LLM five pages at a time. If you don't hold the rights to give the entire text to the LLM's company, you are basically stealing it on behalf of the LLM's owner.

 

I am aware people will do things like this anyway (book pirating isn't new either), but now you know.

Posted

I've done paragraph-by-paragraph translation using the Gemini API with a small script, and it couldn't have been easier. First I converted the PDF to text (using the command "pdftotext -layout input.pdf output.txt" in Ubuntu). The text output was a bit messy, so I spent some time cleaning it up manually. Then I asked the Gemini assistant to create a Python script that reads a text file paragraph by paragraph, translates each paragraph using the API, and prints out the result. Finally I configured my API key as instructed by the assistant and ran the script locally on my Ubuntu laptop. The whole process took maybe 20 minutes, and half of that was cleaning up the pdftotext output. It works very similarly with any other major AI provider, I believe.
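
The paragraph loop described above might look roughly like this. It's only a sketch, not the actual script: split_paragraphs, translate_file and the translate argument are hypothetical names, and the real API call is deliberately left out, so you'd pass in whatever function your provider's SDK gives you. It assumes paragraphs in the cleaned-up text are separated by blank lines.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines and re-join lines that were hard-wrapped."""
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        joined = " ".join(
            line.strip() for line in block.splitlines() if line.strip()
        )
        if joined:
            paragraphs.append(joined)
    return paragraphs


def translate_file(path, translate):
    """Yield (original, translation) pairs, one paragraph at a time.

    `translate` is any callable taking a string and returning a string,
    e.g. a thin wrapper around your provider's API (not shown here).
    """
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    for paragraph in split_paragraphs(text):
        yield paragraph, translate(paragraph)


if __name__ == "__main__":
    sample = "First line of a\nwrapped paragraph.\n\n\nSecond paragraph.\n"

    # Stand-in translator so the sketch runs without an API key.
    def fake_translate(paragraph):
        return f"[zh] {paragraph}"

    for para in split_paragraphs(sample):
        print(fake_translate(para))
```

Sending one paragraph per request keeps each call small, and if one call fails you only have to redo that paragraph.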

  • 2 weeks later...
Posted

Yes, feeding in the whole PDF and asking for pages 1–5, then 6–10, etc., works perfectly, and you can even turn it into a reusable app or script.
