I hate it when the internet goes down, or I can’t get cell reception. We tend to take it for granted that we have access to insane amounts of information from our phones, and only appreciate it when it’s gone. I wanted to create something that would let me access some of that information even when I was offline, so I decided to turn my old Kindle into a Wikipedia archive.
Figuring out what offline Wikipedia actually meant
I wanted a true offline archive
The first thing I had to do was decide what I actually wanted. It would be possible to create an archive of Wikipedia and host it on my local network. I could then access all that information from my Kindle, even if my ISP was down and I couldn’t get online.
This still wouldn’t be a true offline Wikipedia archive, however. It would only work over my local network, and if I made it accessible from anywhere, I’d still need an internet connection to reach it.
Instead, I decided that I wanted to fit as much of Wikipedia as I could directly onto my Kindle. That way, I could read it when the internet was down or in the middle of a forest with no phone signal. It would be like having a small slice of the internet that I could carry around with me.
Wikipedia is just too big for my Kindle
10,000 articles was the sweet spot
The problem is that my ancient Kindle 4 only has 2 GB of storage, with Amazon listing approximately 1.4 GB available to the user. Once the OS, KOReader, and my other files had taken up their share, that left me just over 1 GB to play with. Even a stripped-down, text-only copy of the whole of Wikipedia would be too big to fit.
Thankfully, Wikipedia has a project called Vital articles, which ranks all the most essential topics into nested tiers. Level 1 contains the ten most vital articles; Level 2 has 100; Level 3 has 1,000; Level 4 has 10,000; and Level 5 has 50,000. Level 4 turned out to be the best fit at around 400 MB without images, with Level 5 too big to fit.
Having 10,000 of the most important Wikipedia articles on my Kindle seemed like a reasonable compromise. Level 4 includes pages on People, History, Geography, Arts, Everyday life, Philosophy and religion, Society and social sciences, Biological and health sciences, Physical sciences, Mathematics, and Technology, which covers a lot of bases. At six minutes per article, it would take me around 1,000 hours to read it all, so that seemed like plenty to me.
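The back-of-the-envelope numbers are easy to check. The sketch below uses only the figures quoted above, plus one assumption of mine: that Level 5 articles average the same size as Level 4 articles (they may well be shorter, so treat the Level 5 estimate as rough).

```python
# Figures quoted in the article.
LEVEL_ARTICLES = {1: 10, 2: 100, 3: 1_000, 4: 10_000, 5: 50_000}
LEVEL4_SIZE_MB = 400        # Level 4 without images
FREE_SPACE_MB = 1_000       # "just over 1 GB" left on the Kindle
MINUTES_PER_ARTICLE = 6


def estimated_size_mb(level):
    # Assumption: every level has the same average article size as Level 4.
    return LEVEL4_SIZE_MB * LEVEL_ARTICLES[level] / LEVEL_ARTICLES[4]


def fits(level):
    return estimated_size_mb(level) <= FREE_SPACE_MB


def reading_hours(level):
    return LEVEL_ARTICLES[level] * MINUTES_PER_ARTICLE / 60


if __name__ == "__main__":
    for level in sorted(LEVEL_ARTICLES):
        print(level, estimated_size_mb(level), fits(level), reading_hours(level))
```

Under that assumption, Level 5 comes out at roughly five times the size of Level 4, which is well past the space I had free.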
Fetching the content
Pulling the full list of 10,000 articles
I didn’t yet know which 10,000 articles were on the list, so I used the MediaWiki Action API to pull the full set of article titles. The next challenge was getting the content of those articles so I could put them on my Kindle.
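One way to pull titles with the Action API is to ask for every article-namespace link on a Vital articles list page and follow the API's continuation tokens until the results run out. This is a minimal sketch rather than the exact script I ran. The function names and the User-Agent string are made up for illustration, the list page title in the usage comment is a guess at the naming, and not every link on a list page is necessarily a vital article.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the Action API and decode the JSON reply."""
    request = Request(
        f"{API_URL}?{urlencode(params)}",
        # Illustrative User-Agent; use one that identifies your own script.
        headers={"User-Agent": "kindle-vital-articles-sketch/0.1"},
    )
    with urlopen(request) as response:
        return json.load(response)


def linked_titles(list_page, fetch=fetch_json):
    """Return titles of all article-namespace links on list_page."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "links",
        "titles": list_page,
        "plnamespace": "0",   # main (article) namespace only
        "pllimit": "max",
    }
    titles = []
    while True:
        data = fetch(params)
        for page in data["query"]["pages"]:
            titles.extend(link["title"] for link in page.get("links", []))
        if "continue" not in data:
            return titles
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}


# Usage (hypothetical page title, needs a network connection):
# people = linked_titles("Wikipedia:Vital articles/Level/4/People")
```

Passing the fetch function in as a parameter makes it easy to try the paging logic against canned responses before hitting the real API.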
Fetching the content using the /page/{title}/html endpoint worked out at nearly 1 MB per article, so I switched to the /page/mobile-html/{title} endpoint, which cut the file sizes by more than half. The catch is that this format is designed for mobile, where sections are collapsed until you expand them, so most of each article was hidden inside collapsed sections. By stripping out the hidden attribute, I was able to make the full content of each article visible.
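Stripping the hidden attribute can be done with nothing but Python's standard library. The sketch below is one possible approach, not necessarily the cleanest: it re-emits the HTML as it parses and drops any attribute literally named hidden. The class and function names are my own, and the URL helper assumes the English Wikipedia REST base path.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote

# Assumed base URL for the REST endpoint mentioned above.
REST_BASE = "https://en.wikipedia.org/api/rest_v1"


def mobile_html_url(title):
    """Build the mobile-html URL for an article title."""
    return f"{REST_BASE}/page/mobile-html/{quote(title.replace(' ', '_'), safe='')}"


class HiddenStripper(HTMLParser):
    """Re-emit HTML unchanged, except for dropping `hidden` attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, closing):
        if not any(name == "hidden" for name, _ in attrs):
            self.out.append(self.get_starttag_text())  # keep original text
            return
        parts = [tag]
        for name, value in attrs:
            if name == "hidden":
                continue
            parts.append(name if value is None
                         else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_hidden(html_text):
    parser = HiddenStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

A real parser is a little more code than a regular expression, but it won't mangle something like a class name that happens to contain the word hidden.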
Turning HTML into an eBook
A few very big books didn’t work
The final stage was turning all of the HTML from those 10,000 articles into something I could read on my Kindle. I used the command-line version of the Calibre eBook management software to convert the HTML files into EPUB format, which I could read on my jailbroken Kindle using KOReader.
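Calibre's command-line converter is called ebook-convert, and in its simplest form it takes an input file and an output file, picking the formats from the file extensions. Here's a rough sketch of driving it from Python. The function names and example paths are placeholders of mine, and it assumes ebook-convert is on your PATH.

```python
import subprocess
from pathlib import Path


def build_convert_command(html_path, epub_path, title):
    """Return the ebook-convert argument list for one article."""
    return [
        "ebook-convert",
        str(html_path),
        str(epub_path),
        "--title", title,   # set the book title shown in the reader
    ]


def convert_article(html_path, epub_path, title):
    """Convert one HTML file to EPUB, creating the output folder if needed."""
    Path(epub_path).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(html_path, epub_path, title), check=True)


# Usage (requires Calibre to be installed):
# convert_article("html/Alan_Turing.html", "epub/People/Alan Turing.epub", "Alan Turing")
```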
My first attempt was to convert each section of the Vital articles list (such as People) into its own EPUB file, but this took a long time, and my Kindle struggled to open the large files. I went back to the drawing board and instead made each individual article into its own tiny eBook, sorted into folders that matched Wikipedia’s categories.
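The one-file-per-article layout boils down to mapping a category and a title to a path. This is a sketch of how that mapping might look, with names I've made up for illustration. It swaps out characters that commonly cause trouble in file names, on the assumption that the Kindle's storage is as picky about them as most FAT-formatted drives.

```python
import re
from pathlib import PurePosixPath

# Characters that are unsafe in file names on many file systems.
UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_name(text):
    """Replace unsafe characters and trim trailing dots and spaces."""
    return UNSAFE_CHARS.sub("_", text).strip(" .") or "untitled"


def article_path(root, category, title):
    """Return root/category/title.epub with sanitized path segments."""
    return PurePosixPath(root) / safe_name(category) / f"{safe_name(title)}.epub"


if __name__ == "__main__":
    print(article_path("wikipedia", "Arts", "AC/DC"))
```

Two titles that differ only in a replaced character would collide under this scheme, so a real script would want to check for existing files before writing.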
I uploaded the entire folder of files to my Kindle, and now I have the full set available to read even when I’m offline. I can navigate through the folders to the article I want, open it up, and read Wikipedia without touching the internet at all.
An offline archive can be very useful
We don’t always have access to the internet, so relying on it seems like an unwise choice. It’s good to know that even when AI becomes self-aware and destroys all of humanity’s web servers, I’ll still be able to read Wikipedia’s Artificial Intelligence article offline, so I can figure out how to save humankind. I imagine I just have to make it play tic-tac-toe against itself or something.



