Namuwiki

From Archiveteam

namuwiki (나무위키, meaning 'tree wiki') is the largest Korean-language wiki. Its cultural significance and value make it worth preserving, but the many anti-bot measures in place make this difficult.

History

Rigveda Wiki (previously named Enha Wiki), founded in 2007, was originally the most widely used wiki on the Korean web. However, in 2015 it was found that Rigveda Wiki's admins had violated the terms of CCL 2.0 by secretly changing their ToS to privatize users' contributions. This led to the creation of namuwiki, which at that time promised transparent and non-profit operation. Unlike other new wikis at the time, namuwiki crawled the entire Rigveda Wiki, including its edit history. (Rigveda Wiki has since lost most of its active contributors, and shut down in 2023 without announcement.)

In 2016 umanle S.R.L., a paper company in Paraguay, acquired the wiki from its creator, namu. (namu still develops and maintains the wiki software to this day.) At the time, umanle promised to continue operating namuwiki as a non-profit without direct intervention, while also attempting to secure profitability by launching the forum website arcalive (then namulive). However, in 2017 they discontinued their democratic administrator election program and started censoring specific documents and blocking users who complained. By 2018 their initial mission of transparency was long gone, and advertisements had been added.

Unlike Rigveda Wiki, which operated inside South Korea and complied with its regulations, namuwiki is operated in Paraguay by the paper company umanle S.R.L. Because of this, namuwiki also contains some information that is legally questionable under South Korean law. This is the primary reason why its long-term stability cannot be guaranteed.

Since 2023 there have been en.namu.wiki and ja.namu.wiki, which provide English and Japanese machine-translated versions of all documents. The URLs still remain in Korean, some features are missing (history/edit/raw/...), and the translation is a mess.

Anti-bot measures

Since about 2020, the site has been filled with anti-bot measures, making backups extremely difficult.

  • The entire site is a heavily obfuscated, dynamically rendered webapp
  • Using the Facebook bot user agent "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)", it's possible to get server-side rendered pages
    • GoogleBot and TwitterBot user agents also used to work, but stopped working circa 2023
    • This only works for normal wiki pages; edit history and raw pages don't work
    • Some documents can't be accessed, since these UAs get perm:bot and the wiki's ACL feature can prevent access for certain perm groups
  • Even with an automated web browser, the site rate limits and temporarily blocks the IP after about 30 pages in a minute
  • Tor and VPNs are blocked; some non-Korean residential IPs are also blocked
  • Captchas are frequent and almost always show up when accessing /raw/. Both reCAPTCHA and hCaptcha are used. Logging in doesn't fix this, though it possibly makes them less frequent
  • Cloudflare is enabled
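
The user agent and rate limit observations above can be sketched roughly as follows. This is a minimal, untested-against-the-live-site illustration, not a working grabber: the `https://namu.wiki/w/` path prefix, the `build_request` and `Pacer` names, and the choice of 20 pages per minute (picked only to stay under the roughly 30 pages per minute noted above) are all assumptions. The code only builds a request object and spaces out calls; it does not send anything.

```python
import time
import urllib.parse
import urllib.request

# User agent string quoted in the list above.
FACEBOOK_UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

# Assumption: a budget comfortably below the ~30 pages/minute observed above.
MAX_PAGES_PER_MINUTE = 20


def build_request(title, base="https://namu.wiki/w/"):
    """Build (but do not send) a request for a normal wiki page.

    The base path is an assumption; the title is percent-encoded as UTF-8.
    """
    url = base + urllib.parse.quote(title)
    return urllib.request.Request(url, headers={"User-Agent": FACEBOOK_UA})


class Pacer:
    """Spaces calls evenly so that at most `per_minute` happen per minute."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

A caller would invoke `Pacer(MAX_PAGES_PER_MINUTE).wait()` before each fetch. Even pacing is only one possible choice; whether the site counts requests per fixed minute or over a sliding window is not known from the observations above.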

Dumps, other crawling attempts

Until 2021, namuwiki officially provided dump files of the whole wiki in JSON format. These were originally provided on namuwiki itself along with a few mirrors, but the official page is now gone. Link to mirror
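
For anyone working with a mirrored copy of those dumps, a reader might look something like the sketch below. The layout is an assumption that should be checked against the actual file: it presumes a single JSON array of objects with "namespace", "title" and "text" keys. The `iter_documents` name and the sample data are purely illustrative.

```python
import io
import json


def iter_documents(fp, namespace=None):
    """Yield (title, text) pairs from a dump opened as a text file.

    Assumption: the dump is one JSON array of objects with "namespace",
    "title" and "text" keys. Verify this against the real file.
    """
    for doc in json.load(fp):
        if namespace is not None and str(doc.get("namespace")) != str(namespace):
            continue
        yield doc["title"], doc["text"]


# Tiny stand-in for a real dump file.
sample = io.StringIO(json.dumps([
    {"namespace": "0", "title": "나무위키", "text": "[[리그베다 위키]]"},
    {"namespace": "1", "title": "틀:예시", "text": "..."},
], ensure_ascii=False))

docs = list(iter_documents(sample, namespace="0"))
```

Note that `json.load` reads the whole file into memory, so a whole-wiki dump would probably call for a streaming JSON parser instead.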

An unofficial dump is still being made (most recently in June 2025) by thewiki operator derCSyong, but it is not publicly available. (It is locked with a password and needs "contribution points" to unlock; the details are unclear.)

  • Looking at the author's GitHub, it seems they used an Android device connected through ADB and automated Chrome on it to fetch /raw/ URLs

There's a mirror site operated by an unknown person at namu.moe. Similar sites have existed in the past, but this is the only one still surviving. It does not keep edit history and only shows the most recent revision. Edits from namuwiki seem to be applied within 5 minutes. (It seems to be watching RecentChanges and somehow grabbing raw versions of all documents without being throttled.) For a few years now, the site has no longer mirrored images, and it shows broken HTML because it does not support recent namuwiki syntax.

What to grab

Ideally, we would grab /raw/ pages and diff/blame for all edit revisions of all documents, along with images, but this is likely very difficult because of the anti-bot measures. However, finding the list of documents to grab shouldn't be much of a problem, since the lists of shortest and longest documents appear to be infinite and not heavily rate limited. In addition, there are normal links and backlinks.
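
Building the document list from links and backlinks amounts to a breadth-first walk over titles, which could be sketched as below. The `discover_titles` function and its `get_links` callback are hypothetical: `get_links(title)` stands for whatever mechanism returns the titles linked from (or linking to) a document, and how it actually fetches them is left open.

```python
from collections import deque


def discover_titles(seeds, get_links, limit=None):
    """Breadth-first walk over document titles, visiting each title once.

    `get_links(title)` is a hypothetical callable returning related titles.
    `limit` optionally caps how many titles are visited.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue and (limit is None or len(order) < limit):
        title = queue.popleft()
        order.append(title)
        for linked in get_links(title):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return order


# Fake link graph standing in for real link/backlink lookups.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": [], "D": ["A"]}
titles = discover_titles(["A"], lambda t: graph.get(t, []))
```

Seeding the walk from the shortest and longest document lists and then following links would be one way to combine the two sources mentioned above. Documents that are not reachable through either would still be missed.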

Temporary measures

A temporary measure works as follows: an email is sent to umanle; an admin changes the document into a redirect to the corresponding report document; if no contributor to the document appeals within one month, the document is moved to the trash can namespace, where only admins can read it.

rarena?

There are suspicions that the management indefinitely blocks people who don't like them, citing "obstruction of operation". document on namu's other wiki

Don't use outside communities

When it was revealed that a private managers' meeting had passed a revision to the regulations that defines identifying oneself as a wiki user while being active in outside communities as external intervention and as obstruction of operation through abuse of one's wiki identity, the external communities were thrown into turmoil. In the end, instead of permanent nicknames, semi-permanent nicknames and anonymous posters on VPN, Tor, or ISP IPs became the norm, Gravatar pictures were removed, and posts in which users say that they themselves are doing something were prohibited. In addition, some sensitive topics such as politics and AI obscenity were moved to the 대헌장 anonymous mini gallery. Related community announcements

See also