From Archiveteam

ReactJS is a popular client-side JavaScript web app framework. It was deployed on Instagram early in its history[1], a site notorious for being inconvenient to crawl. Sites using ReactJS are inefficient to archive: the content is only loaded after the scripts and style sheets have run, rather than being present in the initial response as on plain HTML sites. Some sites also use HTTP POST to load the essential content[2], so their API queries cannot be submitted directly to web archives.

Sites using ReactJS usually need to be rendered through a headless browser, which is slow, or captured in a web archive format such as WARC or HAR. The machine-readable JSON returned by API queries can usually be retrieved through the browser's web developer tools. Because that JSON may be oddly formatted, such as on Everipedia with one property per paragraph, it may be necessary to strip the junk with regular expressions to make the material readable.
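The regular-expression clean-up mentioned above can be sketched roughly as follows. This is a minimal illustration, not a description of any particular site's API: the sample response, its property names, and the helper `json_to_text` are all invented for the example, and it assumes the saved JSON is a flat object whose string values each hold one paragraph of HTML-flavoured text.

```python
import html
import json
import re

# Hypothetical API response, as it might be copied out of the browser's
# developer tools. The property names are invented for illustration.
SAMPLE = (
    '{"title": "Example", '
    '"p1": "<p>First <b>paragraph</b>.</p>", '
    '"p2": "<p>Second &amp; last   paragraph.</p>", '
    '"views": 12}'
)

TAG_RE = re.compile(r"<[^>]+>")      # crude match for markup tags
WHITESPACE_RE = re.compile(r"\s+")   # runs of whitespace


def json_to_text(raw: str) -> str:
    """Turn a flat JSON object into readable text, one paragraph per string property."""
    data = json.loads(raw)
    paragraphs = []
    for value in data.values():
        if not isinstance(value, str):
            continue  # skip counters, flags and other non-text properties
        text = TAG_RE.sub("", value)                     # drop markup tags
        text = html.unescape(text)                       # decode entities such as &amp;
        text = WHITESPACE_RE.sub(" ", text).strip()      # collapse whitespace
        if text:
            paragraphs.append(text)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    print(json_to_text(SAMPLE))
```

A real response is likely to be nested and to need site-specific patterns, so the two regular expressions here should be treated as a starting point rather than a general solution.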