[00:37] *** ndiddy has quit IRC (Quit: WeeChat 1.4)
[00:38] *** ndiddy has joined #archiveteam-bs
[01:12] *** RichardG_ has joined #archiveteam-bs
[01:18] *** RichardG has quit IRC (Read error: Operation timed out)
[01:36] *** RichardG_ is now known as RichardG
[01:45] *** ShellyRol has joined #archiveteam-bs
[01:57] *** kiskabak has quit IRC (Remote host closed the connection)
[01:57] *** kiskabak has joined #archiveteam-bs
[01:57] *** Fusl__ sets mode: +o kiskabak
[01:57] *** Fusl sets mode: +o kiskabak
[01:57] *** Fusl_ sets mode: +o kiskabak
[02:36] *** killsushi has quit IRC (Quit: Leaving)
[03:21] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[03:48] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[03:49] *** wyatt8740 has joined #archiveteam-bs
[03:52] *** qw3rty119 has joined #archiveteam-bs
[03:56] *** qw3rty118 has quit IRC (Read error: Operation timed out)
[04:11] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[04:14] *** wyatt8740 has joined #archiveteam-bs
[05:01] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[07:45] *** Flashfloo has quit IRC (Remote host closed the connection)
[07:45] *** Flashfire has quit IRC (Remote host closed the connection)
[07:45] *** kiska has quit IRC (Remote host closed the connection)
[07:45] *** Flashfloo has joined #archiveteam-bs
[07:45] *** kiska has joined #archiveteam-bs
[07:45] *** Fusl__ sets mode: +o kiska
[07:45] *** Fusl sets mode: +o kiska
[07:45] *** Fusl_ sets mode: +o kiska
[07:45] *** Flashfire has joined #archiveteam-bs
[08:08] *** VADemon_ has joined #archiveteam-bs
[08:08] *** JH88 has quit IRC (Read error: Connection reset by peer)
[08:08] *** wyatt8740 has joined #archiveteam-bs
[08:09] *** JH88 has joined #archiveteam-bs
[08:14] *** VADemon has quit IRC (Read error: Operation timed out)
[09:18] *** schbirid has joined #archiveteam-bs
[09:24] *** m007a83 has quit IRC (Read error: Operation timed out)
[09:49] *** DigiDigi has quit IRC (Remote host closed the connection)
[10:03] *** DigiDigi has joined #archiveteam-bs
[12:24] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[13:43] *** Dragnog has quit IRC (Quit: The Lounge - https://thelounge.chat)
[13:50] *** Dragnog has joined #archiveteam-bs
[14:47] *** benjinsmi has quit IRC (Remote host closed the connection)
[14:48] *** benjinsmi has joined #archiveteam-bs
[14:56] Thanks @arkiver. Would it be useful if I made a list of the forums that have been migrated and sent that over?
[14:56] yes
[14:56] :)
[14:56] * arkiver is back in 30 mins
[14:56] arkiver: I don't see how an AB job would work for this since the forum listing redirects to the new forums instead.
[14:57] Need to bruteforce the thread IDs probably.
[14:57] Dragnog is going to try to get a list
[14:57] and
[14:58] we can try to get a list of subforums from, for example, https://web.archive.org/web/20190601121723/https://us.battle.net/forums/en/wow/
[14:58] then do an !a < job including the https://us.battle.net/forums/en/wow/ URL, and then I think it should get all pages?
[14:58] I could be wrong, I'm not very much into archievbot
[14:58] archivebot*
[15:02] Oh, the subforum listings are still available, right. Yeah, might work.
[15:06] Actually, no, --no-parent will still interfere.
[15:26] Do our tracker stats get archived anywhere? Trying to figure out how big Vidme was, but the trackers for it (vidme and vidme2) were deleted at some point. It would be quite ironic if we don't still have that data somewhere...
[15:35] (For Vidme specifically, there are some snapshots in the WBM, but the stats.json is missing for vidme2, so that's still not helping.)
[15:38] (And the stats.json for vidme is from while the project was still active, so that's useless as well.)
[15:38] So yeah, who archives the archivists?
[15:45] The Overwatch forum roots are currently not redirecting. Here is the list of those forums.
[15:46] *** Dragnog_ has joined #archiveteam-bs
[15:46] Overwatch https://us.battle.net/forums/en/overwatch/ https://eu.battle.net/forums/en/overwatch/ https://eu.battle.net/forums/de/overwatch/ https://eu.battle.net/forums/fr/overwatch/ https://eu.battle.net/forums/ru/overwatch/ https://eu.battle.net/forums/es/overwatch/ https://eu.battle.net/forums/it/overwatch/ https://kr.battle.net/forums/ko/overwatch/ https://tw.battle.net/forums/zh/overwatch/ https://us.battle.ne
[15:46] Sorry, still getting used to IRC. Last one got cut off https://eu.battle.net/forums/pl/overwatch/
[16:04] Dragnog_: It cut off before that. Your message ended with "/zh/overwatch/ https://us.battle.ne".
[16:04] The webchat thingy is awful. Consider using a proper client instead.
[16:12] Ah sorry. I installed The Lounge on AWS but VNC wouldn't let me copy from my local machine. I've stuck them in a pastebin instead. https://pastebin.com/BCib0ZXP
[16:13] *** Dragnog_ has quit IRC (Quit: Page closed)
[16:22] Hearthstone links https://pastebin.com/4QhGYM1h
[17:04] Here is the complete list. It appears it's only the WoW forums that are redirecting. The rest seem to be intact atm https://pastebin.com/Q2Y3Q2FL
[17:48] *** VerifiedJ has joined #archiveteam-bs
[19:11] I'm currently using grab-site to archive a site. The site's URLs contain a string of randomly generated characters that keeps changing, so the DUPE detector keeps thinking it's grabbing a new page when it is not. Here's an example URL: http://www.novaworld2.com/index.php?idtag=5d3df300d3ebd&do=/public/forums/ Has anyone dealt with something like this before?
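On the [19:11] question: the crawler sees each idtag variant as a distinct URL, so the same page is treated as new. The sketch below normalizes URLs before the "seen" check. It assumes `idtag` is the only volatile query parameter and that the server returns the same page regardless of its value. Neither assumption is confirmed in the channel. The names `canonical_url`, `is_new`, and `VOLATILE_PARAMS` are hypothetical, and this is not grab-site's or wpull's own dedupe logic.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: "idtag" is the only query parameter that changes between visits.
VOLATILE_PARAMS = {"idtag"}

def canonical_url(url: str) -> str:
    """Return url with volatile query parameters dropped; other parameters keep their order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in VOLATILE_PARAMS]
    # safe="/" keeps values like "/public/forums/" readable instead of percent-encoding them
    return urlunsplit(parts._replace(query=urlencode(kept, safe="/")))

seen: set[str] = set()

def is_new(url: str) -> bool:
    """Hypothetical seen-check keyed on the canonical form rather than the raw URL."""
    key = canonical_url(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```

With this, the two forum-index URLs quoted in this log, which differ only in idtag, collapse to `http://www.novaworld2.com/index.php?do=/public/forums/`. Topic pages such as `display_topic/id_13078` still get their own keys because the `do` parameter is kept. Wiring this into an actual grab-site crawl is not shown here.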
[19:12] Not sure if this goes in -bs or -ot
[19:24] ShellyRol: if they return the same data, add them to ignores; if they are different, that's okay since they are different anyway
[19:28] But some links have unique IDs, such as the forums, for example: http://www.novaworld2.com/index.php?idtag=5d3df70f29757&do=/public/forums/display_topic/id_13078 but my concern is that the ?idtag keeps changing. It would be hard to whitelist every unique link since there are so many
[19:42] where does that idtag come from? maybe it's a session id and you can get rid of it by allowing cookies?
[19:51] I am using cookies to grab the site since a login is required to view most of it. Here's the command I'm using: grab-site "http://www.novaworld2.com/index.php?idtag=5d3cf5e46b94c&do=/public/forums/" --wpull-args=--load-cookies=/home/user/share/Novaworld2/cookies.txt --ua "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36" --no-offsite-links
[19:51] --import-ignores /home/user/share/Novaworld2/ignore --import-ignores /home/user/share/Novaworld2/ignore
[20:41] *** DogsRNice has joined #archiveteam-bs
[20:48] *** schbirid has quit IRC (Remote host closed the connection)
[22:30] *** VerifiedJ has quit IRC (Quit: Leaving)
[23:12] *** BlueMax has joined #archiveteam-bs
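On the [19:42] question of where idtag comes from: all three sample values in this log are 13 hex digits. That matches the output format of PHP's `uniqid()`, which is 8 hex digits of Unix seconds followed by 5 hex digits of microseconds. This is only an inference from the shape of the values and is not confirmed by the site.

If it holds, the tag is a generation timestamp rather than a stable page ID. Decoded as UTC, `5d3df300d3ebd` gives 2019-07-28 19:09:52 and `5d3df70f29757` gives 19:27:11 the same day. Those are a minute or two before the [19:11] and [19:28] messages that quoted them, assuming the log timestamps are UTC. The helper name `decode_uniqid` is hypothetical.

```python
from datetime import datetime, timezone

def decode_uniqid(tag: str) -> datetime:
    """Interpret tag as PHP uniqid() output ("%08x%05x" of seconds, microseconds).

    Assumption: the site's idtag is produced by uniqid(); this is not confirmed.
    """
    if len(tag) != 13:
        raise ValueError("expected 13 hex digits")
    seconds = int(tag[:8], 16)   # Unix time in seconds
    micros = int(tag[8:], 16)    # microsecond part
    if micros >= 1_000_000:
        raise ValueError("microsecond field out of range; probably not a uniqid() value")
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)

for tag in ("5d3cf5e46b94c", "5d3df300d3ebd", "5d3df70f29757"):
    print(tag, decode_uniqid(tag).isoformat())
```

If the values really are timestamps, the server may mint a new one per request or per login. That would explain why the same page keeps reappearing under different URLs.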