[00:01] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[00:14] *** DoomTay has joined #archiveteam
[00:22] *** Coderjoe has joined #archiveteam
[00:31] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[00:42] *** JesseW has joined #archiveteam
[01:09] *** BlueMaxim has joined #archiveteam
[01:46] *** ravetcofx has quit IRC (Remote host closed the connection)
[01:50] *** ravetcofx has joined #archiveteam
[02:05] *** kristian_ has joined #archiveteam
[02:21] *** Aranje has quit IRC (Ping timeout: 260 seconds)
[02:31] *** yipdw_ is now known as yipdw
[03:15] *** VADemon has quit IRC (Quit: left4dead)
[03:49] *** redlob_ has joined #archiveteam
[03:50] *** redlob has quit IRC (Read error: Operation timed out)
[04:24] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:29] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[04:31] *** Sk1d has joined #archiveteam
[04:43] *** JesseW has joined #archiveteam
[04:52] *** tfgbd_znc has quit IRC (Ping timeout: 633 seconds)
[04:56] *** DoomTay has joined #archiveteam
[05:06] *** tfgbd_znc has joined #archiveteam
[05:06] *** Honno has joined #archiveteam
[05:20] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[05:28] *** DoomTay has quit IRC (Quit: Page closed)
[05:51] *** TC02 has quit IRC (Read error: Connection reset by peer)
[05:51] *** TC02 has joined #archiveteam
[05:55] *** vitzli has joined #archiveteam
[06:03] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:07] *** dashcloud has joined #archiveteam
[06:41] *** tomwsmf has quit IRC (Read error: Operation timed out)
[07:13] *** midas1 is now known as midas
[07:30] *** signius has quit IRC (Read error: Operation timed out)
[07:31] *** vitzli has quit IRC (Quit: Leaving)
[07:35] *** brayden has joined #archiveteam
[07:35] *** swebb sets mode: +o brayden
[07:37] *** brayden_ has quit IRC (Read error: Operation timed out)
[07:45] *** signius has joined #archiveteam
[07:45] *** redlob_ has quit IRC (Read error: Operation timed out)
[07:51] *** redlob has joined #archiveteam
[08:16] *** kristian_ has quit IRC (Leaving)
[08:58] *** z00nx has joined #archiveteam
[09:05] *** schbirid has joined #archiveteam
[09:29] *** WinterFox has joined #archiveteam
[09:58] *** dashcloud has quit IRC (Read error: Operation timed out)
[10:07] *** dashcloud has joined #archiveteam
[11:17] hey guys, does anyone know of anyone archiving old .plan files from famous-ish people? e.g. John Carmack's .plan is widely available, but apparently some other ID people published them too
[11:18] i have a lot
[11:20] buuuut i am on and off working on turning them into a nice interface or twitter bot so i am too keen on sharing =(
[11:22] bluesnews has a nice archive
[11:27] schbirid: thanks
[11:27] schbirid: please don't die or something before sharing :P
[11:27] :D
[11:28] yeah schbirid, dont do that.
[11:28] midas has no rights to say anything about this kind of stuff until the jamendo stuff is up
[11:28] ill shut up
[11:28] :p
[11:29] next week ill be switching back to my old ISP, so fast internet again
[11:38] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[11:39] *** Stiletto has joined #archiveteam
[11:44] aww sorry for the failed switch
[11:55] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:59] *** dashcloud has joined #archiveteam
[12:06] *** espes__ has quit IRC (Read error: Connection reset by peer)
[12:06] *** yeoldeto1 has quit IRC (Read error: Connection reset by peer)
[12:17] *** yeoldetoa has joined #archiveteam
[12:18] *** espes__ has joined #archiveteam
[12:22] *** Sanqui has quit IRC (Remote host closed the connection)
[12:27] *** Sanqui has joined #archiveteam
[12:56] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:00] *** WinterFox has quit IRC (Ping timeout: 492 seconds)
[13:42] *** Froggypwn has joined #archiveteam
[13:50] *** sep332 has quit IRC (Quit: konversation out)
[14:00] *** sep332 has joined #archiveteam
[14:39] *** JesseW has joined #archiveteam
[15:01] The sites described here are probably worth a look by ArchiveTeam (ping godane): http://listheory.prattsils.org/cataloging-plunder-thoughts-on-the-digital-text-sharing-underground/
[15:04] We archived UbuWeb some time ago.
[15:15] cool
[15:15] excellent
[15:18] i put this url in archivebot https://www.memoryoftheworld.org/
[15:19] we only have one page of that site archive based on status
[15:25] *** tomwsmf has joined #archiveteam
[15:29] godane: thanks!
[15:32] *** JesseW has quit IRC (Quit: Leaving.)
[15:32] *** JesseW has joined #archiveteam
[15:51] *** Aranje has joined #archiveteam
[15:57] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[15:58] *** K4k has quit IRC (Quit: WeeChat 1.5)
[15:58] *** K4k has joined #archiveteam
[16:06] (It's 6.2tb of material left on the machine, although I have no idea how much of that is, say, some backups)
[16:06] *** DoomTay has joined #archiveteam
[16:07] Or the aforementioned "Pipes". For example, the Hip-Hop pipe is 257gb for some reason.
[16:07] Ha ha, I see why.
[16:08] (There's a 230gb project in it, subdirectory.)
[16:16] Oh see, 13gb of the 27gb remaining were hip-hop albums I forgot to shove in! (They're portions of the albums that an ill-advised combining of two archives happened)
[16:32] *** redlob_ has joined #archiveteam
[16:33] *** redlob has quit IRC (Read error: Operation timed out)
[16:36] So Lego CHIMA basically concluded back in 2015, so there's no telling how much longer http://www.lego.com/en-us/chima will stay up. The thing about sticking it in ArchiveBot is that it seems it just won't download the actual video files despite giving it PhantomJS AND youtube-dl AND UA spoofing
[16:46] *** JW_work1 has quit IRC (Quit: Leaving.)
[16:52] *** bauruine has joined #archiveteam
[17:02] *** AlexLehm has joined #archiveteam
[17:04] *** JW_work has joined #archiveteam
[18:30] *** DoomTay has quit IRC (Quit: Page closed)
[19:17] *** nicolas17 has joined #archiveteam
[19:18] hi archivers
[19:18] the forum is the only part of the OpenStreetMap infrastructure that isn't managed by the main OSM operations team, and its administrator seems to be Missing In Action, I think nobody else has server access
[19:19] they are at the point of considering setting up a new forum and pointing the forum.openstreetmap.org hostname to it
[19:20] but that means losing the existing data
[19:37] nicolas17: is there a link to the discussion about setting up a new forum?
[19:42] sec
[19:43] gee it'd be nice if gmane was up
[19:43] ha. it's being worked on
[19:43] https://lists.openstreetmap.org/pipermail/talk/2016-August/076580.html last post
[19:45] if I'm going to scrape, looks like I can get the raw bbcode by logging in and trying to quote a post
[19:50] we'll do a scrape in WARC format
[19:50] it should be available in a couple of days
[19:51] I once archived a small forum with plain old wget -r, and it got a *lot* of redundant stuff, like following links that returned the same thread in a different order, or thread?id=1 and post?id=2 giving pretty much the same content
[19:53] isn't there often a url parameter you can pass to those sorts of forums to get the crawler-friendly page, with less links, all canonicalized?
[19:53] or maybe based on googlebot useragent
[19:53] hm maybe
[19:54] that crap I archived with wget was an old phpbb
[20:03] bai: I just tried setting googlebot UA and I get the same page
[20:08] yeah I think there's a url parameter, like when you click a forum post link on google and you get the black-and-grey-on-white printer friendly view
[20:09] not having much luck searching for what that option is though
[20:09] are you sure this software supports such thing? :P
[20:09] also that may be specifically phpBB or one of the other popular ones, dunno about fluxBB
[20:09] ah
[20:23] nicolas17: yeah we have a set of options to make crawling forums work much better
[20:32] *** DoomTay has joined #archiveteam
[20:52] i archived the forums some months ago, can't find any trace of it though =)
[20:52] iirc it was a bit harder than usual
[20:52] but otherwise standard forum stuff
[20:53] "Digital objects last forever – or 5 years, whichever comes first"
[20:56] if no one else raises their hand, i will start a wget for it right now
[20:56] schbirid: I interpreted JW_work's message as hand raising already?
[20:57] schbirid: go for it
[20:57] schbirid: I just started a #archivebot job
[20:57] but duplicate is probably fine
[20:57] yeah :)
[20:57] especially as the archivebot job seems to have failed...
[20:58] on it
[21:00] man, fluxbb is nice and clean
[21:06] JW_work: yes, there's a reason why it failed
[21:06] !a http://http://forum.openstreetmap.org/ --ignore-sets=forums
[21:07] yeah, I see that now :-/
[21:07] runs well here
[21:09] I wonder what would happen if the site gets overwhelmed. BZPower would say "The servers are too busy to handle your request". No idea what status code it would return though
[21:12] *** MMovie has quit IRC (Read error: Connection reset by peer)
[21:18] *** MMovie has joined #archiveteam
[21:21] *** RichardG has quit IRC (Read error: Operation timed out)
[21:28] *** RichardG has joined #archiveteam
[21:35] running nice and smooth, good night for now
[21:35] *** schbirid has quit IRC (Quit: Leaving)
[21:39] *** RichardG has quit IRC (Ping timeout: 370 seconds)
[21:57] *** Honno has quit IRC (Read error: Operation timed out)
[22:17] *** Gfy has quit IRC (Read error: Operation timed out)
[22:27] *** Gfy has joined #archiveteam
[22:45] *** redlob_ has quit IRC (Read error: Operation timed out)
[22:46] *** RichardG has joined #archiveteam
[22:51] *** redlob has joined #archiveteam
[23:02] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[23:39] *** AlexLehm has quit IRC (Ping timeout: 260 seconds)
[23:40] *** RichardG_ has joined #archiveteam
[23:41] *** RichardG has quit IRC (Ping timeout: 370 seconds)
[23:42] *** ZeoNet has joined #archiveteam
[23:47] *** RichardG_ has quit IRC (Ping timeout: 370 seconds)
[23:48] *** RichardG has joined #archiveteam
[23:58] *** ZeoNet has quit IRC (Ping timeout: 244 seconds)