[00:15] *** pizzaiolo has quit IRC (Read error: Connection reset by peer)
[01:08] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[01:15] *** Sk1d has joined #archiveteam-bs
[01:36] *** j08nY has quit IRC (Quit: Leaving)
[02:40] *** trs80 has quit IRC (Remote host closed the connection)
[04:09] *** sheaf has joined #archiveteam-bs
[04:19] GLaDOS: did you start up ipernity too on tracker?
[04:20] *** SadDM has quit IRC (Read error: Operation timed out)
[04:20] *** jspiros has quit IRC (Read error: Operation timed out)
[04:21] because the last time it was halted on 4/8 or 4/9, it was due to a bunch of 403s
[04:21] *** dboard has quit IRC (Read error: Operation timed out)
[04:23] nah, i dont believe i did
[04:26] well someone set it off again
[04:26] because items are flowing
[04:26] and they're all 0 MB, just like the last time
[04:27] it was about 1300 ut today that they started flowing again
[04:29] and the urls do still seem to be 403s
[04:31] *** ndiddy has quit IRC ()
[05:21] *** jspiros has joined #archiveteam-bs
[05:21] *** dboard has joined #archiveteam-bs
[05:25] *** DFJustin has quit IRC (Remote host closed the connection)
[05:25] *** SadDM has joined #archiveteam-bs
[05:25] *** swebb sets mode: +o SadDM
[06:11] *** DFJustin has joined #archiveteam-bs
[06:38] *** yakfish has quit IRC (Remote host closed the connection)
[06:51] *** sheaf has quit IRC (Quit: sheaf)
[07:03] *** sheaf has joined #archiveteam-bs
[07:33] *** sheaf has quit IRC (Quit: sheaf)
[07:47] *** Gfy has joined #archiveteam-bs
[08:17] *** schbirid has joined #archiveteam-bs
[08:27] *** BlueMaxim has quit IRC (Quit: Leaving)
[08:30] *** icedice has joined #archiveteam-bs
[09:19] *** vitzli has joined #archiveteam-bs
[09:33] *** GE has joined #archiveteam-bs
[09:38] *** DFJustin has quit IRC (Remote host closed the connection)
[09:41] *** DFJustin has joined #archiveteam-bs
[10:06] *** icedice has quit IRC (Quit: Leaving)
[10:35] *** j08nY has joined #archiveteam-bs
[10:52] *** GE has quit IRC (Remote host closed the connection)
[11:11] I set it off again
[11:11] wp494 ^
[11:11] the 403 are good
[11:11] previously all items with a 403 failed, which left us with a bunch of items that would return 403
[11:11] and now that items with 403 are not failing we have to go through them
[12:54] *** GE has joined #archiveteam-bs
[13:05] *** sheaf has joined #archiveteam-bs
[13:53] *** pizzaiolo has joined #archiveteam-bs
[14:38] *** yakfish has joined #archiveteam-bs
[15:35] *** pizzaiolo has quit IRC (Ping timeout: 506 seconds)
[15:40] *** fie has quit IRC (Read error: Operation timed out)
[16:15] *** powerKitt has joined #archiveteam-bs
[16:48] *** vitzli has quit IRC (Quit: Leaving)
[16:48] *** powerKitt has quit IRC (Quit: Page closed)
[17:03] Who's uploading "localnewsarchive" to FOS?
[17:09] *** RichardG has quit IRC (Ping timeout: 370 seconds)
[17:35] *** RichardG has joined #archiveteam-bs
[17:37] what is wget "error -6" and is there anything I can do to fix/avoid it?
[17:38] *** Akiva has joined #archiveteam-bs
[17:47] https://archive.org/details/localnewsarchive
[18:03] *** icedice has joined #archiveteam-bs
[18:37] *** GE has quit IRC (Remote host closed the connection)
[18:41] *** rocode has quit IRC (Ping timeout: 600 seconds)
[18:41] *** rocode has joined #archiveteam-bs
[18:53] *** sheaf has quit IRC (Quit: sheaf)
[19:56] SketchCow: for https://archive.org/details/pdfy-QPCSwTWiFz1u9WU_ i technically own the copyright to that document (and it's way obsolete; the original version was a text file), it had a weird genesis being converted to html without permission, uploaded to scribd without permission, downloaded as 'david.pdf' and shoved on pdfY at some point. While i don't mind it being there for historical reasons, it is obsolete
[19:56] and is superseded by a newer version of said document at this point
[19:57] i actually DMCA'd the copy at scribd because i hate scribd with a passion, since they're making ad/subscription money off my work without my permission
[19:57] i don't mind it being hosted at IA
[19:58] *** sheaf has joined #archiveteam-bs
[19:59] https://www.dropbox.com/s/z334fcat4jal5qu/S14001A9.txt?dl=0 is the original version of that file before it got html-ified by some anonymous person
[19:59] https://www.dropbox.com/s/j1lkjtkwjzlg6ko/S14001A10.txt?dl=0 is the latest version
[20:04] https://www.dropbox.com/s/z334fcat4jal5qu/S14001A9.txt?dl=0 was maybe once located at http://www.netaxs.com/~gevaryah/S14001A9.txt but that would have had to be in 2006ish, I'm not even sure I uploaded it there before netaxs went bust
[20:07] heh, if you actually want copies of the not-archived files visible on https://web-beta.archive.org/web/20050225051103/http://www.netaxs.com:80/~gevaryah/ i have them saved somewhere
[20:31] SketchCow: Please don't delete my 'Godane VHS Capture' folder
[20:31] also your localnewsarchive item uploads are a complete mess in my mind
[20:31] normally you're using the full item name
[20:31] *** Akiva has quit IRC (Remote host closed the connection)
[20:32] *file name for metadata
[20:32] *** GE has joined #archiveteam-bs
[20:32] not like this: https://archive.org/details/localnewsarchive_ABC
[20:36] i only complain cause you told me that with the vhs vault stuff
[20:42] *** DFJustin has quit IRC (hub.efnet.us hub.dk)
[20:42] *** bwn has quit IRC (hub.efnet.us hub.dk)
[20:42] *** alfie has quit IRC (hub.efnet.us hub.dk)
[20:42] *** acridAxid has quit IRC (hub.efnet.us hub.dk)
[20:42] *** SpaffGarg has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Selavi has quit IRC (hub.efnet.us hub.dk)
[20:42] *** kevinr has quit IRC (hub.efnet.us hub.dk)
[20:42] *** tephra_ has quit IRC (hub.efnet.us hub.dk)
[20:42] *** tsr has quit IRC (hub.efnet.us hub.dk)
[20:42] *** ThisAsYou has quit IRC (hub.efnet.us hub.dk)
[20:42] *** davidar has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Ctrl-S___ has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Sanqui has quit IRC (hub.efnet.us hub.dk)
[20:42] *** deathy has quit IRC (hub.efnet.us hub.dk)
[20:42] *** alembic has quit IRC (hub.efnet.us hub.dk)
[20:42] *** BartoCH has quit IRC (hub.efnet.us hub.dk)
[20:42] *** HCross2 has quit IRC (hub.efnet.us hub.dk)
[20:42] *** hook54321 has quit IRC (hub.efnet.us hub.dk)
[20:42] *** tuluu has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Famicoman has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Yoshimura has quit IRC (hub.efnet.us hub.dk)
[20:42] *** zhongfu has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Kaz has quit IRC (hub.efnet.us hub.dk)
[20:42] *** JSharp___ has quit IRC (hub.efnet.us hub.dk)
[20:42] *** tklk has quit IRC (hub.efnet.us hub.dk)
[20:42] *** floogulin has quit IRC (hub.efnet.us hub.dk)
[20:42] *** jiphex has quit IRC (hub.efnet.us hub.dk)
[20:42] *** FalconK has quit IRC (hub.efnet.us hub.dk)
[20:42] *** t2t2 has quit IRC (hub.efnet.us hub.dk)
[20:42] *** K4k has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Muad-Dib has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Meroje has quit IRC (hub.efnet.us hub.dk)
[20:42] *** raphidae has quit IRC (hub.efnet.us hub.dk)
[20:42] *** icedice has quit IRC (hub.efnet.us hub.dk)
[20:42] *** JensRex has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Simpbrain has quit IRC (hub.efnet.us hub.dk)
[20:42] *** antomatic has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Hecatz has quit IRC (hub.efnet.us hub.dk)
[20:42] *** medowar has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Aoede has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Rai-chan has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Frogging has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Riviera has quit IRC (hub.efnet.us hub.dk)
[20:42] *** SN4T14 has quit IRC (hub.efnet.us hub.dk)
[20:42] *** i0npulse has quit IRC (hub.efnet.us hub.dk)
[20:42] *** purplebot has quit IRC (hub.efnet.us hub.dk)
[20:42] *** yuitimoth has quit IRC (hub.efnet.us hub.dk)
[20:42] *** nyany has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Madchen has quit IRC (hub.efnet.us hub.dk)
[20:42] *** PurpleSym has quit IRC (hub.efnet.us hub.dk)
[20:42] *** altlabel has quit IRC (hub.efnet.us hub.dk)
[20:42] *** RichardG has quit IRC (hub.efnet.us hub.dk)
[20:42] *** j08nY has quit IRC (hub.efnet.us hub.dk)
[20:42] *** brayden has quit IRC (hub.efnet.us hub.dk)
[20:42] *** GLaDOS has quit IRC (hub.efnet.us hub.dk)
[20:42] *** joepie91 has quit IRC (hub.efnet.us hub.dk)
[20:42] *** cf has quit IRC (hub.efnet.us hub.dk)
[20:42] *** chfoo has quit IRC (hub.efnet.us hub.dk)
[20:42] *** eprillios has quit IRC (hub.efnet.us hub.dk)
[20:42] *** tapedrive has quit IRC (hub.efnet.us hub.dk)
[20:42] *** antonizoo has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Odd0002 has quit IRC (hub.efnet.us hub.dk)
[20:42] *** HP has quit IRC (hub.efnet.us hub.dk)
[20:42] *** dashcloud has quit IRC (hub.efnet.us hub.dk)
[20:42] *** w0rp has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Kenshin has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Jon- has quit IRC (hub.efnet.us hub.dk)
[20:42] *** SilSte has quit IRC (hub.efnet.us hub.dk)
[20:42] *** espes__ has quit IRC (hub.efnet.us hub.dk)
[20:42] *** kvieta has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Lord_Nigh has quit IRC (hub.efnet.us hub.dk)
[20:42] *** kurt has quit IRC (hub.efnet.us hub.dk)
[20:42] *** Fletcher has quit IRC (hub.efnet.us hub.dk)
[20:42] *** yuitimoth has joined #archiveteam-bs
[20:42] *** nyany has joined #archiveteam-bs
[20:42] *** Madchen has joined #archiveteam-bs
[20:42] *** PurpleSym has joined #archiveteam-bs
[20:42] *** altlabel has joined #archiveteam-bs
[20:42] *** irc.homelien.no sets mode: +o PurpleSym
[20:48] *** icedice has joined #archiveteam-bs
[20:48] *** JensRex has joined #archiveteam-bs
[20:48] *** Simpbrain has joined #archiveteam-bs
[20:48] *** antomatic has joined #archiveteam-bs
[20:48] *** Hecatz has joined #archiveteam-bs
[20:48] *** medowar has joined #archiveteam-bs
[20:48] *** Aoede has joined #archiveteam-bs
[20:48] *** Rai-chan has joined #archiveteam-bs
[20:48] *** Frogging has joined #archiveteam-bs
[20:48] *** Riviera has joined #archiveteam-bs
[20:48] *** SN4T14 has joined #archiveteam-bs
[20:48] *** i0npulse has joined #archiveteam-bs
[20:48] *** purplebot has joined #archiveteam-bs
[20:48] *** irc.underworld.no sets mode: +o antomatic
[20:48] *** swebb sets mode: +o antomatic
[20:48] *** powerKitt has joined #archiveteam-bs
[20:51] http://www.dmoz.org/ apparently shut down 2017/03/17 and left a static mirror at http://dmoztools.net/
[20:53] I'm gonna throw the static mirror into ArchiveBot with the "--no-offsite-links" parameter to prevent it from making a massive WARC.
[20:54] Yes, someone from here grabbed it back then. Haven't heard about the static mirror though.
[20:54] powerKitt: Use --large for this one.
[20:54] Alright.
[20:54] IIRC, that grab in March was something like 3M URLs.
[20:56] Someone needs to add --large to the ArchiveBot documentation.
[20:56] Here's the relevant announcement regarding dmoztools.net, by the way: https://www.facebook.com/DMOZ/posts/10155889279717542
[21:00] Hmm, I can't find the archive from March in the IA. masterX244 (the guy who grabbed it) hasn't been here since 22 March, it seems. :-/
[21:08] *** DFJustin has joined #archiveteam-bs
[21:08] *** swebb sets mode: +o DFJustin
[21:11] *** DFJustin has quit IRC (Read error: Connection reset by peer)
[21:13] *** powerKitt has quit IRC (Quit: Page closed)
[21:16] *** DFJustin has joined #archiveteam-bs
[21:16] *** bwn has joined #archiveteam-bs
[21:16] *** alfie has joined #archiveteam-bs
[21:16] *** acridAxid has joined #archiveteam-bs
[21:16] *** SpaffGarg has joined #archiveteam-bs
[21:16] *** Selavi has joined #archiveteam-bs
[21:16] *** kevinr has joined #archiveteam-bs
[21:16] *** tephra_ has joined #archiveteam-bs
[21:16] *** tsr has joined #archiveteam-bs
[21:16] *** davidar has joined #archiveteam-bs
[21:16] *** Ctrl-S___ has joined #archiveteam-bs
[21:16] *** ThisAsYou has joined #archiveteam-bs
[21:16] *** Sanqui has joined #archiveteam-bs
[21:16] *** alembic has joined #archiveteam-bs
[21:16] *** deathy has joined #archiveteam-bs
[21:16] *** BartoCH has joined #archiveteam-bs
[21:16] *** HCross2 has joined #archiveteam-bs
[21:16] *** hook54321 has joined #archiveteam-bs
[21:16] *** tuluu has joined #archiveteam-bs
[21:16] *** Famicoman has joined #archiveteam-bs
[21:16] *** Yoshimura has joined #archiveteam-bs
[21:16] *** zhongfu has joined #archiveteam-bs
[21:16] *** Kaz has joined #archiveteam-bs
[21:16] *** JSharp___ has joined #archiveteam-bs
[21:16] *** tklk has joined #archiveteam-bs
[21:16] *** floogulin has joined #archiveteam-bs
[21:16] *** jiphex has joined #archiveteam-bs
[21:16] *** FalconK has joined #archiveteam-bs
[21:16] *** t2t2 has joined #archiveteam-bs
[21:16] *** K4k has joined #archiveteam-bs
[21:16] *** raphidae has joined #archiveteam-bs
[21:16] *** Muad-Dib has joined #archiveteam-bs
[21:16] *** Meroje has joined #archiveteam-bs
[21:16] *** swebb sets mode: +o DFJustin
[21:23] *** powerKitt has joined #archiveteam-bs
[21:38] *** powerKitt has quit IRC (Ping timeout: 268 seconds)
[21:40] Oh Godane
[21:41] Well, the local news archive thing is a VERY specific situation.
[21:41] It wouldn't work for most things, but there's a finite amount of "buckets" for channels.
[21:41] I.e. 600 and maybe eventually to something like 800
[21:41] So as you're pulling down more videos from that guy and elsewhere, they can be shoved into this with relative ease.
[21:42] A very specific choice
[21:42] Also, this wanders handily into IA space and I don't want to make a big footprint
[21:50] *** Lord_Nigh has joined #archiveteam-bs
[22:07] *** SmileyG has quit IRC (Remote host closed the connection)
[22:25] *** icedice has quit IRC (Ping timeout: 268 seconds)
[22:32] ok
[22:40] i remember you say how you can only own like 30 something collections if I remember correctly
[22:41] btw i got your 6 hour video called "Jason Scott's Day of Archiving"
[22:43] Ringling Bros Circus will stream their final performance in about 16 minutes (23:00 UTC) via Facebook and YouTube. I can't take care of this (and also don't know how), but it would be great if we could grab it. According to one news article, the video should also be available for a short time (whatever that means) after the show, but why take any chances?
[22:44] Oh, meant to send this to the main channel.
[23:02] I saw a post on HN earlier by someone who crawled and indexed all Gopher sites on the public Internet and asked him to upload the data to the IA
[23:02] And he followed through! https://archive.org/details/gopher-may-2017.tar
[23:02] So the IA now has a complete copy of the Gopher Internet as of this month
[23:07] *** GE has quit IRC (Remote host closed the connection)
[23:13] The blog post he wrote is a good read too since he chose to use the "Personal" version of AltaVista's search indexer (which is apparently something you could buy circa 1997) running in a Windows 98 VM: https://blog.benjojo.co.uk/post/building-a-search-engine-for-gopher
[23:14] *** BlueMaxim has joined #archiveteam-bs
[23:25] *** powerKitt has joined #archiveteam-bs
[23:51] *** qwebirc30 has joined #archiveteam-bs
[23:54] *** powerKitt has quit IRC (Ping timeout: 272 seconds)
[23:56] *** qwebirc30 is now known as powerKitt
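Some context on what crawling "all Gopher sites" involves. Under RFC 1436, a client opens a TCP connection (usually to port 70), sends a selector string followed by CRLF, and reads until the server closes the connection. A menu comes back as lines of the form `<type><display>TAB<selector>TAB<host>TAB<port>`, ended by a line containing only ".". The Python below is a minimal sketch built on those assumptions. It is not the crawler from the blog post above. The names `fetch_selector`, `parse_menu`, `crawlable_links` and `MenuItem` are hypothetical, and the sketch only follows text (type 0) and directory (type 1) items.

```python
from __future__ import annotations

import socket
from typing import NamedTuple


class MenuItem(NamedTuple):
    item_type: str  # first character of the line, e.g. "0" text, "1" menu
    display: str
    selector: str
    host: str
    port: int


def fetch_selector(host: str, port: int, selector: str, timeout: float = 10.0) -> bytes:
    """Send one Gopher request (selector + CRLF) and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_menu(raw: str) -> list[MenuItem]:
    """Parse a Gopher menu into items, skipping malformed lines."""
    items: list[MenuItem] = []
    for line in raw.splitlines():  # tolerates both CRLF and bare LF
        if line == ".":  # end-of-menu marker
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:  # not a well-formed menu line
            continue
        display, selector, host, port = fields[:4]  # ignore any Gopher+ extra field
        try:
            port_num = int(port)
        except ValueError:
            continue
        items.append(MenuItem(line[0], display, selector, host, port_num))
    return items


def crawlable_links(items: list[MenuItem]) -> list[tuple[str, int, str]]:
    """Return (host, port, selector) for items a simple crawler would fetch next."""
    return [(i.host, i.port, i.selector) for i in items if i.item_type in ("0", "1")]
```

A crawler along these lines would keep a queue of `(host, port, selector)` tuples. It would start from a server's root selector (the empty string), decode each response with something like `.decode("utf-8", errors="replace")`, and enqueue whatever `crawlable_links` returns that it has not seen yet.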