[00:17] Here's a bit more of Docstoc that didn't get uploaded normally, by the way: https://archive.org/download/WARCdealer_BarrelData_kthenu_e257bd0e-5f2c-49fb-9b31-ae516964a559.2015-12-05-04-27-07-934046-_E
[00:17] arkiver, ^
[00:18] (it should probably get tucked into the regular Docstoc collection?)
[00:19] *** wyatt8740 has quit IRC (Remote host closed the connection)
[00:20] also, why is one of the items 18 GB? That seems kind of big for 100 documents... -_-
[00:20] Eh, whatever, it's saved now anyway
[00:30] *** wyatt8740 has joined #archiveteam-bs
[00:32] *** SN4T14 has joined #archiveteam-bs
[00:52] *** RichardG has quit IRC (Ping timeout: 499 seconds)
[01:43] *** antomatic has joined #archiveteam-bs
[01:43] *** swebb sets mode: +o antomatic
[01:45] *** antomati_ has quit IRC (Ping timeout: 252 seconds)
[01:50] *** RichardG has joined #archiveteam-bs
[02:19] *** JesseW has quit IRC (Leaving.)
[02:24] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[02:28] *** username1 has joined #archiveteam-bs
[02:31] *** schbirid2 has quit IRC (Read error: Operation timed out)
[02:43] *** VADemon has quit IRC (left4dead)
[02:44] *** Start_ has joined #archiveteam-bs
[02:44] *** Start has quit IRC (Read error: Connection reset by peer)
[02:48] *** wyatt8740 has joined #archiveteam-bs
[02:49] *** RichardG has quit IRC (Ping timeout: 369 seconds)
[02:53] *** ndiddy has quit IRC (Remote host closed the connection)
[03:23] *** diacope has quit IRC (Ping timeout: 491 seconds)
[03:23] *** deathy___ has quit IRC (Ping timeout: 491 seconds)
[03:23] *** tjg has quit IRC (Read error: Connection reset by peer)
[03:23] *** _desu____ has joined #archiveteam-bs
[03:23] *** zyphlar_ has joined #archiveteam-bs
[03:24] *** tjg has joined #archiveteam-bs
[03:25] *** Boltsie__ has quit IRC (Ping timeout: 242 seconds)
[03:25] *** JSharp___ has quit IRC (Ping timeout: 242 seconds)
[03:25] *** _desu___ has quit IRC (Ping timeout: 242 seconds)
[03:25] *** zyphlar has quit IRC (Ping timeout: 242 seconds)
[03:25] *** Ctrl-S___ has quit IRC (Ping timeout: 242 seconds)
[03:25] *** JSharp___ has joined #archiveteam-bs
[03:25] *** _desu____ is now known as _desu___
[03:25] *** zyphlar_ is now known as zyphlar
[03:25] *** Boltsie__ has joined #archiveteam-bs
[03:25] *** primus104 has quit IRC (Leaving.)
[03:25] *** Ctrl-S___ has joined #archiveteam-bs
[03:26] *** deathy___ has joined #archiveteam-bs
[03:41] *** tjg has quit IRC (Read error: Connection reset by peer)
[03:41] *** _desu____ has joined #archiveteam-bs
[03:41] *** zyphlar_ has joined #archiveteam-bs
[03:42] *** tjg has joined #archiveteam-bs
[03:42] *** diacope has joined #archiveteam-bs
[03:43] *** deathy___ has quit IRC (Ping timeout: 246 seconds)
[03:43] *** JSharp___ has quit IRC (Ping timeout: 246 seconds)
[03:43] *** Ctrl-S___ has quit IRC (Ping timeout: 246 seconds)
[03:43] *** Boltsie__ has quit IRC (Ping timeout: 246 seconds)
[03:43] *** zyphlar has quit IRC (Ping timeout: 246 seconds)
[03:43] *** _desu___ has quit IRC (Ping timeout: 246 seconds)
[03:43] *** _desu____ is now known as _desu___
[03:43] *** zyphlar_ is now known as zyphlar
[03:43] *** Boltsie__ has joined #archiveteam-bs
[03:43] *** JSharp___ has joined #archiveteam-bs
[03:43] *** Ctrl-S___ has joined #archiveteam-bs
[03:45] *** RichardG has joined #archiveteam-bs
[03:50] *** JesseW has joined #archiveteam-bs
[03:54] *** deathy___ has joined #archiveteam-bs
[04:25] *** aaaaaaaaa has quit IRC (Leaving)
[04:26] *** Start_ is now known as Start
[04:55] Rrrgh. I wish IA didn't dark spam, but just noindexed it
[04:56] Trying to figure out how to get to this collection of mp3s of lectures, but it was flagged as spam https://archive.org/details/Dr.JamaalBadawi
[04:57] That said, maybe it is fake or something? But I'd rather be able to find out for myself than be left stuck wondering
[05:06] It looks like it was an accident https://archive.org/post/1048792/help-with-failed-exit-code-1
[05:07] It's been darked 3 times as spam, and undarked twice
[05:07] Why not just have a checkbox on the search pages asking if we want to include things marked as spam in search results?
[05:21] *** remsen has quit IRC (Read error: Operation timed out)
[05:31] *** remsen has joined #archiveteam-bs
[05:32] *** remsen2 has joined #archiveteam-bs
[05:35] *** R5M has joined #archiveteam-bs
[05:36] *** remsen2 has quit IRC (Read error: Operation timed out)
[05:37] *** remsen has quit IRC (Read error: Operation timed out)
[05:42] *** R5M has quit IRC (Read error: Operation timed out)
[05:47] *** remsen has joined #archiveteam-bs
[05:53] *** Sk1d has quit IRC (Read error: Operation timed out)
[05:54] *** remsen2 has joined #archiveteam-bs
[05:56] *** remsen has quit IRC (Read error: Operation timed out)
[05:56] *** R5M has joined #archiveteam-bs
[06:04] *** remsen2 has quit IRC (Read error: Operation timed out)
[06:14] So, I'm seeing a bunch of people talking about "Twitter Moments". https://twitter.com/moments is some random person who's never tweeted
[06:14] I don't have any buttons on twitter called Moment
[06:15] or displaying the electric-y logo for it.
[06:15] *** vitzli has joined #archiveteam-bs
[06:15] Googling "twitter moments" takes me to https://twitter.com/i/moments?lang=en which is a 404
[06:15] After watching their lovely ad, I was so excited to see what all the fuss was about!
[06:16] Hashtag #marketing, hashtag #fail.
[06:35] *** zerkalo has quit IRC (Read error: Operation timed out)
[06:46] *** zerkalo has joined #archiveteam-bs
[07:20] worked for me?
[07:26] I think they've only enabled it for some accounts, but are advertising it to everyone
[07:26] fie ^
[07:26] * kyan is going to sleep now though :3
[07:27] I heard about it on NPR weeks ago... I don't even use twitter
[07:27] Huh, ok
[07:27] I just heard about it tonight since SketchCow posted about it
[07:27] I may or may not have an account... who knows.... that place is just a trash bin
[07:27] * kyan gets all his news from ArchiveTeam
[07:28] * fie gets all of his news from many legged creatures that live under rocks
[07:31] *** vitzli has quit IRC (Quit: Leaving)
[08:38] *** primus104 has joined #archiveteam-bs
[08:44] *** JesseW has quit IRC (Leaving.)
[08:48] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:55] *** dashcloud has joined #archiveteam-bs
[09:14] *** BlueMaxim has quit IRC (Quit: Leaving)
[09:46] *** primus104 has quit IRC (Leaving.)
[10:11] *** Sk1d has joined #archiveteam-bs
[11:59] *** primus104 has joined #archiveteam-bs
[12:38] *** primus104 has quit IRC (Leaving.)
[13:02] *** R5M has quit IRC (Read error: Operation timed out)
[13:31] *** VADemon has joined #archiveteam-bs
[13:34] *** SN4T14 has quit IRC (Remote host closed the connection)
[13:41] *** mistym has quit IRC (Remote host closed the connection)
[13:44] *** SN4T14 has joined #archiveteam-bs
[14:08] *** SimpBrain has quit IRC (Leaving)
[14:11] What are the user-agent strings for the archive.org Wayback Machine bot? ia_archiver and archive.org_bot are suggested by http://www.archiveteam.org/index.php?title=ArchiveBot#Disclaimers
[14:12] But I have also found a website saying that ia_archiver-web.archive.org is a bot from Alexa that additionally indexes items for web.archive.org
[14:36] sounds like my seagate 8tb woes are a kernel problem https://bugzilla.kernel.org/show_bug.cgi?id=93581
[15:00] *** primus104 has joined #archiveteam-bs
[15:14] *** SimpBrain has joined #archiveteam-bs
[15:40] *** SN4T14 has quit IRC (Remote host closed the connection)
[15:46] so i found a pattern to grab Time Magazine from their vault website
[15:57] so looks like the older vault Time magazines were scanned better
[15:57] whereas for things in the 1990s they didn't care
[15:58] they scanned those very badly
[16:01] here is an example of a bad scan: http://time.com/vault/issue/1923-09-03/page/1/
[16:03] anyways the first 100 download ids of fieldsupport.lingnet.org are done: https://archive.org/details/fieldsupport.lingnet.org-download-id-1-to-100-20151206
[16:03] you really only get 74 of them
[16:03] but you have a wget.log to see what is missing
[16:04] *** R5M has joined #archiveteam-bs
[17:18] In EXTREMELY boring news, the archivebot screenshotter ignored an item if it had anything called *png* in it, that's fixed, so those little ones without any screenshots are now getting screenshots.
[17:19] It's going to be at this for months, probably, but I can summarily ignore it
[17:19] I will automate it a tad more and then just watch it fill.
[17:19] The question is if it can ever beat the race condition.
[17:19] (I don't want to do things like just do a small set of the page grabs in a given set.)
[17:20] I mean, if we were having guests, I might do that.
[17:20] We're not having guests
[17:23] *** username1 is now known as schbirid
[17:23] * schbirid slaps arkiver with nohome
[17:29] *** limebyte has quit IRC (ZNC - http://znc.in)
[17:29] *** limebyte has joined #archiveteam-bs
[17:42] *** Ravenloft has joined #archiveteam-bs
[17:44] *** no2pencil has quit IRC (Ping timeout: 252 seconds)
[17:44] *** no2pencil has joined #archiveteam-bs
[17:44] schbirid: I think I already sent you the telenor target?
[17:44] nope, or maybe i lost the log
[17:45] *** tjg has quit IRC (Read error: Connection reset by peer)
[17:45] *** Boltsie__ has quit IRC (Write error: Connection reset by peer)
[17:45] *** _desu____ has joined #archiveteam-bs
[17:45] *** zyphlar_ has joined #archiveteam-bs
[17:45] *** bauruine has quit IRC (Read error: Connection reset by peer)
[17:45] *** bauruine_ has joined #archiveteam-bs
[17:45] schbirid: found it, looks like you were online
[17:46] *** bauruine_ is now known as bauruine
[17:46] *** Boltsie__ has joined #archiveteam-bs
[17:46] offline*
[17:46] cheers!
[17:47] *** deathy___ has quit IRC (Ping timeout: 241 seconds)
[17:47] *** JSharp___ has quit IRC (Ping timeout: 241 seconds)
[17:47] *** Ctrl-S___ has quit IRC (Ping timeout: 241 seconds)
[17:47] *** _desu___ has quit IRC (Ping timeout: 241 seconds)
[17:47] *** zyphlar has quit IRC (Ping timeout: 241 seconds)
[17:47] *** _desu____ is now known as _desu___
[17:47] *** JSharp___ has joined #archiveteam-bs
[17:47] *** zyphlar_ is now known as zyphlar
[17:48] *** tjg has joined #archiveteam-bs
[17:48] *** Ctrl-S___ has joined #archiveteam-bs
[17:52] *** PrincessK has joined #archiveteam-bs
[17:52] *** deathy___ has joined #archiveteam-bs
[18:00] *** Knoeki has quit IRC (Read error: Operation timed out)
[18:21] *** Ravenloft has quit IRC (Ping timeout: 252 seconds)
[18:23] *** JesseW has joined #archiveteam-bs
[18:57] *** aaaaaaaaa has joined #archiveteam-bs
[18:57] *** swebb sets mode: +o aaaaaaaaa
[19:21] *** JesseW has quit IRC (Leaving.)
[19:44] *** SN4T14 has joined #archiveteam-bs
[19:53] *** R5M has quit IRC (Read error: Operation timed out)
[20:31] *** JesseW has joined #archiveteam-bs
[20:48] godane: thanks for the lingnet grab: https://archive.org/details/fieldsupport.lingnet.org-download-id-1-to-100-20151206
[20:52] you're welcome
[20:53] also this is up now: https://archive.org/details/fieldsupport.lingnet.org-download-id-301-to-400-20151206
[21:10] *** JesseW has quit IRC (Leaving.)
[21:15] godane: should that also be saved into WARCs?
[21:17] i was not saving it in warc cause the files are pdfs and zips
[21:18] but if you want i could do that
[21:18] it's like how i did the lego pdfs
[21:20] I think it's always best to save files into WARCs
[21:20] Direct links to the files in the WARC files can always be made
[21:26] ok
[21:26] i'm doing 1 to 500 as a WARC
[21:27] thanks!
[21:28] i mostly do the zips for later collection building
[21:28] I'll soon start writing the warrior project for your scripts we talked about
[21:29] so you can use many IPs or lots of bandwidth to do your grabs
[21:29] i'm doing good with kpfa so far
[21:29] i'm up to may 2006
[21:29] saw that yeah, nice
[22:13] *** Ravenloft has joined #archiveteam-bs
[23:03] *** Muad-Dib has quit IRC (Ping timeout: 252 seconds)
[23:14] *** R5M has joined #archiveteam-bs
[23:40] *** zenguy has quit IRC (Quit: see ya!)
[23:41] *** schbirid has quit IRC (Quit: Leaving)
[23:42] *** zenguy has joined #archiveteam-bs
[23:43] *** Ravenloft has quit IRC (Ping timeout: 252 seconds)
[23:51] *** zenguy_pc has joined #archiveteam-bs
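
A rough sketch of the batching discussed above for the fieldsupport.lingnet.org grab: download ids go up in batches of 100, each batch gets an item name like the ones linked in the log, and some ids turn out to be missing (74 of the first 100 came back). The function names, the `grab_date` parameter, and the idea of passing in a list of fetched ids are assumptions made for illustration. The log does not give the site's download URL pattern or the wget.log format, so this sketch does no fetching or log parsing.

```python
from datetime import date


def batch_item_names(first_id, last_id, batch_size, grab_date,
                     prefix="fieldsupport.lingnet.org-download-id"):
    """Build one item name per batch of download ids.

    The name layout copies the item names seen in the log
    (e.g. ...-download-id-1-to-100-20151206); the function itself is hypothetical.
    """
    names = []
    for start in range(first_id, last_id + 1, batch_size):
        # The last batch may be shorter than batch_size.
        end = min(start + batch_size - 1, last_id)
        names.append(f"{prefix}-{start}-to-{end}-{grab_date:%Y%m%d}")
    return names


def missing_ids(first_id, last_id, fetched_ids):
    """Return the ids in [first_id, last_id] that were not fetched, in ascending order.

    fetched_ids is whatever list of ids was recovered from the grab
    (for example by reading wget.log by hand); that input is an assumption here.
    """
    fetched = set(fetched_ids)
    return [i for i in range(first_id, last_id + 1) if i not in fetched]


if __name__ == "__main__":
    for name in batch_item_names(1, 500, 100, date(2015, 12, 6)):
        print(name)
```

Feeding `missing_ids` the ids that actually came down gives a retry list for a later pass, which is the same job the wget.log does by hand.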