[00:08] *** Rye has joined #archiveteam
[00:11] *** woozie has quit IRC (Quit: Page closed)
[00:21] *** Start has joined #archiveteam
[00:28] *** bwn has quit IRC (Read error: Operation timed out)
[00:34] *** remsen has joined #archiveteam
[00:40] *** bithippo has joined #archiveteam
[00:40] Could http://whydoeseverythingsuck.com/ be dropped into ArchiveBot? The author has passed away. http://www.usatoday.com/story/tech/2015/11/15/tech-entrepreneur-diversity-advocate-hank-williams-dies-50/75841942/
[00:41] Thank you.
[00:41] bithippo, when a slot comes free, its very busy
[00:41] No rush HCross. Thanks again.
[00:42] Does ArchiveBot need additional resources?
[00:42] In the wake of the horrific events in paris, there are a lot of URL's from that being processed. Sure me or one of the others will drop it in when there is a slot
[00:42] Ahh, that makes sense.
[00:42] I should've checked it's dashboard first
[00:42] Yeah, its crazy busy atm. Thanks HCross!
[00:45] SketchCow, 39 issues of the RaspberryPi magazine winging their way accross
[01:11] http://thenextweb.com/insider/2015/11/16/pandora-to-acquire-music-streaming-service-rdio-in-fire-sale/
[01:11] "Rdio is shutting down as Pandora acquires key assets"
[01:14] *** Sk1d has quit IRC (Read error: Operation timed out)
[01:15] *** philpem has quit IRC (Read error: Operation timed out)
[01:38] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:42] *** dashcloud has joined #archiveteam
[01:47] *** bwn has joined #archiveteam
[02:01] *** nertzy has joined #archiveteam
[02:15] *** bithippo has quit IRC (Quit: Page closed)
[02:16] *** bzc6p_ has joined #archiveteam
[02:16] *** swebb sets mode: +o bzc6p_
[02:21] *** bzc6p has quit IRC (Ping timeout: 730 seconds)
[02:40] *** Ymgve has quit IRC (Ping timeout: 506 seconds)
[02:44] *** zhongfu has quit IRC (Quit: No Ping reply in 180 seconds.)
[02:44] *** zhongfu has joined #archiveteam
[02:49] *** Sk1d has joined #archiveteam
[02:54] *** JesseW has joined #archiveteam
[02:54] *** Froggypwn has joined #archiveteam
[03:00] *** Ymgve has joined #archiveteam
[03:03] *** primus104 has quit IRC (Leaving.)
[03:09] *** Sk1d has quit IRC (Ping timeout: 252 seconds)
[03:34] *** Ungstein has quit IRC (Quit: Leaving.)
[03:38] *** xk_id has joined #archiveteam
[03:40] *** Emcy_ has quit IRC (Read error: Operation timed out)
[03:43] *** nekomune has joined #archiveteam
[03:43] *** xk_id has quit IRC (Remote host closed the connection)
[03:46] *** BlueMaxim has joined #archiveteam
[03:49] *** Emcy has joined #archiveteam
[03:51] !ao https://www.freewordcentre.com/blog/2015/02/daesh-isis-media-alice-guthrie/
[03:52] *** Emcy has quit IRC (Read error: Connection reset by peer)
[04:43] *** bwn has quit IRC (Read error: Operation timed out)
[05:13] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:16] *** dashcloud has joined #archiveteam
[05:17] *** chfoo has quit IRC (Quit: quit)
[05:17] *** abartov__ has quit IRC (Ping timeout: 252 seconds)
[05:29] Check out @j_w_baker's Tweet: https://twitter.com/j_w_baker/status/665228212687998976?s=09
[05:30] can someone mirror that on ia?
[05:31] https://web.archive.org/web/20151117053108/https:/twitter.com/j_w_baker/status/665228212687998976?s=09 -- not, I presume, what you meant. :-)
[05:31] *** abartov__ has joined #archiveteam
[05:34] The files are apparently only "available (for research purposes) on request" :-(
[05:34] http://jmcauley.ucsd.edu/data/amazon/amazon_readme.txt
[05:34] The links are a lie...
[05:35] Yeah
[05:35] time to email
[05:45] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:47] *** bwn has joined #archiveteam
[05:48] *** WinterFox has joined #archiveteam
[05:49] *** dashcloud has joined #archiveteam
[06:16] *** bzc6p__ has joined #archiveteam
[06:16] *** swebb sets mode: +o bzc6p__
[06:21] *** bzc6p_ has quit IRC (Read error: Operation timed out)
[06:27] *** bzc6p__ is now known as bzc6p
[06:32] *** bzc6p sets mode: +o achip
[06:32] *** bzc6p sets mode: +oooo Atluxity garyrh GLaDOS godane
[06:32] *** bzc6p sets mode: +oooo HCross Infreq Kenshin midas
[06:32] *** bzc6p sets mode: +oooo Nemo_bis sep332 SimpBrain Start
[06:32] *** bzc6p sets mode: +o wp494
[06:32] *** bzc6p sets mode: +o SimpBrain
[06:43] SketchCow: can you add to web collection please? https://archive.org/details/wikimedia-lists.wikimedia.org-2015-11-08
[06:44] * JesseW has started a project for a 6-character, 52 alphabet *non-incremental* urlshortner -- it's painful to wait for it to find *any* results...
[06:44] If people want to dump more power at urlteam, feel free!
[07:14] *** asdf has joined #archiveteam
[07:25] *** Tallos has joined #archiveteam
[07:30] *** asdf has quit IRC (Quit: Leaving)
[07:32] *** Tallos_ has quit IRC (Read error: Operation timed out)
[07:40] *** asdf has joined #archiveteam
[07:42] *** Elegance has quit IRC (Quit: :(){ :|:& };:)
[07:46] *** Sk1d has joined #archiveteam
[07:55] *** Elegance has joined #archiveteam
[07:55] *** Elegance has quit IRC (Client Quit)
[07:58] *** Elegance has joined #archiveteam
[08:06] *** swebb has quit IRC (Ping timeout: 369 seconds)
[08:08] *** atlogbot has quit IRC (Ping timeout: 369 seconds)
[08:11] *** primus104 has joined #archiveteam
[08:12] *** xk_id has joined #archiveteam
[08:17] *** JesseW has quit IRC (Leaving.)
[08:17] *** cvb has joined #archiveteam
[08:22] *** Ungstein has joined #archiveteam
[08:34] *** xk_id has quit IRC (Remote host closed the connection)
[08:40] *** atomotic has joined #archiveteam
[08:41] *** balrog has quit IRC (Read error: Operation timed out)
[08:41] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:45] *** dashcloud has joined #archiveteam
[08:46] *** balrog has joined #archiveteam
[08:50] *** schbirid has joined #archiveteam
[09:00] *** garyrh has quit IRC (http://bnc4free.com/)
[09:15] https://archiveofourown.org/
[09:16] we archiving this yet?
[09:23] When was https://code.googlesource.com/ created? Wasn't in wayback till a second ago
[10:12] hi guys. there's probably no chance of this, but has anyone ever archived the Randnet online service in Japan for Nintendo 64? i mean the internal forums and content, not just the webiste. i'm crawling wayback for the website.
[10:12] i don't know if any of the forums and such were ever accessible via the web, without a 64DD.
[10:43] you'd probably have to ask one of the two japanese people who owned a 64DD tbh
[10:51] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[10:56] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:04] *** GLaDOS has quit IRC (Ping timeout: 252 seconds)
[11:04] *** GLaDOS has joined #archiveteam
[11:25] *** bwn has quit IRC (Read error: Operation timed out)
[11:41] *** Emcy has joined #archiveteam
[11:45] *** WinterFox has quit IRC (Leaving)
[11:45] *** WinterFox has joined #archiveteam
[11:51] Anything going on that I can throw 20 workers at?
[11:58] *** bwn has joined #archiveteam
[11:58] Not yet
[11:58] I have some time this week to get some new projects started though
[11:59] So soon you can
[11:59] Ok. Give me a shout when and Ill throw them at it
[12:00] Will do if I don't forget
[12:01] HCross: I'm keeping an eye out too...
[12:02] You both will like Google Code
[12:02] :D
[12:02] I think that'll really be a project you can throw everything at
[12:02] arkiver, when it comes I am gonna rent a dedi to throw at it
[12:02] given a decent enough rsync target
[12:03] 10Gbps rsync target?
[12:03] :D
[12:06] can it be an eu one, the one for adrive was awesome tho as it was in the same network as my warrior
[12:08] hehe
[12:09] strangly, the places I have access to storage is not the same places I have access to bandwidth speeds
[12:13] *** WinterFox has quit IRC (Read error: Operation timed out)
[12:14] *** atomotic has joined #archiveteam
[12:21] *** WinterFox has joined #archiveteam
[12:24] *** WinterFox has quit IRC (Client Quit)
[12:33] *** HCross2 has joined #archiveteam
[12:52] *** Tallos_ has joined #archiveteam
[12:58] *** Tallos has quit IRC (Read error: Operation timed out)
[13:03] *** vitzli has joined #archiveteam
[13:03] aand home.online.no is dead again. Hello all
[13:06] Geets from plane
[13:07] hi plane, I'm joepie91
[13:07] :)
[13:10] *** scyther has joined #archiveteam
[13:12] vitzli: was it alive?
[13:13] I did not know
[13:16] yeah, it was up on Nov 12, but coldfusion worked as before (but now it gave proxy errors)
[13:46] When it's up we hit hard?
[13:50] its a fragile little thing
[13:56] *** garyrh has joined #archiveteam
[13:57] Ha HA, my tweet about us saving #allmymovies was declared "Punching Down"
[14:08] Oh, she is NOT happy.
[14:08] Why do I even engage with angry archivists with 382 followers.
[14:08] You get 20 followers just attaching to twitter with the right keywords.
[14:10] *** swebb has joined #archiveteam
[14:12] *** phuzion has quit IRC (Remote host closed the connection)
[14:16] she likes corn tho
[14:30] *** arkiver2 has joined #archiveteam
[14:35] One has to realize how frustrated and bitter some librarians and archivists are and how I am seriously their living nightmare.
[14:35] I don't even havea degree or training.
[14:36] Asmall percentage see me as a loon/threat
[14:36] Also, this keyboard has a shitty spacebar
[14:37] *** bzc6p_ has joined #archiveteam
[14:38] ArchiveTeam is getting more work done then a lot of the 'real' internet archivists
[14:38] Or archivists in general
[14:39] *** bzc6p has quit IRC (Ping timeout: 360 seconds)
[14:39] Too true
[14:40] the few old-fashioned archivists I've met in person have been very insistent that the job of an archivist is curation-and-preservation-simultaneously and were disturbed by the idea of preserve-first-curate-later
[14:40] I don't know how common that is, though
[14:41] *** pgoetz has joined #archiveteam
[14:41] *** arkiver2 has quit IRC (Ping timeout: 252 seconds)
[14:42] Their perspective comes from a valid place, that is: what good is having information if you don't know how to find anything in it?
[14:43] But when it comes to digital data it's much eaiser to sort through it after the fact
[14:43] And it's not really any good if the data disappears before you could possibly curate it
[14:43] *** pgoetz has quit IRC (Remote host closed the connection)
[14:44] sure, but the inherent disposal associated with such curation is so permanent
[14:44] * jspiros shudders
[14:44] True, I didn't even consider that
[14:44] Of course, digital changes the game there too with continuously increasing storage densities
[14:46] of course
[14:49] Reminds me of one of Jason's talks where he says how at the time the Geocties backup seemed huge and now it'll fit onto one hard drive with room to spare.
[14:59] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[15:05] *** pgoetz has joined #archiveteam
[15:13] *** xk_id has joined #archiveteam
[15:29] *** xk_id has quit IRC (Remote host closed the connection)
[15:36] *** FAMAS has joined #archiveteam
[15:39] *** primus104 has quit IRC (Leaving.)
[15:42] *** test_ has joined #archiveteam
[15:45] *** test_ has quit IRC (Client Quit)
[15:47] Some archivists are like workers next to a lake, studying the tiny and finite fish from the comfort of a cabin
[15:47] Some are sushi chefs, taking in the haul and curating it all out of recognition but beautifully.
[15:48] Archive Team are the deep sea motherfuckers fighting 50 foot waves and dragging netting ith no sun for 4 days
[15:57] *** FAMAS has quit IRC (Read error: Operation timed out)
[16:08] So when do we get our own reality TV series?
[16:09] Do archives for http://archiveteam.org/index.php?title=Circavie exist somewhere?
[16:11] *** cvb has quit IRC (Quit: Leaving)
[16:14] Same for http://archiveteam.org/index.php?title=Club_Nintendo
[16:18] Win for the Archive Team: http://www.insidesources.com/new-york-times-article-blaming-encryption-paris-attacks/
[16:18] The article is saved in the Wayback Machine apparently due to our activity
[16:23] PurpleSym: club nintendo was saved through archivebot
[16:28] http://archive.fart.website/archivebot/viewer/?q=club.nintendo
[16:30] Got it.
[16:30] *** JesseW has joined #archiveteam
[16:30] The first one is lost then?
[16:30] I wouldn't assume that
[16:36] *** Ghost_of_ has joined #archiveteam
[16:43] *** vitzli has quit IRC (Leaving)
[16:52] *** HCross2 has quit IRC ()
[16:57] *** JesseW has quit IRC (Leaving.)
[16:59] *** notjack has joined #archiveteam
[16:59] Hi all
[17:03] *** Ghost_of_ has quit IRC (Quit: Leaving)
[17:13] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[17:22] Before we brag on the nytimes, the IA REALLY FUCKING SCOPES the NYT
[17:22] We definitely do great work here, but NYT and a few other news sites are special cases.
[17:23] I believe they do a capture every 5minutes every day
[17:23] *** Lord_Nigh has joined #archiveteam
[17:23] your games in a browser thing was on every tv channel wasn't it
[17:36] Best WARC maker/site grabber for windows?
[17:39] SketchCow: You're probably right
[17:39] Looking through the ArchiveBot viewer I can't find any direct scrapes of that article
[17:39] So if we did capture it then it was only indirectly
[17:44] Yeah
[17:44] That's fine, we don't needto be The Heroes eachtime.
[17:45] My altitude is listed at 20,000 feet
[17:45] Time to arrival is listed as 4 minutes
[17:45] I assume we're crashing, then
[17:46] I'll make sure to archive all the news articles about your crash!
[17:47] *** phuzion has joined #archiveteam
[17:52] *** abartov__ has quit IRC (Ping timeout: 252 seconds)
[17:52] HCross: wget-warc B)
[17:53] Ok
[17:54] note, that may not actually work well on windows because windows is kinda terrible for this sort of thing
[17:54] Using WinHTTrack atm
[17:54] wget-warc writes warcs directly, so as long as you keep them under the filesize limit it should be fine
[17:55] if you wget plainly and then pack, the filesystem will stomp on your cases
[17:55] what's a warc?
[17:56] web archive, contains request and response headers as well as page content
[17:56] it's what the wayback machine uses
[17:56] you can use httrack with warc-proxy I think
[17:56] i used to point my browser through warc-proxy most of the time
[17:56] xmc, alright
[17:59] so httrack just discovers, and then warc-proxy does the work
[18:17] *** aaaaaaaaa has joined #archiveteam
[18:31] *** primus104 has joined #archiveteam
[18:34] *** JW_work has quit IRC (Read error: Operation timed out)
[18:37] *** JW_work has joined #archiveteam
[18:39] *** JW_work has quit IRC (Client Quit)
[18:39] *** JW_work has joined #archiveteam
[18:44] [Reuters - World] Passenger restrained on Boston-bound British Airways flight: police http://feeds.reuters.com/~r/Reuters/worldNews/~3/C0daVe1oAHo/story01.htm 2015-11-17T18:36:47
[19:02] *** primus104 has quit IRC (Leaving.)
[19:03] *** SketchCow has quit IRC (Ping timeout: 252 seconds)
[19:23] *** bzc6p_ has left
[19:27] *** abartov__ has joined #archiveteam
[19:31] *** remsen2 has joined #archiveteam
[19:32] *** remsen has quit IRC (Read error: Operation timed out)
[19:39] *** primus104 has joined #archiveteam
[19:40] *** xk_id has joined #archiveteam
[20:06] *** SketchCow has joined #archiveteam
[20:06] *** WinterFox has joined #archiveteam
[20:15] *** asdf has quit IRC (Quit: Leaving)
[20:18] HCross: Heritrix runs on Windows
[20:18] thanks
[20:19] *** cvb has joined #archiveteam
[20:20] did anyone get a copy of haikuware / bebits recentlky? seems the guy running them has burned them down http://pulkomandy.tk/~beosarchive/karl/1.txt
[20:22] *** ex-parro1 is now known as ex-parrot
[20:27] *** SN4T14 has joined #archiveteam
[20:28] christ, what an asshole
[20:29] arkiver, am I on the right track with https://webarchive.jira.com/wiki/display/Heritrix/Heritrix+Configuration
[20:38] *** WinterFox has quit IRC (Read error: Operation timed out)
[20:39] ex-parrot: I don't know how complete it is but there is a warc of the site in here
[20:39] https://archive.org/details/archiveteam_archivebot_go_20150712070001
[20:40] thanks
[20:46] *** WinterFox has joined #archiveteam
[20:47] *** WinterFox has quit IRC (Remote host closed the connection)
[20:59] *** aaaaaaaa_ has joined #archiveteam
[20:59] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[21:00] *** scyther has quit IRC (Leaving)
[21:00] *** aaaaaaaa_ is now known as aaaaaaaaa
[21:08] *** bwn has quit IRC (Read error: Operation timed out)
[21:15] *** philpem has joined #archiveteam
[21:25] *** aaaaaaaaa has quit IRC (Ping timeout: 615 seconds)
[21:28] *** bwn has joined #archiveteam
[21:41] HCross: yeah, you probably need java and perl installed
[21:41] the in /bin/ create a .bat file with heritrix -a admin:admin
[21:41] "heritrix -a admin:admin"
[21:42] run the file and heritrix is running with admin:admin
[21:42] in /bin/heritrix.cmd you can change memory and other options
[21:42] ah, and then I go from there
[21:43] I think so
[21:43] I haven't used heritrix in over a year though, so something might have changed
[22:01] *** notjack has quit IRC (Ping timeout: 240 seconds)
[22:04] *** schbirid has quit IRC (Quit: Leaving)
[22:04] How am I not opped
[22:04] * SketchCow waits for the light yellow mist of OPping
[22:07] *** JW_work1 has joined #archiveteam
[22:08] *** JW_work has quit IRC (Ping timeout: 360 seconds)
[22:12] *** xmc sets mode: +o SketchCow
[22:12] *** xmc sets mode: +o swebb
[22:12] *** swebb sets mode: +o balrog
[22:13] *** balrog sets mode: +o Lord_Nigh
[22:17] UPDATE 3:20: The full count: four full timers at Gawker; one at Jezebel; two at Gizmodo. Each of the following shuttered subsites accounts for one freelancer/permalancer: The Vane (Gawker), Millihelen and Kitchenette (Jezebel), Workshop and AfterHours (Lifehacker), Flight Club (Jalopnik), and Indefinitely Wild and Throb (Gizmodo). So that’s eight more. Valleywag, Morning After and Defamer had no d
[22:17] edicated staff.
[22:17] All of them
[22:17] You guys see this?
[22:17] Grab all of them if we haven't
[22:18] argh. I only recently started reading Kitchenette…
[22:23] *** sam007 has joined #archiveteam
[22:23] *** JW_work has joined #archiveteam
[22:24] panic grab!
[22:27] *** xk_id has quit IRC (Remote host closed the connection)
[22:28] *** sam007 has quit IRC (Client Quit)
[22:29] *** JW_work2 has joined #archiveteam
[22:29] *** JW_work has quit IRC (Read error: Operation timed out)
[22:31] *** JW_work1 has quit IRC (Read error: Operation timed out)
[22:32] *** remsen2 has quit IRC (Leaving)
[22:40] *** xk_id has joined #archiveteam
[22:48] *** remsen has joined #archiveteam
[22:57] I scraped the sitmaps for those eight, under each dir is a urls.txt with them http://ernie.nerds.io/rapidgrab/
[22:59] *** BlueMaxim has joined #archiveteam
[23:06] *** useretail has quit IRC (Read error: Operation timed out)
[23:42] *** useretail has joined #archiveteam
[23:55] *** cvb has quit IRC (Read error: Operation timed out)
[23:58] *** xk_id has quit IRC (Remote host closed the connection)