[00:07] *** Ymgve has quit IRC ()
[00:20] *** bwn has joined #archiveteam
[00:22] *** ikreymer has joined #archiveteam
[00:24] hi all, happy to announce that http://oldweb.today/ is now live -- browse any page from multiple web archives in 10+ old browsers
[00:25] Oh, it's like Browsershots! Cool! :D
[00:25] *** SimpBrain has quit IRC (Read error: Operation timed out)
[00:26] sort of, its designed for browsing archives not testing new sites, although can browse anything.. lots of old browser back to Mosaic supported
[00:27] right :D
[00:27] that seems awesome
[00:34] *** Marcelo has quit IRC (Ping timeout: 240 seconds)
[00:36] *** Marcelo has joined #archiveteam
[00:38] here's blog post about it from my collaborators at rhizome.org: http://rhizome.org/editorial/2015/nov/30/oldweb-today/
[00:38] oh, that screenshot of IE on mac just flooded in memories
[00:39] How does it work, is it using VMs?
[00:40] Are VM images of the machines available, so that mirrors could be set up, or is it tied to the hardware so that when the old hardware breaks, the service is brought to an end?
[00:45] Oh, WOW, the browsers are interactive
[00:45] Mind blown
[00:45] it's using Docker 'containers', so sort of like VMs, but more lightweight.. its not tied to any hardware
[00:45] and also several emulators. everything is open source: https://github.com/ikreymer/netcapsule
[00:46] Sweet, thanks! :D
[00:46] currently running on a pool of machines using Docker swarm (networking capabilities)
[00:46] Do you plan to add more browsers? I'd especially like to see Windows Chrome 1
[00:47] with its plain blue tab bar :p
[00:47] for nostalgia when it came out i was in high school, working on my web site XD
[00:48] yeah, will probably add more browsers.. great, thanks for the suggestions.. ideally unique or different from what's already there
[00:52] *** ikreymer has quit IRC (Quit: http://chat.efnet.org )
[00:52] *** ikreymer has joined #archiveteam
[01:03] It's not working for me on Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:42.0) Gecko/20100101 Firefox/42.0
[01:05] It just stays on "Initializing Browser…" forever.
[01:05] e.g. http://oldweb.today/chrome/19991201010439/http://example.net
[01:11] it gets stuck every time? its possible that websockets are blocked on your network, will try to add better error messaging soon..
[01:16] ah, that's likely it
[01:17] "Firefox can't establish a connection to the server at ws://54.88.216.221:32953/websockify."
[01:18] my work network is somewhat … anxious … regarding strange ports
[01:19] *** Start has joined #archiveteam
[01:21] *** SimpBrain has joined #archiveteam
[01:23] *** philpem has quit IRC (Ping timeout: 252 seconds)
[01:24] *** nightpool has quit IRC (Ping timeout: 183 seconds)
[01:36] My Firefox crashed today, I had to restart it.
[01:37] *** JW_work has quit IRC (Read error: Operation timed out)
[01:38] *** remsen has joined #archiveteam
[01:45] *** JW_work has joined #archiveteam
[01:54] *** primus104 has quit IRC (Leaving.)
[02:02] *** JesseW has joined #archiveteam
[02:04] *** Marcelo has quit IRC (Ping timeout: 240 seconds)
[02:10] *** khaoohs_ has joined #archiveteam
[02:13] does flagging spam on archive.org actually achieve anything?
[02:14] *** khaoohs has quit IRC (Read error: Operation timed out)
[02:15] doesnt even seem possible to flag a whole account
[02:17] jleclanch: the flagging system is still in beta; please also email info at archive.org with the identifiers (i.e. URLs) that you flag. I can testify that the (one) guy who answers that address does appreciate (and make dark) spam URLs sent to him.
[02:18] JesseW: it's just there's so much of it, every time I find something to flag there's 10 more links in related media
[02:19] JesseW: https://archive.org/search.php?query=escort https://archive.org/search.php?query=web%20design https://archive.org/search.php?query=double%20glazing
[02:19] like 90% of all that is spam
[02:19] I know. I focused on fake technical support numbers for a while.
[02:19] mm
[02:20] JesseW: would appreciate the ability to flag a spam account
[02:20] You may find it useful to use the python library to semi-automate checking search results and formatting emails.
[02:20] I agree. Send it in as a suggestion to info@
[02:20] i dont have that much time on my hands :P
[02:20] will do
[02:21] yeah. You could also make a page on the archiveteam wiki with a list of "search terms that are 90% spam", which me and others could go through regularly and report.
[02:22] * JesseW may do that myself, soon
[02:22] love to but like i said, not that much time on my hands :) I'm just uploading stuff when I can
[02:24] *** WinterFox has joined #archiveteam
[02:25] *** nightpool has joined #archiveteam
[02:29] *** bwn has quit IRC (Ping timeout: 252 seconds)
[02:30] jleclanch: and it's very appreciated!
[02:30] JesseW: pm :)
[02:47] *** nightpool has quit IRC (Ping timeout: 183 seconds)
[02:47] *** nightpool has joined #archiveteam
[02:58] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[02:59] *** nightpool has quit IRC (Ping timeout: 615 seconds)
[02:59] *** Lord_Nigh has joined #archiveteam
[02:59] *** nightpool has joined #archiveteam
[03:01] *** Coderjoe_ has joined #archiveteam
[03:02] *** JesseW has quit IRC (Leaving.)
[03:03] *** Coderjoe has quit IRC (Read error: Operation timed out)
[03:06] *** ikreymer has quit IRC (Quit: http://chat.efnet.org )
[03:16] *** vitzli has joined #archiveteam
[03:22] *** WinterFox has quit IRC (Read error: Operation timed out)
[03:24] *** nightpool has quit IRC (Ping timeout: 258 seconds)
[03:25] *** Marcelo has joined #archiveteam
[03:28] *** WinterFox has joined #archiveteam
[03:35] *** nightpool has joined #archiveteam
[03:54] *** BlueMaxim has joined #archiveteam
[04:07] *** nightpool has quit IRC (Ping timeout: 183 seconds)
[04:27] *** Marcelo has quit IRC (Quit: Page closed)
[04:31] *** Coderjoe_ has quit IRC (Read error: Operation timed out)
[04:34] *** Coderjoe has joined #archiveteam
[04:56] *** Coderjoe has quit IRC (Read error: Connection reset by peer)
[05:01] *** Coderjoe has joined #archiveteam
[05:03] *** Marcelo has joined #archiveteam
[05:08] *** Marcelo has quit IRC (Quit: http://chat.efnet.org (Ping timeout))
[05:12] *** vitzli has quit IRC (Leaving)
[05:16] *** aaaaaaaaa has quit IRC (Leaving)
[05:18] *** nightpool has joined #archiveteam
[05:23] *** superkuh has quit IRC (Read error: Connection reset by peer)
[05:24] *** nightpool has quit IRC (Ping timeout: 360 seconds)
[05:25] *** WinterFox has quit IRC (Remote host closed the connection)
[05:27] *** WinterFox has joined #archiveteam
[05:27] *** superkuh has joined #archiveteam
[05:29] *** JesseW has joined #archiveteam
[05:30] *** nightpool has joined #archiveteam
[05:48] *** ndiddy has quit IRC (Quit: Leaving)
[05:48] *** nightpool has quit IRC (Read error: Operation timed out)
[05:52] *** xk_id has joined #archiveteam
[05:57] I think I've got a stuck item from docstoc; maybe 2...
[05:57] *** vitzli has joined #archiveteam
[05:58] Item 100documents:264234 is trying to get: http://embed.docstoc.com/handlers/downloadfilefromflash.ashx?docid=26423476&ref_url=http://www.docstoc.com/docs/26423476/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/images/04-09.jpg
[05:58] and Item 100documents:578820 has been running for 19 hours.
[05:59] *** nightpool has joined #archiveteam
[05:59] *** Sk1d has quit IRC (Read error: Operation timed out)
[06:01] http://i.imgur.com/bU6ckP0.webm
[06:02] !
[06:10] *** nightpool has quit IRC (Ping timeout: 255 seconds)
[06:14] *** DMackey- has joined #archiveteam
[06:16] *** DMackey has quit IRC (Ping timeout: 310 seconds)
[06:27] *** nightpool has joined #archiveteam
[07:05] *** nightpool has quit IRC (Ping timeout: 183 seconds)
[08:28] Atluxity, are you there? docstoc is down, but queue is 290k items and warriors just download the shutdown message
[08:29] *** primus104 has joined #archiveteam
[08:30] can anyone stop the docstoc tracker, please? docstoc is gone, and warrior just grabs the shutdown message
[08:31] not disabling uploads though, i've still got real ones uploading
[08:34] I will shut my hose down
[08:34] thanks
[08:34] thank you
[08:34] *** bwn has joined #archiveteam
[08:38] *** atomotic has joined #archiveteam
[08:40] *** kyan has quit IRC (Ping timeout: 258 seconds)
[08:52] *** afics has quit IRC (Quit: Quit.)
[08:52] *** afics has joined #archiveteam
[09:03] *** xk_id has quit IRC (Remote host closed the connection)
[09:11] *** JesseW has quit IRC (Leaving.)
[09:12] *** JesseW has joined #archiveteam
[09:13] *** JesseW has quit IRC (Client Quit)
[09:25] *** arkiver2 has joined #archiveteam
[09:36] *** RedType has quit IRC (Read error: Operation timed out)
[09:40] arkiver, are you online?
[09:42] There is a problem with docstoc tracker items now, docstoc is gone, but warriors keep downloading shutdown message (0.4 MB file). Could you pause the tracker, please ? It may poison the grab with silly 0.4 MB files
[09:45] it will probably be easy enough to fix
[09:45] but yes, the tracker should be paused
[09:46] my beta-cloud sysadmin came over and wondered if he had broken the network or if I did something
[09:46] was pushing 1gbps bidirectional
[09:48] (both incomming and outgoing)
[09:48] Kenshin: did you feel it? :P
[09:52] *** arkiver2 has quit IRC (Ping timeout: 252 seconds)
[09:57] *** arkiver2 has joined #archiveteam
[10:02] *** nightpool has joined #archiveteam
[10:05] I paused the grab
[10:06] *** nightpool has quit IRC (Ping timeout: 258 seconds)
[10:07] thank you
[10:20] *** DMackey has joined #archiveteam
[10:20] *** Sk1d has joined #archiveteam
[10:23] *** DMackey- has quit IRC (Ping timeout: 310 seconds)
[10:32] *** schbirid has joined #archiveteam
[10:46] *** arkiver2 has quit IRC (Ping timeout: 252 seconds)
[10:46] *** arkiver2 has joined #archiveteam
[11:06] *** arkiver2 has quit IRC (Ping timeout: 252 seconds)
[11:09] *** xk_id has joined #archiveteam
[11:38] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:41] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[11:42] *** Stiletto has joined #archiveteam
[11:43] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[12:08] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[12:09] *** Stiletto has joined #archiveteam
[12:32] *** Ghost_of_ has joined #archiveteam
[12:33] *** remsen has quit IRC (Read error: Operation timed out)
[12:39] *** remsen has joined #archiveteam
[12:40] *** atomotic has joined #archiveteam
[12:41] *** nertzy has joined #archiveteam
[12:44] *** dserodio has quit IRC (Quit: ZNC - http://znc.in)
[12:47] *** dserodio has joined #archiveteam
[12:53] *** RedType has joined #archiveteam
[13:04] *** DMackey- has joined #archiveteam
[13:07] *** DMackey has quit IRC (Ping timeout: 310 seconds)
[13:09] *** WinterFox has quit IRC (Remote host closed the connection)
[13:14] *** nomadpeng has joined #archiveteam
[13:17] *** primus104 has quit IRC (Leaving.)
[13:37] *** nomadpeng has quit IRC (Ping timeout: 244 seconds)
[13:37] *** nickname has joined #archiveteam
[13:37] Hey, is this the place to ask questions about the archive?
[13:38] "the archive"?
[13:38] well....
[13:38] Oh sorry, desustorage.org.
[13:38] we are not archive.org
[13:39] This was listed as the IRC channel for desustorage on http://www.archiveteam.org/index.php?title=4chan
[13:40] ah, ok, then I know what you are talking about
[13:40] did you have a question?
[13:42] Yes, I'm trying to find a specific post, and after some poking around, I've found that quite a large chunk of posts from 2013 to 2015 are inaccessible. Is this just because desustorage is still in the process of getting all the backups properly set up, or are these posts lost completely?
[13:50] nickname: which board was the post on?
[13:50] /co/
[13:50] Searching for any post from 2014 yields a "post not found" page.
[13:50] Oh.
[13:51] It's possible that someone has the threads WARC'd somewhere and they haven't been fed into the archives yet.
[13:51] (WARC is an archive format)
[13:53] *** slyphic|a is now known as slyphic
[13:56] That's a relief. How long should I expect to wait for the archive to be at 100%?
[14:01] No idea, I don't even know whether someone actually grabbed the threads.
[14:07] As far as I know desustorage is supposed to be using the archive.moe dump, and archive.moe had everything from 2012 to 2015 before it went down, and even in the hardware failure they said they only lost four months of data, so it seems like it should exist somewhere. But none of the archive sites that are running appear to have that time period, or they don't have the boards I need at all.
[14:17] *** Elegance has quit IRC (Read error: Operation timed out)
[14:17] *** Ghost_of_ has quit IRC (Remote host closed the connection)
[14:17] *** nickname has quit IRC ()
[14:19] I can't belive "nickname" was not taken as a nickname :P
[14:26] arkiver is kicking ass with the negotiations
[14:27] *** scyther has joined #archiveteam
[14:38] SketchCow: i just found more lego catalogs
[14:38] Great
[14:39] its a bit mix
[14:39] I discovered the hilarious reason that some of the Archivebot items are taking so long to generate previews of.
[14:39] Some of these .WARC files are over 50gb, some over 100gb
[14:39] Damn son
[14:40] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[14:40] So I spent a little cleaning-the-cube time to add a size checker
[14:57] *** Ymgve has joined #archiveteam
[15:00] *** DMackey has joined #archiveteam
[15:01] *** DMackey- has quit IRC (Ping timeout: 310 seconds)
[15:25] *** Start has quit IRC (Quit: Disconnected.)
[15:26] *** Stiletto has quit IRC (Read error: Operation timed out)
[15:32] *** primus104 has joined #archiveteam
[15:52] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[15:53] *** Start has joined #archiveteam
[16:04] *** nightpool has joined #archiveteam
[16:05] *** primus104 has quit IRC (Leaving.)
[16:23] SketchCow: thanks!
[16:24] So we are runnig very good with the FTP project, already 800 GB in
[16:24] Still some small things though
[16:24] SketchCow: can you please remove everything from user 'matthusby' from the ftp rsync target?
[16:25] A lot of those items are bad
[16:25] still figuring out why
[16:27] *** nightpool has quit IRC (Read error: Operation timed out)
[16:41] Done
[16:42] *** atomotic has joined #archiveteam
[16:44] *** atomotic_ has joined #archiveteam
[16:44] Just realised my reddit account was created february 29th
[16:50] *** atomotic_ has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[16:52] *** atomotic_ has joined #archiveteam
[16:52] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[16:54] *** Elegance has joined #archiveteam
[16:55] I can't find my car keys
[16:55] everyone look
[17:01] *** vitzli has quit IRC (Quit: Leaving)
[17:03] *** Start has quit IRC (Quit: Disconnected.)
[17:09] SketchCow: i found a ton of lego pdfs
[17:09] from here: http://worldbricks.com/en/
[17:09] i can brute force the download id too
[17:11] this type of grab is most likely going to be upload as zip for a range
[17:11] *** JesseW has joined #archiveteam
[17:19] *** Start has joined #archiveteam
[17:35] *** JesseW has quit IRC (Leaving.)
[17:45] *** primus104 has joined #archiveteam
[17:47] *** atomotic_ has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[18:17] *** nightpool has joined #archiveteam
[18:22] I had this one as a child: http://worldbricks.com/en/instructions-theme/s/space/all/3005-6990-Monorail-Transport-System.html
[18:22] *** RichardG has quit IRC (Read error: Connection reset by peer)
[18:24] *** RichardG has joined #archiveteam
[18:36] *** xk_id has quit IRC (Remote host closed the connection)
[18:37] *** scyther has quit IRC (Quit: Leaving)
[18:39] *** Start has quit IRC (Quit: Disconnected.)
[18:44] *** Stiletto has joined #archiveteam
[18:52] SketchCow: you didn't upload a copy to IA?
[18:54] *** nightpool has quit IRC (Read error: Operation timed out)
[19:10] *** nightpool has joined #archiveteam
[19:11] *** primus105 has joined #archiveteam
[19:13] *** primus104 has quit IRC (Read error: Operation timed out)
[19:30] *** n00b954 has joined #archiveteam
[19:34] *** bwn has quit IRC (Ping timeout: 606 seconds)
[19:35] Is there anywhere to check what percentage of Docstoc was collect? http://tracker.archiveteam.org/docstoc/ is listing "285081 to do" which does not sound right
[19:35] *collected
[19:37] *** Start has joined #archiveteam
[19:37] 2 fast 2 archive
[19:43] media3.steampowered.com folder is full uploaded now on ftp
[19:52] godane: full uploaded on ftp?
[19:52] what's the ftp?
[19:53] FOS
[20:01] *** bwn has joined #archiveteam
[20:06] *** scyther has joined #archiveteam
[20:09] *** WinterFox has joined #archiveteam
[20:12] *** atomotic has joined #archiveteam
[20:21] zite is shutting down soon: http://blog.zite.com/2015/08/27/migrate-your-zite-to-flipboard/
[20:21] http://readwrite.com/2014/03/05/why-zite-flipboard-acquisition-cnn-perfect-news-reading-experience
[20:42] *** Start has quit IRC (Quit: Disconnected.)
[20:48] *** Start has joined #archiveteam
[20:52] Might be good to dump http://selfiecity.net/ into #archivebot -- seems like a limited-time art project that will probably go away.
[20:54] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[20:58] *** aaaaaaaaa has joined #archiveteam
[21:01] *** nightpool has quit IRC (Ping timeout: 310 seconds)
[21:07] *** GLaDOS has quit IRC (Read error: Operation timed out)
[21:09] *** GLaDOS has joined #archiveteam
[22:00] *** remsen has quit IRC (Read error: Operation timed out)
[22:02] *** BlueMaxim has joined #archiveteam
[22:11] *** nightpool has joined #archiveteam
[22:11] *** scyther has quit IRC (Read error: Connection reset by peer)
[22:12] *** Meeh has joined #archiveteam
[22:16] *** jmad980 has quit IRC (Read error: Operation timed out)
[22:16] *** Start has quit IRC (Quit: Disconnected.)
[22:16] *** n00b954 has quit IRC (Quit: Page closed)
[22:22] *** nightpool has quit IRC (Read error: Operation timed out)
[22:23] *** bwn has quit IRC (Ping timeout: 606 seconds)
[22:25] *** jmad980 has joined #archiveteam
[22:30] *** Ghost_of_ has joined #archiveteam
[22:39] *** ndiddy has joined #archiveteam
[22:49] *** Lord_Nigh has joined #archiveteam
[23:16] *** schbirid has quit IRC (Quit: Leaving)
[23:17] *** Start has joined #archiveteam
[23:26] *** nightpool has joined #archiveteam
[23:33] *** bwn has joined #archiveteam
[23:47] *** remsen has joined #archiveteam
[23:49] *** nightpool has quit IRC (Read error: Operation timed out)
[23:59] SketchCow: do we still need ftp.sunet.se saved?