[00:13] JetBalsa, this channel is logged FYI [00:13] Its Archive Team, I figured [00:14] that pass is used for temp accounts we hand out, its changed pretty much asap, but Thanks for the note [00:14] A surprisingly large number of people don't. [00:14] hahah [00:19] most of the archiveteam channels are *not* logged, at least in public [00:19] just the always-on ones [00:20] that should be apart of the archive [00:20] the channel logs, documenting the saving of everything [00:20] *** VADemon has quit IRC (left4dead) [00:34] Anything happened to FOS? My upload has tanked [00:35] *** bwn_ has joined #archiveteam [00:36] Please update your scripts for Google Code! [00:42] *** bwn has quit IRC (Read error: Operation timed out) [01:19] *** philpem has quit IRC (Ping timeout: 252 seconds) [01:20] *** SN4T14 has quit IRC (Read error: Operation timed out) [01:28] *** SN4T14 has joined #archiveteam [01:38] *** jleclanch has quit IRC (Read error: Operation timed out) [01:46] *** jleclanch has joined #archiveteam [01:47] *** JesseW has joined #archiveteam [02:01] *** jleclanch has quit IRC (Ping timeout: 255 seconds) [02:14] *** bwn_ has quit IRC (Read error: Connection reset by peer) [02:14] *** bwn has joined #archiveteam [02:22] *** schbirid2 has joined #archiveteam [02:24] *** schbirid has quit IRC (Read error: Operation timed out) [02:35] *** vitzli has joined #archiveteam [02:44] *** HCross has quit IRC (Max SendQ exceeded) [02:44] *** Fusl has quit IRC (Max SendQ exceeded) [02:44] *** HCross has joined #archiveteam [02:45] *** Fusl has joined #archiveteam [02:47] *** _desu___ has quit IRC (Ping timeout: 252 seconds) [02:50] *** _desu___ has joined #archiveteam [02:52] *** xmc has quit IRC (Quit: brb) [03:12] *** bwn has quit IRC (Read error: Operation timed out) [03:24] *** xmc has joined #archiveteam [03:24] *** swebb sets mode: +o xmc [03:36] *** JetBalsa has quit IRC (Quit: - nbs-irc 2.39 - www.nbs-irc.net -) [03:36] tree3: I've started re-grabbing the ones that 
failed on the first time, but it looks like he's started setting them to private [03:37] even though it's not saturday yet [03:37] :( [03:37] So I won't be able to get those unless you can convince him to set them to "unlisted" [03:38] I got most of the ones that were available though (289 I'm retrying). [03:38] So that's 4161 successfully grabbed (or were already taken down due to Content ID). [03:39] They're still uploading, though [03:39] Also he's put up some new ones since then, which I haven't gotten yet. [03:41] *** andrew_m has joined #archiveteam [03:42] *** andrew_m has quit IRC (Client Quit) [04:03] *** tree33 has joined #archiveteam [04:09] *** tree3 has quit IRC (Read error: Operation timed out) [04:27] Can someone stop the google code grab? [04:30] yipdw, chfoo, arkiver ping ^^ [04:31] it is 2015-12-12 04:31 UTC, for logging purposes [04:33] i stopped it [04:33] what was wrong with it? [04:34] augie_at in #googlecodeblue logged in and asked to reduce the load, he is currently in #googlecodeblue [04:35] load spikes trip ddos alerts to the google staff [04:37] *** Ghost_of_ has quit IRC (Remote host closed the connection) [05:01] *** nertzy has joined #archiveteam [05:08] *** aaaaaaaaa has quit IRC (Leaving) [05:20] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [05:48] *** Sk1d has quit IRC (Ping timeout: 250 seconds) [05:50] tree33: I'm pretty sure I've gotten all of the ones that I didn't get the first time around and haven't already been set to private. [05:50] I don't plan on getting the more recent videos (posted since the shutdown alert) unless you let me know they're at risk. [05:51] It'll be a while before everything shows up on archive.org, though. 
[05:57] *** Sk1d has joined #archiveteam [06:13] *** redlob has quit IRC (Read error: Operation timed out) [06:14] *** jleclanch has joined #archiveteam [06:17] *** redlob has joined #archiveteam [06:53] *** kniffy has quit IRC (Excess Flood) [07:01] *** kniffy has joined #archiveteam [07:01] *** kniffy has quit IRC (Excess Flood) [07:02] *** kniffy has joined #archiveteam [07:04] *** Froggypwn has quit IRC (Read error: Operation timed out) [07:20] *** dtm has quit IRC (hub.efnet.us irc.Prison.NET) [07:20] *** JW_work has quit IRC (hub.efnet.us irc.Prison.NET) [07:20] *** kyan has quit IRC (hub.efnet.us irc.Prison.NET) [07:20] *** logan has quit IRC (hub.efnet.us irc.Prison.NET) [07:20] *** patrickod has quit IRC (hub.efnet.us irc.Prison.NET) [07:27] *** chfoo has quit IRC (Ping timeout: 310 seconds) [07:29] *** dtm has joined #archiveteam [07:29] *** JW_work has joined #archiveteam [07:29] *** kyan has joined #archiveteam [07:29] *** logan has joined #archiveteam [07:29] *** patrickod has joined #archiveteam [07:36] *** BlueMaxim has quit IRC (Quit: Leaving) [07:59] *** vitzli has quit IRC (Quit: Leaving) [08:34] *** bwn has joined #archiveteam [09:23] *** asdf has joined #archiveteam [09:23] *** JesseW has quit IRC (Leaving.) 
[09:28] *** acridAxid has quit IRC (Quit: marauder) [09:29] *** acridAxid has joined #archiveteam [10:17] *** remsen has quit IRC (Leaving) [10:17] *** remsen has joined #archiveteam [10:50] *** vitzli has joined #archiveteam [10:59] *** vOYtEC has quit IRC (Quit: rm -r *) [11:31] *** Jogie has quit IRC (Ping timeout: 506 seconds) [11:31] *** Jogie has joined #archiveteam [11:48] *** VADemon has joined #archiveteam [12:03] *** vOYtEC has joined #archiveteam [12:36] *** vOYtEC has quit IRC (Quit: rm -r *) [12:42] *** Ghost_of_ has joined #archiveteam [12:52] *** vOYtEC has joined #archiveteam [13:46] *** WinterFox has quit IRC (Remote host closed the connection) [14:03] *** remsen2 has joined #archiveteam [14:03] *** rizzzz has quit IRC (Remote host closed the connection) [14:03] *** rizzzz has joined #archiveteam [14:05] *** remsen has quit IRC (Read error: Operation timed out) [14:21] *** altlabel has quit IRC (Ping timeout: 506 seconds) [14:24] *** fie has joined #archiveteam [14:26] *** xmc has quit IRC (Read error: Operation timed out) [14:27] *** K4k has joined #archiveteam [14:54] *** K4k has quit IRC (WeeChat 1.0.1) [14:54] *** K4k has joined #archiveteam [14:58] *** foobar_ has joined #archiveteam [15:00] *** nertzy has joined #archiveteam [15:01] *** foobar_ has quit IRC (Client Quit) [15:06] *** foobar_ has joined #archiveteam [15:07] Hi, does someone know the current status of the Gitorious archiving project? [15:07] Are you looking for any volunteers? [15:12] foobar_: http://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-12-11,Fri&sel=195#l191 [15:18] Where's my hug!! [15:19] @PurpleSysm: Thanks! [15:20] *** foobar_ has left [15:20] Got my telethon haircut [15:21] IA OCR is dealing with piles of Macintosh books [15:25] HUG [15:25] Good luck with the telethon! [15:27] I guess we'll later hear more about the livestreams [15:29] telethon.archive.org is the site that will have all info. [15:30] Yes. 
Good luck SketchCow - will be there in spirit (while actually being over 5000 miles away :D) [15:38] So, FOS is now slightly filling with FTP. [15:38] 64% [15:38] And I have things wiping away items after they're uploaded. [15:39] Want us to slow down a lot? [15:42] *** kniffy has quit IRC (Ping timeout: 252 seconds) [15:46] Well, of the 4.7 TB being used, 3.3 TB is FTP [15:50] Maybe we should create a server which is only for the FTP grab [15:50] *** DopefishJ has joined #archiveteam [15:50] *** swebb sets mode: +o DopefishJ [15:50] So it won't affect the other grabs we do [15:50] *** Start_ has joined #archiveteam [15:50] *** Start has quit IRC (Read error: Connection reset by peer) [15:51] arkiver, if we did that, could it be EU based please? [15:51] *** godane has quit IRC (Quit: Leaving.) [15:51] *** DFJustin has quit IRC (Read error: Operation timed out) [15:55] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [16:12] *** n00b390 has joined #archiveteam [16:13] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD [16:14] "yahoosucks" [16:14] Hail! [16:14] And many thanks! 
[16:15] ^_^ [16:15] *** n00b390 has quit IRC (Client Quit) [16:15] *sigh* [16:15] With great powers comes great responsibility [16:20] *** kniffy_ has joined #archiveteam [16:22] *** kniffy_ is now known as kniffy [16:31] *** kniffy has quit IRC (Quit: :^)) [16:32] *** kniffy has joined #archiveteam [16:33] *** chfoo has joined #archiveteam [16:45] *** SN4T14_ has joined #archiveteam [16:46] *** SN4T14 has quit IRC (Read error: Operation timed out) [16:47] *** sep332 has quit IRC (Read error: Operation timed out) [16:49] *** altlabel has joined #archiveteam [16:50] *** vitzli has quit IRC (Leaving) [16:54] *** kyan has quit IRC (Ping timeout: 258 seconds) [16:56] *** godane has joined #archiveteam [16:59] *** JesseW has joined #archiveteam [17:09] *** JetBalsa has joined #archiveteam [17:12] *** tree33 is now known as tree3 [17:18] *** antomati_ has joined #archiveteam [17:18] *** swebb sets mode: +o antomati_ [17:18] *** wxtr has quit IRC (Read error: Operation timed out) [17:18] *** RichardG_ has joined #archiveteam [17:18] *** Fletcher has quit IRC (Read error: Operation timed out) [17:18] *** Famicoman has quit IRC (Read error: Operation timed out) [17:18] *** afics has quit IRC (Read error: Operation timed out) [17:18] *** antomatic has quit IRC (Read error: Operation timed out) [17:18] *** no2pencil has quit IRC (Read error: Operation timed out) [17:19] *** no2pencil has joined #archiveteam [17:19] *** Stiletto has quit IRC (Read error: Operation timed out) [17:19] *** Stiletto has joined #archiveteam [17:19] *** nox has quit IRC (Read error: Operation timed out) [17:20] *** wxtr has joined #archiveteam [17:20] *** cadbury has quit IRC (Read error: Operation timed out) [17:20] *** Apathy has quit IRC (Read error: Operation timed out) [17:21] *** [phire] has quit IRC (Read error: Operation timed out) [17:22] *** brayden_ has quit IRC (Read error: Operation timed out) [17:22] *** vtyl has joined #archiveteam [17:23] *** wp494 has quit IRC (Read error: Operation 
timed out) [17:23] *** mistym has quit IRC (Ping timeout: 606 seconds) [17:23] *** wp494 has joined #archiveteam [17:23] *** nox has joined #archiveteam [17:23] *** mistym has joined #archiveteam [17:24] *** Start_ is now known as Start [17:24] *** RichardG has quit IRC (Ping timeout: 606 seconds) [17:26] *** lytv has quit IRC (Ping timeout: 606 seconds) [17:28] *** Fletcher has joined #archiveteam [17:31] *** Emcy_ has joined #archiveteam [17:32] *** RichardG has joined #archiveteam [17:38] *** wp494 has quit IRC (hub.se efnet.portlane.se) [17:38] *** RichardG_ has quit IRC (hub.se efnet.portlane.se) [17:38] *** Sk1d has quit IRC (hub.se efnet.portlane.se) [17:38] *** dashcloud has quit IRC (hub.se efnet.portlane.se) [17:38] *** ParkerR has quit IRC (hub.se efnet.portlane.se) [17:38] *** Elegance has quit IRC (hub.se efnet.portlane.se) [17:38] *** Gfy has quit IRC (hub.se efnet.portlane.se) [17:38] *** thefinn93 has quit IRC (hub.se efnet.portlane.se) [17:38] *** Emcy has quit IRC (hub.se efnet.portlane.se) [17:39] *** Elegance_ has joined #archiveteam [17:42] *** Gfy_ has joined #archiveteam [17:47] *** Apathy has joined #archiveteam [17:47] *** parker_ has joined #archiveteam [17:47] *** afics has joined #archiveteam [17:49] *** JesseW has quit IRC (Leaving.) 
[17:50] *** cadbury has joined #archiveteam [17:50] *** thefinn91 has joined #archiveteam [17:54] *** Gfy_ is now known as Gfy [17:54] *** wp494 has joined #archiveteam [17:55] *** dashcloud has joined #archiveteam [18:03] *** [phire] has joined #archiveteam [18:15] *** brayden_ has joined #archiveteam [18:15] *** swebb sets mode: +o brayden_ [18:18] *** remsen has joined #archiveteam [18:22] *** R5M has joined #archiveteam [18:22] *** R5M has quit IRC (Client Quit) [18:24] *** Froggypwn has joined #archiveteam [18:27] *** Ghost_of_ has quit IRC (Quit: Leaving) [18:28] *** remsen2 has quit IRC (Read error: Operation timed out) [18:29] *** Famicoman has joined #archiveteam [18:29] *** chfoo- has quit IRC (ZNC - 1.6.0 - http://znc.in) [18:30] *** chfoo- has joined #archiveteam [18:31] *** remsen has quit IRC (Read error: Operation timed out) [18:32] *** SimpBrain has quit IRC (Read error: Operation timed out) [18:41] *** sep332 has joined #archiveteam [18:42] *** SimpBrain has joined #archiveteam [18:54] *** SimpBrain has quit IRC (Leaving) [18:54] *** SimpBrain has joined #archiveteam [18:57] *** JesseW has joined #archiveteam [19:00] *** dashcloud has quit IRC (Read error: Operation timed out) [19:04] *** dashcloud has joined #archiveteam [19:13] *** Atom-- has quit IRC (Read error: Connection reset by peer) [19:19] *** thefinn91 is now known as thefinn93 [19:21] *** xXx_ndidd has joined #archiveteam [19:24] *** aaaaaaaaa has joined #archiveteam [19:24] *** swebb sets mode: +o aaaaaaaaa [19:28] *** ndiddy has quit IRC (Read error: Operation timed out) [19:35] *** toad2 has joined #archiveteam [19:36] *** toad1 has quit IRC (Read error: Operation timed out) [19:54] *** bwn has quit IRC (Read error: Operation timed out) [20:01] *** Start_ has joined #archiveteam [20:01] *** Start has quit IRC (Read error: Connection reset by peer) [20:31] *** xXx_ndidd has quit IRC (Read error: Connection reset by peer) [20:34] *** ndiddy has joined #archiveteam [20:34] *** ndiddy 
has quit IRC (Read error: Connection reset by peer) [20:37] *** ndiddy has joined #archiveteam [20:43] *** bwn has joined #archiveteam [20:55] http://www.cryengine.com/community/downloads.php is going away because of http://www.cryengine.com/news/the-new-cryenginecom-is-coming-next-week Best way of getting it all? [20:56] We have until Monday [20:56] *** Ghost_of_ has joined #archiveteam [21:21] *** scyther has joined #archiveteam [21:23] *** Dennisjr1 has joined #archiveteam [21:23] *** philpem has joined #archiveteam [21:26] I should have added, put it in ArchiveBot but was wondering if a warrior task is needed [21:26] i don't know if archivebot would take it [21:26] it uses javascript for the download buttons [21:26] *** ndiddy has quit IRC (Read error: Connection reset by peer) [21:26] hmm. What is the best way of saving it then [21:26] I've put --phantom-js on [21:27] *** ndiddy has joined #archiveteam [21:28] IMHO it's important [21:28] i know [21:29] but looks like it's not grabbing the files [21:32] ah. Do we need a warrior project? [21:33] i'm trying to get files in wget [21:33] wget -e robots=off --user-agent=Firefox --post-data="submit=Download&hotlink_id=&df_id=4329&modcp=0&cat_id=109&hotlink_id=&view=load" http://www.cryengine.com/community/downloads.php [21:33] it's not working [21:34] HCross: least overhead is for you or someone to just resolve the URLs and download them [21:34] hmm ok [21:34] warrior is overkill, Crytek isn't that large [21:35] true, but there are quite a few [21:35] computers are pretty good at doing repetitive tasks quickly [21:37] tempted to fire up a web browser and start clicking download a lot [21:38] Does the wayback machine handle POST requests correctly? [21:38] how would it possibly do so [21:39] It could match the input parameters with requests it has in WARCs and hope for the best. 
[21:39] *** Start_ is now known as Start [21:40] how do i put the post data in so wget gets the file [21:40] GET works fine, btw: http://www.cryengine.com/community/downloads.php?submit=Download&hotlink_id=&df_id=5310&modcp=0&cat_id=125&hotlink_id=&view=load [21:40] Thank you [21:41] i thought there was a way to do it using GET [21:41] I don't know if IA's wayback does, but emulating POST like that is asking for a lot of trouble [21:41] i will start archiving [21:41] Thanks godane [21:41] let me know if you need a hand with bandwidth or something [21:42] Sure, yipdw, for the requests that actually *modify* something things will go terribly wrong. [21:42] godane, are you aware of the deadline? [21:42] yes [21:42] yes, which is most POST requests [21:42] http://www.cryengine.com/community/downloads.php?submit=Download&hotlink_id=&df_id=5310 [21:42] that works too [21:42] anyway this is offtopic [21:43] True, I’m sorry. [21:46] *** Ghost_of_ has quit IRC (Quit: Leaving) [21:46] i can't get wget to work with it [21:48] HELP [21:51] Access denied! [21:52] Sorry I can't really help with this [21:54] curl -A user-agent -I the-get-url you had above [21:54] seems to get me a 301 redirect to http://crytekfiles.com/files/CRYENGINE_Build_PC_v3_5_8_2310_freesdk.zip [21:54] which seems to download OK [21:58] full code [21:58] it's not working for me [21:59] working on it [22:02] can you get a list of df_id's (from the view=detail links on the download pages)? 
We probably want the view=detail pages in any case, and they should be grabbable with grab-site (or wget, etc) [22:02] doing this still gives me access denied pages: curl -A firefox -i 'http://www.cryengine.com/community/downloads.php?view=load&hotlink_id=&code=&df_id=5310' [22:05] capital -I, not lowercase [22:05] doing a HEAD request [22:07] still working on it [22:12] *** melody has quit IRC (Ping timeout: 252 seconds) [22:12] *** melody has joined #archiveteam [22:13] i'm grabbing 6000 download pages [22:13] just the pages [22:15] looks like i grab the pages [22:16] *i can grab the pages [22:19] awesome, we'll need those [22:19] I'm close to getting code to translate them into downloads [22:19] bash shell is OK? [22:21] yes [22:22] function cryengine_download { foo=$(curl -I -A 'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.3.0' 'http://www.cryengine.com/community/downloads.php?submit=Download&df_id='"$1"'&view=load' | awk '/Location:/{print $2}'); wget "${foo:0:-1}"; } [22:22] pass it a df_id [22:23] it will download the file to the current directory [22:23] do we have a list of df_id's? [22:23] godane: is working on that -- that's what the download pages are [22:23] or can we just have a list of numbers and go from there [22:23] e.g. http://www.cryengine.com/community/downloads.php?view=detail&category=45&df_id=5289 [22:23] the df_id is 5289 [22:24] it's working [22:25] your script [22:25] i'm going to brute force it [22:27] nice [22:27] On behalf of everyone who plays Crysis, thanks everyone [22:27] curl 'http://www.cryengine.com/community/downloads.php?&sort_by=0&order=ASC&start=80' | tee ~/blahxxx | awk -F \" '/df_id\"/{print $6}' [22:27] will get you a list of df_ids on the page [22:28] Do we need a channel for this? [22:28] it looks like there are about 2,640 downloads [22:28] * JesseW shrug [22:28] I don't think there's that much more to talk about.
as long as godane keeps us updated if he needs someone else to take part of the range, I think we're good. [22:29] What does the archive size for this look like/how much data are we looking at here? [22:30] *** scyther has quit IRC (Read error: Connection reset by peer) [22:37] Dennisjr1: most of the downloads appear to be in the few megabytes range, which would give us a few gigabytes total. There may be some big ones buried among them, though. [22:38] JesseW: ah that's not too bad then :) [22:50] *** Rickster has quit IRC (Ping timeout: 252 seconds) [22:50] *** wutno has quit IRC (Ping timeout: 252 seconds) [22:51] *** diacope has quit IRC (Ping timeout: 252 seconds) [22:51] *** sigkell has quit IRC (Ping timeout: 252 seconds) [22:51] *** sigkell has joined #archiveteam [22:52] *** Zebranky has quit IRC (Ping timeout: 252 seconds) [22:52] *** Zebranky has joined #archiveteam [22:52] *** Fletcher has quit IRC (Ping timeout: 252 seconds) [22:52] *** _desu___ has quit IRC (Ping timeout: 252 seconds) [22:52] *** zyphlar has quit IRC (Ping timeout: 252 seconds) [22:52] *** Atluxity has quit IRC (Ping timeout: 252 seconds) [22:52] *** _desu___ has joined #archiveteam [22:52] *** Atluxity has joined #archiveteam [22:53] *** Rickster has joined #archiveteam [22:53] *** Fletcher has joined #archiveteam [22:53] *** zyphlar has joined #archiveteam [22:54] *** diacope has joined #archiveteam [22:56] *** bauruine has quit IRC (Ping timeout: 252 seconds) [22:57] *** bauruine has joined #archiveteam [23:00] *** WinterFox has joined #archiveteam [23:06] godane, JesseW, are we underway and grabbing? [23:09] I just helped with coding -- godane is the one doing the grab [23:10] ah ok. [23:10] Thanks BTW [23:10] sure [23:10] They have totally done a Yahoo [23:10] it's a dammed shame they couldn't, you know, make a torrent of all of it and seed it for a month [23:11] that would be a *responsible* way to stop hosting it... 
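The df_id listing step mentioned at 22:27 could be sketched roughly as below. This is only an illustration: the helper name `extract_df_ids` and the sample HTML are hypothetical, and it assumes the listing pages expose each download as a link with a `df_id=<digits>` query parameter, as the view=detail URLs quoted in the log do. The real page markup was not captured here and may differ.

```shell
# Hypothetical helper: print the unique df_id values found in HTML on stdin.
# Assumes links carry a "df_id=<digits>" query parameter (an assumption
# based on the URLs quoted in the log, not on the actual page source).
extract_df_ids() {
  grep -oE 'df_id=[0-9]+' | cut -d= -f2 | sort -un
}

# Inline sample standing in for a listing page (not real site output).
sample='<a href="downloads.php?view=detail&category=45&df_id=5289">A</a>
<a href="downloads.php?view=detail&category=45&df_id=5310">B</a>
<a href="downloads.php?view=load&df_id=5289">A again</a>'

# Duplicate ids collapse to one entry each, in numeric order.
printf '%s\n' "$sample" | extract_df_ids
```

Against the live site one would pipe a fetched listing page into the helper instead of the sample, for example `curl -s '<listing URL>' | extract_df_ids`.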
[23:11] most of these companies don't have any sense [23:15] *** asdf has quit IRC (Quit: Leaving) [23:26] *** JesseW has quit IRC (Leaving.) [23:29] So how's cryengine? any help needed? [23:29] godane: can you give me an example list of links saved by your script? [23:33] *** Ghost_of_ has joined #archiveteam [23:37] I want to save this with POST requests too [23:42] *** ats has quit IRC (Quit: let's see if installing a new graphics card has got any less painful in the last ten years) [23:43] I'm only finding cat_id 87 [23:44] nevermind [23:47] curl -I -A 'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.3.0' 'http://www.cryengine.com/community/downloads.php?submit=Download&df_id='1'&view=load' [23:48] ok [23:48] so no WARCs? [23:51] anyway, please continue the grab godane [23:51] *** dashcloud has quit IRC (Read error: Operation timed out) [23:51] will do my best to also grab these in WARCs [23:52] so i got the WARC of the pages [23:53] ok, I'll do them for the POST requests and actual files too then [23:53] 6000 pages but only 2636 exist [23:53] the 2636 that have files anyways [23:54] http://pastebin.com/DfXyp6nu [23:56] *** dashcloud has joined #archiveteam [23:57] thanks!
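The brute-force approach discussed above (request each df_id with a HEAD request, follow the redirect to the actual file) might look something like this sketch. It is not the script that was actually run: the function names `cryengine_url`, `extract_location` and `grab_range` are hypothetical, the one-second pause is an assumed politeness delay, and `--warc-file` assumes a wget build with WARC support. Only the URL pattern and the HEAD-then-download idea come from the log.

```shell
# Build the GET download URL for a df_id (URL pattern taken from the log).
cryengine_url() {
  printf 'http://www.cryengine.com/community/downloads.php?submit=Download&df_id=%s&view=load\n' "$1"
}

# Read HTTP response headers on stdin and print the first Location value.
# tr strips the carriage returns that end each HTTP header line.
extract_location() {
  tr -d '\r' | awk '!seen && tolower($1) == "location:" { print $2; seen = 1 }'
}

# Hypothetical driver: try every id from $1 to $2, skipping ids that do
# not answer with a redirect. Not invoked here; it needs network access.
grab_range() {
  id=$1
  while [ "$id" -le "$2" ]; do
    target=$(curl -s -I -A 'Mozilla/5.0' "$(cryengine_url "$id")" | extract_location)
    if [ -n "$target" ]; then
      # --warc-file also records the transfer as a WARC (assumes WARC-capable wget).
      wget --warc-file="cryengine-$id" "$target"
    fi
    sleep 1  # assumed delay between requests, not from the log
    id=$((id + 1))
  done
}
```

Something like `grab_range 1 6000` would then walk the id range mentioned in the channel; splitting the range between volunteers only means calling it with different bounds.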