[00:11] *** BlueMaxim has joined #archiveteam
[00:11] Coursera scripts are online.
[00:28] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[00:34] *** db48x has joined #archiveteam
[00:49] *** ris has quit IRC ()
[01:07] Coursera scripts are updated
[01:07] items loaded
[01:07] 217 courses
[01:09] project started.
[01:33] nice arkiver
[02:07] *** nickname_ has joined #archiveteam
[03:27] *** Igloo_ has joined #archiveteam
[03:27] *** nickname_ has quit IRC (Read error: Connection reset by peer)
[03:29] *** Igloo has quit IRC (Read error: Operation timed out)
[04:00] *** DoomTay has joined #archiveteam
[04:47] *** jrwr has quit IRC (Read error: Operation timed out)
[04:56] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:04] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[05:05] *** Sk1d has joined #archiveteam
[05:23] *** metalcamp has joined #archiveteam
[05:40] *** JesseW has joined #archiveteam
[06:09] citeseerxpdf tracker returned 503 for me (using the Warrior). If this is unexpected, there may be something to fix.
[06:11] *** DoomTay has quit IRC (Quit: Page closed)
[06:11] googlecode: 11/5 = 0 http://jaraya-.googlecode.com/. Server returned 0. Sleeping.
[06:13] *** VADemon has quit IRC (Quit: left4dead)
[06:14] Failed WgetDownload for Item oldcourse:clinicalneurology FileNotFoundError: [Errno 2] No such file or directory: 'v3'
[06:20] And if someone (arkiver?) could requeue oldcourse:algorithms, it apparently managed to write to a bad block on my drive and is failing to upload (or read locally).
[06:24] And apparently the failure mode here makes me kill off the entire pipeline, as this'll be an infinite loop of retries, so I guess requeue oldcourse:{researchforhealth, dmathgen, inforiskman} too.
[06:26] *** JesseW has quit IRC (Read error: Operation timed out)
[06:33] *** fie__ has quit IRC (Quit: Leaving)
[07:26] *** ris has joined #archiveteam
[07:30] *** ris has quit IRC (Client Quit)
[07:55] *** schbirid has joined #archiveteam
[08:28] *** WinterFox has joined #archiveteam
[09:47] I am trying to set up the coursera scripts on an Ubuntu 14.04 server
[09:47] wget does not compile
[09:47] someone got url to wget-lua I can use?
[09:51] nvm, found it on another server
[09:54] Meroje: do you currently have any items running?
[09:54] Due to the size of the items and the time we have, we need to keep each other informed of failing items, so I can requeue fast
[09:55] aschmitz: all requeued
[09:55] Only aschmitz has returned items currently
[09:55] Meroje: how is your upload speed to FOS?
[09:56] arkiver: I got juice again, where should I aim my efforts?
[09:57] Atluxity: do you have enough space and good speed to FOS?
[09:57] there's 3 items available in coursera, you can take those
[09:58] let me know when you're running coursera, I'll make the 3 available then
[09:58] I got 15G, on about 20 servers, but speed should be good
[09:58] ok, 2 sec
[09:58] 15G in total?
[09:58] over the 20 servers
[09:58] no
[09:58] on each
[09:58] ok
[09:59] might be good to take one job per server
[10:02] I am running
[10:03] items released.
[10:03] i can run some. Speed to fos is decent (~600-800Mbit/s, ~1TB free)
[10:03] that's good
[10:04] can I use a warrior or does it have to be script?
[10:04] you can use the warrior
[10:04] currently no items are available though
[10:05] I'm not sure about Meroje, that person has 152 items
[10:05] and not returned anything yet
[10:05] arkiver: http://pastebin.com/qtKUBwzX
[10:05] not a success
[10:05] is that on all three?
[10:05] yes
[10:06] warrior?
[10:07] no, pipeline-script
[10:07] I have a warrior up now waiting for work. concurrent 2
[10:08] added 3 items to Medowar
[10:08] let me know if you get the same error
[10:09] instant fail
[10:09] same v3 problem?
[10:09] http://pastebin.com/55nbJC9B
[10:09] yes
[10:10] maybe you have a different version of wget. for me it creates a v3 file. Let me remove that v3 deletion line and try again.
[10:14] so if you get the v3 error, logging in didn't go right
[10:14] currently wget is used to login.
[10:16] GNU Wget 1.14.lua.20130523-9a5c built on linux-gnu.
[10:17] Medowar: you did 2 items
[10:17] nah, they're bad
[10:17] ok
[10:18] i gave arkiver the instance.
[10:19] afk now
[10:23] *** BartoCH has quit IRC (Quit: WeeChat 1.5)
[10:24] Newest version is working!
[10:25] *** BartoCH has joined #archiveteam
[10:26] git?
[10:26] or wget
[10:26] on git
[10:28] Let me know when you are set up, I'll give you one item so we can test speed to FOS
[10:28] Medowar has some amazing speed to FOS
[10:33] SketchCow: is the upload of coursera to FOS started?
[10:34] err, to IA
[10:34] Nope
[10:35] arkiver: ok, I am running the latest git
[10:35] Which explains the puffy filesystem
[10:36] Atluxity: I mean the newest version is https://github.com/ArchiveTeam/coursera-grab version 20160627.07
[10:36] you need that
[10:36] root@teamarchive0:/1/ARCHIVETEAM# du -sh COURSERA
[10:36] 25G COURSERA
[10:36] Barely enough for item 1
[10:37] arkiver: I am up to date with master
[10:37] Yes, but it might be good to get an item up to IA as soon as possible, so we can see how it runs on there
[10:37] Atluxity: ok
[10:38] Atluxity: I gave you one item
[10:38] please let me know how it goes
[10:39] especially speed to FOS and cloudfront for downloading the videos
[10:39] arkiver, want me to throw 1 concurrent on there?
[10:39] yes
[10:39] if you have a good speed to FOS
[10:40] how much space?
[10:40] arkiver: same error :\
[10:40] I'd say keep 20 GB for each item
[10:40] They probably won't be bigger than 10 GB, but we never know
[10:40] Atluxity: wget-lua problem?
[10:41] v3 again right? You might have a problem with wget-lua
[10:41] is it compiled correctly?
[10:41] same error, yes
[10:42] I have no way of checking if it is compiled correctly...
[10:42] It is wget, compiled with lua
[10:42] I'll be afk for a bit, I'll check when I am back
[10:42] talk then
[10:44] yes
[10:49] Atluxity, was the download for http://warriorhq.archiveteam.org/downloads/wget-lua/wget-1.14.lua.LATEST.tar.bz2 slow at all for you?
[10:49] I did not notice
[10:49] seems to be taking an age for me
[10:51] OSError: [Errno 13] Permission denied: '/home/archiveteam/coursera-grab/data'
[11:36] *** j08nY has joined #archiveteam
[11:41] ARCHIVE TEAM: Grab https://web.archive.org/web/20160520211303/http://www.voteleavetakecontrol.org/ and put it somewhere not Internet Archive
[11:41] *** metal_cam has joined #archiveteam
[11:42] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[11:47] *** metal_cam has quit IRC (Read error: Operation timed out)
[11:48] *** metalcamp has joined #archiveteam
[11:49] SketchCow pls give more details
[11:51] I've grown to like the Zardoz movie?
[11:52] awww. vote leave trying to erase their lies
[11:53] SketchCow i don't think i get that reference if it's a reference :P
[11:53] i'll fire a grab-site now
[11:54] luckcolor, tried it, and it was downloading random IA collections
[11:54] sigh
[11:54] we totally need better tools for getting data out of archive.org
[11:54] rather than in
[11:54] :)
[11:55] SketchCow: any idea on when it's going down?
[11:56] I'd say legal obligations, and I don't think SketchCow will say more, since he may not be allowed to.
[11:56] arkiver: 119=0 https://class.coursera.org/criminallaw-001/wiki/discusion.
[11:56] Server returned 0 (AUTHFAILED). Sleeping.
[11:56] yeah
[11:56] saw it
[11:56] arkiver, did you see my permissions error?
[11:57] I have no idea on that error unfortunately
[11:57] Maybe remove the dir and try again
[11:57] de /data dir
[11:57] the*
[12:05] *** kristian_ has joined #archiveteam
[12:06] fos limited to 100mbit/connection? can't get more
[12:12] zino: can we use your target as well, next to FOS?
[12:20] fuck
[12:20] the account is banned
[12:20] {"message":"You are temporarily blocked from loading this URL. Please try again later."}
[12:28] oh it really was temporary
[12:28] nice, let's hope it holds this time
[12:29] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:32] HCross: did removing the dir fix the error for you?
[12:33] not sure, will be trying now
[12:34] nope - seems to not like creating the directory
[12:34] got an idea. hang on
[12:34] ok
[12:37] arkiver, fixed it
[12:38] awesome :)
[12:38] you should see it working away
[12:38] I hope you have some good speeds to FOS
[12:38] it's in the EU, but it's OVH so it shouldn't be that slow
[12:49] 17 Cuban sites now in newsgrabber
[12:50] as I've banged on about before, if anyone else wants to run a grabber - do let either me or arkiver know
[12:51] ^ would be greatly appreciated
[13:15] how good does the FOS speed need to be?
[13:16] I can probably help out if you need it
[13:16] Well, some items are 10 GB
[13:17] I run archivebot and it seems to max out my 100M when it uploads. I think it'll be all right
[13:18] unless something has changed since I last looked at it
[13:26] Would a couple of !ao only pipelines be useful for archivebot?
[13:28] *** WinterFox has quit IRC (Remote host closed the connection)
[13:29] *** JW_work1 has joined #archiveteam
[13:33] *** JW_work has quit IRC (Ping timeout: 370 seconds)
[14:03] *** RichardG has joined #archiveteam
[14:21] *** kristian_ has quit IRC (Leaving)
[14:35] *** VADemon has joined #archiveteam
[14:39] *** dan- has quit IRC (Read error: Operation timed out)
[14:44] *** VADemon_ has joined #archiveteam
[14:47] *** VADemon_ has quit IRC (Read error: Connection reset by peer)
[14:52] *** JesseW has joined #archiveteam
[14:52] *** VADemon has quit IRC (Read error: Operation timed out)
[15:00] *** dan- has joined #archiveteam
[15:24] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[15:58] *** MMovie1 has joined #archiveteam
[15:59] *** MMovie has quit IRC (Read error: Operation timed out)
[15:59] *** VADemon has joined #archiveteam
[16:02] *** MMovie has joined #archiveteam
[16:05] *** dan- has quit IRC (Ping timeout: 260 seconds)
[16:08] *** MMovie1 has quit IRC (Read error: Operation timed out)
[16:22] *** ndizzle has joined #archiveteam
[16:25] *** DoomTay has joined #archiveteam
[16:28] *** xXx_ndidd has quit IRC (Read error: Operation timed out)
[16:51] *** dan- has joined #archiveteam
[17:12] *** dashcloud has quit IRC (Read error: Operation timed out)
[17:13] *** dashcloud has joined #archiveteam
[17:13] *** alfie has quit IRC (Quit: Seeeya! - ZNC 1.6.3+deb1+jessie0)
[17:18] *** alfie has joined #archiveteam
[17:30] *** qwebirc87 has joined #archiveteam
[17:30] A lot of 429, Retrfinished... Why?
[17:30] Project: Coursera
[17:31] we need more accounts
[17:38] Ok... Can't we create more... using mailinator email or so?
[17:38] Btw, why is Rsync threads missing from my advanced settings page?
[17:38] 1 warrior has it, the others not...
[17:50] *** qwebirc87 has quit IRC (Quit: Page closed)
[18:04] what is the copyright/licensing situation with coursera?
[18:08] this is not a question archiveteam asks
[18:15] Hey chaps, my warriors running ArchiveTeam's Choice are currently not running anything
[18:15] What's priority at the moment?
[18:21] I think they're focusing on Coursera, since it's to go kaput in just 3 days
[18:22] I've manually started one of them
[18:23] make sure you have enough space
[18:23] say 20 GB
[18:23] and good speed to FOS
[18:23] 200GB available to each warrior
[18:23] FOS?
[18:23] The standard target
[18:24] It's running on a 100/100 dedicated
[18:24] So I'd imagine whatever/wherever it is it should be sufficient
[18:25] *** Tomcat_ has joined #archiveteam
[18:26] *** brayden has quit IRC (Read error: Operation timed out)
[18:31] *** ris has joined #archiveteam
[18:32] *** JanSiemer has joined #archiveteam
[18:32] Does the Coursera tracker contain all courses that will be lost?
[18:33] Read it will be 400+ and only see 200-something items...
[18:42] Have they been completed?
[18:45] 68 done + 114 out + 60 to do
[18:45] But the excel file on Google drive shows 495...
[18:52] *** JanSiemer has quit IRC (Quit: Page closed)
[18:56] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:00] *** dashcloud has joined #archiveteam
[19:15] *** Wuked has joined #archiveteam
[19:27] *** MMovie has quit IRC (Read error: Operation timed out)
[19:29] 24969216 0% 128.57kB/s 6:01:58
[19:29] *** MMovie has joined #archiveteam
[19:29] Cloudflare are so very fast.. zzz
[19:33] Wait, you got around CloudFlare?
[19:34] No unfortunately
[19:34] I think joepie tried to make a patch to get around it, though I have no idea what became of it
[19:36] *** alfie has quit IRC (Quit: Seeeya! - ZNC 1.6.3+deb1+jessie0)
[19:46] *** MMovie1 has joined #archiveteam
[19:47] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[19:48] *** MMovie has quit IRC (Read error: Operation timed out)
[19:54] *** BartoCH has joined #archiveteam
[19:55] DoomTay: doesn't get around cloudflare, just solves the I'm-under-attack challenge
[19:55] still need to package it and port to Python
[20:18] *** Tomcat_ has quit IRC (Remote host closed the connection)
[20:37] *** brayden has joined #archiveteam
[20:37] *** ris has quit IRC (Read error: Connection reset by peer)
[20:37] *** swebb sets mode: +o brayden
[20:37] *** ris has joined #archiveteam
[20:52] *** schbirid has quit IRC (Quit: Leaving)
[21:09] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[21:19] *** Aranje has joined #archiveteam
[21:21] Atluxity: are your items running at the moment?
[21:21] of coursera
[21:22] *** Wuked has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[21:27] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:29] arkiver i'll run one item of coursera
[21:29] how is your upload speed to FOS?
[21:29] I usually have 2 megabyte/s to FOS
[21:29] Pretty much all of mine are 429'd atm arkiver :(
[21:30] *** dashcloud has joined #archiveteam
[21:31] tracker limit is on
[21:32] Yeah, Account limit iirc
[21:32] Server returned 429 (RETRFINISHED). Sleeping.
[21:32] yeah getting it from here too
[21:33] 993427456 35% 131.35kB/s 3:51:26
[21:33] Is what one thread is currently getting :(
[21:33] *** Igloo_ is now known as Igloo
[21:35] Igloo: that upload speed to FOS is very low
[21:35] Can you make these your last grabs for coursera if the speed doesn't go up?
[21:36] I might take the remaining items and spread them among some fast FOS uploaders
[21:36] I have a 100mb upload from that server
[21:36] It should not be slow...
[21:38] http://pastebin.com/nufzhpR7 @arkiver
[21:39] There is literally nothing else running apart from 4 threads for the grab
[21:39] *** MMovie1 has quit IRC (Read error: Operation timed out)
[21:43] *** MMovie has joined #archiveteam
[21:45] arkiver, how is my upload looking?
[21:45] no idea
[21:49] arkiver: You needed a target? Which project?
[21:49] NFI why it's slow uploading arkiver. I've just tried uploading to a bunch of other places and they're all running fast. I'm not worried about space; there is 2TB on that box which isn't used. Are others uploading maxing out FOS?
[21:52] Deleting 20T of uploaded warcs now, so let me know if you still needed a target arkiver.
[21:52] zino: that would be great
[21:53] FOS might be able to handle it, but it'd be good to have yours anyway, just in case
[21:54] Just give me a target name and I'll set it up.
[21:54] *** tomwsmf-a has joined #archiveteam
[21:54] something like 'coursera'
[21:54] OK. Moment.
[21:59] arkiver: usual place, eldrimner.lysator.liu.se module coursera.
[22:01] xmc: the reason I asked about copyright/licensing is because coursera may try and get the content removed from IA if we upload them there
[22:01] it happens
[22:01] archiveteam has been around for most of a decade
[22:01] we've had this conversation a million times and it is banned
[22:02] ah, wasn't aware, won't bring it up again
[22:23] *** qwebirc32 has joined #archiveteam
[22:24] Will there be official support for the warrior under Docker?
[22:28] Congratulations on your talk at HOPE SketchCow!
[22:37] *** qwebirc32 has quit IRC (Quit: Page closed)
[22:46] *** j08nY has quit IRC (Quit: Leaving)
[23:26] *** tfgbd_znc has joined #archiveteam
[23:33] Scripts for arto are updated.
[23:34] Items for arto are requeued
[23:34] Oh, that's the site where the hosters kept it up just for us?
[23:35] yeah
[23:35] According to the wiki, there's only 3 days left
[23:35] it's going down the 30th
[23:35] yeah
[23:35] Ow. And I thought you already had your hands full with Coursera