#archiveteam 2016-06-27,Mon

Time Nickname Message
00:11 🔗 BlueMaxim has joined #archiveteam
00:11 🔗 arkiver Coursera scripts are online.
00:28 🔗 RichardG has quit IRC (Ping timeout: 250 seconds)
00:34 🔗 db48x has joined #archiveteam
00:49 🔗 ris has quit IRC ()
01:07 🔗 arkiver Coursera scripts are updated
01:07 🔗 arkiver items loaded
01:07 🔗 arkiver 217 courses
01:09 🔗 arkiver project started.
01:33 🔗 jrwr nice arkiver
02:07 🔗 nickname_ has joined #archiveteam
03:27 🔗 Igloo_ has joined #archiveteam
03:27 🔗 nickname_ has quit IRC (Read error: Connection reset by peer)
03:29 🔗 Igloo has quit IRC (Read error: Operation timed out)
04:00 🔗 DoomTay has joined #archiveteam
04:47 🔗 jrwr has quit IRC (Read error: Operation timed out)
04:56 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
05:04 🔗 tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
05:05 🔗 Sk1d has joined #archiveteam
05:23 🔗 metalcamp has joined #archiveteam
05:40 🔗 JesseW has joined #archiveteam
06:09 🔗 JesseW citeseerxpdf tracker returned 503 for me (using the Warrior). If this is unexpected, there may be something to fix.
06:11 🔗 DoomTay has quit IRC (Quit: Page closed)
06:11 🔗 JesseW googlecode: 11/5 = 0 http://jaraya-.googlecode.com/. Server returned 0. Sleeping.
06:13 🔗 VADemon has quit IRC (Quit: left4dead)
06:14 🔗 JesseW Failed WgetDownload for Item oldcourse:clinicalneurology FileNotFoundError: [Errno 2] No such file or directory: 'v3'
06:20 🔗 aschmitz And if someone (arkiver?) could requeue oldcourse:algorithms, it apparently managed to write to a bad block on my drive and is failing to upload (or read locally).
06:24 🔗 aschmitz And apparently the failure mode here makes me kill off the entire pipeline, as this'll be an infinite loop of retries, so I guess requeue oldcourse:{researchforhealth, dmathgen, inforiskman} too.
06:26 🔗 JesseW has quit IRC (Read error: Operation timed out)
06:33 🔗 fie__ has quit IRC (Quit: Leaving)
07:26 🔗 ris has joined #archiveteam
07:30 🔗 ris has quit IRC (Client Quit)
07:55 🔗 schbirid has joined #archiveteam
08:28 🔗 WinterFox has joined #archiveteam
09:47 🔗 Atluxity I am trying to set up the coursera scripts on a ubuntu 14.04 server
09:47 🔗 Atluxity wget does not compile
09:47 🔗 Atluxity someone got url to wget-lua I can use?
09:51 🔗 Atluxity nvm, found it on another server
09:54 🔗 arkiver Meroje: do you currently have any items running?
09:54 🔗 arkiver Due to the size of the items and the time we have we need to keep each other informed of failing items, so I can requeue fast
09:55 🔗 arkiver aschmitz: all requeued
09:55 🔗 arkiver Only aschmitz has returned items currently
09:55 🔗 arkiver Meroje: how is your uploadspeed to FOS?
09:56 🔗 Atluxity arkiver: I got juice again, where should I aim my efforts?
09:57 🔗 arkiver Atluxity: do you have enough space and good speed to FOS?
09:57 🔗 arkiver there's 3 items available in coursera, you can take those
09:58 🔗 arkiver let me know when you're running coursera, I'll make the 3 available then
09:58 🔗 Atluxity I got 15G, on about 20 servers, but speed should be good
09:58 🔗 Atluxity ok, 2 sec
09:58 🔗 arkiver 15G in total?
09:58 🔗 arkiver over the 20 servers
09:58 🔗 Atluxity no
09:58 🔗 Atluxity on each
09:58 🔗 arkiver ok
09:59 🔗 arkiver might be good to take one job per server
10:02 🔗 Atluxity I am running
10:03 🔗 arkiver items released.
10:03 🔗 Medowar i can run some. Speed to fos is decent(~600-800Mbit/s, ~1TB free)
10:03 🔗 arkiver that's good
10:04 🔗 Medowar can I use a warrior or does it have to be script?
10:04 🔗 arkiver you can use the warrior
10:04 🔗 arkiver currently no items are available though
10:05 🔗 arkiver I'm not sure about Meroje, that person has 152 items
10:05 🔗 arkiver and not returned anything yet
10:05 🔗 Atluxity arkiver: http://pastebin.com/qtKUBwzX
10:05 🔗 Atluxity not a success
10:05 🔗 arkiver is that on all three?
10:05 🔗 Atluxity yes
10:06 🔗 arkiver warrior?
10:07 🔗 Atluxity no, pipeline-script
10:07 🔗 Medowar I have a warrior up now waiting for work. concurrent 2
10:08 🔗 arkiver added 3 items to Medowar
10:08 🔗 arkiver let me know if you get the same error
10:09 🔗 Medowar instant fail
10:09 🔗 arkiver same v3 problem?
10:09 🔗 Medowar http://pastebin.com/55nbJC9B
10:09 🔗 Medowar yes
10:10 🔗 arkiver maybe you have a different version of wget. for me it creates a v3 file. Let me remove that v3 deletion line and try again.
10:14 🔗 arkiver so if you get the v3 error logging in didn't go right
10:14 🔗 arkiver currently wget is used to login.
10:16 🔗 Atluxity GNU Wget 1.14.lua.20130523-9a5c built on linux-gnu.
10:17 🔗 Atluxity Medowar: you did 2 items
10:17 🔗 arkiver nah, they're bad
10:17 🔗 Atluxity ok
10:18 🔗 Medowar i gave arkiver the instance.
10:19 🔗 Medowar afk now
10:23 🔗 BartoCH has quit IRC (Quit: WeeChat 1.5)
10:24 🔗 arkiver Newest version is working!
10:25 🔗 BartoCH has joined #archiveteam
10:26 🔗 Atluxity git?
10:26 🔗 Atluxity or wget
10:26 🔗 arkiver on git
10:28 🔗 arkiver Let me know when you are set up, I'll give you one item so we can test speed to FOS
10:28 🔗 arkiver Medowar has some amazing speed to FOS
10:33 🔗 arkiver SketchCow: is the upload of coursera to FOS started?
10:34 🔗 arkiver err, to IA
10:34 🔗 SketchCow Nope
10:35 🔗 Atluxity arkiver: ok, I am running the latest git
10:35 🔗 SketchCow Which explains the puffy filesystem
10:36 🔗 arkiver Atluxity: I mean the newest version is https://github.com/ArchiveTeam/coursera-grab version 20160627.07
10:36 🔗 arkiver you need that
10:36 🔗 SketchCow root@teamarchive0:/1/ARCHIVETEAM# du -sh COURSERA
10:36 🔗 SketchCow 25G COURSERA
10:36 🔗 SketchCow Barely enough for item 1
10:37 🔗 Atluxity arkiver: I am up to date with master
10:37 🔗 arkiver Yes, but it might be good to get an item up to IA as soon as possible, so we can see how it runs on there
10:37 🔗 arkiver Atluxity: ok
10:38 🔗 arkiver Atluxity: I gave you one item
10:38 🔗 arkiver please let me know how it goes
10:39 🔗 arkiver especially speed to FOS and cloudfront for downloading the videos
10:39 🔗 HCross arkiver, want me to throw 1 concurrent on there?
10:39 🔗 arkiver yes
10:39 🔗 arkiver if you have a good speed to FOS
10:40 🔗 HCross how much space?
10:40 🔗 Atluxity arkiver: same error :\
10:40 🔗 arkiver I'd say keep 20 GB for each item
10:40 🔗 arkiver They probably won't be bigger than 10 GB, but we never know
10:40 🔗 arkiver Atluxity: wget-lua problem?
10:41 🔗 arkiver v3 again right? You might have a problem with wget-lua
10:41 🔗 arkiver is it compiled correctly?
10:41 🔗 Atluxity same error, yes
10:42 🔗 Atluxity I have no way of checking if it is compiled correctly...
10:42 🔗 Atluxity It is wget, compiled with lua
10:42 🔗 arkiver I'll be afk for a bit, I'll check when I am back
10:42 🔗 Atluxity talk then
10:44 🔗 arkiver yes
10:49 🔗 HCross Atluxity, was the download for http://warriorhq.archiveteam.org/downloads/wget-lua/wget-1.14.lua.LATEST.tar.bz2 slow at all for you?
10:49 🔗 Atluxity I did not notice
10:49 🔗 HCross seems to be taking an age for me
10:51 🔗 HCross OSError: [Errno 13] Permission denied: '/home/archiveteam/coursera-grab/data'
11:36 🔗 j08nY has joined #archiveteam
11:41 🔗 SketchCow ARCHIVE TEAM: Grab https://web.archive.org/web/20160520211303/http://www.voteleavetakecontrol.org/ and put it somewhere not Internet Archive
11:41 🔗 metal_cam has joined #archiveteam
11:42 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
11:47 🔗 metal_cam has quit IRC (Read error: Operation timed out)
11:48 🔗 metalcamp has joined #archiveteam
11:49 🔗 luckcolor SketchCow pls give more details
11:51 🔗 SketchCow I've grown to like the Zardoz movie?
11:52 🔗 HCross awww. vote leave trying to erase their lies
11:53 🔗 luckcolor SketchCow i don't think i get that reference if it's a reference :P
11:53 🔗 luckcolor i'll fire a grab-site now
11:54 🔗 HCross luckcolor, tried it, and it was downloading random IA collections
11:54 🔗 luckcolor sigh
11:54 🔗 luckcolor we totally need better tools for getting data out of archive.org
11:54 🔗 luckcolor rather than in
11:54 🔗 luckcolor :)
11:55 🔗 arkiver SketchCow: any idea on when it's going down?
11:56 🔗 Medowar Id say legal obligations, and I dont think SketchCow will say more, since he may not be allowed to.
11:56 🔗 Medowar arkiver: 119=0 https://class.coursera.org/criminallaw-001/wiki/discusion.
11:56 🔗 Medowar Server returned 0 (AUTHFAILED). Sleeping.
11:56 🔗 arkiver yeah
11:56 🔗 arkiver saw it
11:56 🔗 HCross arkiver, did you see my permissions error?
11:57 🔗 arkiver I have no idea on that error unfortunately
11:57 🔗 arkiver Maybe remove the dir and try again
11:57 🔗 arkiver de /data dir
11:57 🔗 arkiver the*
12:05 🔗 kristian_ has joined #archiveteam
12:06 🔗 Medowar fos limited to 100mbit/connection? cant get more
12:12 🔗 arkiver zino: can we use your target as well, next to FOS?
12:20 🔗 arkiver fuck
12:20 🔗 arkiver the account is banned
12:20 🔗 arkiver {"message":"You are temporarily blocked from loading this URL. Please try again later."}
12:28 🔗 arkiver oh it really was temporary
12:28 🔗 arkiver nice, let's hope it holds this time
12:29 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:32 🔗 arkiver HCross: did removing the dir fix the error for you?
12:33 🔗 HCross not sure, will be trying now
12:34 🔗 HCross nope - seems to not like creating the directory
12:34 🔗 HCross got an idea. hang on
12:34 🔗 arkiver ok
12:37 🔗 HCross arkiver, fixed it
12:38 🔗 arkiver awesome :)
12:38 🔗 HCross you should see it working away
12:38 🔗 arkiver I hope you have some good speeds to FOS
12:38 🔗 HCross its in the EU, but its OVH so it shouldnt be that slow
12:49 🔗 arkiver 17 cuban sites now in newsgrabber
12:50 🔗 HCross as ive banged on about before, if anyone else wants to run a grabber - do let either me or arkiver know
12:51 🔗 arkiver ^ would be greatly appreciated
13:15 🔗 Frogging how good does the FOS speed need to be?
13:16 🔗 Frogging I can probably help out if you need it
13:16 🔗 arkiver Well, some items are 10 GB
13:17 🔗 Frogging I run archivebot and it seems to max out my 100M when it uploads. I think it'll be all right
13:18 🔗 Frogging unless something has changed since I last looked at it
13:26 🔗 HCross Would a couple of !ao only pipelines be useful for archivebot?
13:28 🔗 WinterFox has quit IRC (Remote host closed the connection)
13:29 🔗 JW_work1 has joined #archiveteam
13:33 🔗 JW_work has quit IRC (Ping timeout: 370 seconds)
14:03 🔗 RichardG has joined #archiveteam
14:21 🔗 kristian_ has quit IRC (Leaving)
14:35 🔗 VADemon has joined #archiveteam
14:39 🔗 dan- has quit IRC (Read error: Operation timed out)
14:44 🔗 VADemon_ has joined #archiveteam
14:47 🔗 VADemon_ has quit IRC (Read error: Connection reset by peer)
14:52 🔗 JesseW has joined #archiveteam
14:52 🔗 VADemon has quit IRC (Read error: Operation timed out)
15:00 🔗 dan- has joined #archiveteam
15:24 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
15:58 🔗 MMovie1 has joined #archiveteam
15:59 🔗 MMovie has quit IRC (Read error: Operation timed out)
15:59 🔗 VADemon has joined #archiveteam
16:02 🔗 MMovie has joined #archiveteam
16:05 🔗 dan- has quit IRC (Ping timeout: 260 seconds)
16:08 🔗 MMovie1 has quit IRC (Read error: Operation timed out)
16:22 🔗 ndizzle has joined #archiveteam
16:25 🔗 DoomTay has joined #archiveteam
16:28 🔗 xXx_ndidd has quit IRC (Read error: Operation timed out)
16:51 🔗 dan- has joined #archiveteam
17:12 🔗 dashcloud has quit IRC (Read error: Operation timed out)
17:13 🔗 dashcloud has joined #archiveteam
17:13 🔗 alfie has quit IRC (Quit: Seeeya! - ZNC 1.6.3+deb1+jessie0)
17:18 🔗 alfie has joined #archiveteam
17:30 🔗 qwebirc87 has joined #archiveteam
17:30 🔗 qwebirc87 A lot of 429, Retrfinished... Why?
17:30 🔗 qwebirc87 Project: Coursera
17:31 🔗 arkiver we need more accounts
17:38 🔗 qwebirc87 Ok... Can't we create more... using mailinator email or so?
17:38 🔗 qwebirc87 Btw, why is Rsync threads missing from my advanced settings page?
17:38 🔗 qwebirc87 1 warrior has it, the others not...
17:50 🔗 qwebirc87 has quit IRC (Quit: Page closed)
18:04 🔗 r3c0d3x what is the copyright/licensing situation with coursera?
18:08 🔗 xmc this is not a question archiveteam asks
18:15 🔗 Igloo_ Hey chaps, My warriors running ArchiveTeams Choice are currently not running anything
18:15 🔗 Igloo_ What's priority at the moment?
18:21 🔗 DoomTay I think they're focusing on Coursera, since it's to go kaput in just 3 days
18:22 🔗 Igloo_ I've manually started one of them
18:23 🔗 arkiver make sure you have enough space
18:23 🔗 arkiver say 20 GB
18:23 🔗 arkiver and good speed to FOS
18:23 🔗 Igloo_ 200GB available to each warrior
18:23 🔗 Igloo_ FOS?
18:23 🔗 arkiver The standard target
18:24 🔗 Igloo_ It's running on a 100/100 dedicated
18:24 🔗 Igloo_ So i'd imagine whatever/wherever it is it should be sufficient
18:25 🔗 Tomcat_ has joined #archiveteam
18:26 🔗 brayden has quit IRC (Read error: Operation timed out)
18:31 🔗 ris has joined #archiveteam
18:32 🔗 JanSiemer has joined #archiveteam
18:32 🔗 JanSiemer Does the Coursera tracker contain all courses that will be lost?
18:33 🔗 JanSiemer Read it will be 400+ and only see 200-something items...
18:42 🔗 Igloo_ Have they been completed?
18:45 🔗 JanSiemer 68done + 114out + 60to do
18:45 🔗 JanSiemer But the excel file on Google drive shows 495...
18:52 🔗 JanSiemer has quit IRC (Quit: Page closed)
18:56 🔗 dashcloud has quit IRC (Read error: Operation timed out)
19:00 🔗 dashcloud has joined #archiveteam
19:15 🔗 Wuked has joined #archiveteam
19:27 🔗 MMovie has quit IRC (Read error: Operation timed out)
19:29 🔗 Igloo_ 24969216 0% 128.57kB/s 6:01:58
19:29 🔗 MMovie has joined #archiveteam
19:29 🔗 Igloo_ Cloudflare are so very fast.. zzz
19:33 🔗 DoomTay Wait, you got around CloudFlare?
19:34 🔗 Igloo_ No unfortunately
19:34 🔗 DoomTay I think joepie tried to make a patch to get around it, though I have no idea what became of it
19:36 🔗 alfie has quit IRC (Quit: Seeeya! - ZNC 1.6.3+deb1+jessie0)
19:46 🔗 MMovie1 has joined #archiveteam
19:47 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
19:48 🔗 MMovie has quit IRC (Read error: Operation timed out)
19:54 🔗 BartoCH has joined #archiveteam
19:55 🔗 joepie91 DoomTay: doesn't get around cloudflare, just solves the I'm-under-attack challenge
19:55 🔗 joepie91 still need to package it and port to Python
20:18 🔗 Tomcat_ has quit IRC (Remote host closed the connection)
20:37 🔗 brayden has joined #archiveteam
20:37 🔗 ris has quit IRC (Read error: Connection reset by peer)
20:37 🔗 swebb sets mode: +o brayden
20:37 🔗 ris has joined #archiveteam
20:52 🔗 schbirid has quit IRC (Quit: Leaving)
21:09 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
21:19 🔗 Aranje has joined #archiveteam
21:21 🔗 arkiver Atluxity: are your items running at the moment?
21:21 🔗 arkiver of coursera
21:22 🔗 Wuked has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
21:27 🔗 dashcloud has quit IRC (Read error: Operation timed out)
21:29 🔗 luckcolor arkiver i'll run one item of coursera
21:29 🔗 arkiver how is your upload speed to FOS?
21:29 🔗 luckcolor i usually have 2 megabyte/s to FOS
21:29 🔗 Igloo_ Pretty much all of mine are 429'd atm arkiver :(
21:30 🔗 dashcloud has joined #archiveteam
21:31 🔗 luckcolor tracker limit is on
21:32 🔗 Igloo_ Yeah, Account limit iirc
21:32 🔗 luckcolor Server returned 429 (RETRFINISHED). Sleeping.
21:32 🔗 luckcolor yeah getting it from here too
21:33 🔗 Igloo_ 993427456 35% 131.35kB/s 3:51:26
21:33 🔗 Igloo_ Is what one thread is currently getting :(
21:33 🔗 Igloo_ is now known as Igloo
21:35 🔗 arkiver Igloo: that upload speed to FOS is very low
21:35 🔗 arkiver Can you make these your last grabs for coursera if the speed doesn't go up?
21:36 🔗 arkiver I might take the remaining items and spread them among some fast FOS uploaders
21:36 🔗 Igloo I have a 100mb upload from that server
21:36 🔗 Igloo It should not be slow...
21:38 🔗 Igloo http://pastebin.com/nufzhpR7 @arkiver
21:39 🔗 Igloo There is literally nothing else running apart from 4 threads for the grab
21:39 🔗 MMovie1 has quit IRC (Read error: Operation timed out)
21:43 🔗 MMovie has joined #archiveteam
21:45 🔗 HCross arkiver, how is my upload looking?
21:45 🔗 arkiver no idea
21:49 🔗 zino arkiver: You needed a target? Which project?
21:49 🔗 Igloo NFI why it's slow uploading arkiver. I've just tried uploading to a bunch of other places and they're all running fast. I'm not worried about space there is 2TB on that box which isn't used. Are others uploading maxing out FOS?
21:52 🔗 zino Deleting 20T of uploaded warcs now, so let me know if you still needed a target arkiver.
21:52 🔗 arkiver zino: that would be great
21:53 🔗 arkiver FOS might be able to handle it, but it'd be good to have yours anyway, just in case
21:54 🔗 zino Just give me a target name and I'll set it up.
21:54 🔗 tomwsmf-a has joined #archiveteam
21:54 🔗 arkiver something like 'coursera'
21:54 🔗 zino OK. Moment.
21:59 🔗 zino arkiver: usual place, eldrimner.lysator.liu.se module coursera.
22:01 🔗 r3c0d3x xmc: the reason I asked about copyright/licensing is because coursera may try and get the content removed from IA if we upload them there
22:01 🔗 xmc it happens
22:01 🔗 xmc archiveteam has been around for most of a decade
22:01 🔗 xmc we've had this conversation a million times and it is banned
22:02 🔗 r3c0d3x ah, wasn't aware, won't bring it up again
22:23 🔗 qwebirc32 has joined #archiveteam
22:24 🔗 qwebirc32 Will there be official support for the warrior under Docker?
22:28 🔗 dashcloud Congratulations on your talk at HOPE SketchCow!
22:37 🔗 qwebirc32 has quit IRC (Quit: Page closed)
22:46 🔗 j08nY has quit IRC (Quit: Leaving)
23:26 🔗 tfgbd_znc has joined #archiveteam
23:33 🔗 arkiver Scripts for arto are updated.
23:34 🔗 arkiver Items for arto are requeued
23:34 🔗 DoomTay Oh, that's the site where the hosters kept it up just for us?
23:35 🔗 arkiver yeah
23:35 🔗 DoomTay According to the wiki, there's only 3 days left
23:35 🔗 arkiver it's going down the 30th
23:35 🔗 arkiver yeah
23:35 🔗 DoomTay Ow. And I thought you already had your hands full with Coursera
