[00:09] WE'LL FIND OUT
[00:24] SketchCow: I'm grabbing all of the official xbox magazine podcasts
[00:24] there are like 311 podcasts
[00:25] i'm uploading the rest of the no bs podcast now too
[02:49] so, I've got all the laptop service manuals from dell's ftp - does someone have a place I can upload them to?
[11:37] alard: is btinternet a warrior project yet?
[11:38] Yes, it's more or less ready (barring any new insights) but it's not actually on the warrior.
[11:39] https://github.com/ArchiveTeam/btinternet-grab
[11:39] Is ready to go.
[11:39] (Almost.)
[11:40] Why?
[11:42] well, when it's done, my warrior has something important to do :P
[11:43] We should keep looking for more usernames, though.
[11:43] I added the sites from DMOZ, from the wayback machine, and am waiting for the btinternet links on tvtropes.org.
[11:45] alright
[11:49] I'm now downloading the wikipedia dump as well.
[11:50] wikipedia dump? as in, find btinternet links on wikipedia?
[11:50] speaking of which.. I'll have a look in the stackexchange dump
[11:50] I have it here locally
[11:53] joepie91: Yes, bunzip2 | grep ...
[11:54] It seems that there are a few links on Wikipedia: https://encrypted.google.com/search?hl=en&q=site%3Awikipedia.org%20btinternet.co.uk
[11:55] oh goddamnit, I removed the stackexchange data dump a few days ago
[11:55] redownload time
[11:59] alard: I think the "All Projects" tab in the warrior should be "Choose Project"?
[12:00] SmileyG: Perhaps. But "Choose" is a verb. "Settings" is not. Is "Available projects" a solution?
[12:01] Yeah, that works
[12:01] Currently I'd think "All Projects" would select all projects...... make sense?
[12:03] Yes, I think I understand your point. (Although you could also say that it's a tab, not a button, so it shows you "all projects", like it does.)
[12:03] Hehe
[12:03] UI design is hard :P
[12:04] Well, I have a habit of reading things differently to others, but I was good at it at uni. :S
[12:04] It's fun.
[12:15] http://tracker.archiveteam.org/btinternet/
[12:16] (Don't go too fast.)
[12:16] thanks for reminding me to see how webshots was doing :P
[12:16] underscor with 2364GB.
[12:16] I'm going to kill him one day
[12:19] why does it say only 8 items done so far? :P
[12:19] oh I see...
[12:19] nvm :P
[12:21] balrog_: You could be number 1 with 9!
[12:22] alard: do I have to use the warrior?
[12:22] :|
[12:23] What's wrong with the warrior? It's a small project.
[12:23] it takes up more ram and cpu on my side :/
[12:23] It's pretty minimal how much it takes up.
[12:24] BlueMaxim: not exactly
[12:24] it uses up to 20% of my 4GB of RAM
[12:24] how long til bt dies?
[12:24] usually around 13
[12:24] joepie91, seriously? I thought it only needed 256MB of RAM
[12:24] BlueMax: that's the VM itself - apparently virtualbox adds a bunch of overhead on top of that
[12:24] also
[12:24] or something
[12:24] it's quite heavy on CPU
[12:24] on my shitty notebook i3
[12:25] 2 x 1.3GHz
[12:25] hm, these are syncing like 6kb
[12:25] I guess if there is only one page
[12:26] Oh, 404 error, even smaller
[12:26] guess I didn't notice
[12:27] My computer must be better at this than I thought :P
[12:28] can the tracker be more verbose than 0MB?
[12:30] I ran into virtualbox using 7 gigabytes of ram before it got OOM-killed
[12:30] While running the warrior a few days back
[12:31] lots of 0MBs
[12:31] lol
[12:31] memory leak to the max :P
[12:32] alard: when should it start new processes? :S I've got it set to 6 but it still only shows 4
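A concrete form of the bunzip2 | grep approach mentioned above might look like the following (a sketch only: the dump filename and the exact pattern are assumptions, not what was actually run):

    bunzip2 -c enwiki-latest-pages-articles.xml.bz2 \
        | grep -oP '([a-zA-Z0-9.-]+\.)?btinternet\.(com|co\.uk)(/~[^/"%?& ]+)?' \
        | sort -u > btinternet-candidates.txt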
[12:33] SmileyG: When an item finishes, the warrior checks the number to see how many new items there should be.
[12:33] hmmm k
[12:33] one's just finished, let's see if it works this time
[12:33] Also, I've changed to BT but the banner still shows webshots (I presume because some of the jobs are still webshots).
[12:34] have there ever been archiving/warrior projects where the warriors were throttled/rate-limited/blocked?
[12:34] SmileyG: it will first finish the webshots jobs
[12:34] then move on to BT
[12:34] I rate limit mine joepie91 :P
[12:34] oooo, 39MB
[12:34] The warrior can't run multiple projects at the same time, so yes, it waits for webshots to complete.
[12:35] ok, makes sense :D
[12:35] (Also: why not keep it on webshots? I expect btinternet won't take long.)
[12:35] it'd be cool if it could multitask
[12:35] one process on one project, four on another
[12:35] I have a webshots running at work on 5Mbit, this is amazingly slow compared to that ;)
[12:42] alard: http://www.quickonlinetips.com/archives/2012/09/google-feedburner-shutting-down/
[12:43] not sure if there's any useful data on feedburner
[12:43] but it sure looks like signs of imminent death
[12:43] also http://searchenginewatch.com/article/2213759/Google-Shutting-Down-AdSense-for-Feeds-Classic-Plus-More-Services?utm_source=twitterfeed&utm_medium=twitter
[12:43] Isn't that just a proxy/cache/stats service?
[12:43] Yeah, it is a stats tracking service for RSS feeds
[12:44] So thousands of RSS feeds will break
[12:44] but they don't really host much data
[12:44] this may also be a problem for THQ-related sites: http://www.gamearena.com.au/news/read.php/5116588
[12:44] THQ Asia Pacific shutting down
[12:44] i got to grab my t3 magazine podcast then
[12:45] are there any THQ Asia Pacific-run sites that have user content?
[12:45] looking now
[12:46] Cameron_D: links to a lot of podcasts and stuff could be lost
[12:47] http://feeds.feedburner.com/T3/podcast
[12:48] feedburner just acts as a proxy though (to collect stats)
[12:48] Somewhere on the t3 site is the actual feed
[12:48] At least that is how I remember it working
[12:49] but that feed, i think, doesn't go back that far
[12:50] Cameron_D: also as an aggregator afaik
[12:50] their only feed is from feedburner
[13:07] the warrior image has issues
[13:08] first off, vmware complains that it doesn't meet ova specs
[13:08] second, I get an error that there's an ide slave with no master
[13:08] balrog_: Which image?
[13:09] 20121008?
[13:09] archiveteam-warrior-v2-20121008
[13:09] yes
[13:09] http://dmorton.staff.hostgator.com/archiveteam-warrior-vmware.ova is vmware-compatible (albeit an older version)
[13:09] why did this one break?
[13:10] I don't know about the ova specs. There previously was a problem with the filename. I had exported the image as archiveteam-warrior-v2.ova, and then renamed it to include the date. This new image is exported with the correct name.
[13:10] And the IDE slave with no master, that seems to be a virtualbox - vmware incompatibility.
[13:10] The import failed because /path/to/archiveteam-warrior-v2-20121008.ova did not pass OVF specification conformance or virtual hardware compliance checks. Click Retry to relax OVF specification and virtual hardware compliance checks and try the import again, or click Cancel to cancel the import. If you retry the import, you might not be able to use the virtual machine in VMware Fusion.
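For what it's worth, an .ova is just a tar archive containing an .ovf descriptor, a manifest of SHA1 digests, and the disk images, so a non-conforming export can in principle be unpacked, corrected and repacked by hand (the filenames below are assumptions based on the image name):

    tar tf archiveteam-warrior-v2-20121008.ova    # should list the .ovf, .mf and .vmdk files
    tar xf archiveteam-warrior-v2-20121008.ova
    # after editing the .ovf descriptor (e.g. the IDE controller layout), rebuild the manifest:
    sha1sum *.ovf *.vmdk | awk '{print "SHA1(" $2 ")= " $1}' > archiveteam-warrior-v2.mf
    # the .ovf must be the first member of the archive:
    tar cf warrior-fixed.ova archiveteam-warrior-v2.ovf archiveteam-warrior-v2.mf *.vmdk

VMware's OVF Tool (mentioned below) can alternatively convert such an image with relaxed conformance checks via its --lax option.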
[13:11] I've added two disks in VirtualBox, but for some reason VMware ends up with two controllers: 1-master for disk 1, 2-slave for disk 2.
[13:11] and then ... There is an IDE slave with no master at ide1:1. This configuration does not work correctly in virtual machines. Move the disk/CD-ROM from ide1:1 to ide1:0 using the configuration editor.
[13:12] I wouldn't be surprised if VBox is malforming the ova
[13:12] VBox is unfortunately full of bugs
[13:13] heh, ESXi still rejects the file too http://i.imgur.com/z3Kox.png
[13:14] hm, they have an OVF tool
[13:16] balrog_
[13:16] are you running vmware workstation?
[13:16] no, fusion
[13:16] which is basically the mac version of workstation
[13:17] when i first imported archiveteam-warrior-v2-20120813 i got the error about it not being valid. then i just imported again and it worked.
[13:17] i got the ide error as well after that too
[13:17] yeah, but I keep getting the ide error
[13:17] you just have to go into the settings and change the second drive to ide0:1
[13:17] from ide1:0
[13:23] hmm
[13:23] what if someone imported the vm into vmware, fixed it, and exported it?
[13:23] I wonder if the ova file would be more up-to-spec
[13:25] you'd probably want to export as a vmdk or whatever the vmware equivalent is. you can always just rar up the vmdk files, and if someone uses them vmware will just ask if they copied it
[13:25] alard: btinternet\.(com|co\.uk)
[13:25] right?
[13:25] ova is better if it's compatible
[13:25] err, compliant
[13:25] apparently vbox doesn't produce compliant files
[13:26] bingo
[13:26] http://www.btinternet.com/~se16/hgb/statjoke.htm
[13:26] se16 :P
[13:27] uploaded: http://archive.org/details/cdrom-linuxformatmagazine-76
[13:27] joepie91: Yes, and then www\.(.+)\.btinternet or /~([^%?/]+)
[13:28] The final webshots rsync finishes in a few min and then bt :D
[13:29] alard: I've also seen a few *without* www in front
[13:29] and just the username
[13:31] alard: 7z e -so *.7z | grep -P "(([^\s(/]+)\.)?btinternet\.(com|co\.uk)(\/~([^/ %?]+))?"
[13:31] :)
[13:31] it will take a few hours for the torrent to finish downloading
[13:31] after that, that will yield all the relevant entries
[13:36] better:
[13:36] 7z e -so *.7z 2> /dev/null | grep -Po "(([^\s(/]+)\.)?btinternet\.(com|co\.uk)(\/~([^/ %?]+))?"
[13:57] how well does the warrior handle a network connection change?
[14:01] how well does the warrior handle a network connection change?
[14:01] also, why no rsync with continue?
[14:05] balrog_: it should back off, then continue once it figures it out
[14:06] you mean with the wget?
[14:06] rsync seems to lack continue though...
[14:08] Doesn't --partial-dir enable --partial?
[14:08] (Just rsync --partial is dangerous in this case, since SketchCow will move any file in the upload directory.)
[14:22] Hey there, if you see my name on an uncompleted webshots job, please release the lock.
[14:25] willwill: No problem. (There will probably be other failed jobs, so I'll requeue them all at once later.)
[14:46] balrog_: rsync, continue?
[14:46] rsync knows what it's sent, and it doesn't require continue
[14:46] resume, rather
[14:47] --partial or -P switch
[14:47] it doesn't need it....
[14:47] partial does partial files
[14:48] rsync checks each file as it goes
[14:48] yeah, well, a single .warc is pretty large
[14:48] and if it gets interrupted, the whole thing has to start over
[14:48] yeah true, then you're screwed :S
[14:52] I've added --partial to btinternet, so the next project will have it too.
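The resume behaviour under discussion: with --partial and --partial-dir together, an interrupted transfer keeps its partial file in a private subdirectory and picks up from there on the next attempt, instead of restarting the whole .warc. A minimal sketch (the module and path are made up):

    rsync -av --partial --partial-dir=.rsync-tmp \
        someuser.warc.gz fos.textfiles.com::webshots/incoming/

Because the partial file lives under .rsync-tmp/ rather than in the upload directory itself, a half-finished .warc can't be swept up by whatever moves completed files on the receiving end.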
[14:52] Isn't that going to cause issues, as you highlighted earlier?
[14:52] No, because --partial-dir keeps the partial files in a separate directory.
[14:53] They're uploaded to the .rsync-tmp/ subdirectory and moved when the upload completes.
[14:54] I thought --partial-dir would be enough, but apparently you need --partial too.
[14:55] oooo
[14:55] heh, that's random devs for you
[14:59] alard: the title in the btinternet pipeline.py is still webshots
[14:59] ;)
[15:02] I see. And apparently the title isn't used anywhere.
[15:03] Wikipedia produced 933 new btinternet names.
[15:04] :D
[15:04] I'm searching math stackexchange now
[15:04] wikipedia? :o
[15:04] alard: stats stackexchange produced "se16" as the only username
[15:06] it's referenced a *lot* on math. as well
[15:06] seems like a pretty important site
[15:06] ha
[15:06] Think twice before using BT as an ISP.
[15:06] on the homepage of that site
[15:06] BT used to provide its internet subscribers with a small amount of personal webspace, but did not promote the service, so only the oldest, most loyal customers used it. Now it no longer wishes to satisfy these customers and is closing the service down. So this page and others of mine, which have received over 2 million hits in 13 years, have to move.
[15:06] If your browser does not automatically go to http://www.se16.info/index.htm within a few seconds, you may want to go to the destination manually.
[15:06] My conclusion is that if you ever consider BT as a possible ISP for some reason, you should not expect that reason to last.
[15:07] yah
[15:09] joepie91: We already had it. :) Processed items: 1, added to main queue: 0
[15:12] alright :P
[15:12] brb
[15:14] alard: Quick question about the warrior: if there are multiple warcs waiting to upload, how does it decide which one goes next?
[15:15] LIFO, I think, but if you really want to know you should check here: https://github.com/ArchiveTeam/seesaw-kit/blob/master/seesaw/task.py#L72-107
[15:17] I... have no idea what I'm looking at.
[15:18] But since it looks like array manipulation, I'm guessing my request to do smallest-file-first is a no-go.
[15:19] That would be hard, I think. Then the queueing thing would have to know about file sizes.
[15:19] And does it really matter?
[15:19] Kinda-maybe. It'd free up more threads to download quicker.
[15:20] As it is, there are times when all my worker threads are waiting for one upload to finish so they can go.
[15:20] Of course then you'd have a problem with large files never uploading, but you could conceivably have that with LIFO as well, and I haven't seen it happen yet.
[15:22] Maybe the upload limit should just go.
[15:23] Some people wanted it in the previous warrior.
[15:23] I limit the VM, shrug.
[15:23] Upload limit, as in throughput, or as in waiting turns?
[15:24] Waiting turns. I think the thinking then was that one rsync uploads faster, so it can start downloading sooner.
[15:24] The opposite of what you say now, basically. :)
[15:24] I can kinda see that, since the overhead of switching wouldn't help overall.
[15:24] wasn't it because the upload location was really slow at one point?
[15:24] and no one could finish anything :D
[15:24] ended up eating all the space on the warriors.
[15:25] Is there someplace I can set it to let 2 upload at once, to see if there are any wins to be had that way?
[15:26] yup
[15:26] you running the vm?
[15:26] I have up to 6 uploads at once.
[15:26] Yes.
[15:26] ok, on the vm window
[15:26] alt+F3
[15:26] OK, log in to the VM. Got that.
[15:26] nano -w /home/warrior/projects/webshots/pipeline.py
[15:27] ctrl+w
[15:27] (Well, I will have that about 6:00 tonight. Can't access the VM from work :) )
[15:27] Ah, ok
[15:27] I need to do a page on this on the wiki
[15:27] But keep going. I'll check the scrollback tonight.
[15:29] alard: Dunno what project it was requested for, but webshots may just be a different critter. Large variation in upload sizes. Waiting is probably still good, we just might want to be smarter about the criteria for deciding who's next :)
[15:29] But the current warrior wins on simplicity.
[15:29] Is it worth removing the limit?
[15:29] type LimitConcurrent and hit enter, and change the 1 to 6 (or whatever figure)
[15:29] (At least, I think it does. I can read Python about as well as I can read Japanese. (Not at all.))
[15:30] I'll try mine tonight. It may let smaller files squeak out, but it may also take longer because of drive-spinning at either end.
[15:32] Word of caution: if you change the pipeline.py in your warrior, you may break future updates. (If git can't figure out how to apply the update to your modified version.)
[15:32] heh, i seem to have broken it anyway ¬_¬
[15:32] still getting no output
[15:33] Stop the project, go into your warrior and use git pull to figure out what's wrong?
[15:33] Understood. But define "break". Update won't apply, warrior will conk out, house burns down, what?
[15:33] I think you can expect the SmileyG problem.
[15:34] Ah.
[15:34] webserver runs, nothing else does :D
[15:34] So you'll have to log in, use git pull to figure out what's going wrong.
[15:34] And as we're talking about it, my 261-meg user finishes :)
[15:35] alard, would it work to just delete the project and restart the warrior?
[15:35] alard: I'd vote to keep the limit, but add an option to change it.
[15:35] SmileyG: Is that worth stopping every warrior? (That's what happens if I push an update. Every warrior will finish its current task and restart the project.)
[15:36] primus: That would work.
[15:36] alard: can't you just do the update and let them pull it in time?
[15:36] Yeah, restarting warriors on this project I think is worse.
[15:36] Define "in time"?
[15:36] whenever they restart their vm?
[15:36] No. They check for updates on github.
[15:36] Also, add a "Check for updates" button to the settings page?
[15:36] Heh. Like Windows Update. "Updates to this warrior are now available. Apply? This may require your warrior to restart."
[15:36] lol
[15:37] where do I run the git pull?
[15:37] What we should have, in a future version, is a gradual update.
[15:37] cd /home/warrior/projects/$project/
[15:37] (perhaps su -u warrior first)
[15:38] hmmm, it's moaning about the changes in pipeline
[15:39] * SmileyG changes it back and git pulls
[15:39] It'd probably be an awful bitch, but would the multiple-project idea be useful for that? So /home/warrior/projects/$project.$version instead? Let one run out while the new one sees threads disappear and spins up?
[15:40] alard: ok, I see the new rsync code...
[15:40] need to restart the warrior for the web interface to update?
[15:41] or is it only set via the code (and won't this then cause git to explode again?)
[15:41] :O
[15:41] IT'S GONE CRAZY
[15:41] 15 users and counting on one screen
[15:43] There we go...
[15:43] that is bonkers when it first starts up
[15:43] you just see hundreds of boxes popping up
[15:44] alard: I remember - the script to create the 50GB tars couldn't keep up for FortuneCity, that's why the rsync got limited.
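Putting those steps together, the concurrency change and the later cleanup could look like this inside the VM (the exact LimitConcurrent(1, ...) text is hypothetical; check what pipeline.py actually contains before editing):

    # alt+F3, log in, then:
    cd /home/warrior/projects/webshots
    sed -i 's/LimitConcurrent(1,/LimitConcurrent(6,/' pipeline.py   # or edit it with nano -w

    # if a later update refuses to apply over the local edit:
    git checkout -- pipeline.py   # discard the local change
    git pull                      # the update now applies cleanly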
[15:54] DoubleJ: Yes, that's similar. (I was thinking it might be better to have the cloned git repo in /home/warrior/projects/$project, as the most up-to-date version, then do a clone to /data/projects/$project.$version before starting a project.)
[16:37] Have we killed fos?
[16:38] :O
[16:39] 2Kb/s! \o/
[16:39] Oh, it's coming back now
[16:40] Planned Delivery Date
[16:40] Wednesday 10th October
[16:40] Planned Delivery Time
[16:40] Between 07:30 and 17:30
[16:40] Wed Oct 10 17:40:33 BST 2012
[16:40] HERP?
[17:08] HEY
[17:08] yeah, the uploads are totally dead?
[17:08] primus
[17:08] :(
[17:08] you've overtaken me
[17:08] SmileyG: ?
[17:08] 4587520 39% 12.21kB/s 0:09:45
[17:08] [sender] io timeout after 300 seconds -- exiting
[17:09] sec
[17:09] wtf, mine is dead
[17:09] Retrying RsyncUpload for Item jpr.tree after 30 seconds...
[17:13] .... brokeyd :D
[17:13] alard: did you break something :(
[17:21] my rsyncs are dying..
[17:21] rsync: failed to connect to fos.textfiles.com: Connection timed out (110)
[17:21] Process RsyncUpload returned exit code 10 for Item andrewjjstanley
[17:21] Retrying RsyncUpload for Item andrewjjstanley after 30 seconds...
[17:21] rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.7]
[17:22] yah
[17:22] :<
[17:23] they retry, but still, it's killed all progress :<
[17:23] oh
[17:23] they run now
[17:24] http://isup.me/fos.textfiles.com
[17:26] I think this is a SketchCow problem.
[17:27] :<
[17:27] (The warriors will retry 50 times with 30-second pauses before they fail.)
[17:28] :< herp.
[17:34] alard: it responds to ping
[17:46] alard: se16 0MB << hey, look :D
[18:21] SmileyG: mmm
[18:21] it's probably because he replaced the index page
[18:22] joepie91: yeah, I figured it might be that.
[18:22] well, it makes sense, the script forwards you off-site.
[18:41] fos is currently down-ish
[18:41] fyi
[18:41] ish
[18:41] how can a box be down-ish
[18:42] He's mincing words.
[18:42] it still pings
[18:42] It's down.
[18:42] It's superdown.
[18:42] VMs at archive have 3 states: up, no ssh/services, and no ping
[18:43] anyway, yeah, it's turbofucked
[18:46] how does tpb fetch Google Books' stuff? does it accept suggestions? http://lists.wikimedia.org/pipermail/wikisource-l/2012-October/001204.html
[18:49] wait
[18:49] how is rsync still working if fos is down :O
[19:13] OKAY HI
[19:13] NEED HELP
[19:14] https://docs.google.com/a/textfiles.com/spreadsheet/ccc?key=0ApQeH7pQrcBWdDZIUEVjR3d1UmRoU0lPSWZYX0Q1Ync#gid=0
[19:14] OK, that's a listing of all archiveteam projects on archive.org.
[19:14] 1. Please see if I missed any.
[19:15] (i.e. just browse through the archiveteam set to see)
[19:28] haha, I love the item counts
[19:28] 26, 70, 29, 3956
[19:35] is IA down? not working for me.
[19:39] it's not working for me either
[19:39] k
[19:42] SketchCow: you missed the most famous of all - geocities.
[19:45] heh
[19:45] okay, maybe a recursive grep through my entire repository folder was a bad idea
[19:46] Geocities isn't warc.
[19:46] IA is fucked right now
[19:46] please leave a message after the beep
[19:46] :D
[19:46] * chronomex waits for the beep
[19:46] boop
[19:46] * SmileyG hears helicopters
[19:47] But yeah, it's down. One of the core boxes decided to take a dump all over everything, people are working on fixing it now
[19:47] ok, I'm not in a hurry
[19:47] underscor: wat
[19:47] IA went down?
[19:48] it's down right now
[19:48] we broke it ¬_¬
[19:48] lol
[19:49] oh wow
[19:50] Can't edit the list, but Cinch is missing. City of Heroes (two items, I think: boards and www).
[19:52] Qaudio.
[20:04] god I hate efnet
[20:05] anyway
[20:05] is anyone up for testing a useful script?
[20:05] wrote a script that takes a glob pattern, then tries to figure out (from extension) what kind of archive each file is, and prints the decompressed contents to stdout using the appropriate application, without actually unpacking it
[20:05] consider it a 'cat' for archives :)
[20:14] so like zcat?
[20:15] igelritte: you know you can be in multiple channels at once, right?
[20:15] igelritte: yeah, most of us are in both
[20:16] well, actually, I don't know how to do it with pidgin
[20:16] but I think you can
[20:16] just /j #channel1 and /j #channel2
[20:16] they open up as tabs
[20:16] at least in my pidgin
[20:17] yeah, I didn't think about it
[20:17] whateve's. I'm here now
[20:18] k
[20:19] so, tell me more about your structure and how one can plug in.
[20:21] Is it some starry-eyed-open-source-free-for-all? Or is there a process wherein you tell a gatekeeper what you can do, what you're experienced with, and then they tell you where you can start helping?
[20:22] freeforall.
[20:23] I've seen Mr. Scott's presentation at Defcon on how AT is going to save your shit...which sounds good to me...but that doesn't tell me a lot about how the group is organized.
[20:23] some people write code
[20:23] I appear and make comments
[20:23] most people run some sort of downloaders
[20:23] godane is ..... well I don't know :D
[20:24] There are often projects you can help in by running code written by others, basically volunteering your bandwidth to help out.
[20:24] godane is affiliated but mostly works on solo projects
[20:24] Those are usually advertised on the wiki and IRC, plus I think there's a mailing list for it now too.
[20:24] Unfortunately, I'm not really in a good position at the moment to run downloaders or anything else that requires a 24 hour network connection.
[20:24] If you haven't got bandwidth, then you can help with the wiki and possibly coding...
[20:25] doesn't need 24hr, it'll work when you can
[20:25] upto a point
[20:25] joepie91: that already exists as lsar in The Unarchiver, although it's all built-in and not invoking other apps
[20:25] I'm following this silly dream about living in Germany which means that my current address is--shall we say--fluid.
[20:25] oh wait I'm wrong nm
[20:26] keep forgetting unix cat is not the same as apple II cat :)
[20:26] Are most people in North America?
[20:26] a good number but by no means all
[20:26] i'm UK
[20:27] I got that from the presentation. Something about a kid of 15 in Australia being threatened with legal action for downloading poetry.
[20:27] igelritte: jason is in the gatekeeper role more or less, or cat herder if you prefer
[20:27] in order probably US, UK, AU, .eu
[20:27] Jason seems to do a lot.
[20:29] but there's a lot of empowerment if you see something to just do it yourself
[20:29] Well, I can definitely help with the wiki
[20:30] when you say, 'coding', what do you mean?
[20:30] Programming stuff that downloads stuff
[20:30] I have a fair amount of experience with BASH scripting
[20:31] what are you guys using to download stuff?
[20:31] perfect
[20:31] primarily wget
[20:31] oh, hold on there, soldier, my BASH scripting is far from perfect
[20:31] DFJustin: The Unarchiver sounds like a comic hero :P
[20:31] it's like a real-life superhero
[20:31] but I have written some stuff using wget to batch-download stuff for myself
[20:32] the main difference is we use a parameter to wget to have it produce .warc files, which are a full record of HTTP headers etc., suitable for going into the wayback machine
[20:32] lectures from the OpenCourseWare project at MIT
[20:32] hmmm
[20:33] Yes, so if you download anything for archiving, use the --warc-file option (available in Wget 1.14).
[20:34] hmmm. It appears that the wget that comes with Ubuntu these days is 1.13
[20:34] at least, so says dpkg
[20:35] You'll need to build it yourself then (or grab a newer package). .warc support wasn't added until 1.14.
[20:35] for our big multi-user projects we supply a ready-made VM with everything all set up and just a go button to push
[20:35] okay
[20:35] um, what are warc files and why use them?
[20:36] warc is a standardized format for web archives; it includes all the HTTP response data from the server (not just the file contents) so that you can "play it back" with a proxy and duplicate the original site exactly
[20:36] Y'all are interested in full HTTP headers, or the wayback machine?
[20:36] interesting
[20:37] very interesting
[20:37] the main impetus is that it's a requirement for wayback to integrate the data (proper timestamps are a necessity, for example)
[20:37] Okay, I can see what you're saying
[20:38] everyone grabbed geocities kind of higgledy-piggledy and it's hard to pin down the dates for anything because of filesystems, time zones, modification time vs download time, etc.
[20:39] so the later projects have been standardized on warc
[20:39] The Geocities project was quite an accomplishment
[20:41] warc is big with the pointy-headed academic world because of formal documentation etc., so that gives us an in with that crowd too
[20:41] unfortunately the end-user tools for it are not great yet
[20:43] I loved Jason's picture of the datacenter where the nine terabytes were housed. It reminded me of this scene from 'Connections'--that interesting spin on discovery and invention that came out in the 70's by James Burke--where he holds up an old tape cartridge and expounds: "this device holds one million characters," in that tone of voice like the audience is supposed to piss themselves in amazement. You then do the math and realize that
[20:43] DFJustin: is there a format specification for warc?
[20:43] one that is publicly accessible
[20:44] ISO 28500
[20:45] CHF 122,00
[20:45] eh.
[20:46] DFJustin: anything or any place that *doesn't* want to see the inside of my wallet?
[20:46] :|
[20:46] obviously, you can google it just as well as I can though
[20:46] yes, and I only get drafts
[20:47] do I seriously have to pirate a document to figure out what warc looks like
[20:47] :|
[20:47] I have to say that you folks seem downright Edwardian in your manners. Most of my experiences in chatrooms with tech-savvy folks have not been so pleasant.
[20:48] :D
[20:48] Most people suck.
[20:48] I think the fact everyone is here because they care about it helps, rather than being here because of "work" or other reasons.
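Building a warc-capable wget on an Ubuntu box that ships 1.13 is straightforward; a sketch, assuming the GNU mirror layout and the GnuTLS and zlib dev packages are installed:

    wget https://ftp.gnu.org/gnu/wget/wget-1.14.tar.gz
    tar xzf wget-1.14.tar.gz && cd wget-1.14
    ./configure --with-ssl=gnutls   # zlib headers enable .warc.gz output
    make && sudo make install
    wget --version | head -1        # should now report GNU Wget 1.14

Note that this installs under /usr/local by default, which is why dpkg keeps reporting the distribution's 1.13 afterwards.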
[20:49] my suspicion is that the 0.18 draft is the same as the final, because international standards move slowly, but I'll defer that to someone whose head is pointier :)
[20:49] I was working on Linux From Scratch a few years back; their IRC... well, let's just say that you need a thick skin.
[20:49] I believe the bib-something site has a PDF of a draft of the warc spec.
[20:49] The warc people at archive.org assured me that that's what they use.
[20:49] And none of those people were there for work...
[20:49] http://bibnum.bnf.fr/WARC/warc_ISO_DIS_28500.pdf
[20:50] ah yeah, hmm
[20:50] That's it. Just change the version header WARC/0.18 to WARC/1.0, or something.
[20:50] igelritte: I've been on "both" sides of the argument
[20:51] alard: the draft is representative?
[20:51] * joepie91 really hates 'standards' that you can't just view
[20:52] Yes, I believe so. The Heritrix implementation is based on the same draft, so that's something.
[20:52] Tell me about it, joepie91. I worked in telecom for years. Any idea what they want for a membership to the ITU?
[20:52] http://netpreserve.org/publications/WARC_Guidelines_v1.pdf
[20:52] igelritte: not sure I even want to know the number of digits
[20:53] It's pretty gross
[20:53] alard: that 404s
[20:53] anyhow, I'll use the bibnum one then
[20:54] Does it? I just copied the link I put on the wiki months ago. :)
[20:54] http://archiveteam.org/index.php?title=BT_Internet C-, needs work
[20:54] :D
[20:54] http://www.netpreserve.org/resources/warc-implementation-guidelines-v1
[20:54] http://www.netpreserve.org/sites/default/files/resources/WARC_Guidelines_v1.pdf
[20:55] thankies
[20:55] (It's pretty silly that an "internet preservation consortium" doesn't have stable urls.)
[20:55] one of the nice things about WARC, though, is that it's basically human-readable; you open it up and bam, headers
[20:55] so it's reasonably future-proof
[20:57] lol alard
[21:00] Can't upload images to the wiki?
[21:00] When you watch Jason's presentation at Defcon, you know that other people are involved and that recruits are needed, but the specifics are still a little vague. I guess that I've spent so much time interacting with organizations by being told what to do that the free-for-all comes off as very chaotic. Still not very sure where I can plug in.
[21:00] why didn't I see "upload file"? XD
[21:00] hmm, interesting... http://www.webarchivingbucket.com/
[21:00] igelritte: link to the presentation?
[21:01] sure
[21:02] well, our formal projects now are all "run the warrior VM", where we tell your computer exactly what to do
[21:02] www.btinternet.com/~catechnology
[21:02] www.btinternet.com/~ted.power
[21:02] www.dgsgardening.btinternet.co.uk
[21:02] www.mstracey.btinternet.co.uk
[21:02] cc alard
[21:02] it's just that, on top of that, people have their own archiving side projects that are related to the mission in varying degrees
[21:02] joepie91: http://tracker.archiveteam.org/webshots/rescue-me
[21:03] alard: webshots?
[21:03] shouldn't that be btinternet?
[21:03] Oops, sorry, http://tracker.archiveteam.org/btinternet/rescue-me
[21:03] :P
[21:03] is that expecting urls or usernames
[21:03] usernames
[21:04] 0 items added to the queue
[21:04] Thanks for your help!
[21:04] lol
[21:04] Heh.
[21:05] The tracker really appreciates your contribution, it just wasn't useful. :)
[21:05] haha
[21:06] looks like catarc works well :)
[21:06] http://sebsauvage.net/paste/?9e695a09848493ea#Yy3GjmiyMI4bfhUcKv9vahutcX48KTJBHLivJh8l2BU=
[21:06] nice regex
[21:07] I got that from the presentation. Something about a kid of 15 in Australia being threatened with legal action for downloading poetry.
[21:07] I can't remember
[21:07] hahahahahaha
[21:07] was that bluemax?
[21:08] what happened with that o_O
[21:08] joepie91: we conform to the draft, fyi
[21:09] http://archiveteam.org/index.php?title=BT_Internet <<< wtf is with the "no description" below the image
[21:09] ok, thanks :P
[21:09] we being archive.org
[21:09] SmileyG: lulu poetry's IT department sent a scary letter to him
[21:09] "scary" "letter"
[21:10] o
[21:11] igelritte: does a video of the defcon presentation exist?
[21:11] I can't find it
[21:14] SmileyG: The "No description" comes from the image, I think.
[21:14] except it has a description :/
[21:16] problems with the archive? I'm getting "rsync: failed to connect to fos.textfiles.com: Connection timed out (110)" all the time
[21:16] SmileyG: Oh. Then maybe it's in the template? http://archiveteam.org/index.php?title=Template:Infobox_project&action=edit
[21:18] Dark-Star: it's down atm
[21:19] ah okay. I'll just leave the Warrior running overnight then. I guess it'll automatically resume the upload later
[21:23] alard: ah yeah, hmmm :S
[21:24] weird, because the mobileme one doesn't do it
[21:26] right on... I'm not as stupid as I originally suspected
[21:26] GNU Wget 1.14 built on linux-gnu.
[21:27] I now have the ability to support warc
[21:27] though my dpkg still thinks that I'm working with 1.13
[21:28] It's probably been six months or more since I've compiled and installed anything from scratch. It's funny how quickly you forget that shit.
[21:28] igelritte: I don't want to temper your enthusiasm and sense of achievement, but you might want to check if your new Wget includes gzip and SSL support. It's in wget -V, I think.
[21:30] well, I'm pretty sure that it does, because I kept getting an SSL error and had to dig into why, and then install the libcurl and libgnutls dev packages in order to get wget to compile correctly
[21:30] but I will check
[21:30] Ah good, then it'll probably work.
[21:30] soultcer: Starting TinyBack for Item
[21:31] (Hint: the git clone is very slow if there's no .git in the repository url: https://github.com/soult/tinyback.git )
[21:32] It is? Damn, I always felt so clever because I had to type 4 characters less
[21:32] well, right under the version number, you get the following list: +digest +https +ipv6 +iri +large-file +nls -ntlm +opie +ssl/gnutls
[21:32] http://tracker.tinyarchive.org/v1/ <-- "ranking"
[21:33] soultcer: It's strange, because it does seem to work, but it just takes a long time. I was wondering what my warrior was doing.
[21:33] I'm not sure about the 'wget -V, I' syntax... is that supposed to be 'wget -V -I'?
[21:33] or really a comma
[21:33] Heh. The comma and I are part of the sentence. :)
[21:33] * igelritte laughs at self
[21:34] igelritte: if you're interested in downloading, you can download the ArchiveTeam Warrior virtual machine - it has everything already set up. http://archive.org/details/archiveteam-warrior
[21:35] To check if you have gzip support, use: wget --help | grep warc-compression and see if it returns something. If it does, it works.
[21:35] I'm a little limited in what I can do with downloading at the moment. This network connection is not really my own.
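A rough bash analogue of the catarc idea being tested above, dispatching on extension and streaming decompressed contents to stdout (a sketch, not the actual catarc implementation, which is a Python package):

    arccat() {
        local f
        for f in "$@"; do
            case "$f" in
                *.tar.gz|*.tgz) tar -xzOf "$f" ;;        # -O extracts members to stdout
                *.tar.bz2)      tar -xjOf "$f" ;;
                *.gz)           gunzip -c "$f" ;;
                *.bz2)          bunzip2 -c "$f" ;;
                *.zip)          unzip -p "$f" ;;
                *.7z)           7z e -so "$f" 2>/dev/null ;;
                *)              echo "arccat: don't know how to read $f" >&2 ;;
            esac
        done
    }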
[21:36] igelritte: does a video of the defcon presentation exist? <-- https://www.youtube.com/watch?v=-2ZTmuX3cog
[21:38] alard: I get "no-warc-compression"; I'm guessing that warc uses gzip for compression?
[21:40] Then your Wget is in top condition. The thing with gzip is: you can make .warc and .warc.gz files. It is much better to do the gzip compression in Wget than to do it afterwards. Wget makes a new gzip record for each downloaded file, so it's possible to extract only part of the .warc.gz. If you use the gzip utility to compress your warc afterwards, you can only decompress everything at once.
[21:43] Just performed a quick little test where I ran the following: wget --warc-file test http://en.wikipedia.org/wiki/Jason_Scott_Sadofsky. This seems to have created the 'test' file that I asked for.
[21:43] -rw-rw-r-- 1 23386 Oct 10 23:41 test.warc.gz
[21:44] quick question to alard: how does one write a setup.py where the resulting install package will copy a python file to the bin directory?
[21:44] /usr/bin etc
[21:44] gunzip -c test.warc.gz to look inside
[21:45] Why do you think I would know? I'm a copy-paste setup.py writer. :)
[21:45] scripts, I think: https://github.com/ArchiveTeam/seesaw-kit/blob/master/setup.py#L41-44
[21:46] well, seesaw does it :P
[21:46] and alright, thanks
[21:46] I thought you were the python distribution / pip / pypi expert. :)
[21:47] very interesting. That seems to have worked. I DO have an HTTP document. It doesn't look anything like a wiki, but I'm guessing I know why that is.
[21:48] alard: oh, not at all
[21:49] I just know how to package up a module with an existing setup.py
[21:49] :P
[21:49] and that's it
[21:57] so, when I unpack this archive file (warc), I should expect to find nothing but pure HTTP?
[21:57] You'll find warc records, some of which have an HTTP body.
[21:58] hmmm
[21:58] You get some warc headers identifying the record (type, target-uri, timestamp, etc.), then the http request or response.
[21:58] There are special types of warc records with metadata, such as the wget command line and log.
[21:59] So it's not the most user-friendly format, you need to work to get the data out.
[21:59] The good thing is that everything is in the file, so you *can* get it out.
[22:00] This is all just for my education, so feel free to tell me to fuck off when you lose patience. But where can I find these headers? When I open the file with a text editor, it appears to be just HTML.
[22:01] You'll have to look better then, they're in there.
[22:01] It starts with WARC/1.0 or something, then there's WARC-Target-URI, etc.
[22:04] Hey, so my commentary before.
[22:06] It has scrolled away. :)
[22:06] SketchCow: http://archive.org/details/archiveteam-city-of-heroes-www is not on the list
[22:07] craziness... I just used vi on the test.warc.gz file and the headers you mentioned showed up. Vi also showed me all the compressed content. I didn't know that vi could do that...
[22:07] SketchCow: geocities - there's a dump on the IA but I can't find it anymore (and it was searchable.... we really need to make those links more accessible...)
[22:08] http://archive.org/details/archiveteam-qaudio-rescue
[22:08] http://archive.org/details/archiveteam-cinch
[22:08] wait wait wait wait, what? Jeroenz0r is/was part of urlteam?
[22:09] Only WARC items. So Geocities precedes that.
[22:10] ah, k
[22:12] Perhaps I'm really thick here... and that wouldn't be a surprise... but I'm still not seeing how I can contribute. Is there a list of "shit that needs to get done and we'd be thrilled if you'd take it on" somewhere?
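The record structure described above is easy to see from the shell: because wget writes a separate gzip member per record, plain zcat walks the whole capture (using the test file from the wget run above):

    zcat test.warc.gz | head -40                     # warcinfo record, then request/response pairs
    zcat test.warc.gz | grep -a '^WARC-Target-URI'   # -a because response bodies may be binary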
[22:12] Both added, alard
[22:12] What's your skillset, igelritte?
[22:12] various godane grabs(tm) at https://archive.org/search.php?query=warc%20uploader%3A%22slaxemulator%40gmail.com%22
[22:13] There are some groklaw.net warcs: http://archive.org/details/groklaw.net-pdfs-2004-20120827
[22:13] Well, I've done some BASH scripts. I'm trilingual. I've done lots of networking.
[22:13] And there's a bunch of voip in there too
[22:13] http://archive.org/search.php?query=groklaw%20warc
[22:14] igelritte: is there any chance you can turn the install script for the webshots script into something more sane?
[22:14] because I suck at bash :P
[22:14] I'm not that awesome at it either, but I can look at it.
[22:14] the current script is at http://cryto.net/projects/webshots/webshots_debian.sh
[22:14] http://archive.org/search.php?query=warc%20journalstar (but it's getting more obscure now)
[22:14] thanks :)
[22:16] Hmmm...
[22:16] joepie91: you can set a trap on error to avoid all the conditionals
[22:16] this could use some commenting and perhaps a header
[22:16] and then have it print "Error on line x". Not as nice of a message, though.
[22:17] who wrote this? And why are they doing an apt-get at the beginning?
[22:17] igelritte: I did
[22:17] and the apt-get is to install dependencies
[22:18] http://archive.org/search.php?query=uploader%3A%28slaxemulator%40gmail.com%29%20AND%20warc
[22:18] is there an echo in here
[22:18] I think I see what you're doing here, and I understand why you would do an apt-get update before doing an install
[22:18] DFJustin: Oh, sorry. :)
[22:18] but I don't think I understand enough of the purpose here to understand why you would do that in a script
[22:19] igelritte: it's apt-get update, not upgrade
[22:19] it just updates the package list
[22:19] I'm guessing that my ignorance is to blame
[22:19] right
[22:19] joepie91 / igelritte: here's a nice article on BASH traps, btw. http://phaq.phunsites.net/2010/11/22/trap-errors-exit-codes-and-line-numbers-within-a-bash-script/
[22:19] typo on my part
[22:19] it broke for some people because the package lists weren't up to date, so that's why the update is there :)
[22:21] joepie91: also, why are you using useradd? On Debian you're supposed to use the adduser command, afaik
[22:21] adduser is interactive
[22:21] Doesn't have to be
[22:21] At least, I think you can make it a one-liner
[22:21] iirc I haven't found a way to make it non-interactive
[22:21] :P
[22:22] anyway, any particular reason not to use useradd?
[22:22] Does useradd make the home directory?
[22:22] yes
[22:22] o
[22:22] Welp, adduser just follows a nice configuration file that specifies things like the permissions to set on the home directory, among other things
[22:23] But I guess useradd works OK. I was just curious. :-)
[22:36] SketchCow: there are more qaudio items, http://archive.org/details/archiveteam-qaudio-archive-1 through http://archive.org/details/archiveteam-qaudio-archive-7
[22:39] also fan fiction http://archive.org/search.php?query=%22fan%20fiction%22%20archiveteam
[22:41] right
[22:41] pip install catarc
[22:41] :)
[22:41] cat for archives
[22:48] OK, so I got out of a meeting about incorporating archive team stuff into wayback
[22:48] NATURALLY it's slightly more complicated in some cases.
[22:49] Let me make some changes to the thing.
[22:52] of course it is
[22:52] what kind of changes do they want?
[22:55] Look at the document again. All green ones are cleared for takeoff.
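The trap suggestion in miniature - a hedged sketch of how an install script like the webshots one could report failures without a conditional after every command (package names are illustrative):

    #!/bin/bash
    set -e                          # abort on the first failing command
    trap 'echo "Error on line $LINENO (exit code $?)" >&2' ERR

    apt-get update                  # refresh package lists so installs don't fail on stale indexes
    apt-get -y install wget rsync
    useradd -m webshots             # -m creates the home directory, as discussed above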
[22:55] wow, awesome
[22:57] so it looks like they can just suck in warc-in-nothing, yes?
[22:59] Yes
[22:59] They cannot suck in warc-in-archives
[22:59] So, the next step is to look at the archives ones and see if there aren't too many WARCs in them, say less than 100
[22:59] I mean "just suck in" as in "point the ingestor at"
[22:59] good thing we didn't upload 250tb of that XD
[23:00] lol, yes
[23:01] mobileme: 280T of .tar containing .warc.gz
[23:01] soooo
[23:02] We're aware of it and there'll be a project to deal with that.
[23:02] But I don't want to rush it.
[23:02] So Brewster's letting me make doubled files for weird ones.
[23:06] even if there's a shitload of warcs inside, they can all be cat-ed together into one megawarc, right
[23:07] is there a webshots tracker where I can check the progress? (I'm unable to help, I'm just curious how it's going)
[23:07] http://tracker.archiveteam.org/webshots/
[23:07] thank you :)
[23:08] underscor making his isp cry again
[23:08] Yeah, but the machine is still down
[23:08] so I don't know what's going on
[23:08] DFJustin: Yes, exactly.
[23:13] alard: what about an 'assorted' warrior project
[23:13] with things that are small or heavily rate-limited (like some urlteam targets)
[23:13] that the warrior automatically switches to whenever it has nothing else to do
[23:14] that sounds cool.
[23:14] for example, if the currently selected project is done
[23:14] a "let's not waste any time or bandwidth that we have" mode, so to say :P
[23:14] urlteam is a basically-no-bandwidth project, it might actually make more sense to run it in the background always.
[23:15] maybe have an 'always running' *and* an 'assorted' project
[23:15] yeah
[23:15] separate projects... one always runs, like urlteam
[23:15] and assorted is filled with whatever small project is happening that doesn't warrant its own separate project, really
[23:15] 'assorted' would be filler for "let archiveteam choose"
[23:15] as a fallback when it has nothing better to do
[23:15] well yes, but the thing is
[23:16] say that I've got it configured for btinternet
[23:16] the moment btinternet is done, which will be soon
[23:16] my warrior will be bored out of its skull, no?
[23:17] yes
[23:18] would be good if it switched to 'assorted' then :P
[23:18] 'let archiveteam choose' has a pretty different function
[23:18] that option should always refer to the most urgent project
[23:18] such as, in this case, webshots
[23:18] assorted would have the stuff that isn't really urgent or significant, but has to be done anyway
[23:18] at some point in time
[23:21] ah
[23:36] hi, is fos.textfiles.com down?
[23:36] it is
[23:37] rsync will happily retry until it reappears, right?
[23:37] if I recall correctly, it will retry 50 times
[23:37] before giving up
[23:37] alard can probably confirm that
[23:37] :( 50k link user in queue
[23:37] Fortress of Solitude is back
[23:37] ouch
[23:37] oh, it is?
[23:38] SketchCow: my warrior disagrees
[23:38] rsync: failed to connect to fos.textfiles.com: Connection timed out (110)
[23:38] same here, but I guess it will work soon then :)
[23:39] probably we are just hammering it currently
[23:39] and thx for the info!
[23:39] aaaaand there it went
[23:39] :D
[23:41] Hooray, 517 rsync connections.
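The cat-ing point above works because both gzip and WARC are concatenation-friendly: a series of .warc.gz files joined end to end is itself a valid multi-member .warc.gz. So a megawarc can, in principle, be as simple as:

    cat item-*.warc.gz > megawarc.warc.gz
    zcat megawarc.warc.gz > /dev/null && echo "still a valid gzip stream"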
[23:41] lol
[23:41] working for me now too :)
[23:42] :|
[23:42] uploads just died
[23:42] like, literally flatlined
[23:42] ah, it resumed
[23:42] and flatlined again
[23:42] wat
[23:43] alard: you wanna run through the usernames in these https://en.wikipedia.org/wiki/Wikipedia:Bot_requests#btinternet
[23:43] so, from the following, I can assume that fos = fortress of solitude and that this is some place where folks are trying to rsync there current downloads to. Feel free to direct me to a link that will shut me up.
[23:43] *thier
[23:43] or maybe their
[23:44] igelritte: yes, fos is where the uploads go
[23:44] At some point grammar will come back to me
[23:44] until then
[23:44] indeed
[23:46] phew... seems like some 1 gb stuff is in queue on nooon
[23:50] DFJustin: http://pastie.org/5032511
[23:51] is the clean version
[23:51] of all usernames for both .com and .co.uk
[23:51] sorted, unique
[23:51] also cc alard, idk if that list is already in the tracker
[23:51] k, time to sleep
[23:51] goodnight all :)
[23:51] nice, thanks
[23:59] Well, FOS is getting CRUSHED, we'll see how long this lasts.
[23:59] 848 rsync connections
[23:59] lol