[00:31] that's a pretty sweet coat
[00:59] Holy CRAP
[00:59] That tarfile script? WAY too slow.
[01:00] I didn't realize what it was doing.
[01:01] what script? perhaps someone can rewrite it to perform the same ultimate function, but much faster
[01:02] I already am.
[01:02] It was:
[01:03] calculating, doing a tar rf, making a new name for the tar rf if it hit a byte limit.
[01:03] Now I have it:
[01:03] calculating, doing an echo of the filenames to a textfile, changing the new name as it hits a byte limit.
[01:03] Then I'll use the textfiles as lists for tar operations to make the tarfiles :)
[01:04] are you taking the tar file overhead into account?
[01:04] If it's 50gb and it turns out to be 54 or 60gb, so be it.
[01:04] ok
[01:40] so some of the stage6 descriptions seem to have raw utf-8 following <wbr> http://archive.org/search.php?query=collection%3Astage6&sort=-publicdate
[02:05] argh
[02:15] DFJustin: https://gist.github.com/7ddcfce69e769cb74b7b
[02:16] DFJustin: also, you can look at the raw database data: http://archive.org/details/stage6-database-dump
[02:16] First splinder major pack finished!
[02:16] (of the database I have with the metadata in it)
[02:24] i'm beginning to think there might have been some charset corruption during the steps to pull the description out of the stage6 page.
[02:24] on some items
[02:24] http://archive.org/details/archiveteam-splinder-00000000
[02:24] The fun begins.
[02:24] but i'm not sure. there is valid utf-8 at the start, and then it drops to a bunch of HTML entities
[03:14] root@teamarchive-1:/2/ANYHUB# tar cf - . | pv > ANYHUB-2011-11-RESCUE.tar 174MB 0:00:04 [37.3MB/s] [ <=>
[03:14] See, they're ALL snapping together now.
[03:14] I can finally get FOS ready for the next bits, and/or us going through and cultivating what we DO have.
[03:14] We definitely need to do a cleanup, maybe we can work together on that.
[03:15] Analysis of the stuff, etc.
[04:23] SketchCow: is FOS hosted on OVH by any chance?
[04:27] no
[04:27] http://archive.org/details/archiveteam-splinder-00000000 finished
[04:27] Only 20 more coming
[04:28] Huh, I've only used half of my expected bandwidth use.
[05:05] SketchCow: two things: 1: i have that very jacket in silver. 2: good mod to the tar script. i did say it was likely to be slow, but i didn't realize how bad =P
[06:02] http://i.imgur.com/kxs3b.jpg
[06:02] Archive Team Headquarters
[06:03] that's a lot of space
[06:03] what are we using, punched paper tape?
[06:04] Barrels of Rum
[06:04] Most levels are Rum levels
[06:04] ah, okay then
[06:04] sounds good
[06:04] there's a pile of hard drives on the 14th floor
[06:09] ooh
[06:09] minecraft
[06:41] I have to wonder two things: does it go all the way up to 127, and how far does it go down?
[06:46] uploading episode 116 of crankygeeks
[06:57] SketchCow, about Splinder missing data: the problem is that the last time a user showed up with some spare splinder data, he had no rsync slot to upload it to fos or anything :)
[06:57] over it
[07:05] :/
[07:05] anyone able to access publicbt.com?
[07:06] I've got nothing on .org and .com
[07:07] which reminded me to check that my own tracker was still up, and it is. Awesome.
[07:08] I get an `address unreachable` for publicbt
[07:09] 16. rbx-33-m1.routers.chtix.eu 0.0% 12 118.1 118.3 117.7 118.8 0.4
[07:09] 17. ???
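The rework described at 01:03 (write filenames into list files, starting a new list whenever a byte cap is hit, then build each tar from its list) might look roughly like the following sketch. The 50 GB cap, the pack-N naming, and the use of GNU find/stat are all assumptions; the real FOS script never appears in this log.

    #!/bin/bash
    # Sketch only: split filenames into lists that stay under a byte limit,
    # then build one tar per list in a single pass.
    LIMIT=$((50 * 1024 * 1024 * 1024))   # assumed 50 GB cap per tar
    part=0
    size=0
    list="pack-$part.list"
    : > "$list"
    find . -type f ! -name 'pack-*.list' -print0 | while IFS= read -r -d '' f; do
        bytes=$(stat -c %s "$f")
        if (( size > 0 && size + bytes > LIMIT )); then
            part=$((part + 1))
            list="pack-$part.list"
            : > "$list"
            size=0
        fi
        printf '%s\n' "$f" >> "$list"
        size=$((size + bytes))
    done
    # Each list then becomes one tar, with no per-file appending:
    for list in pack-*.list; do
        tar cf "${list%.list}.tar" -T "$list"
    done

The gain over the original tar rf loop is that nothing is appended file by file; the cost is that the size estimate ignores tar overhead, hence the 50-turns-into-54-or-60 GB remark at 01:04.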
[07:10] traceroute falls over right there for me
[07:10] I wish http://archive.org/details/publicbt.com had more usable info
[07:11] oh nice, that's a good archive
[07:11] and if you try to Google it, you'll find search results have been removed with DMCA complaints!
[07:12] How do you define usable info?
[07:12] Aranje, well, for instance titles of torrents :)
[07:12] But they do not have those
[07:12] publicbt runs (ran?) the exact same software I do, which is opentracker. It only keeps track of info hashes and IPs.
[07:13] Those files are a dump of that in-memory database
[07:13] dht can get you more info
[07:13] iirc, someone was doing something with dht mining
[07:13] I'd bet that would also be hugely time consuming
[07:14] Aranje, I know, but this means that you can't really use those hashes unless you're a torrent index website
[07:14] That's the point :P
[07:14] If you know the hash, you know it. If you don't, fuck you.
[07:15] :3
[07:15] I was honestly never 100% sure why they published that information. It makes little sense to me, but I bet it's fun to do some basic data analysis on.
[07:16] Aranje, the stated reason was to help indexers :)
[07:16] Oh! Well there you have it :D
[07:17] but yes, data analysis is the only purpose I imagined for the archive.org copy
[07:17] I didn't expect the tracker to go down though, let's see what happens
[07:17] Yeah we're running out of open public trackers :/
[07:18] There's a couple archiveteam torrents pinned to my tracker, but I don't run anything fancy enough to shout that everyone should use it :P
[07:19] uh, the torrents with udp://tracker.publicbt.com:80 are still working for me
[07:20] * Aranje nods
[07:20] They haven't responded via http in a bit, on the tracker anyway
[07:25] uploading episode 119 of crankygeeks
[08:29] http://archive.org/details/comix-international-001
[11:36] there are three different IP addresses for tracker.publicbt.com
[11:36] and only one for publicbt.com
[14:51] fileplanet http://i.imgur.com/w2Px3.png
[16:23] any idea if i can pipe tar to s3cmd?
[16:23] because i do really have a space problem AGAIN
[16:26] alternatively, since i think tar over ssh is fairly easy to do: does anyone have ~45G for me? :}
[16:27] bonus points if you could upload it to archive.org too
[16:27] where is the stuff?
[16:27] on my server
[16:28] which has 30G free
[16:28] Schbirid, can't you make smaller tars and upload it in pieces?
[16:28] duuuur
[16:28] hit me please
[16:28] <- stupid
[16:28] will do :)
[16:28] :D
[16:29] That's probably recommended at this point anyway, given how big the tars are getting
[16:30] yeah
[16:32] +1
[16:39] SketchCow: i'd <3 you for a tweet about http://archiveteam.org/index.php?title=Fileplanet
[16:49] Are you seeking help?
[16:49] Or something else?
[16:49] I wanted to wait to announce it AFTER they killed it.
[16:50] i could use some help getting the stuff down
[16:50] although 1-2 people would probably be best
[16:50] so we do not upset the site or something
[16:50] my 10Mb down isn't gonna help much is it?
[16:51] Aren't we resplendent with volunteers here?
[16:51] -----------------------------------------------
[16:51] WE COULD USE A LITTLE HELP DOWNLOADING FILEPLANET
[16:51] :D
[16:51] JUST A SMIDGE
[16:51] -----------------------------------------------
[16:51] * SmileyG_ is glad to have appeared in that Memo :D
[16:52] :))
[16:52] If there's an automatic script like for memac...?
[16:52] Looks like we're about 30-40% done...
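On the tar-over-ssh question at 16:26: streaming the archive over the network avoids needing ~45G of local scratch space when only 30G is free. A sketch with made-up host and path names; whether s3cmd can read a stream from stdin depends on the installed version, so that route would need checking before relying on it.

    # Stream a tar straight to a remote disk; nothing is written locally.
    # user@remote.example.org and the paths are placeholders.
    tar cf - demo_archive/ | ssh user@remote.example.org 'cat > /bigdisk/demo_archive.tar'

    # Or unpack on the far side instead of keeping one big tar:
    tar cf - demo_archive/ | ssh user@remote.example.org 'tar xf - -C /bigdisk/'

    # Smaller pieces, as suggested at 16:28, can come out of the same pipe:
    tar cf - demo_archive/ | split -b 10G - demo_archive.tar.part.

Either pipe keeps local disk use near zero, which is the whole point here.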
[16:52] That's just IDed files - still this mess of unIDed stuff
[16:52] oh, you underestimate how big it gets higher up
[16:53] SmileyG: if you want, try the 85000-89999 range.
[16:53] i am often not getting more than 10mbit/s down anyways
[16:54] and up to archive.org it is similar (although right now it rocks with 2MB/s)
[16:54] is there a script?
[16:54] http://archiveteam.org/index.php?title=Fileplanet#How_to_help
[16:57] o_O
[16:58] i need to rate limit it to less than the full 10Mbit
[16:58] this still needs to be a usable connection :D
[16:58] I likely could mark the packets with iptables if I had enough time :/
[16:59] ah, sorry
[17:00] in the script, add "--limit-rate=XX" to the wget lines. eg --limit-rate=500k for 500 Kilobytes/s
[17:19] Ha ha, poor FOS is starting to implode under the weight of the processing I'm doing.
[17:33] hmm, getting an error trying to create an archiveteam account
[17:33] -> Fatal error: Call to a member function userCannot() on a non-object in /home/archivet/public_html/extensions/TitleBlacklist/TitleBlacklist.hooks.php on line 86
[17:34] Yes
[17:34] I REALLY need to fix that
[17:35] I was trying to sign up for a fileplanet block
[17:35] Emijrp was my man for helping me find plugins. if someone else knows wiki enough to help me I'll get them.
[17:35] Pass it to Schbirid for now
[17:35] schbirid - what range should I start with?
[17:36] i was doing 80000-84999
[17:36] sadly my server rebooted in the middle
[17:38] I'll go resume it now
[17:42] codebear: Which block? I'll add you
[17:42] schbirid - I'll start with 115000 - 119999 and 120000 - 124999
[17:43] Got it
[17:43] thanks
[17:45] Anyone else want a block?
[17:46] ok, those two block ranges are downloading
[17:48] I *so* want to pass DoubleJ before NotGLaDOS passes me
[17:57] i'll take that last range if you guys want 85000-89999
[17:57] Dunno if anyone else is on it; if not, it's yours
[17:57] Gotta run for it
[17:58] a bit*...I need a new keyboard
[17:58] it's empty on the wiki so i assume so
[17:58] SmileyG is running 85000-89999 i think
[17:59] ah i see
[17:59] well enjoy downloading :)
[18:00] go 125000-129999 if you want :)
[18:00] i think we should split those in 1k increments soonish
[18:00] ah awesome, there's more ranges, 125000-129999 it is
[18:01] is the split needed because of ending file sizes?
[18:01] yeah, 130000-220000 is free (and huge)
[18:01] yes
[18:01] the one server i'm using has 600G free - so I'm not worried about it
[18:02] ah, and 800 on the other
[18:02] you should link to the raw script on the wiki
[18:02] yea, that got me also - downloaded the html at first
[18:02] hehe
[18:02] Schbirid, can't you copy the seesaw-s3 logic so that one doesn't need to worry about manual tarring and upload? :)
[18:02] Doing a final check of ANYHUB panic grab before deleting the original data - looks like 867gb of ANYHUB in total.
[18:02] Yep.
[18:03] Nemo_bis: i have no idea how it works but something automatic would rule
[18:04] fixed the link and updated the table
[18:04] thanks, you guys :)
[18:39] schbirid - I did a quick hack on your script to give it some seesaw-like items
[18:39] nothing as fancy, but it does the range given in chunks of 1000
[19:05] Schbirid: https://gist.github.com/2689781 needs another pair of eyes to test and confirm I didn't do something silly
[19:07] codebear: sweet!
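The gist linked at 19:05 isn't reproduced here, but the shape of what codebear describes is an outer bash loop that walks a start/end ID range in fixed-size chunks, downloads each chunk with the existing per-ID logic, then tars it so it can be uploaded and deleted before the next chunk. A sketch under those assumptions: download_range is a hypothetical stand-in for the real wget calls (which should carry --limit-rate, per the 17:00 advice), and the per-chunk directory layout is guessed.

    #!/bin/bash
    # Sketch of the chunked wrapper being discussed; not the actual gist.
    # Usage: ./fileplanet-chunks.sh STARTID ENDID [CHUNKSIZE]
    START=$1
    END=$2
    STEP=${3:-1000}                        # chunk size as $3, so no editing needed

    for ((a = START; a <= END; a += STEP)); do
        b=$((a + STEP - 1))
        (( b > END )) && b=$END
        download_range "$a" "$b"                   # hypothetical per-ID wget loop
        tar cf "fileplanet-$a-$b.tar" "$a-$b/"     # assumed per-chunk directory
        # upload the tar here, then remove the local copy to free disk space
    done

Making the chunk size an argument keeps the later switch to smaller chunks (the 99-ID test mentioned a few lines below) a one-argument change instead of an edit.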
[19:07] codebear: i would prefer endrange supplied as $3 so it is controllable without editing the script
[19:07] the endrange is
[19:07] wait
[19:07] what I do is break the start - end up into chunks of 1k
[19:08] so you specify 12000 13000 and then it breaks that down into 1k chunks, tars them, uploads and moves on
[19:08] oh poo
[19:08] make it the number of IDs to do. eg "100" (then it should do $startrange until $startrange+99)
[19:09] refactoring for "other people to read" made me make a mistake
[19:09] we need smaller chunks later
[19:09] :)
[19:09] yeah, you don't use $3 at all
[19:09] how does the controlling work? i only used that seesaw stuff as an end user so far, never looking into how it works
[19:10] I had to remove all of the fancy stuff that seesaw has - since this one doesn't have a web service to control the ranges and the like
[19:10] so I just made this into another layer of bash loop to break the chunks into smaller pieces
[19:10] awww, i thought i'd get that :))
[19:11] I don't know where they keep the code
[19:14] ok, testing a fixed version with chunks of 99
[19:14] SketchCow: I heard you posted synth manuals to archive.org?
[19:17] codebear: i gotta go, will take a look tomorrow. thanks!
[19:17] schbirid - np - i'll update the gist in a bit
[19:17] thanks
[19:48] Yes, I'm doing that as we speak.
[19:55] SketchCow: interesting — we spent much of the last week doing synth work, now gonna take a break from that and focus heavily on discferret again
[19:57] there are a ton of service manuals at synfo.nl
[20:05] http://www.loscha.com/scans/ also has a pile of manuals
[20:39] 867G .
[20:39] root@teamarchive-1:/2/ANYHUB/ANYHUB-PACKS# du -sh .
[20:39] About to shove THAT in today
[20:39] I estimate it'll probably take 24 hours.
[20:39] But then Anyhub is somewhere!
[21:11] Okay
[21:11] I finished 80000-84999
[21:11] I need to merge the directory from my failed D/L
[22:07] Debianer: Once you get the file count and size let me know
[22:07] k
[22:08] Are you taking another block?
[22:09] not yet
[22:10] 31G 80000-84999/
[22:10] du: cannot access `80000-84999/www.fileplanet.com/84351': Input/output error
[22:10] Give me a few minutes; gotta run
[22:42] Debianer: that doesn't sound good.
[22:48] Debianer: Do you have a file count?
[22:50] I'm fixing the FS
[23:59] http://archive.org/details/archiveteam-anyhub-00000002
[23:59] As predicted, slow.
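The check being asked for at 22:07 and 22:48 (total size and file count of a finished block before it gets packed and uploaded) comes down to two commands; the range directory name below is just the example from the log. An I/O error from du like the one at 22:10 points at the filesystem itself, so an fsck has to come first.

    du -sh 80000-84999/                 # total size of the block
    find 80000-84999/ -type f | wc -l   # file count for the block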