[02:34] Do we have rave archive? I see it on wayback machine but the download links don't work.
[02:47] http://www.unlambda.com/cadr/index.html
[02:48] Context: Lisp Machine OS tapes and emulator.
[02:49] Link isn't even touched by wayback.
[02:52] looks like it's all in wayback to me?
[02:52] DFJustin: Maybe I need to type in the full URL?
[02:52] http://web.archive.org/web/20130517041504/http://www.unlambda.com/cadr/index.html
[02:53] Nope you're right, we've got it.
[02:53] How about rave archive? The download links weren't working when I checked it. I think they might rely on some JS poopoo or something.
[02:53] link?
[02:54] ravearchive.com
[02:56] yeah it's only got a handful of the actual files http://web.archive.org/web/*/http://ravearchive.com/*
[02:56] (filter for "mp3")
[02:56] Like wayback has the site, but if you try to download anything the links aren't there.
[02:56] Of course the downloads are the valuable portion.
[03:02] Interesting.
[03:02] I can access the mp3's directly if I type in the link from the download page.
[03:03] I wonder if the server will give me a listing of files in each directory.
[03:03] Then we could just write a script to grab them.
[03:04] The names seem to follow a standard format too.
[03:07] The artist name, then an underscore followed by a dash, then another underscore, then the name of the mix tape with each word separated by underscores.
[03:07] So this is very grabbable, I wonder what archivebot was having trouble with.
[03:10] DFJustin: My only concern is that they mention in their last news update that their monthly hosting transfers are like 800 gigs. Since this job would be in the tens of gigabytes at least, I wouldn't want to put too much strain on the guy operating the site.
[03:28] oh he's actually looking for better hosting
[03:28] this is exactly the kind of guy jason likes to help https://twitter.com/textfiles/status/383242724599558144
[03:28] DFJustin: Oh, right, silly me.
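The grab plan described above (read each directory listing, then pick out mp3 files named `Artist_-_Mix_Name`) could be scripted roughly as follows. This is a minimal sketch that assumes the server returns a plain HTML index with `<a href>` links; the `/mixes/` path and the example filename are hypothetical, and nothing here was tested against ravearchive.com.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin

# Filename shape described in the channel: Artist_-_Mix_Name.mp3
# (artist, underscore, dash, underscore, then underscore-separated title).
MIX_NAME = re.compile(r"^(?P<artist>.+?)_-_(?P<title>.+)\.mp3$", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collect every href target from an HTML directory index."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def mp3_links(index_html, base_url):
    """Return absolute URLs of the .mp3 files linked from a directory index."""
    collector = LinkCollector()
    collector.feed(index_html)
    return [urljoin(base_url, h) for h in collector.hrefs if h.lower().endswith(".mp3")]


def parse_mix_name(filename):
    """Split 'Artist_-_Mix_Name.mp3' into (artist, title) with spaces restored."""
    match = MIX_NAME.match(unquote(filename))
    if match is None:
        return None
    return (match.group("artist").replace("_", " "),
            match.group("title").replace("_", " "))
```

Given the 800 GB/month transfer concern raised above, a real run should fetch these URLs slowly (one at a time, with a pause between files) rather than in parallel.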
[03:29] because IA can just host the downloads directly
[03:29] Perfect.
[03:35] i'm doing this but it's not working: wget -r -l 0 -np -nc ftp://ftp.qmags.com/ --accept-regex='(.pdf)' --reject-regex='(\.exe|\.zip|\.sea.hqx)'
[03:36] when i try to use that the exe and sea.hqx still download
[03:36] but i just want to grab the pdf
[03:38] what's in the sae.hqx and the zip and the others?
[03:38] usually for ftp servers I just use lftp's mirror to mirror the entire thing
[03:38] sea.hqx**
[03:39] hqx is an older mac compression format I believe, for OS 9 and earlier I'm pretty sure
[03:40] hqx is an encoding format, yes; sea is basically sit
[03:40] which is the StuffIt compression format
[03:40] The Unarchiver supports all these
[03:42] for your regex, either you made a typo here or in the command itself; unless there are actually files ending in .sea.hqx, you probably wanted \.sea|\.hqx
[03:43] it is common for files to have both those extensions
[03:43] personally I'd just use lftp to mirror the entire ftp server
[03:43] rather than wget and exclude things other than pdfs
[03:47] DFJustin: So when I find stuff like that what do I do? Email Jason?
[03:47] (Where they're obviously looking for hosting.)
[03:55] yeah that works
[04:09] * namespace nearly just pulled a stupid with dd again on his backup drive
[04:09] Is there a FOSS program that does the same thing without being ridiculously easy to screw up?
[04:09] ddrescue?
[04:09] I always use it
[04:09] ddrescue is great
[04:10] I'm not trying to rescue anything, just copy the disk.
[04:10] *clone
[04:10] still
[04:10] it can be used for that
[04:11] balrog: And it won't lend itself to borderline retarded behavior like destroying a directory if I forget to specify that I want the data to go into a file inside the directory as opposed to overwriting the directory?
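The regex discussion above can be checked offline. This Python sketch runs the patterns from the wget command, plus the split `\.sea|\.hqx` suggestion and a hypothetical anchored `\.pdf$`, against a few made-up filenames using `re.search`. It only approximates wget, which applies `--accept-regex`/`--reject-regex` to the complete URL. It also shows that the original reject pattern already matches `.exe` and `.sea.hqx` names, so the patterns alone don't explain why those files still downloaded.

```python
import re

# Patterns from the wget command above, plus two alternatives for comparison.
ACCEPT_ORIGINAL = r"(.pdf)"                   # unescaped '.' matches any character
ACCEPT_ANCHORED = r"\.pdf$"                   # hypothetical stricter version
REJECT_ORIGINAL = r"(\.exe|\.zip|\.sea.hqx)"  # catches .sea.hqx but not a bare .hqx
REJECT_SPLIT = r"(\.exe|\.zip|\.sea|\.hqx)"   # the suggested \.sea|\.hqx split


def matches(pattern, names):
    """Return the names in which the pattern finds a match (re.search semantics)."""
    return [name for name in names if re.search(pattern, name)]


# Made-up filenames for illustration only.
NAMES = ["issue01.pdf", "viewer.exe", "reader.sea.hqx", "reader.hqx",
         "archive.zip", "xpdf_readme.txt"]

for label, pattern in [("accept (original)", ACCEPT_ORIGINAL),
                       ("accept (anchored)", ACCEPT_ANCHORED),
                       ("reject (original)", REJECT_ORIGINAL),
                       ("reject (split)", REJECT_SPLIT)]:
    print(label, matches(pattern, NAMES))
```

The unescaped `(.pdf)` also accepts `xpdf_readme.txt`, and only the split reject pattern catches a plain `reader.hqx`.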
[04:11] sorry what do you mean
[04:11] usually with dd you screw up when writing to a raw device
[04:12] balrog: dd if=/dev/mydisk of=/media/my-backup/drives/main/directory
[04:12] ddrescue: Output file exists and is not a regular file.
[04:13] balrog: Yeah, I need to test to see if dd outputs the same message.
[04:13] Doubt it.
[04:13] nope
[04:13] dd is pretty basic.
[04:13] ddrescue can also be stopped and restarted, and logs bad sectors
[04:13] (and can work around them)
[04:13] Ooh, interesting.
[04:13] yeah, it's a very powerful tool
[04:14] i found the page images for magazines published by qmags: http://img.qmags.com/SIPR/SIPR0112/thumbnails/size10/SIPR_0112_0011.jpg
[04:14] the images put together in a cbz will be 100mb+ each
[04:14] vs the 10-20mb pdf they have
[04:15] also no borders in the images
[05:25] DFJustin: By the way, did you already contact Jason, or are we going to wait for him to get on IRC?
[05:25] I haven't done anything
[05:26] DFJustin: What's his mail volume look like? If I send him an email will he read it?
[05:26] Jason is never on IRC.
[05:27] SketchCow: True, true. :P
[05:27] SketchCow: You know what we're talking about then?
[05:27] Vaguely.
[05:27] Let's see what you're about to say and if I'll respond.
[05:28] SketchCow: Okay, basically there's this website called ravearchive.com that's having hosting problems. I wanted to archive it, but apparently you've already had an open offer to help host collections like that.
[05:33] (Their last update in this regard was circa 2011, so the situation may have changed for them.)
[05:53] yipdw: Pull all tracker activity off FOS
[05:54] done
[05:54] bebo, myopera, and viddler are no longer going to fos
[05:56] FOS?
[05:56] namespace, fos.textfiles.com is one of the upload servers
[05:56] garyrh: Got it.
[05:57] it's the archive team fortress of solitude
[05:59] the fortress of solitude is currently too crowded
[05:59] It is going south, it will likely crash out.
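The two ddrescue behaviours mentioned above, refusing an output that "exists and is not a regular file" and resuming after being stopped, can be approximated in Python. This is a minimal sketch and not how ddrescue works internally. The function name `clone_image`, the 1 MiB block size and the plain-text progress file are hypothetical, and unlike ddrescue's mapfile it does not record, skip or retry bad sectors.

```python
import os
import stat

BLOCK_SIZE = 1024 * 1024  # hypothetical 1 MiB copy unit


def clone_image(source, dest, progress_path, block_size=BLOCK_SIZE):
    """Copy source to dest block by block, refusing unsafe destinations.

    The number of bytes copied so far is written to progress_path after
    each block, so an interrupted copy can resume where it stopped.
    """
    dest_exists = os.path.exists(dest)
    # Mirror the ddrescue refusal quoted above: never write onto an existing
    # directory, device node, or other non-regular file.
    if dest_exists and not stat.S_ISREG(os.stat(dest).st_mode):
        raise ValueError(f"{dest} exists and is not a regular file")

    # Resume only if both the partial image and its progress record exist.
    offset = 0
    if dest_exists and os.path.exists(progress_path):
        with open(progress_path) as f:
            offset = int(f.read().strip() or 0)

    # "r+b" keeps already-copied data in place; "wb" creates a fresh file.
    mode = "r+b" if dest_exists else "wb"
    with open(source, "rb") as src, open(dest, mode) as dst:
        src.seek(offset)
        dst.seek(offset)
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            dst.flush()
            offset += len(chunk)
            with open(progress_path, "w") as f:
                f.write(str(offset))
        # Drop any stale bytes left past the end of the source.
        dst.truncate()
    return offset
```

Pointing `dest` at a directory by mistake raises `ValueError` before anything is opened for writing.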
[05:59] So kill it, kill rsyncing to it.
[05:59] That's just adding, and it WILL fail.
[05:59] it's totally out of the tracker
[06:00] Great.
[06:08] SketchCow: So does ravearchive look interesting at all?
[06:41]
[06:41] Sorry.
[06:41] https://news.ycombinator.com/item?id=7289296
[06:42] grabbing what i can of their tweets off of google cache
[06:42] Anything else Mt. Gox related is probably gonna go soon.
[06:42] putting it on archive.is
[06:46] I'm sure we've already got their website grabbed. *goes to check anyway*
[06:49] http://downforeveryoneorjustme.com/archive.org
[07:02] Power outage
[07:02] SketchCow: *nod*
[07:37] Yeah, huge power outage.
[07:37] We're clocked.
[07:45] Good luck.
[07:46] damn, that's one strong ddos
[08:03] Oh? I suppose I should stop the uploads I just started
[08:08] Power is out.
[08:08] I'd call it a night if you wanted to do archive stuff.
[08:08] my big problem right now is my flippy disk arrived.
[08:09] And it buzzes like crazy when it gets power.
[08:09] Not good.
[08:09] DOA? That'd be sad.
[08:33] aww buzz, you can't get your kitten to rip the disks while you sleep
[08:46] good thing we have enough storage to keep archiving without archive.org :p
[08:48] SketchCow: archive.org doesn't have diesel backup power? or is the upstream provider also down?
[08:59] The whole part of the city was out
[09:05] so who stuck a fork in the socket?
[09:20] midas: A crispy crispy man if that was the result.
[09:20] true true
[09:21] ?
[09:21] midas: can you create project-scoped rsync shares on your upload target
[09:21] we need more space for bebo and viddler too
[09:21] sure
[09:21] thanks
[09:25] :viddler and :bebo are up
[09:26] ok
[09:27] it's limited to 200 users atm, not sure how much this atom will take... will up it to 500
[09:27] is that /rsync/viddler or /viddler?
[09:27] should be /viddler
[09:27] ok
[09:27] created two extra pools
[09:28] can you confirm that /bebo and /viddler are filling up?
[09:28] yep
[09:28] both are
[09:28] cool
[09:28] ok, that'll give some time for archiveteam.kenshin.sg to breathe
[09:28] i'll up the limit to 500 and by the end of the day i'll start building the megawarc box at my fiber box
[12:26] I have gotten the files from the temp FP link. I'm ready for the next batch of ut3 mods. Also using FDM now.
[13:11] https://news.ycombinator.com/item?id=7289296
[13:11] anybody have an archive?
[13:13] Twitter. Library of Congress.
[13:14] heh
[13:14] just noticed that mithrandir posted some stuff
[13:21] mtgox is killing it
[13:21] +self ;)
[13:36] that could be bad
[13:39] That would be a discussion for #archiveteam-bs though.
[15:14] http://blogs.loc.gov/digitalpreservation/2014/02/getting-public-radios-legacy-off-ageing-rewritable-cds-an-interview-with-wnycs-john-passmore/
[15:47] SketchCow: can we get a viddler archiveteam collection?
[15:48] schbirid?
[15:48] hi there
[15:48] i ran out of space :)
[15:48] well I got the ones you placed on the temp FP
[15:49] and they seem to be good.
[15:49] and using the FDM :>
[15:49] sweet!
[15:49] ok, will grab the rest (or as much as possible again)
[16:03] yipdw: Shortly, yes.
[17:50] SketchCow: cool, thanks
[17:53] Made. You're the one with the + in the e-mail, right?
[17:59] yeah
[18:13] archiveteam_viddler is now co-owned with you
[18:14] Viddler wants to talk to me about the "overload"
[18:14] We're apparently killing Viddler.
[18:14] I thought they were working on that
[18:14] maybe they don't like us inching in on their turf
[18:20] I left a message with your company to call me.
[18:20] I would appreciate your attention to this matter that is literally taking money out of my pocket. I am not a stake holder or owner, I'm hard working person that you are directly negatively effecting. I don't believe this was your original purpose or intent, but it's the reality right now.
[18:20] Jason
[18:20] Thank you and I look forward to speaking to you.
[18:20] Bernie
[18:21] SketchCow: ...ouch
[18:21] the reality is that Bernie used the wrong word
[18:21] it's affecting him
[18:21] Shhh, he also used the words "I am not a stake holder"
[18:21] Which is demonstrably false.
[18:21] I don't hold the stake, I just drive it in
[18:22] heh
[18:22] literally taking money out of my pocket
[18:22] Archive Team: Robbing The Rich To Give To The Poor
[18:23] I wonder if that was indirectly due to the Kenshin cannon
[18:24] what's that?
[18:25] it's when Kenshin uses an ISP to power a bunch of warriors
[18:25] Oh, I'm sure it's hurting them.
[18:25] But Bernie is an engineer.
[18:25] he pays the aws bill!
[18:25] personally!
[18:25] I just wrote back going "please tell me what you mean by 'taking money out of your pocket'".
[18:26] asking the hard questions
[18:27] i murdered viddler?
[18:28] ohhhh Kenshin is a person
[18:28] * Smiley renames Kenshin to RobinInDaHud.
[18:28] uuh Kenshin exists
[18:28] I thought yipdw was making some weird anime reference
[18:29] * xmc returns to worklike activities
[18:29] why would archiving viddler affect someone who has no stake in viddler
[18:29] *scratches head*
[18:30] Nemo_bis: of course i exist :P
[18:30] you looked like a myth or something
[18:31] i just don't talk much :)
[18:34] xmc: nah, if I were doing that I'd call it the Macross or something
[18:49] * ersi laughs at Bernie
[19:06] Ha ha.
[19:07] OK, so Bernie and I just "talked"
[19:07] It turns out, Bernie and I both get what we want by yelling and not backing down.
[19:07] So.
[19:07] I've called a truce. No Viddler downloads for 3 days.
[19:08] So who is this non-stake holder?
[19:08] We said a lot of things.
[19:08] He's the lead tech
[19:08] He uses words and legal terms wrong, whatever.
[19:08] Oh, so it's his baby sort of
[19:08] Upshot is, we cost viddler, supposedly, $12k in bandwidth
[19:09] nice
[19:09] And we supposedly caused three customers to drop them, but who knows.
[19:09] so will they willingly give us their data
[19:09] Outlook cloudy
[19:09] Anyway, we hit the CDN directly.
[19:09] or do we need to pry it from them
[19:09] not to be a dick or something, but why do we care?
[19:09] We're getting awesomeness from the CDN
[19:09] hmm? are we?
[19:09] Oh, bear in mind, I'm just passing along what was said.
[19:09] doesn't look like cdn when i started digging
[19:10] It's a shitty CDN
[19:10] No access controls.
[19:10] ah, so YOU will stop for 3 days SketchCow ;)
[19:10] rly
[19:10] No, we'll all stop.
[19:10] videos not stored on cdn i think
[19:10] I'm going to negotiate with them.
[19:10] when i checked the IPs it was owned by viddler
[19:10] But we were in pure reaction mode, me and them.
[19:10] The issue is they have three kinds of customers.
[19:10] Free Vloggers
[19:10] Paid Vloggers
[19:10] Vendor Customers
[19:11] which type bailed?
[19:11] We of course have no way to tell who is who, so we were going linearly
[19:11] Free Vloggers are being kicked to the curb.
[19:11] Apparently the free vlogger stuff is going into Amazon Glacier
[19:12] But he was terrified about telling the world that, a pretty distinct difference of philosophy
[19:12] He goes "I know this is important and a big thing to you, getting our stuff"
[19:12] I went "Dude, you are MONDAY."
[19:12] In case you're wondering if I get digs in, etc.
[19:12] how can you run a video site and pay that much for bandwidth
[19:13] You obviously can't.
[19:13] we only grabbed 1.5TB
[19:13] i call bull. 1.7TB
[19:13] i do that shit on my mobile phone
[19:14] besides, if he puts it in amazon Glacier it will be even more expensive for viddler to retrieve it; we are doing them a service
[19:16] service-as-a-service
[19:18] Sorry, phone calls with lawyers
[19:20] DFJustin: The assumption is that no one watches those free vloggers
[19:20] ie they don't have to pay for 'em
[19:20] :D
[19:20] but then we come along, and fuck them right in the wallet
[19:21] well if 1.6T did this the paid vlogs aren't watched either
[19:23] probably aren't, but doesn't matter - they get money for those
[19:32] "but then we come along, and fuck them right in the wallet"
[19:41] I'll take this to #viddler.
[19:43] midas: how do things look on the hdd front?
[19:45] Schbirid: been busy, very busy. i'll let you know by the end of the week
[19:45] i have stacks of old drives, but no new drives inbound atm :p
[19:45] ok :)
[19:46] Viddler not slowing down, they say
[19:51] Can you not kill them all yourself?
[20:09] it will be fun, they said
[20:12] dud1: I don't know how I'd do that
[20:12] that'd require me to have access on each fetch node
[20:12] what you mean there isn't a root backdoor on the warrior
[20:13] I don't work for the NSA, I am not l33t enough
[20:17] goto fail
[20:20] also, yeah, the viddler downloader actually does go straight to the CDN
[20:20] https://github.com/ArchiveTeam/viddler-grab/blob/master/riddler.py pulls that off
[20:24] Yeah
[21:01] hello
[21:01] I am trying to find out whether the actual file was archived and not just a link through the internet archive database?
[21:04] the content is freeware so that is not an issue but I keep getting redirects or the message that the server the file is on is down?
[21:05] can anyone please help me? thanks
[21:09] hello? anyone?
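The skepticism about the $12k figure can be put in numbers. Taking the channel's figures at face value ($12,000 of bandwidth against the 1.5, 1.6 or 1.7 TB quoted as grabbed), the implied price is roughly seven to eight dollars per gigabyte. The sketch assumes decimal units (1 TB = 1000 GB) and that the whole bill came from this grab; the log establishes neither.

```python
CLAIMED_COST_USD = 12_000       # "$12k in bandwidth", as relayed in the channel
GRABBED_TB = [1.5, 1.6, 1.7]    # the three totals mentioned in the log


def cost_per_gb(cost_usd, terabytes):
    """Implied price per GB, assuming decimal units (1 TB = 1000 GB)."""
    return cost_usd / (terabytes * 1000)


for tb in GRABBED_TB:
    print(f"{tb} TB -> ${cost_per_gb(CLAIMED_COST_USD, tb):.2f}/GB")
```

This prints about $8.00, $7.50 and $7.06 per GB respectively.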
[21:12] that's not how irc works
[21:12] give specifics on your problem and then maybe someone will respond
[21:13] what file, what url, etc
[21:14] but you may have to leave the window open for a while and come back since the exact person who can help you may not be on within the same 5 minute window as you
[21:14] here is the link to one of the files I am talking about http://wayback.archive.org/web/20110623013539/http://www.ustrainz.com/files/ACECorona.cdp
[21:15] so going to the "all dates" view it appears that url was only archived once http://web.archive.org/web/*/http://www.ustrainz.com/files/ACECorona.cdp
[21:15] and that time was a redirect
[21:15] so it would appear no, the file was not archived, at least under that url
[21:16] is there a way to see if the actual file was saved/archived?
[21:16] isn't that what I just said
[21:17] this is the main site "archived": http://wayback.archive.org/web/*/http://www.ustrainz.com/*
[21:21] the file extension is .cdp and all that was "backed up" is freeware, and I have been able to download quite a few of the files, for which I am grateful, but it is a pity that a number of the files weren't properly archived. :(
[21:23] Yeah, we know how URLs work man
[21:23] and see DFJustin's earlier answer
[21:24] archive.org's great and all, but they can't catch all material that is only up briefly. That's why this ragtag team of digital anarchy pack rats exists
[21:24] we grab shit, stuff it places (like many times up to archive.org)
[21:25] yeah you'll have to try google and hit up community forums or something
[21:25] is there a way to actually see if the actual .cdp was archived and not a redirect, etc.?
[21:26] if you fill in a * for the date like I did above, you can see how many times the url was archived
[21:26] it looks like the actual content was only up until 2007 or so, so grabs after that are useless
[21:27] unfortunately I can't get the files elsewhere, as the company made a controversial move by removing free limited access to the database the files were moved to, and I can't afford access atm :(
[21:27] will try that
[21:27] if you go to http://wayback.archive.org/web/*/http://www.ustrainz.com/* and search for .cdp you can see the earliest archive date for each of the files
[21:29] @DFJustin I saw that
[22:27] "And I am very lucky that I can demonstrate my lack of involvement in the spam links, thanks to numerous screenshots from the Internet Archive (and thank goodness for the Internet Archive)." http://www.sportsmediawatch.com/2014/02/how-google-nuked-sports-media-watch-for-a-crime-it-did-not-commit/
[22:29] logout
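The `*`-for-the-date trick above shows captures in the browser. The Wayback Machine's CDX search endpoint (`http://web.archive.org/cdx/search/cdx`) returns the same capture list with each capture's HTTP status code, which answers the "real file or redirect?" question directly. The sketch assumes the endpoint's JSON output, where the first row names the fields and every later row is one capture. It keeps HTTP 200 captures of `.cdp` URLs and drops redirects. The CDX server can also filter on its side (its `filter` parameter); filtering locally just keeps the sketch short. The `Example.cdp` row in the test data is made up.

```python
import json
import urllib.request
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(url_prefix):
    """Build a CDX search URL listing every capture under a URL prefix."""
    params = {"url": url_prefix, "matchType": "prefix", "output": "json"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def real_captures(cdx_json, suffix=".cdp"):
    """Return (timestamp, original URL) for captures of the actual files.

    Rows whose status code is not 200 (redirects, errors) are skipped, as
    are URLs that don't end with the given suffix.
    """
    if not cdx_json.strip():
        return []
    rows = json.loads(cdx_json)
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    records = [dict(zip(header, row)) for row in captures]
    return [
        (r["timestamp"], r["original"])
        for r in records
        if r["statuscode"] == "200" and r["original"].lower().endswith(suffix)
    ]


def fetch_real_captures(url_prefix, suffix=".cdp"):
    """Query the live endpoint (needs network access) and filter the result."""
    with urllib.request.urlopen(cdx_query_url(url_prefix)) as response:
        return real_captures(response.read().decode("utf-8"), suffix)
```

For the site discussed above, `fetch_real_captures("www.ustrainz.com/files/")` would list any `.cdp` file captured as real content. Given the note that the files were only up until about 2007, the earliest timestamps are the ones worth checking.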