#archiveteam 2014-02-24,Mon

Time Nickname Message
02:34 🔗 namespace Do we have rave archive? I see it on wayback machine but the download links don't work.
02:47 🔗 namespace http://www.unlambda.com/cadr/index.html
02:48 🔗 namespace Context: Lisp Machine OS tapes and emulator.
02:49 🔗 namespace Link isn't even touched by wayback.
02:52 🔗 DFJustin looks like it's all in wayback to me?
02:52 🔗 namespace DFJustin: Maybe I need to type in the full URL?
02:52 🔗 DFJustin http://web.archive.org/web/20130517041504/http://www.unlambda.com/cadr/index.html
02:53 🔗 namespace Nope you're right, we've got it.
02:53 🔗 namespace How about rave archive? The download links weren't working when I checked it. I think they might rely on some JS poopoo or something.
02:53 🔗 DFJustin link?
02:54 🔗 namespace ravearchive.com
02:56 🔗 DFJustin yeah it's only got a handful of the actual files http://web.archive.org/web/*/http://ravearchive.com/*
02:56 🔗 DFJustin (filter for "mp3")
02:56 🔗 namespace Like wayback has the site, but if you try to download anything the links aren't there.
02:56 🔗 namespace Of course the downloads are the valuable portion.
03:02 🔗 namespace Interesting.
03:02 🔗 namespace I can access the mp3's directly if I type in the link from the download page.
03:03 🔗 namespace I wonder if the server will give me a listing of files in each directory.
03:03 🔗 namespace Then we could just write a script to grab them.
03:04 🔗 namespace The names seem to follow a standard format too.
03:07 🔗 namespace The artist name, then an underscore followed by a dash, then another underscore, then the name of the mix tape with each word separated by underscores.
03:07 🔗 namespace So this is very grabbable, I wonder what archivebot was having trouble with.
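The naming scheme namespace describes suggests filenames shaped like `Artist_-_Mix_Tape_Name.mp3`. Below is a minimal Python sketch that splits such a name back into artist and title. It assumes exactly that layout: the `_-_` separator comes from the description above, and the function name `parse_mix_filename` is hypothetical. Nothing here has been checked against the site's actual files.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Assumed separator: underscore, dash, underscore (from the chat description).
SEPARATOR = "_-_"


def parse_mix_filename(filename: str) -> Optional[Tuple[str, str]]:
    """Split 'Artist_-_Mix_Name.mp3' into ('Artist', 'Mix Name').

    Returns None when the name does not follow the assumed pattern.
    """
    stem = PurePosixPath(filename).stem  # last path component, extension dropped
    artist, sep, title = stem.partition(SEPARATOR)
    if not sep or not artist or not title:
        return None
    # Words are joined with underscores; turn them back into spaces.
    return artist.replace("_", " "), title.replace("_", " ")
```

A grab script could run each name from a directory listing through this function and flag anything that returns `None` for a manual look.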
03:10 🔗 namespace DFJustin: My only concern is that they mention in their last news update that their monthly hosting transfers are like 800gigs, since this job would be in the tens of gigabytes at least, I wouldn't want to put too much strain on the guy operating the site.
03:28 🔗 DFJustin oh he's actually looking for better hosting
03:28 🔗 DFJustin this is exactly the kind of guy jason likes to help https://twitter.com/textfiles/status/383242724599558144
03:28 🔗 namespace DFJustin: Oh, right, silly me.
03:29 🔗 DFJustin because IA can just host the downloads directly
03:29 🔗 namespace Perfect.
03:35 🔗 godane i'm doing this but it's not working: wget -r -l 0 -np -nc ftp://ftp.qmags.com/ --accept-regex='(.pdf)' --reject-regex='(\.exe|\.zip|\.sea.hqx)'
03:36 🔗 godane when i try to use that, the exe and sea.hqx files still download
03:36 🔗 godane but i just want to grab the pdf
03:38 🔗 balrog what's in the sae.hqx and the zip and the others?
03:38 🔗 balrog usually for ftp servers I just use lftp's mirror to mirror the entire thing
03:38 🔗 balrog sea.hqx**
03:39 🔗 dashcloud hqx is an older mac compression format I believe- for OS 9 and earlier I'm pretty sure
03:40 🔗 balrog hqx is an encoding format, yes; sea is basically sit
03:40 🔗 balrog which is stuffit compression format
03:40 🔗 balrog the unarchiver supports all these
03:42 🔗 dashcloud for your regex, either you made a typo here or in the command itself when using it- unless there's actually files with .sea.hqx, you probably wanted \.sea|\.hqx
03:43 🔗 DFJustin it is common for files to have both those extensions
03:43 🔗 balrog personally I'd just use lftp to mirror the entire ftp server
03:43 🔗 balrog rather than wget and exclude things other than pdfs
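Before rerunning a long mirror, it can help to check the patterns offline. According to the wget manual, `--accept-regex` and `--reject-regex` match against the complete URL. In `(.pdf)` the unescaped `.` matches any character, so that pattern also accepts names such as `xpdf-readme.txt`. The sketch below uses Python's `re` as a stand-in for wget's default POSIX regex engine. For simple patterns like these the two should behave the same, but that is an assumption. The anchored patterns and the `wanted` helper are hypothetical. The sketch only tests the patterns; it does not explain why wget still fetched the `.exe` files.

```python
import re

# Pattern as typed in the command above: '.' matches ANY character.
ACCEPT_ORIGINAL = re.compile(r"(.pdf)")
# Hypothetical anchored alternatives: literal dot, end of URL, any case.
ACCEPT_ANCHORED = re.compile(r"\.pdf$", re.IGNORECASE)
# '.sea.hqx' ends in '.hqx', so this also covers the double extension.
REJECT = re.compile(r"\.(exe|zip|sea|hqx)$", re.IGNORECASE)


def wanted(url: str, accept: re.Pattern, reject: re.Pattern) -> bool:
    """Keep a URL only if it matches accept and does not match reject."""
    return accept.search(url) is not None and reject.search(url) is None
```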
03:47 🔗 namespace DFJustin: So when I find stuff like that what do I do? Email Jason?
03:47 🔗 namespace (Where they're obviously looking for hosting.)
03:55 🔗 DFJustin yeah that works
04:09 🔗 * namespace nearly just pulled a stupid with DD again on his backup drive
04:09 🔗 namespace Is there a FOSS program that does the same thing without being ridiculously easy to screw up?
04:09 🔗 balrog ddrescue?
04:09 🔗 balrog I always use it
04:09 🔗 garyrh ddrescue is great
04:10 🔗 namespace I'm not trying to rescue anything, just copy the disk.
04:10 🔗 namespace *clone
04:10 🔗 balrog still
04:10 🔗 balrog it can be used for that
04:11 🔗 namespace balrog: And it won't lend itself to borderline retarted behavior like destroying a directory if I forget to specify that I want the data to go into a file inside the directory as opposed to overwriting the directory?
04:11 🔗 balrog sorry what do you mean
04:11 🔗 balrog usually with dd you screw up when writing to a raw device
04:12 🔗 namespace balrog: dd if=/dev/mydisk of=/media/my-backup/drives/main/directory
04:12 🔗 balrog ddrescue: Output file exists and is not a regular file.
04:13 🔗 namespace balrog: Yeah, I need to test to see if dd outputs the same message.
04:13 🔗 namespace Doubt it.
04:13 🔗 balrog nope
04:13 🔗 balrog dd is pretty basic.
04:13 🔗 balrog ddrescue can also be stopped and restarted, and logs bad sectors
04:13 🔗 balrog (and can work around them)
04:13 🔗 namespace Ooh, interesting.
04:13 🔗 balrog yeah, it's a very powerful tool
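The safeguard balrog quotes from ddrescue ("Output file exists and is not a regular file") is easy to illustrate. Below is a minimal Python sketch of a chunked copy that refuses to write when the destination already exists as a directory, device, or anything else that is not a regular file. The function `safe_clone` is hypothetical. It is not a replacement for ddrescue: it has no bad-sector handling, no log, and no resume.

```python
import os
import stat


def safe_clone(src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst in chunks, refusing to touch a non-regular dst.

    Returns the number of bytes copied.
    """
    try:
        st = os.stat(dst)
    except FileNotFoundError:
        pass  # destination does not exist yet: safe to create it
    else:
        if not stat.S_ISREG(st.st_mode):
            raise ValueError(f"{dst}: output file exists and is not a regular file")

    copied = 0
    # "wb" truncates an existing regular file before writing.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(chunk_size)
            if not block:
                break
            fout.write(block)
            copied += len(block)
    return copied
```

With this check, the mistake namespace describes (passing a directory as the output path) fails loudly instead of writing anything.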
04:14 🔗 godane i found the images to magazines published by qmag: http://img.qmags.com/SIPR/SIPR0112/thumbnails/size10/SIPR_0112_0011.jpg
04:14 🔗 godane the images put together in a cbz will be +100mb each
04:14 🔗 godane vs the 10-20mb pdf they have
04:15 🔗 godane also no borders in the images
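A CBZ is conventionally just a ZIP archive of page images, so its size is roughly the sum of the JPEGs. That fits godane's estimate of 100 MB or more per issue, compared with a 10-20 MB PDF. Below is a minimal Python sketch of the packing step, assuming the pages are already downloaded as local files. The function name `pack_cbz` is hypothetical. The pages are stored uncompressed because re-compressing JPEG data gains little.

```python
import os
import zipfile
from typing import Iterable, List


def pack_cbz(image_paths: Iterable[str], out_path: str) -> List[str]:
    """Store page images in a ZIP-based .cbz; return archive names in order."""
    # Sorting by file name keeps pages in order when page numbers are
    # zero-padded, as in SIPR_0112_0011.jpg.
    ordered = sorted(image_paths, key=os.path.basename)
    names: List[str] = []
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in ordered:
            arcname = os.path.basename(path)
            zf.write(path, arcname=arcname)
            names.append(arcname)
    return names
```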
05:25 🔗 namespace DFJustin: By the way, did you already contact Jason or are we going to wait for him to get on IRC or?
05:25 🔗 DFJustin I haven't done anything
05:26 🔗 namespace DFJustin: What's his mail volume look like? If I send him an email will he read it?
05:26 🔗 SketchCow Jason is never on IRC.
05:27 🔗 namespace SketchCow: True, true. :P
05:27 🔗 namespace SketchCow: You know what we're talking about then?
05:27 🔗 SketchCow Vaguely.
05:27 🔗 SketchCow Let's see what you're about to say and if I'll respond.
05:28 🔗 namespace SketchCow: Okay, basically there's this website called ravearchive.com that's having hosting problems. I wanted to archive it, but apparently you've already had an open offer to help host collections like that.
05:33 🔗 namespace (Their last update in this regard was circa 2011, so the situation may have changed for them.)
05:53 🔗 SketchCow yipdw: Pull all tracker activity off FOS
05:54 🔗 yipdw done
05:54 🔗 yipdw bebo, myopera, and viddler are no longer going to fos
05:56 🔗 namespace FOS?
05:56 🔗 garyrh namespace, fos.textfiles.com is one of the upload servers
05:56 🔗 namespace garyrh: Got it.
05:57 🔗 DFJustin it's the archive team fortress of solitude
05:59 🔗 yipdw the fortress of solitude is currently too crowded
05:59 🔗 SketchCow It is going south, it will likely crash out.
05:59 🔗 SketchCow So kill it, kill rsyncing to it.
05:59 🔗 SketchCow That's just adding, and it WILL fail.
05:59 🔗 yipdw it's totally out of the tracker
06:00 🔗 SketchCow Great.
06:08 🔗 namespace SketchCow: So does ravearchive look interesting at all?
06:41 🔗 namespace 
06:41 🔗 namespace Sorry.
06:41 🔗 namespace https://news.ycombinator.com/item?id=7289296
06:42 🔗 garyrh grabbing what i can of their tweets off of google cache
06:42 🔗 namespace Anything else Mt. Gox related is probably gonna go soon.
06:42 🔗 garyrh putting it on archive.is
06:46 🔗 namespace I'm sure we've already got their website grabbed. *goes to check anyway*
06:49 🔗 namespace http://downforeveryoneorjustme.com/archive.org
07:02 🔗 SketchCow Power outage
07:02 🔗 namespace SketchCow: *nod*
07:37 🔗 SketchCow Yeah, huge power outage.
07:37 🔗 SketchCow We're clocked.
07:45 🔗 namespace Good luck.
07:46 🔗 xmc damn, that's one strong ddos
08:03 🔗 Nemo_bis Oh? I suppose I should stop the uploads I just started
08:08 🔗 SketchCow Power is out.
08:08 🔗 SketchCow I'd call it a night if you wanted to do archive stuff.
08:08 🔗 SketchCow my big problem right now is my flippy disk arrived.
08:09 🔗 SketchCow And it buzzes like crazy when it gets power.
08:09 🔗 SketchCow Not good.
08:09 🔗 SketchCow DOA? That'd be sad.
08:33 🔗 Nemo_bis aww buzz, you can't get your kitten to rip the disks while you sleep
08:46 🔗 midas good thing we have enough storage to keep archiving without archive.org :p
08:48 🔗 midas SketchCow: archive.org doesnt have diesel backup power? or is the upstream provider also down?
08:59 🔗 SketchCow The whole part of the city was out
09:05 🔗 midas so who stuck a fork in the socket?
09:20 🔗 namespace midas: A crispy crispy man if that was the result.
09:20 🔗 midas true true
09:21 🔗 yipdw ?
09:21 🔗 yipdw midas: can you create project-scoped rsync shares on your upload target
09:21 🔗 yipdw we need more space for bebo and viddler too
09:21 🔗 midas sure
09:21 🔗 yipdw thanks
09:25 🔗 midas :viddler and :bebo are up
09:26 🔗 yipdw ok
09:27 🔗 midas it's limited to 200 users atm, not sure how much this atom will take... will up it to 500
09:27 🔗 yipdw is that /rsync/viddler or /viddler?
09:27 🔗 midas should be /viddler
09:27 🔗 yipdw ok
09:27 🔗 midas created two extra pools
09:28 🔗 yipdw can you confirm that /bebo and /viddler are filling up?
09:28 🔗 midas yep
09:28 🔗 midas both are
09:28 🔗 yipdw cool
09:28 🔗 yipdw ok, that'll give some time for archiveteam.kenshin.sg to breathe
09:28 🔗 midas ill up the limit to 500 and by the end of the day ill start building the megawarc box at my fiber box
12:26 🔗 unbeholde I have gotten the files from the temp FP link. I'm ready for the next batch of ut3 mods. Also using FDM now.
13:11 🔗 joepie91 https://news.ycombinator.com/item?id=7289296
13:11 🔗 joepie91 anybody have an archive?
13:13 🔗 ersi Twitter. Library of Congress.
13:14 🔗 joepie91 heh
13:14 🔗 joepie91 just noticed that mithrandir posted some stuff
13:21 🔗 midas mtgox is killing it
13:21 🔗 midas +self ;)
13:36 🔗 BiggieJon that could be bad
13:39 🔗 ersi That would be a discussion for #archiveteam-bs though.
15:14 🔗 SketchCow http://blogs.loc.gov/digitalpreservation/2014/02/getting-public-radios-legacy-off-ageing-rewritable-cds-an-interview-with-wnycs-john-passmore/
15:47 🔗 yipdw SketchCow: can we get a viddler archiveteam collection?
15:48 🔗 unbeholde schbirid?
15:48 🔗 Schbirid hi there
15:48 🔗 Schbirid i ran out of space :)
15:48 🔗 unbeholde well I got the ones you placed on the temp FP
15:49 🔗 unbeholde and seem to be good.
15:49 🔗 unbeholde and using the FDM :>
15:49 🔗 Schbirid sweet!
15:49 🔗 Schbirid ok, will grab the rest (or as much as possible again)
16:03 🔗 SketchCow yipdw: Shortly, yes.
17:50 🔗 yipdw SketchCow: cool, thanks
17:53 🔗 SketchCow Made. You're the one with the + in the e-mail, right?
17:59 🔗 yipdw yeah
18:13 🔗 SketchCow archiveteam_viddler is now co-owned with you
18:14 🔗 SketchCow Viddler wants to talk to me about the "overload"
18:14 🔗 SketchCow We're apparently killing Viddler.
18:14 🔗 xmc I thought they were working on that
18:14 🔗 xmc maybe they don't like us inching in on their turf
18:20 🔗 SketchCow I left a message with your company to call me.
18:20 🔗 SketchCow I would appreciate your attention to this matter that is literally taking money out of my pocket. I am not a stake holder or owner, I'm hard working person that you are directly negatively effecting. I don't believe this was your original purpose or intent, but it's the reality right now.
18:20 🔗 SketchCow Jason
18:20 🔗 SketchCow Thank you and I look forward to speaking to you.
18:20 🔗 SketchCow Bernie
18:21 🔗 balrog SketchCow: ...ouch
18:21 🔗 yipdw the reality is that Bernie used the wrong word
18:21 🔗 yipdw it's affecting him
18:21 🔗 SketchCow Shhh, he also used the words "I am not a stake holder"
18:21 🔗 SketchCow Which is demonstrably false.
18:21 🔗 xmc I don't hold the stake, I just drive it in
18:22 🔗 yipdw heh
18:22 🔗 yipdw literally taking money out of my pocket
18:22 🔗 yipdw Archive Team: Robbing The Rich To Give To The Poor
18:23 🔗 yipdw I wonder if that was indirectly due to the Kenshin cannon
18:24 🔗 xmc what's that?
18:25 🔗 yipdw it's when Kenshin uses an ISP to power a bunch of warriors
18:25 🔗 SketchCow Oh, I'm sure it's hurting them.
18:25 🔗 SketchCow But Bernie is an engineer.
18:25 🔗 xmc he pays the aws bill!
18:25 🔗 xmc personally!
18:25 🔗 SketchCow I just wrote back going "please tell me what you mean by "taking money out of your pocket".
18:26 🔗 xmc asking the hard questions
18:27 🔗 Kenshin i murdered viddler?
18:28 🔗 xmc ohhhh Kenshin is a person
18:28 🔗 * Smiley renames Kenshin to RobinInDaHud.
18:28 🔗 Nemo_bis uuh Kenshin exists
18:28 🔗 xmc I thought yipdw was making some weird anime reference
18:29 🔗 * xmc returns to worklike activities
18:29 🔗 Kenshin why would archiving viddler affect someone who has no stake in viddler
18:29 🔗 Kenshin *scratches head*
18:30 🔗 Kenshin Nemo_bis: of course i exist :P
18:30 🔗 Nemo_bis you looked like a myth or something
18:31 🔗 Kenshin i just don't talk much :)
18:34 🔗 yipdw xmc: nah, if I were doing that I'd call it the Macross or something
18:49 🔗 * ersi laughs at Bernie
19:06 🔗 SketchCow Ha ha.
19:07 🔗 SketchCow OK, so Bernie and I just "talked"
19:07 🔗 SketchCow It turns out, Bernie and I both get what we want by yelling and not backing down.
19:07 🔗 SketchCow So.
19:07 🔗 SketchCow I've called a truce. No Viddler downloads for 3 days.
19:08 🔗 ersi So who is this non-stake holder?
19:08 🔗 SketchCow We said a lot of things.
19:08 🔗 SketchCow He's the lead tech
19:08 🔗 SketchCow He uses words and legal terms wrong, whatever.
19:08 🔗 ersi Oh, so it's his baby sort of
19:08 🔗 SketchCow Upshot is, we cost viddler, supposedly, $12k in bandwidth
19:09 🔗 ersi nice
19:09 🔗 SketchCow And we supposedly caused three customers to drop them, but who knows.
19:09 🔗 Kenshin so will they willingly give us their data
19:09 🔗 ersi Outlook cloudy
19:09 🔗 SketchCow Anyway, we hit the CDN directly.
19:09 🔗 Kenshin or do we need to pry it from them
19:09 🔗 midas not to be a dick or something, but why do we care?
19:09 🔗 SketchCow We're getting awesomeness from the CDN
19:09 🔗 Kenshin hmm? are we?
19:09 🔗 SketchCow Oh, bear in mind, I'm just passing along what was said.
19:09 🔗 Kenshin doesn't look like cdn when i started digging
19:10 🔗 SketchCow It's a shitty CDN
19:10 🔗 SketchCow No access controls.
19:10 🔗 midas ah, so YOU will stop for 3 days SketchCow ;)
19:10 🔗 Kenshin rly
19:10 🔗 SketchCow No, we'll all stop.
19:10 🔗 Kenshin videos not stored on cdn i think
19:10 🔗 SketchCow I'm going to negotiate with them.
19:10 🔗 Kenshin when i checked the IPs it was owned by viddler
19:10 🔗 SketchCow But we were in pure reaction mode, me and them.
19:10 🔗 SketchCow The issue is they have three kinds of customers.
19:10 🔗 SketchCow Free Vloggers
19:10 🔗 SketchCow Paid Vloggers
19:10 🔗 SketchCow Vendor Customers
19:11 🔗 Kenshin which type bailed?
19:11 🔗 SketchCow We of course have no way to tell who is who, so we were going linearly
19:11 🔗 SketchCow Free Vloggers are being kicked to the curb.
19:11 🔗 SketchCow Apparently the free vlogger stuff is going into Amazon Glacier
19:12 🔗 SketchCow But he was terrified about telling the world that, a pretty distinct difference of philosophy
19:12 🔗 SketchCow He goes "I know this is important and a big thing to you, getting our stuff"
19:12 🔗 SketchCow I went "Dude, you are MONDAY."
19:12 🔗 SketchCow In case you're wondering if I get digs in, etc.
19:12 🔗 DFJustin how can you run a video site and pay that much for bandwidth
19:13 🔗 SketchCow You obviously can't.
19:13 🔗 midas we only grabbed 1.5TB
19:13 🔗 Kenshin i call bull. 1.7TB
19:13 🔗 midas i do that shit on my mobile phone
19:14 🔗 midas besides, if he puts in amazon Glacier it will even be more expensive to retrieve it for viddler, we are doing them a service
19:16 🔗 xmc service-as-a-service
19:18 🔗 SketchCow Sorry, phone calls with lawyers
19:20 🔗 ersi DFJustin: The assumption is that no one watches those free vloggers
19:20 🔗 ersi ie they don't have to pay for 'em
19:20 🔗 ersi :D
19:20 🔗 ersi but then we come along, and fuck them right in the wallet
19:21 🔗 midas well if 1.6T did this the paid vlogs arent watched either
19:23 🔗 ersi probably aren't, but doesn't matter - they get money for those
19:32 🔗 xmc "but then we come along, and fuck them right in the wallet"
19:41 🔗 SketchCow I'll take this to #viddler.
19:43 🔗 Schbirid midas: how do things look on the hdd front?
19:45 🔗 midas Schbirid: been busy, very busy. ill let you know by the end of the week
19:45 🔗 midas i have stacks of old drives, but no new drives inbound atm :p
19:45 🔗 Schbirid ok :)
19:46 🔗 SketchCow Viddler not slowing down, they say
19:51 🔗 dud1 Can you not kill them all yourself?
20:09 🔗 ersi it will be fun, they said
20:12 🔗 yipdw dud1: I don't know how I'd do that
20:12 🔗 yipdw that'd require me to have access on each fetch node
20:12 🔗 DFJustin what you mean there isn't a root backdoor on the warrior
20:13 🔗 yipdw I don't work for the NSA, I am not l33t enough
20:17 🔗 Schbirid goto fail
20:20 🔗 yipdw also, yeah, the viddler downloader actually does go straight to the CDN
20:20 🔗 yipdw https://github.com/ArchiveTeam/viddler-grab/blob/master/riddler.py pulls that off
20:24 🔗 SketchCow Yeah
21:01 🔗 maxdamage hello
21:01 🔗 maxdamage I am trying to find out whether the actual file was archived and not just a link through the internet archive database?
21:04 🔗 maxdamage the content is freeware so that is not an issue but I keep getting redirects or the message that the server the file is on is down?
21:05 🔗 maxdamage can anyone please help me? thanks
21:09 🔗 maxdamage hello? anyone?
21:12 🔗 DFJustin that's not how irc works
21:12 🔗 DFJustin give specifics on your problem and then maybe someone will respond
21:13 🔗 DFJustin what file, what url, etc
21:14 🔗 DFJustin but you may have to leave the window open for a while and come back since the exact person who can help you may not be on within the same 5 minute window as you
21:14 🔗 maxdamage here is the link to one of the files I am talking about http://wayback.archive.org/web/20110623013539/http://www.ustrainz.com/files/ACECorona.cdp
21:15 🔗 DFJustin so going to the "all dates" view it appears that url was only archived once http://web.archive.org/web/*/http://www.ustrainz.com/files/ACECorona.cdp
21:15 🔗 DFJustin and that time was a redirect
21:15 🔗 DFJustin so it would appear no, the file was not archived, at least under that url
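DFJustin's check (was the capture the file itself, or a redirect?) can also be scripted. The Wayback Machine exposes its capture index through the CDX server API at `https://web.archive.org/cdx/search/cdx`. With `output=json`, it returns a list of rows whose first row is a header naming the fields (`urlkey`, `timestamp`, `original`, `mimetype`, `statuscode`, `digest`, `length`). The sketch below builds the query and sorts captures by HTTP status. The helper names are hypothetical, and the network fetch is left to the caller.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url: str) -> str:
    """Build a CDX API query listing every capture of one exact URL."""
    return CDX_ENDPOINT + "?" + urlencode({"url": url, "output": "json"})


def split_captures(cdx_json: str) -> Dict[str, List[str]]:
    """Group capture timestamps into 'ok' (200), 'redirect' (3xx), 'other'."""
    rows = json.loads(cdx_json)
    groups: Dict[str, List[str]] = {"ok": [], "redirect": [], "other": []}
    if not rows:
        return groups  # the URL was never captured
    header, *captures = rows
    ts_i = header.index("timestamp")
    status_i = header.index("statuscode")
    for row in captures:
        status = row[status_i]
        if status == "200":
            groups["ok"].append(row[ts_i])
        elif status.startswith("3"):
            groups["redirect"].append(row[ts_i])
        else:
            groups["other"].append(row[ts_i])
    return groups
```

One way to use it is `urllib.request.urlopen(cdx_query_url(u)).read().decode()`, passing the result to `split_captures`. An empty `"ok"` list means the file itself was never saved under that URL.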
21:16 🔗 maxdamage is there a way to see if the actual file was saved\archived?
21:16 🔗 DFJustin isn't that what I just said
21:17 🔗 maxdamage this is the main site "archived": http://wayback.archive.org/web/*/http://www.ustrainz.com/*
21:21 🔗 maxdamage the file extension is .cdp and all that was "back-up" are freeware and I have been able to download quite a few of the files for which I am grateful but it is a pity that a number of the files weren't properly archived. :(
21:23 🔗 ersi Yeah, we know how URLs work man
21:23 🔗 ersi and see DFJustin's earlier answer
21:24 🔗 ersi archive.org's great and all, but they can't catch all material that are up briefly. That's why this rag tag team of digital anarchy pack rats exist
21:24 🔗 ersi we grab shit, stuff it places (like many times up to archive.org)
21:25 🔗 DFJustin yeah you'll have to try google and hit up community forums or something
21:25 🔗 maxdamage is there a way to actually see if the actual .cdp was archived and not a redirect,etc...?
21:26 🔗 DFJustin if you fill in a * for the date like I did above, you can see how many times the url was archived
21:26 🔗 DFJustin it looks like the actual content was only up until 2007 or so, so grabs after that are useless
21:27 🔗 maxdamage unfortunately I can't get the files elsewhere as the company made a controversial move by removing free limited access to the database the files were moved to and I can't afford access atm :(
21:27 🔗 maxdamage will try that
21:27 🔗 DFJustin if you go to http://wayback.archive.org/web/*/http://www.ustrainz.com/* and search for .cdp you can see the earliest archive date for each of the files
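The same site-wide listing can be pulled from the CDX server API with `matchType=prefix`, which returns captures for every URL under a path. The sketch below keeps only URLs ending in `.cdp` and reports the earliest capture that returned HTTP 200, which is roughly what DFJustin suggests reading off the `*` view by hand. The helper names are hypothetical, and the filtering happens client-side rather than through any server-side filter option.

```python
import json
from typing import Dict
from urllib.parse import urlencode


def cdx_prefix_query_url(prefix: str) -> str:
    """Build a CDX API query for every capture whose URL starts with prefix."""
    params = {"url": prefix, "matchType": "prefix", "output": "json"}
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)


def earliest_good_capture(cdx_json: str, suffix: str = ".cdp") -> Dict[str, str]:
    """Map each URL ending in suffix to its earliest HTTP 200 timestamp."""
    rows = json.loads(cdx_json)
    if not rows:
        return {}
    header, *captures = rows
    ts_i = header.index("timestamp")
    url_i = header.index("original")
    status_i = header.index("statuscode")
    earliest: Dict[str, str] = {}
    for row in captures:
        url, ts = row[url_i], row[ts_i]
        if row[status_i] != "200" or not url.lower().endswith(suffix.lower()):
            continue
        # Timestamps are fixed-width YYYYMMDDhhmmss strings, so plain string
        # comparison orders them chronologically.
        if url not in earliest or ts < earliest[url]:
            earliest[url] = ts
    return earliest
```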
21:29 🔗 maxdamage @DFJustin I saw that
22:27 🔗 ivan` "And I am very lucky that I can demonstrate my lack of involvement in the spam links, thanks to numerous screenshots from the Internet Archive (and thank goodness for the Internet Archive)." http://www.sportsmediawatch.com/2014/02/how-google-nuked-sports-media-watch-for-a-crime-it-did-not-commit/
22:29 🔗 maxdamage logout
