#archiveteam 2014-02-23,Sun


Time Nickname Message
00:10 🔗 balrog xmc: :(
00:10 🔗 balrog where was this?
00:10 🔗 balrog ahh ... http://en.wikipedia.org/wiki/University_of_Washington_firebombing_incident
00:11 🔗 xmc bingo
00:23 🔗 balrog yeah what.cd has been under ntp and traditional DDoS since early january.
00:24 🔗 BiggieJon ouch
00:28 🔗 dashcloud here's an article from Brian Krebs on the NTP attacks: http://krebsonsecurity.com/2014/02/the-new-normal-200-400-gbps-ddos-attacks/ (he got hit with 200 Gbps or more)
00:31 🔗 balrog oh yeah
00:31 🔗 balrog if you're on iOS, update now
00:52 🔗 DFJustin ouch, was gonna ask what could impact ia with a 20gbit pipe
01:03 🔗 ivan` if you like f.lux on your iOS don't update ;)
01:03 🔗 ivan` or not. huh. http://appadvice.com/appnn/2014/02/apples-ios-7-0-6-breaks-evasi0n7-but-it-can-easily-be-fixed
01:04 🔗 balrog ivan`: evasi0n has been updated.
01:05 🔗 ivan` yeah, incredible that Apple has let the jb ecosystem be happy for over 2 months already
01:38 🔗 namespace So on the subject of cool obscurish websites, has anybody ever grabbed this?
01:38 🔗 namespace http://www.wideweb.com/phonetrips/
01:38 🔗 dashcloud grabbing it now!
01:39 🔗 namespace dashcloud: Make sure to get the groupbell ftp server, that's where the really cool stuff is.
01:39 🔗 namespace Those realaudio files don't work anymore.
01:41 🔗 namespace ftp://ftp.wideweb.com/GroupBell/
01:42 🔗 namespace (In case you're wondering what they are, old phreaker recordings. Pretty cool to listen to. Also includes my favorite hacker story where Evan Doorbell narrates how he became a phone phreak.)
01:43 🔗 namespace dashcloud: So what tools do you use for this? Is there a wget script you have lying around?
01:44 🔗 dashcloud there is a wget command, but for smaller things, there's ArchiveBot
01:44 🔗 dashcloud which grabs sites in the same way every time, and packages them up for upload to IA, and then put in a collection
01:44 🔗 dashcloud eventually the pages make their way into the Wayback Machine
01:44 🔗 namespace Ah.
01:45 🔗 namespace Does it do ftp?
01:45 🔗 dashcloud no- but there's a standard command set for FTP
01:45 🔗 dashcloud http://archiveteam.org/index.php?title=FTP
01:45 🔗 namespace Yeah, then this probably hasn't been grabbed yet. I searched evan doorbell on archive.org and got zip. Any other ways to figure out if somebody's gotten something yet?
01:46 🔗 dashcloud not really- if you know the site address, you can obviously check wayback, but other than that, no
01:47 🔗 DFJustin pretty much all our grabs our going into wayback with a small delay so that's gonna be your best bet
01:47 🔗 DFJustin *are going
01:47 🔗 DFJustin you can also search for the domain in the archive.org collections
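Beyond browsing Wayback by hand, one way to script the "has anyone captured this?" check is the Wayback Machine's availability endpoint, which answers with JSON describing the closest capture. The sketch below assumes the endpoint `https://archive.org/wayback/available?url=...` and its `archived_snapshots` / `closest` reply shape; the helper names are hypothetical, and it says nothing about whether a full grab (e.g. an uploaded WARC) exists in a collection.

```python
import json
from urllib.parse import urlencode

# Assumption: the public Wayback "availability" endpoint, which replies with JSON
# shaped like {"archived_snapshots": {"closest": {"available": true, "url": ...}}}
# when a capture exists, and {"archived_snapshots": {}} when none does.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(site):
    """Build the query URL that asks the Wayback Machine about `site`."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": site})


def closest_snapshot(response_text):
    """Return the closest capture URL from the JSON reply, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Usage: fetch `availability_url("www.evan-doorbell.com")` with any HTTP client and pass the response body to `closest_snapshot`; `None` only means no single capture was found for that URL, not that the site was never grabbed.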
01:48 🔗 dashcloud I'm not entirely sure what command archivebot is using, but here's ones we've used in the past to grab sites for inclusion into the wayback machine: http://www.archiveteam.org/index.php?title=Wget
01:48 🔗 BiggieJon .ram file ? vlc plays them fine
01:48 🔗 namespace BiggieJon: Yeah but they're streams.
01:48 🔗 namespace Not the actual audio.
01:48 🔗 namespace And the streaming server seems to have died last time I checked.
01:52 🔗 namespace Interesting, I seem to have found another source for these tapes.
01:52 🔗 namespace http://www.evan-doorbell.com/
01:55 🔗 namespace Let's say I wanted to grab the stuff on that site, where would I start?
01:56 🔗 * namespace is reading the linked page on wget now
02:01 🔗 namespace So the wget man page says that -c isn't required to have the current instance of wget try to redownload a file if it stops downloading.
02:01 🔗 namespace But the wiki claims otherwise. Should an edit be made?
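For reference, the wget manual describes `-c` as resuming a partially downloaded file left behind by an earlier run (of wget or another program); retrying within a single run after a dropped connection is the default. The resume itself is an HTTP Range request for the bytes not yet on disk. A minimal Python sketch of that mechanism, assuming the server supports Range requests; `resume_request` is a hypothetical helper, not part of wget.

```python
import os
from urllib.request import Request


def resume_request(url, partial_path):
    """Build a GET request that continues `partial_path` where it stopped.

    Roughly what `wget -c` does across separate runs: if a partial file is
    already on disk, ask the server only for the bytes after it.
    """
    req = Request(url)
    if os.path.exists(partial_path):
        offset = os.path.getsize(partial_path)
        if offset > 0:
            # "bytes=N-" means "from byte N to the end of the file".
            req.add_header("Range", f"bytes={offset}-")
    return req
```

If the server ignores the Range header it answers 200 with the whole file instead of 206 Partial Content, so a caller should check the status before appending to the partial file.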
02:05 🔗 DFJustin you can just have archivebot grab them
02:05 🔗 DFJustin come to #archivebot
02:06 🔗 godane i just did that
02:06 🔗 dashcloud not sure, but do feel free to edit the wiki
02:07 🔗 namespace I thought you said archivebot doesn't do ftp though.
02:07 🔗 dashcloud archivebot doesn't- there's separate wget commands for that
02:08 🔗 godane the files look to be on the website
02:08 🔗 godane not a ftp
02:08 🔗 namespace godane: So archivebot will grab files even if they're like 60MB each off a web server?
02:08 🔗 godane yes
02:09 🔗 namespace godane: And you already started the process for the website I linked?
02:09 🔗 namespace (So I shouldn't go start it again...)
02:09 🔗 DFJustin it's in the queue
02:09 🔗 namespace Okay.
02:10 🔗 namespace How do I add a website to the queue?
02:10 🔗 DFJustin read the topic
02:10 🔗 dashcloud join #archivebot first
02:10 🔗 namespace I did.
02:10 🔗 namespace " This dashboard is a Javascript monstrosity, yo." Nice.
02:15 🔗 namespace So I'm looking at an example of a similar website on archive.org (Ravearchive.com) and it doesn't seem to include the tapes, just the skeleton of the website. Of course ravearchive's value is in the tapes.
02:15 🔗 dashcloud namespace: I've checked the FTP you mentioned, and it's still up- maybe 4-5 GB space needed to download it all. Here's the instructions on downloading an FTP site: http://archiveteam.org/index.php?title=FTP
02:19 🔗 namespace dashcloud: Which one? Wideweb or evan-doorbell?
02:19 🔗 dashcloud wideweb
02:21 🔗 namespace I'm not sure I understand the two lines about tar'ing the files after wgetting them.
02:24 🔗 namespace Okay okay, that third line is making a list of files included.
02:25 🔗 namespace And the second line is making a tar file, but I'm not sure I understand what of. It uses the ftp link of the website, but that should already be handled by wget, unless of course the link is really a directory named after the website.
02:27 🔗 DFJustin yeah wget creates a directory with the domain name
02:27 🔗 namespace DFJustin: Got it. I'd start grabbing but I think dashcloud already said he was.
02:28 🔗 dashcloud if you do that, make sure to check the file listing in case you need to change/remove the user/group part of the file
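The wiki's exact tar commands are not quoted in this log, so the following is only a Python sketch of the same two steps: tar the directory wget created (named after the host, as DFJustin notes) and write a separate listing of what went in. The optional owner scrubbing reflects dashcloud's point about the user/group part; `package_mirror`, `_scrub_owner` and their parameters are hypothetical.

```python
import os
import tarfile


def _scrub_owner(info):
    """tarfile filter: drop local user/group names and ids from each entry."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def package_mirror(mirror_dir, tar_path, listing_path, scrub_owner=True):
    """Tar a wget mirror directory and write a tab-separated listing of its members.

    Assumption: `mirror_dir` is the directory wget created, named after the host
    (for example "ftp.wideweb.com"), so paths inside the tar start with that name.
    """
    base = os.path.basename(os.path.normpath(mirror_dir))
    with tarfile.open(tar_path, "w") as tar:
        tar.add(mirror_dir, arcname=base, filter=_scrub_owner if scrub_owner else None)

    # Re-read the finished tar so the listing describes exactly what was archived.
    with tarfile.open(tar_path, "r") as tar, open(listing_path, "w", encoding="utf-8") as out:
        for member in tar.getmembers():
            owner = f"{member.uname or '-'}:{member.gname or '-'}"
            out.write(f"{member.size}\t{owner}\t{member.name}\n")
    return listing_path
```

Writing the listing from the finished tar, rather than from the directory, keeps the two files consistent even if something changes on disk mid-run.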
02:28 🔗 dashcloud I haven't started the FTP site
02:28 🔗 namespace dashcloud: Cool, I got this then.
03:12 🔗 namespace https://news.ycombinator.com/item?id=7284541
03:12 🔗 namespace Interesting.
03:12 🔗 namespace HN is hard to grab because of the Arc server's built-in limits on page loading. Do we already have a copy anyway?
03:16 🔗 namespace Also I'm finished with the wideweb grab, what does Jason Scott mean by adding metadata?
04:16 🔗 namespace Like when I go to upload to archive.org (which I'm basically ready to do once I've got this sorted out), how should the files be arranged? One zip file containing the file list and the tar, or should those be uploaded as two separate files? Is there anything I should add to the tar to describe the files? Or do I add that info on the archive.org website after it's uploaded?
04:21 🔗 chfoo namespace: i think the file list file and tar file should be separate files but uploaded as under one item. info about the item should be entered as the item description.
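chfoo's suggested layout (tar and file list as separate files in one item, description in the item metadata) can be sketched with the `internetarchive` Python package. The helper name, the `mediatype` default and the dry-run switch are assumptions for illustration; a real upload needs the package installed and credentials configured beforehand (for example with `ia configure`).

```python
def upload_grab(identifier, tar_path, listing_path, title, description,
                mediatype="audio", dry_run=True):
    """Upload a grab as one archive.org item holding two separate files.

    The tar and its listing stay separate files; the human-readable
    description goes into the item metadata rather than inside the tar.
    With dry_run=True nothing is sent and the planned call is returned.
    """
    files = [tar_path, listing_path]
    metadata = {"title": title, "description": description, "mediatype": mediatype}
    if dry_run:
        return {"identifier": identifier, "files": files, "metadata": metadata}
    # Assumes `pip install internetarchive` and configured IA credentials.
    import internetarchive
    return internetarchive.upload(identifier, files=files, metadata=metadata)
```

Keeping the listing as its own file means people can read it in the browser without downloading a multi-gigabyte tar.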
04:22 🔗 namespace chfoo: Thanks.
04:22 🔗 namespace Doing it now.
04:26 🔗 SketchCow s3 seems back.
04:28 🔗 godane i hope so
04:28 🔗 godane cause i'm uploading items for cnnmoney archive
04:29 🔗 namespace SketchCow: Hey, how do I get involved in the bitsavers/etc metadata project?
04:29 🔗 namespace I have experience in this area, sort of.
04:29 🔗 godane but looks like it since my new items are appearing there
04:35 🔗 SketchCow Well, bitsavers does itself except the descriptions.
04:36 🔗 namespace SketchCow: Yes, I meant the entire metadata thing you were talking about being a bottleneck back in 2011, and more recently.
04:36 🔗 namespace You said you had all this stuff with no metadata, and needed people to help tag it.
04:36 🔗 SketchCow Do you mean you specifically want to do metadata for bitsavers or general metadata in a variety of collections I have?
04:37 🔗 SketchCow Just understanding your interests.
04:37 🔗 namespace SketchCow: General Metadata.
04:37 🔗 SketchCow It's lonely work.
04:37 🔗 namespace SketchCow: I know, I've sat hunched over a table for hours at a time taking photos of books.
04:38 🔗 SketchCow I'm thinking of something you can take on.
04:38 🔗 namespace Okay.
04:38 🔗 SketchCow Do you have an internet archive.org account?
04:38 🔗 namespace Yes.
04:38 🔗 namespace Just signed up.
04:39 🔗 namespace (It's 'bandpass')
04:40 🔗 dashcloud SketchCow: is the Apple II Asimov stuff mirrored on IA somewhere such that I can point to individual files, rather than having to link to the FTP?
04:40 🔗 SketchCow Well, it's in a big collection that can be linked to within
04:43 🔗 dashcloud okay
04:43 🔗 SketchCow namespace: Do you want to describe items you can read, or research metadata from the world?
04:43 🔗 namespace SketchCow: What would the latter look like?
04:43 🔗 SketchCow http://ia601704.us.archive.org/zipview.php?zip=/13/items/asimov.apple.archive.2013.03/asimov.apple.archive.2013.03.zip
04:44 🔗 SketchCow dashcloud: Example:
04:45 🔗 SketchCow http://archive.org/download/asimov.apple.archive.2013.03/asimov.apple.archive.2013.03.zip/apple_II%2Fimages%2Fgames%2Ffile_based%2Fdiamondmine_dungbeetles_ladytut_minotaur.dsk
04:45 🔗 dashcloud that's pretty awesome
04:46 🔗 DFJustin yeah just add a / on the end of any zip file on an archive.org item to get a browse view
04:46 🔗 DFJustin they really need to add a link on the page to that
04:46 🔗 SketchCow That's why I make these sites zipped and sitting around.
04:46 🔗 namespace How about tar files?
04:46 🔗 DFJustin works on zip, tar, and iso/cdr
04:46 🔗 SketchCow They're huge ice blocks I can use
04:47 🔗 DFJustin does not work on tar.*, rar, bin
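The two link styles above (a trailing `/` for a browse view, and a direct link to one file inside a zip) can be built programmatically. A sketch with hypothetical helper names; the supported-format list just restates DFJustin's zip/tar/iso/cdr versus tar.*/rar/bin observation, and the encoding of the inner path follows SketchCow's example link, where slashes appear as `%2F`.

```python
from urllib.parse import quote

# Formats said to get a browse view when "/" is appended: zip, tar, iso/cdr.
# Compressed tarballs (tar.*), rar and bin reportedly do not.
BROWSABLE_SUFFIXES = (".zip", ".tar", ".iso", ".cdr")


def can_browse(filename):
    return filename.lower().endswith(BROWSABLE_SUFFIXES)


def browse_url(identifier, archive_name, base="https://archive.org/download"):
    """URL of the directory-style browse view of an archive file inside an item."""
    if not can_browse(archive_name):
        raise ValueError(f"{archive_name!r} is not a browsable archive type")
    return f"{base}/{identifier}/{quote(archive_name)}/"


def member_url(identifier, archive_name, member_path, base="https://archive.org/download"):
    """Direct link to one file inside the archive, with the inner path fully
    percent-encoded (slashes become %2F), as in the example link above."""
    return f"{base}/{identifier}/{quote(archive_name)}/{quote(member_path, safe='')}"
```

With `base="http://archive.org/download"`, `member_url("asimov.apple.archive.2013.03", "asimov.apple.archive.2013.03.zip", "apple_II/images/games/file_based/diamondmine_dungbeetles_ladytut_minotaur.dsk")` reproduces SketchCow's link exactly.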
04:47 🔗 SketchCow namespace: Example: https://archive.org/details/computermagazines <--- description of each magazine
04:48 🔗 SketchCow Not each ISSUE, just each title, so it has some context
04:51 🔗 namespace Hmm. So you mean I'd have to go research what these magazines are?
04:52 🔗 namespace I don't see how that's different from just reading them and then describing the contents.
04:52 🔗 namespace Or rather, why that approach wouldn't work here.
04:52 🔗 dashcloud check out this collection of magazines: http://archive.org/search.php?query=collection%3Abig-k-magazine&sort=-publicdate
04:53 🔗 namespace dashcloud: What about them?
04:53 🔗 dashcloud they have metadata entered for them- the table of contents page(s) fully transcribed
04:53 🔗 namespace And you want me to do that to a different collection of magazines?
04:54 🔗 dashcloud if you're interested in that sort of thing
04:54 🔗 namespace dashcloud: Okay, I think I can do that. Not right now though.
04:54 🔗 dashcloud I think the other thing SketchCow was talking about is metadata work on projects like Console Living room or Business Case
04:55 🔗 * namespace idly wonders if we could kickstart a fund to pay people on mechanical turk or whatever to do this
04:57 🔗 dashcloud if you're more interested in digging up information on items, and researching a thing, rather than describing a thing, you might be interested in working on something like the Business Case: https://archive.org/details/businesscase
06:22 🔗 namespace https://archive.org/details/2014.02.ftp.wideweb.com
06:22 🔗 namespace Any suggestions?
06:22 🔗 * namespace has to go do stuff, will be back in a bit
06:53 🔗 xmc looks good to me, namespace
06:53 🔗 xmc thanks for grabbing that
07:02 🔗 namespace xmc: I think I'm gonna add a short list of some of the notable items featured within. (3.8 gigs is a lot of material after all)
07:02 🔗 xmc ure
07:02 🔗 xmc sure
07:08 🔗 namespace Also, I messed up on what category to put it in (should go into "computers and technology")
07:08 🔗 namespace (Put it into community audio by accident.)
07:08 🔗 namespace But the editor won't let me change it.
08:11 🔗 DFJustin as a new user you can only put stuff in the community areas
08:25 🔗 namespace DFJustin: How long does that take to wear off? Can I get an upgrade?
08:27 🔗 DFJustin oh I phrased that poorly, it's as a non-admin user
08:28 🔗 DFJustin an admin can move items on your behalf or specifically grant you access to collections on a case by case basis
08:28 🔗 namespace DFJustin: Got it.
08:29 🔗 DFJustin but it's more geared towards small specific collections of stuff you uploaded; computers and tech is a big general area that you wouldn't get access to
08:30 🔗 DFJustin but if you upload 100 episodes of a computer podcast or something then you can get a collection made for that under computers and tech
08:30 🔗 namespace Okay.
15:03 🔗 unbeholde guys I have quite a problem. After taking the advice of you guys I got the UT3 mods, but of the full list of files I drew up, several have come up corrupted (after fully downloading). I tried to download them twice (once with Chrome and once with Download Accelerator Plus). I've placed all the archive site links that show up as corrupted in a text file: http://depositfiles.com/files/31bq5xkpn
15:04 🔗 Schbirid please post text on sites like https://pastee.org/ or pastebin.com or else
15:06 🔗 unbeholde ...ok.. https://pastee.org/n5ckj
15:07 🔗 Schbirid hm, getting a 502 error on that
15:09 🔗 unbeholde ugh fine I'll try the other site: http://pastebin.com/kGiwAPqH
15:09 🔗 Schbirid ah, you came to the right guy :)
15:13 🔗 Schbirid i highly suggest NOT using tools like Download Accelerator Plus, they stress servers and can suck
15:13 🔗 Schbirid race.zip is fine for me
15:14 🔗 * Schbirid downloads some more
15:18 🔗 Schbirid prometheus_v3.zip also fine
15:20 🔗 Schbirid talus.zip too
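One way to tell whether a download like race.zip is really corrupted, rather than trusting a download manager's verdict, is to have Python's `zipfile` module re-read every member and check its CRC-32. A minimal sketch; the function name is hypothetical.

```python
import zipfile
import zlib


def zip_is_intact(path):
    """Return True if the zip opens and every member passes its CRC-32 check."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the first bad name, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        # Truncated or mangled archives fail here instead of inside testzip().
        return False
```

A truncated file usually fails at open time (missing central directory), while flipped bytes in the data show up as a CRC mismatch reported by `testzip()`.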
15:20 🔗 Schbirid i will host them elsewhere for you if you promise not to use DAP anymore ;)
15:21 🔗 Schbirid try http://www.freedownloadmanager.org/ but without that stupid multiple connection stuff
15:26 🔗 unbeholde I see. Thank you kind sir.
15:27 🔗 Schbirid https://www.quaddicted.com/files/temp/fp/ they are done when there is a file called "done"
15:27 🔗 Schbirid downloading now
15:42 🔗 Schbirid are those all huge files?
15:45 🔗 Schbirid btw, did you use https://www.quaddicted.com/stuff/temp/fileplanet-postgres.php?filename=test to find them? i havent advertised that as it is work in progress but it is handy
15:46 🔗 Asparagir Need ops in ArchiveBot plz.
15:46 🔗 Asparagir More Ukrainian sites to add...
15:48 🔗 unbeholde yeah that was the recommended way to get em. I shall visit the temp fp once you have made some more progress.
15:49 🔗 Schbirid nice
15:49 🔗 Schbirid if you stay in this channel, i will poke you
16:23 🔗 xmc Schbirid: what does download accelerator plus do?
16:23 🔗 xmc does it open a bunch of connections in parallel?
16:24 🔗 xmc there's a debian package for that, 'axel'
16:24 🔗 Schbirid probably
16:24 🔗 Schbirid yeah, don't use those unless you must
16:24 🔗 Schbirid aria2c is also nice for it
16:45 🔗 joepie91 oh man
16:45 🔗 joepie91 download accelerator plus
16:45 🔗 joepie91 that's still around?
16:45 🔗 * joepie91 usually recommends Orbit for windows
16:52 🔗 Nemo_bis right, I used Orbit too
16:52 🔗 Nemo_bis though that was ages ago
17:08 🔗 joepie91 I remember NetAnt
17:14 🔗 SadDM Anybody know if there's a way to change an item's media type on IA?
17:15 🔗 Nemo_bis metadata api, mostly
17:15 🔗 dashcloud if you're an admin of a collection, yes, otherwise you're pretty much stuck with whatever choices the interface gives you
17:17 🔗 SadDM daaang... not the end of the world I suppose. Thanks.
17:21 🔗 Schbirid * unbeholde has quit (Quit: Page closed)
17:21 🔗 Schbirid oh great
17:21 🔗 dashcloud I guess I should clarify- if IA has marked your book as an image, that you can change; if IA has put your item in a collection and you want to change that collection, you're stuck
17:35 🔗 SadDM dashcloud: that's exactly my case... a book marked as an image. Can I change it through the web somewhere, or do I need to poke around at a lower level (and if so, where should I start)?
17:35 🔗 dashcloud go to your item, and choose edit item
17:36 🔗 SadDM yup... go on
17:36 🔗 dashcloud you want to edit stuff about your item (should be the left option)
17:37 🔗 SadDM uh huh, and the "mediatype" is displayed (as "image") but not editable.
17:38 🔗 dashcloud what's the link to the item on IA?
17:39 🔗 SadDM https://archive.org/details/romeos_quest
17:42 🔗 dashcloud you could try changing the mediatype for the PDF to texts
17:44 🔗 SadDM oh... like on a per-file basis. I'll give that a try in a bit. Either way it doesn't *really* matter, but thanks for the ideas.
18:25 🔗 joepie91 dashcloud: IA thinks my software is all books :(
19:46 🔗 DFJustin you can't change it using the web interface but you should be able to using other methods like https://pypi.python.org/pypi/internetarchive
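With the `internetarchive` package DFJustin links, SadDM's change would be a metadata patch on the item. A sketch, assuming the metadata API accepts a `mediatype` change for an item you uploaded (suggested here but not confirmed in this log) and that credentials are configured; `set_mediatype` and its dry-run flag are hypothetical.

```python
def set_mediatype(identifier, mediatype="texts", dry_run=True):
    """Ask archive.org to change an item's mediatype (e.g. "image" -> "texts").

    With dry_run=True nothing is sent; the identifier and patch are returned
    so they can be checked first.
    """
    patch = {"mediatype": mediatype}
    if dry_run:
        return identifier, patch
    # Assumes `pip install internetarchive` and configured IA credentials.
    import internetarchive
    return internetarchive.modify_metadata(identifier, metadata=patch)
```

For example, `set_mediatype("romeos_quest", dry_run=False)` would send the patch for the item linked above.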
20:06 🔗 arkiver is it possible to set a number of retries for https://pypi.python.org/pypi/internetarchive?
20:12 🔗 arkiver the number of retries for an upload
20:22 🔗 joepie91 :P
20:22 🔗 joepie91 while true
20:22 🔗 ersi Well, I think he means on the command line. The answer is most likely "No"
20:23 🔗 ersi there's no option for retries if I do just `ia upload`
20:24 🔗 arkiver hmm
20:24 🔗 arkiver but for the other s3 uploade there is
20:25 🔗 arkiver but I found out the internetarchive uploader is far more faster then the ias3upload uploader
20:25 🔗 arkiver or is there a way to increase the upload spead of te ias3upload?
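Whatever the uploader, joepie91's `while true` idea can be made gentler with a bounded retry loop and exponential backoff. A generic sketch; `with_retries` is a hypothetical helper, and the upload call it wraps, plus the exception types worth retrying, are up to you.

```python
import time


def with_retries(action, retries=5, first_delay=1.0, backoff=2.0, retry_on=(OSError,)):
    """Call `action()` until it succeeds or `retries` extra attempts are used up.

    Waits first_delay seconds after the first failure, then multiplies the
    delay by `backoff` each time; the last failure is re-raised.
    """
    delay = first_delay
    for attempt in range(retries + 1):
        try:
            return action()
        except retry_on:
            if attempt == retries:
                raise
            time.sleep(delay)
            delay *= backoff
```

Usage: `with_retries(lambda: upload_one_file(path), retries=10)`, where `upload_one_file` stands in for your uploader; pass whatever exception types it raises via `retry_on`. Backing off, rather than hammering in a tight loop, also helps when the S3 endpoint is the thing struggling.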
20:26 🔗 ersi My brain hurts while reading what you wrote
20:27 🔗 arkiver haha
20:27 🔗 ersi Well, you could, if inclined, take a look at how they differ and see what makes one of 'em go faster than the other. I dunno.
22:28 🔗 yipdw arkiver: one of these days you're going to have to learn that there is much more to the Internet than what software you personally are using
22:28 🔗 yipdw in this particular case, IA's S3 endpoint and/or something upstream is getting really fucked right now
22:40 🔗 namespace :P
22:40 🔗 namespace yipdw: My first recommendation for faster speeds would have been to find an IA server close to you and specify that IP.
22:40 🔗 namespace (If the software will let you do that.)
22:41 🔗 namespace (And assuming IA has regional servers.)
22:41 🔗 yipdw because I have never seen one
22:41 🔗 yipdw namespace: if you know of any other IA S3 endpoint that is not s3-lb0.us.archive.org, please tell me what it is
22:41 🔗 namespace yipdw: Ah, suspected as much.
22:41 🔗 yipdw suggesting geographical optimization is a fine idea, but is only useful if such a thing even exists
22:41 🔗 yipdw and it's not the path length, anyway
22:42 🔗 yipdw I mean
22:42 🔗 DFJustin hmm I always just use s3.us.archive.org
22:42 🔗 yipdw https://gist.github.com/yipdw/70b1464deb1a9fd5a093
22:42 🔗 DFJustin which may be a load balancer to more than one endpoint
22:42 🔗 yipdw you really can't get much shorter than that
22:42 🔗 DFJustin but they're all gonna be in the bay area
22:42 🔗 yipdw at least not transatlantic
22:43 🔗 yipdw but it's much slower than what I usually see and I have no idea what it is
22:44 🔗 yipdw it's possible that other paths into IA are fine
22:44 🔗 yipdw actually, they all look okay: https://monitor.archive.org/weathermap/weathermap.png
22:44 🔗 yipdw must be on my end
22:44 🔗 yipdw interesting
22:48 🔗 namespace I'm lucky in that I live right next to Cali.
22:52 🔗 yipdw actually, I wonder if the recent outbreak of NTP DDoS attacks are related
22:52 🔗 yipdw IA got hit, but they're not the only victim
23:01 🔗 namespace yipdw: I think freenode is dealing with that right now.
23:22 🔗 namespace I have my entire high school career's worth of schoolwork sitting in this massive pile, but I'd be stepping on like 1000 people's copyrights if I scanned it and put it up. What should I do with it?
23:22 🔗 namespace (Obviously this wouldn't go on archive, this would be hosted somewhere else.)
23:22 🔗 namespace (As a snapshot of schoolwork in 2010-2014.)
23:25 🔗 dashcloud if you've got any good code bits, put them up
23:25 🔗 namespace dashcloud: Good code bits? No.
23:26 🔗 namespace I have code bits, but I didn't understand how data structures work. (I still don't.)
23:26 🔗 namespace So nothing substantial.
23:26 🔗 namespace I don't think I ever got something to work.
23:26 🔗 namespace That was more trivial than like a function.
23:26 🔗 namespace *less trivial
23:28 🔗 namespace Mainly the value is in having something to look at in terms of what grade school education looks like at the time I went to school at my particular school/district.
23:29 🔗 namespace It probably falls under the category of "weird data" that won't be routinely saved until we have like 256TB hard drives.
23:31 🔗 namespace (I collect tons of "weird data" like this, I recently threw out a little bowl of fortunes from fortune cookies.)
23:31 🔗 namespace (If I'd gone through and counted those out I'd have a lower bound on how many times I've had teriyaki in my lifetime.)
