#archiveteam 2014-02-23,Sun

Time	Nickname	Message
00:10 ^🔗	balrog	xmc: :(
00:10 ^🔗	balrog	where was this?
00:10 ^🔗	balrog	ahh ... http://en.wikipedia.org/wiki/University_of_Washington_firebombing_incident
00:11 ^🔗	xmc	bingo
00:23 ^🔗	balrog	yeah what.cd has been under ntp and traditional DDoS since early january.
00:24 ^🔗	BiggieJon	ouch
00:28 ^🔗	dashcloud	here's an article from Brian Krebs on the NTP attacks: http://krebsonsecurity.com/2014/02/the-new-normal-200-400-gbps-ddos-attacks/ (he got hit with 200 Gbps or more)
00:31 ^🔗	balrog	oh yeah
00:31 ^🔗	balrog	if you're on iOS, update now
00:52 ^🔗	DFJustin	ouch, was gonna ask what could impact ia with a 20gbit pipe
01:03 ^🔗	ivan`	if you like f.lux on your iOS don't update ;)
01:03 ^🔗	ivan`	or not. huh. http://appadvice.com/appnn/2014/02/apples-ios-7-0-6-breaks-evasi0n7-but-it-can-easily-be-fixed
01:04 ^🔗	balrog	ivan`: evasi0n has been updated.
01:05 ^🔗	ivan`	yeah, incredible that Apple has let the jb ecosystem be happy for over 2 months already
01:38 ^🔗	namespace	So on the subject of cool obscurish websites, has anybody ever grabbed this?
01:38 ^🔗	namespace	http://www.wideweb.com/phonetrips/
01:38 ^🔗	dashcloud	grabbing it now!
01:39 ^🔗	namespace	dashcloud: Make sure to get the groupbell ftp server, that's where the really cool stuff is.
01:39 ^🔗	namespace	Those realaudio files don't work anymore.
01:41 ^🔗	namespace	ftp://ftp.wideweb.com/GroupBell/
01:42 ^🔗	namespace	(In case you're wondering what they are, old phreaker recordings. Pretty cool to listen to. Also includes my favorite hacker story where Evan Doorbell narrates how he became a phone phreak.)
01:43 ^🔗	namespace	dashcloud: So what tools do you use for this? Is there a wget script you have lying around?
01:44 ^🔗	dashcloud	there is a wget command, but for smaller things, there's ArchiveBot
01:44 ^🔗	dashcloud	which grabs sites in the same way every time, and packages them up for upload to IA, and then put in a collection
01:44 ^🔗	dashcloud	eventually the pages make their way into the Wayback Machine
01:44 ^🔗	namespace	Ah.
01:45 ^🔗	namespace	Does it do ftp?
01:45 ^🔗	dashcloud	no- but there's a standard command set for FTP
01:45 ^🔗	dashcloud	http://archiveteam.org/index.php?title=FTP
01:45 ^🔗	namespace	Yeah, then this probably hasn't been grabbed yet, I searched evan doorbell on archive.org and got zip. Any other ways to figure out if somebody's gotten something yet?
01:46 ^🔗	dashcloud	not really- if you know the site address, you can obviously check wayback, but other than that, no
01:47 ^🔗	DFJustin	pretty much all our grabs our going into wayback with a small delay so that's gonna be your best bet
01:47 ^🔗	DFJustin	*are going
01:47 ^🔗	DFJustin	you can also search for the domain in the archive.org collections
01:48 ^🔗	dashcloud	I'm not entirely sure what command archivebot is using, but here are ones we've used in the past to grab sites for inclusion into the wayback machine: http://www.archiveteam.org/index.php?title=Wget
01:48 ^🔗	BiggieJon	.ram file ? vlc plays them fine
01:48 ^🔗	namespace	BiggieJon: Yeah but they're streams.
01:48 ^🔗	namespace	Not the actual audio.
01:48 ^🔗	namespace	And the streaming server seems to have died last time I checked.
01:52 ^🔗	namespace	Interesting, I seem to have found another source for these tapes.
01:52 ^🔗	namespace	http://www.evan-doorbell.com/
01:55 ^🔗	namespace	Let's say I wanted to grab the stuff on that site, where would I start?
01:56 ^🔗	*	namespace is reading the linked page on wget now
02:01 ^🔗	namespace	So the wget man page says that -c isn't required to have the current instance of wget try to redownload a file if it stops downloading.
02:01 ^🔗	namespace	But the wiki claims otherwise. Should an edit be made?
02:05 ^🔗	DFJustin	you can just have archivebot grab them
02:05 ^🔗	DFJustin	come to #archivebot
02:06 ^🔗	godane	i just did that
02:06 ^🔗	dashcloud	not sure, but do feel free to edit the wiki
02:07 ^🔗	namespace	I thought you said archivebot doesn't do ftp though.
02:07 ^🔗	dashcloud	archivebot doesn't- there's separate wget commands for that
02:08 ^🔗	godane	the files look to be on the website
02:08 ^🔗	godane	not a ftp
02:08 ^🔗	namespace	godane: So archivebot will grab files even if they're like 60MB each off a web server?
02:08 ^🔗	godane	yes
02:09 ^🔗	namespace	godane: And you already started the process for the website I linked?
02:09 ^🔗	namespace	(So I shouldn't go start it again...)
02:09 ^🔗	DFJustin	it's in the queue
02:09 ^🔗	namespace	Okay.
02:10 ^🔗	namespace	How do I add a website to the queue?
02:10 ^🔗	DFJustin	read the topic
02:10 ^🔗	dashcloud	join #archivebot first
02:10 ^🔗	namespace	I did.
02:10 ^🔗	namespace	" This dashboard is a Javascript monstrosity, yo." Nice.
02:15 ^🔗	namespace	So I'm looking at an example of a similar website on archive.org (Ravearchive.com) and it doesn't seem to include the tapes, just the skeleton of the website. Of course ravearchive's value is in the tapes.
02:15 ^🔗	dashcloud	namespace: I've checked the FTP you mentioned, and it's still up- maybe 4-5 GB space needed to download it all. Here's the instructions on downloading an FTP site: http://archiveteam.org/index.php?title=FTP
02:19 ^🔗	namespace	dashcloud: Which one? Wideweb or evan-doorbell?
02:19 ^🔗	dashcloud	wideweb
02:21 ^🔗	namespace	I'm not sure I understand the two lines about tar'ing the files after wgetting them.
02:24 ^🔗	namespace	Okay okay, that third line is making a list of files included.
02:25 ^🔗	namespace	And the second line is making a tar file, but I'm not sure I understand what of. It uses the ftp link of the website, but that should already be handled by wget, unless of course the link is really a directory named after the website.
02:27 ^🔗	DFJustin	yeah wget creates a directory with the domain name
02:27 ^🔗	namespace	DFJustin: Got it. I'd start grabbing but I think dashcloud already said he was.
02:28 ^🔗	dashcloud	if you do that, make sure to check the file listing in case you need to change/remove the user/group part of the file
02:28 ^🔗	dashcloud	I haven't started the FTP site
02:28 ^🔗	namespace	dashcloud: Cool, I got this then.
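
The tar-and-listing step discussed above can be sketched in Python rather than with the wiki's shell commands, which are not quoted in this log. This is a minimal sketch under assumptions: `pack_mirror` and the output file names are hypothetical, and it assumes wget has already created a directory named after the host. The listing records only size and path, so there is no user/group column to scrub afterwards.

```python
import tarfile
from pathlib import Path


def pack_mirror(mirror_dir, out_dir="."):
    """Tar a mirrored directory and write a plain-text listing of its members.

    mirror_dir is the directory wget created, named after the host
    (for example "ftp.wideweb.com"). Output names are hypothetical.
    """
    mirror = Path(mirror_dir)
    out = Path(out_dir)
    tar_path = out / f"{mirror.name}.tar"
    list_path = out / f"{mirror.name}.tar.txt"

    # Write a plain, uncompressed tar rooted at the host-named directory.
    with tarfile.open(tar_path, "w") as tar:
        tar.add(mirror, arcname=mirror.name)

    # Re-open the finished tar and record one "size<TAB>path" line per member.
    with tarfile.open(tar_path, "r") as tar, \
            open(list_path, "w", encoding="utf-8") as listing:
        for member in tar.getmembers():
            listing.write(f"{member.size}\t{member.name}\n")

    return tar_path, list_path
```

Calling `pack_mirror("ftp.wideweb.com")` in the directory where wget ran would leave a tar and a listing side by side, ready to be uploaded as two files.
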
03:12 ^🔗	namespace	https://news.ycombinator.com/item?id=7284541
03:12 ^🔗	namespace	Interesting.
03:12 ^🔗	namespace	HN is hard to grab because of the Arc server's built-in limitations on page loading. Do we already have a copy anyway?
03:16 ^🔗	namespace	Also I'm finished with the wideweb grab, what does Jason Scott mean by adding metadata?
04:16 ^🔗	namespace	Like when I go to upload to archive.org (which I'm basically ready to do once I've got this sorted out) how should the files be arranged? One zip file containing the file list and the tar, or should those be uploaded as two separate files? Is there anything I should add to the tar to describe the files? Or do I add that info to the archive.org website after it's uploaded?
04:21 ^🔗	chfoo	namespace: i think the file list file and tar file should be separate files but uploaded under one item. info about the item should be entered as the item description.
04:22 ^🔗	namespace	chfoo: Thanks.
04:22 ^🔗	namespace	Doing it now.
04:26 ^🔗	SketchCow	s3 seems back.
04:28 ^🔗	godane	i hope so
04:28 ^🔗	godane	cause i'm uploading items for cnnmoney archive
04:29 ^🔗	namespace	SketchCow: Hey, how do I get involved in the bitsavers/etc metadata project?
04:29 ^🔗	namespace	I have experience in this area, sort of.
04:29 ^🔗	godane	but looks like it since my new items are appearing there
04:35 ^🔗	SketchCow	Well, bitsavers does itself except the descriptions.
04:36 ^🔗	namespace	SketchCow: Yes, I meant the entire metadata thing you were talking about being a bottleneck back in 2011, and more recently.
04:36 ^🔗	namespace	You said you had all this stuff with no metadata, and needed people to help tag it.
04:36 ^🔗	SketchCow	Do you mean you specifically want to do metadata for bitsavers or general metadata in a variety of collections I have?
04:37 ^🔗	SketchCow	Just understanding your interests.
04:37 ^🔗	namespace	SketchCow: General Metadata.
04:37 ^🔗	SketchCow	It's lonely work.
04:37 ^🔗	namespace	SketchCow: I know, I've sat hunched over a table for hours at a time taking photos of books.
04:38 ^🔗	SketchCow	I'm thinking of something you can take on.
04:38 ^🔗	namespace	Okay.
04:38 ^🔗	SketchCow	Do you have an internet archive.org account?
04:38 ^🔗	namespace	Yes.
04:38 ^🔗	namespace	Just signed up.
04:39 ^🔗	namespace	(It's 'bandpass')
04:40 ^🔗	dashcloud	SketchCow: is the Apple II Asimov stuff mirrored on IA somewhere such that I can point to individual files, rather than having to link to the FTP?
04:40 ^🔗	SketchCow	Well, it's in a big collection that can be linked to within
04:43 ^🔗	dashcloud	okay
04:43 ^🔗	SketchCow	namespace: Do you want to describe items you can read, or research metadata from the world?
04:43 ^🔗	namespace	SketchCow: What would the latter look like?
04:43 ^🔗	SketchCow	http://ia601704.us.archive.org/zipview.php?zip=/13/items/asimov.apple.archive.2013.03/asimov.apple.archive.2013.03.zip
04:44 ^🔗	SketchCow	dashcloud: Example:
04:45 ^🔗	SketchCow	http://archive.org/download/asimov.apple.archive.2013.03/asimov.apple.archive.2013.03.zip/apple_II%2Fimages%2Fgames%2Ffile_based%2Fdiamondmine_dungbeetles_ladytut_minotaur.dsk
04:45 ^🔗	dashcloud	that's pretty awesome
04:46 ^🔗	DFJustin	yeah just add a / on the end of any zip file on an archive.org item to get a browse view
04:46 ^🔗	DFJustin	they really need to add a link on the page to that
04:46 ^🔗	SketchCow	That's why I make these sites zipped and sitting around.
04:46 ^🔗	namespace	How about tar files?
04:46 ^🔗	DFJustin	works on zip, tar, and iso/cdr
04:46 ^🔗	SketchCow	They're huge ice blocks I can use
04:47 ^🔗	DFJustin	does not work on tar.*, rar, bin
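
The link pattern in SketchCow's example and DFJustin's trailing-slash tip can be sketched as a small URL builder. This is only an illustration inferred from the one example URL in this log, not a documented archive.org API: `archive_member_url` is a hypothetical helper, and it uses `https` where the log's link used `http`.

```python
from urllib.parse import quote


def archive_member_url(item, archive_name, member_path=""):
    """Build a link into a zip/tar/iso stored in an archive.org item.

    With an empty member_path the result ends in "/", which is the
    browse view; otherwise the inner path is appended with its slashes
    percent-encoded, matching the example link in the log.
    """
    base = f"https://archive.org/download/{quote(item)}/{quote(archive_name)}/"
    # safe="" makes quote() encode "/" as %2F inside the member path.
    return base + quote(member_path, safe="")


if __name__ == "__main__":
    print(archive_member_url(
        "asimov.apple.archive.2013.03",
        "asimov.apple.archive.2013.03.zip",
    ))
```
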
04:47 ^🔗	SketchCow	namespace: Example: https://archive.org/details/computermagazines <--- description of each magazine
04:48 ^🔗	SketchCow	Not each ISSUE, just each title, so it has some context
04:51 ^🔗	namespace	Hmm. So you mean I'd have to go research what these magazines are?
04:52 ^🔗	namespace	I don't see how that's different from just reading them and then describing the contents.
04:52 ^🔗	namespace	Or rather, why that approach wouldn't work here.
04:52 ^🔗	dashcloud	check out this collection of magazines: http://archive.org/search.php?query=collection%3Abig-k-magazine&sort=-publicdate
04:53 ^🔗	namespace	dashcloud: What about them?
04:53 ^🔗	dashcloud	they have metadata entered for them- the table of contents page(s) fully transcribed
04:53 ^🔗	namespace	And you want me to do that to a different collection of magazines?
04:54 ^🔗	dashcloud	if you're interested in that sort of thing
04:54 ^🔗	namespace	dashcloud: Okay, I think I can do that. Not right now though.
04:54 ^🔗	dashcloud	I think the other thing SketchCow was talking about is metadata work on projects like Console Living room or Business Case
04:55 ^🔗	*	namespace idly wonders if we could kickstart a fund to pay people on mechanical turk or whatever to do this
04:57 ^🔗	dashcloud	if you're more interested in digging up information on items, and researching a thing, rather than describing a thing, you might be interested in working on something like the Business Case: https://archive.org/details/businesscase
06:22 ^🔗	namespace	https://archive.org/details/2014.02.ftp.wideweb.com
06:22 ^🔗	namespace	Any suggestions?
06:22 ^🔗	*	namespace has to go do stuff, will be back in a bit
06:53 ^🔗	xmc	looks good to me, namespace
06:53 ^🔗	xmc	thanks for grabbing that
07:02 ^🔗	namespace	xmc: I think I'm gonna add a short list of some of the notable items featured within. (3.8 gigs is a lot of material after all)
07:02 ^🔗	xmc	ure
07:02 ^🔗	xmc	sure
07:08 ^🔗	namespace	Also, I messed up on what category to put it in (should go into "computers and technology")
07:08 ^🔗	namespace	(Put it into community audio by accident.)
07:08 ^🔗	namespace	But the editor won't let me change it.
08:11 ^🔗	DFJustin	as a new user you can only put stuff in the community areas
08:25 ^🔗	namespace	DFJustin: How long does that take to wear off? Can I get an upgrade?
08:27 ^🔗	DFJustin	oh I phrased that poorly, it's as a non-admin user
08:28 ^🔗	DFJustin	an admin can move items on your behalf or specifically grant you access to collections on a case by case basis
08:28 ^🔗	namespace	DFJustin: Got it.
08:29 ^🔗	DFJustin	but it's moreso geared towards small specific collections of stuff you uploaded, computers and tech is a big general area that you wouldn't get access to
08:30 ^🔗	DFJustin	but if you upload 100 episodes of a computer podcast or something then you can get a collection made for that under computers and tech
08:30 ^🔗	namespace	Okay.
15:03 ^🔗	unbeholde	guys I have quite a problem, after taking the advice of you guys I got the UT3 mods. The full list of files I drew up and several have come up corrupted (after fully downloading) I tried to download them twice (once with Chrome and once with Download Accelerator Plus): List of archive site links that show up as corrupted: I've placed all the bad links here in a text file. http://depositfiles.com/files/31bq5xkpn
15:04 ^🔗	Schbirid	please post text on sites like https://pastee.org/ or pastebin.com or else
15:06 ^🔗	unbeholde	...ok.. https://pastee.org/n5ckj
15:07 ^🔗	Schbirid	hm, getting a 502 error on that
15:09 ^🔗	unbeholde	ugh fine I'll try the other site: http://pastebin.com/kGiwAPqH
15:09 ^🔗	Schbirid	ah, you came to the right guy :)
15:13 ^🔗	Schbirid	i highly suggest NOT using tools like Download Accelerator Plus, they stress servers and can suck
15:13 ^🔗	Schbirid	race.zip is fine for me
15:14 ^🔗	*	Schbirid downloads some more
15:18 ^🔗	Schbirid	prometheus_v3.zip also fine
15:20 ^🔗	Schbirid	talus.zip too
15:20 ^🔗	Schbirid	i will host them elsewhere for you if you promise not to use DAP anymore ;)
15:21 ^🔗	Schbirid	try http://www.freedownloadmanager.org/ but without that stupid multiple connection stuff
15:26 ^🔗	unbeholde	I see. Thank you kind sir.
15:27 ^🔗	Schbirid	https://www.quaddicted.com/files/temp/fp/ they are done when there is a file called "done"
15:27 ^🔗	Schbirid	downloading now
15:42 ^🔗	Schbirid	are those all huge files?
15:45 ^🔗	Schbirid	btw, did you use https://www.quaddicted.com/stuff/temp/fileplanet-postgres.php?filename=test to find them? i havent advertised that as it is work in progress but it is handy
15:46 ^🔗	Asparagir	Need ops in ArchiveBot plz.
15:46 ^🔗	Asparagir	More Ukrainian sites to add...
15:48 ^🔗	unbeholde	yeah that was the recommended way to get em. I shall visit the temp fp once you have made some more progress.
15:49 ^🔗	Schbirid	nice
15:49 ^🔗	Schbirid	if you stay in this channel, i will poke you
16:23 ^🔗	xmc	Schbirid: what does download accelerator plus do?
16:23 ^🔗	xmc	does it open a bunch of connections in parallel?
16:24 ^🔗	xmc	there's a debian package for that, 'axel'
16:24 ^🔗	Schbirid	probably
16:24 ^🔗	Schbirid	yeah, don't use those unless you must
16:24 ^🔗	Schbirid	aria2c is also nice for it
16:45 ^🔗	joepie91	oh man
16:45 ^🔗	joepie91	download accelerator plus
16:45 ^🔗	joepie91	that's still around?
16:45 ^🔗	*	joepie91 usually recommends Orbit for windows
16:52 ^🔗	Nemo_bis	right, I used Orbit too
16:52 ^🔗	Nemo_bis	though that was ages ago
17:08 ^🔗	joepie91	I remember NetAnt
17:14 ^🔗	SadDM	Anybody know if there's a way to change an item's media type on IA?
17:15 ^🔗	Nemo_bis	metadata api, mostly
17:15 ^🔗	dashcloud	if you're an admin of a collection, yes, otherwise you're pretty much stuck with whatever choices the interface gives you
17:17 ^🔗	SadDM	daaang... not the end of the world I suppose. Thanks.
17:21 ^🔗	Schbirid	* unbeholde has quit (Quit: Page closed)
17:21 ^🔗	Schbirid	oh great
17:21 ^🔗	dashcloud	I guess I should clarify- if IA has marked your book as an image, that you can change; if IA has put your item in a collection and you want to change that collection, you're stuck
17:35 ^🔗	SadDM	dashcloud: that's exactly my case... a book marked as an image. Can I change it through the web somewhere, or do I need to poke around at a lower level (and if so, where should I start)?
17:35 ^🔗	dashcloud	go to your item, and choose edit item
17:36 ^🔗	SadDM	yup... go on
17:36 ^🔗	dashcloud	you want to edit stuff about your item (should be the left option)
17:37 ^🔗	SadDM	uh huh, and the "mediatype" is displayed (as "image") but not editable.
17:38 ^🔗	dashcloud	what's the link to the item on IA?
17:39 ^🔗	SadDM	https://archive.org/details/romeos_quest
17:42 ^🔗	dashcloud	you could try changing the mediatype for the PDF to texts
17:44 ^🔗	SadDM	oh... like on a per-file basis. I'll give that a try in a bit. Either way it doesn't really matter, but thanks for the ideas.
18:25 ^🔗	joepie91	dashcloud: IA thinks my software is all books :(
19:46 ^🔗	DFJustin	you can't change it using the web interface but you should be able to using other methods like https://pypi.python.org/pypi/internetarchive
20:06 ^🔗	arkiver	is it possible to set a number of retries for https://pypi.python.org/pypi/internetarchive?
20:12 ^🔗	arkiver	the number of retries for an upload
20:22 ^🔗	joepie91	:P
20:22 ^🔗	joepie91	while true
20:22 ^🔗	ersi	Well, I think he means on the command line. The answer is most likely "No"
20:23 ^🔗	ersi	there's no option for retries if I do just `ia upload`
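
joepie91's "while true" suggestion can be made a little gentler with a bounded retry loop around whatever does the upload. This is a generic sketch, not a feature of the internetarchive package: `with_retries` and its parameters are hypothetical, and the upload itself is passed in as a callable so nothing here assumes a particular uploader.

```python
import time


def with_retries(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between failures
    and re-raises the last error once the attempts are exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

One way to use it would be to wrap a command-line upload, for example `with_retries(lambda: subprocess.run(["ia", "upload", "my-item", "file.tar"], check=True))`, where the item and file names are placeholders.
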
20:24 ^🔗	arkiver	hmm
20:24 ^🔗	arkiver	but for the other s3 uploade there is
20:25 ^🔗	arkiver	but I found out the internetarchive uploader is far more faster then the ias3upload uploader
20:25 ^🔗	arkiver	or is there a way to increase the upload spead of te ias3upload?
20:26 ^🔗	ersi	My brain hurts while reading what you wrote
20:27 ^🔗	arkiver	haha
20:27 ^🔗	ersi	Well, you could, if inclined, take a look at how they differ and see what makes one of 'em go faster than the other. I dunno.
22:28 ^🔗	yipdw	arkiver: one of these days you're going to have to learn that there is much more to the Internet than what software you personally are using
22:28 ^🔗	yipdw	in this particular case, IA's S3 endpoint and/or something upstream is getting really fucked right now
22:40 ^🔗	namespace	:P
22:40 ^🔗	namespace	yipdw: My first recommendation for faster speeds would have been to find an IA server close to you and specify that IP.
22:40 ^🔗	namespace	(If the software will let you do that.)
22:41 ^🔗	namespace	(And assuming IA has regional servers.)
22:41 ^🔗	yipdw	namespace: if you know of any other IA S3 endpoint that is not s3-lb0.us.archive.org, please tell me what it is
22:41 ^🔗	yipdw	because I have never seen one
22:41 ^🔗	namespace	yipdw: Ah, suspected as much.
22:41 ^🔗	yipdw	suggesting geographical optimization is a fine idea, but is only useful if such a thing even exists
22:41 ^🔗	yipdw	and it's not the path length, anyway
22:42 ^🔗	yipdw	I mean
22:42 ^🔗	DFJustin	hmm I always just use s3.us.archive.org
22:42 ^🔗	yipdw	https://gist.github.com/yipdw/70b1464deb1a9fd5a093
22:42 ^🔗	DFJustin	which may be a load balancer to more than one endpoint
22:42 ^🔗	yipdw	you really can't get much shorter than that
22:42 ^🔗	DFJustin	but they're all gonna be in the bay area
22:42 ^🔗	yipdw	at least not transatlantic
22:43 ^🔗	yipdw	but it's much slower than what I usually see and I have no idea what it is
22:44 ^🔗	yipdw	it's possible that other paths into IA are fine
22:44 ^🔗	yipdw	actually, they all look okay: https://monitor.archive.org/weathermap/weathermap.png
22:44 ^🔗	yipdw	must be on my end
22:44 ^🔗	yipdw	interesting
22:48 ^🔗	namespace	I'm lucky in that I live right next to Cali.
22:52 ^🔗	yipdw	actually, I wonder if the recent outbreak of NTP DDoS attacks is related
22:52 ^🔗	yipdw	IA got hit, but they're not the only victim
23:01 ^🔗	namespace	yipdw: I think freenode is dealing with that right now.
23:22 ^🔗	namespace	I have my entire high school career's worth of schoolwork sitting in this massive pile, but I'd be stepping on like 1000 people's copyrights if I scanned it and put it up, what should I do with it?
23:22 ^🔗	namespace	(Obviously this wouldn't go on archive, this would be hosted somewhere else.)
23:22 ^🔗	namespace	(As a snapshot of schoolwork in 2010-2014.)
23:25 ^🔗	dashcloud	if you've got any good code bits, put them up
23:25 ^🔗	namespace	dashcloud: Good code bits? No.
23:26 ^🔗	namespace	I have code bits, but I didn't understand how data structures work. (I still don't.)
23:26 ^🔗	namespace	So nothing substantial.
23:26 ^🔗	namespace	I don't think I ever got something to work.
23:26 ^🔗	namespace	That was more trivial than like a function.
23:26 ^🔗	namespace	*less trivial
23:28 ^🔗	namespace	Mainly the value is in having something to look at in terms of what grade school education looks like at the time I went to school at my particular school/district.
23:29 ^🔗	namespace	It probably falls under the category of "weird data" that won't be routinely saved until we have like 256TB hard drives.
23:31 ^🔗	namespace	(I collect tons of "weird data" like this, I recently threw out a little bowl of fortunes from fortune cookies.)
23:31 ^🔗	namespace	(If I'd gone through and counted those out I'd have a lower bound on how many times I've had teriyaki in my lifetime.)
