#archiveteam 2013-07-28,Sun


Time	Nickname	Message
01:17 ^🔗	balrog	please archive http://polytroncorporation.com asap
01:18 ^🔗	balrog	and https://twitter.com/polytron if possible
01:31 ^🔗	godane	balrog: i'm grabbing the site right now
01:31 ^🔗	balrog	ok
09:06 ^🔗	cathalgar	Hello all: looking to talk to someone in UrlTeam? :)
09:07 ^🔗	cathalgar	Have a userscript project I was told ye might like: https://gitorious.org/cguserscripts/unbitly
09:07 ^🔗	cathalgar	Online API keeps a cache of recent URLs, but I could come up with something cleaner if it could be integrated with UrlTeam's efforts, for a more permanent backup of cached URLs: https://cathalgarvey.pythonanywhere.com/unbitly/dump
09:07 ^🔗	cathalgar	It was designed as a privacy shiv, not an archivist solution, y'see.
09:09 ^🔗	BlueMax	cathalgar, /join #urlteam
09:10 ^🔗	omf_	cathalgar, I pasted your conversation into #urlteam
09:21 ^🔗	cathalgar	K, thanks
09:33 ^🔗	godane	going to see if i can get a 2tb hard drive for under $8
10:05 ^🔗	godane	so looks like all my tasks are waiting
10:06 ^🔗	godane	it's not even archived yet in their machines
19:57 ^🔗	dashcloud	hi guys, just saw that via.me is dropping their filehosting part to focus on photo effects, effective August 1. There's a file download feature that will supposedly give you all of your stuff in a zipfile.
20:05 ^🔗	joepie91	dashcloud: ugh :/
20:42 ^🔗	Asparagir	Hello, good peoples. I'm about to upload my first ever Archive Team panicgrab to IA! Whee!
20:42 ^🔗	Asparagir	It's a 17.3 GB backup of a major genealogy website whose depths are not well-represented (yet) in the Wayback Machine.
20:43 ^🔗	Asparagir	I am planning to use this code to do it. Could someone please let me know if this looks okay?
20:43 ^🔗	SmileyG	Asparagir: awesome :)
20:43 ^🔗	Asparagir	curl --location --header 'x-amz-auto-make-bucket:1' \ --header 'x-archive-meta01-collection:archiveteam' \ --header 'x-archive-meta-mediatype:web' \ --header 'x-archive-meta-subject:genealogy;family history;family tree;research;website' \ --header 'x-archive-meta-title:Genealogy website crawl: JewishGen.org (July 2013) ' \ --header 'x-archive-meta-description:Archive of the Jewish genealogy website JewishGen.org, see <a href=http://www.jewishgen.org
20:43 ^🔗	SmileyG	errr code to upload it?
20:43 ^🔗	SmileyG	use s3upload script
20:44 ^🔗	Asparagir	Whoops, bottom part got cut off:
20:44 ^🔗	Asparagir	--header 'x-archive-size-hint:18565000000' \ --header "authorization: LOW $accesskey:$secret" \ --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz \ http://s3.us.archive.org/genealogy-website-crawls/jewishgen.org-panicgrab-20130710.warc.gz
20:44 ^🔗	Asparagir	s3upload script?
20:45 ^🔗	SmileyG	yah
20:45 ^🔗	SmileyG	Hopefully some awesome helpful person will point you to it shortly
20:45 ^🔗	SmileyG	or maybe you can google it, I believe it's called ia3upload.p
20:45 ^🔗	Asparagir	Is there any downside to using curl instead?
20:46 ^🔗	SmileyG	Asparagir: only ease of use afaik.
20:46 ^🔗	Asparagir	This is not on my home computer, it's a cloud server, so I don't use it for anything else.
20:46 ^🔗	Asparagir	It just chugs along and does its thing.
20:46 ^🔗	Asparagir	I have crappy Internet speeds at my house.
20:47 ^🔗	Asparagir	If this one goes well, I intend on doing wget/WARC backups of a lot more genealogy and family history websites, so they can get into the Wayback Machine.
20:47 ^🔗	Asparagir	We can't let the only cultural content that future historians see be gaming forums. :-)
20:50 ^🔗	Jonimus	lol why not :P
21:04 ^🔗	xmc	because vocal gamers are often miserable people
21:04 ^🔗	xmc	we have to present our best image to the future
21:15 ^🔗	Asparagir	Hahahahaha.
21:44 ^🔗	ersi	xmc: s/best image/not the suckiest/
22:04 ^🔗	DFJustin	as a gamer who dabbles in genealogy I approve of this
22:13 ^🔗	Asparagir	thumbs up
22:16 ^🔗	DFJustin	looks like there are some issues in your code though
22:16 ^🔗	DFJustin	you will not have permission to set the collection to archiveteam, stick with "opensource" (community texts)
22:17 ^🔗	DFJustin	also it looks like you are using "genealogy-website-crawls" as the item name, it would be preferable to have a separate item for each one, particularly if they are 17GB
22:22 ^🔗	DFJustin	you may want to do a trial item with something small first (e.g. a pdf) before going for something that big
22:29 ^🔗	xmc	yes, you need a different item per .warc.gz file for things to work out properly
22:46 ^🔗	Asparagir	Okay, so change this line to this: --header 'x-archive-meta01-collection:opensource'
22:47 ^🔗	Asparagir	And change the item name to the actual item name (even if that's redundant?) like this: http://s3.us.archive.org/jewishgen.org-panicgrab-20130710/jewishgen.org-panicgrab-20130710.warc.gz
22:47 ^🔗	Asparagir	?
22:48 ^🔗	Asparagir	And then after it eventually uploads, come back to IRC and let someone know they should move it to the ArchiveTeam collection?
22:48 ^🔗	DFJustin	yep
22:50 ^🔗	DFJustin	you may need to fix up the description and stuff later but you can do all that through the web interface
22:50 ^🔗	Asparagir	Okey dokey. Thanks for the help!
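The corrections agreed on above (collection set to "opensource", one item per .warc.gz file) could be sketched as a revised command. This is only a sketch under stated assumptions: the item name is assumed to be the file name without `.warc.gz`, the description header is left out because it was cut off in the log, `$accesskey` and `$secret` are the placeholders from the original paste, and `DRY_RUN` is a hypothetical switch introduced here, not a curl option.

```shell
#!/bin/sh
# Sketch of the revised upload, following the advice given in the log.
# ASSUMPTIONS: item name = file name minus .warc.gz; the description
# header is omitted because the original paste was truncated.

file=/home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz
item=$(basename "$file" .warc.gz)    # one item per WARC file
url="http://s3.us.archive.org/$item/$(basename "$file")"

echo "Would upload $file to $url"

# DRY_RUN is a hypothetical guard; set DRY_RUN=0 to really run the upload.
if [ "${DRY_RUN:-1}" = "0" ]; then
  curl --location \
    --header 'x-amz-auto-make-bucket:1' \
    --header 'x-archive-meta01-collection:opensource' \
    --header 'x-archive-meta-mediatype:web' \
    --header 'x-archive-meta-subject:genealogy;family history;family tree;research;website' \
    --header 'x-archive-meta-title:Genealogy website crawl: JewishGen.org (July 2013)' \
    --header 'x-archive-size-hint:18565000000' \
    --header "authorization: LOW $accesskey:$secret" \
    --upload-file "$file" \
    "$url"
fi
```

Running it with the default `DRY_RUN` only prints the target URL, which is a cheap way to check the item name before committing to a 17 GB transfer.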
22:50 ^🔗	Asparagir	Also, I think I'm going to add --trace-ascii and --trace-time to curl to see if I can get some kind of feedback monitoring going on, for such a big upload.
22:51 ^🔗	DFJustin	what OS are you on
22:51 ^🔗	Asparagir	I'm SSH'ing into an Ubuntu 11 box.
22:51 ^🔗	DFJustin	k
22:53 ^🔗	Asparagir	Dumb question: does --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz need quote marks around the path and file name? Everything else has quote marks.
22:53 ^🔗	DFJustin	don't think so, unless there are spaces in the path
22:54 ^🔗	Asparagir	Okay, thanks.
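The quoting point above can be illustrated with a small sketch: in sh or bash, an unquoted variable holding a path with a space is split into two arguments, while a quoted one stays whole. The path with a space and the `count_args` helper are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

plain=/home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz
spaced="/home/archiveteam/my crawls/site.warc.gz"   # hypothetical path with a space

no_space=$(count_args $plain)     # unquoted, no spaces: one argument
unquoted=$(count_args $spaced)    # unquoted, one space: split into two arguments
quoted=$(count_args "$spaced")    # quoted: stays one argument

echo "$no_space $unquoted $quoted"
```

Quoting the path is harmless when there are no spaces, so writing `--upload-file "$file"` is the safer habit either way.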
23:06 ^🔗	godane	does anyone want to help fix my scanner to work in linux?
23:07 ^🔗	godane	i need to work on linux cause i want to be more productive
23:08 ^🔗	godane	i can't scan things in linux but can upload
23:08 ^🔗	godane	can't upload in windows but can scan
23:08 ^🔗	godane	i need to be able to do both
