#archiveteam 2013-07-28,Sun


Time Nickname Message
01:17 🔗 balrog please archive http://polytroncorporation.com asap
01:18 🔗 balrog and https://twitter.com/polytron if possible
01:31 🔗 godane balrog: i'm grabbing the site right now
01:31 🔗 balrog ok
09:06 🔗 cathalgar Hello all: looking to talk to someone in UrlTeam? :)
09:07 🔗 cathalgar Have a userscript project I was told ye might like: https://gitorious.org/cguserscripts/unbitly
09:07 🔗 cathalgar Online API keeps a cache of recent URLs, but I could come up with something cleaner if it could be integrated with UrlTeam's efforts, for a more permanent backup of cached URLs: https://cathalgarvey.pythonanywhere.com/unbitly/dump
09:07 🔗 cathalgar It was designed as a privacy shiv, not an archivist solution, y'see.
09:09 🔗 BlueMax cathalgar, /join #urlteam
09:10 🔗 omf_ cathalgar, I pasted your conversation into #urlteam
09:21 🔗 cathalgar K, thanks
09:33 🔗 godane going to see if i can get a 2tb hard drive for under $8
10:05 🔗 godane so looks like all my tasks are waiting
10:06 🔗 godane it's not even archived yet on their machines
19:57 🔗 dashcloud hi guys, just saw that via.me is dropping their filehosting part to focus on photo effects, effective August 1. There's a file download feature that will supposedly give you all of your stuff in a zipfile.
20:05 🔗 joepie91 dashcloud: ugh :/
20:42 🔗 Asparagir Hello, good peoples. I'm about to upload my first ever Archive Team panicgrab to IA! Whee!
20:42 🔗 Asparagir It's a 17.3 GB backup of a major genealogy website whose depths are not well-represented (yet) in the Wayback Machine.
20:43 🔗 Asparagir I am planning to use this code to do it. Could someone please let me know if this looks okay?
20:43 🔗 SmileyG Asparagir: awesome :)
20:43 🔗 Asparagir curl --location --header 'x-amz-auto-make-bucket:1' \ --header 'x-archive-meta01-collection:archiveteam' \ --header 'x-archive-meta-mediatype:web' \ --header 'x-archive-meta-subject:genealogy;family history;family tree;research;website' \ --header 'x-archive-meta-title:Genealogy website crawl: JewishGen.org (July 2013) ' \ --header 'x-archive-meta-description:Archive of the Jewish genealogy website JewishGen.org, see <a href=http://www.jewishgen.org
20:43 🔗 SmileyG errr code to upload it?
20:43 🔗 SmileyG use s3upload script
20:44 🔗 Asparagir Whoops, bottom part got cut off:
20:44 🔗 Asparagir --header 'x-archive-size-hint:18565000000' \ --header "authorization: LOW $accesskey:$secret" \ --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz \ http://s3.us.archive.org/genealogy-website-crawls/jewishgen.org-panicgrab-20130710.warc.gz
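[For readability: the command pasted across the two messages above, assembled into one block. Everything is verbatim from the paste; the x-archive-meta-description value was cut off mid-URL by IRC and is left truncated here.]

    curl --location \
         --header 'x-amz-auto-make-bucket:1' \
         --header 'x-archive-meta01-collection:archiveteam' \
         --header 'x-archive-meta-mediatype:web' \
         --header 'x-archive-meta-subject:genealogy;family history;family tree;research;website' \
         --header 'x-archive-meta-title:Genealogy website crawl: JewishGen.org (July 2013)' \
         --header 'x-archive-meta-description:Archive of the Jewish genealogy website JewishGen.org, see <a href=http://www.jewishgen.org' \
         --header 'x-archive-size-hint:18565000000' \
         --header "authorization: LOW $accesskey:$secret" \
         --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz \
         http://s3.us.archive.org/genealogy-website-crawls/jewishgen.org-panicgrab-20130710.warc.gz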
20:44 🔗 Asparagir s3upload script?
20:45 🔗 SmileyG yah
20:45 🔗 SmileyG Hopefully some awesome helpful person will point you to it shortly
20:45 🔗 SmileyG or maybe you can google it, I believe it's called ias3upload.pl
20:45 🔗 Asparagir Is there any downside to using curl instead?
20:46 🔗 SmileyG Asparagir: only ease of use afaik.
20:46 🔗 Asparagir This is not on my home computer, it's a cloud server, so I don't use it for anything else.
20:46 🔗 Asparagir It just chugs along and does its thing.
20:46 🔗 Asparagir I have crappy Internet speeds at my house.
20:47 🔗 Asparagir If this one goes well, I intend on doing wget/WARC backups of a lot more genealogy and family history websites, so they can get into the Wayback Machine.
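[A minimal sketch of the kind of wget/WARC grab described above, against a hypothetical example.org target; the WARC flags are standard wget options from 1.14 onward.]

    # Hypothetical target; mirrors the site while writing a WARC plus CDX index.
    wget --mirror --page-requisites \
         --warc-file=example.org-panicgrab-20130728 --warc-cdx \
         -e robots=off "http://example.org/"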
20:47 🔗 Asparagir We can't let the *only* cultural content that future historians see be gaming forums. :-)
20:50 🔗 Jonimus lol why not :P
21:04 🔗 xmc because vocal gamers are often miserable people
21:04 🔗 xmc we have to present our best image to the future
21:15 🔗 Asparagir Hahahahaha.
21:44 🔗 ersi xmc: s/best image/not the suckiest/
22:04 🔗 DFJustin as a gamer who dabbles in genealogy I approve of this
22:13 🔗 Asparagir *thumbs up*
22:16 🔗 DFJustin looks like there are some issues in your code though
22:16 🔗 DFJustin you will not have permission to set the collection to archiveteam, stick with "opensource" (community texts)
22:17 🔗 DFJustin also it looks like you are using "genealogy-website-crawls" as the item name, it would be preferable to have a separate item for each one, particularly if they are 17GB
22:22 🔗 DFJustin you may want to do a trial item with something small first (e.g. a pdf) before going for something that big
22:29 🔗 xmc yes, you need a different item per .warc.gz file for things to work out properly
22:46 🔗 Asparagir Okay, so change this line to this: --header 'x-archive-meta01-collection:opensource'
22:47 🔗 Asparagir And change the item name to the actual item name (even if that's redundant?) like this: http://s3.us.archive.org/jewishgen.org-panicgrab-20130710/jewishgen.org-panicgrab-20130710.warc.gz
22:47 🔗 Asparagir ?
22:48 🔗 Asparagir And then after it eventually uploads, come back to IRC and let someone know they should move it to the ArchiveTeam collection?
22:48 🔗 DFJustin yep
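[Putting the fixes together: a sketch of the corrected upload, with the collection changed to opensource and a separate item per WARC, named after the file itself. The subject/description headers are omitted here since, per the next message, they can be fixed up later through the web interface.]

    curl --location \
         --header 'x-amz-auto-make-bucket:1' \
         --header 'x-archive-meta01-collection:opensource' \
         --header 'x-archive-meta-mediatype:web' \
         --header 'x-archive-meta-title:Genealogy website crawl: JewishGen.org (July 2013)' \
         --header 'x-archive-size-hint:18565000000' \
         --header "authorization: LOW $accesskey:$secret" \
         --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz \
         http://s3.us.archive.org/jewishgen.org-panicgrab-20130710/jewishgen.org-panicgrab-20130710.warc.gz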
22:50 🔗 DFJustin you may need to fix up the description and stuff later but you can do all that through the web interface
22:50 🔗 Asparagir Okey dokey. Thanks for the help!
22:50 🔗 Asparagir Also, I think I'm going to add --trace-ascii and --trace-time to curl to see if I can get some kind of feedback monitoring going on, for such a big upload.
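[Both flags are real curl options, but note that --trace-ascii logs the transferred data as well as the headers, so the trace file can grow enormous on a 17 GB upload; curl's progress meter (or --progress-bar) is a lighter-weight way to watch a long transfer.]

    # Timestamped full trace to a file (can get very large on big uploads):
    curl --trace-ascii upload-trace.log --trace-time ...
    # Lighter-weight alternative, just a progress bar on the terminal:
    curl --progress-bar ...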
22:51 🔗 DFJustin what OS are you on
22:51 🔗 Asparagir I'm SSH'ing into an Ubuntu 11 box.
22:51 🔗 DFJustin k
22:53 🔗 Asparagir Dumb question: does --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz need quote marks around the path and file name? Everything else has quote marks.
22:53 🔗 DFJustin don't think so, unless there are spaces in the path
22:54 🔗 Asparagir Okay, thanks.
23:06 🔗 godane does anyone want to help fix my scanner to work in linux?
23:07 🔗 godane i need to work on linux cause i want to be more productive
23:08 🔗 godane i can't scan things in linux but can upload
23:08 🔗 godane can't upload in windows but can scan
23:08 🔗 godane i need to be able to do both
