[01:17] please archive http://polytroncorporation.com asap
[01:18] and https://twitter.com/polytron if possible
[01:31] balrog: i'm grabbing the site right now
[01:31] ok
[09:06] Hello all: looking to talk to someone in UrlTeam? :)
[09:07] Have a userscript project I was told ye might like: https://gitorious.org/cguserscripts/unbitly
[09:07] Online API keeps a cache of recent URLs, but I could come up with something cleaner if it could be integrated with UrlTeam's efforts, for a more permanent backup of cached URLs: https://cathalgarvey.pythonanywhere.com/unbitly/dump
[09:07] It was designed as a privacy shiv, not an archivist solution, y'see.
[09:09] cathalgar, /join #urlteam
[09:10] cathalgar, I pasted your conversation into #urlteam
[09:21] K, thanks
[09:33] going to see if i can get a 2tb hard drive for under $8
[10:05] so looks like all my tasks are waiting
[10:06] it's not even archived yet on their machines
[19:57] hi guys, just saw that via.me is dropping their filehosting part to focus on photo effects, effective August 1. There's a file download feature that will supposedly give you all of your stuff in a zipfile.
[20:05] dashcloud: ugh :/
[20:42] Hello, good peoples. I'm about to upload my first ever Archive Team panicgrab to IA! Whee!
[20:42] It's a 17.3 GB backup of a major genealogy website whose depths are not well-represented (yet) in the Wayback Machine.
[20:43] I am planning to use this code to do it. Could someone please let me know if this looks okay?
[20:43] Asparagir: awesome :)
[20:43] curl --location --header 'x-amz-auto-make-bucket:1' \ --header 'x-archive-meta01-collection:archiveteam' \ --header 'x-archive-meta-mediatype:web' \ --header 'x-archive-meta-subject:genealogy;family history;family tree;research;website' \ --header 'x-archive-meta-title:Genealogy website crawl: JewishGen.org (July 2013)' \ --header 'x-archive-meta-description:Archive of the Jewish genealogy website JewishGen.org, see errr code to upload it?
[20:43] use s3upload script
[20:44] Whoops, bottom part got cut off:
[20:44] --header 'x-archive-size-hint:18565000000' \ --header "authorization: LOW $accesskey:$secret" \ --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz \ http://s3.us.archive.org/genealogy-website-crawls/jewishgen.org-panicgrab-20130710.warc.gz
[20:44] s3upload script?
[20:45] yah
[20:45] Hopefully some awesome helpful person will point you to it shortly
[20:45] or maybe you can google it, I believe it's called ia3upload.p
[20:45] Is there any downside to using curl instead?
[20:46] Asparagir: only ease of use afaik.
[20:46] This is not on my home computer, it's a cloud server, so I don't use it for anything else.
[20:46] It just chugs along and does its thing.
[20:46] I have crappy Internet speeds at my house.
[20:47] If this one goes well, I intend to do wget/WARC backups of a lot more genealogy and family history websites, so they can get into the Wayback Machine.
[20:47] We can't let the *only* cultural content that future historians see be gaming forums. :-)
[20:50] lol why not :P
[21:04] because vocal gamers are often miserable people
[21:04] we have to present our best image to the future
[21:15] Hahahahaha.
[21:44] xmc: s/best image/not the suckiest/
[22:04] as a gamer who dabbles in genealogy I approve of this
[22:13] *thumbs up*
[22:16] looks like there are some issues in your code though
[22:16] you will not have permission to set the collection to archiveteam, stick with "opensource" (community texts)
[22:17] also it looks like you are using "genealogy-website-crawls" as the item name; it would be preferable to have a separate item for each one, particularly if they are 17GB
[22:22] you may want to do a trial item with something small first (e.g. a pdf) before going for something that big
[22:29] yes, you need a different item per .warc.gz file for things to work out properly
[22:46] Okay, so change this line to this: --header 'x-archive-meta01-collection:opensource'
[22:47] And change the item name to the actual item name (even if that's redundant?) like this: http://s3.us.archive.org/jewishgen.org-panicgrab-20130710/jewishgen.org-panicgrab-20130710.warc.gz
[22:47] ?
[22:48] And then after it eventually uploads, come back to IRC and let someone know they should move it to the ArchiveTeam collection?
[22:48] yep
[22:50] you may need to fix up the description and stuff later but you can do all that through the web interface
[22:50] Okey dokey. Thanks for the help!
[22:50] Also, I think I'm going to add --trace-ascii and --trace-time to curl to see if I can get some kind of feedback monitoring going on, for such a big upload.
[22:51] what OS are you on
[22:51] I'm SSH'ing into an Ubuntu 11 box.
[22:51] k
[22:53] Dumb question: does --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz need quote marks around the path and file name? Everything else has quote marks.
[22:53] don't think so, unless there are spaces in the path
[22:54] Okay, thanks.
[23:06] does anyone want to help fix my scanner to work in linux?
[23:07] i need to work on linux cause i want to be more productive
[23:08] i can't scan things in linux but can upload
[23:08] can't upload in windows but can scan
[23:08] i need to be able to do both
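[Editor's note: putting the advice from [22:16] through [22:50] together, the corrected upload command would look roughly like the sketch below. This is untested; the `x-archive-meta-description` header from the original paste was cut off in the log and is omitted here rather than guessed at, and `$accesskey`/`$secret` stand for the uploader's own IA S3 credentials.]

```shell
#!/bin/sh
# Sketch of the corrected curl upload, per the channel's advice:
#   - collection "opensource" instead of "archiveteam"        ([22:16])
#   - one item per .warc.gz, named after the file             ([22:17], [22:47])
#   - --trace-time for some progress/timing feedback          ([22:50])
# The item name comes from the original paste; $accesskey and $secret
# must be set to real Internet Archive S3 keys before running.
item="jewishgen.org-panicgrab-20130710"

curl --location --trace-time \
  --header 'x-amz-auto-make-bucket:1' \
  --header 'x-archive-meta01-collection:opensource' \
  --header 'x-archive-meta-mediatype:web' \
  --header 'x-archive-meta-subject:genealogy;family history;family tree;research;website' \
  --header 'x-archive-meta-title:Genealogy website crawl: JewishGen.org (July 2013)' \
  --header 'x-archive-size-hint:18565000000' \
  --header "authorization: LOW $accesskey:$secret" \
  --upload-file "/home/archiveteam/$item.warc.gz" \
  "http://s3.us.archive.org/$item/$item.warc.gz"
```

Quoting the `--upload-file` path is harmless and guards against spaces, per the exchange at [22:53]; as suggested at [22:22], a small trial item (e.g. a single PDF) with the same headers is a sensible dry run before committing the 17 GB file.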