Time |
Nickname |
Message |
01:17
|
balrog |
please archive http://polytroncorporation.com asap |
01:18
|
balrog |
and https://twitter.com/polytron if possible |
01:31
|
godane |
balrog: i'm grabbing the site right now |
01:31
|
balrog |
ok |
09:06
|
cathalgar |
Hello all: looking to talk to someone in UrlTeam? :) |
09:07
|
cathalgar |
Have a userscript project I was told ye might like: https://gitorious.org/cguserscripts/unbitly |
09:07
|
cathalgar |
Online API keeps a cache of recent URLs, but I could come up with something cleaner if it could be integrated with UrlTeam's efforts, for a more permanent backup of cached URLs: https://cathalgarvey.pythonanywhere.com/unbitly/dump |
09:07
|
cathalgar |
It was designed as a privacy shiv, not an archivist solution, y'see. |
09:09
|
BlueMax |
cathalgar, /join #urlteam |
09:10
|
omf_ |
cathalgar, I pasted your conversation into #urlteam |
09:21
|
cathalgar |
K, thanks |
09:33
|
godane |
going to see if i can get a 2tb hard drive for under $8 |
10:05
|
godane |
so looks like all my tasks are waiting |
10:06
|
godane |
it's not even archived yet in their machines |
19:57
|
dashcloud |
hi guys, just saw that via.me is dropping their filehosting part to focus on photo effects, effective August 1. There's a file download feature that will supposedly give you all of your stuff in a zipfile. |
20:05
|
joepie91 |
dashcloud: ugh :/ |
20:42
|
Asparagir |
Hello, good peoples. I'm about to upload my first ever Archive Team panicgrab to IA! Whee! |
20:42
|
Asparagir |
It's a 17.3 GB backup of a major genealogy website whose depths are not well-represented (yet) in the Wayback Machine. |
20:43
|
Asparagir |
I am planning to use this code to do it. Could someone please let me know if this looks okay? |
20:43
|
SmileyG |
Asparagir: awesome :) |
20:43
|
Asparagir |
curl --location --header 'x-amz-auto-make-bucket:1' \ --header 'x-archive-meta01-collection:archiveteam' \ --header 'x-archive-meta-mediatype:web' \ --header 'x-archive-meta-subject:genealogy;family history;family tree;research;website' \ --header 'x-archive-meta-title:Genealogy website crawl: JewishGen.org (July 2013) ' \ --header 'x-archive-meta-description:Archive of the Jewish genealogy website JewishGen.org, see <a href=http://www.jewishgen.org |
20:43
|
SmileyG |
errr code to upload it? |
20:43
|
SmileyG |
use s3upload script |
20:44
|
Asparagir |
Whoops, bottom part got cut off: |
20:44
|
Asparagir |
--header 'x-archive-size-hint:18565000000' \ --header "authorization: LOW $accesskey:$secret" \ --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz \ http://s3.us.archive.org/genealogy-website-crawls/jewishgen.org-panicgrab-20130710.warc.gz |
20:44
|
Asparagir |
s3upload script? |
20:45
|
SmileyG |
yah |
20:45
|
SmileyG |
Hopefully some awesome helpful person will point you to it shortly |
20:45
|
SmileyG |
or maybe you can google it, I believe it's called ia3upload.p |
20:45
|
Asparagir |
Is there any downside to using curl instead? |
20:46
|
SmileyG |
Asparagir: only ease of use afaik. |
20:46
|
Asparagir |
This is not on my home computer, it's a cloud server, so I don't use it for anything else. |
20:46
|
Asparagir |
It just chugs along and does its thing. |
20:46
|
Asparagir |
I have crappy Internet speeds at my house. |
20:47
|
Asparagir |
If this one goes well, I intend on doing wget/WARC backups of a lot more genealogy and family history websites, so they can get into the Wayback Machine. |
20:47
|
Asparagir |
We can't let the *only* cultural content that future historians see be gaming forums. :-) |
20:50
|
Jonimus |
lol why not :P |
21:04
|
xmc |
because vocal gamers are often miserable people |
21:04
|
xmc |
we have to present our best image to the future |
21:15
|
Asparagir |
Hahahahaha. |
21:44
|
ersi |
xmc: s/best image/not the suckiest/ |
22:04
|
DFJustin |
as a gamer who dabbles in genealogy I approve of this |
22:13
|
Asparagir |
*thumbs up* |
22:16
|
DFJustin |
looks like there are some issues in your code though |
22:16
|
DFJustin |
you will not have permission to set the collection to archiveteam, stick with "opensource" (community texts) |
22:17
|
DFJustin |
also it looks like you are using "genealogy-website-crawls" as the item name, it would be preferable to have a separate item for each one, particularly if they are 17GB |
22:22
|
DFJustin |
you may want to do a trial item with something small first (e.g. a pdf) before going for something that big |
22:29
|
xmc |
yes, you need a different item per .warc.gz file for things to work out properly |
22:46
|
Asparagir |
Okay, so change this line to this: --header 'x-archive-meta01-collection:opensource' |
22:47
|
Asparagir |
And change the item name to the actual item name (even if that's redundant?) like this: http://s3.us.archive.org/jewishgen.org-panicgrab-20130710/jewishgen.org-panicgrab-20130710.warc.gz |
22:47
|
Asparagir |
? |
22:48
|
Asparagir |
And then after it eventually uploads, come back to IRC and let someone know they should move it to the ArchiveTeam collection? |
22:48
|
DFJustin |
yep |
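Putting the corrections from this exchange together, the upload could be sketched roughly like this. It assumes bash or a POSIX shell with curl installed, and that $accesskey and $secret already hold the uploader's keys. The helper names item_name, item_url, and upload_warc are made up for illustration and are not part of any IA tooling. The subject, title, and description headers from the pasted command are left out here, since part of the paste was cut off.

```shell
# Sketch only: one item per .warc.gz, collection set to "opensource".
# item_name, item_url and upload_warc are hypothetical helper names.

# Item identifier = file name without directory and without ".warc.gz".
item_name() {
  basename "$1" .warc.gz
}

# Upload URL has the form http://s3.us.archive.org/<item>/<file>.
item_url() {
  printf 'http://s3.us.archive.org/%s/%s\n' "$(item_name "$1")" "$(basename "$1")"
}

# Upload a single WARC. Reads $accesskey and $secret from the environment.
upload_warc() {
  curl --location \
    --header 'x-amz-auto-make-bucket:1' \
    --header 'x-archive-meta01-collection:opensource' \
    --header 'x-archive-meta-mediatype:web' \
    --header "authorization: LOW $accesskey:$secret" \
    --upload-file "$1" \
    "$(item_url "$1")"
}
```

Nothing is uploaded until upload_warc is called, e.g. `upload_warc /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz`. Trying it on a small file first, as suggested above, would only need a different path.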
22:50
|
DFJustin |
you may need to fix up the description and stuff later but you can do all that through the web interface |
22:50
|
Asparagir |
Okey dokey. Thanks for the help! |
22:50
|
Asparagir |
Also, I think I'm going to add --trace-ascii and --trace-time to curl to see if I can get some kind of feedback monitoring going on, for such a big upload. |
22:51
|
DFJustin |
what OS are you on |
22:51
|
Asparagir |
I'm SSH'ing into an Ubuntu 11 box. |
22:51
|
DFJustin |
k |
22:53
|
Asparagir |
Dumb question: does --upload-file /home/archiveteam/jewishgen.org-panicgrab-20130710.warc.gz need quote marks around the path and file name? Everything else has quote marks. |
22:53
|
DFJustin |
don't think so, unless there are spaces in the path |
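A small illustration of why spaces are the case that matters, assuming bash or a POSIX sh (zsh splits words differently by default). The path and the count_args helper are hypothetical, chosen only to show the effect.

```shell
# count_args prints how many separate arguments it received.
count_args() { echo "$#"; }

# Hypothetical path containing a space.
path_with_space='/home/archiveteam/my crawl.warc.gz'

# Unquoted: the shell splits the value at the space into two arguments.
count_args $path_with_space

# Quoted: the whole path is passed as a single argument.
count_args "$path_with_space"
```

The path in the upload command above has no spaces, so it works either way; quoting it anyway does no harm.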
22:54
|
Asparagir |
Okay, thanks. |
23:06
|
godane |
does anyone want to help fix my scanner to work in linux? |
23:07
|
godane |
i need to work on linux cause i want to be more productive |
23:08
|
godane |
i can't scan things in linux but can upload |
23:08
|
godane |
can't upload in windows but can scan |
23:08
|
godane |
i need to be able to do both |