Time |
Nickname |
Message |
00:02
π
|
link343 |
Has anyone contacted this Jason Scott fellow about his 4chan archive? |
00:03
π
|
link343 |
He claimed to have 10 Million threads back in 2009 |
00:03
π
|
link343 |
http://ascii.textfiles.com/archives/2083 |
00:03
π
|
DFJustin |
he's here as SketchCow |
00:03
π
|
link343 |
oh |
00:03
π
|
DFJustin |
as far as I know he was persuaded not to spread it for the time being but eventually the internet archive will get |
00:03
π
|
link343 |
I understand |
00:03
π
|
DFJustin |
it |
00:04
π
|
omf_ |
I also have a newer 109gb snap shot of images and posts |
00:04
π
|
link343 |
I just shot the guy who runs rbt.asia a email about his archive |
00:04
π
|
link343 |
he archives /w/, /soc/, /mu/, /clg/, and /g/ |
00:05
π
|
link343 |
I've got some /mu/ threads from 2011 backed up |
00:05
π
|
link343 |
about a gig. Unfortunately in HTML |
00:05
π
|
link343 |
but they work |
00:05
π
|
DFJustin |
there was some talk recently about the chanarchive.org collection too, don't recall the outcome |
00:07
π
|
link343 |
Is there a suggested compression setting scheme for 7zip anywhere? |
00:07
π
|
link343 |
I have ultra on, but I see there are other options. |
00:09
π
|
lart |
also set lzma2; you can play with the dictionary, word, and block size, but ultra normally does the best imho. |
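A minimal sketch of the settings lart describes, written as a Python helper that only builds the `7z` command line and does not run it. The archive name, source directory, and the specific dictionary and word-size values are assumptions chosen for illustration, not recommendations from the channel.

```python
import shlex


def build_7z_command(archive, source, dict_size="64m", word_size=64):
    """Return an argv list for 7z using LZMA2 at the ultra preset."""
    return [
        "7z", "a",              # add files to an archive
        "-t7z",                 # 7z container format
        "-m0=lzma2",            # LZMA2 as the compression method
        "-mx=9",                # level 9, the "ultra" preset
        f"-md={dict_size}",     # dictionary size (assumed value)
        f"-mfb={word_size}",    # word size, a.k.a. fast bytes (assumed value)
        "-ms=on",               # solid mode, which groups files into blocks
        archive,
        source,
    ]


# Hypothetical file names, for illustration only.
cmd = build_7z_command("mu-threads-2011.7z", "mu-threads/")
print(shlex.join(cmd))
```

Larger dictionaries tend to help on big collections of similar HTML files, at the cost of more memory during compression and extraction.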
00:12
π
|
link343 |
alright |
00:12
π
|
ivan` |
I have hundreds of GB of 4chanarchive in httrack format |
00:13
π
|
link343 |
cool |
00:14
π
|
link343 |
I've been backing up a few MediaWikis in the last few days |
00:19
π
|
ivan` |
heh, google reader does not canonicalize https:// and http:// feed URLs for wordpress blogs |
00:19
π
|
ivan` |
that was confusing |
00:19
π
|
ivan` |
guess we'll have to grab everything ;) |
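Since the two schemes are treated as separate feeds, one way to "grab everything" is to queue both variants of every feed URL. The sketch below is only an illustration of that idea; `scheme_variants` is a hypothetical helper, not part of greader-grab.

```python
from urllib.parse import urlsplit, urlunsplit


def scheme_variants(feed_url):
    """Return the http:// and https:// forms of a feed URL.

    URLs with any other scheme are returned unchanged.
    """
    parts = urlsplit(feed_url)
    if parts.scheme not in ("http", "https"):
        return [feed_url]
    return [urlunsplit(parts._replace(scheme=s)) for s in ("http", "https")]


# Hypothetical WordPress feed, for illustration only.
for url in scheme_variants("https://example.wordpress.com/feed/"):
    print(url)
```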
02:52
π
|
ivan` |
is there a wget-lua with gzip support? |
02:52
π
|
ivan` |
something like https://github.com/kravietz/wget-gzip |
02:54
π
|
ivan` |
or https://github.com/ptolts/wget-with-gzip-compression |
02:57
π
|
ivan` |
which for some reason is forked off a 10 year old wget :/ |
03:26
π
|
ivan` |
someone have a good channel name for the greader grab? :) |
03:36
π
|
ivan` |
also, can someone fork https://github.com/ludios/greader-grab into ArchiveTeam and give `github.com/ivan` write access? |
03:38
π
|
ivan` |
also, is there a convenient existing thing that could be used for collecting .opml files and feed URLs from users? |
03:38
π
|
ivan` |
perhaps a pastebin under archiveteam control |
03:42
π
|
DFJustin |
http://paste.archivingyoursh.it/ |
03:43
π
|
godane |
so i'm getting g4 confessions of a booth babe |
03:48
π
|
ivan` |
DFJustin: thank |
03:48
π
|
ivan` |
s |
04:07
π
|
GLaDOS |
ivan`: done |
04:07
π
|
GLaDOS |
https://github.com/archiveteam/greader-grab |
04:09
π
|
ivan` |
"howdoireadgoogle grants 1 user push access to 1 repository" heh thanks :) |
04:12
π
|
ivan` |
I should experiment with my own universal-tracker instance, right? |
04:13
π
|
GLaDOS |
I can set up a test tracker for you |
04:13
π
|
GLaDOS |
Just give me a few items |
04:14
π
|
ivan` |
thanks, will let you know when I have something useful |
08:08
π
|
omf_ |
SketchCow, before you talked about a 'hint' field in metadata so you can tell IA how large something is going to be |
08:30
π
|
DFJustin |
x-archive-size-hint:19327352832 |
08:33
π
|
omf_ |
Do I have to send that in the header or is that a metadata.csv thing |
08:35
π
|
DFJustin |
would have to be in the header, all the metadata.csv stuff is of the form x-archive-meta-xxxx |
08:35
π
|
DFJustin |
and if you're uploading multiple files it needs to be in the first request that creates the bucket |
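A rough sketch of what DFJustin describes: the size hint goes in the HTTP headers of the first request, alongside any `x-archive-meta-*` fields. The helper name `upload_headers`, the metadata values, and the `x-amz-auto-make-bucket` header are assumptions added for illustration; the code only builds a header dictionary and sends nothing.

```python
def upload_headers(metadata, size_hint=None, first_request=False):
    """Build headers for one file upload to an item.

    metadata:      dict of field name -> value, sent as x-archive-meta-<name>
    size_hint:     expected total size of the item in bytes
    first_request: True only for the request that creates the bucket
    """
    headers = {}
    for name, value in metadata.items():
        headers[f"x-archive-meta-{name}"] = str(value)
    if first_request:
        # Assumption: ask the server to create the bucket on this request.
        headers["x-amz-auto-make-bucket"] = "1"
        if size_hint is not None:
            # The hint is only sent on the bucket-creating request.
            headers["x-archive-size-hint"] = str(size_hint)
    return headers


# Hypothetical metadata; the size is the value quoted in the channel.
first = upload_headers({"mediatype": "web"}, size_hint=19327352832,
                       first_request=True)
later = upload_headers({"mediatype": "web"}, size_hint=19327352832)
print(first)
print(later)
```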
09:04
π
|
ivan` |
is #livingandloving too obscure a reference for the channel name? ;) |
09:21
π
|
ivan` |
http://www.archiveteam.org/index.php?title=Google_Reader |
09:53
π
|
omf_ |
ivan`, what does your wget --version look like |
09:53
π
|
omf_ |
mine has gzip support via libz |
09:53
π
|
omf_ |
unless that only works for making the gz warc files |
09:53
π
|
GLaDOS |
https://twitter.com/at_warrior/status/336052787404238848 lets get this thing on the road |
09:54
π
|
Smiley |
GLaDOS: oh you're alive! |
09:55
π
|
GLaDOS |
hi |
09:55
π
|
Smiley |
1. we need a channel? |
09:57
π
|
Smiley |
2. the takeout gives you something not listed on that site |
09:57
π
|
Smiley |
i guess you would want the subscriptions.xml |
09:58
π
|
Smiley |
{"files":[{"name":"subscriptions.xml","size":3484}]} |
09:58
π
|
Smiley |
Don't know if that worked either... |
09:58
π
|
GLaDOS |
https://twitter.com/at_warrior/status/336058209263562752 fixed |
09:59
π
|
Smiley |
lol hmmm |
09:59
π
|
Smiley |
yeah but have you done an upload? |
09:59
π
|
Smiley |
{"files":[{"name":"subscriptions.xml","size":3484}]} << thats not a helpful return page |
09:59
π
|
GLaDOS |
I never really used reader |
10:00
π
|
GLaDOS |
ivan`: ples asplen |
10:00
π
|
Smiley |
when I upload, thats what I get back |
10:00
π
|
Smiley |
Even a "Thanks for your upload" would be better. |
10:02
π
|
Smiley |
So yeah, we are asking users for OPML files, yet google takeout doesn't provide those. |
10:03
π
|
ivan` |
omf_: right, only the warc |
10:03
π
|
ivan` |
Smiley: if you run all of the JavaScript, it's friendlier |
10:03
π
|
ivan` |
I'll try to fix it for the other case, it was a rush job |
10:04
π
|
Smiley |
ivan`: did the second time |
10:04
π
|
Smiley |
and still got that page back D: |
10:04
π
|
Smiley |
Oh it didn't load the rest, weird. |
10:04
π
|
Smiley |
Ah ok thats better :) |
10:05
π
|
ivan` |
GLaDOS: "I have always believed that technology should do the hard work--discovery, organization, communication--so users can do what makes them happiest: living and loving, not messing with annoying computers!" https://investor.google.com/corporate/2012/ceo-letter.html |
10:05
π
|
ivan` |
yeah, I need to serve all the JavaScript from my domain |
10:05
π
|
ivan` |
Smiley: takeout provides the OPML file inside the .zip |
10:05
π
|
GLaDOS |
hue |
10:06
π
|
Smiley |
7 json files + 1 xml |
10:06
π
|
antomatic |
I like Google Reader, still use it every day... suppose I really need to hop to an alternative sooner rather than later, though... |
10:07
π
|
Smiley |
ivan`: not for me :/ |
10:07
π
|
ivan` |
the .xml is the OPML file |
10:07
π
|
ivan` |
is it missing in your zip? |
10:07
π
|
Smiley |
i have the .xml file |
10:08
π
|
Smiley |
no where does it say anything about opml.... D: |
10:08
π
|
ivan` |
<opml version="1.0"> |
10:08
π
|
Smiley |
don't expect readers to read. |
10:08
π
|
Smiley |
:/ |
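To make the point concrete: the Takeout `subscriptions.xml` is an OPML document, and the feed URLs sit in the `xmlUrl` attribute of its `outline` elements. The sketch below pulls them out with the standard library; the sample document and the `feed_urls` helper are made up for illustration and are not the URL collector's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down subscriptions.xml for illustration.
SAMPLE = """<opml version="1.0">
  <head><title>subscriptions</title></head>
  <body>
    <outline text="Example" title="Example" type="rss"
             xmlUrl="http://example.com/feed" htmlUrl="http://example.com/"/>
    <outline text="Folder" title="Folder">
      <outline text="Nested" type="rss" xmlUrl="http://example.org/rss"/>
    </outline>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return every xmlUrl found in an OPML document, in document order."""
    root = ET.fromstring(opml_text)
    # iter() walks nested outlines too, so feeds inside folders are included.
    return [o.attrib["xmlUrl"] for o in root.iter("outline")
            if "xmlUrl" in o.attrib]


print(feed_urls(SAMPLE))
```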
10:15
π
|
TrojanEel |
For the URL collector, after you upload a file, the process is done? |
10:15
π
|
ivan` |
yes, I'll go mention this |
10:16
π
|
TrojanEel |
Please :-) |
10:16
π
|
TrojanEel |
I added the spokenword.org archive of RSS feeds |
10:16
π
|
ivan` |
thanks |
10:21
π
|
ivan` |
I've started backing up submissions to my machine, I have 5 so far |
10:23
π
|
TrojanEel |
you can get the entire list of feeds if you find a way to crawl https://www.google.com/reader/directory/search?q=english |
10:23
π
|
TrojanEel |
(and other keywords) |
10:24
π
|
ivan` |
nice find |
10:24
π
|
ivan` |
there's also the recommendations feature |
10:26
π
|
TrojanEel |
we really need a channel though - #googleread? #donereading? |
10:26
π
|
ivan` |
I like #donereading |
10:27
π
|
antomatic |
#readingisfundamental ? |
10:27
π
|
antomatic |
:) |
10:28
π
|
omf_ |
just call it #googleburner |
10:30
π
|
antomatic |
#fahrenheit451 |
10:30
π
|
ivan` |
donereading is pretty clever |
10:31
π
|
ivan` |
and subdued instead of irritated |
10:32
π
|
* |
ivan` updates the wiki |
12:52
π
|
omf_ |
music.aol.com and www.theboot.com have been backed up. The other AOL music sites are in progress |
13:18
π
|
Howlin1 |
So Rapidshare might be closing down soon(ish) |
13:32
π
|
PepsiMax |
heh |
16:38
π
|
none295 |
Not sure if this is in the Archive Team's boundaries, but there is a building, a museum, which is slated to be demolished. It's only 12 years old, so it's a little unusual. I'm offering the architectural autoCAD drawings and specifications. http://mafa.noneinc.com The ReadMe.nfo has some links to newspaper articles on the issue. |
16:45
π
|
dashcloud |
if you've got the items in your possession, you should reach out to SketchCow |
16:45
π
|
dashcloud |
I didn't realize the items were already uploaded to that site |
16:51
π
|
omf_ |
I am grabbing those few files now |
16:51
π
|
none295 |
Yes, it's that 23mb RAR file. Haven't heard of any group working to save architectural drawings. Typically there are copyright concerns as with everything else, but as this is a building which is to be demolished, prematurely, wondering if makes for a good example to see if anyone wants to get into the conversation. |
16:52
π
|
asie |
hey, are we backuping tumblr yet? |
16:53
π
|
omf_ |
no |
16:57
π
|
hneio |
asie: not enough space |
16:57
π
|
hneio |
plain and simple |
16:57
π
|
asie |
hneio: at least part of it, we managed a part of geocities |
16:57
π
|
asie |
and i have this feeling tumblr will go through the same fate |
16:57
π
|
asie |
geocities web 2.0 |
17:00
π
|
ivan` |
the google reader grab will grab tumblr's text content, heh |
17:01
π
|
ivan` |
then someone can buy the exabytes of disks needed to store all the porn |
17:06
π
|
Smiley |
lol |
17:32
π
|
blueskin |
having a few upload problems... server keeps hitting max connections. |
17:33
π
|
Smiley |
blueskin: hmmmmm we are too successful,. |
17:33
π
|
Smiley |
Just leave it running and it'll eventually go through,. |
17:39
π
|
blueskin |
well, at least it shows plenty of people working, indeed. |
17:40
π
|
hneio |
Smiley: will the upload server remain up for some days after the deadline? |
17:46
π
|
Smiley |
hneio: upload server is ours |
17:46
π
|
Smiley |
it'll remain there until theres nothing left to upload. |
17:51
π
|
blueskin |
archive all the things! |
17:52
π
|
Smiley |
Indeed. |
18:21
π
|
pronoiac |
I know the rsync server is swamped due to Formspring. |
18:22
π
|
pronoiac |
I took a shot at implementing an exponential backoff with failed attempts. |
18:22
π
|
pronoiac |
I posted a pull request on seesaw. |
18:22
π
|
pronoiac |
But my naive attempt doesn't work with concurrent items. |
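For readers unfamiliar with the technique, here is a generic sketch of exponential backoff with jitter. It is not pronoiac's pull request and does not use the seesaw API; `retry_with_backoff` and its parameters are hypothetical. The retry counter lives inside each call, so concurrent items would each back off independently.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base=1.0, factor=2.0,
                       cap=300.0, sleep=time.sleep):
    """Call operation() until it succeeds or attempts run out.

    After the n-th failure (counting from 0) the wait is
    min(base * factor**n, cap) seconds plus up to 50% random jitter,
    so many clients do not all retry at the same moment.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = min(base * factor ** attempt, cap)
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting, for example by passing `sleep=lambda seconds: None`.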
18:25
π
|
pronoiac |
Maybe someone else will find this helpful, or at least a step in the right direction. |
18:28
π
|
ivan` |
how about raising the connection limit? ;) |
19:13
π
|
godane |
i found the gwbush interview with zdtv |
19:13
π
|
godane |
or techtv |
19:14
π
|
godane |
this was before the election |
23:10
π
|
dashcloud |
hi folks, I'd like to remind everyone that there is an AOL archiving project (yes, that AOL- the one you used to dial into) in the works, and we could really use your help in #aohell. Happy to answer your questions here or there. |