#archiveteam 2013-05-19,Sun


Time Nickname Message
00:02 πŸ”— link343 Has anyone contacted this Jason Scott fellow about his 4chan archive?
00:03 πŸ”— link343 He claimed to have 10 Million threads back in 2009
00:03 πŸ”— link343 http://ascii.textfiles.com/archives/2083
00:03 πŸ”— DFJustin he's here as SketchCow
00:03 πŸ”— link343 oh
00:03 πŸ”— DFJustin as far as I know he was persuaded not to spread it for the time being but eventually the internet archive will get
00:03 πŸ”— link343 I understand
00:03 πŸ”— DFJustin it
00:04 πŸ”— omf_ I also have a newer 109gb snapshot of images and posts
00:04 πŸ”— link343 I just shot the guy who runs rbt.asia an email about his archive
00:04 πŸ”— link343 he archives /w/, /soc/, /mu/, /clg/, and /g/
00:05 πŸ”— link343 I've got some /mu/ threads from 2011 backed up
00:05 πŸ”— link343 about a gig. Unfortunately in HTML
00:05 πŸ”— link343 but they work
00:05 πŸ”— DFJustin there was some talk recently about the chanarchive.org collection too, don't recall the outcome
00:07 πŸ”— link343 Is there a suggested compression setting scheme for 7zip anywhere?
00:07 πŸ”— link343 I have ultra on, but I see there are other options.
00:09 πŸ”— lart also set lzma2; you can play with the dictionary, word, and block size, but ultra normally does the best imho.
00:12 πŸ”— link343 alright
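The 7-Zip settings lart mentions can be spelled out as an explicit command line; a minimal sketch in which the archive name, input directory, and the -md/-mfb values are illustrative placeholders, not settings from the log:

```python
# Build a 7z command using the "ultra" LZMA2 settings discussed above.
# -mx=9 is the ultra level, -m0=lzma2 selects the codec, and -md/-mfb
# are the dictionary and word (fast bytes) sizes lart suggests tuning.
seven_zip_cmd = [
    "7z", "a",
    "-t7z",         # 7z container
    "-m0=lzma2",    # LZMA2 codec
    "-mx=9",        # ultra compression level
    "-md=64m",      # dictionary size (illustrative value)
    "-mfb=64",      # word / fast-bytes size (illustrative value)
    "threads.7z",   # output archive (placeholder)
    "threads/",     # input directory (placeholder)
]
```

With the `7z` binary installed, this list could be handed to `subprocess.run(seven_zip_cmd)`; it is only constructed here, not executed.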
00:12 πŸ”— ivan` I have hundreds of GB of 4chanarchive in httrack format
00:13 πŸ”— link343 cool
00:14 πŸ”— link343 I've been backing up a few MediaWikis in the last few days
00:19 πŸ”— ivan` heh, google reader does not canonicalize https:// and http:// feed URLs for wordpress blogs
00:19 πŸ”— ivan` that was confusing
00:19 πŸ”— ivan` guess we'll have to grab everything ;)
02:52 πŸ”— ivan` is there a wget-lua with gzip support?
02:52 πŸ”— ivan` something like https://github.com/kravietz/wget-gzip
02:54 πŸ”— ivan` or https://github.com/ptolts/wget-with-gzip-compression
02:57 πŸ”— ivan` which for some reason is forked off a 10 year old wget :/
03:26 πŸ”— ivan` someone have a good channel name for the greader grab? :)
03:36 πŸ”— ivan` also, can someone fork https://github.com/ludios/greader-grab into ArchiveTeam and give `github.com/ivan` write access?
03:38 πŸ”— ivan` also, is there a convenient existing thing that could be used for collecting .opml files and feed URLs from users?
03:38 πŸ”— ivan` perhaps a pastebin under archiveteam control
03:42 πŸ”— DFJustin http://paste.archivingyoursh.it/
03:43 πŸ”— godane so i'm getting g4 confessions of a booth babe
03:48 πŸ”— ivan` DFJustin: thanks
04:07 πŸ”— GLaDOS ivan`: done
04:07 πŸ”— GLaDOS https://github.com/archiveteam/greader-grab
04:09 πŸ”— ivan` "howdoireadgoogle grants 1 user push access to 1 repository" heh thanks :)
04:12 πŸ”— ivan` I should experiment with my own universal-tracker instance, right?
04:13 πŸ”— GLaDOS I can set up a test tracker for you
04:13 πŸ”— GLaDOS Just give me a few items
04:14 πŸ”— ivan` thanks, will let you know when I have something useful
08:08 πŸ”— omf_ SketchCow, before you talked about a 'hint' field in metadata so you can tell IA how large something is going to be
08:30 πŸ”— DFJustin x-archive-size-hint:19327352832
08:33 πŸ”— omf_ Do I have to send that in the header or is that a metadata.csv thing
08:35 πŸ”— DFJustin would have to be in the header, all the metadata.csv stuff is of the form x-archive-meta-xxxx
08:35 πŸ”— DFJustin and if you're uploading multiple files it needs to be in the first request that creates the bucket
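DFJustin's point above is that the size hint is a plain request header on the first PUT that creates the bucket, while metadata.csv fields map to `x-archive-meta-*` headers. A sketch of such a request using the stdlib; the item identifier, filename, and title are made-up placeholders, and nothing is actually sent:

```python
import urllib.request

# Headers for the first PUT that creates the IA S3 bucket.
# x-archive-size-hint is a plain header, NOT an x-archive-meta-* field;
# the hint value here is the one quoted in the log (~18 GiB).
headers = {
    "x-archive-size-hint": "19327352832",
    "x-archive-meta-title": "Example item",   # metadata.csv-style field
    "x-archive-auto-make-bucket": "1",        # create the bucket on first upload
}

# Placeholder identifier and file; the request object is only built,
# never opened, so no network access happens.
req = urllib.request.Request(
    "http://s3.us.archive.org/example-item/first-file.warc.gz",
    method="PUT",
    headers=headers,
)
```

Subsequent file uploads to the same item would omit the size hint, since the bucket already exists by then.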
09:04 πŸ”— ivan` is #livingandloving too obscure a reference for the channel name? ;)
09:21 πŸ”— ivan` http://www.archiveteam.org/index.php?title=Google_Reader
09:53 πŸ”— omf_ ivan`, what does your wget --version look like
09:53 πŸ”— omf_ mine has gzip support via libz
09:53 πŸ”— omf_ unless that only works for making the gz warc files
09:53 πŸ”— GLaDOS https://twitter.com/at_warrior/status/336052787404238848 lets get this thing on the road
09:54 πŸ”— Smiley GLaDOS: oh you're alive!
09:55 πŸ”— GLaDOS hi
09:55 πŸ”— Smiley 1. we need a channel?
09:57 πŸ”— Smiley 2. the takeout gives you something not listed on that site
09:57 πŸ”— Smiley i guess you would want the subscriptions.xml
09:58 πŸ”— Smiley {"files":[{"name":"subscriptions.xml","size":3484}]}
09:58 πŸ”— Smiley Don't know if that worked either...
09:58 πŸ”— GLaDOS https://twitter.com/at_warrior/status/336058209263562752 fixed
09:59 πŸ”— Smiley lol hmmm
09:59 πŸ”— Smiley yeah but have you done an upload?
09:59 πŸ”— Smiley {"files":[{"name":"subscriptions.xml","size":3484}]} << that's not a helpful return page
09:59 πŸ”— GLaDOS I never really used reader
10:00 πŸ”— GLaDOS ivan`: ples asplen
10:00 πŸ”— Smiley when I upload, that's what I get back
10:00 πŸ”— Smiley Even a "Thanks for your upload" would be better.
10:02 πŸ”— Smiley So yeah, we are asking users for OPML files, yet google takeout doesn't provide those.
10:03 πŸ”— ivan` omf_: right, only the warc
10:03 πŸ”— ivan` Smiley: if you run all of the JavaScript, it's friendlier
10:03 πŸ”— ivan` I'll try to fix it for the other case, it was a rush job
10:04 πŸ”— Smiley ivan`: did the second time
10:04 πŸ”— Smiley and still got that page back D:
10:04 πŸ”— Smiley Oh it didn't load the rest, weird.
10:04 πŸ”— Smiley Ah ok thats better :)
10:05 πŸ”— ivan` GLaDOS: "I have always believed that technology should do the hard work - discovery, organization, communication - so users can do what makes them happiest: living and loving, not messing with annoying computers!" https://investor.google.com/corporate/2012/ceo-letter.html
10:05 πŸ”— ivan` yeah, I need to serve all the JavaScript from my domain
10:05 πŸ”— ivan` Smiley: takeout provides the OPML file inside the .zip
10:05 πŸ”— GLaDOS hue
10:06 πŸ”— Smiley 7 json files + 1 xml
10:06 πŸ”— antomatic I like Google Reader, still use it every day... suppose I really need to hop to an alternative sooner rather than later, though...
10:07 πŸ”— Smiley ivan`: not for me :/
10:07 πŸ”— ivan` the .xml is the OPML file
10:07 πŸ”— ivan` is it missing in your zip?
10:07 πŸ”— Smiley i have the .xml file
10:08 πŸ”— Smiley nowhere does it say anything about opml.... D:
10:08 πŸ”— ivan` <opml version="1.0">
10:08 πŸ”— Smiley don't expect readers to read.
10:08 πŸ”— Smiley :/
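As ivan` says, the subscriptions.xml in a Reader takeout is the OPML file; feed URLs live in the `xmlUrl` attributes of its `outline` elements. A sketch with the stdlib parser, using a made-up minimal document for illustration:

```python
import xml.etree.ElementTree as ET

# A minimal OPML document shaped like a Reader takeout subscriptions.xml
# (the contents here are invented for the example).
opml = """<opml version="1.0">
  <head><title>subscriptions</title></head>
  <body>
    <outline title="blogs" text="blogs">
      <outline text="Example" type="rss"
               xmlUrl="http://example.com/feed" htmlUrl="http://example.com"/>
    </outline>
  </body>
</opml>"""

# Folders are outline elements without an xmlUrl attribute, so
# filtering on xmlUrl yields only the actual feed subscriptions.
root = ET.fromstring(opml)
feed_urls = [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]
```

The same loop run on a real subscriptions.xml would produce the feed URL list the project is collecting.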
10:15 πŸ”— TrojanEel For the URL collector, after you upload a file, the process is done?
10:15 πŸ”— ivan` yes, I'll go mention this
10:16 πŸ”— TrojanEel Please :-)
10:16 πŸ”— TrojanEel I added the spokenword.org archive of RSS feeds
10:16 πŸ”— ivan` thanks
10:21 πŸ”— ivan` I've started backing up submissions to my machine, I have 5 so far
10:23 πŸ”— TrojanEel you can get the entire list of feeds if you find a way to crawl https://www.google.com/reader/directory/search?q=english
10:23 πŸ”— TrojanEel (and other keywords)
10:24 πŸ”— ivan` nice find
10:24 πŸ”— ivan` there's also the recommendations feature
10:26 πŸ”— TrojanEel we really need a channel though - #googleread? #donereading?
10:26 πŸ”— ivan` I like #donereading
10:27 πŸ”— antomatic #readingisfundamental ?
10:27 πŸ”— antomatic :)
10:28 πŸ”— omf_ just call it #googleburner
10:30 πŸ”— antomatic #fahrenheit451
10:30 πŸ”— ivan` donereading is pretty clever
10:31 πŸ”— ivan` and subdued instead of irritated
10:32 πŸ”— * ivan` updates the wiki
12:52 πŸ”— omf_ music.aol.com and www.theboot.com have been backed up. The other AOL music sites are in progress
13:18 πŸ”— Howlin1 So Rapidshare might be closing down soon(ish)
13:32 πŸ”— PepsiMax heh
16:38 πŸ”— none295 Not sure if this is in the Archive Team's boundaries, but there is a building, a museum, which is slated to be demolished. It's only 12 years old, so it's a little unusual. I'm offering the architectural autoCAD drawings and specifications. http://mafa.noneinc.com The ReadMe.nfo has some links to newspaper articles on the issue.
16:45 πŸ”— dashcloud if you've got the items in your possession, you should reach out to SketchCow
16:45 πŸ”— dashcloud I didn't realize the items were already uploaded to that site
16:51 πŸ”— omf_ I am grabbing those few files now
16:51 πŸ”— none295 Yes, it's that 23mb RAR file. Haven't heard of any group working to save architectural drawings. Typically there are copyright concerns as with everything else, but as this is a building which is to be demolished, prematurely, wondering if it makes for a good example to see if anyone wants to get into the conversation.
16:52 πŸ”— asie hey, are we backing up tumblr yet?
16:53 πŸ”— omf_ no
16:57 πŸ”— hneio asie: not enough space
16:57 πŸ”— hneio plain and simple
16:57 πŸ”— asie hneio: at least part of it, we managed a part of geocities
16:57 πŸ”— asie and i have this feeling tumblr will go through the same fate
16:57 πŸ”— asie geocities web 2.0
17:00 πŸ”— ivan` the google reader grab will grab tumblr's text content, heh
17:01 πŸ”— ivan` then someone can buy the exabytes of disks needed to store all the porn
17:06 πŸ”— Smiley lol
17:32 πŸ”— blueskin having a few upload problems... server keeps hitting max connections.
17:33 πŸ”— Smiley blueskin: hmmmmm we are too successful.
17:33 πŸ”— Smiley Just leave it running and it'll eventually go through.
17:39 πŸ”— blueskin well, at least it shows plenty of people working, indeed.
17:40 πŸ”— hneio Smiley: will the upload server remain up for some days after the deadline?
17:46 πŸ”— Smiley hneio: upload server is ours
17:46 πŸ”— Smiley it'll remain there until there's nothing left to upload.
17:51 πŸ”— blueskin archive all the things!
17:52 πŸ”— Smiley Indeed.
18:21 πŸ”— pronoiac I know the rsync server is swamped due to Formspring.
18:22 πŸ”— pronoiac I took a shot at implementing an exponential backoff with failed attempts.
18:22 πŸ”— pronoiac I posted a pull request on seesaw.
18:22 πŸ”— pronoiac But my naive attempt doesn't work with concurrent items.
18:25 πŸ”— pronoiac Maybe someone else will find this helpful, or at least a step in the right direction.
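pronoiac's exponential-backoff idea can be sketched in miniature; the base and cap values below are arbitrary choices for illustration, not what the seesaw pull request used, and the jitter is one common way to keep concurrent items from retrying in lockstep:

```python
import random

def backoff_delay(attempt, base=2.0, cap=300.0):
    """Exponential backoff with full jitter.

    The ceiling grows as base * 2**attempt but is capped, and the actual
    delay is drawn uniformly below that ceiling so that many concurrent
    clients spread their retries out instead of hammering the rsync
    server in waves. base and cap are illustrative values.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)
```

A retry loop around the upload would then call `time.sleep(backoff_delay(n))` after the n-th consecutive failure and reset n to zero on success.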
18:28 πŸ”— ivan` how about raising the connection limit? ;)
19:13 πŸ”— godane i found the gwbush intereview with zdtv
19:13 πŸ”— godane or techtv
19:14 πŸ”— godane this was before the election
23:10 πŸ”— dashcloud hi folks, I'd like to remind everyone that there is an AOL archiving project (yes, that AOL- the one you used to dial into) in the works, and we could really use your help in #aohell. Happy to answer your questions here or there.
