#archiveteam 2011-12-31,Sat


Time Nickname Message
00:11 🔗 SketchCow Where's my hug!
00:12 🔗 * arrith looks in couch cushions
00:24 🔗 SketchCow http://imgur.com/bestof2011
00:39 🔗 arrith SketchCow: is there any kind of official policy or procedure for contacting sites about database dumps for Archive Team? as in, are there any people that do it or just anyone? or should it at least be run by someone like you first?
00:41 🔗 dnova no official policy that I know of. I've contacted owners a few times with results ranging from: Wow, this is awesome I am flattered! to: fuck you fuck you fuck you
00:41 🔗 dnova don't ever claim you represent archive.org or jason scott
00:42 🔗 arrith yeah for sure. but what about claiming to be kinda part of Archive Team? i mean AT is quite loose afaik
00:43 🔗 bbot_ sure why not
00:43 🔗 dnova I think that is ok
00:43 🔗 bbot_ when I emailed AO3 I said that I was "a volunteer with" AT
00:43 🔗 bbot_ they of course refused to provide dumps, but I liked that phrase
00:43 🔗 chronomex it'd be a good idea to run such letters by several other people, at least one or two of whom have been around for a while
00:44 🔗 arrith chronomex: is there a list of people like that besides the OPs in this chan?
00:45 🔗 chronomex not really
00:45 🔗 arrith ah alright
00:45 🔗 arrith so run idea and letters by ops
00:47 🔗 arrith if a database dump of a site is acquired, is there an Archive Team server for such things or should the person who has the dump just try to hold onto it, and maybe mention it on the wiki, or make a torrent or something?
00:47 🔗 dnova what is the goal?
00:48 🔗 arrith well for fanfiction.net and reddit.com to maintain a live mirror. if that isn't possible then to maintain an offline mirror.
00:50 🔗 arrith i'm thinking like the reocities-type sites. but if a torrent is the best one can do then i'll take the best there is
00:52 🔗 dnova first of all, there is no way that reddit or fanfiction are going to be in any way okay with you hosting a mirror of their site
00:53 🔗 dnova archiveteam doesn't exactly have a server, but there is some hardware at archive.org that SketchCow uses to ingest the stuff we grab. If you download something and jason wants it, you'll most likely be uploading it there
00:54 🔗 arrith alright
01:45 🔗 SketchCow Archiveteam has a tank
01:46 🔗 bbot_ archivetank?
01:46 🔗 SketchCow You can be part of archive team in terms of asking.
01:51 🔗 arrith neat :)
01:51 🔗 * arrith pins honorary badge on self
02:00 🔗 godane SketchCow: i'm making something called slitaz-tank
02:00 🔗 godane it's an archive of linux sources that can maybe rebuild itself without the internet
02:01 🔗 godane or very little of it
02:57 🔗 arrith godane: to what degree?
02:58 🔗 arrith godane: as in does it build binaries for a distro or is it a git repo of the kernel?
03:28 🔗 godane arrith: it builds binary in order of depends
03:28 🔗 godane it also has packages in iso too
03:29 🔗 godane i recompressed the source tarballs with .tar.lzma and compressed png with optipng
03:45 🔗 SketchCow Hey, everyone.
03:46 🔗 SketchCow Pleased to say.... the google group upload of pages and files begins.
03:48 🔗 SketchCow I started in the beginning: kz
03:48 🔗 SketchCow http://www.archive.org/details/archiveteam-googlegroups-kz
04:04 🔗 arrith woo!
04:04 🔗 arrith godane: what distro?
04:10 🔗 godane my own
04:10 🔗 godane called slitaz-tank
04:10 🔗 godane based on the slitaz project
04:10 🔗 arrith ah
04:11 🔗 godane the full iso is like 8gb
04:11 🔗 arrith well if you have the sources for any distro you can rebuild it without the internet
04:12 🔗 godane yes but sometimes it's not as clear how to do it
04:12 🔗 arrith godane: a side project might be to put together documentation on how to do it for major distros
04:13 🔗 godane i also mirror the slitaz sites too
04:13 🔗 godane i even fit xkcd and linuxgazette
04:14 🔗 arrith that's good
04:53 🔗 SketchCow Oh yeah, this is going to be NUTS.
04:53 🔗 SketchCow NUTS.
04:53 🔗 godane what is NUTS?
04:53 🔗 Wyatt|Wor Now Uploading This Stuff?
04:54 🔗 SketchCow Blowing Google Groups into archive.org.
04:54 🔗 lemoncell ooooh
04:54 🔗 Wyatt|Wor Delicious.
04:54 🔗 SketchCow http://www.archive.org/details/archiveteam-googlegroups-00&reCache=1
04:54 🔗 Wyatt|Wor What's the derive process for tarballs?
04:54 🔗 lemoncell my posts from 1991 will live FOREVER
04:54 🔗 Wyatt|Wor Or...zips, it looks like?
04:54 🔗 Wyatt|Wor Is there one?
04:55 🔗 SketchCow Not posts.
04:55 🔗 SketchCow These are just the page files, and the file collections, all of them destroyed by Google this year.
04:55 🔗 lemoncell oh
04:55 🔗 lemoncell wow
04:56 🔗 SketchCow On September 22, 2010 Google announced plans for turning off the group pages suggesting users to move their content to Google Docs or Google Sites. Starting in November 2010, the group pages became read-only (allowing only viewing/downloading existing content) while in February 2011 they were turned-off completely.[16]
04:56 🔗 lemoncell sounds sizeable
04:58 🔗 SketchCow I've got scripts, calling scripts, calling scripts.
04:58 🔗 SketchCow It's just going to keep running.
04:58 🔗 SketchCow I am worried about that problematic buffer thing.
04:59 🔗 SketchCow But, like http://www.archive.org/details/archiveteam-googlegroups-00 - that's DONE.
04:59 🔗 lemoncell can it resume?
04:59 🔗 SketchCow Yeah, it can resume fine.
05:01 🔗 lemoncell you've done a man's job...too bad she won't live (blade runner)
05:02 🔗 SketchCow 985383
05:02 🔗 SketchCow root@teamarchive-0:/3/googlegroups# find . -name \*.zip | wc -l
05:02 🔗 SketchCow 985,000 individual sets of files (many groups have a pages.zip and a files.zip)
05:02 🔗 SketchCow That's before the second-third wave of google group uploads.
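For readers without a shell handy, the `find . -name \*.zip | wc -l` count can be reproduced roughly with a short Python sketch. The per-group tally is an assumption: it presumes a hypothetical layout of one directory per group holding `pages.zip` and/or `files.zip`. The actual directory layout on teamarchive-0 is not shown in the log.

```python
from collections import Counter
from pathlib import Path


def count_zips(root: str) -> int:
    """Roughly `find ROOT -name '*.zip' | wc -l`: count every path ending in .zip."""
    return sum(1 for _ in Path(root).rglob("*.zip"))


def zips_per_group(root: str) -> Counter:
    """Tally how many groups have one zip set versus two.

    Assumes a hypothetical layout of ROOT/<group>/{pages,files}.zip,
    so the parent directory name stands in for the group name.
    """
    per_group = Counter(p.parent.name for p in Path(root).rglob("*.zip"))
    # Map "number of zips in a group" -> "number of groups with that many".
    return Counter(per_group.values())
```

Because a group can contribute both a `pages.zip` and a `files.zip`, the raw zip count (985,383 here) is an upper bound on the number of groups, not the group count itself.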
05:03 🔗 lemoncell impressive
05:03 🔗 lemoncell then again this must be old hat for you now
05:04 🔗 SketchCow Just have to step carefully.
05:04 🔗 SketchCow But then, yeah, I have a program called Groupgrope that makes a collection, then assembles the file, then shoves them into grouphug, which uploads the individual files into the collection and slaps it on the ass to derive.
05:05 🔗 Wyatt|Wor Haha, script naming after my own heart!
05:05 🔗 lemoncell hehe
05:06 🔗 SketchCow The main limit is that you can't have more than 1000 items in a given directory, which means I need to create special cases.
05:06 🔗 SketchCow But a minor thing I can work around.
05:08 🔗 arrith that's an odd limit
05:09 🔗 SketchCow It's related to archive.org and not any filesystems.
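SketchCow mentions a 1000-entry cap per archive.org directory and names items by group-name prefix (archiveteam-googlegroups-kz, -00, -02). A minimal sketch of how such an upload plan could be computed, assuming the cap applies to the number of group entries in each prefix item. The numbered `-1`, `-2` suffixes for oversized prefixes are a hypothetical special case, not the scheme actually used by Groupgrope or grouphug.

```python
from collections import defaultdict

MAX_ITEMS = 1000  # per-directory cap described in the log (archive.org side)


def plan_collections(groups, prefix_len=2, max_items=MAX_ITEMS):
    """Bucket group names by prefix, splitting any bucket over max_items.

    Item names follow the archiveteam-googlegroups-<prefix> pattern from
    the log; the -<n> suffix for overflow buckets is hypothetical.
    """
    buckets = defaultdict(list)
    for name in sorted(groups):
        buckets[name[:prefix_len].lower()].append(name)

    plan = {}
    for prefix, names in sorted(buckets.items()):
        base = f"archiveteam-googlegroups-{prefix}"
        if len(names) <= max_items:
            plan[base] = names
        else:
            # Special case: split an oversized prefix into numbered chunks.
            for i in range(0, len(names), max_items):
                plan[f"{base}-{i // max_items + 1}"] = names[i:i + max_items]
    return plan
```

Keeping a fixed two-character prefix preserves the simple lookup rule (files for a group starting with "wh" live under the -wh item); only prefixes that exceed the cap need the extra step.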
05:10 🔗 SketchCow 2242164919 2011-12-04 23:56 ggroups_zipdl-wyatt.tgz
05:10 🔗 SketchCow See? I have that of yours to integrate too.
05:11 🔗 PatC SketchCow, is that first number a file number or unix time or something?
05:12 🔗 Coderjoe size
05:12 🔗 PatC ah
05:12 🔗 Coderjoe then date/time
05:12 🔗 SketchCow Penis length
05:12 🔗 arrith in lightyears
05:12 🔗 PatC lol
05:12 🔗 * SketchCow thrusts into the horsehead nebula
05:12 🔗 Coderjoe haha
05:12 🔗 SketchCow take it horsey, take it
05:12 🔗 PatC ha
05:13 🔗 SketchCow It'll be centuries before you hear the pitiful whinny
05:13 🔗 arrith in space no one can hear you upload all of google groups
05:14 🔗 Coderjoe imagine how long it takes to reel it all in when flaccid
05:14 🔗 SketchCow It finished 02!!
05:14 🔗 SketchCow http://www.archive.org/details/archiveteam-googlegroups-02
05:15 🔗 SketchCow http://www.archive.org/details/archiveteam-googlegroups-03
05:15 🔗 SketchCow Also just finished. It's gearing up for 04.
05:15 🔗 SketchCow And so it will go.
05:15 🔗 SketchCow So other than making sure it doesn't go into conniptions, which it will eventually, this is what I'll be having it do for probably two weeks.
05:19 🔗 SketchCow 05.
05:19 🔗 SketchCow Enough live updating. The summary is this is now happening.
05:23 🔗 DFJustin huzzah
05:24 🔗 Coderjoe http://www.techdirt.com/articles/20111229/00243317220/as-godaddy-deals-with-sopa-fallout-hollywood-wants-to-punish-godaddy-enabling-infringement.shtml
05:25 🔗 arrith SketchCow: could script that for an #at-status chan
05:30 🔗 bsmith093 SketchCow: why are the google groups made up of lots of tiny zip files? wouldn't it make more sense to bundle them up till they were significant sizes, maybe 50mb at least
05:33 🔗 chronomex what's the benefit?
05:34 🔗 bsmith093 how big is this torrent of url shorteners
05:34 🔗 SketchCow I want someone who wants "files from group whinybitches" to know they just need to go down to archiveteam-googlegroups-wh and it'll be there.
05:34 🔗 bsmith093 chronomex: less files shorter etc
05:35 🔗 chronomex bsmith093: I don't see the benefit there :P
05:35 🔗 SketchCow Otherwise I might as well make one massive-ass tar.bz2
05:35 🔗 SketchCow Which I might
05:36 🔗 bsmith093 well ok then. thats what I would do, is all I'm saying :d
05:36 🔗 bsmith093 and how big would that tar actually be?
05:36 🔗 SketchCow But first, I don't want our stuff to keep following the trend of "12 people in the planet would suffer the pain to extract what they need", especially with something like this, where millions of people come at it from different angles.
05:37 🔗 SketchCow That tar is at least half a terabyte, at least.
05:37 🔗 bsmith093 oy, well ok then, i hadn't realized it was so freakin huge, wow, that's rather impressive
05:38 🔗 SketchCow Google. Groups.
05:38 🔗 SketchCow You expected a USB key?
05:39 🔗 bsmith093 hey random thought: you know wikitaxi, and how it takes the pages-articles bz2 file from a wikidump and turns it into a taxi file, which is then basically portable?
05:39 🔗 bsmith093 I've got the latest dump all taxied up; want a copy?
05:40 🔗 bsmith093 it's faster than making another, and this way they (wikitaxi users) don't have to do it
05:45 🔗 Wyatt|Wor So SketchCow, what are your plans for MAGFest? Is it just a field test for your new gear, or do you have some interviews lined up?
05:55 🔗 SketchCow Just field test, capture interviews as I can
05:56 🔗 Wyatt|Wor Cool.
05:56 🔗 Wyatt|Wor A lot of musicians I respect end up going there, so I was curious.
06:49 🔗 godane SketchCow: geocities should be sorted like google groups
06:50 🔗 godane mostly cause you can get a site without extracting the full geocities backup
10:06 🔗 Nemo_bis But why didn't Google just automatically move all those files to new Google Sites...? (Which is what I did by hand with multiple groups.)
15:42 🔗 Soojin http://i.imgur.com/nYanb.jpg
15:52 🔗 Nemo_bis Soojin, solution here: fireworks forbidden to reduce smog (Milan and other big cities in Italy)
15:52 🔗 Soojin :)
16:19 🔗 Schbirid Nemo_bis: http://kuvaton.com/kuvei/chris_and_sun_comic.jpg
16:31 🔗 Nemo_bis :D
16:31 🔗 Nemo_bis I guess that's chronomex after moving, to upload Splinder data
16:51 🔗 Coderjoe the stick guy would need a computer rather than two halves of a shirt box on top of some other empty furniture box acting like a desk
17:12 🔗 SketchCow Brewster stepped in, I'm doing the google groups slightly differently.
17:26 🔗 Nemo_bis he even restarted a derive of mine today
19:45 🔗 SketchCow Archive.org is looking into making public 798 infomercials.
19:45 🔗 SketchCow Ranging from an hour to multiple hours.
19:46 🔗 dashcloud whoa
19:56 🔗 Nemo_bis wtf? http://www.us.archive.org/log_show.php?task_id=92391694 22 h of deriving to derive nothing?
19:56 🔗 Nemo_bis hm, JPEG Thumb
19:58 🔗 Nemo_bis and 14 h of crawling
20:56 🔗 Nemo_bis Splinder under maintenance
21:04 🔗 kennethre Nemo_bis: is today the day?
21:06 🔗 Nemo_bis no, still a month, but not available now
21:45 🔗 SketchCow bsmith093: Really? the file is named "applediskimages", no extension?
21:45 🔗 bsmith093 its a zip
21:45 🔗 SketchCow Yeah, just sussed
21:46 🔗 bsmith093 sorry for the possibly horrible organization, wasn't me.
21:49 🔗 Nemo_bis SketchCow, the new format for Google Groups is tidy, but what's the size limit of the zip before zipview.php fails?
21:51 🔗 SketchCow Not clear
21:52 🔗 SketchCow It handled a 3gb fine.
21:59 🔗 Nemo_bis I think the limit is not much more, perhaps 5 or 7 GB
22:04 🔗 Nemo_bis 5 works http://ia600506.us.archive.org/tarview.php?tar=/31/items/wiki.guildwars.com/wikiguildwarscom-20110717-images.tar
22:04 🔗 Nemo_bis (but that's tar)
22:06 🔗 Nemo_bis 19 definitely don't :) http://ia700508.us.archive.org/tarview.php?tar=/29/items/Infictive.com/infictivecom-20110712-images.tar&file=infictivecom-20110712-images/Ztar.jpg
22:55 🔗 Nemo_bis I must say that the fireworks prohibition is not being respected very much here.
23:55 🔗 Soojin interesting observation: commercial music video totally made up of material taken from archive.org (with the exception of the dude singing) https://www.youtube.com/watch?v=fK0_PVaF8Pg
