Time |
Nickname |
Message |
00:11
🔗
|
SketchCow |
Where's my hug! |
00:12
🔗
|
* |
arrith looks in couch cushions |
00:24
🔗
|
SketchCow |
http://imgur.com/bestof2011 |
00:39
🔗
|
arrith |
SketchCow: is there any kind of official policy or procedure for contacting sites about database dumps for Archive Team? as in, are there any people that do it or just anyone? or should it at least be run by someone like you first? |
00:41
🔗
|
dnova |
no official policy that I know of. I've contacted owners a few times with results ranging from: Wow, this is awesome I am flattered! to: fuck you fuck you fuck you |
00:41
🔗
|
dnova |
don't ever claim you represent archive.org or jason scott |
00:42
🔗
|
arrith |
yeah for sure. but what about claiming to be kinda part of Archive Team? i mean AT is quite loose afaik |
00:43
🔗
|
bbot_ |
sure why not |
00:43
🔗
|
dnova |
I think that is ok |
00:43
🔗
|
bbot_ |
when I emailed AO3 I said that I was "a volunteer with" AT |
00:43
🔗
|
bbot_ |
they of course refused to provide dumps, but I liked that phrase |
00:43
🔗
|
chronomex |
it'd be a good idea to run such letters by several other people, at least one or two of whom have been around for a while |
00:44
🔗
|
arrith |
chronomex: is there a list of people like that besides the OPs in this chan? |
00:45
🔗
|
chronomex |
not really |
00:45
🔗
|
arrith |
ah alright |
00:45
🔗
|
arrith |
so run idea and letters by ops |
00:47
🔗
|
arrith |
if a database dump of a site is acquired, is there an Archive Team server for such things or should the person who has the dump just try to hold onto it, and maybe mention it on the wiki, or make a torrent or something? |
00:47
🔗
|
dnova |
what is the goal? |
00:48
🔗
|
arrith |
well for fanfiction.net and reddit.com to maintain a live mirror. if that isn't possible then to maintain an offline mirror. |
00:50
🔗
|
arrith |
i'm thinking like the reocities-type sites. but if a torrent is the best one can do then i'll take the best there is |
00:52
🔗
|
dnova |
first of all, there is no way that reddit or fanfiction are going to be in any way okay with you hosting a mirror of their site |
00:53
🔗
|
dnova |
archiveteam doesn't exactly have a server, but there is some hardware at archive.org that SketchCow uses to ingest the stuff we grab. If you download something and jason wants it, you'll most likely be uploading it there |
00:54
🔗
|
arrith |
alright |
01:45
🔗
|
SketchCow |
Archiveteam has a tank |
01:46
🔗
|
bbot_ |
archivetank? |
01:46
🔗
|
SketchCow |
You can be part of archive team in terms of asking. |
01:51
🔗
|
arrith |
neat :) |
01:51
🔗
|
* |
arrith pins honorary badge on self |
02:00
🔗
|
godane |
SketchCow: i'm making something called slitaz-tank |
02:00
🔗
|
godane |
it's an archive of linux sources that can maybe rebuild itself without the internet |
02:01
🔗
|
godane |
or very little of it |
02:57
🔗
|
arrith |
godane: to what degree? |
02:58
🔗
|
arrith |
godane: as in does it build binaries for a distro or is it a git repo of the kernel? |
03:28
🔗
|
godane |
arrith: it builds binaries in dependency order |
03:28
🔗
|
godane |
it also has packages in iso too |
03:29
🔗
|
godane |
i recompressed the source tarballs with .tar.lzma and compressed png with optipng |
03:45
🔗
|
SketchCow |
Hey, everyone. |
03:46
🔗
|
SketchCow |
Pleased to say.... the google group upload of pages and files begins. |
03:48
🔗
|
SketchCow |
I started in the beginning: kz |
03:48
🔗
|
SketchCow |
http://www.archive.org/details/archiveteam-googlegroups-kz |
04:04
🔗
|
arrith |
woo! |
04:04
🔗
|
arrith |
godane: what distro? |
04:10
🔗
|
godane |
my own |
04:10
🔗
|
godane |
called slitaz-tank |
04:10
🔗
|
godane |
based on the slitaz project |
04:10
🔗
|
arrith |
ah |
04:11
🔗
|
godane |
the full iso is like 8gb |
04:11
🔗
|
arrith |
well if you have the sources for any distro you can rebuild it without the internet |
04:12
🔗
|
godane |
yes but sometimes it's not as clear how to do it |
04:12
🔗
|
arrith |
godane: a side project might be to put together documentation on how to do it for major distros |
04:13
🔗
|
godane |
i also mirror the slitaz sites too |
04:13
🔗
|
godane |
i even fit xkcd and linuxgazette |
04:14
🔗
|
arrith |
that's good |
04:53
🔗
|
SketchCow |
Oh yeah, this is going to be NUTS. |
04:53
🔗
|
SketchCow |
NUTS. |
04:53
🔗
|
godane |
what is NUTS? |
04:53
🔗
|
Wyatt|Wor |
Now Uploading This Stuff? |
04:54
🔗
|
SketchCow |
Blowing Google Groups into archive.org. |
04:54
🔗
|
lemoncell |
ooooh |
04:54
🔗
|
Wyatt|Wor |
Delicious. |
04:54
🔗
|
SketchCow |
http://www.archive.org/details/archiveteam-googlegroups-00&reCache=1 |
04:54
🔗
|
Wyatt|Wor |
What's the derive process for tarballs? |
04:54
🔗
|
lemoncell |
my posts from 1991 will live FOREVER |
04:54
🔗
|
Wyatt|Wor |
Or...zips, it looks like? |
04:54
🔗
|
Wyatt|Wor |
Is there one? |
04:55
🔗
|
SketchCow |
Not posts. |
04:55
🔗
|
SketchCow |
These are just the page files, and the file collections, all of them destroyed by Google this year. |
04:55
🔗
|
lemoncell |
oh |
04:55
🔗
|
lemoncell |
wow |
04:56
🔗
|
SketchCow |
On September 22, 2010 Google announced plans for turning off the group pages suggesting users to move their content to Google Docs or Google Sites. Starting in November 2010, the group pages became read-only (allowing only viewing/downloading existing content) while in February 2011 they were turned-off completely. |
04:56
🔗
|
lemoncell |
sounds sizeable |
04:58
🔗
|
SketchCow |
I've got scripts, calling scripts, calling scripts. |
04:58
🔗
|
SketchCow |
It's just going to keep running. |
04:58
🔗
|
SketchCow |
I am worried about that problematic buffer thing. |
04:59
🔗
|
SketchCow |
But, like http://www.archive.org/details/archiveteam-googlegroups-00 - that's DONE. |
04:59
🔗
|
lemoncell |
can it resume? |
04:59
🔗
|
SketchCow |
Yeah, it can resume fine. |
05:01
🔗
|
lemoncell |
you've done a man's job...too bad she won't live (blade runner) |
05:02
🔗
|
SketchCow |
985383 |
05:02
🔗
|
SketchCow |
root@teamarchive-0:/3/googlegroups# find . -name \*.zip | wc -l |
05:02
🔗
|
SketchCow |
985,000 individual sets of files (many groups have a pages.zip and a files.zip) |
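The count above came from `find . -name \*.zip | wc -l`. As a rough sketch, the same tally can be taken in Python, assuming only that the archives sit somewhere under a single root directory. The function name and the `.` root are illustrative and not part of SketchCow's tooling.

```python
from pathlib import Path


def count_zip_files(root: str) -> int:
    """Count regular files whose names end in .zip anywhere under root."""
    # rglob walks the tree recursively; is_file() skips directories that
    # happen to be named *.zip (find -name would count those as well).
    return sum(1 for path in Path(root).rglob("*.zip") if path.is_file())


if __name__ == "__main__":
    print(count_zip_files("."))
```

Because many groups have both a pages.zip and a files.zip, the number of groups is lower than the number this returns.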
05:02
🔗
|
SketchCow |
That's before the second-third wave of google group uploads. |
05:03
🔗
|
lemoncell |
impressive |
05:03
🔗
|
lemoncell |
then again this must be old hat for you now |
05:04
🔗
|
SketchCow |
Just have to step carefully. |
05:04
🔗
|
SketchCow |
But then, yeah, I have a program called Groupgrope that makes a collection, then assembles the file, then shoves them into grouphug, which uploads the individual files into the collection and slaps it on the ass to derive. |
05:05
🔗
|
Wyatt|Wor |
Haha, script naming after my own heart! |
05:05
🔗
|
lemoncell |
hehe |
05:06
🔗
|
SketchCow |
The main limit is that you can't have more than 1000 items in a given directory, which means I need to create special cases. |
05:06
🔗
|
SketchCow |
But a minor thing I can work around. |
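The chat does not say how the 1000-item limit is worked around. One possible approach, sketched here under that caveat, is to split a long list of names into consecutive batches of at most 1000. `split_into_batches` and the sample group names are hypothetical; only the limit itself comes from the conversation.

```python
from typing import Iterator, List, Sequence

MAX_ITEMS_PER_DIR = 1000  # limit quoted in the chat; the rest is illustrative


def split_into_batches(
    names: Sequence[str], limit: int = MAX_ITEMS_PER_DIR
) -> Iterator[List[str]]:
    """Yield consecutive slices of names, each holding at most `limit` entries."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(names), limit):
        yield list(names[start:start + limit])


if __name__ == "__main__":
    # Hypothetical input: 2500 made-up group names.
    groups = [f"group{i:04d}" for i in range(2500)]
    for index, batch in enumerate(split_into_batches(groups)):
        print(index, len(batch))
```

Each batch could then go into its own directory or item, so none exceeds the limit.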
05:08
🔗
|
arrith |
that's an odd limit |
05:09
🔗
|
SketchCow |
It's related to archive.org and not any filesystems. |
05:10
🔗
|
SketchCow |
2242164919 2011-12-04 23:56 ggroups_zipdl-wyatt.tgz |
05:10
🔗
|
SketchCow |
See? I have that of yours to integrate too. |
05:11
🔗
|
PatC |
SketchCow, is that first number a file number or unix time or something? |
05:12
🔗
|
Coderjoe |
size |
05:12
🔗
|
PatC |
ah |
05:12
🔗
|
Coderjoe |
then date/time |
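As Coderjoe says, the pasted line reads as size in bytes, then date and time, then the file name. A small sketch of parsing such a line follows, assuming exactly that four-field layout. `ListingEntry` and `parse_listing_line` are names invented for the example.

```python
from datetime import datetime
from typing import NamedTuple


class ListingEntry(NamedTuple):
    size_bytes: int
    modified: datetime
    name: str


def parse_listing_line(line: str) -> ListingEntry:
    """Parse '<size> <YYYY-MM-DD> <HH:MM> <name>' into its parts."""
    # maxsplit=3 keeps any spaces inside the file name intact.
    size, date, time, name = line.split(maxsplit=3)
    modified = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M")
    return ListingEntry(int(size), modified, name)


if __name__ == "__main__":
    entry = parse_listing_line("2242164919 2011-12-04 23:56 ggroups_zipdl-wyatt.tgz")
    print(f"{entry.name}: {entry.size_bytes / 1e9:.2f} GB, modified {entry.modified}")
```

Read this way, the first number puts ggroups_zipdl-wyatt.tgz at roughly 2.24 GB.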
05:12
🔗
|
SketchCow |
Penis length |
05:12
🔗
|
arrith |
in lightyears |
05:12
🔗
|
PatC |
lol |
05:12
🔗
|
* |
SketchCow thrusts into the horsehead nebula |
05:12
🔗
|
Coderjoe |
haha |
05:12
🔗
|
SketchCow |
take it horsey, take it |
05:12
🔗
|
PatC |
ha |
05:13
🔗
|
SketchCow |
It'll be centuries before you hear the pitiful whinny |
05:13
🔗
|
arrith |
in space no one can hear you upload all of google groups |
05:14
🔗
|
Coderjoe |
imagine how long it takes to reel it all in when flaccid |
05:14
🔗
|
SketchCow |
It finished 02!! |
05:14
🔗
|
SketchCow |
http://www.archive.org/details/archiveteam-googlegroups-02 |
05:15
🔗
|
SketchCow |
http://www.archive.org/details/archiveteam-googlegroups-03 |
05:15
🔗
|
SketchCow |
Also just finished. It's gearing up for 04. |
05:15
🔗
|
SketchCow |
And so it will go. |
05:15
🔗
|
SketchCow |
So other than making sure it doesn't go into conniptions, which it will eventually, this is what I'll be having it do for probably two weeks. |
05:19
🔗
|
SketchCow |
05. |
05:19
🔗
|
SketchCow |
Enough live updating. The summary is this is now happening. |
05:23
🔗
|
DFJustin |
huzzah |
05:24
🔗
|
Coderjoe |
http://www.techdirt.com/articles/20111229/00243317220/as-godaddy-deals-with-sopa-fallout-hollywood-wants-to-punish-godaddy-enabling-infringement.shtml |
05:25
🔗
|
arrith |
SketchCow: could script that for an #at-status chan |
05:30
🔗
|
bsmith093 |
SketchCow: why are the google groups made up of lots of tiny zip files? wouldn't it make more sense to bundle them up till they were significant sizes, maybe 50mb at least |
05:33
🔗
|
chronomex |
what's the benefit? |
05:34
🔗
|
bsmith093 |
how big is this torrent of url shorteners |
05:34
🔗
|
SketchCow |
I want someone who wants "files from group whinybitches" to know they just need to go down to archiveteam-googlegroups-wh and it'll be there. |
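The naming described here, together with the identifiers seen earlier (-kz, -00, -02), suggests that a group lands in the collection named after the first two characters of its name. A minimal sketch of that mapping follows. Lowercasing and the rejection of names shorter than two characters are assumptions for illustration, and `collection_for_group` is a hypothetical helper.

```python
COLLECTION_PREFIX = "archiveteam-googlegroups-"


def collection_for_group(group_name: str) -> str:
    """Return the collection identifier a group would fall under."""
    # Assumption: identifiers are lowercase and keyed on the first two characters.
    name = group_name.strip().lower()
    if len(name) < 2:
        raise ValueError("group name needs at least two characters")
    return COLLECTION_PREFIX + name[:2]


if __name__ == "__main__":
    print(collection_for_group("whinybitches"))
```

With this layout, someone looking for one group only has to open one collection rather than unpack a single huge archive.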
05:34
🔗
|
bsmith093 |
chronomex: fewer files, shorter, etc |
05:35
🔗
|
chronomex |
bsmith093: I don't see the benefit there :P |
05:35
🔗
|
SketchCow |
Otherwise I might as well make one massive-ass tar.bz2 |
05:35
🔗
|
SketchCow |
Which I might |
05:36
🔗
|
bsmith093 |
well ok then. thats what I would do, is all I'm saying :d |
05:36
🔗
|
bsmith093 |
and how big would that tar actually be? |
05:36
🔗
|
SketchCow |
But first, I don't want our stuff to keep following the trend of "12 people on the planet would suffer the pain to extract what they need", especially with something like this, where millions of people come at it from different angles. |
05:37
🔗
|
SketchCow |
That tar is at least half a terabyte. |
05:37
🔗
|
bsmith093 |
oy, well ok then, i hadn't realized it was so freakin huge, wow, that's rather impressive |
05:38
🔗
|
SketchCow |
Google. Groups. |
05:38
🔗
|
SketchCow |
You expected a USB key? |
05:39
🔗
|
bsmith093 |
hey random thought, you know wikitaxi, and how it takes the pages-articles bz2 file from a wikidump and turns it into a taxi file which is then basically portable? |
05:39
🔗
|
bsmith093 |
I've got the latest dump all taxied up; want a copy? |
05:40
🔗
|
bsmith093 |
it's faster than making another, and this way they (wikitaxi users) don't have to do it |
05:45
🔗
|
Wyatt|Wor |
So SketchCow, what are your plans for MAGFest? Is it just a field test for your new gear, or do you have some interviews lined up? |
05:55
🔗
|
SketchCow |
Just field test, capture interviews as I can |
05:56
🔗
|
Wyatt|Wor |
Cool. |
05:56
🔗
|
Wyatt|Wor |
A lot of musicians I respect end up going there, so I was curious. |
06:49
🔗
|
godane |
SketchCow: geocities should be sorted like google groups |
06:50
🔗
|
godane |
mostly because you can get a site without extracting the full geocities backup |
10:06
🔗
|
Nemo_bis |
But why didn't Google just automatically move all those files to new Google Sites...? (Which is what I did by hand with multiple groups.) |
15:42
🔗
|
Soojin |
http://i.imgur.com/nYanb.jpg |
15:52
🔗
|
Nemo_bis |
Soojin, solution here: fireworks forbidden to reduce smog (Milan and other big cities in Italy) |
15:52
🔗
|
Soojin |
:) |
16:19
🔗
|
Schbirid |
Nemo_bis: http://kuvaton.com/kuvei/chris_and_sun_comic.jpg |
16:31
🔗
|
Nemo_bis |
:D |
16:31
🔗
|
Nemo_bis |
I guess that's chronomex after moving, to upload Splinder data |
16:51
🔗
|
Coderjoe |
the stick guy would need a computer rather than two halves of a shirt box on top of some other empty furniture box acting like a desk |
17:12
🔗
|
SketchCow |
Brewster stepped in, I'm doing the google groups slightly differently. |
17:26
🔗
|
Nemo_bis |
he even restarted a derive of mine today |
19:45
🔗
|
SketchCow |
Archive.org is looking into making public 798 infomercials. |
19:45
🔗
|
SketchCow |
Ranging from an hour to multiple hours. |
19:46
🔗
|
dashcloud |
whoa |
19:56
🔗
|
Nemo_bis |
wtf? http://www.us.archive.org/log_show.php?task_id=92391694 22 h of deriving to derive nothing? |
19:56
🔗
|
Nemo_bis |
hm, JPEG Thumb |
19:58
🔗
|
Nemo_bis |
and 14 h of crawling |
20:56
🔗
|
Nemo_bis |
Splinder under maintenance |
21:04
🔗
|
kennethre |
Nemo_bis: is today the day? |
21:06
🔗
|
Nemo_bis |
no, still a month, but not available now |
21:45
🔗
|
SketchCow |
bsmith093: Really? the file is named "applediskimages", no extension? |
21:45
🔗
|
bsmith093 |
its a zip |
21:45
🔗
|
SketchCow |
Yeah, just sussed |
21:46
🔗
|
bsmith093 |
sorry for the possibly horrible organization, wasnt me. |
21:49
🔗
|
Nemo_bis |
SketchCow, the new format for Google Groups is tidy, but what's the size limit of the zip before zipview.php fails? |
21:51
🔗
|
SketchCow |
Not clear |
21:52
🔗
|
SketchCow |
It handled a 3gb fine. |
21:59
🔗
|
Nemo_bis |
I think the limit is not much more, perhaps 5 or 7 GB |
22:04
🔗
|
Nemo_bis |
5 works http://ia600506.us.archive.org/tarview.php?tar=/31/items/wiki.guildwars.com/wikiguildwarscom-20110717-images.tar |
22:04
🔗
|
Nemo_bis |
(but that's tar) |
22:06
🔗
|
Nemo_bis |
19 definitely don't :) http://ia700508.us.archive.org/tarview.php?tar=/29/items/Infictive.com/infictivecom-20110712-images.tar&file=infictivecom-20110712-images/Ztar.jpg |
22:55
🔗
|
Nemo_bis |
I must say that the fireworks prohibition is not being respected very much here. |
23:55
🔗
|
Soojin |
interesting observation: commercial music video totally made up of material taken from archive.org (with the exception of the dude singing) https://www.youtube.com/watch?v=fK0_PVaF8Pg |