[00:04] *** signius has joined #archiveteam [00:17] *** dashcloud has quit IRC (Read error: Connection reset by peer) [00:17] *** dashcloud has joined #archiveteam [00:28] *** BlueMaxim has joined #archiveteam [00:56] *** Rotab has quit IRC (hub.se irc.du.se) [01:27] *** Boppen has joined #archiveteam [01:30] *** Boppen has quit IRC (hub.se irc.du.se) [01:48] *** xtr-201 has joined #archiveteam [02:02] *** dashcloud has quit IRC (Read error: Connection reset by peer) [02:02] *** dashcloud has joined #archiveteam [02:07] *** mistym has quit IRC (Remote host closed the connection) [02:13] *** primus104 has quit IRC (Leaving.) [02:21] *** mistym has joined #archiveteam [02:28] *** BlueMaxim has quit IRC (Read error: Operation timed out) [02:28] *** BlueMaxim has joined #archiveteam [03:18] *** BlueMaxim has quit IRC (Read error: Connection reset by peer) [03:19] *** garyrh has quit IRC (Remote host closed the connection) [03:42] *** garyrh has joined #archiveteam [04:11] *** BlueMaxim has joined #archiveteam [04:30] *** VonGuard_ is now known as VonGuard [05:09] *** antomatic has quit IRC (Read error: Connection reset by peer) [05:09] *** lytv has quit IRC (Read error: Connection reset by peer) [05:09] *** fresco___ has quit IRC (hub.dk efnet.port80.se) [05:09] *** VonGuard has quit IRC (hub.dk efnet.port80.se) [05:09] *** russss has quit IRC (hub.dk efnet.port80.se) [05:09] *** deathy has quit IRC (hub.dk efnet.port80.se) [05:09] *** danneh_ has quit IRC (hub.dk efnet.port80.se) [05:09] *** LittUp has quit IRC (hub.dk efnet.port80.se) [05:09] *** Muad-Dib has quit IRC (hub.dk efnet.port80.se) [05:09] *** Rickster has quit IRC (hub.dk efnet.port80.se) [05:09] *** lhobas has quit IRC (hub.dk efnet.port80.se) [05:09] *** nox has quit IRC (Read error: Operation timed out) [05:09] *** NovaKing_ has quit IRC (Read error: Operation timed out) [05:09] *** yipdw has quit IRC (hub.dk irc.homelien.no) [05:09] *** pikhq has quit IRC (hub.dk irc.homelien.no) [05:09] *** altlabel 
has quit IRC (hub.dk irc.homelien.no) [05:09] *** ionpulse has quit IRC (hub.dk irc.homelien.no) [05:09] *** antomati_ has joined #archiveteam [05:09] *** NovaKing_ has joined #archiveteam [05:09] *** nox has joined #archiveteam [05:11] *** lytv has joined #archiveteam [05:17] *** antomati_ has quit IRC (hub.efnet.us hub.dk) [05:17] *** Zebranky_ has quit IRC (hub.efnet.us hub.dk) [05:17] *** Fusl has quit IRC (hub.efnet.us hub.dk) [05:17] *** ryan__ has quit IRC (hub.efnet.us hub.dk) [05:17] *** ruukasu has quit IRC (hub.efnet.us hub.dk) [05:17] *** Deewiant has quit IRC (hub.efnet.us hub.dk) [05:17] *** edsu_ has quit IRC (hub.efnet.us hub.dk) [05:17] *** Kazzy has quit IRC (hub.efnet.us hub.dk) [05:17] *** ex-parrot has quit IRC (hub.efnet.us hub.dk) [05:17] *** Gfy has quit IRC (hub.efnet.us hub.dk) [05:17] *** SketchCow has quit IRC (hub.efnet.us hub.dk) [05:17] *** w0rp has quit IRC (hub.efnet.us hub.dk) [05:17] *** Sellyme_ has quit IRC (hub.efnet.us hub.dk) [05:17] *** jk[SVP] has quit IRC (hub.efnet.us hub.dk) [05:17] *** Kniffy has quit IRC (hub.efnet.us hub.dk) [05:17] *** Kenshin has quit IRC (hub.efnet.us hub.dk) [05:17] *** Nemo_bis has quit IRC (hub.efnet.us hub.dk) [05:17] *** yan has quit IRC (hub.efnet.us hub.dk) [05:17] *** nico_32 has quit IRC (hub.efnet.us hub.dk) [05:17] *** raylee has quit IRC (hub.efnet.us hub.dk) [05:17] *** Atluxity has quit IRC (hub.efnet.us hub.dk) [05:17] *** is- has quit IRC (hub.efnet.us hub.dk) [05:17] *** nox has quit IRC (hub.efnet.us hub.dk) [05:17] *** NovaKing_ has quit IRC (hub.efnet.us hub.dk) [05:17] *** espes__ has quit IRC (hub.efnet.us hub.dk) [05:17] *** aNthraXx has quit IRC (hub.efnet.us hub.dk) [05:17] *** cadbury_ has quit IRC (hub.efnet.us hub.dk) [05:17] *** underscor has quit IRC (hub.efnet.us hub.dk) [05:17] *** Sue__ has quit IRC (hub.efnet.us hub.dk) [05:17] *** gibigian1 has quit IRC (hub.efnet.us hub.dk) [05:17] *** kanzure_ has quit IRC (hub.efnet.us hub.dk) [05:17] *** lukeman has quit IRC 
(hub.efnet.us hub.dk) [05:17] *** warthurto has quit IRC (hub.efnet.us hub.dk) [05:17] *** Sk1d has quit IRC (hub.efnet.us hub.dk) [05:17] *** Void_ has quit IRC (hub.efnet.us hub.dk) [05:21] *** espes___ has joined #archiveteam [05:48] *** dashcloud has quit IRC (Quit: No Ping reply in 210 seconds.) [05:50] *** dashcloud has joined #archiveteam [06:26] *** dashcloud has quit IRC (Read error: Connection reset by peer) [06:27] *** dashcloud has joined #archiveteam [07:15] *** cadbury_ has joined #archiveteam [07:15] *** lhobas has joined #archiveteam [07:15] *** Muad-Dib has joined #archiveteam [07:15] *** Rickster has joined #archiveteam [07:15] *** danneh_ has joined #archiveteam [07:15] *** LittUp has joined #archiveteam [07:15] *** deathy has joined #archiveteam [07:15] *** russss has joined #archiveteam [07:15] *** VonGuard has joined #archiveteam [07:15] *** fresco___ has joined #archiveteam [07:15] *** warthurto has joined #archiveteam [07:15] *** lukeman has joined #archiveteam [07:15] *** Sue__ has joined #archiveteam [07:15] *** aNthraXx has joined #archiveteam [07:15] *** Void_ has joined #archiveteam [07:15] *** Rotab has joined #archiveteam [07:15] *** underscor has joined #archiveteam [07:15] *** ionpulse has joined #archiveteam [07:15] *** altlabel has joined #archiveteam [07:15] *** pikhq has joined #archiveteam [07:15] *** yipdw has joined #archiveteam [07:15] *** gibigiana has joined #archiveteam [07:15] *** Sk1d has joined #archiveteam [07:15] *** antomati_ has joined #archiveteam [07:15] *** Nemo_bis has joined #archiveteam [07:15] *** yan has joined #archiveteam [07:15] *** nico_32 has joined #archiveteam [07:15] *** Fusl has joined #archiveteam [07:15] *** Zebranky_ has joined #archiveteam [07:15] *** ryan__ has joined #archiveteam [07:15] *** is- has joined #archiveteam [07:15] *** ruukasu has joined #archiveteam [07:15] *** Deewiant has joined #archiveteam [07:15] *** raylee has joined #archiveteam [07:15] *** edsu_ has joined #archiveteam 
[07:15] *** Kazzy has joined #archiveteam [07:15] *** ex-parrot has joined #archiveteam [07:15] *** jk[SVP] has joined #archiveteam [07:15] *** Sellyme_ has joined #archiveteam [07:15] *** w0rp has joined #archiveteam [07:15] *** SketchCow has joined #archiveteam [07:15] *** Gfy has joined #archiveteam [07:15] *** Kenshin has joined #archiveteam [07:15] *** Kniffy has joined #archiveteam [07:15] *** Atluxity has joined #archiveteam [07:15] *** hub.se sets mode: +ooo raylee SketchCow Kenshin [07:15] *** swebb sets mode: +o underscor [07:15] *** swebb sets mode: +o SketchCow [07:17] *** kanzure has joined #archiveteam [07:59] *** Jonimus has quit IRC (Ping timeout: 370 seconds) [08:06] *** mistym has quit IRC (Remote host closed the connection) [08:10] *** Jonimus has joined #archiveteam [09:03] *** schbirid has joined #archiveteam [09:04] *** dashcloud has quit IRC (Read error: Connection reset by peer) [09:04] *** dashcloud has joined #archiveteam [09:17] *** Ymgve has joined #archiveteam [09:19] *** primus104 has joined #archiveteam [09:39] *** antomati_ is now known as antomatic [10:10] *** primus104 has quit IRC (Leaving.) [10:22] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [10:25] *** Sk1d has joined #archiveteam [10:45] *** Sk2d has joined #archiveteam [10:46] *** Sk1d has quit IRC (Read error: Operation timed out) [10:46] *** Sk2d is now known as Sk1d [11:25] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) [11:26] *** dashcloud has joined #archiveteam [11:31] Alright, I want to grab a big-ass mirror of a niche art site that includes a lot of stuff that has been ¨deleted¨ from the net earlier, it´s probably multiple TBs and seems to have limited bandwidth, shall I just put it in archivebot or do we grab this seperately? 
http://vj5pbopejlhcbz4n.onion.city/indexes [11:32] ¨deleted¨ from the site [11:32] * [11:35] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [11:35] I want a copy of this too [11:36] there's a lot of furry porn in there, lol :D [11:36] long live the internet [11:37] freaky place I'd trade for no other [11:37] Ctrl-S, you have terabytes available ATM? [11:37] *** Sk1d has joined #archiveteam [11:37] I have a feeling this archive this might well pass the 10 TB [11:37] mark [11:38] I'm on a capped connection though :( [11:38] could you host it for a year or so so i can afford to grab a copy? [11:39] looks like there's no archivebot pipeline with enough storage for grabbing it all at once either :C http://dashboard.at.ninjawedding.org/pipelines [11:39] lol [11:39] Ctrl-S, you might as well hire dedi hosting, lol [11:39] for one month, grab everything, and post it to IA [11:39] i'm serious about wanting this mirrored [11:39] me too [11:39] what do i have to do to get it done? [11:40] but its way too much for me to hold [11:40] where do i send the drive money [11:40] It's already kicking up controversy in the art site's community for hosting people's old and deleted stuff [11:40] I don't expect it to be up for long [11:40] because mailing HDDs is the only way i can get a copy of this [11:41] don't expect it to be up for long on clearnet anywa [11:41] y [11:41] >controversy on furaffinity [11:41] IKR [11:41] "OH NO, I POSTED MY STUFF TO THE PUBLIC INTERNET AND I CANT GET RID OF IT ANYMORE" [11:42] can we contact the admin? [11:42] of this mirror i mean [11:42] no one knows who's hosting this [11:42] but it might be site staff, since it includes so many "deleted" files [11:44] I would seriously pay the several hundred dollars for disk space for this [11:44] because I KNOW it's endangered [11:45] would be nice to not archive this via onion.city, but rather do it via tor? 
looks like a hidden service proxy to me [11:46] maybe I should just throw it in archivebot and see how far it gets [11:46] 10TB is nothing for archivebot [11:46] Atluxity: ideally, yes [11:46] if we want this we can create a warrior project [11:46] arkiver: http://dashboard.at.ninjawedding.org/pipelines [11:46] a warrior project that grabs shit from tor? [11:46] yeah, why not [11:47] arkiver: warrior project getting archiving a tor hidden service? sounds...interesting [11:47] onion.city for now [11:47] won't that require extra dependencies on the warrior VM's? [11:47] We DO need to get around to backing up the tor hidden sites [11:47] correct [11:47] I can create a project for this .onion.city site easily [11:47] but 10TB is a lot [11:47] but IA might not be willing to host hidden services, with good reason [11:47] they hold an especially high degree of cultural relevance due to their often illicit nature [11:47] not sure if IA is willing to take that all in [11:47] talk to the onion.city admins first ;) [11:48] but IA might not be willing to host hidden services, with good reason [11:48] also the opposite [11:48] they might be very willing, with good reason [11:48] I know [11:48] they don't have to provide open access, just hang onto the data [11:48] I think they'd probably be a bit... 
conflicted about it [11:48] if SketchCow thinks IA is willing to take multiple TB's from http://vj5pbopejlhcbz4n.onion.city/indexes [11:49] if that ^ I'll have a project running soon [11:51] I actually wrote a script to save things from FA, but i'm on a capped connection so i can't save everything [11:53] 10TB is nothing for archivebot [11:53] 3tb max free diskspace isn't agreeing with you, ark http://dashboard.at.ninjawedding.org/pipelines [11:53] what I meant is that a website of 10TB whould [11:53] shouldn't be archived with arcivebot [11:54] oh [11:54] okay [11:54] misinterpretation :P [11:54] yep, I wasn't clear [11:54] >That wonderous feel when you find a copy of something yo'd long thought deleted [11:55] <3 [11:56] whatever the case, this needs backing up right now, and i will do anything in my power to help you do so [11:57] I've seen too many artists go bezerk and delete everything to lose this [11:57] do you have 10TB of free diskspace? [11:57] if you do, we'll start [11:57] maybe, but 1TB/month cap [11:57] Being australian is suffering [12:21] ;_;7 [12:52] *** nox has joined #archiveteam [13:04] *** BlueMaxim has quit IRC (Read error: Connection reset by peer) [13:15] * ersi points and laughs [13:26] ;) [13:54] *** primus104 has joined #archiveteam [13:56] *** sankin has joined #archiveteam [14:32] https://www.youtube.com/watch?v=EWCLpaynj4Y fuck my country and its people ;_; [14:32] but at least we dont have bandwidth caps ;) [14:40] rofl Muad-Dib [14:41] white trash, white trash everywhere ;_; [14:41] aye [14:41] glorious YUROP [14:42] *** aNthraXx has quit IRC (Read error: Operation timed out) [14:43] *** aNthraXx has joined #archiveteam [15:19] *** Start has quit IRC (Disconnected.) 
[15:25] *** Sk2d has joined #archiveteam [15:28] *** Sk1d has quit IRC (Read error: Operation timed out) [15:29] *** Sk1d has joined #archiveteam [15:30] *** Sk2d has quit IRC (Ping timeout: 265 seconds) [15:33] *** Froggypwn has quit IRC (Read error: Operation timed out) [15:34] *** Froggypwn has joined #archiveteam [15:35] midas: are you able to get the list of ftps back online? [15:36] is it offline? [15:37] yeah, 503 [15:37] 502* [15:39] *** Sk1d has quit IRC (Read error: Operation timed out) [15:40] stupid pad crashed [15:43] *** Sk1d has joined #archiveteam [15:51] *** dashcloud has quit IRC (Read error: Connection reset by peer) [15:51] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [15:53] *** Sk1d has joined #archiveteam [15:56] *** dashcloud has joined #archiveteam [15:58] *** dashcloud has quit IRC (Read error: Operation timed out) [16:00] *** mistym has joined #archiveteam [16:01] *** mistym has quit IRC (Remote host closed the connection) [16:04] *** Start has joined #archiveteam [16:05] *** dashcloud has joined #archiveteam [16:06] *** Sk1d has quit IRC (Read error: Operation timed out) [16:06] *** Sk2d has joined #archiveteam [16:06] *** Sk2d is now known as Sk1d [16:20] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [16:21] *** signius has quit IRC (Read error: Operation timed out) [16:23] *** mistym has joined #archiveteam [16:25] *** Sk1d has joined #archiveteam [16:33] fwiw archivebot uploads in 5gb intervals so you don't actually need 10tb of free space [16:34] tasks that run for months can be an issue though as machines need maintenance etc [16:35] *** signius has joined #archiveteam [16:37] so if there's some way to feed in pieces of it one at a time (subdirectories are ideal) [16:51] *** Start has quit IRC (Disconnected.) 
[16:57] *** danneh_ has quit IRC (Ping timeout: 260 seconds) [17:03] *** danneh_ has joined #archiveteam [17:09] *** Nertsy has quit IRC (Read error: Operation timed out) [17:17] *** mistym has quit IRC (Remote host closed the connection) [17:31] *** mistym has joined #archiveteam [17:35] *** sep332 has quit IRC (bye) [17:37] *** sep332 has joined #archiveteam [17:48] i can probably implement tor for archivebot sometime this week [17:50] chfoo: I do think 10TB websites shouldn't be done with archivebot [17:52] *** Sk1d has quit IRC (Read error: Operation timed out) [17:55] *** Sk1d has joined #archiveteam [18:02] *** Start has joined #archiveteam [18:02] *** Sk1d has quit IRC (Read error: Operation timed out) [18:04] *** Sk1d has joined #archiveteam [18:09] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [18:09] https://about.gitlab.com/2015/03/03/gitlab-acquires-gitorious/ [18:12] *** Sk1d has joined #archiveteam [18:17] *** Sk2d has joined #archiveteam [18:20] *** Sk1d has quit IRC (Read error: Operation timed out) [18:22] *** rolfb has joined #archiveteam [18:22] *** Sk2d has quit IRC (Ping timeout: 265 seconds) [18:23] *** Sk1d has joined #archiveteam [18:24] Hi there. Gitorious has been acquired and gitorious.org will shut down at the end of May. Is there any way to preserve the data? [18:30] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [18:31] arkiver: i'm not really fond of using tor in the warrior because it will involve setting up the latest tor and http proxy and it's likely that a manual script runner will break something. i'm also worried about needing to set up the warriors to use bridges in case the isp blocks tor [18:31] but maybe someone with lots of bandwidth could set up a public tor proxy for archiveteam use [18:33] *** Sk1d has joined #archiveteam [18:39] rolfb: Is there data that isn't already in the WayBackMachine? [18:39] *** Sk2d has joined #archiveteam [18:41] *** Start has quit IRC (Disconnected.) 
[18:41] As far as I can see, everything they have there other than the repo for the community edition source code is private/paid subscription based. [18:42] *** Sk1d has quit IRC (Read error: Operation timed out) [18:42] *** Sk2d is now known as Sk1d [18:43] git clone everything [18:45] Wait, never mind, it appears they do host some repos [18:49] Apparently, GitLab took enough paying customers that Gitorious can't support itself while offering free service. [18:50] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [18:51] *** abartov has joined #archiveteam [19:00] *** Sk2d has joined #archiveteam [19:01] *** kyan_ has joined #archiveteam [19:03] *** kyan has quit IRC (Read error: Operation timed out) [19:05] *** Sk1d- has joined #archiveteam [19:06] *** Sk2d has quit IRC (Ping timeout: 265 seconds) [19:09] *** Sk2d has joined #archiveteam [19:09] *** Sk2d is now known as Sk1d [19:11] *** Sk1d- has quit IRC (Read error: Operation timed out) [19:11] "We don't want to move people's code to another organization without their permission." yes, their open-source, public code [19:14] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [19:14] lol [19:14] sad [19:18] *** Sk1d has joined #archiveteam [19:21] *** sankin has quit IRC (Leaving.) 
[19:22] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [19:25] *** dashcloud has quit IRC (Read error: Operation timed out) [19:31] *** Sk1d has joined #archiveteam [19:32] *** dashcloud has joined #archiveteam [19:33] chazchaz: sorry for not replying, i don't know what the waybackmachine has, but surely it would be more interesting to have the git repositories, and all the code on .org is available for download, over 100k repositories [19:35] Muad-Dib: maximum diskspace has no effect on maximum job size [19:35] the main problem with 10 TB is justifying shoving 10 TB into IA [19:36] also running up someone's bandwidth bill if empathy is something you believe in [19:43] *** BlueMaxim has joined #archiveteam [19:48] I don't think it's bandwidth that's the problem. It takes more than 30 seconds to start getting data for some of those links [19:57] I was referring also to the node operator's bill [19:57] OVH doesn't seem to care, DO seems to eventually [19:57] in any case a 10 TB job is really just a dick move at present time [20:01] *** Start has joined #archiveteam [20:03] rolfb: waybackmachine = http://web.archive.org/ [20:07] rolfb: are you the rolf the gitlab news is talking about? [20:17] a database and data dump of everything straight from the source would be the most ideal [20:23] second option would be a backdoor for archiveteam [20:24] *** aschmitz has quit IRC (Read error: Operation timed out) [20:28] *** Start has quit IRC (Disconnected.) 
[20:31] *** aschmitz has joined #archiveteam [20:49] *** dashcloud has quit IRC (Read error: Operation timed out) [20:55] *** dashcloud has joined #archiveteam [21:13] *** Ymgve__ has joined #archiveteam [21:18] *** Nertsy has joined #archiveteam [21:18] *** Ymgve has quit IRC (Ping timeout: 506 seconds) [21:19] *** cbb has joined #archiveteam [21:20] *** Ymgve has joined #archiveteam [21:22] *** Ymgve__ has quit IRC (Ping timeout: 506 seconds) [21:26] *** Ymgve has quit IRC (Remote host closed the connection) [21:26] *** Ymgve has joined #archiveteam [21:28] *** Start has joined #archiveteam [21:29] *** Start has quit IRC (Read error: Connection reset by peer) [21:46] *** Start has joined #archiveteam [21:58] if it's diskspace that's the problem i can donate a few hundred bucks for drives for that FA dump [22:03] *** Sk2d has joined #archiveteam [22:04] *** mistym has quit IRC (Remote host closed the connection) [22:04] chfoo: i am [22:04] *** Sk1d has quit IRC (Read error: Operation timed out) [22:04] *** Sk2d is now known as Sk1d [22:06] *** mistym has joined #archiveteam [22:08] *** SN4T14_ has quit IRC (Read error: Connection reset by peer) [22:09] *** Sk1d has quit IRC (Ping timeout: 265 seconds) [22:09] *** SN4T14 has joined #archiveteam [22:11] Ctrl-S: was that directed at me? [22:11] no [22:11] *** Sk1d has joined #archiveteam [22:12] ok :) [22:12] I odn't think so [22:12] Ctrl-S: what was it about? [22:12] art hosting site backup someone's made with pretty much all the stuff that was deleted from the site included [22:13] ~10 TB was estimated [22:14] rolfb: is possible to just upload the repos directly to archive.org? [22:14] chfoo: we have root, so I guess we can do whatever we want? we don't have much in terms of space to create images though [22:15] how much temporary space would you need? 
[22:15] xmc: we have 4.5 TB of data [22:16] oh, so a reasonable amount [22:16] always reasonable ;-) [22:16] :) [22:16] I have a slooooooooow 2TB [22:16] you could probably fire up an amazon instance with a bunch of storage for a few dozen bucks [22:16] and stream it to that for packaging [22:17] *** schbirid has quit IRC (Leaving) [22:19] the b/w in out tho?? [22:19] *** Start has quit IRC (Disconnected.) [22:20] Smiley: bandwidth is adjustable [22:20] at least on our side [22:21] Nod [22:21] but costs to export from amazon can be wild... [22:21] we could possibly send physical disks [22:21] oooooooooo [22:21] but how would it be made available after? [22:21] SketchCow could maybe accept physical disks [22:21] well, if you have disks I'd think IA would host it [22:21] send disks to IA, IA uploads from the disks [22:21] it's just the fact their storage costs like $1000/TB [22:21] that much? [22:22] yah due to duplication etc etc [22:22] ten cents a gig a month [22:22] * Smiley can't remember exactly [22:22] IA or S3? [22:22] thousand gigs is a hundred bucks a month [22:22] ish [22:23] killer is transit from AWS, they estimate about 500 bux to get 5T out of AWS [22:23] IA is $2k/TB. not per year, that's forever. [22:23] for ia you have to keep in mind it's amortized out to infinity because you have to replace drives every couple years [22:23] aye [22:23] *** Panasonic has quit IRC (Ping timeout: 370 seconds) [22:23] #archiveteam-bs [22:24] sep332: meaning that if we send disks to IA, we need to pay them $9k to preserve the data? [22:24] no [22:24] no, they have to pay that [22:24] if you send them disks, they'd be happy [22:24] ok, ok [22:24] if we want them to store the data for us, we might need to look at fundraising... [22:24] you only need the disks, if they can't find the space i presume they'd just keep the data somewhere less expensive [22:25] like in a cupboard [22:25] but ... how would the git repositories be made available? 
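The storage figures traded above (4.5 TB of data, "$2k/TB ... forever" at IA, "ten cents a gig a month" for S3-style storage) can be checked with a few lines of arithmetic. This is only a sketch of the back-of-the-envelope numbers quoted in the channel; the function names are hypothetical, 1 TB is assumed to be 1000 GB, and the prices are the participants' rough recollections, not official rates.

```python
# Rough figures as quoted in the chat; none of these are official prices.
IA_DOLLARS_PER_TB_FOREVER = 2000   # "IA is $2k/TB. not per year, that's forever."
S3_CENTS_PER_GB_MONTH = 10         # "ten cents a gig a month"
GB_PER_TB = 1000                   # assumption: decimal terabytes


def ia_lifetime_cost(terabytes):
    """One-time, amortized-forever storage cost in dollars."""
    return terabytes * IA_DOLLARS_PER_TB_FOREVER


def s3_monthly_cost(terabytes):
    """Recurring monthly storage cost in dollars, excluding transfer fees."""
    return terabytes * GB_PER_TB * S3_CENTS_PER_GB_MONTH / 100


if __name__ == "__main__":
    data_tb = 4.5  # size of the Gitorious data mentioned at 22:15
    print("IA, one-time:", ia_lifetime_cost(data_tb))
    print("S3-style, per month:", s3_monthly_cost(data_tb))
```

With these inputs the one-time figure matches the "$9k" asked about at 22:24, and one terabyte at the quoted monthly rate matches "a hundred bucks a month".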
[22:25] best practice for git repos is to export git bundles [22:25] zip of each repo, infopage as html as well? [22:25] IA has been very generous about doing pretty much anything we send them for free, the dollar figures are just to keep things in perspective [22:25] then an IA item would consist of a git bundle and all the other stuff from the repo [22:25] rolfb: what services exactly do you have for each repo? [22:26] i mean, what stuff do you store [22:26] not much aside from the repository [22:26] so not a wiki/bugtracker/filedump like github does [22:27] there's a wiki [22:27] but that's also a repository [22:27] great [22:27] so if i were doing this [22:27] i would create one IA item per repo, containing two git bundles, one each of the source code and of the wiki [22:27] git bundles are, conveniently, bzip'd [22:28] but i'm sure you already know that :) [22:28] xmc, just to complicate things ... we have repositories by project [22:28] project? [22:28] example https://gitorious.org/gitorious/ [22:28] ahh [22:28] but the project name could be metadata for a repo [22:29] right [22:29] *** BlueMaxim has quit IRC (Ping timeout: 370 seconds) [22:29] i'd say put e.g. https://gitorious.org/gitorious/libdolt/ into http://archive.org/details/gitoriousexport_gitorious_libdolt [22:30] so the item names you're creating would be gitoriousexport_$(project)_$(repo) [22:30] and then you'd add various metadata fields to the item as well [22:30] how's this sound? [22:30] sounds good [22:30] cool :) [22:31] you can use almost any characters in IA item names, but it's best practice to restrict to [-_A-Za-z0-9] [22:31] and . [22:31] i'm pretty sure we have similar restrictions ... as names are used as urls [22:31] yeah [22:31] i've not heard of any characters except / breaking things ... but *shrug* [22:32] but, how do we create an IA bundle? [22:32] ia bundle? [22:32] item [22:32] https://pypi.python.org/pypi/internetarchive [22:32] there's a python toolo .. 
yes [22:32] thanks [22:33] if you have all the items have a shared name prefix, or an identical metadata field, someone at IA can put them into a special collection [22:34] is there a problem uploading 122k bundles? or should we rather send disks? [22:34] ia items* [22:35] 122,000 items / 4.5T? should be fine, i guess? especially if spread out over a month or so [22:36] yup, something like that [22:36] the script that processes uploads will hold your upload until it's allocated space, which usually takes a few tens of seconds [22:36] so you might want to look into mild parallelism [22:36] is this channel logged somewhere? [22:36] yes [22:36] also, i'm not an IA person [22:36] http://badcheese.com/~steve/atlogs/?chan=archiveteam [22:36] just a satisfied customer [22:36] i can give logs if you need them [22:37] rolfb: thanks for being a cool, forward-thinking person <3 [22:37] my client has been logging so i'm all good for relaying information to the experts in my team [22:38] sweet [22:38] xmc: since you are not an IA person, who do I verify that I can do this with? [22:38] SketchCow is an IA employee [22:38] i'd expect him to be in irc within the next few hours [22:38] it's already past my bedtime [22:38] <- norwegian [22:39] ahhh, yes [22:39] xmc: also, thanks for the kind words [22:39] i know a finn elsewhere on efnet who went to bed an hour ago [22:39] trying to make the best of a bad situation [22:39] you're a good sight better than most people in your situation [22:39] ^this [22:40] thanks, i'm just glad there is an alternative like IA [22:40] xmc: will you be staying around till SketchCow arrives? [22:40] jscott@archive.org is his email [22:40] i'll be in and out. i'm working, and in a few hours i'll be going to beer [22:40] ok, great. is it ok that I email him directly then? [22:41] but i'm in irc most of my waking life [22:41] yeah, go for it [22:41] ok, any names I can use as referrals for getting in touch? [22:41] or just use nicknames? 
[22:41] irc names is good [22:42] saying #archiveteam is probably good enough [22:42] "some people with @ before their name" [22:42] :P [22:42] "i'm trying to rescue my shit" [22:43] http://archiveteam.org/images/e/e6/Archiveteam.jpg [22:44] :) [22:48] *** Panasonic has joined #archiveteam [22:52] *** dashcloud has quit IRC (Read error: Operation timed out) [22:54] *** BlueMaxim has joined #archiveteam [22:59] *** dashcloud has joined #archiveteam [22:59] email sent [22:59] thanks again everyone [23:01] *** mistym has quit IRC (Remote host closed the connection) [23:12] *** rolfb has quit IRC (Linkinus - http://linkinus.com) [23:20] *** mistym has joined #archiveteam [23:21] *** Start has joined #archiveteam [23:22] *** Panasonic has quit IRC (Ping timeout: 606 seconds)
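
The packaging scheme discussed between 22:25 and 22:31 (one IA item per repository, named gitoriousexport_$(project)_$(repo), holding a git bundle, with names restricted to [-_A-Za-z0-9] and ".") could be sketched roughly as below. The helper names and the choice to replace disallowed characters with "-" are assumptions made for illustration, not something agreed in the channel; the upload step itself (via the internetarchive Python package linked at 22:32) is left out.

```python
import re

# Characters outside the set recommended in the chat get replaced.
# Assumption: "-" as the replacement character is arbitrary.
_UNSAFE = re.compile(r"[^-_A-Za-z0-9.]")


def item_identifier(project, repo, prefix="gitoriousexport"):
    """Build an item name of the form prefix_project_repo."""
    parts = [_UNSAFE.sub("-", part) for part in (project, repo)]
    return "_".join([prefix] + parts)


def bundle_command(repo_dir, bundle_path):
    """Return the argv that writes every ref of repo_dir into one bundle file."""
    return ["git", "-C", repo_dir, "bundle", "create", bundle_path, "--all"]


if __name__ == "__main__":
    # Example taken from the chat: https://gitorious.org/gitorious/libdolt/
    print(item_identifier("gitorious", "libdolt"))
    # Hypothetical paths, shown only to illustrate the command shape.
    print(" ".join(bundle_command("/srv/git/gitorious/libdolt.git",
                                  "/tmp/libdolt.bundle")))
```

Because sanitizing can map two different names to the same identifier, a real run would want to check for collisions before uploading; the shared prefix is what would let someone at IA group the items into a collection, as mentioned at 22:33.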