[00:04] anyone that has tracker access: please check #yolohalo for a note re. a user named "dotZIP"
[00:04] I'll repost here for convenience:
[00:04] [19:03:20] HEY head's up: "dotZIP" has been returning quite a bunch of 0.3 MB items
[00:05] [19:03:34] that definitely doesn't seem right, probably want to check the stuff out
[00:05] [19:03:45] he isn't using the warrior either
[00:05] (this is for the Halo project)
[00:18] person was using tor
[00:23] *** kyan has quit IRC (Quit: Leaving)
[00:32] maybe checkip should error out if facebookcorewwwi.onion resolves
[00:35] ^^^
[00:35] that's what I was thinking, a check to maybe check.torproject.org
[00:36] but a check for whether or not a specific .onion resolves would do the job just as well
[00:41] *** mistym has quit IRC (Remote host closed the connection)
[01:15] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[01:35] *** schbirid has quit IRC (Read error: Operation timed out)
[01:35] *** schbirid has joined #archiveteam
[01:37] *** mistym has joined #archiveteam
[01:54] *** Kazzy_ has joined #archiveteam
[01:55] *** jk[[SVP]] has joined #archiveteam
[01:55] *** fx_ has joined #archiveteam
[01:56] *** w0rp_ has joined #archiveteam
[01:56] *** Sk2d has joined #archiveteam
[01:56] *** raccoon__ has joined #archiveteam
[01:58] *** RedType_ has joined #archiveteam
[01:59] *** bsmith093 has joined #archiveteam
[01:59] *** NovaKing_ has joined #archiveteam
[02:00] *** mutoso has quit IRC (hub.se efnet.portlane.se)
[02:00] *** lytv has quit IRC (hub.se efnet.portlane.se)
[02:00] *** Kazzy has quit IRC (hub.se efnet.portlane.se)
[02:00] *** fx__ has quit IRC (hub.se efnet.portlane.se)
[02:00] *** RedType has quit IRC (hub.se efnet.portlane.se)
[02:00] *** ben__ has quit IRC (hub.se efnet.portlane.se)
[02:00] *** nico_32 has quit IRC (hub.se efnet.portlane.se)
[02:00] *** NovaKing has quit IRC (hub.se efnet.portlane.se)
[02:00] *** Deewiant has quit IRC (hub.se efnet.portlane.se)
[02:00] *** jk[SVP] has quit IRC (hub.se efnet.portlane.se)
[02:00] *** raccoon_ has quit IRC (hub.se efnet.portlane.se)
[02:00] *** w0rp has quit IRC (hub.se efnet.portlane.se)
[02:00] *** filippo_ has quit IRC (hub.se efnet.portlane.se)
[02:00] *** Sk1d has quit IRC (hub.se efnet.portlane.se)
[02:01] *** nico_32_ has joined #archiveteam
[02:03] *** filippo__ has joined #archiveteam
[02:15] *** w0rp_ is now known as w0rp
[02:15] *** Sk2d is now known as Sk1d
[02:15] *** jk[[SVP]] is now known as jk[SVP]
[02:15] *** lytv has joined #archiveteam
[02:15] *** Kazzy_ is now known as Kazzy
[02:34] *** nico_32_ is now known as nico_32
[02:38] *** kyan has joined #archiveteam
[02:38] *** nico_32 has quit IRC (Quit: Reconnecting)
[02:38] *** nico_32 has joined #archiveteam
[02:53] *** tux has joined #archiveteam
[02:53] Ping-a-ling
[02:54] pong-a-long
[02:54] Guess what I have
[02:54] *** tux is now known as Compresse
[02:54] I have a copy of TropicalWikis from February 2013.
[02:55] Well, I guess it's better than having no copy.
[02:55] sweet!
[02:55] in mediawiki xml dump format?
[02:55] Only SQL and images
[02:55] how big is it?
[02:56] It's roughly around 1GB
[02:56] https://archive.org/create/
[02:56] I wouldn't want to upload raw SQL obv
[02:56] why not?
[02:57] passwords
[02:57] need to scrub those
[02:57] ah, that
[03:00] let me start doing XML dumps
[03:02] *** Sellyme has quit IRC (Remote host closed the connection)
[03:03] *** primus104 has quit IRC (Leaving.)
[03:10] *** Infreq has joined #archiveteam
[03:14] *** Ymgve has quit IRC ()
[03:20] *** kyan has quit IRC (Quit: Leaving)
[03:31] *** gui7 has joined #archiveteam
[03:51] *** Sellyme has joined #archiveteam
[03:51] *** Sellyme has quit IRC (Remote host closed the connection)
[03:57] *** kyan has joined #archiveteam
[04:07] *** VADemon has quit IRC (Quit: left4dead)
[04:17] *** khaoohs has quit IRC (Read error: Connection reset by peer)
[04:18] *** khaoohs has joined #archiveteam
[04:22] xmc, I'm uploading it now
[04:22] It will take a while as the images are ~1.1GB
[04:23] also, I'm doing this over a residential connection, where upload isn't really good
[04:33] *** aaaaaaaaa has quit IRC (Leaving)
[04:33] http://www.eurogamer.net/articles/2015-04-26-p-t-is-being-pulled-from-psn-on-wednesday
[04:36] *** SketchCow has quit IRC (Read error: Connection reset by peer)
[04:36] *** SketchCow has joined #archiveteam
[04:36] *** swebb sets mode: +o SketchCow
[04:41] Enjoying my night before going to Sweden with some high-test CD Ripping
[04:48] nearly 80% done...
[04:56] Slicing through CD-ROMs like nothing, really.
[05:00] yay, done
[05:23] *** yotta has joined #archiveteam
[06:13] *** mistym has quit IRC (Remote host closed the connection)
[06:39] Compresse: sweet
[06:39] Compresse: link?
[06:39] https://archive.org/details/tropicalwikis-feb-2013
[06:40] \o/
[06:41] I actually imported one of my older wikis into my current one
[06:41] it'd be helpful if you were to upload a plain .tar of the dumps, because then you can browse it online without unzipping
[06:41] but thanks!
[06:42] xmc, that would have taken longer on my painful connection
[06:48] fair enough
[06:51] *** mistym has joined #archiveteam
[07:01] *** Compresse has quit IRC (Quit: Leaving)
[07:12] *** za3k has joined #archiveteam
[07:14] Hey. I've never worked on anything involved with the archive team. I want to use Warrior to archive Github. Would this be cool?
[07:17] I'm familiar with all of the 5-6 major software tools to archive github already, and understand how to archive it (less reliably and completely than needed for the final version) on a personal scale already, I just don't have adequate resources.
[07:17] I asked Github through official channels if they'd just hand over data for public repos and they said no, although I haven't tried personal contacts yet (I'm in the same city)
[07:27] *** scyther has joined #archiveteam
[07:38] *** kyan has quit IRC (Quit: Leaving)
[07:40] *** BlueMaxim has joined #archiveteam
[07:40] za3k: the Warrior's main purpose at the moment is grabbing websites, and grabbing what Github sends is an untenable mess due to all the history links etc
[07:41] something like github-backup is more appropriate and I guess you could wrap that up in a pipeline
[07:41] that said, where would you shove the backup
[07:46] *** dPhoenix has quit IRC (Read error: Connection reset by peer)
[07:53] yipdw: Yeah, I was talking about github-backup type things when I said existing tools. Wasn't sure if Warrior could deal with those. I thought archive team might have space; that's my limiting factor, as I can probably manage bandwidth.
[07:53] it can
[07:54] we don't really have space on our own, we use IA's facilities
[07:54] I don't know if they'd be wild about taking on terabytes of data for a site that is (A) alive and (B) rapidly changing
[07:54] you'd have to ask
[07:55] Okay, I'll ask. I'll also get a more accurate size estimate. Do you know of any way to grab the complete list of public repos and gists? That would save me some time and let me do a meaningful statistical sample to get size estimates.
[07:56] Also, every existing archive tool is unmaintained, and only one of them even works, so that's my first step--I'll double-check and then update the wiki with a list of tools.
[07:56] no, but I do know the latter list changes about every second
[07:57] if you run into problems with github-backup, its maintainer is in this channel
[07:58] Useful to know. Like I said, re-checking the state of tooling is probably one of my first steps.
[07:58] although I'm wondering what's wrong with /gists/public from the github API
[07:59] Nothing, necessarily; I'm familiar with third-party tools, not the github API. I'll check up these leads and get back in a couple days.
[07:59] yeah, you should really look through the github API
[07:59] Thanks for the help.
[07:59] I assumed I'd have to anyway--most of the tools I saw stopped working because they were on old API versions and unmaintained.
[08:00] I remember there being a few projects called github-backup so I might have skipped the linked one thinking it was a duplicate.
[08:00] Like I said, I need to do some more research.
[08:00] Just scoping out feasibility.
[08:00] I also wonder what the goal here is; public github is massive and rapidly changing
[08:01] Archive Team projects go for massive but the rapidly changing thing is a bit tough
[08:01] 1) There are a lot of research projects you can do with representative code-samples 2) Updating git repos isn't too painful relative to updating other changing sites 3) If github goes down it's really bad
[08:01] Yeah, I'm not sure how bad a snapshot is for archive team's goals
[08:01] I take it (2) is not really the big deal though
[08:02] I mean github is way more than its repos
[08:02] Agreed, although I think if you look at amounts of raw data it's mostly repos.
[08:03] as far as (3), well, we all have to get burned once in a while
[08:03] besides it'd be cool to watch Silicon Valley stop dead for a day
[08:03] heh
[08:03] maybe they might learn something
[08:03] and then make the same mistake again
[08:05] a slightly more serious answer to (3) is yeah it definitely would be
[08:08] So, another serious answer might be that I could write the snapshot-style script, but let it sit stagnant unless something scary-sounding happens with Github. I don't really trust sites to give advance notice of going down, but I don't really have a feasible answer to space storage yet either.
[08:09] that'd be useful, yeah
[08:09] I think that's a poor use of time on second thought, actually; directing effort at github-backup or something might be more useful.
[08:10] also if you can get in touch with IA folk and get them on board with mirroring github, another option is to have github staff work with IA directly
[08:10] Anyway, again thanks for help, and I'll get back after talking to people about storage, size estimates, and with a version that runs well on a single repo.
[08:10] SketchCow has IA contacts, you can ask him for an inroad
[08:12] Sure, I'd like to have a snapshot size [and if feasible daily delta] to quote at IA first but will do.
[08:13] SketchCow: feel free to reply, I'm offline but can grab logs
[08:13] *** za3k has quit IRC (Quit: Page closed)
[08:18] *** bugfiend has joined #archiveteam
[08:19] *** db48x has quit IRC (Read error: Connection reset by peer)
[08:23] does anyone here know of cboyardee
[08:25] *** primus104 has joined #archiveteam
[08:33] *** Infreq has quit IRC (Remote host closed the connection)
[08:36] re github: https://www.githubarchive.org/
[08:38] wonder if you can reliable put together repos based on that
[08:38] reliably*
[08:54] *** mistym has quit IRC (Remote host closed the connection)
[08:56] well the git part is easy, just clone. the rest is the issue :D
[09:03] is someone able to help me with my stupid newbie question?
[09:04] i'm trying to, uh... "wget" an entire ftp directory, but whenever I try, it gives me an error:
[09:04] "Error in server response, closing control connection."
[09:04] I have no idea what I'm doing
[09:05] what i'm trying to back up is ftp://ftp.agdg.me
[09:05] it has heaps of useful game development resources in there but it hasnt been updated for months
[09:05] user:obama pass:voteforme
[09:06] i've been afraid about it tanking all of a sudden due to its inactivity and am trying to gather a working backup
[09:07] if somebody can help, that would be very appreciated
[09:12] if wget doesn't work you could try wpull or lftp
[09:14] alright, i'll look into it
[09:16] i've also had some success with fuse-mounting and then doing rsync from one filesystem to another
[09:17] that got me a 10x speedup over a naive sftp copy (had to use sftp, don't ask)
[09:19] ok got a 1 concurrent wpull job on that
[09:20] speaking of wpull, it requires python 3
[09:20] can i install it beside python 2?
[09:21] needs python 3
[09:29] *** BlueMaxim has quit IRC (Quit: Leaving)
[09:34] *** Infreq has joined #archiveteam
[09:47] jesus christ im lost
[09:47] maybe i'll save this for another day............
[09:47] *** bugfiend has quit IRC (Quit: till next time)
[09:54] *** scyther has quit IRC (Leaving)
[09:55] *** mistym has joined #archiveteam
[10:01] *** mistym has quit IRC (Read error: Operation timed out)
[10:10] *** ivan` has joined #archiveteam
[10:23] shit https://dolphin-emu.org/blog/2015/04/25/commemoration-rachel-bryk/ :(
[10:25] http://ask.fm/RachelB_
[10:27] damn...
[10:31] another trans-hate suicide?
[10:33] ftp.agdg.me archive, will upload to ia
[10:34] 16gb
[10:36] *** Ymgve has joined #archiveteam
[10:37] her git https://github.com/RachelBryk
[10:44] fuck
[11:22] *** app has quit IRC (Ping timeout: 258 seconds)
[11:45] *** app has joined #archiveteam
[11:46] *** lysobit has quit IRC (quit)
[11:51] *** lysobit has joined #archiveteam
[12:02] *** Ravenloft has quit IRC (Read error: Connection reset by peer)
[12:04] does anyone have a dedicated server that they want to donate to the archivebot cause
[12:05] archivebot pipeline nodes mostly eat CPU
[12:18] how much cpu/etc do you need? I could spin up a vm
[12:20] the main constraint is that it has to stay running for months
[12:20] the equivalent of a 2-core i3 is fine
[12:21] 6GB RAM
[12:22] maybe even less
[12:22] ~100GB disk minimum
[12:41] OS?
[12:41] I prefer debian
[12:43] need something with the newest libxml2
[12:44] libxml2 2.9.2, I know Ubuntu 15.04 has it
[12:44] ugh, jessie is 2.9.1. Ubuntu it is then
[12:44] * trs80 grabs an ios
[12:44] * trs80 *iso
[12:45] thanks
[13:01] ivan`: i wish i could guarantee you few months of CPU uptime/network connectivity :D
[13:01] but that is next to impossible
[13:03] network connectivity can go down if it eventually comes back up, heh
[13:04] http://lolsnaps.com/upload_pic/FutureArcheology-17614.png <-- we are helping these guys :D
[13:04] ivan`: do you just need a user on there?
[13:04] trs80: yeah, and I guess I'll give you a list of packages I need
[13:04] I'll PM you my key in a sec
[13:47] ivan`: wish i could help but all my servers are small 1gb guys with minimal disk space
[13:48] trs80 has saved the day
[13:51] i have a not very used ESXi system with 2TB drive and 24GB RAM, but i cannot guarantee uptime, network connectivity and i would have to set up DMZ for it :D
[14:26] *** Infreq has quit IRC (Remote host closed the connection)
[14:27] *** Infreq has joined #archiveteam
[14:28] *** Infreq has quit IRC (Client Quit)
[14:31] *** Infreq has joined #archiveteam
[15:17] *** nwf has quit IRC (Read error: Operation timed out)
[15:18] *** nwf has joined #archiveteam
[15:35] Hi.
[15:35] If Github Backup Guy comes back, the answer is: No
[15:35] I mean, certainly not alone, with someone just trying it out on their own
[15:35] It's too big, too important and too involved for a single person to hope for the best.
[15:45] *** habi has joined #archiveteam
[15:45] *** habi has left
[15:52] *** app has quit IRC (Ping timeout: 258 seconds)
[15:53] *** app has joined #archiveteam
[15:57] I'm heading out today to Sweden. I'll be back Wednesday and no doubt online during.
[16:00] *** mistym has joined #archiveteam
[16:02] *** signius has quit IRC (Read error: Operation timed out)
[16:05] One other piece
[16:05] FOS is getting HAMMERED right now with updates and uploads and the rest.
[16:05] It got down to a gig of free space at one point, I slammed that back to something like a terabyte.
[16:06] I've got a bunch of processes running on a ton of jobs to deal with it. With luck, it should clear out pretty well.
[16:06] But if someone sees bad behavior, that's what it is.
[16:09] *** mistym has quit IRC (Read error: Operation timed out)
[16:15] *** signius has joined #archiveteam
[16:36] *** xtr-107 has joined #archiveteam
[16:39] *** xtr-201 has quit IRC (Ping timeout: 370 seconds)
[16:41] *** mistym has joined #archiveteam
[16:45] *** app103 has joined #archiveteam
[16:45] *** app has quit IRC (Ping timeout: 258 seconds)
[17:06] *** RichardG has quit IRC (Keyboard not found, press F1 to continue)
[17:06] *** RichardG has joined #archiveteam
[17:32] *** mistym has quit IRC (Remote host closed the connection)
[17:37] *** aaaaaaaaa has joined #archiveteam
[17:45] *** mistym has joined #archiveteam
[17:50] *** Deewiant has joined #archiveteam
[17:50] *** mutoso has joined #archiveteam
[18:22] *** lytv has quit IRC (Ping timeout: 260 seconds)
[18:32] *** dashcloud has quit IRC (Ping timeout: 260 seconds)
[18:33] *** dashcloud has joined #archiveteam
[18:37] SketchCow: achip and me are creating and testing a new newsletter project: http://mail3.newsletter.nerds.io/newspoc/
[18:45] *** mistym has quit IRC (Remote host closed the connection)
[18:51] *** lytv has joined #archiveteam
[18:54] *** mistym has joined #archiveteam
[18:58] *** app has joined #archiveteam
[18:59] *** app103 has quit IRC (Ping timeout: 258 seconds)
[19:01] *** habi has joined #archiveteam
[19:07] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:10] *** dashcloud has joined #archiveteam
[19:12] *** habi has quit IRC (Quit: Leaving.)
[19:39] *** habi has joined #archiveteam
[19:41] *** habi has left
[19:52] *** SN4T14 has joined #archiveteam
[19:58] *** SN4T14_ has quit IRC (Ping timeout: 512 seconds)
[20:01] i've got a couple of sites that i run that will be dying soon if anyone wants to take a shot at archiving - http://not99chan.org and http://kingofthemem.es
[20:02] i'll have to dump an sql db for kingofthemem.es
[20:05] *** kyan has joined #archiveteam
[20:08] *** nwf has quit IRC (Read error: Operation timed out)
[20:09] *** nwf has joined #archiveteam
[20:14] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:27] *** dashcloud has joined #archiveteam
[20:49] *** app103 has joined #archiveteam
[20:50] *** app has quit IRC (Read error: Operation timed out)
[20:52] added them to the archivebot, not sure if it can do anything with the meme page tho
[21:01] *** Stilett0 has joined #archiveteam
[21:01] ftp.agdg.me upload to ia
[21:02] https://archive.org/details/ftp.agdg.me.warc
[21:03] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:05] *** RichardG has quit IRC (Ping timeout: 606 seconds)
[21:06] midas: yeah, it's a weird one
[21:10] *** dashcloud has joined #archiveteam
[21:22] *** app103 has quit IRC (Ping timeout: 258 seconds)
[21:23] *** app has joined #archiveteam
[21:24] *** RichardG has joined #archiveteam
[21:39] *** app has quit IRC (Ping timeout: 258 seconds)
[21:40] *** app has joined #archiveteam
[21:50] *** Dark_Star has quit IRC (Read error: Connection reset by peer)
[21:51] *** Dark_Star has joined #archiveteam
[21:52] *** Froggypwn has quit IRC (Read error: Connection reset by peer)
[22:02] *** app has quit IRC (Ping timeout: 258 seconds)
[22:09] *** app has joined #archiveteam
[22:13] *** dPhoenix has joined #archiveteam
[22:16] *** ivan` has left
[22:38] *** mistym has quit IRC (Remote host closed the connection)
[22:45] *** app has quit IRC (Ping timeout: 258 seconds)
[22:47] *** app has joined #archiveteam
[22:52] *** mistym has joined #archiveteam
[22:56] *** mistym has quit IRC (Remote host closed the connection)
[22:57] *** BlueMaxim has joined #archiveteam
[23:00] *** SimpBrain has quit IRC (Ping timeout: 258 seconds)
[23:03] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[23:09] *** BlueMaxim has joined #archiveteam
[23:26] *** kyan has quit IRC (Quit: Leaving)
[23:26] *** kyan has joined #archiveteam
[23:27] *** mistym has joined #archiveteam
[23:38] *** Morbus has quit IRC (http://www.disobey.com/)
[23:41] *** Morbus has joined #archiveteam
[23:45] *** app has quit IRC (Ping timeout: 258 seconds)
[23:45] *** app has joined #archiveteam
[23:45] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:47] *** dashcloud has joined #archiveteam
[23:58] *** app has quit IRC (Ping timeout: 258 seconds)
[23:59] *** BlueMaxim has quit IRC (Ping timeout: 512 seconds)
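
The Tor check floated between 00:32 and 00:36 (error out if a known .onion name resolves, or ask check.torproject.org) could look roughly like the sketch below. It is only an illustration, not the actual checkip code: the function names are hypothetical, the resolver is injectable so the logic can be exercised without a network, and the JSON shape `{"IsTor": ..., "IP": ...}` for a check.torproject.org response is an assumption rather than something stated in the log.

```python
import json
import socket


def onion_resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves through the given resolver.

    On an ordinary connection a .onion name is expected to fail to resolve,
    so a successful lookup hints that traffic is going through Tor.
    `resolver` defaults to socket.getaddrinfo and can be swapped for a fake.
    """
    try:
        resolver(hostname, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


def is_tor_from_check_response(body):
    """Interpret a response body from a Tor check service.

    ASSUMPTION: the body is JSON shaped like {"IsTor": true, "IP": "..."}.
    Anything unparsable or differently shaped is treated as "not Tor".
    """
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(data, dict) and data.get("IsTor") is True


def should_reject_client(onion_host="facebookcorewwwi.onion",
                         resolver=socket.getaddrinfo):
    """Hypothetical pre-flight gate: refuse to run if the .onion name resolves."""
    return onion_resolves(onion_host, resolver)
```

Either signal alone would serve as the gate discussed in the channel; the resolver-based one needs no third-party service, which matches the remark at 00:36 that a .onion lookup "would do the job just as well".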