[00:01] *** winr5r has joined #archiveteam
[00:01] *** Fusl_ is now known as Fusl
[00:02] *** kisspunc- is now known as kisspunch
[00:12] *** mistym has quit IRC (Remote host closed the connection)
[00:26] *** mistym has joined #archiveteam
[00:31] *** lexicon has quit IRC (Read error: Operation timed out)
[00:32] *** Fletcher has quit IRC (Ping timeout: 252 seconds)
[00:34] *** SadDM has quit IRC (Remote host closed the connection)
[00:34] *** SadDM has joined #archiveteam
[00:34] *** swebb sets mode: +o SadDM
[00:35] *** lexicon has joined #archiveteam
[00:43] *** BlueMaxim has joined #archiveteam
[00:54] *** Fletcher has joined #archiveteam
[01:18] *** JesseW has joined #archiveteam
[01:22] *** schbirid2 has joined #archiveteam
[01:24] *** username1 has quit IRC (Read error: Operation timed out)
[01:55] *** sunnymilk has joined #archiveteam
[01:57] *** McGEE has quit IRC (Quit: Connection closed for inactivity)
[01:59] *** ripvanwin has quit IRC (Read error: Operation timed out)
[02:13] *** ripvanwin has joined #archiveteam
[02:25] *** JesseW has quit IRC (Quit: Leaving.)
[02:31] *** McGEE has joined #archiveteam
[02:33] *** Nyenkaht has joined #archiveteam
[02:33] Hi, I keep trying to assist the pomf.se project, but I keep getting told to "wait" another minute
[02:34] It's been over 12 minutes since I've been given something, I have only one connection to this project
[02:35] there are rate limits to keep from overloading their servers and you just haven't gotten lucky yet.
[02:35] so basically i have to run like 30+ servers in order to become a face on the project? theres people doing like 5+ units like every second
[02:38] aaaaaaaaa: I have one unit since I started close to a half hour ago
[02:38] that is sad
[02:39] I don't know what the traffic looks like, just that they do do limits
[02:40] well, clearly those limits arent effective
[02:41] Are there any active projects that don't have a restriction on 1 unit per world war
[02:46] URLTeam uses a different formula and you may have better luck with halo
[02:47] Yeah I've been contributing to URLTeam primarily
[02:47] 1 million records scanned and counting since i think sunday
[02:48] Nyenkaht the rate limiting is done on the tracker. it hands out x jobs a minute to the first warriors that ask, not limiting you specifically. the limit may go up, but for the next few hours pomf.se'll remain at the same rate so we can make sure the system fairs ok
[02:50] *** sirdancea has quit IRC (Read error: Operation timed out)
[02:57] achip: looks like i'm gonna fit in well with this halo project, i'm averaging 1MB/s apparently
[03:15] *** Ymgve has quit IRC ()
[03:16] *** bzc6p_ has joined #archiveteam
[03:16] *** swebb sets mode: +o bzc6p_
[03:19] *** bzc6p has quit IRC (Read error: Operation timed out)
[03:21] *** mistym has quit IRC (Remote host closed the connection)
[03:26] *** lexicon has quit IRC (Read error: Operation timed out)
[03:26] *** Nyenkaht has quit IRC (Quit: .)
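The rate limiting described at 02:48 is global rather than per warrior: the tracker hands out a fixed number of jobs per minute to whoever asks first, so a single slow connection can go many minutes without an item. The sketch below illustrates that behaviour with a simple fixed one-minute window. It is a hypothetical model, not the actual tracker code; the class name, the `jobs_per_minute` parameter and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, Optional


class PerMinuteTracker:
    """Hypothetical model of a tracker that hands out at most
    `jobs_per_minute` items per minute, first come, first served."""

    def __init__(self, items: Iterable[str], jobs_per_minute: int,
                 clock: Callable[[], float] = time.monotonic):
        self._items = list(items)
        self._limit = jobs_per_minute
        self._clock = clock
        self._window_start = clock()
        self._issued = 0

    def request(self, warrior: str) -> Optional[str]:
        # `warrior` is accepted but deliberately ignored: the budget is
        # shared by everyone, nobody is limited individually.
        now = self._clock()
        # Open a fresh one-minute window once 60 seconds have passed.
        if now - self._window_start >= 60:
            self._window_start = now
            self._issued = 0
        # Budget spent (or queue empty): the warrior is told to wait.
        if self._issued >= self._limit or not self._items:
            return None
        self._issued += 1
        return self._items.pop(0)
```

With a limit of 3 per minute, a fast warrior that asks three times at the start of a window takes every item, and a later request returns `None` ("wait") until the next window opens. Raising the limit, as suggested in the log, simply increases the shared budget.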
[03:29] *** SadDM has quit IRC (Remote host closed the connection)
[03:29] *** SadDM has joined #archiveteam
[03:29] *** swebb sets mode: +o SadDM
[03:30] *** lexicon has joined #archiveteam
[03:32] *** yuvadm_ has joined #archiveteam
[03:32] *** tephra_ has joined #archiveteam
[03:32] *** useretail has quit IRC (Read error: Operation timed out)
[03:33] *** garyrh has quit IRC (Read error: Operation timed out)
[03:33] *** yuvadm has quit IRC (Read error: Operation timed out)
[03:33] *** tephra has quit IRC (Read error: Operation timed out)
[03:34] *** winr5r has quit IRC (Ping timeout: 255 seconds)
[03:35] *** lytv has quit IRC (Read error: Operation timed out)
[03:38] *** lytv has joined #archiveteam
[03:42] *** winr4r has joined #archiveteam
[03:49] *** mistym has joined #archiveteam
[03:57] *** Ravenloft has quit IRC (Ping timeout: 362 seconds)
[04:06] *** useretail has joined #archiveteam
[04:09] *** yotta has quit IRC (Read error: Operation timed out)
[04:09] *** Ctrl-S has quit IRC (Read error: Connection reset by peer)
[04:09] *** Ctrl-S_ is now known as Ctrl-S
[04:10] *** joepie91 has quit IRC (Read error: Operation timed out)
[04:10] *** aMunster has quit IRC (Read error: Operation timed out)
[04:10] *** toad1 has quit IRC (Read error: Operation timed out)
[04:10] *** phuzion has quit IRC (Read error: Operation timed out)
[04:10] *** mutoso has quit IRC (Read error: Operation timed out)
[04:10] *** nwf has quit IRC (Read error: Operation timed out)
[04:10] *** dinomite_ has quit IRC (Write error: Broken pipe)
[04:11] *** dinomite has joined #archiveteam
[04:11] *** S[h]O[r]T has quit IRC (Read error: Operation timed out)
[04:11] *** marvinw has quit IRC (Read error: Operation timed out)
[04:12] *** achip has quit IRC (Read error: Operation timed out)
[04:12] *** ripvanwin has quit IRC (Read error: Operation timed out)
[04:12] *** joepie91 has joined #archiveteam
[04:13] *** vegbrasil has quit IRC (Ping timeout: 600 seconds)
[04:16] *** bzc6p_ has quit IRC (Read error: Operation timed out)
[04:18] *** mutoso has joined #archiveteam
[04:20] *** sep332 has quit IRC (Ping timeout: 600 seconds)
[04:22] *** mistym has quit IRC (Remote host closed the connection)
[04:29] *** RichardG_ has joined #archiveteam
[04:30] *** RichardG has quit IRC (Read error: Connection reset by peer)
[04:31] *** Emcy_ has joined #archiveteam
[04:32] POMF is being put in the wrong place on FOS, but I will deal.
[04:34] *** Emcy has quit IRC (Ping timeout: 306 seconds)
[04:34] *** lytv has quit IRC (Ping timeout: 306 seconds)
[04:34] Can someone help rip a site out of wayback to give to someone?
[04:37] *** lytv has joined #archiveteam
[04:38] *** aaaaaaaaa has quit IRC (Leaving)
[04:38] *** marvinw has joined #archiveteam
[04:38] *** phuzion has joined #archiveteam
[04:38] *** ripvanwin has joined #archiveteam
[04:38] *** nwf has joined #archiveteam
[04:38] *** S[h]O[r]T has joined #archiveteam
[04:38] *** vegbrasil has joined #archiveteam
[04:39] *** Control-S has joined #archiveteam
[04:39] *** achip has joined #archiveteam
[04:40] *** aMunster has joined #archiveteam
[04:40] *** Froggypwn has quit IRC (Ping timeout: 240 seconds)
[04:41] *** sep332 has joined #archiveteam
[04:42] *** toad1 has joined #archiveteam
[04:44] *** rduser has quit IRC (Read error: Operation timed out)
[04:44] *** midas has quit IRC (Read error: Operation timed out)
[04:44] *** midas has joined #archiveteam
[04:44] *** rduser has joined #archiveteam
[04:47] *** SN4T14_ has joined #archiveteam
[04:48] *** SN4T14 has quit IRC (Ping timeout: 369 seconds)
[04:59] SketchCow: oh oops
[05:00] I just put it in /1 because that's where everything else was
[05:00] is it supposed to be /0?
[05:04] *** mistym has joined #archiveteam
[05:05] traditionally it's in /1/CHFOO/warrior but I've already adapted.
[05:07] *** McGEE has quit IRC (Quit: Connection closed for inactivity)
[05:15] Pomf now being loaded into archive
[05:16] halo continues, it's not flooding the system yet
[05:16] Did we get clearance off of Furraffinity or is there more?
[05:20] bsmith093: Clearance on the fanfiction?
[05:25] *** Froggypwn has joined #archiveteam
[05:53] *** JesseW has joined #archiveteam
[06:02] *** bzc6p_ has joined #archiveteam
[06:02] *** swebb sets mode: +o bzc6p_
[06:12] I almost made the same mistake with pomf that I did with baraza
[06:12] Items need to be much smaller, not larger.
[06:31] *** bzc6p_ is now known as bzc6p
[06:34] *** mistym has quit IRC (Remote host closed the connection)
[06:40] *** garyrh has joined #archiveteam
[06:54] *** JesseW has quit IRC (Quit: Leaving.)
[06:54] *** ripvanwin has quit IRC (Read error: Connection reset by peer)
[06:54] *** RichardG_ has quit IRC (Remote host closed the connection)
[06:55] *** ripvanwin has joined #archiveteam
[07:05] *** nox has quit IRC (Ping timeout: 252 seconds)
[07:35] *** mistym has joined #archiveteam
[07:36] *** godane has quit IRC (Read error: Operation timed out)
[07:40] *** mistym has quit IRC (Ping timeout: 252 seconds)
[07:40] PSA
[07:41] .title https://torrentfreak.com/elsevier-cracks-down-on-pirated-scientific-articles-150609/
[07:41] (no botpie?
[07:41] Academic publishing company Elsevier has filed a complaint at a New York District Court, hoping to shut down the Library Genesis project and the SciHub.org search engine. The sites, which are particularly popular in developing nations where access to academic works is relatively expensive, are accused of pirating millions of scientific articles.
[07:41] so..
[07:41] yeah, maybe it's time for a copy
[07:54] there are several mirrors
[07:58] *** godane has joined #archiveteam
[08:00] *** khaoohs_ has joined #archiveteam
[08:00] *** khaoohs has quit IRC (Read error: Connection reset by peer)
[08:07] "net income of more than $1 billion [...] losses, which could run into the millions."
[08:08] so it *may* lose 1 of 1000 million dollars
[08:28] *** DFJustin has quit IRC (Ping timeout: 740 seconds)
[08:32] *** MMovie has joined #archiveteam
[08:33] *** Swizzle__ has quit IRC (Read error: Connection reset by peer)
[08:35] *** MMovie1 has quit IRC (Ping timeout: 306 seconds)
[08:35] *** jmtd has quit IRC (Quit: ZNC - http://znc.in)
[08:35] *** primus104 has joined #archiveteam
[09:16] *** Froggypwn has quit IRC (Read error: Connection reset by peer)
[09:16] *** primus104 has quit IRC (Leaving.)
[09:17] *** Froggypwn has joined #archiveteam
[09:25] *** mistym has joined #archiveteam
[09:38] *** mistym has quit IRC (Read error: Operation timed out)
[10:24] *** db48x has quit IRC (Read error: Connection reset by peer)
[10:36] *** john4 has quit IRC (Ping timeout: 370 seconds)
[10:44] *** vOYtEC has joined #archiveteam
[10:47] *** vOYtEC has quit IRC (Read error: Connection reset by peer)
[10:47] *** john4 has joined #archiveteam
[10:48] *** vOYtEC has joined #archiveteam
[10:50] *** vOYtEC has quit IRC (Read error: Connection reset by peer)
[10:51] *** vOYtEC has joined #archiveteam
[10:51] *** vOYtEC has quit IRC (Read error: Connection reset by peer)
[11:14] *** mistym has joined #archiveteam
[11:18] *** khaoohs has joined #archiveteam
[11:18] *** bryan1 has joined #archiveteam
[11:19] hi
[11:19] *** bryan1 is now known as _bryan
[11:22] *** Ymgve has joined #archiveteam
[11:24] *** mistym has quit IRC (Read error: Operation timed out)
[11:24] *** khaoohs_ has quit IRC (Read error: Operation timed out)
[11:26] *** sirdancea has joined #archiveteam
[11:30] *** primus104 has joined #archiveteam
[11:44] *** dinomite has quit IRC (Read error: Operation timed out)
[11:44] *** dinomite has joined #archiveteam
[11:48] *** nox has joined #archiveteam
[12:14] this sucks http://classic.xfire.com/
[12:16] still reachable via http://208.88.178.38/profile/%profilename% if someone has the time to grab it
[12:40] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:03] *** mistym has joined #archiveteam
[13:05] *** primus104 has quit IRC (Read error: Connection reset by peer)
[13:10] *** mistym has quit IRC (Read error: Operation timed out)
[13:14] *** Marc has joined #archiveteam
[13:15] *** jmc has quit IRC ()
[13:51] *** Start has quit IRC (Disconnected.)
[13:51] *** Start has joined #archiveteam
[13:51] *** Start has quit IRC (Client Quit)
[13:55] *** Froggypwn has quit IRC (Ping timeout: 606 seconds)
[13:56] *** Froggypwn has joined #archiveteam
[13:58] *** primus104 has joined #archiveteam
[14:04] *** mistym has joined #archiveteam
[14:05] *** mistym has quit IRC (Remote host closed the connection)
[14:05] *** mistym has joined #archiveteam
[14:07] *** mistym has quit IRC (Remote host closed the connection)
[14:09] *** DFJustin has joined #archiveteam
[14:21] *** habi has joined #archiveteam
[14:22] *** habi has left
[14:27] *** habi has joined #archiveteam
[14:32] *** mistym has joined #archiveteam
[14:43] *** sirdancea has quit IRC (Read error: Operation timed out)
[14:44] *** mistym has quit IRC (Remote host closed the connection)
[14:47] *** habi has quit IRC (Quit: Leaving.)
[14:48] *** Start has joined #archiveteam
[14:54] *** primus104 has quit IRC (Leaving.)
[14:55] midas: deadline is June 12
[14:55] added to Deathwatch
[14:56] *** Start has quit IRC (Disconnected.)
[14:58] *** chrki has joined #archiveteam
[14:58] hey guys
[14:58] *** mistym has joined #archiveteam
[15:02] *** bzc6p_ has joined #archiveteam
[15:02] *** swebb sets mode: +o bzc6p_
[15:03] *** Start has joined #archiveteam
[15:05] *** bzc6p has quit IRC (Read error: Operation timed out)
[15:07] I don't know if this fits your project's scope and all, I haven't found a good solution with archive.org (since they only let me archive pages one by one). The Xfire gaming messenger client is shutting down, along with all user profiles on their website. I found a way to access these still (through some Google trial and error), most links on there still work while the official page and links will all redirect to an "export
[15:07] your data page" or 404. Some example: http://google.comw.profile.xfire.com/profile/lkayral/ (not my profile) vs. http://xfire.com/profile/lkayral/. The google.comw.xfire.com domain seems to be aimed at search crawlers, Javascript links on there won't work, some things (like full gaming history on profiles) are simply hidden with CSS attributes
[15:11] *** bzc6p_ is now known as bzc6p
[15:11] chrki: Welcome
[15:11] It absolutely fits into ArchiveTeam's scope, thanks for the report!
[15:12] You are not the only one: midas just mentioned it and found another way to access: http://208.88.178.38/profile/%profilename%
[15:12] (it might be the same as that google.comw whatever)
[15:13] chrki: do you have an idea how much that content is (num of profiles, amount in gigabytes) approximately?
[15:13] Just to see the scale.
[15:14] Wikipedia says 24 million users, each one probably has a profile, a friends and a screenshots page (although I would guess they are mostly empty), some profiles might be private, I don't know how much that could be
[15:16] *** RichardG has joined #archiveteam
[15:17] There are videos also, aren't they?
[15:17] Yes there are
[15:17] And we have two days...
[15:17] Just checked a 34 second video, that's 3.2MB
[15:20] I don't know if we'll be able to set up a project quickly, but by scale it must be a Warrior one. At least let's start the conversation. I'll devote my afternoon to it.
[15:20] I suggest the channel #xfired
[15:20] chrki: can you stay for some discussion? We need to discover the site structure very first.
[15:21] 39505 videos alone in a month for the top 10 games http://webcache.googleusercontent.com/search?q=cache:r0zq17n_ifcJ:de.xfire.com/cms/stats/+&cd=1&hl=de&ct=clnk&gl=de
[15:21] sure I'll be here
[15:22] please come to #xfired and other interested too. We do what we can.
[15:50] *** Start has quit IRC (Disconnected.)
[15:51] *** mistym has quit IRC (Remote host closed the connection)
[16:08] *** vOYtEC has joined #archiveteam
[16:08] *** chrki has quit IRC (Quit: Leaving)
[16:08] *** mistym has joined #archiveteam
[16:11] *** nertzy has joined #archiveteam
[16:22] So these xfire guys gave like 2 days for users to export their stuff (links are already broken)
[16:22] Waiting queue is 21 houts
[16:22] *hours
[16:23] The site may be hosting 360,000,000 videos (short game recordings)
[16:23] and who knows how many screenshots
[16:23] everything nuked on friday
[16:25] yikes
[16:27] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[16:30] *** GLaDOS has quit IRC (Ping timeout: 252 seconds)
[16:32] yow
[16:36] *** GLaDOS has joined #archiveteam
[16:49] *** aaaaaaaaa has joined #archiveteam
[16:49] *** swebb sets mode: +o aaaaaaaaa
[16:52] http://www.bbc.com/news/business-33076527 -> https://drive.google.com/file/d/0B-Kg8JC-9TqnN245SU1rT2Q3VDg/view
[17:02] *** Start has joined #archiveteam
[17:22] *** sb057 has joined #archiveteam
[17:23] emojli ("joke", emoji-based social network) just announced they're shutting down July 30, and deleting everything
[17:23] http://emoj.li/
[17:25] sb057: what scale? (e.g. how many users)
[17:26] not sure, but it got featured in The Independent and Time, so presumably more than six
[17:26] where does all the stuff you guys archive get stored
[17:28] sunnymilk: we upload them to the Internet Archive
[17:30] sb057: I've added it to our Deathwatch for now. Thanks for reporting.
[17:40] *** Start has quit IRC (Disconnected.)
[17:45] xfire has broken their profile page btw, cant access my own one. waiting for my data export
[17:47] SimpBrain: you may find some useful information on repairing your profile page: http://www.reddit.com/r/Games/comments/39a41v/xfire_social_profiles_shutdown_save_your/
[17:47] cool
[17:47] SimpBrain: how long is the queue and the waiting time?
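Two alternate access paths for Xfire profiles were reported above: the `google.comw.profile.xfire.com` crawler host (15:07) and the raw IP `208.88.178.38` with `/profile/%profilename%` (12:16 and 15:12), while the official `xfire.com/profile/...` links were redirecting. A grab needs a seed list, so here is a minimal sketch that expands known usernames into those candidate URLs. The helper name and template tuple are hypothetical; the sketch assumes usernames are already known (the log does not say how the roughly 24 million profiles would be enumerated) and only covers the profile page itself, since URLs for the friends and screenshots pages were not given.

```python
from typing import Iterable, Iterator, Tuple
from urllib.parse import quote

# Alternate profile paths reported in the channel; {name} stands in for
# the %profilename% placeholder used there.
PROFILE_URL_TEMPLATES = (
    "http://google.comw.profile.xfire.com/profile/{name}/",
    "http://208.88.178.38/profile/{name}",
)


def candidate_profile_urls(usernames: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (username, url) pairs for every known alternate profile path."""
    for name in usernames:
        # Percent-encode the name so odd characters cannot break the path.
        safe = quote(name, safe="")
        for template in PROFILE_URL_TEMPLATES:
            yield name, template.format(name=safe)
```

Keeping both hosts per username is a hedge: if one path starts redirecting like the official one did, the other may still serve the page.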
[17:47] dunno
[17:47] Didn't it inform you?
[17:47] im number 3958
[17:48] no time
[17:48] well, looking at the comments
[17:48] According to that reddit thread, it'll take long
[17:48] You are #653 in queue (Approx 1306 minutes)
[17:48] I don't know however how accurate that estimation is.
[17:49] *** primus104 has joined #archiveteam
[17:51] 135 hours if it is
[17:52] actually closer to 136
[17:53] We just wondered with achip if we should archive the site even if we could. It would probably just slow down regular users' access even more.
[17:57] *** c_b has joined #archiveteam
[18:10] *** nertzy has joined #archiveteam
[18:18] *** schbirid has joined #archiveteam
[18:19] *** mutoso has quit IRC (Quit: leaving)
[18:21] 3957 now lol, takes ages for xfire
[18:24] Strange. Just 7z-ing a bunch of files. (Although it's just a waste of time to 7z vids and pics.)
[18:30] *** sirdancea has joined #archiveteam
[18:37] *** primus104 has quit IRC (Leaving.)
[18:39] *** c_b has quit IRC (Ping timeout: 252 seconds)
[18:50] *** nertzy has quit IRC (This computer has gone to sleep)
[18:51] *** habi has joined #archiveteam
[18:53] *** Start has joined #archiveteam
[18:57] *** sirdancea has quit IRC (Read error: Operation timed out)
[19:01] that is probably harder than it looks without affecting regular users.
[19:01] *** habi has left
[19:20] *** Start has quit IRC (Disconnected.)
[19:21] *** Start has joined #archiveteam
[19:25] *** Start has quit IRC (Client Quit)
[19:25] *** aNthraXx has quit IRC (Read error: Operation timed out)
[19:30] *** Start has joined #archiveteam
[19:33] so how about those subreddits that just got deleted, eh?
[19:34] /r/fatpeoplehate ?
[19:35] sb057 drama is delicious
[19:35] and others, and almost certainly more to follow
[19:35] just stand from the sidelines and watch
[19:36] well, I'm obviously no expert on AT's goals, but don't you think reddit might be worth archiving?
[19:36] were the others marked as hateful like FPH?
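The queue estimate above can be checked with a linear extrapolation: position #653 was quoted at approximately 1306 minutes, which works out to 2 minutes per queue position. The sketch scales that rate to position #3958. It assumes the export queue drains at a constant rate and that the reddit-quoted figure applies to SimpBrain's queue; the log itself doubts the estimate's accuracy, and the observed progress (3958 to 3957 in about half an hour) suggests the real rate was slower.

```python
def eta_hours(position: int, ref_position: int, ref_minutes: float) -> float:
    """Scale a reference queue estimate linearly to another queue position."""
    minutes_per_position = ref_minutes / ref_position
    return position * minutes_per_position / 60


print(round(1306 / 653, 2))                   # 2.0 minutes per position
print(round(eta_hours(3958, 653, 1306), 1))   # 131.9 hours
```

Under that constant-rate assumption the wait comes to about 132 hours (roughly five and a half days), slightly below the 135 to 136 hours worked out in the channel and well past the "nuked on friday" shutdown.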
[19:36] kniffy there were like 4 other subs
[19:36] FPH wasn't "hateful"
[19:36] it was "harassing"
[19:37] but all under 5k subs
[19:37] *** mistym has quit IRC (Remote host closed the connection)
[19:37] i dont think fatpeoplehate would be worth archiving
[19:38] yeah, i think it would be questionable
[19:38] right, but who can tell what's next?
[19:38] slippery slope etc i understand
[19:38] but finding creepshots of fat people with some ohsowitty caption isnt something we're lacking
[19:38] maybe its just not on reddit/in a single subreddit anymore
[19:47] *** aNthraXx has joined #archiveteam
[19:52] *** mistym has joined #archiveteam
[19:54] *** lolwhydoi has joined #archiveteam
[19:54] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[19:54] now I look like a fool.
[19:54] lolwhydoi: "yahoosucks" (without quotes)
[19:55] lolwhydoi: yahoosucks
[19:55] aaaaaaaaa: old!
[19:55] ahaha that's the best secret word - thanks guys.
[19:56] *** lolwhydoi has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[19:58] *** sb058 has joined #archiveteam
[19:59] autist neckbeard
[20:00] Apathy_, kniffy, sb057: you can always save a single website with archive.org/save/URL
[20:00] go right into the wayback machine
[20:00] eeeeexcept if it's robots.txt protected, I don't know Reddit
[20:00] yeah, I know
[20:00] *** sb057 has quit IRC (Ping timeout: 252 seconds)
[20:01] *** sb058 is now known as sb057
[20:01] its just that it can be hard to predict what is going to get the ban hammer next
[20:01] yeah, there wasnt any warning of the takedowns
[20:02] Once an ArchiveTeam project also touched some Reddit and some panic burst out:
[20:02] http://www.reddit.com/r/privacy/comments/1emh4r/urgent_delete_any_old_reddit_posts_you_dont_want/
[20:02] plus, reddit itself shouldn't be that big
[20:02] since its entirely hyperlinks and comments
[20:03] plus some custom subreddit styling I guess
[20:03] if one wanted to grab all images linked to the size would balloon insanely
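The 20:00 advice is to save single pages through `archive.org/save/URL`, with the caveat that robots.txt can block it. A minimal sketch of both steps, using only the standard library: build the save link, and check an already-fetched robots.txt body for a given crawler name. The `https://web.archive.org/save/` prefix and the `ia_archiver` user agent are assumptions about how the Wayback Machine behaved, not something stated in the log, and the sketch does not fetch anything itself.

```python
from urllib.robotparser import RobotFileParser

# Assumed Save Page Now prefix; the channel shorthand was archive.org/save/URL.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_url(page_url: str) -> str:
    """Build the link to open to ask the Wayback Machine to capture a page."""
    return SAVE_PREFIX + page_url


def allowed_by_robots(robots_txt: str, page_url: str,
                      agent: str = "ia_archiver") -> bool:
    """Check a robots.txt body (fetched separately) for `agent`.

    `ia_archiver` as the relevant agent is an assumption for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)
```

For example, `allowed_by_robots("User-agent: ia_archiver\nDisallow: /\n", "https://www.reddit.com/r/news/")` returns `False`, the situation the "eeeeexcept if it's robots.txt protected" remark warns about. The robots body here is made up, not Reddit's actual file.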
[20:03] I like the part when a guy writes "it's kinda hard to read it when half of the old threads have been butchered by idiots deleting posts that were contributing to the discussion" and then comes like five [deleted]
[20:03] *** chazchaz_ has joined #archiveteam
[20:03] *** chazchaz_ has quit IRC (Remote host closed the connection)
[20:03] yeah kniffy, that would essentially amount to backing up all of imgur
[20:03] exactly
[20:04] i assume you guys heard of the whole row imgur caused by suddenly not wanting NSFW images
[20:04] duno if they're going and deleting stuff
[20:04] I thought that only applied to comments?
[20:04] and that it was reversed?
[20:04] tbh i've got no idea on either of those
[20:04] i dont use imgur much
[20:08] *** chazchaz_ has joined #archiveteam
[20:15] *** jmc has joined #archiveteam
[20:16] hah
[20:21] *** Start has quit IRC (Disconnected.)
[20:23] SimpBrain: someone reported on Reddit that his export finished. Guess if it was a complete export or an 1 mb 7z with zero screenshots out of 4,000.
[20:23] (xfire)
[20:23] nice
[20:24] *** sb058 has joined #archiveteam
[20:24] the 1mb export
[20:24] I've waited long for an opportunity to post this
[20:24] http://m.cdn.blog.hu/aj/ajemdibi/226166_453437977_big.jpg
[20:25] *** McGEE has joined #archiveteam
[20:25] well done, sir
[20:25] what do they actually 7z for hours then, is a mistery
[20:26] "We run 7z on an 8080"
[20:26] *** n00b897_ has joined #archiveteam
[20:26] lol
[20:29] *** sb058_ has joined #archiveteam
[20:31] *** Start has joined #archiveteam
[20:31] *** sb057 has quit IRC (Ping timeout: 492 seconds)
[20:32] *** sb058__ has joined #archiveteam
[20:32] *** sb058__ is now known as sb057
[20:32] guess I should set up my znc for efnet eh
[20:33] Movement is happening with POMF and HALO.
[20:34] Usage of disk space is up from 6% to 15% but I suspect that's about to get fixed, so we're holding out.
[20:34] yay
[20:35] *** sb057 has quit IRC (Client Quit)
[20:35] *** sb058_ has quit IRC (Ping timeout: 306 seconds)
[20:35] *** sb058 has quit IRC (Read error: Operation timed out)
[20:41] *** db48x has joined #archiveteam
[20:42] *** mistym has quit IRC (Remote host closed the connection)
[20:42] *** sb057 has joined #archiveteam
[20:44] well, I'm obviously no expert on AT's goals, but don't you think reddit might be worth archiving? <-- reddit as a whole is really big, we do archive individual subreddits from time to time using #archivebot
[20:44] there's nothing we can do if it's already deleted though
[20:46] looking at the archivebot dashboard we're currently grabbing /r/IMGXXXX/ /r/thebutton/ /r/Xcom/ /r/internetcollection/ /r/news/
[20:48] if you have nominations that would be good to archive let us know, the normal wayback machine crawls do get quite a bit as well though so it's helpful to check https://web.archive.org/ for completeness first
[20:50] ones we've done in the past: http://archive.fart.website/archivebot/viewer/domain/www.reddit.com
[20:50] *** primus104 has joined #archiveteam
[20:56] *** mistym has joined #archiveteam
[21:05] *** sivoais has quit IRC (Remote host closed the connection)
[21:07] /r/fatpeoplehate2/ might be a worthy target right now
[21:08] considering it's all over /r/all
[21:16] *** sivoais has joined #archiveteam
[21:17] *** Start has quit IRC (Disconnected.)
[21:17] *** DFJustin has quit IRC (Remote host closed the connection)
[21:17] *** DFJustin has joined #archiveteam
[21:17] *** swebb sets mode: +o DFJustin
[21:18] added
[21:20] *** Howl has joined #archiveteam
[21:31] I always assumed our brothers-in-arms at /r/datahoarders were taking care of buziness.
[21:32] 17:31 < BotoX_> dafuq is WARC
[21:32] [it begins]
[21:32] *** SketchCow changes topic to: Archive Team: We're not archive.org | http://archiveteam.org/ | lengthy/off-topic in #archiveteam-bs | < BotoX_> dafuq is WARC
[21:33] haha
[21:37] :D
[21:44] weeaboos..
[21:47] #datahoarders on freenode
[21:47] i dropped out of there a few days ago, hand't said anythong on channel in over 6 months
[21:49] sb057: I think the admins said a few years back on world backup day the total size is like ~1 TB
[21:49] I'd imagine it's grown to ~5
[21:49] but then again it might not be a complete backup, maybe only a "just save content but don't save things like css images" one
[21:49] so real size would likely be bigger
[21:50] Statistically, there's got to be one freak out there
[21:52] here's the post: http://www.redditblog.com/2013/03/3rd-annual-world-backup-day-whats-in.html
[21:52] so a little more over 1 TB
[21:52] but it's compressed as hell
[21:55] as for "would we not have archived FPH just because it's immoral"...I mean I agree that the subreddit needed to go at some point, but we shouldn't be the judges of whether or not something's moral and whether or not to archive it
[21:55] you gotta stay neutral, otherwise bias reeks everywhere
[21:55] "immoral"
[21:55] nice meme
[21:55] What is FPH
[21:56] r/fatpeoplehate
[21:56] Oh, oh. The fat people one.
[21:56] The reddit drama, it's leaking.
[21:56] That's the one that reminds me that if I cut someone off on the highway, they probably post on that board
[21:56] And I feel better
[21:56] Or when you feel bad because someone donated organs and was just a young kid on a motorcycle
[21:56] He was probably on FPH
[21:57] Then yu go "oh, that almost makes up for it"
[21:58] Also, please
[21:58] PLEASE
[21:58] P L E A S E
[21:58] People stop inviting Botox to do anything
[21:58] Just let this little grab project finish and move on
[22:00] *** Sue_ has quit IRC (Ping timeout: 252 seconds)
[22:00] There is one freak in #datahoarder with over 33 TB content, frequents many many IRC networks and channels
[22:00] As far as I remember
[22:00] SketchCow isnt EVERYONE a valued member of the group?
[22:00] No.
[22:01] :)
[22:01] I've been making much noise recently in these channels. I'll take back. Sorry.
[22:02] Good night
[22:02] o/
[22:03] WubTheCap: There is one freak in #datahoarder with 1.4PB of content
[22:04] PB?!
[22:04] a petabyte? thats a lotta hentai
[22:04] gitorious continues apace
[22:05] A buddy of mine had a gigabyte of disk space in his apartment in 1988.
[22:05] He had to haul in ridiculous shit to do it, but he did it!
[22:05] Then got bored with it, moved on.
[22:05] gitorious: 663GB 42:37:13 [4.31MB/s] [====> ] 13% ETA 275:35:55
[22:05] Gave me some of it.
[22:05] my dad sold single gigabytes when he worked for ibm in the 70s
[22:06] would it be worth utilising the reddit api to archive subreddits instead of scraping? (if that isn't what's already being used)
[22:06] probably
[22:10] I think API has a limit of like 1000 posts
[22:10] so say if you hit /r/tf2/new, you'd only get the most recent 1000 posts at most
[22:10] same goes for profiles (some say it's an anti-dox measure here), past 1000 comments/submissions and you're done
[22:11] Public profiles are hardly dox.
[22:11] it's a decision that reddit made, not a thing to argue here
[22:11] hmm, I think you can define a starting point, not sure if that will go past 1000
[22:16] yeah public profiles are hardly dox, that's true WubTheCap
[22:16] (and is also what people said when people were flipping out in /r/privacy over google reader too)
[22:20] *** Start has joined #archiveteam
[22:21] SketchCow: did you remove the directory of trovebox from fos?
[22:22] Making a small update to the scripts so we can finish that project tomorrow, but FOS isn't taking the files
[22:23] SketchCow: at least 1gb of data in 1988 could at least be put on 2 cds
[22:23] I..... assume
[22:23] so there was a way to off load it
[22:23] 18:05 <@SketchCow> A buddy of mine had a gigabyte of disk space in his apartment in 1988.
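The 22:10 to 22:11 exchange describes the reddit listing limit (roughly the most recent 1000 posts) and the idea of defining a starting point. A minimal pagination sketch follows, assuming reddit's public `.json` listing endpoints, a `limit` of at most 100 per page, and the `after` cursor returned in each page's `data` object. The function names are hypothetical, and the `max_items=1000` default simply mirrors the ceiling mentioned in the channel. HTTP is left to a caller-supplied `fetch_json` function so that no particular client library is assumed.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def listing_url(subreddit: str, after: Optional[str] = None, limit: int = 100) -> str:
    """Build a /new listing URL, optionally starting after a given item."""
    params = {"limit": limit}
    if after:
        params["after"] = after
    return f"https://www.reddit.com/r/{subreddit}/new.json?{urlencode(params)}"


def iter_listing(subreddit: str, fetch_json: Callable[[str], dict],
                 max_items: int = 1000) -> Iterator[dict]:
    """Follow the `after` cursor page by page until it runs out or max_items is hit."""
    after = None
    seen = 0
    while seen < max_items:
        data = fetch_json(listing_url(subreddit, after))["data"]
        children = data["children"]
        if not children:
            return
        for child in children:
            yield child["data"]
            seen += 1
            if seen >= max_items:
                return
        after = data.get("after")
        if not after:
            return  # no cursor means the listing is exhausted
```

Passing a custom `after` to `listing_url` is the "starting point" idea from 22:11. Whether a cursor lets a crawl get past the ~1000-item ceiling is exactly what the channel was unsure about, and this sketch does not settle it.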
[22:23] Maybe it was a terabyte and 1990
[22:23] I am very old
[22:23] ok then
[22:24] i vaguely remember terabyte parties being a thing
[22:24] terabyte in 1990 your just screwed then
[22:24] in less it last for a good 10 to 15 years
[22:24] that way it can be at least moved
[22:25] xmc: euphemism
[22:25] oh yeah?
[22:26] I just like it as a euphemism
[22:26] it could be a good one
[22:26] arkiver: I am sure I did.
[22:27] SketchCow: ok, probably 50G-100G more will come from trovebox. Are you able to create the rsync again for trovebox? or maybe yipdw?
[22:28] Ask yipw to
[22:28] I am blasting through things on that machine, cleaning it up
[22:29] Ok, so next is last.fm. Looking into the 18 items that keep failing, after those are finished we have saved the full forum in all languages from last.fm.
[22:29] That was what we wanted to grab from last.fm right? (user content)
[22:30] *** Sue_ has joined #archiveteam
[22:32] Yes
[22:32] Although I think we grabbed a lot, and it was an insider who thinks they're going to fuck it up bad
[22:35] Ok, so only 18 (problematic) items left for lastfm and then that's done. Baraza is done. And I'll sort all the discovered sites of blogger in the coming days, so we can start on that too
[22:35] well there is more to last.fm than forums, for example user profiles and comments on songs and artists
[22:35] Two big project coming up in the coming months: SourceForge (starting this month), Google Code (starting end of august)
[22:36] *** Howl has quit IRC (Quit: afk now)
[22:36] I think events have comments too
[22:37] SketchCow: I also think it'd be a good idea to start a torrentwebsites project.
[22:38] Since torrentsites most of the time go offline without notice and contain a lot of metadata, comments, etc. I think we should start backing them up
[22:39] We should then keep the .torrent files in an other pack then the rest of the websites and only add those .torrent files to the wayback machine after they're not working anymore (to prevent IA getting in trouble with them)
[22:40] What do you think of that? Size shouldn't be too big, there's no videos, audios and only some images.
[22:41] DFJustin: I'll have a look at those, thanks
[22:42] Magnet urls might be a problem thought ^^
[22:43] though*
[22:52] *** n00b897_ has quit IRC (Quit: Page closed)
[23:04] *** chfoo has joined #archiveteam
[23:14] -----------------------------------------
[23:15] ARCHIVE.ORG is swapping some internal things
[23:15] As a result, things might act a little weird
[23:15] if you see some weird today, like missing stuff or timeouts
[23:15] now you know why. Not much to do, they're working
[23:15] hard to get it done quickly.
[23:15] -----------------------------------------
[23:37] *** Ymgve has quit IRC ()
[23:38] *** Muad-Dib has quit IRC (Ping timeout: 252 seconds)
[23:46] *** TheLQ has joined #archiveteam
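The torrent-site proposal (22:37 to 22:43) flags magnet URLs as a possible problem, since a magnet link has no `.torrent` file to set aside in a separate pack. One option is to record what the link itself carries alongside the page's other metadata. The helper below is a hypothetical sketch that pulls the BitTorrent info-hash (`xt=urn:btih:...`), display name (`dn`) and trackers (`tr`) out of a magnet URI using only the standard library. It does not contact peers or fetch the torrent's full metadata, so it does not solve the problem raised in the channel, only documents what can be kept.

```python
from urllib.parse import parse_qs, urlsplit


def magnet_info(uri: str) -> dict:
    """Extract info-hashes, display name and trackers from a magnet URI."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError(f"not a magnet URI: {uri!r}")
    # parse_qs also decodes percent-escapes and '+' in the parameters.
    params = parse_qs(parts.query)
    hashes = [xt.split(":", 2)[2]
              for xt in params.get("xt", [])
              if xt.lower().startswith("urn:btih:")]
    return {
        "infohashes": hashes,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }
```

For a made-up link such as `magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567&dn=Foo+Bar&tr=udp%3A%2F%2Ftracker.example%3A80`, this yields the 40-character hash, the name `Foo Bar`, and the tracker `udp://tracker.example:80`.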