[00:16] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[00:18] *** signius has quit IRC (Ping timeout: 265 seconds)
[00:30] *** signius has joined #archiveteam
[00:32] *** primus104 has quit IRC (Leaving.)
[00:40] *** mistym has quit IRC (Remote host closed the connection)
[00:54] *** mistym has joined #archiveteam
[01:08] *** schbirid2 has joined #archiveteam
[01:10] *** schbirid has quit IRC (Read error: Operation timed out)
[01:11] *** boozehoun has quit IRC (Ping timeout: 258 seconds)
[01:19] *** boozehoun has joined #archiveteam
[01:42] *** Ymgve has quit IRC ()
[01:54] *** cadbury_ has joined #archiveteam
[01:58] *** brayden has joined #archiveteam
[02:03] *** aNthraXx has joined #archiveteam
[02:47] *** rejon has joined #archiveteam
[02:48] *** lytv has quit IRC (Ping timeout: 252 seconds)
[02:52] *** lytv has joined #archiveteam
[03:58] *** mistym has quit IRC (Remote host closed the connection)
[04:29] *** mistym has joined #archiveteam
[05:03] *** Ctrl-S has quit IRC ( HydraIRC -> http://www.hydrairc.com <- In tests, 0x09 out of 0x0A l33t h4x0rz prefer it :))
[05:10] *** Ctrl-S has joined #archiveteam
[05:48] *** marvinw has quit IRC (Read error: Operation timed out)
[06:13] *** marvinw has joined #archiveteam
[06:15] *** Control-S has joined #archiveteam
[06:19] *** Ctrl-S has quit IRC (Read error: Operation timed out)
[06:19] *** Control-S is now known as Ctrl-S
[06:20] *** primus104 has joined #archiveteam
[06:51] *** mistym has quit IRC (Remote host closed the connection)
[07:11] *** dinomite has quit IRC (Remote host closed the connection)
[07:16] *** dinomite has joined #archiveteam
[07:19] *** atomotic has joined #archiveteam
[07:51] *** mistym has joined #archiveteam
[08:02] wp494: kniffy: that grooveshark.io thing is a scam... jesus, have some sense...
[08:02] it's disgusting how many "reputable" websites jump on it
[08:05] *** mistym has quit IRC (Read error: Operation timed out)
[08:07] *** dinomite has quit IRC (Read error: Operation timed out)
[08:16] *** dinomite has joined #archiveteam
[08:22] *** MMovie has joined #archiveteam
[08:24] *** MMovie1 has quit IRC (Ping timeout: 306 seconds)
[08:29] *** dinomite has quit IRC (Remote host closed the connection)
[08:29] *** dinomite has joined #archiveteam
[08:36] clever idea though. jump on a dead site's name and use it for advertising
[09:06] or gathering user info
[09:06] how many would use the same PW?
[09:07] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[09:39] For archivebot: http://datasets.wikimedia.org/ (only root currently available in wayback machine)
[09:41] *** mistym has joined #archiveteam
[09:55] *** mistym has quit IRC (Read error: Operation timed out)
[10:42] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:46] *** [Beta] has joined #archiveteam
[10:48] *** john1 has quit IRC (Ping timeout: 252 seconds)
[10:49] <[Beta]> was anyone able to grab P.T. before it vanished off playstation store? saw the bot grabbed the konami page for it…
[10:50] *** primus104 has quit IRC (Leaving.)
[11:03] *** john1 has joined #archiveteam
[11:28] pt?
[11:29] *** mistym has joined #archiveteam
[11:32] *** Ymgve has joined #archiveteam
[11:39] *** mistym has quit IRC (Read error: Operation timed out)
[12:08] that silent hill demo
[12:08] Playable Teaser
[12:32] *** atomotic has joined #archiveteam
[12:34] *** quEt has joined #archiveteam
[12:34] *** quEt has quit IRC (Client Quit)
[12:47] *** sankin has joined #archiveteam
[13:16] *** primus104 has joined #archiveteam
[13:51] *** garyrh has quit IRC (http://bnc4free.com/)
[13:58] *** Start has quit IRC (Disconnected.)
[14:19] *** mistym has joined #archiveteam
[14:33] *** mistym has quit IRC (Read error: Operation timed out)
[14:35] *** Start has joined #archiveteam
[14:38] *** mistym has joined #archiveteam
[14:38] *** caber has quit IRC (Read error: Operation timed out)
[14:41] *** mistym has quit IRC (Remote host closed the connection)
[14:41] *** caber has joined #archiveteam
[14:57] *** cirdan_ has joined #archiveteam
[14:59] hey all. have a question about trying to archive a drupal site. I'm using httrack and it goes ok, but by the end i have thousands of files like index.html index398.html games894-html. there should only be a few of them, I'm using a rewrite because it uses page= for page numbers
[14:59] *** Start has quit IRC (Disconnected.)
[15:00] *** goekesmi has quit IRC (Remote host closed the connection)
[15:00] any ideas to stop this? It seems to happen at the end of the scrape
[15:01] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[15:02] *** goekesmi has joined #archiveteam
[15:03] *** Start has joined #archiveteam
[15:06] *** mistym has joined #archiveteam
[15:13] cirdan_: to me it sounds like a pagination that continually has a "next" link that is + even if there isn't more content, do you have an example?
[15:16] *** Start has quit IRC (Disconnected.)
[15:20] I'm trying to snapshot http://macintoshgarden.org
[15:21] I'm trying something different now so I don't have anything downloaded atm
[15:21] it's a drupal 6 site
[15:25] I was thinking maybe it takes so long and the site is set to not cache, that it was re-getting all the indices at the end again
[15:26] the odd thing also if you give it an invalid page link it'll take you to page 1
[15:36] drupal has a lot of dumb things that mess up crawling, you'd probably be better running it through archivebot which has some anti-drupal measures
[15:40] *** nertzy has joined #archiveteam
[15:42] or using wpull directly
[15:44] *** Start has joined #archiveteam
[15:46] :p
[15:49] *** Start has quit IRC (Read error: Connection reset by peer)
[15:49] *** nertzy has quit IRC (This computer has gone to sleep)
[15:50] yeah i'll try wpull
[15:50] any special settings needed?
[15:51] well I don't know how much of the smarts are in wpull as opposed to higher layers
[15:52] *** mistym has quit IRC (Remote host closed the connection)
[16:08] *** primus104 has quit IRC (Leaving.)
[16:11] *** mistym has joined #archiveteam
[16:13] *** c_b has joined #archiveteam
[16:21] *** Start has joined #archiveteam
[16:24] *** garyrh has joined #archiveteam
[16:39] hmm enabling compression seemed to not work. was telling me server misbehaved
[16:39] removing it worked
[16:39] odd because the server has compression on
[16:43] *** Start has quit IRC (Disconnected.)
[16:47] *** signius has quit IRC (Ping timeout: 240 seconds)
[16:48] why does --exclude-directories not work? i have --exclude-directories "/sites/macintoshgarden.org/files/games/" and it wants to download from it
[16:49] i have 5 I don't want, and 5 --exclude-directories
[16:50] you mean --reject-regex ?
[16:51] no
[16:52] i mean --exclude-domains: don’t download paths in LIST
[16:53] err --exclude-directories
[16:58] if you can only have one, the command should fail with multiple on the command line
[16:59] it also doesn't say how multiple entries should be delimited… space, comma, colon?
[17:00] *** signius has joined #archiveteam
[17:06] *** aaaaaaaaa has joined #archiveteam
[17:07] *** mistym has quit IRC (Remote host closed the connection)
[17:12] *** mistym has joined #archiveteam
[17:15] *** SimpBrain has joined #archiveteam
[17:23] *** primus104 has joined #archiveteam
[17:25] *** c_b has quit IRC (Quit: c_b)
[17:44] *** primus104 has quit IRC (Leaving.)
[17:44] *** xmc has quit IRC (Remote host closed the connection)
[17:45] *** xmc has joined #archiveteam
[17:48] *** nertzy has joined #archiveteam
[18:16] *** nertzy has quit IRC (This computer has gone to sleep)
[18:32] *** primus104 has joined #archiveteam
[18:37] *** RichardG_ is now known as RichardG
[18:47] *** habi has joined #archiveteam
[18:49] *** habi has left
[18:58] *** godane has quit IRC (Ping timeout: 265 seconds)
[19:18] *** godane has joined #archiveteam
[19:22] *** cirdan_ has quit IRC (Ping timeout: 240 seconds)
[19:26] *** Ymgve has quit IRC (Ping timeout: 506 seconds)
[19:30] *** Ymgve has joined #archiveteam
[19:51] *** habi has joined #archiveteam
[19:54] *** SN4T14_ has joined #archiveteam
[19:56] *** habi has left
[20:02] *** SN4T14 has quit IRC (Ping timeout: 512 seconds)
[20:02] *** mistym has quit IRC (Remote host closed the connection)
[20:02] *** mistym has joined #archiveteam
[20:08] https://www.reddit.com/r/DataHoarder/comments/3532q9/longterm_retention/ if anybody knows (somebody who knows) how to get data off of those, consider contacting the guy before they're lost
[20:12] seems somewhat funny to have someone on datahoarder with 72TB talking about destroying something like that.
[20:13] somebody should post IABAK on that subreddit
[20:17] those look like 9 track tapes
[20:17] there are people who have equipment
[20:17] (cctech mailing list, etc)
[20:27] Shit, that quantity of tapes you could probably at least find someone willing to take 'em and do the searching on their own.
[20:31] *** mistym has quit IRC (Remote host closed the connection)
[20:44] *** mistym has joined #archiveteam
[20:45] *** BlueMaxim has joined #archiveteam
[20:51] *** sankin has quit IRC (Leaving.)
[21:08] bexitexit
[21:24] balrog DFJustin ersi Lord_Nigh underscor yipdw: spread the @s
[21:24] No!
[21:24] D:
[21:24] *** underscor sets mode: +o xmc
[21:27] *** xmc sets mode: +oooo chfoo SketchCow joepie91 closure
[21:28] *** SimpBrain has quit IRC (Quit: Leaving)
[21:33] *** mistym has quit IRC (Remote host closed the connection)
[21:36] FOS is sort of healed from Halo
[21:36] I'd like it to be 100% free of Halo before we ramp it up again
[21:36] But it's going well in that direction
[21:38] It was previously at, like, 85 Halo 40gb units
[21:38] Now at 3
[21:38] 5tb free on that partition
[21:38] So that bodes well
[21:47] *** mistym has joined #archiveteam
[21:54] *** Ymgve has quit IRC ()
[22:06] SketchCow: Kenshin is holding around 1T of google baraza items
[22:06] when the project is fully done, do you have room on FOS for them?
[22:13] I do
[22:16] *** DFJustin has quit IRC (IMHOSTFU)
[22:16] *** rolf has joined #archiveteam
[22:17] *** DFJustin has joined #archiveteam
[22:17] *** Start has joined #archiveteam
[22:18] ok, they'll be synced over when the project is done
[22:21] *** toad2 has joined #archiveteam
[22:27] *** toad1 has quit IRC (Read error: Operation timed out)
[22:45] *** rolf has quit IRC (Linkinus - http://linkinus.com)
[22:58] SketchCow: halo? as in what exactly? the game?
[23:08] Shhhh
[23:08] It's handled.
[23:08] It's a project that's been going on. It'll come back. I had it going and it flooded our buffer.