#archiveteam 2015-05-06,Wed

Time Nickname Message
00:16 nertzy has quit IRC (Quit: This computer has gone to sleep)
00:18 signius has quit IRC (Ping timeout: 265 seconds)
00:30 signius has joined #archiveteam
00:32 primus104 has quit IRC (Leaving.)
00:40 mistym has quit IRC (Remote host closed the connection)
00:54 mistym has joined #archiveteam
01:08 schbirid2 has joined #archiveteam
01:10 schbirid has quit IRC (Read error: Operation timed out)
01:11 boozehoun has quit IRC (Ping timeout: 258 seconds)
01:19 boozehoun has joined #archiveteam
01:42 Ymgve has quit IRC ()
01:54 cadbury_ has joined #archiveteam
01:58 brayden has joined #archiveteam
02:03 aNthraXx has joined #archiveteam
02:47 rejon has joined #archiveteam
02:48 lytv has quit IRC (Ping timeout: 252 seconds)
02:52 lytv has joined #archiveteam
03:58 mistym has quit IRC (Remote host closed the connection)
04:29 mistym has joined #archiveteam
05:03 Ctrl-S has quit IRC ( HydraIRC -> http://www.hydrairc.com <- In tests, 0x09 out of 0x0A l33t h4x0rz prefer it :))
05:10 Ctrl-S has joined #archiveteam
05:48 marvinw has quit IRC (Read error: Operation timed out)
06:13 marvinw has joined #archiveteam
06:15 Control-S has joined #archiveteam
06:19 Ctrl-S has quit IRC (Read error: Operation timed out)
06:19 Control-S is now known as Ctrl-S
06:20 primus104 has joined #archiveteam
06:51 mistym has quit IRC (Remote host closed the connection)
07:11 dinomite has quit IRC (Remote host closed the connection)
07:16 dinomite has joined #archiveteam
07:19 atomotic has joined #archiveteam
07:51 mistym has joined #archiveteam
08:02 schbirid2 wp494: kniffy: that grooveshark.io thing is a scam... jesus, have some sense...
08:02 schbirid2 it's disgusting how many "reputable" websites jump on it
08:05 mistym has quit IRC (Read error: Operation timed out)
08:07 dinomite has quit IRC (Read error: Operation timed out)
08:16 dinomite has joined #archiveteam
08:22 MMovie has joined #archiveteam
08:24 MMovie1 has quit IRC (Ping timeout: 306 seconds)
08:29 dinomite has quit IRC (Remote host closed the connection)
08:29 dinomite has joined #archiveteam
08:36 BlueMaxim clever idea though. jump on a dead site's name and use it for advertising
09:06 Ctrl-S or gathering user info
09:06 Ctrl-S how many would use the same PW?
09:07 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
09:39 Nemo_bis For archivebot: http://datasets.wikimedia.org/ (only root currently available in wayback machine)
09:41 mistym has joined #archiveteam
09:55 mistym has quit IRC (Read error: Operation timed out)
10:42 BlueMaxim has quit IRC (Quit: Leaving)
10:46 [Beta] has joined #archiveteam
10:48 john1 has quit IRC (Ping timeout: 252 seconds)
10:49 [Beta] was anyone able to grab P.T. before it vanished off playstation store? saw the bot grabbed the konami page for it…
10:50 primus104 has quit IRC (Leaving.)
11:03 john1 has joined #archiveteam
11:28 midas pt?
11:29 mistym has joined #archiveteam
11:32 Ymgve has joined #archiveteam
11:39 mistym has quit IRC (Read error: Operation timed out)
12:08 Rotab that silent hill demo
12:08 Rotab Playable Teaser
12:32 atomotic has joined #archiveteam
12:34 quEt has joined #archiveteam
12:34 quEt has quit IRC (Client Quit)
12:47 sankin has joined #archiveteam
13:16 primus104 has joined #archiveteam
13:51 garyrh has quit IRC (http://bnc4free.com/)
13:58 Start has quit IRC (Disconnected.)
14:19 mistym has joined #archiveteam
14:33 mistym has quit IRC (Read error: Operation timed out)
14:35 Start has joined #archiveteam
14:38 mistym has joined #archiveteam
14:38 caber has quit IRC (Read error: Operation timed out)
14:41 mistym has quit IRC (Remote host closed the connection)
14:41 caber has joined #archiveteam
14:57 cirdan_ has joined #archiveteam
14:59 cirdan_ hey all. have a question about trying to archive a drupal site. I'm using httrack and it goes ok, but by the end i have thousands of files like index.html index398.html games894.html. there should only be a few of them, I'm using a rewrite because it uses page= for page numbers
14:59 Start has quit IRC (Disconnected.)
15:00 goekesmi has quit IRC (Remote host closed the connection)
15:00 cirdan_ any ideas to stop this? It seems to happen at the end of the scrape
15:01 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
15:02 goekesmi has joined #archiveteam
15:03 Start has joined #archiveteam
15:06 mistym has joined #archiveteam
15:13 achip cirdan_: to me it sounds like a pagination that continually has a "next" link that is <this page number>+<page offset> even if there isn't more content, do you have an example?
15:16 Start has quit IRC (Disconnected.)
15:20 cirdan_ I'm trying to snapshot http://macintoshgarden.org
15:21 cirdan_ I'm trying something different now so I don't have anything downloaded atm
15:21 cirdan_ it's a drupal 6 site
15:25 cirdan_ I was thinking maybe it takes so long and the site is set to not cache, that it was re-getting all the indices at the end again
15:26 cirdan_ the odd thing also if you give it an invalid page link it'll take you to page 1
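The fallback cirdan_ describes (an invalid page link serving page 1 again) and the endless "next" link achip suspects would both show up as page bodies that repeat. A minimal sketch of checking for that by hashing each page body follows. The names first_repeated_page and fake_fetch are hypothetical, and fake_fetch is only a stand-in for real HTTP requests. It also assumes repeated pages are byte-identical, which a live Drupal site may not guarantee.

```python
import hashlib


def first_repeated_page(fetch_page, max_pages=1000):
    """Return the first page number whose body repeats an earlier page, or None.

    fetch_page is any callable mapping a page number to the page body as bytes
    (hypothetical interface; a real crawl would issue an HTTP request here).
    """
    seen = {}  # body digest -> first page number that produced it
    for page in range(max_pages):
        digest = hashlib.sha256(fetch_page(page)).hexdigest()
        if digest in seen:
            return page
        seen[digest] = page
    return None


def fake_fetch(page):
    """Stand-in site: three real pages; out-of-range numbers serve the first page."""
    real = [b"page zero", b"page one", b"page two"]
    return real[page] if page < len(real) else real[0]


print(first_repeated_page(fake_fetch))  # 3
```

A crawler could stop following page= links once such a repeat is found, rather than saving each one as a new numbered file.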
15:36 DFJustin drupal has a lot of dumb things that mess up crawling, you'd probably be better running it through archivebot which has some anti-drupal measures
15:40 nertzy has joined #archiveteam
15:42 balrog or using wpull directly
15:44 Start has joined #archiveteam
15:46 cirdan_ :p
15:49 Start has quit IRC (Read error: Connection reset by peer)
15:49 nertzy has quit IRC (This computer has gone to sleep)
15:50 cirdan_ yeah i'll try wpull
15:50 cirdan_ any special settings needed?
15:51 DFJustin well I don't know how much of the smarts are in wpull as opposed to higher layers
15:52 mistym has quit IRC (Remote host closed the connection)
16:08 primus104 has quit IRC (Leaving.)
16:11 mistym has joined #archiveteam
16:13 c_b has joined #archiveteam
16:21 Start has joined #archiveteam
16:24 garyrh has joined #archiveteam
16:39 cirdan_ hmm enabling compression seemed to not work. was telling me server misbehaved
16:39 cirdan_ removing it worked
16:39 cirdan_ odd because the server has compression on
16:43 Start has quit IRC (Disconnected.)
16:47 signius has quit IRC (Ping timeout: 240 seconds)
16:48 cirdan_ why does --exclude-directories not work? i have --exclude-directories "/sites/macintoshgarden.org/files/games/" and it wants to download from it
16:49 cirdan_ i have 5 I don't want, and 5 --exclude-directories
16:50 balrog you mean --reject-regex ?
16:51 cirdan_ no
16:52 cirdan_ i mean --exclude-domains: don't download paths in LIST
16:53 cirdan_ err --exclude-directories
16:58 cirdan_ if you can only have one, the command should fail with multiple on the command line
16:59 cirdan_ it also doesn't say how multiple entries should be delimited… space, comma, colon?
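On the delimiter question: GNU Wget documents its -X/--exclude-directories option as taking a comma-separated list. Whether wpull parses the option the same way is an assumption, not something confirmed in this log. The regex route balrog mentions avoids the question by folding every unwanted directory into one pattern. A minimal sketch of building and checking such a pattern follows. The helper build_reject_regex is hypothetical, only the first directory comes from the log, and the second is a made-up placeholder.

```python
import re


def build_reject_regex(directories):
    """Compile one pattern matching any URL whose path starts with a listed directory."""
    # Escape each directory so dots in names like "macintoshgarden.org" match literally.
    alternatives = "|".join(re.escape(d.rstrip("/")) for d in directories)
    # Scheme and host, then one of the directories, then "/" or end of URL.
    return re.compile("^https?://[^/]+(?:" + alternatives + ")(?:/|$)")


unwanted = [
    "/sites/macintoshgarden.org/files/games/",  # from the log
    "/sites/macintoshgarden.org/files/apps/",   # hypothetical second directory
]
reject = build_reject_regex(unwanted)

urls = [
    "http://macintoshgarden.org/sites/macintoshgarden.org/files/games/foo.sit",
    "http://macintoshgarden.org/games/foo",
]
for url in urls:
    print(url, "->", "reject" if reject.match(url) else "keep")
```

The resulting reject.pattern string is the kind of value a --reject-regex option would take. How wpull anchors or searches with that pattern is likewise an assumption here, so it is worth testing against a handful of URLs before a long crawl.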
17:00 signius has joined #archiveteam
17:06 aaaaaaaaa has joined #archiveteam
17:07 mistym has quit IRC (Remote host closed the connection)
17:12 mistym has joined #archiveteam
17:15 SimpBrain has joined #archiveteam
17:23 primus104 has joined #archiveteam
17:25 c_b has quit IRC (Quit: c_b)
17:44 primus104 has quit IRC (Leaving.)
17:44 xmc has quit IRC (Remote host closed the connection)
17:45 xmc has joined #archiveteam
17:48 nertzy has joined #archiveteam
18:16 nertzy has quit IRC (This computer has gone to sleep)
18:32 primus104 has joined #archiveteam
18:37 RichardG_ is now known as RichardG
18:47 habi has joined #archiveteam
18:49 habi has left
18:58 godane has quit IRC (Ping timeout: 265 seconds)
19:18 godane has joined #archiveteam
19:22 cirdan_ has quit IRC (Ping timeout: 240 seconds)
19:26 Ymgve has quit IRC (Ping timeout: 506 seconds)
19:30 Ymgve has joined #archiveteam
19:51 habi has joined #archiveteam
19:54 SN4T14_ has joined #archiveteam
19:56 habi has left
20:02 SN4T14 has quit IRC (Ping timeout: 512 seconds)
20:02 mistym has quit IRC (Remote host closed the connection)
20:02 mistym has joined #archiveteam
20:08 Deewiant https://www.reddit.com/r/DataHoarder/comments/3532q9/longterm_retention/ if anybody knows (somebody who knows) how to get data off of those, consider contacting the guy before they're lost
20:12 aaaaaaaaa seems somewhat funny to have someone on datahoarder with 72TB talking about destroying something like that.
20:13 Sanqui somebody should post IABAK on that subreddit
20:17 balrog those look like 9 track tapes
20:17 balrog there are people who have equipment
20:17 balrog (cctech mailing list, etc)
20:27 pikhq Shit, that quantity of tapes you could probably at least find someone willing to take 'em and do the searching on their own.
20:31 mistym has quit IRC (Remote host closed the connection)
20:44 mistym has joined #archiveteam
20:45 BlueMaxim has joined #archiveteam
20:51 sankin has quit IRC (Leaving.)
21:08 goekesmi bexitexit
21:24 xmc balrog DFJustin ersi Lord_Nigh underscor yipdw: spread the @s
21:24 ersi No!
21:24 xmc D:
21:24 underscor sets mode: +o xmc
21:27 xmc sets mode: +oooo chfoo SketchCow joepie91 closure
21:28 SimpBrain has quit IRC (Quit: Leaving)
21:33 mistym has quit IRC (Remote host closed the connection)
21:36 SketchCow FOS is sort of healed from Halo
21:36 SketchCow I'd like it to be 100% free of Halo before we ramp it up again
21:36 SketchCow But it's going well in that direction
21:38 SketchCow It was previously at, like, 85 Halo 40gb units
21:38 SketchCow Now at 3
21:38 SketchCow 5tb free on that partition
21:38 SketchCow So that bodes well
21:47 mistym has joined #archiveteam
21:54 Ymgve has quit IRC ()
22:06 arkiver SketchCow: Kenshin is holding around 1T of google baraza items
22:06 arkiver when the project is fully done, do you have room on FOS for them?
22:13 SketchCow I do
22:16 DFJustin has quit IRC (IMHOSTFU)
22:16 rolf has joined #archiveteam
22:17 DFJustin has joined #archiveteam
22:17 Start has joined #archiveteam
22:18 arkiver ok, they'll be synced over when the project is done
22:21 toad2 has joined #archiveteam
22:27 toad1 has quit IRC (Read error: Operation timed out)
22:45 rolf has quit IRC (Linkinus - http://linkinus.com)
22:58 Lord_Nigh SketchCow: halo? as in what exactly? the game?
23:08 SketchCow Shhhh
23:08 SketchCow It's handled.
23:08 SketchCow It's a project that's been going on. It'll come back. I had it going and it flooded our buffer.
