#archiveteam 2019-08-03,Sat

Time Nickname Message
00:20 🔗 satoshi has joined #archiveteam
00:37 🔗 hive-mind has joined #archiveteam
00:37 🔗 hive-min1 has quit IRC (Read error: Connection reset by peer)
00:38 🔗 BlueMax has joined #archiveteam
01:56 🔗 sirvy_ has joined #archiveteam
02:16 🔗 killsushi has quit IRC (Quit: Leaving)
02:39 🔗 satoshi has quit IRC (Remote host closed the connection)
02:41 🔗 Raccoon has joined #archiveteam
02:47 🔗 Fusl has quit IRC (Quit: K-Lined)
02:47 🔗 Fusl__ has quit IRC (Quit: K-Lined)
02:48 🔗 Fusl has joined #archiveteam
02:48 🔗 svchfoo3 sets mode: +o Fusl
02:49 🔗 Fusl is now known as Fusl__
02:49 🔗 Fusl_ sets mode: +o Fusl__
02:49 🔗 Fusl has joined #archiveteam
02:49 🔗 Fusl__ sets mode: +o Fusl
02:50 🔗 Fusl_ sets mode: +o Fusl
02:51 🔗 Fusl__ has quit IRC (Client Quit)
02:52 🔗 Fusl__ has joined #archiveteam
02:52 🔗 Fusl_ sets mode: +o Fusl__
02:52 🔗 Fusl sets mode: +o Fusl__
03:41 🔗 m007a83_ has joined #archiveteam
03:44 🔗 qw3rty116 has joined #archiveteam
03:45 🔗 m007a83 has quit IRC (Ping timeout: 252 seconds)
03:50 🔗 qw3rty115 has quit IRC (Ping timeout: 600 seconds)
03:54 🔗 odemgi_ has joined #archiveteam
03:56 🔗 odemg has quit IRC (Read error: Operation timed out)
04:00 🔗 odemgi has quit IRC (Read error: Operation timed out)
04:10 🔗 odemg has joined #archiveteam
04:46 🔗 Flashfloo has quit IRC (The Lounge - https://thelounge.chat)
04:46 🔗 Flashfire has quit IRC (Quit: The Lounge - https://thelounge.chat)
04:46 🔗 kiska has quit IRC (Quit: The Lounge - https://thelounge.chat)
04:46 🔗 Flashfloo has joined #archiveteam
04:46 🔗 kiska has joined #archiveteam
04:46 🔗 Fusl sets mode: +o kiska
04:46 🔗 Fusl__ sets mode: +o kiska
04:46 🔗 Fusl_ sets mode: +o kiska
04:46 🔗 Flashfire has joined #archiveteam
05:21 🔗 cerca has quit IRC (Leaving)
05:37 🔗 dhyan_nat has joined #archiveteam
05:42 🔗 Ivy has quit IRC (Quit: Connection closed for inactivity)
05:50 🔗 m007a83 has joined #archiveteam
05:53 🔗 m007a83_ has quit IRC (Ping timeout: 252 seconds)
07:46 🔗 jut has joined #archiveteam
09:18 🔗 killsushi has joined #archiveteam
09:38 🔗 magus_bgf has joined #archiveteam
09:56 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
09:58 🔗 magus_bgf has quit IRC (Read error: Operation timed out)
10:01 🔗 magus_bgf has joined #archiveteam
10:14 🔗 magus_bgf Hey guys. I'm looking for advice. Need to archive (continuously) a few dozen sites, up to 100-200 thousand pages. Started with wget/bash, but they no longer cut it. Need something that supports incremental crawls, smart error handling/crawl delays/URL parameter handling. Some reports would be nice, but preferably no database. Most importantly, it should be easy to restore a site from the archive, at least in HTML form
10:14 🔗 magus_bgf (and from what I understand, restoring from WARC is not). So, what would be a good tool for this?
10:21 🔗 magus_bgf has quit IRC (Read error: Connection reset by peer)
10:22 🔗 magus_bgf has joined #archiveteam
10:25 🔗 ivan_ it sounds like you have an exciting life of writing web crawler software ahead of you
10:25 🔗 magus_bgf has quit IRC (Remote host closed the connection)
10:28 🔗 magus_bgf has joined #archiveteam
10:29 🔗 Dragnog has quit IRC (Ping timeout: 246 seconds)
10:29 🔗 ivan_ I think Heritrix supports incremental crawls?
10:29 🔗 ivan_ let's take this to #archiveteam-ot
10:29 🔗 magus_bgf is it offtopic here? sorry
10:34 🔗 magus_bgf has left Leaving
11:31 🔗 zhongfu has quit IRC (Quit: cya losers)
11:33 🔗 zhongfu has joined #archiveteam
11:45 🔗 VADemon has quit IRC (Quit: left4dead)
12:10 🔗 godane has joined #archiveteam
12:31 🔗 BlueMax has quit IRC (Quit: Leaving)
14:59 🔗 killsushi has quit IRC (Read error: Connection reset by peer)
15:00 🔗 killsushi has joined #archiveteam
15:29 🔗 BartoCH has quit IRC (Ping timeout: 615 seconds)
15:31 🔗 deetwelve has quit IRC (Ping timeout: 745 seconds)
15:37 🔗 deetwelve has joined #archiveteam
15:46 🔗 killsushi has quit IRC (Quit: Leaving)
16:05 🔗 dhyan_nat has joined #archiveteam
16:42 🔗 godane has quit IRC (Ping timeout: 600 seconds)
16:44 🔗 Selanda has quit IRC (Quit: Lost terminal)
16:46 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
17:27 🔗 satoshi has joined #archiveteam
18:02 🔗 bsmith093 has joined #archiveteam
18:14 🔗 cerca has joined #archiveteam
18:27 🔗 Selanda has joined #archiveteam
18:49 🔗 BartoCH has joined #archiveteam
19:01 🔗 thejsa_ has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
19:02 🔗 thejsa has joined #archiveteam
20:03 🔗 Cameron_D has quit IRC (Read error: Operation timed out)
20:53 🔗 Ivy has joined #archiveteam
21:57 🔗 Cameron_D has joined #archiveteam
22:41 🔗 Pixi has quit IRC (Quit: Pixi)
23:01 🔗 Pixi has joined #archiveteam
