#newsgrabber 2017-10-28, Sat




***kyan has joined #newsgrabber [00:18]
.......... (idle for 46mn)
Kaz: that was a fun one.. put my ESXi host into maintenance mode before rebooting it, so none of the VMs ever came back online
so I couldn't get back into anything because pfSense wasn't online, so Windows thought it was just not connected to anything
[01:04]
........ (idle for 37mn)
***nightowl has quit IRC (Ping timeout: 492 seconds) [01:42]
............................................................................... (idle for 6h32mn)
HCross2: New discovery node in Melbourne, Australia will be online soon™ [08:14]
............. (idle for 1h3mn)
***GK_1WM-SU has joined #newsgrabber
GK-1WM-SU has joined #newsgrabber
GK_1WM-SU has left
GK-1WM-SU has left
[09:17]
........................................ (idle for 3h19mn)
GK-1WM-SU has joined #newsgrabber
GK-1WM-SU has left
[12:37]
................... (idle for 1h31mn)
JensRex: https://bpaste.net/show/5ad86addb74c
It's stuck here.
[14:08]
***JensRex has quit IRC (Remote host closed the connection)
JensRex has joined #newsgrabber
[14:12]
HCross2: arkiver: also, the more discovery servers I add, the fewer URLs we get [14:25]
arkiver: means some servers are not working
we'll fix that though
need to fix some things here anyway
and I have time again :)
[14:33]
HCross2: JAA: was it you who was doing the wiki stuff? [14:35]
JAA: HCross2: Negative. jrwr did that, I believe. [14:37]
HCross2: ahh thanks, you both have similar names :p [14:37]
JAA: True [14:37]
................. (idle for 1h20mn)
HCross2: arkiver: writing some Rwandan services now [15:57]
............ (idle for 58mn)
Kaz: HCross2: share ops [16:55]
***HCross2 sets mode: +o Kaz [16:56]
arkiver: HCross2: nice [16:56]
Kaz: huh
irccloud is being weird, I don't see a part msg for arkiver
wait, you're here
[16:56]
***Kaz sets mode: +o arkiver [16:57]
arkiver: yep, I'm here [16:57]
Kaz: https://s.kurt.gg/171dLKPa.png [16:58]
***Kaz has left
Kaz has joined #newsgrabber
[16:58]
Kaz: much better
was showing HCross2 as the only op on my end..
[16:58]
arkiver: huh
strange
little split I guess
hmm you never left though
bug in your client?
[16:59]
Kaz: I have some Frankenstein setup
irccloud -> znc -> efnet
could really cut znc out of it these days, but nice to keep around
[16:59]
***svchfoo3 sets mode: +o Kaz [17:01]
........................... (idle for 2h11mn)
arkiver: I'm experimenting with a decentralized crawler idea
could see it as a combination of the warrior and ArchiveBot
plan here is that there's no single point of failure anymore
while it's possible to run a 'normal' wget kind of crawl
basically a set of servers that hold parts of the database (each with some overlap to reduce the chance of data loss when a server is down)
a crawler announces itself to one of these servers
gets connected to the other servers, etc.
the crawlers themselves are not directly connected to each other, but communicate through these servers, which hold the databases and manage the URLs
it would especially be useful in the case of large URL lists, or large websites
or wide crawls of the internet
[19:12]
this uses wget-lua, not wpull
at least that's the plan
[19:22]
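
A minimal sketch of the coordination scheme described above, not the actual implementation: several database nodes each hold overlapping shards of the URL queue, and a crawler announces itself to any one of them to claim work and learn about the other nodes. All names here (DatabaseNode, claim_urls, and so on) are hypothetical.

```python
# Hypothetical sketch of the decentralized crawler idea: overlapping URL
# shards across several database nodes, crawlers announce themselves to one
# node and are pointed at the rest. Not the real project's code.

import hashlib
import random


class DatabaseNode:
    """One of the servers holding part of the URL database."""

    def __init__(self, node_id, peers=None):
        self.node_id = node_id
        self.peers = peers or []      # other DatabaseNodes, for redundancy
        self.queue = []               # URLs this node is responsible for
        self.claimed = {}             # url -> crawler_id
        self.known_crawlers = set()

    def accepts(self, url, num_nodes, overlap=2):
        # Each URL maps to `overlap` consecutive nodes, so losing one node
        # does not lose its part of the queue.
        h = int(hashlib.sha1(url.encode()).hexdigest(), 16)
        primary = h % num_nodes
        return self.node_id in {(primary + i) % num_nodes for i in range(overlap)}

    def add_url(self, url, num_nodes):
        if self.accepts(url, num_nodes) and url not in self.queue:
            self.queue.append(url)

    def announce(self, crawler_id):
        # A crawler announces itself; the node replies with its peers so the
        # crawler can also talk to the other database servers.
        self.known_crawlers.add(crawler_id)
        return self.peers

    def claim_urls(self, crawler_id, count=2):
        urls = [u for u in self.queue if u not in self.claimed][:count]
        for u in urls:
            self.claimed[u] = crawler_id
        return urls


if __name__ == "__main__":
    nodes = [DatabaseNode(i) for i in range(3)]
    for n in nodes:
        n.peers = [p for p in nodes if p is not n]

    urls = ["http://example.com/%d" % i for i in range(10)]
    for n in nodes:
        for u in urls:
            n.add_url(u, num_nodes=len(nodes))

    # A crawler picks a random node, announces itself, and claims some work.
    entry = random.choice(nodes)
    peers = entry.announce("crawler-1")
    work = entry.claim_urls("crawler-1")
    print("entry node:", entry.node_id, "peers:", [p.node_id for p in peers])
    print("claimed:", work)
```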
....... (idle for 30mn)
JAA: Why wget-lua?
wpull is way more flexible. Also, wget-lua hasn't been updated in years, and I believe there have been some security issues in wget (which are probably unfixed in wget-lua).
[19:52]
arkiver: wpull has 'strange' issues every now and then
for example with FTP, but also other crawls
[19:57]
JAA: This is probably the security issue I was thinking about: https://lists.debian.org/debian-security-announce/2014/msg00250.html
True, but it would be a good idea to fix those anyway.
[19:58]
arkiver: yes [19:58]
JAA: The FTP issues should actually be fairly simple to fix.
I'm more worried about the random hangs.
And the "too many open files" issue.
[19:59]
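
One common mitigation for a "too many open files" error, independent of fixing whatever descriptor leak exists in the crawler itself, is to raise the process's soft file-descriptor limit toward the hard limit before starting a crawl. A minimal sketch, assuming a POSIX system:

```python
# Raise the soft RLIMIT_NOFILE toward the hard limit so a long-running crawl
# hits "too many open files" later. This only delays the symptom; the real
# fix is closing leaked descriptors in the crawler.

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("current limits: soft=%d hard=%d" % (soft, hard))

# Raising the soft limit up to the hard limit needs no extra privileges.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print("raised soft limit to", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```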
arkiver: for now I'm using wget-lua in the test
but I can add wpull later on too
or both, and make it optional which one to run
well, optional for whoever starts a crawl to choose between the two
[19:59]
JAA: I see. [20:00]
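
A rough illustration of the "make it optional which one to run" idea: whoever starts a crawl picks a backend, and the matching command line is built from it. The flags are the commonly used wget-lua and wpull options; the function itself and the binary name `wget-lua` are assumptions, not part of any existing pipeline.

```python
# Hypothetical backend selector: build a wget-lua or wpull command line
# depending on what the person starting the crawl chose.

import subprocess


def build_command(backend, url, warc_name, script=None):
    if backend == "wget-lua":
        cmd = ["wget-lua", "--recursive", "--warc-file=" + warc_name]
        if script:
            cmd.append("--lua-script=" + script)      # wget-lua hook script
    elif backend == "wpull":
        cmd = ["wpull", "--recursive", "--warc-file", warc_name]
        if script:
            cmd.append("--plugin-script=" + script)   # wpull plugin script
    else:
        raise ValueError("unknown backend: %s" % backend)
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    cmd = build_command("wpull", "http://example.com/", "example")
    print(" ".join(cmd))
    # subprocess.run(cmd)  # uncomment to actually start the crawl
```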
