#newsgrabber 2017-07-22,Sat




Who What When
***Aranje has joined #newsgrabber [02:17]
........................................... (idle for 3h34mn)
Aranje has quit IRC (Ping timeout: 245 seconds) [05:51]
............................................................................................ (idle for 7h38mn)
HCross has quit IRC (Read error: Connection reset by peer)
HCross has joined #newsgrabber
[13:29]
............................ (idle for 2h17mn)
trvzshould my E3s with 40 threads be at 26+ load?
and 400GB cached
[15:46]
arkiveris that a warrior for this project? [15:55]
trvzyes, the usual from github
427G /home/archiveteam
and
358G /home/archiveteam
[15:57]
it's fine, let's see if they finish before the 2TB drives run full [16:02]
HCross2Hmm. arkiver I don't think all warriors are uploading sometimes [16:10]
................. (idle for 1h22mn)
arkiverHCross2: that's strange
trvz: did you edit the scripts in any way?
[17:32]
trvzno [17:36]
arkiverok, I'm looking into it
uploading now works for the screencapture bot :)
jrwr: have you had time yet for https://www.mediawiki.org/wiki/Extension:ReplaceSet ?
also HCross HCross2 ^
our first automatically uploaded screencapture is here: https://wiki.newsbuddy.net/File:Http_www_sercano_com.png
[17:44]
HCross2LOL. Top half is missing flashplayer [17:45]
arkiveryeah... I'm looking into flash support for cutycapt
though that'd also mean running the script is not secure
do we want that?
[17:46]
HCross2hm - could we run it in an isolated environment?
I'm going to look into moving the wiki images onto a CDN. Things may be dodgy for a second or so
[17:47]
arkiverI think so
nice
[17:47]
............ (idle for 58mn)
jrwrarkiver: Installed
HCross2: Meh
just need diskspace
no need for a CDN
/dev/sda1 9.8G 3.7G 5.6G 40% /
hrm
Might want to attach another disk :)
[18:46]
HCross2jrwr: arkiver please save what you are doing - disk space is coming
I need to shut the VM down
[18:55]
arkiverok
just uploaded a bit, but won't upload more
we're having a problem with the size of some images https://wiki.newsbuddy.net/File:Www_iwacu_burundi_org.png
[18:55]
jrwrI see you are running a backup client HCross2
Nice
[18:56]
HCross2jrwr: or trying to
I just shut the VM down - it'll be back in a moment
[18:56]
arkiverthanks!
Can one of you please also set this? https://www.mediawiki.org/wiki/Manual:Errors_and_symptoms#Error_creating_thumbnail:_File_with_dimensions_greater_than_12.5_MP
It should fix the thumbnail creation problem we have with some images
[18:56]
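The thumbnail error in the linked manual section is about pixel area (width times height), not file size. A minimal sketch of that check in Python, assuming the 12.5 MP figure from the error message is the active limit; the function name and the sample dimensions are illustrative and not taken from the wiki's configuration.

```python
# Assumed limit: 12.5 megapixels, the figure named in the MediaWiki error message.
MAX_IMAGE_AREA = 12_500_000


def exceeds_thumbnail_limit(width: int, height: int,
                            max_area: int = MAX_IMAGE_AREA) -> bool:
    """Return True if the image's pixel area is above the thumbnailing limit."""
    return width * height > max_area


if __name__ == "__main__":
    # Hypothetical capture sizes: a single screen versus a very tall full-page capture.
    for size in [(1920, 1080), (1920, 12000)]:
        print(size, exceeds_thumbnail_limit(*size))
```

A full-page capture at 1920 px wide crosses this limit once the page is a little over 6,500 px tall, which would explain why only some screencaptures were affected.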
HCross2disk upgrade is happening atm [19:00]
arkiverHCross2 :D
how much will we have?
[19:01]
HCross2check now :) /dev/sda1 20G 3.8G 15G 20% / [19:01]
jrwrupdated arkiver for thumbnails [19:02]
arkiveryes :) it's working now
screencaptures are now automatically added, working on the example log now
[19:02]
jrwrCool
Wow, it's taking forever to convert these big ass images
[19:07]
arkiverI think they're all converted already...
the site is just down at the moment
[19:07]
HCross2load is at 2 atm [19:08]
arkiverare you still working on it HCross2? [19:08]
HCross2I'm SSH'ed in and that's it
site loads for me
[19:08]
***blitzed has quit IRC (Remote host closed the connection) [19:09]
arkiveryep, working fine here [19:09]
jrwrit ran out of RAM [19:09]
arkiverwas probably loading stuff [19:09]
jrwrImageMagick convert was eating 2GB of RAM
well, trying to
bash: xmalloc: cannot allocate 4112 bytes (180224 bytes allocated)
[19:09]
arkiverfrom the website it looks like everything went fine... [19:10]
jrwrya
I added a 1GB swapfile
[19:10]
arkiverI'm removing the status field, since we have our own way of checking new pages now [19:12]
HCross2jrwr: at the moment it's using UrBackup, but I want to see if I can get it onto Asigra at some point [19:12]
jrwrawesome [19:13]
HCross2arkiver: is there a way we can get it to pre screenshot all the sites we have in Git? [19:19]
arkiveryes
only problem is that we have to give all the sites names by hand...
[19:19]
HCross2ah right [19:19]
arkiverand check if the front page is really first in the list of URLs [19:19]
jrwrhttps://www.youtube.com/watch?v=atuFSv2bLa8 [19:23]
arkiverjust created https://wiki.newsbuddy.net/El_Heraldo screencapture should be there in a bit [19:23]
jrwrdo you like my logo arkiver :) [19:24]
HCross2So.. if I magically "vanish" you know why now. https://usercontent.irccloud-cdn.com/file/6KjHwxCx/image.png
I've somehow tripped their WAF
[19:24]
jrwrHA [19:25]
arkiverOH GOD
RIP HCross
[19:25]
HCross2By just pressing the back button on my browser [19:25]
jrwrhahhaa [19:25]
arkiverit's a fun logo jrwr :P [19:25]
HCross2I pressed back, it asked if I was sure as the site is a JS mess... I pressed yes and up the message came
The site now won't load
[19:25]
jrwrchange your useragent
and see if it loads
and clear cookies
[19:26]
HCross2flat out dead now [19:26]
jrwrOk archivebot
Claim your tits
[19:26]
HCross2I've stopped it
my fault
[19:27]
trvzjust use a dedi for it? [19:30]
jrwrWe should have in the useragent (A Distributed Preservation of Service Attack)
for newsbuddy
HCross2: I'm just going to do monthlies to IA (XML+Images)
[19:31]
HCross2ok [19:33]
arkiverwe can do dumps with the wikiteam tools
all went fine here :)
https://wiki.newsbuddy.net/El_Heraldo
https://wiki.newsbuddy.net/La_Prensa
[19:40]
jrwrYa
Since we have backend stuff, it's much easier
[19:41]
.... (idle for 16mn)
arkiverwell, feel free to try and add some :) screencaptures work
working on example log now
is it possible to create a page like El_Heraldo/Example_discovered_URLs, and embed it with {{:El_Heraldo/Example_discovered_URLs}} in El_Heraldo, but then have the embed in El_Heraldo show for example 20 lines, and when you go to El_Heraldo/Example_discovered_URLs you see all the URLs?
Should we use a <noinclude> for that? like with https://wiki.newsbuddy.net/Form:Services and https://wiki.newsbuddy.net/Template:Services ?
and have 20 lines outside noinclude and the rest inside noinclude, which will only be visible on the page itself?
[19:57]
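The idea asked about here (first 20 lines transcluded, the rest only visible on the subpage) can be sketched as a small generator for the subpage's wikitext. This is a sketch under assumptions: the function name, the bullet-list layout and the example URLs are hypothetical, and it relies only on the standard MediaWiki behaviour that content inside <noinclude> is shown on the page itself but left out when the page is transcluded.

```python
def build_subpage_wikitext(urls, visible=20):
    """Build wikitext for a subpage such as El_Heraldo/Example_discovered_URLs.

    The first `visible` URLs sit outside <noinclude>, so they appear both on the
    subpage and wherever it is transcluded. The remaining URLs sit inside
    <noinclude>, so they appear only when the subpage is viewed directly.
    """
    head = [f"* {url}" for url in urls[:visible]]
    tail = [f"* {url}" for url in urls[visible:]]
    text = "\n".join(head)
    if tail:
        # The opening tag follows the last visible item directly so that the
        # list stays one continuous list when the subpage is viewed directly.
        text += "<noinclude>\n" + "\n".join(tail) + "</noinclude>"
    return text


if __name__ == "__main__":
    # Hypothetical URLs, with visible=2 to keep the demo output short.
    demo = [f"http://example.com/article/{i}" for i in range(1, 5)]
    print(build_subpage_wikitext(demo, visible=2))
```

The parent page would then embed the result with {{:El_Heraldo/Example_discovered_URLs}} as described in the question.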
HCross2https://wiki.newsbuddy.net/BBC_News :) [20:04]
arkiver:D
very nice, looks like it's all working well
[20:05]
HCross2arkiver: https://wiki.newsbuddy.net/images/4/47/Www_bbc_co_uk_news.png hm - see how all the smaller images under "Full Story" do not load [20:06]
arkiverhmm, interesting [20:07]
HCross2and also, it seems to have slightly cut the top red bar [20:07]
jrwrok
all setup
https://wiki.newsbuddy.net/dumps/
[20:07]
HCross2https://usercontent.irccloud-cdn.com/file/7W1h55ON/image.png
is what it should look like
[20:07]
jrwrand it uploads to https://archive.org/details/NewsBuddyWiki-Dumps [20:07]
HCross2jrwr: ideally we'd want an item for each month, and then eventually a collection
[20:08]
arkiverHCross2: looking into it, I think the problem here is images that only load when they are scrolled by
will see if that can be fixed
[20:08]
HCross2or even a subcollection under the AT Wiki ones
arkiver: does your grabber not do phantomjs?
[20:08]
arkivernot sure
it does some kind of js
[20:09]
jrwrHCross2: it's under /root/backup.sh if you want to do that, I'm not great with the IA tool [20:12]
HCross2jrwr: thanks - we'll need to get some setup done first
arkiver: who is the best person to speak to about an IA subcollection on the AT collection?
[20:12]
arkiverSketchCow [20:13]
HCross2Thanks - I'll fling over an email now [20:13]
jrwrit's set to run every 15 days for good measure
and it's doing it under my account of course
and my account does have the wikiteam collection permissions :)
[20:18]
HCross2oh nice, pop it in there for now please [20:18]
jrwrI can't change a collection once it's made [20:22]
arkiverHCross2: can't get it to work yet...
the images on bbc
might have to use a totally different program than cutycapt
[20:22]
HCross2it's not the end of the world if it doesn't work [20:23]
arkiverfor now I'm sticking with cutycapt though
yeah
[20:23]
jrwrarkiver: headless chrome? [20:23]
HCross2arkiver: hm - are you using a Dutch IP? [20:23]
arkiveryes
^HCross2
[20:23]
HCross2it may be doing different things as the Wiki is a UK IP [20:23]
arkiverjrwr: Qt WebKit
HCross: I'm not sure what you mean exactly...
[20:24]
HCross2The BBC site changes content depending on where you view it from, the images may only work sometimes [20:24]
arkiverah yeah
HCross2: I edited the quotation marks out https://wiki.newsbuddy.net/index.php?title=BBC_News&type=revision&diff=293&oldid=291
HCross2: not sure how much we can do about that
[20:25]
HCross2thanks - I may have copied it out of Git :p [20:25]
arkiverI can imagine other websites look different again in other countries
but feel free to upload a new version of the image
this is the command
[20:25]
jrwrWTF
Well, that's unexpected
I was on my IA user profile, and went to change windows and hit Delete Account by accident (there is no confirm!)
bwhahha
[20:26]
HCross2OOOPS [20:27]
jrwrOh well
I was 2px off
[20:27]
HCross2jrwr: email info@ and hope the IA have an archive of your account [20:27]
jrwrmeh, didn't have anything good on it [20:27]
arkiverHCross2: install cutycapt and run
cutycapt --url=http://www.bbc.co.uk/news --out=www_bbc_co_uk_news.png --min-width=1920 --min-height=1080 --delay=10000
then upload new version of the image
[20:27]
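The cutycapt command above can be wrapped in a short Python script so the same flags are reused for other sites. This is a sketch only: the filename rule is a guess inferred from the one example in the log (http://www.bbc.co.uk/news becoming www_bbc_co_uk_news.png), and the function names are hypothetical. The flags themselves are the ones arkiver gives.

```python
import re
import subprocess
from urllib.parse import urlsplit


def capture_filename(url):
    """Derive an output name from host and path (assumed rule, see note above)."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_")
    return name + ".png"


def cutycapt_command(url, delay_ms=10000):
    """Build the argument list for cutycapt with the flags quoted in the log."""
    return [
        "cutycapt",
        f"--url={url}",
        f"--out={capture_filename(url)}",
        "--min-width=1920",
        "--min-height=1080",
        f"--delay={delay_ms}",
    ]


def capture(url):
    """Run cutycapt; requires cutycapt to be installed and on PATH."""
    subprocess.run(cutycapt_command(url), check=True)


if __name__ == "__main__":
    # Only prints the command; call capture(url) to actually take the screenshot.
    print(" ".join(cutycapt_command("http://www.bbc.co.uk/news")))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.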
HCross2arkiver: thanks - I'm not too worried as the content is there [20:28]
arkiveryep [20:28]
jrwrdid you see the new warrior I made HCross2 [20:34]
HCross2I havent [20:34]
jrwrhttps://ia601507.us.archive.org/31/items/AT-Warrior100G/Warrior-100G.ova
it uses Alpine Linux at its core (50MB) then Docker and installs the docker version of the warrior
[20:35]
HCross2hm - I've got mainly HyperV and Proxmox here [20:35]
jrwryou can Extract the VMDK out
its a Zip file
[20:35]
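A minimal Python sketch of pulling the VMDK out of the OVA. One assumption differs from what is said above: OVA files are normally tar archives rather than Zip files, so this uses the standard tarfile module. The function name and paths are hypothetical, and the Warrior-100G.ova file itself was not inspected.

```python
import tarfile
from pathlib import Path


def extract_vmdks(ova_path, dest_dir):
    """Extract every .vmdk member of an OVA (tar archive) into dest_dir.

    Returns the list of extracted file paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(ova_path) as ova:
        for member in ova.getmembers():
            if member.isfile() and member.name.lower().endswith(".vmdk"):
                # Drop any directory part so nothing is written outside dest.
                member.name = Path(member.name).name
                ova.extract(member, dest)
                extracted.append(dest / member.name)
    return extracted


if __name__ == "__main__":
    # Hypothetical local copy of the appliance downloaded beforehand.
    ova_file = Path("Warrior-100G.ova")
    if ova_file.exists():
        for disk in extract_vmdks(ova_file, "warrior-disks"):
            print(disk)
```

The extracted disk would still need converting to a format the target hypervisor accepts before it can be attached.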
HCross2ahh right, and then throw that into HyperV [20:36]
jrwrYa [20:36]
HCross2and then hopefully burn my hyperv box soon [20:36]
jrwranyway on first run it downloads all the docker images so it's always up to date when it's run
much better system overall, since it combines efforts
anyway, that was an alt account I deleted, good thing my main is still around
[20:36]
arkiverI'm getting an $MW_WALL_CLOCK_LIMIT error on https://wiki.newsbuddy.net/Internet_Archive_Blog , can we please increase the $MW_WALL_CLOCK_LIMIT timeout? [20:43]
jrwrholy shit
what are you doing arkiver
[20:43]
arkiverit's the internet archive blog https://wiki.newsbuddy.net/images/d/d8/Blog_archive_org.png
kind of a big image
[20:44]
HCross2even a 200Mbps line under 15ms from the Wiki took a good 25 seconds [20:44]
jrwrgod
my browser crashed just looking at it
[20:45]
HCross2I always assume arkiver writes a coin miner into all his code :P [20:46]
arkiverhaha :) [20:46]
jrwrI updated it to 500s
and 3GB ram
[20:48]
arkiverawesome :)
jrwr: do you know how we can restart the creation of the image on? https://wiki.newsbuddy.net/Internet_Archive_Blog
[20:49]
jrwrOH
it's ImageMagick crashing
bwhahaha
[20:53]
HCross2Hehe. We seem to have a thing for breaking things recently [20:55]
jrwrfixed
I'm keeping debug logging on
until arkiver breaks the wiki again
[21:01]
arkiver:) thanks [21:03]
jrwrthe debug logs are also included in the HTML comments in the source
so
if you break something on a page you can see it :)
[21:03]
HCross2jrwr: we need a custom "arkiver" setting that has all the limits set as high as they'll go [21:04]
jrwrmeh [21:05]
arkiverhaha [21:05]
jrwrw/e
Ugh
I'm hungry and broke
[21:05]
.... (idle for 17mn)
arkiverjrwr: are you able to delete user ScreenCaptureBot ?
the problem was that the wiki should have sent a mail with the password, but it never came
so now it has no password
arkiver is hungry too
arkiver is off for some food
[21:22]
..... (idle for 20mn)
jrwryou can change its password [21:44]
.... (idle for 19mn)
arkiverstarting to look better and better now https://wiki.newsbuddy.net/El_Reportero
regexes and URLs are now each on a new line
[22:03]
jrwrI turned on subpages [22:12]
..... (idle for 23mn)
***gk_1wm_su has joined #newsgrabber
gk_1wm_su has left
[22:35]
