#archiveteam-bs 2016-08-21,Sun


Time Nickname Message
00:28 🔗 BlueMaxim has joined #archiveteam-bs
00:54 🔗 dashcloud has quit IRC (Ping timeout: 260 seconds)
00:56 🔗 dashcloud has joined #archiveteam-bs
01:10 🔗 JesseW has joined #archiveteam-bs
01:14 🔗 godane so i'm doing a different brute force method for SBS
01:15 🔗 godane https://archive.org/details/www.sbs.com.au-news-node-190k-20160820
01:16 🔗 godane i'm now doing like 7k url sets at once
01:16 🔗 godane i had to do this cause in the 190k area it's going from odd to even back to odd numbers
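The brute-force approach described above can be sketched as splitting a node-ID range into fixed-size URL sets that cover every integer, so the odd/even switching in the 190k area cannot cause gaps. This is a minimal sketch, not godane's actual method. The URL pattern `http://www.sbs.com.au/news/node/{id}` is a hypothetical guess based on the item name, and `url_batches` is a hypothetical helper name.

```python
# Hypothetical URL pattern inferred from the item name; not confirmed by the log.
BASE = "http://www.sbs.com.au/news/node/{}"


def url_batches(start, stop, batch_size=7000):
    """Yield lists of node URLs covering every ID in [start, stop).

    Stepping by 1 (not by 2) means both odd and even node IDs are tried,
    so a section where the site switches parity is still fully covered.
    """
    for lo in range(start, stop, batch_size):
        hi = min(lo + batch_size, stop)
        yield [BASE.format(n) for n in range(lo, hi)]
```

Each batch could then be written to its own text file and fed to a downloader as one URL set.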
01:19 🔗 godane also i'm close to being done with nasa docs for 1983
01:20 🔗 godane turns out +100 pdfs didn't get uploaded
01:34 🔗 godane deals.kinja.com is saved: https://archive.org/details/@chris85?and[]=subject:%22deals.kinja.com%22
01:58 🔗 username1 has joined #archiveteam-bs
02:02 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
02:17 🔗 tomwsmf has joined #archiveteam-bs
02:34 🔗 REiN^ has quit IRC ()
02:54 🔗 tomaspark has joined #archiveteam-bs
03:06 🔗 JesseW has quit IRC (Quit: Leaving.)
03:07 🔗 JesseW has joined #archiveteam-bs
03:40 🔗 godane ez.gizmodo.com is saved and is being uploaded
03:41 🔗 godane *es.gizmodo.com
03:46 🔗 DFJustin has quit IRC (Remote host closed the connection)
03:48 🔗 zyphlar has quit IRC (Quit: Connection closed for inactivity)
04:17 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:25 🔗 Sk1d has joined #archiveteam-bs
04:39 🔗 Start has quit IRC (Quit: Disconnected.)
04:40 🔗 Start has joined #archiveteam-bs
06:04 🔗 dashcloud has quit IRC (Read error: Operation timed out)
06:08 🔗 dashcloud has joined #archiveteam-bs
06:11 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
06:12 🔗 RichardG has joined #archiveteam-bs
06:19 🔗 tomwsmf has quit IRC (Ping timeout: 255 seconds)
07:40 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
08:14 🔗 DFJustin has joined #archiveteam-bs
08:23 🔗 Honno has joined #archiveteam-bs
08:59 🔗 GE has joined #archiveteam-bs
09:06 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
09:09 🔗 GE_ has joined #archiveteam-bs
09:10 🔗 GE has quit IRC (Ping timeout: 255 seconds)
09:10 🔗 GE_ is now known as GE
09:16 🔗 wp494 has quit IRC (Read error: Connection reset by peer)
10:11 🔗 GE_ has joined #archiveteam-bs
10:13 🔗 GE has quit IRC (Ping timeout: 255 seconds)
10:13 🔗 GE_ is now known as GE
10:44 🔗 i0npulse has quit IRC (Ping timeout: 244 seconds)
10:55 🔗 i0npulse has joined #archiveteam-bs
11:14 🔗 tuankiet has quit IRC (Quit: Leaving)
11:16 🔗 GE has quit IRC (Ping timeout: 255 seconds)
11:25 🔗 wp494 has joined #archiveteam-bs
11:26 🔗 tuankiet6 has joined #archiveteam-bs
11:31 🔗 tuankiet6 has quit IRC (Quit: Leaving)
11:31 🔗 tuankiet6 has joined #archiveteam-bs
11:31 🔗 tuankiet6 has quit IRC (Remote host closed the connection)
11:32 🔗 tuankiet6 has joined #archiveteam-bs
11:32 🔗 tuankiet6 is now known as tuankiet
11:48 🔗 GE has joined #archiveteam-bs
12:03 🔗 REiN^ has joined #archiveteam-bs
12:03 🔗 GE has quit IRC (Ping timeout: 255 seconds)
12:17 🔗 GE has joined #archiveteam-bs
12:22 🔗 REiN^ has quit IRC (Read error: Connection reset by peer)
12:24 🔗 dashcloud has quit IRC (Read error: Operation timed out)
12:29 🔗 dashcloud has joined #archiveteam-bs
12:32 🔗 dashcloud has quit IRC (Read error: Operation timed out)
12:40 🔗 kristian_ has joined #archiveteam-bs
12:42 🔗 dashcloud has joined #archiveteam-bs
12:54 🔗 GE has quit IRC (Ping timeout: 255 seconds)
13:00 🔗 REiN^ has joined #archiveteam-bs
13:10 🔗 GE has joined #archiveteam-bs
13:42 🔗 RichardG has joined #archiveteam-bs
13:43 🔗 GE has quit IRC (Ping timeout: 255 seconds)
15:40 🔗 GE has joined #archiveteam-bs
15:41 🔗 BlueMaxim has quit IRC (Quit: Leaving)
15:47 🔗 username1 has quit IRC (Remote host closed the connection)
16:41 🔗 kristian_ has quit IRC (Leaving)
16:43 🔗 tuankiet has quit IRC (Remote host closed the connection)
17:05 🔗 JesseW has joined #archiveteam-bs
17:27 🔗 GE_ has joined #archiveteam-bs
17:29 🔗 GE has quit IRC (Ping timeout: 255 seconds)
17:29 🔗 GE_ is now known as GE
17:43 🔗 bzc6p has joined #archiveteam-bs
17:43 🔗 swebb sets mode: +o bzc6p
17:44 🔗 bzc6p Igloo^: can you please look at your dnshistory crawlers? Strange that you return only xn--ses554g (tiny) items.
17:45 🔗 Igloo^ Sure
17:46 🔗 Igloo^ It's reporting 403's bzc6p
17:46 🔗 Igloo^ Though opening in a browser works
17:46 🔗 bzc6p But that browser is from a different IP I guess.
17:46 🔗 Igloo^ Yeah just trying from same IP 1 mo
17:47 🔗 bzc6p You must have been banned. My question is whether it was now (recently) or earlier, in the beginning.
17:48 🔗 Igloo^ In the beginning it was fine
17:48 🔗 Igloo^ Oh yep
17:48 🔗 Igloo^ Banned.
17:48 🔗 bzc6p I mean, when did you restart it? Or haven't stopped it at all?
17:48 🔗 Igloo^ I restarted it when the jobs became available the other day
17:49 🔗 bzc6p But then you were already banned I guess.
17:49 🔗 Igloo^ Possibly.
17:49 🔗 bzc6p Then they are not banning *now*. That's good.
17:49 🔗 bzc6p We've had the exact same situation with another member yesterday.
17:50 🔗 Igloo^ I can only apologise I didn't notice
17:50 🔗 bzc6p Igloo^: Unless you can change IP, please stop your pipeline, because you're taking away all items
17:50 🔗 Igloo^ I've stopped my pipeline
17:50 🔗 bzc6p Thanks
17:50 🔗 Igloo^ Going to check the other server see if it is also banned.
17:50 🔗 Igloo^ http://imgur.com/a/cdDBv
17:51 🔗 Igloo^ Is the error you get BTW.
17:52 🔗 bzc6p yes, they used to be assholes
17:53 🔗 Igloo^ They implemented Cloudflare after the shutdown
17:53 🔗 Igloo^ They were still being assholes iirc
17:54 🔗 bzc6p They kept the site up and haven't banned recently, so they are pending
17:54 🔗 bzc6p Assholity Pending
17:55 🔗 bzc6p Now we just need to find which others of us left their pipelines on and take all the yummy items away
17:57 🔗 Igloo^ Do we need more pipelines? I've got one that isn't banned
17:57 🔗 Igloo^ (It never ran dnshistory)
17:58 🔗 bzc6p I think yes we could have some more
17:58 🔗 bzc6p But you don't have any other banned one on, do you?
17:58 🔗 Igloo^ No
17:58 🔗 Igloo^ I only ran it on one pipeline
17:58 🔗 bzc6p ok
17:58 🔗 Igloo^ We suffered really slow crawl rates last time
17:59 🔗 Igloo^ Their site couldn't handle the load
17:59 🔗 bzc6p Let's move to #greatlookup
18:14 🔗 bzc6p Since when does pastebin show captchas when VIEWING content?
18:17 🔗 Frogging can't say I've ever seen that but I guess it might be a rate limit thing?
18:18 🔗 bzc6p I've just seen it now. It says spam filter. But that used to be used when uploading, not when viewing. Can't see the logic but annoying.
18:20 🔗 JesseW Pastebin.com is ad supported -- making sure entities worth money to their advertisers are the only ones initiating page loads seems consistent with that
18:21 🔗 bzc6p Yeah, blocking scrapers. But if I must select store fronts every time I want to see a paste, I'll rapidly stop using their service.
18:21 🔗 JesseW as long as they have enough storage space -- *hosting* content uploaded by bots is fine for them (some advertising-vulnerable entities might even load pages with such content, which is a net win). It's *displaying* pages to non-advertising-vulnerable entities that they want to avoid
18:21 🔗 bzc6p *have to
18:22 🔗 JesseW there are a LOT of pastebins -- I certainly wouldn't use pastebin.com anymore (and I haven't for a while)
18:22 🔗 bzc6p Which is not a net win
18:23 🔗 JesseW yep, they have to balance refusing service to non-advertising-vulnerable entities with providing enough value to entities whose attention they *can* sell to get them to participate
18:23 🔗 bzc6p I don't use it either. I'd like a simple one
18:24 🔗 JesseW I like termbin for stuff I have on the terminal
18:24 🔗 bzc6p One day I'll start my own one
18:24 🔗 JesseW I don't remember one offhand for actual pastes
18:24 🔗 JesseW oh, 0bin
18:25 🔗 bzc6p Yes, problem is sharing a paste is expected to be a very prompt thing, shouldn't take more than a few seconds. This captcha thing makes it too long, that's why I think it's not a good idea, at least for such a service.
18:26 🔗 bzc6p (I'm already accustomed to that letting archivists do their job is already far off the table)
18:27 🔗 JesseW :-P
18:28 🔗 JesseW I don't disagree
18:28 🔗 bzc6p It's just my opinion. We are different in terms of patience.
18:29 🔗 bzc6p (In fact, I'm usually patient but I don't like needless work)
18:33 🔗 alembic https://ybin.me/ is pretty nice for pastes... don't think it does syntax highlighting though
18:34 🔗 bzc6p sets mode: +oooo achip Atluxity chfoo closure
18:34 🔗 bzc6p sets mode: +oooo Coderjoe dashcloud DFJustin FalconK
18:35 🔗 bzc6p sets mode: +oooo GLaDOS godane Infreq JesseW
18:35 🔗 bzc6p sets mode: +oooo JW_work Kaz luckcolor midas
18:35 🔗 bzc6p sets mode: +oooo PurpleSym Sanqui Smiley Start
18:35 🔗 bzc6p sets mode: +oo wp494 yipdw
18:36 🔗 bzc6p What happened to aaaaaaaaa? He's been away, at least with this nickname, since New Year's Eve.
18:38 🔗 JesseW A sudden influx of op...
18:39 🔗 JesseW I have no idea what's up with aaaaaaaa
18:48 🔗 schbirid has joined #archiveteam-bs
18:50 🔗 bzc6p I just found he had github activity in May so he's okay, just stays away from IRC.
18:51 🔗 JesseW good :-)
18:57 🔗 bzc6p has left
19:13 🔗 GE_ has joined #archiveteam-bs
19:13 🔗 tomwsmf has joined #archiveteam-bs
19:14 🔗 GE has quit IRC (Ping timeout: 255 seconds)
19:14 🔗 GE_ is now known as GE
19:42 🔗 JesseW has quit IRC (Read error: Operation timed out)
20:09 🔗 schbirid has quit IRC (Ping timeout: 1208 seconds)
20:20 🔗 bzc6p has joined #archiveteam-bs
20:20 🔗 swebb sets mode: +o bzc6p
20:21 🔗 bzc6p has left
20:31 🔗 kristian_ has joined #archiveteam-bs
20:47 🔗 dashcloud has quit IRC (Read error: Operation timed out)
20:51 🔗 godane looks like my first web archive to fail derive: https://catalogd.archive.org/log/553682276
20:51 🔗 dashcloud has joined #archiveteam-bs
20:53 🔗 godane SketchCow: i figure you would want to know about my first web archive to fail derive: https://archive.org/details/www.sbs.com.au-news-node-201k-20160820
21:12 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
21:17 🔗 dashcloud has joined #archiveteam-bs
21:30 🔗 RichardG has quit IRC (Ping timeout: 244 seconds)
21:35 🔗 alembic has quit IRC (Read error: Operation timed out)
21:37 🔗 alembic has joined #archiveteam-bs
21:37 🔗 Honno has quit IRC (Read error: Operation timed out)
21:45 🔗 alembic has quit IRC (Read error: Operation timed out)
21:46 🔗 alembic has joined #archiveteam-bs
22:04 🔗 godane gzip: 201k/www.sbs.com.au-news-node-201k-20160820.warc.gz: decompression OK, trailing garbage ignored
22:05 🔗 godane i now see the problem
22:05 🔗 godane md5sum is fine for everything in that item
22:05 🔗 godane so i may try a re-download of those urls
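The `gzip` message above ("decompression OK, trailing garbage ignored") means the file ends with bytes that are not a valid gzip member, which would explain why the md5sums match while the derive still fails. A `.warc.gz` is normally a series of concatenated gzip members, so a check can walk the members and report where the valid data stops. This is a minimal sketch using Python's standard `zlib` module; `find_trailing_garbage` is a hypothetical helper name, and it reads the whole file into memory for simplicity.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def find_trailing_garbage(data: bytes):
    """Return the byte offset where non-gzip data begins, or None if clean.

    Walks concatenated gzip members (as in a .warc.gz) one at a time.
    """
    pos = 0
    while pos < len(data):
        if data[pos:pos + 2] != GZIP_MAGIC:
            return pos  # bytes here do not start another gzip member
        d = zlib.decompressobj(31)  # wbits=31: expect gzip header and trailer
        try:
            d.decompress(data[pos:])
        except zlib.error:
            return pos  # looks like a member but is corrupt
        if not d.eof:
            return pos  # member is truncated
        # unused_data is everything after the end of this member
        pos = len(data) - len(d.unused_data)
    return None
```

Usage would be something like `find_trailing_garbage(open(path, "rb").read())`; a non-`None` result gives the offset at which to truncate the file, or tells you which part to re-download.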
22:05 🔗 Frogging !ao http://populationpyramid.net/static/data/mainData_en.json
22:05 🔗 Frogging oops
22:06 🔗 GE has quit IRC (Ping timeout: 255 seconds)
22:09 🔗 GE has joined #archiveteam-bs
22:09 🔗 JesseW has joined #archiveteam-bs
22:11 🔗 kristian_ has quit IRC (Leaving)
22:12 🔗 kristian_ has joined #archiveteam-bs
23:09 🔗 RichardG has joined #archiveteam-bs
23:16 🔗 kristian_ has quit IRC (Leaving)
23:38 🔗 OpticalSw has joined #archiveteam-bs
23:39 🔗 OpticalSw Hi Joe
23:39 🔗 joepie91 ohai :)
23:39 🔗 arkiver many more big projects are coming yp
23:39 🔗 arkiver up*
23:39 🔗 arkiver flickr, tumblr
23:40 🔗 OpticalSw http://pastebin.com/MxxTj9Lf
23:40 🔗 OpticalSw Oooh might buy a ton of VMs then
23:40 🔗 OpticalSw Some sentris ones likely
23:43 🔗 OpticalSw joepie91?
23:43 🔗 OpticalSw Any luck?
23:48 🔗 joepie91 hold on
23:48 🔗 joepie91 patience, I'm multitasking :)
23:48 🔗 joepie91 errr
23:48 🔗 joepie91 that log doesn't contain an error...
23:49 🔗 joepie91 chfoo: arkiver: who is currently responsible for seesaw?
23:49 🔗 OpticalSw Hangon
23:50 🔗 OpticalSw I was a bit of a retard I think
23:52 🔗 OpticalSw I followed an oldish tutorial
23:52 🔗 OpticalSw for livejournal
23:52 🔗 OpticalSw Nope still failed
23:53 🔗 joepie91 OpticalSw: always follow the instructions for the thing you're setting up, in the README :P
23:53 🔗 OpticalSw I was doing livejournal then you said Orkut haha
23:56 🔗 OpticalSw python's easy_install worked
23:57 🔗 yipdw that error looks like you're running some ancient Python component
23:57 🔗 yipdw .egg as an archive format isn't new
23:58 🔗 OpticalSw Fresh install on Jessie
23:58 🔗 yipdw I guess it's pip then
23:58 🔗 OpticalSw Will reinstall pip
23:58 🔗 yipdw reinstalling from packages might not help; Debian ships an old version for some reason
23:59 🔗 OpticalSw ah crap. Recommendation?
23:59 🔗 yipdw virtualenv may make it possible to install one that isn't that old
23:59 🔗 OpticalSw Could you give me some pointers?

irclogger-viewer