Time |
Nickname |
Message |
00:28
|
|
BlueMaxim has joined #archiveteam-bs |
00:54
|
|
dashcloud has quit IRC (Ping timeout: 260 seconds) |
00:56
|
|
dashcloud has joined #archiveteam-bs |
01:10
|
|
JesseW has joined #archiveteam-bs |
01:14
|
godane |
so i'm doing a different brute force method for SBS |
01:15
|
godane |
https://archive.org/details/www.sbs.com.au-news-node-190k-20160820 |
01:16
|
godane |
i'm now doing like 7k url sets at once |
01:16
|
godane |
i had to do this cause in the 190k area it's going from odd to even back to odd numbers |
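The batching godane describes can be sketched in Python. This is a minimal illustration under stated assumptions, not his actual tooling: `node_url_batches` is a hypothetical helper, and the URL pattern `http://www.sbs.com.au/news/node/<id>` is inferred from the item name rather than confirmed in the log. It covers every integer ID in a range, so a stretch where the live IDs switch between odd and even is not skipped, and splits the result into lists of about 7k URLs.

```python
from typing import Iterator, List

# Hypothetical URL pattern inferred from the item name; the real SBS path may differ.
URL_TEMPLATE = "http://www.sbs.com.au/news/node/{node_id}"


def node_url_batches(start: int, stop: int, batch_size: int = 7000) -> Iterator[List[str]]:
    """Yield lists of at most batch_size URLs covering every node ID in [start, stop).

    Every integer is included, odd and even alike, so a range where live IDs
    switch parity is still fully covered.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for batch_start in range(start, stop, batch_size):
        batch_stop = min(batch_start + batch_size, stop)
        yield [URL_TEMPLATE.format(node_id=n) for n in range(batch_start, batch_stop)]
```

For example, `node_url_batches(190000, 200000)` yields one list of 7,000 URLs followed by one of 3,000, each of which could be saved as its own URL list. Covering every integer spends requests on IDs that may not exist, in exchange for not missing any.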
01:19
|
godane |
also i'm close to being done with nasa docs for 1983 |
01:20
|
godane |
turns out 100+ pdfs didn't get uploaded |
01:34
|
godane |
deals.kinja.com is saved: https://archive.org/details/@chris85?and[]=subject:%22deals.kinja.com%22 |
01:58
|
|
username1 has joined #archiveteam-bs |
02:02
|
|
schbirid2 has quit IRC (Read error: Operation timed out) |
02:17
|
|
tomwsmf has joined #archiveteam-bs |
02:34
|
|
REiN^ has quit IRC () |
02:54
|
|
tomaspark has joined #archiveteam-bs |
03:06
|
|
JesseW has quit IRC (Quit: Leaving.) |
03:07
|
|
JesseW has joined #archiveteam-bs |
03:40
|
godane |
ez.gizmodo.com is saved and is being uploaded |
03:41
|
godane |
*es.gizmodo.com |
03:46
|
|
DFJustin has quit IRC (Remote host closed the connection) |
03:48
|
|
zyphlar has quit IRC (Quit: Connection closed for inactivity) |
04:17
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:25
|
|
Sk1d has joined #archiveteam-bs |
04:39
|
|
Start has quit IRC (Quit: Disconnected.) |
04:40
|
|
Start has joined #archiveteam-bs |
06:04
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:08
|
|
dashcloud has joined #archiveteam-bs |
06:11
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
06:12
|
|
RichardG has joined #archiveteam-bs |
06:19
|
|
tomwsmf has quit IRC (Ping timeout: 255 seconds) |
07:40
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
08:14
|
|
DFJustin has joined #archiveteam-bs |
08:23
|
|
Honno has joined #archiveteam-bs |
08:59
|
|
GE has joined #archiveteam-bs |
09:06
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
09:09
|
|
GE_ has joined #archiveteam-bs |
09:10
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
09:10
|
|
GE_ is now known as GE |
09:16
|
|
wp494 has quit IRC (Read error: Connection reset by peer) |
10:11
|
|
GE_ has joined #archiveteam-bs |
10:13
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
10:13
|
|
GE_ is now known as GE |
10:44
|
|
i0npulse has quit IRC (Ping timeout: 244 seconds) |
10:55
|
|
i0npulse has joined #archiveteam-bs |
11:14
|
|
tuankiet has quit IRC (Quit: Leaving) |
11:16
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
11:25
|
|
wp494 has joined #archiveteam-bs |
11:26
|
|
tuankiet6 has joined #archiveteam-bs |
11:31
|
|
tuankiet6 has quit IRC (Quit: Leaving) |
11:31
|
|
tuankiet6 has joined #archiveteam-bs |
11:31
|
|
tuankiet6 has quit IRC (Remote host closed the connection) |
11:32
|
|
tuankiet6 has joined #archiveteam-bs |
11:32
|
|
tuankiet6 is now known as tuankiet |
11:48
|
|
GE has joined #archiveteam-bs |
12:03
|
|
REiN^ has joined #archiveteam-bs |
12:03
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
12:17
|
|
GE has joined #archiveteam-bs |
12:22
|
|
REiN^ has quit IRC (Read error: Connection reset by peer) |
12:24
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
12:29
|
|
dashcloud has joined #archiveteam-bs |
12:32
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
12:40
|
|
kristian_ has joined #archiveteam-bs |
12:42
|
|
dashcloud has joined #archiveteam-bs |
12:54
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
13:00
|
|
REiN^ has joined #archiveteam-bs |
13:10
|
|
GE has joined #archiveteam-bs |
13:42
|
|
RichardG has joined #archiveteam-bs |
13:43
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
15:40
|
|
GE has joined #archiveteam-bs |
15:41
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
15:47
|
|
username1 has quit IRC (Remote host closed the connection) |
16:41
|
|
kristian_ has quit IRC (Leaving) |
16:43
|
|
tuankiet has quit IRC (Remote host closed the connection) |
17:05
|
|
JesseW has joined #archiveteam-bs |
17:27
|
|
GE_ has joined #archiveteam-bs |
17:29
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
17:29
|
|
GE_ is now known as GE |
17:43
|
|
bzc6p has joined #archiveteam-bs |
17:43
|
|
swebb sets mode: +o bzc6p |
17:44
|
bzc6p |
Igloo^: can you please look at your dnshistory crawlers? Strange that you return only xn--ses554g (tiny) items. |
17:45
|
Igloo^ |
Sure |
17:46
|
Igloo^ |
It's reporting 403s, bzc6p |
17:46
|
Igloo^ |
Though opening in a browser works |
17:46
|
bzc6p |
But that browser is from a different IP I guess. |
17:46
|
Igloo^ |
Yeah, just trying from the same IP, 1 mo |
17:47
|
bzc6p |
You must have been banned. My question is whether it was now (recently) or earlier, in the beginning. |
17:48
|
Igloo^ |
In the beginning it was fine |
17:48
|
Igloo^ |
Oh yep |
17:48
|
Igloo^ |
Banned. |
17:48
|
bzc6p |
I mean, when did you restart it? Or haven't you stopped it at all? |
17:48
|
Igloo^ |
I restarted it when the jobs became available the other day |
17:49
|
bzc6p |
But then you were already banned I guess. |
17:49
|
Igloo^ |
Possibly. |
17:49
|
bzc6p |
Then they are not banning *now*. That's good. |
17:49
|
bzc6p |
We've had the exact same situation with another member yesterday. |
17:50
|
Igloo^ |
I can only apologise; I didn't notice |
17:50
|
bzc6p |
Igloo^: Unless you can change IP, please stop your pipeline, because you're taking away all items |
17:50
|
Igloo^ |
I've stopped my pipeline |
17:50
|
bzc6p |
Thanks |
17:50
|
Igloo^ |
Going to check the other server to see if it is also banned. |
17:50
|
Igloo^ |
http://imgur.com/a/cdDBv |
17:51
|
Igloo^ |
Is the error you get BTW. |
17:52
|
bzc6p |
yes, they used to be assholes |
17:53
|
Igloo^ |
They implemented Cloudflare after the shutdown |
17:53
|
Igloo^ |
They were still being assholes iirc |
17:54
|
bzc6p |
They kept the site up and haven't banned recently, so they are pending |
17:54
|
bzc6p |
Assholity Pending |
17:55
|
bzc6p |
Now we just need to find which others of us left their pipelines on, taking all the yummy items away |
17:57
|
Igloo^ |
Do we need more pipelines? I've got one that isn't banned |
17:57
|
Igloo^ |
(It never ran dnshistory) |
17:58
|
bzc6p |
I think yes, we could use some more |
17:58
|
bzc6p |
But you don't have any other banned one on, do you? |
17:58
|
Igloo^ |
No |
17:58
|
Igloo^ |
I only ran it on one pipeline |
17:58
|
bzc6p |
ok |
17:58
|
Igloo^ |
We suffered really slow crawl rates last time |
17:59
|
Igloo^ |
Their site couldn't handle the load |
17:59
|
bzc6p |
Let's move to #greatlookup |
18:14
|
bzc6p |
Since when does pastebin show captchas when VIEWING content? |
18:17
|
Frogging |
can't say I've ever seen that but I guess it might be a rate limit thing? |
18:18
|
bzc6p |
I've just seen it now. It says spam filter. But that used to be used when uploading, not when viewing. Can't see the logic, but it's annoying. |
18:20
|
JesseW |
Pastebin.com is ad supported -- making sure entities worth money to their advertisers are the only ones initiating page loads seems consistent with that |
18:21
|
bzc6p |
Yeah, blocking scrapers. But if I must select store fronts every time I want to see a paste, I'll rapidly stop using their service. |
18:21
|
JesseW |
as long as they have enough storage space -- *hosting* content uploaded by bots is fine for them (some advertising-vulnerable entities might even load pages with such content, which is a net win). It's *displaying* pages to non-advertising-vulnerable entities that they want to avoid |
18:21
|
bzc6p |
*have to |
18:22
|
JesseW |
there are a LOT of pastebins -- I certainly wouldn't use pastebin.com anymore (and I haven't for a while) |
18:22
|
bzc6p |
Which is not a net win |
18:23
|
JesseW |
yep, they have to balance refusing service to non-advertising-vulnerable entities with providing enough value to entities whose attention they *can* sell to get them to participate |
18:23
|
bzc6p |
I don't use it either. I'd like a simple one |
18:24
|
JesseW |
I like termbin for stuff I have on the terminal |
18:24
|
bzc6p |
One day I'll start my own one |
18:24
|
JesseW |
I don't remember one offhand for actual pastes |
18:24
|
JesseW |
oh, 0bin |
18:25
|
bzc6p |
Yes, the problem is that sharing a paste is expected to be a very prompt thing; it shouldn't take more than a few seconds. This captcha thing makes it take too long, which is why I think it's not a good idea, at least for such a service. |
18:26
|
bzc6p |
(I'm already accustomed to the fact that letting archivists do their job is far off the table) |
18:27
|
JesseW |
:-P |
18:28
|
JesseW |
I don't disagree |
18:28
|
bzc6p |
It's just my opinion. We are different in terms of patience. |
18:29
|
bzc6p |
(In fact, I'm usually patient but I don't like needless work) |
18:33
|
alembic |
https://ybin.me/ is pretty nice for pastes... don't think it does syntax highlighting though |
18:34
|
|
bzc6p sets mode: +oooo achip Atluxity chfoo closure |
18:34
|
|
bzc6p sets mode: +oooo Coderjoe dashcloud DFJustin FalconK |
18:35
|
|
bzc6p sets mode: +oooo GLaDOS godane Infreq JesseW |
18:35
|
|
bzc6p sets mode: +oooo JW_work Kaz luckcolor midas |
18:35
|
|
bzc6p sets mode: +oooo PurpleSym Sanqui Smiley Start |
18:35
|
|
bzc6p sets mode: +oo wp494 yipdw |
18:36
|
bzc6p |
What happened to aaaaaaaaa? He's been away, at least with this nickname, since New Year's Eve. |
18:38
|
JesseW |
A sudden influx of op... |
18:39
|
JesseW |
I have no idea what's up with aaaaaaaa |
18:48
|
|
schbirid has joined #archiveteam-bs |
18:50
|
bzc6p |
I just found he had GitHub activity in May, so he's okay, just stays away from IRC. |
18:51
|
JesseW |
good :-) |
18:57
|
|
bzc6p has left |
19:13
|
|
GE_ has joined #archiveteam-bs |
19:13
|
|
tomwsmf has joined #archiveteam-bs |
19:14
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
19:14
|
|
GE_ is now known as GE |
19:42
|
|
JesseW has quit IRC (Read error: Operation timed out) |
20:09
|
|
schbirid has quit IRC (Ping timeout: 1208 seconds) |
20:20
|
|
bzc6p has joined #archiveteam-bs |
20:20
|
|
swebb sets mode: +o bzc6p |
20:21
|
|
bzc6p has left |
20:31
|
|
kristian_ has joined #archiveteam-bs |
20:47
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
20:51
|
godane |
looks like my first web archive to fail derive: https://catalogd.archive.org/log/553682276 |
20:51
|
|
dashcloud has joined #archiveteam-bs |
20:53
|
godane |
SketchCow: i figure you would want to know about my first web archive to fail derive: https://archive.org/details/www.sbs.com.au-news-node-201k-20160820 |
21:12
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
21:17
|
|
dashcloud has joined #archiveteam-bs |
21:30
|
|
RichardG has quit IRC (Ping timeout: 244 seconds) |
21:35
|
|
alembic has quit IRC (Read error: Operation timed out) |
21:37
|
|
alembic has joined #archiveteam-bs |
21:37
|
|
Honno has quit IRC (Read error: Operation timed out) |
21:45
|
|
alembic has quit IRC (Read error: Operation timed out) |
21:46
|
|
alembic has joined #archiveteam-bs |
22:04
|
godane |
gzip: 201k/www.sbs.com.au-news-node-201k-20160820.warc.gz: decompression OK, trailing garbage ignored |
22:05
|
godane |
i now see the problem |
22:05
|
godane |
md5sum is fine for everything in that item |
22:05
|
godane |
so i may try a re-download of those urls |
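That `gzip` line says the compressed data itself was intact but extra bytes followed the last complete gzip member, which godane took to be the reason the derive failed. A .warc.gz is usually a series of concatenated gzip members (commonly one per WARC record), so one way to spot the problem before uploading is to walk those members and count what is left over. The following is a minimal Python sketch using only the standard `zlib` module; `scan_gzip_members` is a hypothetical name, not an Internet Archive or gzip tool, and its results may not match `gzip -t` in every edge case.

```python
import zlib
from typing import BinaryIO, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def _drain(f: BinaryIO, chunk_size: int) -> int:
    """Count (and discard) all remaining bytes in f."""
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)


def scan_gzip_members(path: str, chunk_size: int = 1 << 20) -> Tuple[int, int]:
    """Return (complete_members, trailing_garbage_bytes) for a gzip file.

    Hypothetical helper: walks concatenated gzip members one by one; bytes after
    the last complete member that do not start with the gzip magic number are
    reported as trailing garbage. Raises zlib.error on corrupt data and
    EOFError if the file ends inside a member.
    """
    members = 0
    decomp = None  # active decompressor while inside a member
    pending = b""  # bytes read from disk but not yet processed
    with open(path, "rb") as f:
        while True:
            if not pending:
                pending = f.read(chunk_size)
                if not pending:
                    break  # end of file
            if decomp is None:
                # At a member boundary: need two bytes to compare with the magic number.
                if len(pending) < 2:
                    pending += f.read(chunk_size)
                if pending[:2] != GZIP_MAGIC:
                    # Everything from here to EOF is not a gzip member.
                    return members, len(pending) + _drain(f, chunk_size)
                decomp = zlib.decompressobj(wbits=31)  # 31 = expect gzip header and trailer
            decomp.decompress(pending)  # decompressed output is discarded; we only validate
            if decomp.eof:
                members += 1
                pending = decomp.unused_data  # bytes after this member's trailer
                decomp = None
            else:
                pending = b""
    if decomp is not None:
        raise EOFError("file ends in the middle of a gzip member")
    return members, 0
```

Run against the file in question, a nonzero second value in the returned pair would correspond to the "trailing garbage ignored" warning, while a corrupt or truncated member raises an exception instead.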
22:05
|
Frogging |
!ao http://populationpyramid.net/static/data/mainData_en.json |
22:05
|
Frogging |
oops |
22:06
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
22:09
|
|
GE has joined #archiveteam-bs |
22:09
|
|
JesseW has joined #archiveteam-bs |
22:11
|
|
kristian_ has quit IRC (Leaving) |
22:12
|
|
kristian_ has joined #archiveteam-bs |
23:09
|
|
RichardG has joined #archiveteam-bs |
23:16
|
|
kristian_ has quit IRC (Leaving) |
23:38
|
|
OpticalSw has joined #archiveteam-bs |
23:39
|
OpticalSw |
Hi Joe |
23:39
|
joepie91 |
ohai :) |
23:39
|
arkiver |
many more big projects are coming yp |
23:39
|
arkiver |
up* |
23:39
|
arkiver |
flickr, tumblr |
23:40
|
OpticalSw |
http://pastebin.com/MxxTj9Lf |
23:40
|
OpticalSw |
Oooh might buy a ton of VMs then |
23:40
|
OpticalSw |
Some sentris ones likely |
23:43
|
OpticalSw |
joepie91? |
23:43
|
OpticalSw |
Any luck? |
23:48
|
joepie91 |
hold on |
23:48
|
joepie91 |
patience, I'm multitasking :) |
23:48
|
joepie91 |
errr |
23:48
|
joepie91 |
that log doesn't contain an error... |
23:49
|
joepie91 |
chfoo: arkiver: who is currently responsible for seesaw? |
23:49
|
OpticalSw |
Hang on |
23:50
|
OpticalSw |
I was a bit of a retard I think |
23:52
|
OpticalSw |
I followed an oldish tutorial |
23:52
|
OpticalSw |
for livejournal |
23:52
|
OpticalSw |
Nope still failed |
23:53
|
joepie91 |
OpticalSw: always follow the instructions for the thing you're setting up, in the README :P |
23:53
|
OpticalSw |
I was doing livejournal then you said Orkut haha |
23:56
|
OpticalSw |
python's easy_install worked |
23:57
|
yipdw |
that error looks like you're running some ancient Python component |
23:57
|
yipdw |
.egg as an archive format isn't new |
23:58
|
OpticalSw |
Fresh install on Jessie |
23:58
|
yipdw |
I guess it's pip then |
23:58
|
OpticalSw |
Will reinstall pip |
23:58
|
yipdw |
reinstalling from packages might not help; Debian ships an old version for some reason |
23:59
|
OpticalSw |
ah crap. Recommendation? |
23:59
|
yipdw |
virtualenv may make it possible to install one that isn't that old |
23:59
|
OpticalSw |
Could you give me some pointers? |
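yipdw's suggestion amounts to leaving Debian's packaged pip alone and upgrading pip inside an isolated environment. A minimal Python sketch, using the standard-library `venv` module as a stand-in for the `virtualenv` tool he mentions; `create_env`, `upgrade_pip` and any directory name you pass are hypothetical. On Debian, creating an environment with pip may require the separate python3-venv package, so `with_pip=True` can fail without it.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return its Python executable.

    Hypothetical helper; with_pip=True bootstraps pip via ensurepip, which some
    distributions package separately.
    """
    venv.create(env_dir, with_pip=with_pip)
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def upgrade_pip(env_python: Path) -> None:
    """Upgrade pip inside the environment only; needs network access to PyPI."""
    subprocess.run(
        [str(env_python), "-m", "pip", "install", "--upgrade", "pip"],
        check=True,
    )
```

Typical use would be `py = create_env("seesaw-env")` followed by `upgrade_pip(py)`; after that, the project's README steps (for ArchiveTeam projects, usually a `pip install` of `seesaw`) would be run with that environment's pip rather than the system one.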