Time |
Nickname |
Message |
00:03
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
00:04
🔗
|
atrocity |
fantastic, this worked: -o %(id)s/%(title)s.%(ext)s |
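The `-o` template quoted above uses Python-style named placeholders (the syntax matches youtube-dl output templates, which is assumed to be the tool in use here). A minimal sketch of how such a template expands; the metadata values are invented, and `expand_template` is a hypothetical helper, not part of any downloader:

```python
# Sketch: expand a youtube-dl style output template with Python %-formatting.
# The metadata passed in is made up for illustration only.
TEMPLATE = "%(id)s/%(title)s.%(ext)s"


def expand_template(template, info):
    """Return the output path produced by filling the template from one metadata dict."""
    return template % info


if __name__ == "__main__":
    info = {"id": "abc123", "title": "Example video", "ext": "mp4"}
    print(expand_template(TEMPLATE, info))
```

With this layout each video lands in a directory named after its ID, so two videos with the same title cannot overwrite each other.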
00:04
🔗
|
atrocity |
now to archive all youtubes |
00:08
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
00:11
🔗
|
|
dashcloud has joined #archiveteam-bs |
00:11
🔗
|
|
bwn__ is now known as bwn |
00:13
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
00:19
🔗
|
bwn |
JW_work: JesseW: i don't use it, but i had come across that warcreate extension, it looked like he was working on adding a 'record' type thing similar to what you were talking about |
00:19
🔗
|
bwn |
but it just did a snapshot of the current page when I had played with it |
00:21
🔗
|
bwn |
https://github.com/machawk1/warcreate |
00:21
🔗
|
|
JesseW has joined #archiveteam-bs |
00:25
🔗
|
|
Balrog_ has joined #archiveteam-bs |
00:30
🔗
|
|
Balrog_ has quit IRC (<TerminusEst13> hung she dong) |
00:30
🔗
|
atrocity |
be awesome if there was a firefox version |
00:31
🔗
|
|
Start has joined #archiveteam-bs |
00:41
🔗
|
ivan` |
Delimiter is still AWOL. would not recommend |
00:41
🔗
|
ivan` |
and I thought OVH had bad customer service |
00:43
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
00:44
🔗
|
JesseW |
bwn: https://github.com/machawk1/warcreate/issues/66 |
00:44
🔗
|
JesseW |
thanks for pointing me at warcreate |
01:03
🔗
|
|
Honno has quit IRC (Quit: Leaving) |
01:06
🔗
|
* |
Yoshimura thanks VADemon. Could use that ;) Wish you as well. |
01:11
🔗
|
ivan` |
joepie91: if you know the lowendtalk guy maybe you can vouch for me, new account ivank |
01:12
🔗
|
|
wp494 has quit IRC (Read error: Connection reset by peer) |
01:17
🔗
|
|
wp494 has joined #archiveteam-bs |
01:33
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
02:01
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
02:01
🔗
|
|
wp494 has joined #archiveteam-bs |
02:06
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
02:26
🔗
|
|
atrocity has quit IRC (Ping timeout: 260 seconds) |
02:28
🔗
|
|
atrocity has joined #archiveteam-bs |
02:29
🔗
|
atrocity |
FUCK |
02:29
🔗
|
atrocity |
power went out here, so lost my openwith shit |
02:51
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
02:57
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
02:59
🔗
|
xmc |
D: |
03:12
🔗
|
Yoshimura |
The doom with URL shorteners is terrible. Not sure what is worse, the shorteners or the new ones with custom URLs and long crap. |
03:13
🔗
|
Yoshimura |
Also ads, and more. There are a lot of uncrawled ones also. *looks at JesseW with smile* |
03:14
🔗
|
* |
xmc smiles creepily |
03:14
🔗
|
xmc |
er, i'm not that creepy |
03:14
🔗
|
Yoshimura |
Is there metadata about megawarcs anywhere on archive? Would like to systematically go through some files, indexes or pages. |
03:15
🔗
|
xmc |
megawarcs are just big warcs |
03:15
🔗
|
xmc |
what are you looking for? |
03:15
🔗
|
Yoshimura |
I know. I meant I do not have to click on each page on AI. |
03:15
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
03:15
🔗
|
Yoshimura |
I meant IA... looking for HTML pages to extract data from. |
03:16
🔗
|
Yoshimura |
I got both AT related, and two/three different projects related. So it would be handy. |
03:16
🔗
|
Yoshimura |
First step would be metadata, second index, last sections of warcs by range requests to get only the HTML. |
03:17
🔗
|
Yoshimura |
And only the more fresh, and depending on content. Some vast sites are kind of useless (except the very index or comments) |
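The last step described above (pulling only sections of a WARC by range request) can be sketched roughly as follows. This assumes the byte offset and length of each record are already known from an index; the helper names and any URL passed in are hypothetical, not an Internet Archive API:

```python
# Sketch: fetch one record from a large WARC with an HTTP Range request.
# offset/length are assumed to come from an index (for example a CDX file);
# the URL is supplied by the caller and is not a real endpoint defined here.
import urllib.request


def range_header(offset, length):
    """Build a Range header value covering `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends, hence the -1.
    return "bytes=%d-%d" % (offset, offset + length - 1)


def fetch_record(url, offset, length):
    """Download only the requested byte span of the remote file."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

If the records are individually gzip-compressed, as is common for `.warc.gz` files, the returned bytes should decompress on their own without the rest of the file.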
03:27
🔗
|
|
RichardG has quit IRC (Ping timeout: 260 seconds) |
03:32
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
03:48
🔗
|
|
ErkDog has quit IRC (Read error: Operation timed out) |
04:01
🔗
|
|
godane has quit IRC (Quit: Leaving.) |
04:05
🔗
|
* |
Yoshimura found one problem when (ordinary) people get stuff for free... they then expect everything to be super f.... nice and free at least, if not give them gifts. |
04:06
🔗
|
|
ErkDog has joined #archiveteam-bs |
04:21
🔗
|
|
Crocatowa has joined #archiveteam-bs |
04:28
🔗
|
|
ErkDog has quit IRC (Read error: Operation timed out) |
04:33
🔗
|
|
ErkDog has joined #archiveteam-bs |
04:40
🔗
|
Frogging |
dem ordinary people |
04:41
🔗
|
|
bwn has joined #archiveteam-bs |
04:50
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
04:55
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
05:02
🔗
|
|
Sk1d has joined #archiveteam-bs |
06:22
🔗
|
|
hawc145 has joined #archiveteam-bs |
06:27
🔗
|
|
HCross has quit IRC (Read error: Operation timed out) |
06:36
🔗
|
|
schbirid has joined #archiveteam-bs |
06:49
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
06:49
🔗
|
|
JesseW has joined #archiveteam-bs |
07:02
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
07:06
🔗
|
joepie91 |
damn |
07:07
🔗
|
joepie91 |
new EU privacy laws give European privacy watchdog the authority to impose fines of up to 20 million euro or 4% of the *global* revenue for significant violations, and 10 million / 2% for more 'formal' violations |
07:25
🔗
|
|
metalcamp has joined #archiveteam-bs |
07:33
🔗
|
|
Medowar has joined #archiveteam-bs |
07:34
🔗
|
|
VADemon has joined #archiveteam-bs |
07:59
🔗
|
|
mismatch_ has quit IRC (Remote host closed the connection) |
08:01
🔗
|
|
mismatch_ has joined #archiveteam-bs |
08:29
🔗
|
|
godane has joined #archiveteam-bs |
08:36
🔗
|
|
hawc145 is now known as HCross |
09:24
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
09:56
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
10:05
🔗
|
|
bwn has joined #archiveteam-bs |
10:55
🔗
|
|
metalcamp has joined #archiveteam-bs |
10:56
🔗
|
Atluxity |
Would this truck blend in anywhere? https://twitter.com/textfiles/status/722094405931397121/photo/1 |
11:36
🔗
|
|
RichardG has joined #archiveteam-bs |
11:38
🔗
|
|
Medowar has quit IRC (Quit: Connection closed for inactivity) |
11:50
🔗
|
atrocity |
who would steal from a library... |
11:52
🔗
|
Atluxity |
maybe they just borrowed it |
11:56
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
12:00
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
12:04
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
12:18
🔗
|
|
Medowar has joined #archiveteam-bs |
12:24
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
12:41
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
12:45
🔗
|
|
vitzli has joined #archiveteam-bs |
13:30
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
13:31
🔗
|
|
RichardG has quit IRC (Ping timeout: 272 seconds) |
13:34
🔗
|
SketchCow |
That truck |
13:42
🔗
|
|
hook54321 has joined #archiveteam-bs |
13:47
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
14:11
🔗
|
|
RichardG has joined #archiveteam-bs |
14:16
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
15:03
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
15:03
🔗
|
|
RichardG has joined #archiveteam-bs |
15:09
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
15:20
🔗
|
|
RichardG has quit IRC (Ping timeout: 250 seconds) |
15:21
🔗
|
|
RichardG has joined #archiveteam-bs |
15:29
🔗
|
|
JesseW has joined #archiveteam-bs |
15:30
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
15:31
🔗
|
|
Start has joined #archiveteam-bs |
15:44
🔗
|
|
RichardG has quit IRC (Ping timeout: 244 seconds) |
15:50
🔗
|
|
RichardG has joined #archiveteam-bs |
15:50
🔗
|
|
hook54321 has quit IRC (Quit: Connection closed for inactivity) |
15:51
🔗
|
HCross2 |
Sorry again, but newsbuddy could do with more power and grabbers please |
15:55
🔗
|
|
Start has quit IRC (Ping timeout: 260 seconds) |
15:56
🔗
|
Yoshimura |
HCross2: Define power and grabber? |
15:57
🔗
|
Yoshimura |
Grabber = pipe, power = cpu? |
15:57
🔗
|
HCross2 |
Yeah, bandwidth and CPU really |
16:00
🔗
|
|
Start has joined #archiveteam-bs |
16:01
🔗
|
Yoshimura |
I could serve BW, or CPU, not both at same time currently. |
16:01
🔗
|
Yoshimura |
Well, both, but not mutually, aka diff location. |
16:02
🔗
|
Yoshimura |
If you do not need a dedicated box, I can give you a container, I use 1/100 - 5/100 of available bw atm. |
16:05
🔗
|
|
Start has quit IRC (Remote host closed the connection) |
16:05
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
16:05
🔗
|
Medowar |
>Container |
16:05
🔗
|
Medowar |
We really should get a docker image. If I have time, I can create one. |
16:05
🔗
|
Medowar |
but right now, time is an issue. |
16:07
🔗
|
Yoshimura |
Medowar: Yeah, I can provide HCross2 a Docker with ubuntu stuff (phusion/baseimage). |
16:08
🔗
|
Medowar |
yeah i have done the same. |
16:08
🔗
|
Yoshimura |
Btw, anyone knows about high bandwidth pipeline? |
16:08
🔗
|
Medowar |
But with a debian base please |
16:08
🔗
|
Yoshimura |
Maybe not needed at all, the hltv is going off in a few days. |
16:08
🔗
|
Yoshimura |
And wayback lacks tons of it |
16:08
🔗
|
Medowar |
you mean high bandwidth servers? |
16:09
🔗
|
Yoshimura |
Nah, archivebot. No one seems to care + the site seems to be loaded so warrior would make no sense |
16:09
🔗
|
Yoshimura |
Someone might try to reach them or something, but saying they will shutdown in few days sucks. |
16:10
🔗
|
Yoshimura |
After people realized that I guess more people crawl them personally or something. |
16:10
🔗
|
Yoshimura |
http://www.hltv.org/?pageid=86&galleryid=7880 |
16:10
🔗
|
Yoshimura |
Example page. Load speed, and I think this one is not in wayback either. |
16:11
🔗
|
Medowar |
hltv is going offline? |
16:11
🔗
|
Yoshimura |
I announced that twice at least on main channel |
16:11
🔗
|
Yoshimura |
No one cared or noticed. Yes, on 23rd April. |
16:12
🔗
|
Medowar |
wow. Is there an official announcement anywhere? |
16:12
🔗
|
Yoshimura |
Yes on twitter it was I think |
16:12
🔗
|
Yoshimura |
https://twitter.com/hltvorg_/status/722083587357544448 |
16:13
🔗
|
Yoshimura |
But it may or not be hoax, I do not know. |
16:13
🔗
|
Medowar |
fake. Wrong twitter account. |
16:13
🔗
|
Yoshimura |
The twitter handle sounds sketchy. But someone already on wiki said it's valuable. So if it is confirmed hoax, we should still crawl it after the 23rd. |
16:14
🔗
|
Medowar |
it has literally nothing on it other than the announcement. |
16:14
🔗
|
Medowar |
https://twitter.com/HLTVORG |
16:14
🔗
|
Medowar |
this is the original account |
16:14
🔗
|
Medowar |
also, the creator announced 9 months ago that he is going fulltime hltv, so I don't think that it is shutting down |
16:14
🔗
|
Medowar |
http://www.hltv.org/?pageid=135&userid=1&blogid=10102 |
16:16
🔗
|
Medowar |
and it is the most important CSGO news site. Has dedicated staff to do interviews on events and stuff |
16:17
🔗
|
Medowar |
afk 30 min, driving home |
16:22
🔗
|
Yoshimura |
Alright, then I guess best strategy would be to wait and fetch the site once a year. |
16:22
🔗
|
Yoshimura |
bot would have space problems maybe, due to galleries, I do not know. |
16:23
🔗
|
|
SimpBrain has joined #archiveteam-bs |
16:47
🔗
|
|
bwn_ has joined #archiveteam-bs |
16:59
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
17:05
🔗
|
atrocity |
newsgrabber on warrior? if so, i can give you like 40/40 |
17:06
🔗
|
HCross |
NOPE |
17:06
🔗
|
HCross |
It isnty |
17:06
🔗
|
HCross |
isnt |
17:07
🔗
|
atrocity |
:/ |
17:08
🔗
|
atrocity |
yuku it is, lol |
17:11
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
17:14
🔗
|
ivan` |
HCross: had a chat with "Michael" who tells me "service will be live by Friday" because that's when they set up their drives in a batch |
17:14
🔗
|
ivan` |
and still no response to tickets or emails |
17:14
🔗
|
ivan` |
pretty sure I'm going to be out $130 on my drive |
17:15
🔗
|
HCross |
Ouch :/ |
17:18
🔗
|
HCross |
Are the drives over at their DC now? |
17:18
🔗
|
ivan` |
I have no idea. they received the drive last Wednesday |
17:19
🔗
|
ivan` |
maybe it's already been sold for their hookers-and-blow fund |
17:19
🔗
|
HCross |
Then surely it would be in last Friday's batch if they do it weekly |
17:19
🔗
|
HCross |
nah, more like their "Downtime poptart fund" |
17:30
🔗
|
|
jspiros has quit IRC (Read error: Operation timed out) |
17:34
🔗
|
|
jspiros has joined #archiveteam-bs |
17:45
🔗
|
|
Start has joined #archiveteam-bs |
17:46
🔗
|
|
Honno has joined #archiveteam-bs |
17:53
🔗
|
|
JW_work1 has joined #archiveteam-bs |
17:59
🔗
|
|
JW_work has quit IRC (Ping timeout: 370 seconds) |
18:08
🔗
|
ivan` |
I opened a ticket to get them to cancel and return my drive |
18:08
🔗
|
ivan` |
fuckers will probably try to bill me $25 for packing the drive |
18:10
🔗
|
HCross |
pay it, then speak to your CC company |
18:10
🔗
|
ivan` |
yeah |
18:10
🔗
|
HCross |
but wait until the drive is in your hand before |
18:13
🔗
|
ivan` |
I'm out $39 just for shipping back and forth |
18:13
🔗
|
ivan` |
last month I was out $28 for shipping smoke-filled PS3s back and forth |
18:13
🔗
|
ivan` |
it's good to be a shipping co |
18:14
🔗
|
HCross |
yeah, that does sound a tad expensive though. I sent a 2.5inch disk from London to LA for £14 the other month |
18:15
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
18:16
🔗
|
HCross |
Took less than 48 hours to reach LA, but then another 2 weeks to get through customs |
18:16
🔗
|
ivan` |
heh |
18:17
🔗
|
HCross |
Yep |
18:18
🔗
|
|
Start has joined #archiveteam-bs |
18:19
🔗
|
HCross |
something to do with sending HDDs from the EU being risky or something |
18:53
🔗
|
Yoshimura |
Yeah, if it goes air or ship ... air means radiation from cosmos. |
18:54
🔗
|
Yoshimura |
Ship might be ok but slow, but temperatures. |
18:54
🔗
|
Yoshimura |
Transport over wire with a special purpose application and protocol (scientists have those and they are free or OSS) over UDP works best. |
19:00
🔗
|
Kazzy |
no, probably more like the contents of the drive |
19:00
🔗
|
Kazzy |
pretty sure they're fine with the whole air travel bit in general |
19:02
🔗
|
HCross |
it was empty too |
19:03
🔗
|
Yoshimura |
Kazzy: Cosmic radiation = damaging the bits on the magnetic surface? |
19:04
🔗
|
Kazzy |
Yoshimura: wrap it in tin foil, that blocks all the rads |
19:05
🔗
|
Yoshimura |
Nope. |
19:05
🔗
|
xmc |
lead foil |
19:05
🔗
|
Kazzy |
i don't lose my bits when i go on a plane, why does a metal thing |
19:06
🔗
|
Yoshimura |
Density. |
19:06
🔗
|
* |
bwn_ makes a foil hat |
19:06
🔗
|
Yoshimura |
Also cosmic radiation is fast as hell, shielding does not work much. |
19:06
🔗
|
Kazzy |
wait that's rude |
19:09
🔗
|
arkiver |
Electrical field around the drive :D |
19:15
🔗
|
Kazzy |
Yoshimura: planes are fast as hell |
19:18
🔗
|
HCross |
^ nearly 11 hours from London to LA |
19:23
🔗
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
19:27
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
19:32
🔗
|
Frogging |
Yoshimura: I don't see what the speed of the particles has to do with shielding |
19:32
🔗
|
Frogging |
things can and are shielded from cosmic rays, otherwise the satellites orbiting Earth would have issues |
19:33
🔗
|
Yoshimura |
Frogging: It goes through, only mountains help. Yeah, can but costly. |
19:33
🔗
|
Frogging |
pretty sure they don't have mountains in orbit |
19:33
🔗
|
Yoshimura |
Nope, but they got resistant storage media made for that |
19:33
🔗
|
Frogging |
they have shielding |
19:34
🔗
|
Frogging |
the microchips aren't special, they're just shielded |
19:36
🔗
|
Yoshimura |
Shield your disk and send it instead of upload then |
19:37
🔗
|
|
godane has joined #archiveteam-bs |
19:44
🔗
|
Frogging |
well, yes. that's what was being suggested. Your objection was that "the radiation is too fast so shielding doesn't work", remember? |
19:45
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
19:46
🔗
|
|
bwn_ has joined #archiveteam-bs |
19:47
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
20:13
🔗
|
|
Start has joined #archiveteam-bs |
20:24
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
20:30
🔗
|
|
powerKite has joined #archiveteam-bs |
20:33
🔗
|
* |
zino is trying to remember what free forum hosting service his lost forum was on. |
20:35
🔗
|
xmc |
invisionfree? |
20:36
🔗
|
powerKite |
I think the worst thing about archiving an ARG |
20:36
🔗
|
powerKite |
is that you end up having to ***DO THE PUZZLES AGAIN*** to find out what you need to archive |
20:37
🔗
|
zino |
Heh. |
20:38
🔗
|
zino |
xmc: I think the domain for the forum contained "easyforum.com" or something. |
20:40
🔗
|
joepie91 |
forumotion? |
20:40
🔗
|
zino |
Hmm. Nope. |
20:41
🔗
|
powerKite |
anyway, is there a Megaswf archive I just don't know about or something? |
20:41
🔗
|
powerKite |
or am I just fucked in regards to getting those SWFs |
20:46
🔗
|
powerKite |
judging by the lack of responses, it's probably the latter |
20:47
🔗
|
zino |
Quite possibly |
20:48
🔗
|
|
Medowar has quit IRC (Quit: Connection closed for inactivity) |
20:49
🔗
|
Atluxity |
:P |
21:00
🔗
|
|
powerKite has quit IRC (Quit: Page closed) |
21:10
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
22:17
🔗
|
JW_work1 |
https://twitter.com/textfiles/status/722530539006214146 |
22:19
🔗
|
|
ErkDog has quit IRC (Read error: Operation timed out) |
22:20
🔗
|
zino |
JW_work1: Great! I choose to believe that it was my retweet that made the difference... |
22:21
🔗
|
JW_work1 |
I'm just curious what damage, if any, there will be to it. |
22:21
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
22:21
🔗
|
JW_work1 |
Hopefully if there's damage to the paint job, they can get the artist to fix it |
22:23
🔗
|
Yoshimura |
Is there anywhere a picture of the van? |
22:24
🔗
|
Kazzy |
https://twitter.com/textfiles/status/722094405931397121 |
22:24
🔗
|
Yoshimura |
Thanks ;) |
22:26
🔗
|
|
ErkDog has joined #archiveteam-bs |
22:32
🔗
|
Yoshimura |
Ok, pipeline, would like to run one. |
22:34
🔗
|
Kazzy |
vbox + https://archive.org/download/archiveteam-warrior/archiveteam-warrior-v2-20121008.ova |
22:34
🔗
|
Yoshimura |
Who could help or provide more info, would be glad. I did not care till now, when apparently pipes are loaded, stalling etc. If they all work there would be enough BW. |
22:34
🔗
|
Yoshimura |
Kazzy: Archivebot :P Already running all projects on warrior simultaneously at concurrency 6 (which IRL is lower thanks to lack of work) |
22:35
🔗
|
Kazzy |
archivebot is a ton more involved |
22:35
🔗
|
Kazzy |
basically don't bother even trying unless you can provide 50/50 (ideally 100/100 line) for 2-3 months minimum at 100% uptime, guaranteed with no filtering |
22:36
🔗
|
Kazzy |
if you pass all that, proceed to https://github.com/ArchiveTeam/ArchiveBot/blob/master/INSTALL.pipeline |
22:37
🔗
|
Yoshimura |
Kazzy: 100/100 |
22:37
🔗
|
Yoshimura |
Atm, at least once. |
22:38
🔗
|
Yoshimura |
Not even SLAs have 100%, but 99.9 |
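To put the 99.9 figure in perspective, the downtime an uptime percentage allows can be worked out directly. A small sketch, assuming a 30-day month (the function name is made up for illustration):

```python
# Sketch: how much downtime a given uptime percentage permits.
# A 30-day month is an assumption; real SLAs define their own measurement window.
def allowed_downtime_minutes(uptime_percent, days=30):
    """Return the minutes of downtime permitted over `days` at the given uptime."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for pct in (100, 99.9, 99):
        print(pct, round(allowed_downtime_minutes(pct), 1))
```

At 99.9% that works out to roughly 43 minutes per 30 days, while a literal 100% leaves no room for any outage at all.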
22:39
🔗
|
Yoshimura |
And filtering is needed, and used almost everywhere, people just pretend to think its not (IPS, IDS) |
22:39
🔗
|
godane |
SketchCow: we are up to 2008-07-05 with funny or die archive videos |
22:40
🔗
|
Yoshimura |
But the providers do it, to lower DDoS, while retaining the real bandwidth, plus residual DoS. |
22:49
🔗
|
Yoshimura |
If you want your pipeline to only handle !ao/!archiveonly jobs, run it with the AO_ONLY environment variable set. |
22:50
🔗
|
Yoshimura |
Sounds like a job for me, starting small. Sounds great. |
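Running a process "with the AO_ONLY environment variable set" could look roughly like the sketch below. The value `"1"` is an assumption (the quoted text only says the variable must be set), and the actual pipeline command is deliberately left to the caller because it is not given here; take it from INSTALL.pipeline:

```python
# Sketch: launch a command with AO_ONLY present in its environment.
# Assumptions: any non-empty value counts as "set"; "1" is used as a placeholder.
import os
import subprocess


def pipeline_env(base_env=None, ao_only=True):
    """Return a copy of the environment with AO_ONLY added or removed."""
    env = dict(os.environ if base_env is None else base_env)
    if ao_only:
        env["AO_ONLY"] = "1"  # assumed value, not taken from the install notes
    else:
        env.pop("AO_ONLY", None)
    return env


def launch(command, ao_only=True):
    """Run `command` (a list of arguments supplied by the caller) with that environment."""
    return subprocess.run(command, env=pipeline_env(ao_only=ao_only), check=True)
```

Copying the environment instead of mutating `os.environ` keeps the setting scoped to the one child process.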
22:56
🔗
|
|
Honno has quit IRC (Quit: Leaving) |
23:23
🔗
|
|
Rickster has quit IRC (Ping timeout: 260 seconds) |
23:34
🔗
|
|
Rickster has joined #archiveteam-bs |
23:38
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
23:50
🔗
|
|
JesseW has joined #archiveteam-bs |